
Kimi Linear: An Expressive, Efficient Attention Architecture

Source: Hacker News

TL;DR (AI Generated)

Kimi Linear is a hybrid linear attention architecture that matches or surpasses full attention across a range of contexts while processing faster. Its core component is Kimi Delta Attention (KDA), which refines the delta-rule linear attention with a fine-grained gating mechanism for more precise control over its recurrent memory. The architecture cuts KV cache size by up to 75% and improves decoding throughput by up to 6x on long-context tasks. The model has been open-sourced, with two versions available for download.
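To make the efficiency claim concrete, here is a minimal sketch of a gated delta-rule linear attention step, the family of update that KDA belongs to. All names, shapes, and the exact gating form are illustrative assumptions, not Kimi's actual implementation; the point is that the state is a fixed-size matrix, which is why no growing KV cache is needed.

```python
import numpy as np

def gated_delta_step(S, k, v, beta, alpha):
    """One recurrent step of a gated delta-rule linear attention (sketch).

    S     : (d_k, d_v) matrix-valued state (stands in for the KV cache)
    k, v  : (d_k,), (d_v,) key/value vectors for the current token
    beta  : scalar write strength in [0, 1]
    alpha : (d_k,) per-channel forget gate in [0, 1] -- a stand-in for the
            "fine-grained gating" that KDA adds over the plain delta rule
    """
    S = alpha[:, None] * S                    # channel-wise decay of old state
    pred = S.T @ k                            # value the state currently predicts for k
    S = S + beta * np.outer(k, v - pred)      # delta-rule correction toward v
    return S

def attend(S, q):
    # Read-out is a linear map of the fixed-size state, so per-token
    # decoding cost and memory are O(d_k * d_v), independent of context length.
    return S.T @ q
```

Because the state never grows with sequence length, per-token decoding work stays constant; that constant-memory recurrence is the mechanism behind the cited KV-cache and throughput gains.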
