Linear Attention

Attention variants that replace the O(n²) softmax score matrix with a kernelized feature-map decomposition, reducing the cost in sequence length from quadratic to linear. Because the kernel factorization is associative, the key–value product can be accumulated once and reused for every query instead of materializing all pairwise scores. Used in Kimi Linear to handle long contexts efficiently; the trade-off is some loss of expressiveness relative to full softmax attention.
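A minimal sketch of the kernel trick, assuming the ELU+1 feature map from Katharopoulos et al., "Transformers are RNNs" (the feature map is an assumption for illustration; Kimi Linear's actual kernel may differ). Writing attention as φ(Q)·(φ(K)ᵀV) instead of (φ(Q)φ(K)ᵀ)·V avoids the n×n score matrix:

```python
import numpy as np

def feature_map(x):
    # ELU(x) + 1: a positive feature map standing in for the softmax kernel.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # O(n) in sequence length: associativity lets us form the small
    # (d x d_v) matrix K'^T V once, instead of the (n x n) matrix Q K^T.
    Qp, Kp = feature_map(Q), feature_map(K)  # (n, d)
    KV = Kp.T @ V                            # (d, d_v), cost O(n * d * d_v)
    Z = Qp @ Kp.sum(axis=0)                  # (n,) normalizer, replaces softmax denom
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (6, 4)
```

The result is identical to computing the full n×n kernel score matrix explicitly; only the evaluation order (and hence the cost) changes.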

See also: kimi_linear, softmax_attention, moe, attention_residuals

concepts/linear_attention.txt · Last modified: by aethersync
