Linear Attention
Attention variants that replace the O(n²) softmax attention with a kernelized feature-map decomposition, reducing the cost in sequence length from quadratic to linear. Used in Kimi Linear to handle long contexts efficiently; trades some expressiveness for the computational savings.
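A minimal sketch of the kernelized decomposition, assuming the common elu(x)+1 positive feature map (one choice among many; the source does not specify which kernel Kimi Linear uses). Because φ(Q)(φ(K)ᵀV) can be computed right-to-left, the n×n attention matrix is never materialized:

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1: keeps features strictly positive so the
    # normalizer below never vanishes. This kernel is an assumption.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    # Softmax attention softmax(QK^T)V is replaced by
    #   phi(Q) (phi(K)^T V) / (phi(Q) . sum_j phi(K_j)),
    # computed right-to-left: O(n * d * d_v) instead of O(n^2 * d).
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)  # (n, d)
    KV = Kf.T @ V                                    # (d, d_v): one global key/value summary
    Z = Qf @ Kf.sum(axis=0) + eps                    # (n,): per-query normalizer
    return (Qf @ KV) / Z[:, None]                    # (n, d_v)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)
```

Note that `KV` and the key sum are fixed-size regardless of n, which is also what makes the recurrent, constant-memory formulation possible for causal decoding.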
See also: kimi_linear, softmax_attention, moe, attention_residuals
concepts/linear_attention.txt · Last modified: by aethersync
