====== Kimi Linear ======

A Mixture-of-Experts architecture by the Kimi Team with 48B total parameters, of which 3B are activated per token. It combines [[concepts:moe|MoE]] routing with [[concepts:linear_attention|linear attention]]. The paper [[papers:attention_residuals|Attention Residuals]] integrates AttnRes into this architecture and pre-trains on 1.4T tokens.

See also: [[concepts:moe]], [[concepts:linear_attention]], [[papers:attention_residuals]]
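Below, a minimal sketch of causal linear attention in the generic kernel-feature-map formulation, to illustrate the mechanism the name refers to. This is an illustrative sketch only, not Kimi Linear's actual attention variant or the AttnRes mechanism; the elu+1 feature map and all function names here are hypothetical choices.

<code python>
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive feature map used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(q, k, v):
    """q, k: (T, d_k); v: (T, d_v).
    Runs in O(T * d_k * d_v) by keeping a running key-value summary,
    instead of the O(T^2) cost of softmax attention."""
    q, k = feature_map(q), feature_map(k)
    T, d_v = v.shape
    d_k = q.shape[1]
    kv = np.zeros((d_k, d_v))  # running sum of outer(k_t, v_t)
    z = np.zeros(d_k)          # running sum of k_t (normalizer)
    out = np.zeros((T, d_v))
    for t in range(T):
        kv += np.outer(k[t], v[t])
        z += k[t]
        out[t] = (q[t] @ kv) / (q[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # (8, 4)
</code>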