concepts:kimi_linear
Kimi Linear
A Mixture-of-Experts (MoE) architecture from the Kimi Team that combines sparse expert routing with linear attention, totaling 48B parameters with 3B activated per token. The paper Attention Residuals integrates AttnRes into this architecture and pre-trains it on 1.4T tokens.
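The key property of linear attention is that it avoids the quadratic softmax attention cost by applying a feature map to queries and keys and reassociating the matrix products. The sketch below shows this generic mechanism only; the feature map (elu(x)+1) and all names are illustrative assumptions, not the specific kernel used by Kimi Linear or AttnRes.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    # Feature map phi(x) = elu(x) + 1 keeps values positive
    # (a common generic choice; Kimi Linear's actual kernel may differ).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    # Reassociate (Qf Kf^T) V as Qf (Kf^T V): cost O(n*d^2)
    # instead of softmax attention's O(n^2*d).
    KV = Kf.T @ V                       # (d, d_v) summary of keys/values
    Z = Kf.sum(axis=0)                  # (d,) normalizer
    return (Qf @ KV) / ((Qf @ Z)[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4): one output row per query, as in softmax attention
```

Because the key/value summary `KV` has fixed size regardless of sequence length, this form supports constant-memory recurrent decoding, which is the main efficiency argument for linear-attention hybrids.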
See also: moe, linear_attention, attention_residuals
concepts/kimi_linear.txt · Last modified: by aethersync
