
Kimi Linear

A Mixture-of-Experts (MoE) architecture by Kimi Team with 48B total parameters, of which 3B are activated per token, combined with linear attention. The paper Attention Residuals integrates AttnRes into this architecture, pre-training on 1.4T tokens.
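The 48B-total / 3B-activated split comes from sparse expert routing: each token is sent to only the top-k experts by gate score, so most parameters sit idle per token. A minimal sketch of such routing (hypothetical shapes and a generic softmax-over-top-k gate, not Kimi Linear's actual router):

```python
import numpy as np

def topk_moe(x, W_gate, experts, k=2):
    """Sparse MoE routing sketch: each token activates only the top-k
    experts, so activated parameters are a small fraction of the total
    (e.g. 3B of 48B). Illustrative only, not Kimi Linear's router."""
    scores = x @ W_gate                      # gate logits, one per expert
    top = np.argsort(scores)[-k:]            # indices of the k highest gates
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                             # softmax over the selected experts
    # weighted combination of only the selected experts' outputs
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(1)
d, n_experts = 4, 8
W_gate = rng.normal(size=(d, n_experts))
# toy "experts": independent linear layers
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = topk_moe(rng.normal(size=d), W_gate, experts, k=2)
print(y.shape)  # (4,)
```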

See also: moe, linear_attention, attention_residuals
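Linear attention replaces the softmax attention matrix with a kernel feature map, so attention can be computed as phi(Q)(phi(K)^T V) in O(N·d^2) rather than O(N^2·d). A generic non-causal sketch using the elu(x)+1 feature map of Katharopoulos et al. (2020), not Kimi Linear's specific attention variant:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic linear attention sketch: softmax(QK^T)V is approximated
    by phi(Q)(phi(K)^T V) with phi(x) = elu(x) + 1, making cost linear
    in sequence length N. Not Kimi Linear's exact formulation."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                     # (d, d_v): shared across all queries
    Z = Qp @ Kp.sum(axis=0) + eps     # (N,): per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```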

concepts/kimi_linear.txt · Last modified: by aethersync
