Mixture of Experts (MoE)

An architecture in which only a subset of the model's parameters (the experts) is activated for each input token, as selected by a routing function. This allows the total parameter count to scale while compute per token stays low. Used in Kimi Linear and many modern LLMs.
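
Below is a minimal sketch of a top-k routed MoE layer in PyTorch, intended only to illustrate the idea above; it is not the routing scheme of Kimi Linear or any specific model, and the class name, sizes, and top-2 choice are illustrative assumptions.

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer (sketch, not a reference implementation)."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Routing function: scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (batch, d_model), one row per token
        logits = self.router(x)                # (batch, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts only
        out = torch.zeros_like(x)
        # Only the top_k selected experts run for each token, so compute per
        # token scales with top_k rather than with the total number of experts.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
</code>

In this sketch the total parameter count grows with n_experts, but each token is processed by only top_k experts, which is the scaling property described above.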

See also: expert_routing, kimi_linear, attention_residuals
