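The mechanism in Mixture-of-Experts (MoE) architectures that selects which experts process a given input. Typically implemented as a learned gating network that scores every expert and produces a sparse distribution over them, so only a few experts run per input. Routing quality directly affects MoE efficiency and performance: compute cost depends on how many experts are activated, and output quality depends on whether suitable experts are chosen.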
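
A minimal sketch of one common way to obtain such a sparse distribution, top-k gating, is shown here. It assumes a linear gate with one weight row per expert; the names `route_top_k` and `gate_weights`, the choice `k=2`, and the toy numbers are illustrative assumptions, not taken from any particular model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(token, gate_weights, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts.

    token        -- input vector (list of floats)
    gate_weights -- hypothetical gate matrix, one row per expert
    """
    # One logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Indices of the k largest logits; every other expert gets zero weight.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize over the selected experts only (an assumed design choice).
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


# Toy gate: 4 experts, 3-dimensional token (illustrative values only).
gate_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
token = [2.0, 1.0, -1.0]
routing = route_top_k(token, gate_weights, k=2)
print(routing)  # experts 3 and 0 are selected; the other two are skipped
```

The returned weights would typically be used to combine the outputs of the selected experts. Applying the softmax after the top-k selection is one option; taking a softmax over all experts and then keeping the top k is another.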
See also: moe, kimi_linear, attention_residuals