A technique by the Kimi Team that replaces the fixed unit-weight residual accumulation of standard transformer blocks with learned softmax attention over the outputs of all preceding layers, letting each layer choose how much of each earlier representation to read in.
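A minimal sketch of the idea, under assumptions: the names (`forward`, `mix_logits`) and the exact parameterization (one learned logit per preceding state, a single softmax per layer) are illustrative, not the Kimi Team's actual formulation, which may condition the weights on the current hidden state or apply them per head/channel. The point of contrast is that a standard residual stream feeds layer l the fixed unit-weight sum of all prior block outputs, whereas here the combination weights are learned and normalized by a softmax.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D logit vector
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def forward(x, blocks, mix_logits):
    """Depth-wise learned mixing (illustrative sketch).

    blocks:     list of callables h -> h, the per-layer transforms
    mix_logits: mix_logits[l] has shape (l + 1,): learned logits over
                the stored states h_0 .. h_l (h_0 is the input/embedding)

    Instead of the fixed residual update h_{l+1} = h_l + block(h_l),
    block l reads a softmax-weighted combination of ALL preceding
    layer outputs.
    """
    states = [x]  # h_0
    for l, block in enumerate(blocks):
        w = softmax(mix_logits[l])                      # attention over depth
        mixed = sum(wk * hk for wk, hk in zip(w, states))
        states.append(block(mixed))                     # h_{l+1}
    return states[-1]

# Tiny demo: with identity blocks, any softmax mixture of identical
# states returns the input unchanged.
x = np.array([1.0, 2.0])
blocks = [lambda h: h, lambda h: h]
mix_logits = [np.zeros(1), np.zeros(2)]  # uniform mixing at each depth
y = forward(x, blocks, mix_logits)
```

In a trained model `mix_logits` would be parameters updated by gradient descent; the uniform-logit demo above simply shows the mechanics of mixing over depth.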
See also: residual_connections, prenorm, block_attnres, kimi_linear, moe, softmax_attention, scaling_laws, layer_pruning, gradient_highway, hidden_state_growth, rnn