Attention Residuals

A technique by the Kimi Team that replaces the fixed unit-weight residual accumulation of a standard transformer with learned softmax attention over the outputs of preceding layers.
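
The page does not pin down the exact formulation, so the following is only a minimal sketch of the idea, written in plain PyTorch with hypothetical names: each token forms a dot-product query against the stored outputs of earlier layers, takes a softmax over those layers, and uses the weighted sum in place of the usual unit-weight residual stream. The Kimi Team design may differ in where normalization sits, whether weights are per-token or per-layer scalars, and how the current block output enters the mixture.

<code python>
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Sketch: mix the outputs of all preceding layers with learned
    softmax attention instead of summing them with fixed unit weights.
    Illustrative only, not the Kimi Team implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim, bias=False)  # query from current block output
        self.key = nn.Linear(dim, dim, bias=False)    # keys from stored layer outputs
        self.scale = dim ** -0.5

    def forward(self, history: torch.Tensor, block_out: torch.Tensor) -> torch.Tensor:
        # history:   (num_prev_layers, batch, seq, dim) -- outputs of earlier layers
        # block_out: (batch, seq, dim)                  -- output of the current block
        q = self.query(block_out)                      # (batch, seq, dim)
        k = self.key(history)                          # (L, batch, seq, dim)
        scores = (k * q).sum(-1) * self.scale          # (L, batch, seq)
        weights = torch.softmax(scores, dim=0)         # softmax over preceding layers
        mixed = (weights.unsqueeze(-1) * history).sum(0)  # weighted residual stream
        return mixed + block_out
</code>

In a stack, one would keep a list of earlier layer outputs, `torch.stack` it into the `history` tensor, and call the module after each block in place of the plain `x + block(x)` update.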

See also: residual_connections, prenorm, block_attnres, kimi_linear, moe, softmax_attention, scaling_laws, layer_pruning, gradient_highway, hidden_state_growth, rnn
