Attention Residuals

A technique by the Kimi Team that replaces the fixed unit-weight residual accumulation of a standard transformer with learned softmax attention over the outputs of preceding layers.
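
The page does not pin down the exact formulation, so the following is only a minimal sketch of the idea, written in plain PyTorch with hypothetical names: each token forms a dot-product query against the stored outputs of earlier layers, takes a softmax over those layers, and uses the weighted sum in place of the usual unit-weight residual stream. The Kimi Team design may differ in where normalization sits, whether weights are per-token or per-layer scalars, and how the current block output enters the mixture.

<code python>
import torch
import torch.nn as nn

class AttentionResidual(nn.Module):
    """Sketch: mix the outputs of all preceding layers with learned
    softmax attention instead of summing them with fixed unit weights.
    Illustrative only, not the Kimi Team implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim, bias=False)  # query from current block output
        self.key = nn.Linear(dim, dim, bias=False)    # keys from stored layer outputs
        self.scale = dim ** -0.5

    def forward(self, history: torch.Tensor, block_out: torch.Tensor) -> torch.Tensor:
        # history:   (num_prev_layers, batch, seq, dim) -- outputs of earlier layers
        # block_out: (batch, seq, dim)                  -- output of the current block
        q = self.query(block_out)                      # (batch, seq, dim)
        k = self.key(history)                          # (L, batch, seq, dim)
        scores = (k * q).sum(-1) * self.scale          # (L, batch, seq)
        weights = torch.softmax(scores, dim=0)         # softmax over preceding layers
        mixed = (weights.unsqueeze(-1) * history).sum(0)  # weighted residual stream
        return mixed + block_out
</code>

In a stack, one would keep a list of earlier layer outputs, `torch.stack` it into the `history` tensor, and call the module after each block in place of the plain `x + block(x)` update.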

See also: residual_connections, prenorm, block_attnres, kimi_linear, moe, softmax_attention, scaling_laws, layer_pruning, gradient_highway, hidden_state_growth, rnn
