Residual Connections
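Skip connections that add a layer's input to its output: h_l = h_{l-1} + f_l(h_{l-1}). They enable gradient flow in deep networks, but unrolling the recurrence gives h_L = h_0 + sum_{l=1..L} f_l(h_{l-1}): every prior output is accumulated with a fixed unit weight, so each layer's share of the hidden state is diluted as depth grows.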
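A minimal sketch of the update rule, in plain Python with no dependencies. The layer function here (a small random linear map followed by tanh) and the names make_layer, residual_forward, and norm are assumptions made for illustration, not part of this entry. The sketch checks that the final hidden state equals the input plus the unit-weighted sum of all layer outputs, and computes one possible proxy for dilution: the norm of the last layer's output relative to the norm of the accumulated state.

```python
import math
import random


def make_layer(dim, rng, scale=0.1):
    """Hypothetical layer f: random linear map followed by tanh (illustrative only)."""
    w = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def f(h):
        return [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in w]

    return f


def residual_forward(h0, layers):
    """Apply h_l = h_{l-1} + f(h_{l-1}) per layer; return final state and all layer outputs."""
    h = list(h0)
    outputs = []
    for f in layers:
        out = f(h)
        outputs.append(out)
        h = [a + b for a, b in zip(h, out)]  # the skip connection
    return h, outputs


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(0)
dim, depth = 8, 32
layers = [make_layer(dim, rng) for _ in range(depth)]
h0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

h_final, outputs = residual_forward(h0, layers)

# Unrolled form: the input plus every layer output, each with weight 1.
unrolled = [h0[i] + sum(out[i] for out in outputs) for i in range(dim)]

# Assumed dilution proxy: size of the last layer's output relative to the state it is added to.
last_layer_share = norm(outputs[-1]) / norm(h_final)
print(f"last layer output norm / final state norm: {last_layer_share:.4f}")
```

Varying depth and comparing last_layer_share across runs is one way to see how a single layer's contribution relates to the accumulated state; the exact numbers depend on the assumed layer and its scale.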
See also: attention_residuals, prenorm, gradient_highway, hidden_state_growth, layer_pruning
concepts/residual_connections.txt · Last modified: by aethersync
