====== Residual Connections ======

Skip connections that add a layer's input to its output: h_l = h_{l-1} + f(h_{l-1}). They enable gradient flow in deep networks, but unrolling the recurrence gives h_L = h_0 + Σ_l f(h_{l-1}): the hidden state accumulates every prior layer's output with a fixed unit weight, so any single layer's contribution is diluted at depth.

See also: [[papers:attention_residuals]], [[concepts:prenorm]], [[concepts:gradient_highway]], [[concepts:hidden_state_growth]], [[concepts:layer_pruning]]
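A minimal NumPy sketch of the recurrence h_l = h_{l-1} + f(h_{l-1}). The layer functions here are hypothetical random linear maps standing in for f; the point is that the final state decomposes exactly into the input plus all per-layer increments, each added with unit weight.

```python
import numpy as np

def residual_stack(h0, layers):
    """Apply a stack of layers with residual (skip) connections.

    Each step computes h_l = h_{l-1} + f(h_{l-1}); unrolling gives
    h_L = h_0 + sum of per-layer increments f(h_{l-1}).
    """
    h = h0
    increments = []  # per-layer contributions f(h_{l-1})
    for f in layers:
        delta = f(h)
        increments.append(delta)
        h = h + delta
    return h, increments

rng = np.random.default_rng(0)
h0 = rng.standard_normal(4)
# Hypothetical layers: small random linear maps in place of f.
Ws = [0.1 * rng.standard_normal((4, 4)) for _ in range(8)]
layers = [lambda h, W=W: W @ h for W in Ws]

hL, increments = residual_stack(h0, layers)
# The final state is exactly the input plus all accumulated increments.
assert np.allclose(hL, h0 + sum(increments))
# Fraction of the state norm contributed by the last layer alone:
# one term among many with equal weighting, illustrating dilution at depth.
ratio = np.linalg.norm(increments[-1]) / np.linalg.norm(hL)
```

Because every increment enters with the same fixed weight, deeper stacks shrink the relative share of any one layer's output in h_L, which is the dilution noted above.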