PreNorm
Applying layer normalization before each sublayer (attention or FFN) rather than after it, so the residual update is x + F(LN(x)). Dominant in modern LLMs because it stabilizes training, but the residual stream accumulates sublayer outputs without rescaling, so hidden-state magnitudes grow as O(L) with depth, diluting the relative contribution of each successive layer.
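A minimal numpy sketch of the growth effect, under toy assumptions (parameter-free layer norm, random linear sublayers standing in for attention/FFN; all names here are illustrative, not from any particular codebase). It stacks pre-norm residual updates x + F(LN(x)) and records the hidden-state norm at each depth:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean, unit variance
    # (no learned scale/shift, for simplicity).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def prenorm_block(x, w):
    # Pre-norm residual update: x + F(LN(x)),
    # with F a toy linear sublayer in place of attention/FFN.
    return x + layer_norm(x) @ w

rng = np.random.default_rng(0)
d, depth = 64, 24          # hidden size and number of layers (arbitrary)
x = rng.standard_normal(d)
norms = []
for _ in range(depth):
    w = rng.standard_normal((d, d)) / np.sqrt(d)  # toy sublayer weights
    x = prenorm_block(x, w)
    norms.append(float(np.linalg.norm(x)))
```

Because LN(x) always has roughly fixed magnitude, each block adds an increment of roughly constant size to the residual stream, so `norms` trends upward with depth while each new increment becomes a smaller fraction of the total.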
See also: residual_connections, layer_normalization, postnorm, hidden_state_growth, attention_residuals
concepts/prenorm.txt · Last modified: by aethersync
