PreNorm
Applying layer normalization before each sublayer (attention or FFN) rather than after it, so the residual update is x + F(LN(x)). Dominant in modern LLMs because it stabilizes training, but the residual stream accumulates sublayer outputs without rescaling, so hidden-state magnitudes grow as O(L) with depth, diluting the relative contribution of each successive layer.
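A minimal numpy sketch of the growth effect, under toy assumptions (parameter-free layer norm, random linear sublayers standing in for attention/FFN; all names here are illustrative, not from any particular codebase). It stacks pre-norm residual updates x + F(LN(x)) and records the hidden-state norm at each depth:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean, unit variance
    # (no learned scale/shift, for simplicity).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def prenorm_block(x, w):
    # Pre-norm residual update: x + F(LN(x)),
    # with F a toy linear sublayer in place of attention/FFN.
    return x + layer_norm(x) @ w

rng = np.random.default_rng(0)
d, depth = 64, 24          # hidden size and number of layers (arbitrary)
x = rng.standard_normal(d)
norms = []
for _ in range(depth):
    w = rng.standard_normal((d, d)) / np.sqrt(d)  # toy sublayer weights
    x = prenorm_block(x, w)
    norms.append(float(np.linalg.norm(x)))
```

Because LN(x) always has roughly fixed magnitude, each block adds an increment of roughly constant size to the residual stream, so `norms` trends upward with depth while each new increment becomes a smaller fraction of the total.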
See also: residual_connections, layer_normalization, postnorm, hidden_state_growth, attention_residuals
concepts/prenorm.txt · Last modified: by aethersync
