====== PreNorm ======
Applying layer normalization **before** the sublayer transformation (attention or FFN), so each block updates the residual stream as x + Sublayer(LN(x)). Dominant in modern LLMs because it stabilizes training, but the unweighted residual accumulation causes hidden-state magnitudes to grow as O(L) with depth, diluting each individual layer's contribution.

See also: [[concepts:residual_connections]], [[concepts:layer_normalization]], [[concepts:postnorm]], [[concepts:hidden_state_growth]], [[papers:attention_residuals]]
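A minimal NumPy sketch of the PreNorm update x + Sublayer(LN(x)), using a hypothetical fixed random linear map in place of a real attention or FFN sublayer, to show the residual-stream norm growing with depth:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean, unit variance (no learned scale/shift).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def prenorm_block(x, sublayer):
    # PreNorm: normalize *before* the sublayer, then add the unweighted residual.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
d = 64
# Stand-in sublayer: a fixed random linear map (hypothetical, for illustration only).
W = rng.standard_normal((d, d)) / np.sqrt(d)
sublayer = lambda h: h @ W

x = rng.standard_normal(d)
norms = []
for _ in range(32):
    x = prenorm_block(x, sublayer)
    norms.append(float(np.linalg.norm(x)))

# Because LN keeps each block's input at a fixed scale, every block adds a
# roughly unit-scale contribution, so the residual-stream norm keeps growing
# with depth -- the accumulation effect described above.
print(f"norm after block 1: {norms[0]:.1f}, after block 32: {norms[-1]:.1f}")
```

The sketch only demonstrates the growth trend; the exact growth rate depends on how correlated successive sublayer outputs are, which a random linear map does not capture.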