Normalizing activations across the feature dimension to stabilize training. Applied either before the sublayer (PreNorm) or after the residual addition (PostNorm). PreNorm dominates modern LLMs, but because the un-normalized residual stream accumulates sublayer outputs, the hidden-state norm tends to grow with depth.
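A minimal NumPy sketch of the two placements (the `sublayer` argument stands in for attention or an MLP; function names are illustrative, not from any particular library):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize across the last (feature) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_norm_block(x, sublayer):
    # PreNorm: normalize the input to the sublayer, then add the
    # residual. The residual stream x itself is never normalized,
    # so its norm can grow as layers stack.
    return x + sublayer(layer_norm(x))

def post_norm_block(x, sublayer):
    # PostNorm: add the residual first, then normalize the sum,
    # so every block's output is re-normalized.
    return layer_norm(x + sublayer(x))
```

Note the asymmetry: a PostNorm block's output is always normalized, while a PreNorm block passes the raw residual sum forward, which is the mechanism behind the hidden-state growth mentioned above.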
See also: prenorm, postnorm, hidden_state_growth, attention_residuals