Applying layer normalization after the sublayer output is added back to the residual stream, i.e. output = LayerNorm(x + Sublayer(x)). Used in the original Transformer but largely replaced by PreNorm in modern LLMs because PostNorm becomes unstable to train at depth. Since every block re-normalizes the residual stream, PostNorm does not exhibit hidden-state growth, but it is harder to train, typically requiring learning-rate warmup.
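A minimal sketch of the two orderings, assuming PyTorch; the class names `PostNormBlock` and `PreNormBlock` and the `sublayer` parameter are illustrative, not from any particular codebase:

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """PostNorm: normalize AFTER the residual addition (original Transformer)."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer  # e.g. self-attention or a feed-forward net
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # LayerNorm wraps (x + sublayer(x)), so the residual stream itself is
        # re-normalized at every block: hidden states cannot grow, but every
        # gradient must pass through every norm, which hurts deep training.
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """PreNorm: normalize BEFORE the sublayer; the residual path is untouched."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity path x -> x is never normalized, which stabilizes deep
        # stacks but lets hidden-state norms grow across layers.
        return x + self.sublayer(self.norm(x))
```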
See also: prenorm, layer_normalization, hidden_state_growth, attention_residuals