concepts:layer_normalization
Layer Normalization
Normalizing activations across the feature dimension (per token) to stabilize training: each feature vector is rescaled to zero mean and unit variance, then given a learned scale and shift. In a PreNorm block the normalization is applied to the sublayer's input, before the residual addition; in a PostNorm block it is applied to the sum of the residual and the sublayer output. PreNorm dominates modern LLMs because it trains more stably at depth, but since the residual stream itself is never normalized, hidden-state norms tend to grow across layers.
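A minimal NumPy sketch of the two placements (function names are illustrative, not from any particular library): the normalization itself is identical, and the blocks differ only in where it sits relative to the residual connection.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the feature (last) dimension, independently per token.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def prenorm_block(x, sublayer, gamma, beta):
    # PreNorm: normalize the sublayer input; the residual stream bypasses
    # the norm entirely, so its magnitude can grow with depth.
    return x + sublayer(layer_norm(x, gamma, beta))

def postnorm_block(x, sublayer, gamma, beta):
    # PostNorm: add the residual first, then normalize the sum,
    # so the block output is always re-normalized.
    return layer_norm(x + sublayer(x), gamma, beta)
```

Passing an identity-like sublayer makes the difference visible: the PreNorm output keeps the raw residual magnitude, while the PostNorm output is renormalized.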
See also: prenorm, postnorm, hidden_state_growth, attention_residuals
concepts/layer_normalization.txt · Last modified: by aethersync
