
PostNorm

Applying layer normalization after the sublayer and residual addition, i.e. x ← LayerNorm(x + Sublayer(x)). PostNorm was used in the original Transformer but has largely been replaced by PreNorm in modern LLMs because it is harder to train at depth. Because every residual addition is renormalized, PostNorm does not exhibit hidden-state growth.
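A minimal sketch of the ordering difference, in NumPy. The `sublayer` here is a hypothetical stand-in for an attention or feed-forward layer; real blocks also use learned scale/shift parameters in the norm, omitted for brevity:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last dimension to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_norm_block(x, sublayer):
    # PostNorm: normalize AFTER the residual addition (original Transformer).
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # PreNorm: normalize BEFORE the sublayer; the residual path itself
    # is never renormalized, so its magnitude can grow with depth.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # 4 tokens, width 8
W = rng.normal(scale=0.1, size=(8, 8))
sublayer = lambda h: h @ W                        # toy sublayer

y_post = post_norm_block(x, sublayer)
y_pre = pre_norm_block(x, sublayer)

# PostNorm output variance is pinned to ~1 after every block,
# while the PreNorm residual stream is free to grow.
print(np.allclose(y_post.var(axis=-1), 1.0, atol=1e-2))  # → True
```

Stacking many `post_norm_block` calls keeps hidden-state scale constant, which is exactly why gradients through deep PostNorm stacks are more delicate to train than PreNorm ones.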

See also: prenorm, layer_normalization, hidden_state_growth, attention_residuals

concepts/postnorm.txt · Last modified: by aethersync
