PreNorm

Applying layer normalization before each sublayer (attention or FFN) and adding the sublayer's raw output to the residual stream: h ← h + Sublayer(LN(h)). PreNorm is dominant in modern LLMs because it stabilizes training, but the residual stream accumulates sublayer outputs without rescaling, so hidden-state magnitudes grow as O(L) with depth, diluting each later layer's relative contribution.
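The growth effect can be sketched numerically. The following toy model (an assumption for illustration, not any particular model's code: `toy_sublayer` stands in for attention/FFN with random unit-scale mixing weights) applies the PreNorm update h ← h + Sublayer(LN(h)) repeatedly and tracks the RMS magnitude of the residual stream:

```python
import random
import statistics

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean, unit variance.
    mean = statistics.fmean(x)
    var = statistics.pvariance(x)
    return [(xi - mean) / (var + eps) ** 0.5 for xi in x]

def toy_sublayer(x, rng):
    # Stand-in for attention/FFN: mix the normalized input through
    # random weights scaled so the output stays roughly unit-RMS.
    d = len(x)
    return [sum(rng.gauss(0, 1 / d ** 0.5) * xj for xj in x)
            for _ in range(d)]

def rms(x):
    return (sum(xi * xi for xi in x) / len(x)) ** 0.5

def prenorm_stack(depth, d=64, seed=0):
    rng = random.Random(seed)
    h = [rng.gauss(0, 1) for _ in range(d)]
    norms = []
    for _ in range(depth):
        # PreNorm: normalize *before* the sublayer; the raw output is
        # added to the residual stream, which is never rescaled.
        y = toy_sublayer(layer_norm(h), rng)
        h = [hi + yi for hi, yi in zip(h, y)]
        norms.append(rms(h))
    return norms

norms = prenorm_stack(48)
print(f"rms after layer 1: {norms[0]:.2f}, after layer 48: {norms[-1]:.2f}")
```

Because each sublayer contributes a roughly unit-RMS vector while the stream itself keeps growing, the fraction of the hidden state that any single layer contributes shrinks with depth, which is the dilution the definition describes.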

See also: residual_connections, layer_normalization, postnorm, hidden_state_growth, attention_residuals

concepts/prenorm.txt · Last modified: by aethersync
