Hidden-State Growth
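Under PreNorm residual connections, hidden-state magnitudes grow as O(L) with depth, where L is the number of layers. PreNorm normalizes only the input to each sublayer, not the residual stream itself, so each layer adds a roughly unit-magnitude output to an unnormalized running sum. As the sum grows, each layer's output becomes a progressively smaller fraction of the hidden state, and early-layer information is buried under the accumulated contributions of later layers.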
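The growth described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of a real transformer: the function name simulate_prenorm_growth and its alignment parameter are hypothetical, each layer's output is assumed to have exactly unit norm, and alignment=1.0 assumes every output points in the same direction, which is the worst case and gives exactly linear growth. By the triangle inequality, less aligned outputs grow no faster than that.

```python
import numpy as np


def simulate_prenorm_growth(num_layers, dim=64, alignment=1.0, seed=0):
    """Track ||h_l|| for the toy recurrence h_{l+1} = h_l + f_l with ||f_l|| = 1.

    alignment=1.0 makes every layer output point along one shared direction
    (an illustrative worst case); alignment=0.0 makes each output a random
    unit vector.
    """
    rng = np.random.default_rng(seed)

    # Shared direction, also used as the unit-norm initial hidden state.
    shared = rng.standard_normal(dim)
    shared /= np.linalg.norm(shared)

    h = shared.copy()
    norms = [np.linalg.norm(h)]

    for _ in range(num_layers):
        # Random unit vector standing in for the unaligned part of the output.
        noise = rng.standard_normal(dim)
        noise /= np.linalg.norm(noise)

        # Blend, then rescale so the layer output has unit magnitude.
        update = alignment * shared + (1.0 - alignment) * noise
        update /= np.linalg.norm(update)

        # Residual addition: the running sum is never renormalized.
        h = h + update
        norms.append(np.linalg.norm(h))

    return np.array(norms)


if __name__ == "__main__":
    norms = simulate_prenorm_growth(num_layers=32, alignment=1.0)
    # Each unit-norm update is a shrinking fraction of the hidden state.
    relative_contribution = 1.0 / norms
    print(norms[[0, 8, 16, 32]])
    print(relative_contribution[[0, 8, 16, 32]])
```

Under these assumptions, the ratio 1.0 / norms is the dilution the text refers to: a unit-magnitude update at depth l is compared against a hidden state whose norm has already grown with l.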
See also: prenorm, residual_connections, attention_residuals
concepts/hidden_state_growth.txt · Last modified: by aethersync
