Hidden-State Growth

Under PreNorm residual connections, the hidden-state magnitude grows as O(L) in the depth L (the number of layers): each layer adds a roughly unit-magnitude output to the running sum, and the sum itself is not renormalized between layers. As a result, each layer's relative contribution is progressively diluted, and early-layer information is buried under the later additions.
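The growth can be illustrated with a toy simulation. The sketch below is not taken from any particular model: it assumes a hypothetical sublayer that is simply the identity applied to the L2-normalized hidden state, so every layer output has unit norm and is perfectly aligned with the running sum. This is the extreme case that gives exactly linear growth; outputs that are less aligned would grow more slowly. The names `simulate_prenorm`, `l2_norm`, `norms`, and `shares` are illustrative only.

```python
import math


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def simulate_prenorm(num_layers, h0):
    """Toy PreNorm stack: h <- h + f(norm(h)), with f assumed to be the identity.

    Returns the hidden-state norm after each step (including the input) and
    the share of the hidden-state norm that each new layer output represents.
    """
    h = list(h0)
    norms = [l2_norm(h)]
    shares = []
    for _ in range(num_layers):
        n = l2_norm(h)
        # Assumption: the sublayer output is the normalized input, so it has unit norm.
        out = [x / n for x in h]
        # Residual connection: add the output to the running sum, no renormalization.
        h = [a + b for a, b in zip(h, out)]
        norms.append(l2_norm(h))
        # Relative size of this layer's output compared with the updated hidden state.
        shares.append(l2_norm(out) / norms[-1])
    return norms, shares


norms, shares = simulate_prenorm(num_layers=8, h0=[0.6, 0.8])
for depth, norm in enumerate(norms):
    print(f"depth {depth}: |h| = {norm:.3f}")
print("share of last layer output:", round(shares[-1], 3))
```

In this toy setting the norm increases by one per layer, so the output of layer l accounts for only about 1/(l + 1) of the hidden-state magnitude at the moment it is added, and the original input accounts for an ever smaller fraction as depth increases.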

See also: prenorm, residual_connections, attention_residuals

concepts/hidden_state_growth.txt · Last modified: by aethersync
