====== Hidden-State Growth ======

Under PreNorm residual connections, hidden-state magnitudes grow as O(L) with depth because each layer adds a roughly unit-magnitude output to the running sum. This progressively dilutes each layer's relative contribution and buries early-layer information.

See also: [[concepts:prenorm]], [[concepts:residual_connections]], [[papers:attention_residuals]]
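
The growth can be illustrated with a small toy model. This is a minimal sketch under strong assumptions, not an actual transformer: the sublayer is a hypothetical stand-in (''toy_sublayer'') that returns its normalized input unchanged, so every update has exactly unit norm and points along the current hidden state. That is the fully aligned case in which linear growth is exact; the helper names and the starting vector are illustrative only.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit norm (a stand-in for the PreNorm normalization step)."""
    n = l2_norm(v)
    return [x / n for x in v]


def toy_sublayer(x):
    # Hypothetical sublayer: returns its unit-norm input unchanged,
    # so each residual update has magnitude exactly 1.
    return x


def prenorm_stack_norms(h0, num_layers):
    """Return the hidden-state norm before layer 1 and after every layer."""
    h = list(h0)
    norms = [l2_norm(h)]
    for _ in range(num_layers):
        update = toy_sublayer(normalize(h))     # f(norm(h))
        h = [a + b for a, b in zip(h, update)]  # h <- h + f(norm(h))
        norms.append(l2_norm(h))
    return norms


norms = prenorm_stack_norms([1.0, 0.0, 0.0, 0.0], num_layers=8)

# Share of the hidden state contributed by the unit-norm update at each layer.
shares = [1.0 / n for n in norms[1:]]

for depth, (n, s) in enumerate(zip(norms[1:], shares), start=1):
    print(f"layer {depth}: |h| = {n:.1f}, update share = {s:.3f}")
```

In this toy setting the norm rises by one per layer, so the share of a single unit-norm update shrinks roughly as 1/L. Real sublayer outputs are not perfectly aligned with the hidden state, so the sketch shows only the mechanism described above and should not be read as a quantitative prediction.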