====== Layer Pruning ======

Layer pruning removes entire layers from a trained network. Under standard [[concepts:residual_connections|residual connections]] with [[concepts:prenorm|PreNorm]], many layers can be pruned with minimal performance loss because each layer's contribution is heavily diluted by the residual stream. This motivates [[papers:attention_residuals|Attention Residuals]], which gives each layer a learned, content-dependent influence.

See also: [[concepts:residual_connections]], [[concepts:prenorm]], [[concepts:model_pruning]], [[papers:attention_residuals]]
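The dilution argument can be illustrated with a toy PreNorm residual stack: each layer adds a small update on top of the residual stream, so dropping one layer perturbs the output only slightly. A minimal NumPy sketch, where the weight scale, depth, and all names are illustrative assumptions rather than any specific model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden dimension (illustrative)

def layernorm(x):
    # Simplified LayerNorm without learned scale/shift.
    return (x - x.mean()) / (x.std() + 1e-5)

# Each "layer" is just a small linear map; small weights stand in for
# the diluted per-layer contribution of a deep PreNorm network.
weights = [0.01 * rng.standard_normal((d, d)) for _ in range(8)]

def forward(x, layers):
    for W in layers:
        # PreNorm residual block: normalize, transform, add back.
        x = x + W @ layernorm(x)
    return x

x = rng.standard_normal(d)
full = forward(x, weights)
pruned = forward(x, weights[:4] + weights[5:])  # prune one middle layer

# Relative change from pruning stays small because the residual stream
# dominates each layer's additive contribution.
rel_err = np.linalg.norm(full - pruned) / np.linalg.norm(full)
```

In a real transformer the same effect is measured by removing a block from the layer list and re-evaluating the network; the sketch only shows why the residual formulation makes that perturbation small.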