Removing entire layers from a trained network. Under standard residual connections with PreNorm, many layers can be pruned with minimal performance loss because each layer's contribution is heavily diluted by the residual stream it is added to. This motivates Attention Residuals, which give each layer a learned, content-dependent influence.
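A minimal NumPy sketch of the dilution intuition, not any specific model: a stack of PreNorm-style residual blocks `x + f(LN(x))`, where each sublayer `f` is a random linear map with small weights standing in for a trained sublayer whose output is small relative to the residual stream. Skipping one block then changes the final output only modestly. The block class, dimensions, and weight scale are all illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension (no learned gain/bias).
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

class PreNormBlock:
    """One PreNorm residual block: x + f(LN(x)), with f a random linear map."""
    def __init__(self, dim, rng, scale=0.1):
        # Small scale: hypothetical stand-in for a sublayer whose output
        # is small compared to the accumulated residual stream.
        self.W = rng.standard_normal((dim, dim)) * scale / np.sqrt(dim)

    def __call__(self, x):
        return x + layer_norm(x) @ self.W

def run(blocks, x, skip=()):
    """Run the stack, optionally skipping (pruning) some layer indices."""
    for i, blk in enumerate(blocks):
        if i not in skip:
            x = blk(x)
    return x

rng = np.random.default_rng(0)
dim, depth = 64, 12
blocks = [PreNormBlock(dim, rng) for _ in range(depth)]
x = rng.standard_normal((1, dim))

full = run(blocks, x)
pruned = run(blocks, x, skip={5})  # drop layer 5 entirely
rel_err = np.linalg.norm(full - pruned) / np.linalg.norm(full)
print(f"relative change from pruning one layer: {rel_err:.3f}")
```

With trained networks the effect is analogous but not identical: some layers matter far more than others, which is exactly the uniformity that a learned, per-layer gating mechanism is meant to replace.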
See also: residual_connections, prenorm, model_pruning, attention_residuals