Layer Pruning
Removing entire layers from a trained network. Under standard residual connections with PreNorm, many layers can be pruned with little performance loss: each layer writes only a small update into the residual stream, so its contribution is heavily diluted by the stream it adds to. This observation motivates Attention Residuals, which give each layer a learned, content-dependent influence.
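A minimal NumPy sketch of the idea, under stated assumptions (the toy stack, the `forward` function, and the weight scale are all hypothetical illustrations, not from this page): in a PreNorm residual stack, pruning a layer amounts to replacing it with the identity, and because each layer's update is small relative to the residual stream, the output barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x):
    # Simplified LayerNorm (no learned scale/bias), for illustration only
    return (x - x.mean()) / (x.std() + 1e-5)

# Hypothetical tiny PreNorm residual stack: x <- x + W @ layer_norm(x)
weights = [rng.normal(scale=0.02, size=(8, 8)) for _ in range(12)]

def forward(x, skip=None):
    # skip: index of a layer to prune; pruning = identity, since the
    # residual connection passes x through unchanged
    for i, w in enumerate(weights):
        if i == skip:
            continue
        x = x + w @ layer_norm(x)
    return x

x = rng.normal(size=8)
full = forward(x)
pruned = forward(x, skip=6)  # drop one mid-stack layer

# Each layer's update is small relative to the residual stream,
# so the pruned output stays close to the full output.
rel_change = np.linalg.norm(full - pruned) / np.linalg.norm(full)
```

With small per-layer updates, `rel_change` stays in the low single-digit percent range; in a trained Transformer the analogous measurement (layer-drop ablation) shows the same dilution effect this page describes.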
See also: residual_connections, prenorm, model_pruning, attention_residuals
concepts/layer_pruning.txt · Last modified: by aethersync
