====== Model Pruning ======

Model pruning removes parameters or whole structures from a trained network to reduce its size or compute cost. It includes unstructured pruning (individual weights) and structured pruning (entire heads, layers, or experts). [[concepts:layer_pruning|Layer pruning]] is a form of structured pruning that is surprisingly benign under [[concepts:prenorm|PreNorm]] [[concepts:residual_connections|residual connections]], where skipping a block leaves the identity path of the residual stream intact.

See also: [[concepts:layer_pruning]], [[concepts:residual_connections]], [[concepts:prenorm]], [[papers:attention_residuals]]
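
As a rough illustration of the two families named above, the sketch below contrasts unstructured magnitude pruning with structured layer pruning in a toy PreNorm residual stack. It is a minimal sketch under stated assumptions, not a reference implementation: the names ''magnitude_prune'', ''rms_norm'', ''make_layer'', and ''prenorm_forward'' are hypothetical, the "sublayer" is a stand-in elementwise function rather than attention or an MLP, and magnitude is only one of several possible criteria for choosing which weights to drop.

```python
import math


def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of weights with smallest |w|."""
    k = int(len(weights) * sparsity)  # number of individual weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def rms_norm(x, eps=1e-6):
    """Toy normalization applied before each sublayer (the 'Pre' in PreNorm)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def make_layer(scale):
    """Hypothetical sublayer: elementwise tanh(scale * v), standing in for attention/MLP."""
    def f(x):
        return [math.tanh(scale * v) for v in x]
    return f


def prenorm_forward(x, layers, keep=None):
    """PreNorm residual stack: x <- x + f(norm(x)) for each kept layer.

    Structured (layer) pruning is modeled by `keep`: any layer index not in
    `keep` is skipped, so the residual stream passes through it unchanged.
    """
    for i, f in enumerate(layers):
        if keep is not None and i not in keep:
            continue  # pruned layer acts as the identity on the residual stream
        update = f(rms_norm(x))
        x = [a + b for a, b in zip(x, update)]
    return x


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.01, 0.2, 0.03], sparsity=0.5))

    layers = [make_layer(s) for s in (0.5, 0.1, 0.3)]
    x0 = [1.0, -2.0, 0.5]
    print(prenorm_forward(x0, layers))               # full stack
    print(prenorm_forward(x0, layers, keep={0, 2}))  # layer 1 pruned
```

Because each block only adds an update to the running residual stream, dropping a block never leaves the stack without a valid input for the next one; how much the output actually shifts depends on the model and on which layer is removed.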