Model Pruning

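Removing parameters or structures from a trained network to reduce its size or compute cost. Unstructured pruning removes individual weights; structured pruning removes whole components such as attention heads, layers, or experts. Layer pruning is a form of structured pruning that tends to cost less quality than one might expect under PreNorm residual connections, where each layer only adds an update to the residual stream, so a removed layer reduces to an identity mapping.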
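
As a rough illustration of the two granularities, the sketch below contrasts magnitude-based unstructured pruning with layer pruning in a toy PreNorm residual stack. Everything here is hypothetical and chosen for illustration: the function names (magnitude_prune, rms_norm, prenorm_forward), the choice of weight magnitude as the pruning criterion, and the use of plain Python lists in place of tensors. It is not taken from any particular library.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude fraction of weights.

    `weights` is a list of rows; `sparsity` is the fraction to zero (0.0 to 1.0).
    Magnitude is an assumed criterion; other saliency scores are possible.
    """
    indexed = [(abs(w), i, j)
               for i, row in enumerate(weights)
               for j, w in enumerate(row)]
    indexed.sort()                       # smallest magnitudes first
    k = int(len(indexed) * sparsity)     # number of weights to zero
    pruned = [list(row) for row in weights]
    for _, i, j in indexed[:k]:
        pruned[i][j] = 0.0
    return pruned


def rms_norm(x):
    """Scale a vector to unit root-mean-square (no learned gain)."""
    scale = (sum(v * v for v in x) / len(x)) ** 0.5 or 1.0
    return [v / scale for v in x]


def prenorm_forward(x, layers, skip=frozenset()):
    """Toy PreNorm residual stack: x <- x + f(norm(x)) for each layer f.

    Layers whose index is in `skip` are pruned. Because each layer only
    adds to the residual stream, a pruned layer passes x through unchanged.
    """
    for idx, f in enumerate(layers):
        if idx in skip:
            continue
        update = f(rms_norm(x))
        x = [a + b for a, b in zip(x, update)]
    return x


# Usage: prune half the weights of a 2x2 matrix, then drop layer 1 of 2.
w = magnitude_prune([[0.1, -2.0], [0.05, 3.0]], sparsity=0.5)
layers = [lambda h: [0.1 * v for v in h], lambda h: [-0.2 * v for v in h]]
full = prenorm_forward([1.0, 2.0], layers)
pruned = prenorm_forward([1.0, 2.0], layers, skip={1})
```

In this sketch the unstructured result keeps the matrix shape and only gains zeros, so it needs sparse kernels to save compute, whereas skipping a layer removes its cost outright. How much quality either form costs in a real model is an empirical question that the toy example does not address.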

See also: layer_pruning, residual_connections, prenorm, attention_residuals