Scaling Laws

Empirical relationships describing how model performance (typically loss) scales with model size, data size, and compute. The paper Attention Residuals validates that its improvements hold consistently across model sizes via scaling law experiments.

See also: chinchilla_scaling, neural_scaling, attention_residuals