====== Scaling Laws ====== Empirical relationships describing how model performance (typically loss) scales with model size, data size, and compute. The paper [[papers:attention_residuals|Attention Residuals]] validates that its improvements hold consistently across model sizes via scaling law experiments. See also: [[concepts:chinchilla_scaling]], [[concepts:neural_scaling]], [[papers:attention_residuals]]