====== Vanishing Gradients ======

The problem where gradients shrink exponentially as they propagate backward through many layers during backpropagation, making deep networks difficult or impossible to train. [[concepts:residual_connections|Residual connections]] mitigate this by providing a direct gradient path, the [[concepts:gradient_highway|gradient highway]], that preserves signal across arbitrary depth.

See also: [[concepts:gradient_highway]], [[concepts:residual_connections]], [[papers:attention_residuals]]
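A toy numeric sketch of why this happens, assuming a constant per-layer derivative (the value 0.02 is illustrative, not from any real network): in a plain stack the backward gradient is a product of per-layer factors and shrinks exponentially, while a residual block's Jacobian is 1 + f'(x), so the identity term contributes a direct path whose factor stays near 1.

```python
# Toy illustration of vanishing gradients vs. a residual gradient path.
# f_prime is an assumed, constant per-layer derivative of the residual
# branch f(x); real networks vary per layer, but the scaling is the same.
depth = 50
f_prime = 0.02

# Plain stack: y = f(f(...f(x)...)); the gradient is the product of the
# per-layer derivatives, so it shrinks like f_prime ** depth.
plain_grad = f_prime ** depth

# Residual stack: y_{l+1} = y_l + f(y_l); each layer's Jacobian is
# (1 + f_prime), so the identity term keeps the product near 1.
residual_grad = (1 + f_prime) ** depth

print(f"plain:    {plain_grad:.3e}")     # vanishes to ~1e-85
print(f"residual: {residual_grad:.3f}")  # stays order 1
```

The contrast is the whole point: the plain product is astronomically small after 50 layers, while the residual product remains a usable gradient magnitude, which is the "gradient highway" effect described above.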