The problem where gradients shrink exponentially as they propagate through many layers during backpropagation, making deep networks difficult or impossible to train. Residual connections mitigate this by providing a direct gradient path — the gradient highway — that preserves signal across arbitrary depth.
See also: gradient_highway, residual_connections, attention_residuals