concepts:vanishing_gradients
Vanishing Gradients
The problem where gradients shrink exponentially as they propagate backward through many layers: backpropagation multiplies per-layer Jacobians, and when their norms sit below one the product decays geometrically, leaving early layers with almost no learning signal and making deep networks difficult or impossible to train. Residual connections mitigate this by adding a direct additive path (the gradient highway) whose identity term lets the gradient bypass each transformed branch, preserving signal even at great depth.
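A minimal numeric sketch of the effect, using scalar stand-ins for per-layer gradient factors (the layer count, factor magnitudes, and random perturbations here are illustrative assumptions, not taken from any particular architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
depth = 100

# Stand-in for each layer's local gradient through the transform branch:
# small, zero-mean noise around a base magnitude (weight * activation slope).
branch = 0.1 * rng.standard_normal(depth)

# Plain deep chain: every factor has magnitude well below 1,
# so the product of `depth` factors decays geometrically.
plain = np.prod(0.5 + branch)

# Residual chain: the skip connection adds an identity term,
# so each factor is 1 + (small branch gradient) and the product stays near 1.
resid = np.prod(1.0 + branch)

print(abs(plain))  # vanishingly small
print(abs(resid))  # on the order of 1: the signal survives
```

The contrast is the whole point: the same per-layer branch gradients that annihilate the plain chain leave the residual chain's gradient usable, because the identity term keeps every factor close to one.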
See also: gradient_highway, residual_connections, attention_residuals
concepts/vanishing_gradients.txt · Last modified: by aethersync
