Vanishing Gradient
IntermediateGradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.
AdvertisementAd space — term-top
Definition
Full Definition
Gradients shrink through layers, slowing learning in early layers; mitigated by ReLU, residuals, normalization.