Results for "training loss"
The shape of the loss function over parameter space.
Penalizes confident wrong predictions heavily; standard for classification and language modeling.
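The penalty described above can be sketched in a few lines; the function name `cross_entropy` and the example probabilities are illustrative assumptions, not from the source:

```python
import math

def cross_entropy(probs, target_idx):
    """Negative log-probability assigned to the true class (a minimal sketch)."""
    return -math.log(probs[target_idx])

# Confident and correct prediction: low loss.
low = cross_entropy([0.05, 0.90, 0.05], target_idx=1)

# Confident and wrong prediction: heavily penalized.
high = cross_entropy([0.90, 0.05, 0.05], target_idx=1)
```

Because the loss is the negative log of the probability given to the correct class, a confidently wrong model (low probability on the true class) incurs a much larger loss than a merely uncertain one.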
A function measuring prediction error (and sometimes calibration), guiding gradient-based optimization.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Minimizing average loss on training data; can overfit when data is limited or biased.
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
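The update rule above, sketched for a one-dimensional loss (function names and the example loss are illustrative, not from the source):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step shrinks the distance to the minimizer x = 3 by a constant factor, so the iterate converges geometrically for this quadratic loss.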
Halting training when validation performance stops improving to reduce overfitting.
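A minimal sketch of the halting rule above, assuming a per-epoch list of validation losses and a hypothetical `patience` parameter:

```python
def early_stop(val_losses, patience=2):
    """Return the epoch index at which training halts: the first epoch
    that is `patience` or more epochs past the best validation loss."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# Validation loss bottoms out at epoch 2, so training halts at epoch 4.
stop_epoch = early_stop([1.0, 0.8, 0.7, 0.75, 0.76, 0.74], patience=2)
```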
The compute, time, and monetary expense incurred in training a model.
The learned numeric values of a model adjusted during training to minimize a loss function.
One complete traversal of the training dataset during training.
Assigning labels per pixel (semantic) or per instance (instance segmentation) to map object boundaries.
A visualization of the optimization landscape, typically plotting loss over one- or two-dimensional slices of parameter space.
The lowest loss attainable anywhere in parameter space.
Maximum expected loss under normal conditions.
A wide basin often correlated with better generalization.
A point whose loss is lower than that of all nearby points, though not necessarily the lowest overall.
Two-network setup in which a generator learns to produce samples that fool a discriminator trained to distinguish generated samples from real ones.
Applying patterns learned from training data to inputs where those patterns do not hold.
End-to-end process for model training.
A preference-based training method optimizing policies directly from pairwise comparisons without explicit RL loops.
Ordering training samples from easier to harder to improve convergence or generalization.
Combining simulation and real-world data.
Measures divergence between true and predicted probability distributions.
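One common such measure is the Kullback-Leibler divergence; this sketch for discrete distributions uses an illustrative function name and example distributions:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), for discrete distributions.
    Terms with p_i = 0 contribute nothing by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Zero when the distributions match; positive when they differ.
d = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Note the asymmetry: `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`, which is why the roles of "true" and "predicted" distributions matter.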
A narrow minimum often associated with poorer generalization.
When information from evaluation data improperly influences training, inflating reported performance.
Training objective where the model predicts the next token given previous tokens (causal modeling).
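The prediction targets in this objective can be sketched as (context, next token) pairs; the helper name and example tokens are illustrative assumptions:

```python
def next_token_pairs(tokens):
    """For a causal LM, each position's target is the token that follows
    its context: (tokens[:i], tokens[i]) for every i >= 1."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["the", "cat", "sat"])
# → [(["the"], "cat"), (["the", "cat"], "sat")]
```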
Empirical laws linking model size, dataset size, and compute to performance.
Fabrication of cases or statutes by LLMs.
Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
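A minimal sketch of this idea as SGD with momentum, where the velocity is an exponential moving average of recent gradients (function names, coefficients, and the example loss are illustrative):

```python
def momentum_descent(grad, x0, lr=0.05, beta=0.9, steps=200):
    """Gradient descent with an EMA 'velocity': v smooths out gradient
    noise and oscillation, then the parameter steps along -v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(x)  # EMA of gradients
        x -= lr * v
    return x

# Minimize f(x) = (x - 3)^2 as before, now with momentum.
x_min = momentum_descent(lambda x: 2 * (x - 3), x0=0.0)
```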
Matrix of second derivatives describing local curvature of loss.
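In one dimension the Hessian reduces to the second derivative, which can be estimated by finite differences; the function name, step size, and example loss are illustrative assumptions:

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x): the local curvature
    of a 1-D loss. The full Hessian generalizes this to all parameter pairs."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# For f(x) = (x - 3)^2 the curvature is 2 everywhere.
curv = second_derivative(lambda x: (x - 3) ** 2, 0.0)
```

Large curvature along a direction corresponds to a sharp valley in the loss; small curvature to a flat one, which connects the Hessian to the flat/sharp-minimum entries above.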