Results for "training loss"
Second-order optimization: Uses curvature information (e.g., the Hessian) to choose update directions; often expensive at scale.
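To show how curvature enters an update, here is a minimal one-dimensional Newton's-method sketch (function names and the test objective are illustrative, not from the source): the gradient is rescaled by the inverse second derivative, the part that becomes expensive when the Hessian is large.

```python
def newton_minimize(grad, hess, x0, steps=20):
    """Newton's method: scale the gradient by inverse curvature.

    For a quadratic objective this converges in one step; at scale,
    forming and inverting the Hessian is what makes it expensive.
    """
    x = x0
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

# Minimize f(x) = (x - 3)^2: gradient 2(x - 3), constant curvature 2.
x_star = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=10.0, steps=1)
```

Because the toy objective is quadratic, a single Newton step lands exactly on the minimizer.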
Vision-language model: A joint model aligning images and text.
Vocoder: Generates audio waveforms from spectrograms.
Gradient: The direction of steepest ascent of a function.
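To make "steepest ascent" concrete, here is a small finite-difference sketch (helper names are illustrative; assumes a smooth function): taking a short step along the estimated gradient increases the function value.

```python
def numeric_grad(f, x, eps=1e-6):
    # Central-difference estimate of each partial derivative.
    g = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += eps; xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

f = lambda v: -(v[0] ** 2) - (v[1] ** 2)     # peak at the origin
x = [1.0, 1.0]
g = numeric_grad(f, x)                        # points toward the origin
x_up = [xi + 0.1 * gi for xi, gi in zip(x, g)]  # small step uphill
```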
Feedback loop: Using production outcomes to improve models.
Line search: Choosing the step size along the gradient direction.
Self-critique: Asking a model to review and improve its own output.
Supervised learning: Learning by minimizing prediction error against known targets.
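One common way to choose the step size is backtracking line search. This scalar-only sketch (illustrative names, not a specific library API) shrinks the step until the Armijo sufficient-decrease condition holds.

```python
def backtracking_step(f, grad, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Backtracking line search along the descent direction -grad(x):
    halve the step until f decreases by at least c * alpha * |grad|^2."""
    g = grad(x)
    alpha = alpha0
    while f(x - alpha * g) > f(x) - c * alpha * g * g:
        alpha *= beta
    return alpha

f = lambda x: x ** 2
grad = lambda x: 2 * x
alpha = backtracking_step(f, grad, x=5.0)   # full step overshoots, so it halves
```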
Safety-critical systems: Systems where failure can cause physical harm.
Unauthorized practice of law: An AI giving legal advice without authorization.
Adversarial setting: Agents have opposing objectives.
Distribution shift: A mismatch between training and deployment data distributions that can degrade model performance.
Overfitting: When a model fits the noise and idiosyncrasies of its training data and performs poorly on unseen data.
Cross-validation: A robust evaluation technique that trains and evaluates across multiple splits to estimate performance variability.
Stochastic gradient descent (SGD): A gradient method using random minibatches for efficient training on large datasets.
Normalization layers: Techniques that stabilize and speed up training by normalizing activations; LayerNorm is common in Transformers.
Large language model (LLM): A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Federated learning: Training across many devices or silos without centralizing raw data; only model updates are aggregated, never the data itself.
Reproducibility: The ability to replicate results given the same code and data; harder with distributed training and nondeterministic ops.
Low-Rank Adaptation (LoRA): A PEFT method injecting trainable low-rank matrices into existing layers, enabling efficient fine-tuning.
Automation bias: The tendency to trust automated suggestions even when they are incorrect; mitigated by UI design, training, and checks.
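A minimal sketch of how the splits behind k-fold cross-validation can be produced (an illustrative helper, not a specific library API): each example lands in exactly one validation fold, and the rest form the training set.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# 10 examples, 5 folds: each fold holds out 2 examples for validation.
folds = list(kfold_indices(10, 5))
```

Training and scoring a model on each pair, then reporting the mean and spread of the scores, is what gives the variability estimate.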
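A toy sketch of minibatch SGD fitting a one-parameter linear model (all names and hyperparameters are illustrative): each update uses one random minibatch's gradient as a cheap, noisy estimate of the full-batch gradient.

```python
import random

def sgd_fit(xs, ys, lr=0.05, batch=4, epochs=200, seed=0):
    """Minibatch SGD for y ~ w*x with squared loss."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(idx)                      # fresh random minibatches each epoch
        for s in range(0, len(idx), batch):
            mb = idx[s:s + batch]
            # d/dw of the mean squared error over the minibatch only
            g = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in mb) / len(mb)
            w -= lr * g
    return w

xs = [0.1 * i for i in range(20)]
ys = [3.0 * x for x in xs]                    # true slope is 3
w = sgd_fit(xs, ys)
```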
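A numpy sketch of the LayerNorm computation (the learned parameters gamma and beta are passed in explicitly; names are illustrative): each row is normalized over the feature axis, then rescaled and shifted.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each row to zero mean / unit variance over the last
    axis, then apply a learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```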
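The aggregation step can be sketched as federated averaging (a common choice; the helper and toy numbers are illustrative): the server combines client weight vectors weighted by local dataset size, and raw data never leaves the clients.

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of client model weights by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]

# Two clients with locally trained weights and different data sizes:
# the larger client (3 examples) pulls the average toward its weights.
avg = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
```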
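A numpy sketch of the low-rank update (dimensions and initialization are illustrative): the frozen weight W is augmented with a trainable delta B @ A of rank r, and B starts at zero so the adapted model initially matches the pretrained one.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                        # illustrative layer width and LoRA rank

W = rng.normal(size=(d, d))        # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01 # trainable down-projection
B = np.zeros((d, r))               # trainable up-projection, zero at init

def lora_forward(x):
    # Effective weight is W + B @ A; only A and B (2*d*r params) train.
    return x @ (W + B @ A).T

x = rng.normal(size=(1, d))
y0 = lora_forward(x)               # identical to the frozen model at init
```

The efficiency comes from the parameter count: 2\*d\*r trainable values instead of d\*d.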
Saddle point: A point where the gradient is zero but that is neither a maximum nor a minimum; common in deep networks.
Gradient clipping: Limiting gradient magnitude to prevent exploding gradients.
Residual connections: Allow gradients to bypass layers, enabling very deep networks.
Causal masking: Prevents attention to future tokens during training and inference.
Emergent abilities: Capabilities that appear only beyond certain model sizes.
Noise schedule: Controls the amount of noise added at each diffusion step.
Alignment: Ensuring learned behavior matches the intended objective.
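A concrete example is f(x, y) = x^2 - y^2: the gradient vanishes at the origin, yet the origin is a minimum along x and a maximum along y, so it is neither a local max nor a local min.

```python
def f(x, y):
    return x ** 2 - y ** 2        # classic saddle surface

def grad(x, y):
    return (2 * x, -2 * y)

g = grad(0.0, 0.0)                # zero gradient at the origin...
up = f(0.0, 0.1)                  # ...but f decreases along y
down = f(0.1, 0.0)                # ...and increases along x
```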
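Clipping by global L2 norm can be sketched as follows (an illustrative helper, not a specific framework API): gradients with norm above the threshold are rescaled onto the threshold sphere, and smaller ones pass through untouched.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

g = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm 5 -> rescaled to norm 1
```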
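The skip connection amounts to y = x + f(x). In this tiny sketch (illustrative names), even a "dead" sublayer that outputs zeros still passes the input through unchanged, which is the identity path that keeps gradients flowing in very deep stacks.

```python
import numpy as np

def residual_block(x, layer):
    # Output is the input plus a learned correction; the identity path
    # lets gradients flow straight through even if `layer` saturates.
    return x + layer(x)

x = np.array([1.0, -2.0, 3.0])
zero_layer = lambda v: np.zeros_like(v)   # a sublayer contributing nothing
y = residual_block(x, zero_layer)         # the input survives intact
```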
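A numpy sketch of an additive causal mask applied before the softmax (helper names are illustrative): position i may attend only to positions j <= i, so entries in the upper triangle are set to -inf and get zero attention weight.

```python
import numpy as np

def causal_mask(t):
    """Additive mask: 0 where attention is allowed (j <= i),
    -inf where a position would attend to a future token (j > i)."""
    return np.triu(np.full((t, t), -np.inf), k=1)

def masked_softmax(scores, mask):
    s = scores + mask
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((3, 3))                 # uniform scores for clarity
attn = masked_softmax(scores, causal_mask(3))
```

With uniform scores, row i spreads its weight evenly over the i+1 visible positions.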
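A sketch of a linear beta schedule together with the closed-form forward noising step used in DDPM-style diffusion (variable names and the schedule endpoints are illustrative): larger t means more accumulated noise on the clean sample x0.

```python
import math
import random

def linear_betas(steps, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule: beta_t is the variance injected at step t."""
    return [beta_min + (beta_max - beta_min) * t / (steps - 1)
            for t in range(steps)]

def add_noise(x0, t, betas, rng):
    # Closed form of the forward process:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, abar_t = prod(1 - beta_s)
    alpha_bar = 1.0
    for b in betas[: t + 1]:
        alpha_bar *= 1.0 - b
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

betas = linear_betas(1000)
x_noisy = add_noise(1.0, t=999, betas=betas, rng=random.Random(0))
```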
Robust alignment: Maintaining alignment under new conditions.
Model collapse: A model trained on its own outputs progressively degrades in quality.