Fast approximation of costly simulations.
Configuration choices, typically set rather than learned during training, that govern the training procedure or model architecture.
Uses an exponential moving average of gradients to speed convergence and reduce oscillation.
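A minimal sketch of this momentum technique on a toy quadratic; the decay rate `beta`, step size `lr`, and iteration count are illustrative choices, not prescribed values.

```python
# Gradient descent with momentum on f(x) = x^2 (toy example).
# beta and lr are illustrative, not prescribed values.
def grad(x):
    return 2.0 * x  # derivative of x^2

x, v = 5.0, 0.0
beta, lr = 0.9, 0.1
for _ in range(300):
    v = beta * v + (1 - beta) * grad(x)  # exponential moving average of gradients
    x -= lr * v                          # update along the smoothed direction
```

The velocity `v` smooths out sign flips in the raw gradient, which is what damps oscillation.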
Controls the size of parameter updates; too high causes divergence, too low trains slowly or gets stuck.
Variability introduced by minibatch sampling during SGD.
Limiting gradient magnitude to prevent exploding gradients.
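A sketch of clipping by global norm; the helper name is hypothetical (frameworks ship equivalents, e.g. `torch.nn.utils.clip_grad_norm_`).

```python
import math

# Rescale the whole gradient vector if its L2 norm exceeds max_norm.
def clip_by_global_norm(grads, max_norm):
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5.0
```

Rescaling the whole vector preserves the gradient's direction, unlike clipping each component independently.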
Matrix of second derivatives describing local curvature of loss.
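A numerical sketch: estimating this matrix of second derivatives for a toy function via central finite differences (the function and step size `h` are illustrative).

```python
# Finite-difference Hessian estimate for f(x, y) = x^2 + 3xy,
# whose exact Hessian is [[2, 3], [3, 0]].
def f(p):
    x, y = p
    return x * x + 3.0 * x * y

def hessian(f, p, h=1e-4):
    n = len(p)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # central difference over the (i, j) pair of coordinates
            pp = list(p); pp[i] += h; pp[j] += h
            pm = list(p); pm[i] += h; pm[j] -= h
            mp = list(p); mp[i] -= h; mp[j] += h
            mm = list(p); mm[i] -= h; mm[j] -= h
            H[i][j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h * h)
    return H

H = hessian(f, [1.0, 1.0])
```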
Optimizing policies directly via gradient ascent on expected reward.
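A REINFORCE-style sketch of this idea on a 2-armed bandit; the rewards, learning rate, and iteration count are all illustrative.

```python
import math
import random

random.seed(0)
theta = [0.0, 0.0]     # action preferences; policy = softmax(theta)
rewards = [1.0, 0.0]   # action 0 pays off, action 1 does not
lr = 0.1

def softmax(t):
    exps = [math.exp(v) for v in t]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(2000):
    p = softmax(theta)
    a = 0 if random.random() < p[0] else 1   # sample an action from the policy
    r = rewards[a]
    # gradient ascent on expected reward: grad log pi(a | theta) = onehot(a) - p
    for i in range(len(theta)):
        theta[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])
```

Over training, the policy shifts probability mass toward the rewarded action.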
Matrix of curvature information.
Measure of vector magnitude; used in regularization and optimization.
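The two norms most often seen in this role, as a short sketch; the squared L2 norm is the usual weight-decay penalty in regularization.

```python
import math

# L1 norm: sum of absolute values (encourages sparsity when penalized).
def l1_norm(v):
    return sum(abs(x) for x in v)

# L2 norm: Euclidean length (its square is the weight-decay penalty).
def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))
```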
Optimization under uncertainty.
Lowest possible loss.
Model optimizes objectives misaligned with human values.
The field of building systems that perform tasks associated with human intelligence (perception, reasoning, language, planning, and decision-making) via algorithms.
A subfield of AI where models learn patterns from data to make predictions or decisions, improving with experience rather than explicit rule-coding.
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness/cluster structure.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
The learned numeric values of a model adjusted during training to minimize a loss function.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
Average of squared residuals; common regression objective.
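The standard definition as a short sketch; the sample values are illustrative.

```python
# Mean squared error: average of squared residuals over paired
# targets and predictions.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # residuals 0, 0, -2
```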
Iterative method that updates parameters in the direction of negative gradient to minimize loss.
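A minimal sketch of the update rule on a toy objective; the learning rate and iteration count are illustrative.

```python
# Gradient descent on f(x) = (x - 3)^2, minimized at x = 3.
def grad(x):
    return 2.0 * (x - 3.0)

x, lr = 0.0, 0.1
for _ in range(200):
    x -= lr * grad(x)  # step in the direction of the negative gradient
```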
Popular optimizer combining momentum and per-parameter adaptive step sizes via first/second moment estimates.
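A sketch of this update for a single scalar parameter on a toy quadratic; the hyperparameter values are the commonly used defaults, and the objective is illustrative.

```python
import math

def grad(x):
    return 2.0 * x  # derivative of the toy objective f(x) = x^2

x, m, v = 5.0, 0.0, 0.0
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 1001):
    g = grad(x)
    m = b1 * m + (1 - b1) * g        # first moment: EMA of gradients (momentum)
    v = b2 * v + (1 - b2) * g * g    # second moment: EMA of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
```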
A gradient method using random minibatches for efficient training on large datasets.
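A minibatch SGD sketch for a one-parameter least-squares fit; the dataset, batch size, and learning rate are illustrative.

```python
import random

random.seed(0)
data = [(float(x), 2.0 * x) for x in range(1, 21)]  # y = w * x with true w = 2

w, lr, batch_size = 0.0, 0.001, 4
for _ in range(2000):
    minibatch = random.sample(data, batch_size)  # random minibatch
    # gradient of mean squared error over the minibatch w.r.t. w
    g = sum(2.0 * (w * x - y) * x for x, y in minibatch) / batch_size
    w -= lr * g  # noisy but unbiased gradient step
```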
One complete pass over the entire training dataset.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
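A forward pass through a tiny two-layer instance of such a function, as a sketch; all weights and inputs are illustrative.

```python
# Two-layer network: linear map -> ReLU nonlinearity -> linear map.
def relu(x):
    return max(0.0, x)

def forward(x, W1, b1, W2, b2):
    # hidden layer: one ReLU unit per row of W1
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # output layer: scalar linear combination of hidden units
    return sum(w * hi for w, hi in zip(W2, h)) + b2

y = forward([1.0, 2.0],
            W1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.0],
            W2=[2.0, 1.0], b2=0.5)
```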
Gradients grow too large, causing divergence; mitigated by gradient clipping, normalization, and careful initialization.
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
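The standard definitions of ReLU and one common variant; `alpha` below is a typical leaky-ReLU default, not a mandated value.

```python
# ReLU: zero for negative inputs, identity for positive inputs.
def relu(x):
    return max(0.0, x)

# Leaky ReLU: small negative slope avoids "dead" units.
def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x
```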