Results for "training loss"
Converting spoken audio into text, often using encoder-decoder or transducer architectures.
A measure of a model class’s expressive capacity, defined by the largest set of points it can shatter (realize every possible labeling of).
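The entry above describes what is commonly called the VC dimension. As a minimal sketch (the toy hypothesis class and all names are illustrative), 1-D threshold classifiers shatter any single point but no pair of points:

```python
def threshold_predict(theta, x):
    """1-D threshold classifier: h_theta(x) = 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0

def shatters(points, thetas):
    """A hypothesis class shatters `points` if every labeling is realized.
    Checking a small grid of thresholds suffices here, since thresholds
    realize at most n + 1 distinct labelings of n sorted points."""
    realized = {tuple(threshold_predict(t, x) for x in points) for t in thetas}
    return len(realized) == 2 ** len(points)

thetas = [-10.0, 0.5, 1.5, 10.0]
# One point can be labeled both ways, so it is shattered; two points cannot
# get the labeling (1, 0) with x1 < x2, so the pair is not shattered.
```

Since no pair is shattered, the class of 1-D thresholds has VC dimension 1.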
A theoretical framework analyzing what classes of functions can be learned, how efficiently, and with what guarantees.
Reduction in uncertainty achieved by observing a variable; used in decision trees and active learning.
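The decision-tree use of the entry above can be sketched directly: information gain is the parent node's entropy minus the size-weighted entropy of the children after a split (function names are illustrative):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent minus the size-weighted entropy of the splits."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

# A perfectly separating split removes all uncertainty:
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
# entropy(parent) = 1.0 bit; both child entropies are 0, so gain = 1.0
```

A decision-tree learner evaluates this quantity for each candidate split and greedily picks the one with the highest gain.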
Variability introduced by minibatch sampling during SGD.
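A toy illustration of that minibatch noise on a 1-D least-squares objective (all names and data are illustrative): sampling a subset gives an unbiased but noisy estimate of the full-batch gradient, and the deviation between the two is exactly the noise the entry describes:

```python
import random

# Toy 1-D data for the loss L(w) = mean((w*x - y)^2)
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0)]

def grad_full(w, data):
    """Exact full-batch gradient dL/dw."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def grad_minibatch(w, data, batch_size):
    """Gradient estimated from a random minibatch: unbiased in expectation,
    but fluctuating from batch to batch around the full-batch value."""
    batch = random.sample(data, batch_size)
    return sum(2 * (w * x - y) * x for x, y in batch) / batch_size
```

With `batch_size` equal to the dataset size the noise vanishes; shrinking the batch raises the variance of the estimate.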
Early architecture using learned gates for skip connections.
Built-in assumptions guiding learning efficiency and generalization.
Encodes token position explicitly, often via sinusoids.
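A minimal sketch of the sinusoidal scheme mentioned above, following the fixed encoding from the original Transformer (the function name is illustrative; `d_model` is assumed even):

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Fixed sinusoidal encodings: even dimensions use sin, odd use cos,
    with geometrically spaced frequencies 1 / 10000^(i / d_model)."""
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, d_model, 2):
            freq = 1.0 / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(pos * freq)
            pe[pos][i + 1] = math.cos(pos * freq)
    return pe
```

Each position gets a distinct vector, and relative offsets correspond to fixed linear transformations of these vectors, which is why this encoding works without being learned.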
Models trained to decide when to call tools.
Embedding signals to prove model ownership.
Probabilistic energy-based neural network with hidden variables.
Simplified Boltzmann Machine with bipartite structure.
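The bipartite structure above is what makes inference tractable: given the visible units, the hidden units are conditionally independent, so half of a Gibbs sampling step reduces to independent coin flips. A minimal sketch of that conditional (all names are illustrative):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_hidden(v, W, b_hidden):
    """One half of a Gibbs step in an RBM. Because the graph is bipartite,
    p(h_j = 1 | v) = sigmoid(b_j + sum_i W[i][j] * v[i]) independently per j."""
    probs = [
        sigmoid(b + sum(W[i][j] * v[i] for i in range(len(v))))
        for j, b in enumerate(b_hidden)
    ]
    return [1 if random.random() < p else 0 for p in probs], probs
```

The symmetric step samples the visibles given the hiddens; alternating the two yields the Gibbs chain used in contrastive-divergence training.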
Probabilistic graphical model for structured prediction.
Diffusion model trained to remove noise step by step.
Generative model that learns to reverse a gradual noise process.
Diffusion performed in latent space for efficiency.
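For the diffusion entries above, the forward (noising) process has a closed form, so any intermediate noise level can be sampled in one step rather than by iterating. A minimal sketch, assuming a variance schedule `betas` (all names are illustrative):

```python
import math, random

def forward_diffuse(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    where eps ~ N(0, I) and alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    return [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * random.gauss(0.0, 1.0)
        for x in x0
    ]
```

The denoising network is then trained to predict the injected noise at each level; generation runs this process in reverse.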
Shift in feature distribution over time.
Increasing model capability by scaling up parameters, data, and compute.
Competitive advantage from proprietary models/data.
Declining differentiation among models.
Vectors with zero inner product; nonzero orthogonal vectors are linearly independent.
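A quick numeric check of the definition above (names illustrative):

```python
def dot(u, v):
    """Standard inner product of two equal-length real vectors."""
    return sum(a * b for a, b in zip(u, v))

u, v = [1.0, 2.0], [2.0, -1.0]
orthogonal = dot(u, v) == 0.0  # 1*2 + 2*(-1) = 0, so u and v are orthogonal
```

Because neither vector is zero, orthogonality here also guarantees that no scalar multiple of one equals the other, i.e. linear independence.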
Matrix of first-order derivatives for vector-valued functions.
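When derivatives are not available analytically, the Jacobian above can be approximated numerically; a central-difference sketch (all names are illustrative):

```python
def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian J[i][j] = d f_i / d x_j via central differences."""
    fx = f(x)
    J = []
    for i in range(len(fx)):
        row = []
        for j in range(len(x)):
            xp = list(x); xp[j] += eps
            xm = list(x); xm[j] -= eps
            row.append((f(xp)[i] - f(xm)[i]) / (2 * eps))
        J.append(row)
    return J

# f(x, y) = (x*y, x + y) has exact Jacobian [[y, x], [1, 1]]
J = jacobian(lambda p: [p[0] * p[1], p[0] + p[1]], [2.0, 3.0])
```

At (2, 3) the result should be close to [[3, 2], [1, 1]], matching the analytic derivatives.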
Model exploits poorly specified objectives.
Maximizing the measured reward without fulfilling the intended goal.
Using limited human feedback to guide large models.
Task instruction without examples.
Assigning a role or identity to the model.
Enables external computation or lookup.
Required descriptions of model behavior and limits.
Requirement to inform users about AI use.