Results for "compute-data-performance"
Estimating the probability that a given case will succeed, typically framed as supervised classification over historical outcomes.
A branch of ML using multi-layer neural networks to learn hierarchical representations, often excelling in vision, speech, and language.
Minimizing average loss on training data; can overfit when data is limited or biased.
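As a sketch, the standard empirical-risk objective over a training set {(x_i, y_i)} of size n can be written (conventional notation, not taken from the source) as:

    \hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)

Minimizing this finite-sample average rather than the true expected loss is exactly what leaves room for overfitting when n is small or the sample is biased.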
Automatically learning useful internal features (latent variables) that capture salient structure for downstream tasks.
Forcing predictable formats for downstream systems; reduces parsing errors and supports validation/guardrails.
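A minimal Python sketch of the idea, using the standard json module; the required keys here are an invented, illustrative schema:

    import json

    REQUIRED_KEYS = {"answer", "confidence"}  # illustrative schema, not from the source

    def parse_structured_output(raw: str) -> dict:
        """Parse a model response expected to be a JSON object with fixed keys;
        fail fast when the format contract is broken."""
        obj = json.loads(raw)  # raises on non-JSON output
        missing = REQUIRED_KEYS - obj.keys()
        if missing:
            raise ValueError(f"missing required keys: {sorted(missing)}")
        return obj

    print(parse_structured_output('{"answer": "42", "confidence": 0.9}'))

Rejecting malformed responses at this boundary is what keeps downstream parsing and validation simple.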
Learning a target policy or model from data generated by a different (behavior) policy, as in off-policy reinforcement learning.
Models that learn to generate samples resembling training data.
Model that compresses input into latent space and reconstructs it.
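A minimal NumPy sketch of the compress-then-reconstruct loop (a single linear encoder/decoder pair; shapes and names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                  # input vector
    W_enc = 0.1 * rng.normal(size=(3, 8))   # encoder: 8 -> 3 (the bottleneck)
    W_dec = 0.1 * rng.normal(size=(8, 3))   # decoder: 3 -> 8

    z = W_enc @ x                           # latent code (compressed representation)
    x_hat = W_dec @ z                       # reconstruction of the input
    loss = np.mean((x - x_hat) ** 2)        # reconstruction error to minimize
    print(z.shape, x_hat.shape, float(loss))

Training would adjust W_enc and W_dec to drive the reconstruction error down.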
A trend present within each subgroup that reverses or disappears when the subgroups are combined, typically because of an unevenly distributed confounder.
A change in a model's output distribution over time or across versions, often a symptom of data or concept drift.
Updated belief after observing data.
Probability of data given parameters.
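The two quantities above are connected by Bayes' rule:

    p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}

The left-hand side is the updated belief (posterior), p(D | theta) is the likelihood, p(theta) the prior, and p(D) the evidence.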
A legal or compliance obligation to preserve relevant data, e.g., during litigation, an audit, or a regulatory review.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
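In its simplest form this means minimizing a weighted sum of per-task losses over shared parameters theta (a standard formulation, not taken from the source):

    \mathcal{L}(\theta) = \sum_{t=1}^{T} w_t\, \mathcal{L}_t(\theta)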
Configuration choices that are not learned from data (or not typically learned) and that govern training or architecture, such as the learning rate, batch size, or number of layers.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
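A common concrete instance (illustrative) is average training loss plus a weighted regularizer:

    J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big) + \lambda\, \Omega(\theta)

where Omega might be an L2 penalty \lVert\theta\rVert^2 and lambda trades data fit against model complexity.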
Randomly zeroing activations during training to reduce co-adaptation and overfitting.
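A minimal NumPy sketch of inverted dropout; rescaling by the keep probability keeps expected activations consistent between training and inference (the rate and shapes are illustrative):

    import numpy as np

    def dropout(h, p_drop=0.5, rng=np.random.default_rng(0)):
        """Zero each unit with probability p_drop, then rescale survivors
        by 1 / (1 - p_drop) so the expected activation stays the same."""
        mask = rng.random(h.shape) >= p_drop
        return h * mask / (1.0 - p_drop)

    h = np.ones(10)
    print(dropout(h))  # roughly half zeros, surviving units scaled to 2.0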
Nonlinear functions enabling networks to approximate complex mappings; ReLU variants dominate modern DL.
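For concreteness, ReLU and a common leaky variant, sketched in NumPy:

    import numpy as np

    def relu(x):                      # max(0, x): cheap and non-saturating for x > 0
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha=0.01):    # small negative slope keeps gradients alive
        return np.where(x > 0, x, alpha * x)

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(x), leaky_relu(x))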
Mechanism that computes context-aware mixtures of representations; scales well and captures long-range dependencies.
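A minimal NumPy sketch of single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, with no masking (shapes are illustrative):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """Each output row is a context-aware mixture of the value rows,
        weighted by query-key similarity."""
        scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise similarities
        return softmax(scores) @ V               # softmax over keys, then mix values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)  # (4, 8)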
Achieving task performance by providing a small number of examples inside the prompt without weight updates.
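A sketch of what such a prompt can look like; the task and examples are invented for illustration:

    # Few-shot (in-context) prompt: the examples live in the input text,
    # and no weights are updated. Labels below are illustrative.
    few_shot_prompt = """Classify the sentiment as positive or negative.

    Review: The battery lasts all day.
    Sentiment: positive

    Review: It broke after one week.
    Sentiment: negative

    Review: Surprisingly comfortable and sturdy.
    Sentiment:"""
    # A completion model is expected to continue with " positive".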
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
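A sketch of a single training pair in this setup (format and content are invented for illustration); training then maximizes the likelihood of the response tokens given the prompt:

    sft_example = {
        "prompt": "Summarize: The meeting moved to Tuesday at 3pm.",
        "response": "The meeting was rescheduled to Tuesday at 3pm.",
    }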
Ordering training samples from easier to harder to improve convergence or generalization.
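A minimal sketch, assuming some per-sample difficulty score is available (sequence length is used below as a crude, illustrative proxy):

    def curriculum_order(samples, difficulty):
        """Order training samples from easy to hard before batching."""
        return sorted(samples, key=difficulty)

    data = ["hello", "dogs bark loudly", "the quick brown fox jumps over it"]
    print(curriculum_order(data, difficulty=len))  # shortest (easiest) first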
Time from request to response; critical for real-time inference and UX.
How many requests or tokens can be processed per unit time; affects scalability and cost.
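The two metrics above are linked under steady load by Little's law, L = lambda * W: the average number of in-flight requests equals throughput times latency. With illustrative numbers:

    \lambda = 50\ \text{req/s}, \quad W = 0.2\ \text{s} \;\Rightarrow\; L = \lambda W = 10\ \text{concurrent requests}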
Mechanisms for retaining context across turns/sessions: scratchpads, vector memories, structured stores.
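A minimal sketch of the vector-memory variant, with cosine-similarity retrieval over stored embeddings (the random vectors stand in for a real embedding model):

    import numpy as np

    class VectorMemory:
        """Toy vector store: keep (embedding, text) pairs, recall by cosine similarity."""
        def __init__(self):
            self.items = []

        def add(self, vec, text):
            self.items.append((vec / np.linalg.norm(vec), text))

        def recall(self, query, k=1):
            q = query / np.linalg.norm(query)
            ranked = sorted(self.items, key=lambda it: -float(q @ it[0]))
            return [text for _, text in ranked[:k]]

    rng = np.random.default_rng(0)
    v1, v2 = rng.normal(size=(2, 4))
    mem = VectorMemory()
    mem.add(v1, "user prefers metric units")
    mem.add(v2, "project deadline is Friday")
    print(mem.recall(v1 + 0.01 * rng.normal(size=4)))  # nearest stored note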
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
A concept class is PAC-learnable if, with high probability, a learner can output an approximately correct hypothesis from a finite (polynomially bounded) number of samples.
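In symbols, for the finite, realizable case (a standard bound, stated in conventional notation): with probability at least 1 - delta, empirical risk minimization returns a hypothesis with error at most epsilon once

    n \ge \frac{1}{\varepsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right)

samples have been seen.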
Converting audio speech into text, often using encoder-decoder or transducer architectures.
A measure of a model class's expressive capacity: the size of the largest set of points the class can shatter, i.e., label in every possible way.
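Formally:

    \mathrm{VCdim}(\mathcal{H}) = \max\{\, n : \exists\, x_1, \dots, x_n \text{ shattered by } \mathcal{H} \,\}

A standard example: halfspaces (linear classifiers) in R^d shatter some set of d + 1 points but no set of d + 2, so their VC dimension is d + 1.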
Measures a function class's ability to correlate with random sign noise; used to bound the generalization gap.
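The standard empirical definition, with i.i.d. random signs sigma_i drawn uniformly from {-1, +1}:

    \hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\!\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i\, f(x_i)\right]

A class that can track random sign patterns is rich enough to overfit, which is why this quantity appears in generalization bounds.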