Results for "compute-data-performance"
Model card: required documentation describing a model's behavior, intended use, and limits.
Throughput: the maximum rate at which a system can process requests or data.
Autoscaling: dynamically allocating compute resources in response to load.
Performance disparity: unequal model accuracy or error rates across demographic groups; a central fairness concern.
Unsupervised learning: learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Self-supervised learning: learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
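As a minimal sketch of self-supervision, the helper below (an illustrative function, not from any library) turns a raw token sequence into (context, next-token) training pairs; the data labels itself:

```python
def next_token_pairs(tokens):
    """Build (context, target) pseudo-label pairs for next-token
    prediction. The label at each position is just the following token,
    so no manual annotation is needed."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["the", "cat", "sat"])
# pairs[0] == (["the"], "cat"); pairs[-1] == (["the", "cat"], "sat")
```

Masked modeling works the same way, except the pseudo-label is a hidden token inside the sequence rather than the next one.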
Personally identifiable information (PII): information that can identify an individual, directly or indirectly; requires careful handling and regulatory compliance.
Attribute inference attack: inferring sensitive attributes of a model's training data from the model itself.
Dataset: a structured collection of examples used to train and evaluate models; quality, bias, and coverage often dominate outcomes.
Feature: a measurable property or attribute used as model input (raw or engineered), such as age, pixel intensity, or token ID.
Underfitting: when a model cannot capture the underlying structure of the data, performing poorly on both training and test sets.
Generalization: how well a model performs on new data drawn from the same (or a similar) distribution as its training data.
Active learning: selecting the most informative samples to label (e.g., uncertainty sampling) to reduce labeling cost.
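A hedged sketch of uncertainty sampling for a binary classifier: pick the unlabeled examples whose predicted probability sits closest to the 0.5 decision boundary (the function name is illustrative):

```python
def uncertainty_sample(probs, k):
    """Return indices of the k examples the model is least sure about,
    i.e., whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Indices 1 and 3 are nearest the boundary, so they get labeled first.
print(uncertainty_sample([0.95, 0.52, 0.10, 0.48, 0.99], 2))  # [1, 3]
```

Other acquisition strategies (entropy, margin, committee disagreement) slot into the same loop by swapping the sort key.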
Sharp minimum: a narrow minimum of the loss landscape, often associated with poorer generalization than flat minima.
Graph neural networks (GNNs): neural networks that operate on graph-structured data by propagating information along edges.
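To make "propagating information along edges" concrete, here is a toy sketch of one message-passing step with mean aggregation over scalar node features; real GNN layers add learned weights and nonlinearities, which this deliberately omits:

```python
def propagate(features, edges):
    """One message-passing step: each node's new feature is the mean of
    its own feature and its neighbors' features (edges are undirected)."""
    neighbors = {i: [i] for i in range(len(features))}  # include self-loop
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return [sum(features[j] for j in neighbors[i]) / len(neighbors[i])
            for i in neighbors]

# Path graph 0-1-2: information from the endpoints flows toward the middle.
print(propagate([0.0, 1.0, 2.0], [(0, 1), (1, 2)]))  # [0.5, 1.0, 1.5]
```

Stacking k such steps lets information travel k hops across the graph.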
Feature store: a centralized repository for curated features, shared across training and serving.
Moat: durable competitive advantage derived from proprietary models or data.
Distribution shift: mismatch between the training environment and the test or serving environment.
Model collapse: quality degradation that occurs when a model is trained on its own outputs.
Cold start: startup latency incurred when a service or model instance is first launched.
Symbolic regression: searching for mathematical equations that fit observed data.
Accuracy: the fraction of correct predictions; can be misleading on imbalanced datasets.
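The imbalance caveat is easy to demonstrate. In this sketch, a degenerate classifier that always predicts the majority class still scores 95% accuracy while catching zero positives:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0] * 95 + [1] * 5   # 95% negatives
y_pred = [0] * 100            # "always negative" baseline
print(accuracy(y_true, y_pred))  # 0.95, despite missing every positive
```

Precision, recall, or AUC expose this failure mode where accuracy hides it.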
AUC: scalar summary of the ROC curve; measures ranking ability, not calibration.
ROC curve: plots true positive rate against false positive rate across decision thresholds; summarizes class separability.
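The "ranking ability" reading of AUC has a compact direct form: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting half. A minimal sketch (quadratic in the number of examples, fine for illustration):

```python
def auc(labels, scores):
    """AUC via its ranking definition: fraction of (positive, negative)
    pairs where the positive outscores the negative; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Note the scores never pass through a threshold: rescaling them monotonically leaves AUC unchanged, which is why it says nothing about calibration.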
Mean squared error (MSE): the average of squared residuals; a common regression objective.
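Spelled out as code, the definition is one line (a sketch with plain Python lists):

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 5.0], [2.0, 7.0]))  # ((3-2)**2 + (5-7)**2) / 2 = 2.5
```

The squaring penalizes large residuals disproportionately, which makes MSE sensitive to outliers.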
Knowledge distillation: training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
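One common ingredient of distillation is training the student against the teacher's temperature-softened output distribution rather than hard labels. A minimal sketch of that softening step (assuming plain logit lists; `T` is the temperature):

```python
import math

def soft_targets(logits, T):
    """Teacher logits -> softened class probabilities at temperature T.
    Higher T spreads probability mass across classes, exposing the
    teacher's beliefs about relative class similarity."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# At T=2 the distribution is flatter than at T=1, but the ranking holds.
```

The student's loss then typically mixes cross-entropy against these soft targets with cross-entropy against the true labels.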
Benchmark: a dataset plus metric suite for comparing models; can be gamed or become misaligned with real-world goals.
Evaluation harness: a system for running consistent evaluations across tasks, versions, prompts, and model settings.
Image classification: assigning category labels to images.
Shadow deployment: running a new model alongside production on live traffic without affecting users.
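A hedged sketch of the request path: serve the production answer, mirror the same input to the candidate, and log a comparison; a failing shadow must never reach the user (all names are illustrative):

```python
def shadow_call(prod_fn, shadow_fn, request, log):
    """Return the production result; run the shadow model on the same
    input and record a comparison without affecting the response."""
    prod = prod_fn(request)
    try:
        cand = shadow_fn(request)
        log.append({"request": request, "prod": prod,
                    "shadow": cand, "match": prod == cand})
    except Exception as err:  # a broken shadow must not break production
        log.append({"request": request, "error": repr(err)})
    return prod

log = []
shadow_call(lambda r: r * 2, lambda r: r * 2 + 1, 5, log)  # returns 10
```

In practice the shadow call runs asynchronously so it adds no latency to the production path; this synchronous version just keeps the sketch short.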