Results for "data distribution"
Concept drift: The relationship between inputs and outputs changes over time, requiring monitoring and periodic model updates.
F1 score: Harmonic mean of precision and recall; useful when balancing false positives against false negatives matters.
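A minimal sketch of the harmonic-mean formula in plain Python (the precision/recall values below are made up for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values: 80% precision, 60% recall
print(f1_score(0.8, 0.6))
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot score well on F1 by excelling at only precision or only recall.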
ROC AUC: Scalar summary of the ROC curve; measures ranking ability, not calibration.
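Since the area under the ROC curve equals the probability that a randomly chosen positive is scored above a randomly chosen negative, it can be sketched directly from that definition (the scores below are illustrative):

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    Equivalent to the area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# One negative (0.85) outranks one positive (0.8), so 3 of 4 pairs are correct
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

Note that only the ordering of scores matters here, which is why AUC says nothing about whether the scores are calibrated probabilities.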
Weight initialization: Methods to set starting weights so that signal and gradient scales are preserved across layers.
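One such method is Glorot/Xavier uniform initialization, which scales the sampling range by fan-in and fan-out; a framework-free sketch:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng=None):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out)) to keep activation and
    gradient variance roughly constant across layers."""
    rng = rng or random.Random(0)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

Deep-learning frameworks ship this and related schemes (e.g., He initialization for ReLU networks) as built-in initializers.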
Language model: A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
SHAP: Feature attribution method grounded in cooperative game theory for explaining predictions in tabular settings.
Class imbalance: When some classes are rare, requiring reweighting, resampling, or specialized metrics.
Top-k sampling: Samples from the k highest-probability tokens to limit unlikely outputs.
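A minimal sketch of the idea over a raw logit vector, in plain Python (real decoders work on much larger vocabularies and apply temperature first):

```python
import math
import random

def top_k_sample(logits, k, rng=None):
    """Keep the k highest-logit tokens, then sample among them
    with probabilities proportional to exp(logit)."""
    rng = rng or random.Random(0)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    return rng.choices(top, weights=weights)[0]
```

With k=1 this degenerates to greedy decoding; larger k trades determinism for diversity.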
Top-p (nucleus) sampling: Samples from the smallest set of tokens whose probabilities sum to p, adapting the set size to context.
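A sketch of the nucleus construction: sort tokens by probability, keep the smallest prefix whose mass reaches p, then sample within it.

```python
import math
import random

def top_p_sample(logits, p, rng=None):
    """Sample from the smallest set of highest-probability tokens
    whose cumulative probability reaches p (nucleus sampling)."""
    rng = rng or random.Random(0)
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:                       # grow the nucleus until mass >= p
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept])[0]
```

Unlike top-k, the number of candidate tokens shrinks when the model is confident and grows when the distribution is flat.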
Logits: Raw model outputs before conversion to probabilities; manipulated during decoding and calibration.
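The conversion to probabilities is typically a softmax; a minimal, numerically stable sketch:

```python
import math

def softmax(logits):
    """Map logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # highest logit -> highest probability
```

Decoding-time manipulations such as temperature scaling divide the logits before this step, and calibration methods adjust them after training.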
Information gain: Reduction in uncertainty achieved by observing a variable; used in decision trees and active learning.
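A sketch of the decision-tree use case: gain is parent entropy minus the size-weighted entropy of the child partitions (labels below are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into `children` partitions."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A perfect split removes all uncertainty (gain = 1 bit here)
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))
```

A split that leaves each child with the same class mix as the parent has zero gain, which is why tree learners greedily pick the feature maximizing this quantity.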
Policy: A strategy mapping states to actions, as in reinforcement learning.
Expectation: The average value of a quantity under a probability distribution.
Monte Carlo estimation: Approximating expectations via random sampling.
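A sketch of the idea: average a function over many random draws to estimate its expectation. Here the target (chosen for illustration) is E[X²] for X ~ Uniform(0, 1), whose true value is 1/3.

```python
import random

def monte_carlo_mean(f, draw, n=100_000, seed=0):
    """Estimate E[f(X)] by averaging f over n samples from draw(rng)."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(n)) / n

# X ~ Uniform(0, 1); the estimate should be close to 1/3
estimate = monte_carlo_mean(lambda x: x * x, lambda rng: rng.random())
```

The estimator's error shrinks like 1/sqrt(n), independent of the dimension of X, which is what makes the method practical for high-dimensional integrals.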
Role prompting: Assigning a role or identity to the model to steer its behavior.
Self-consistency: Sampling multiple outputs and selecting the consensus answer.
Value at Risk (VaR): Maximum loss expected at a given confidence level under normal market conditions.
Pareto efficiency: A state in which no agent can improve without hurting another.
Market design: Designing efficient marketplaces and allocation mechanisms.
Data scaling: Increasing performance by training on more data.
Encryption in transit and at rest: Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
Labeling: Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Data leakage: When information from evaluation data improperly influences training, inflating reported performance.
Data poisoning: Maliciously inserting or altering training data to implant backdoors or degrade performance.
Data Protection Impact Assessment (DPIA): Privacy risk analysis required under GDPR-like laws.
Unsupervised learning: Learning structure from unlabeled data, such as discovering groups, compressing representations, or modeling data distributions.
Self-supervised learning: Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
Overfitting: When a model fits noise and idiosyncrasies of the training data and performs poorly on unseen data.
Personally identifiable information (PII): Information that can identify an individual, directly or indirectly; requires careful handling and compliance.
Semi-supervised learning: Training with a small labeled dataset plus a larger unlabeled dataset, leveraging assumptions like smoothness or cluster structure.