Results for "data → model"
Processes and controls for data quality, access, lineage, retention, and compliance across the AI lifecycle.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Training across many devices/silos without centralizing raw data; aggregates updates, not data.
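The aggregation idea above can be sketched in a few lines. This is a minimal, illustrative federated-averaging (FedAvg) loop under simplifying assumptions (a 1-D least-squares model, synchronous clients, plain averaging); the function names `local_update` and `fed_avg` are hypothetical, not any library's API.

```python
# Sketch of federated averaging: each client computes a local update on
# its own data; only updated weights (never raw data) reach the server,
# which averages them into the next global model.

def local_update(weights, data, lr=0.1):
    """One least-squares gradient step on a client's private (x, y) pairs."""
    grad = sum(2 * (weights * x - y) * x for x, y in data) / len(data)
    return weights - lr * grad

def fed_avg(global_w, client_datasets, rounds=20):
    for _ in range(rounds):
        # Each client trains locally; raw data never leaves the client.
        updates = [local_update(global_w, d) for d in client_datasets]
        # The server sees and aggregates only the model updates.
        global_w = sum(updates) / len(updates)
    return global_w

# Two "silos" whose private data both follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]
w = fed_avg(0.0, clients)  # converges near the shared slope 2.0
```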
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Improving model performance by training on more data; gains often follow predictable scaling trends but show diminishing returns.
An attack that infers sensitive attributes of individuals represented in the training data from a model's outputs or parameters.
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
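The transformations listed above can be sketched for a toy 1-D "image"; the helpers `flip`, `add_noise`, and `augment` are illustrative names, and the key property is that each transform preserves the label.

```python
import random

# Sketch of data augmentation: expand each labeled example with
# label-preserving transforms (a horizontal flip and additive noise).

def flip(x):
    return list(reversed(x))

def add_noise(x, sigma=0.05, rng=random):
    return [v + rng.gauss(0.0, sigma) for v in x]

def augment(dataset):
    """Return the originals plus two variants per (example, label) pair."""
    out = []
    for x, y in dataset:
        out.append((x, y))
        out.append((flip(x), y))       # geometry changes, label does not
        out.append((add_noise(x), y))  # robustness to sensor noise
    return out

data = [([0.1, 0.5, 0.9], "edge")]
augmented = augment(data)  # 3 entries per original example
```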
When information from evaluation data improperly influences training, inflating reported performance.
Diffusion model trained to remove noise step by step.
When a model fits noise/idiosyncrasies of training data and performs poorly on unseen data.
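The train/test gap that defines overfitting can be shown with a deliberately extreme model: a lookup table that memorizes every training example, noise included. The setup below (a threshold rule with 20% label noise) is a constructed illustration, not a standard benchmark.

```python
import random

# Sketch of overfitting: a memorizing "model" scores perfectly on its
# training data but poorly on unseen data, while the simple true rule
# generalizes despite nonzero training error.

rng = random.Random(0)

def noisy_label(x):
    y = x > 0.5                 # underlying structure
    return (not y) if rng.random() < 0.2 else y  # 20% label noise

train = [(x, noisy_label(x)) for x in (rng.random() for _ in range(200))]
test = [(x, noisy_label(x)) for x in (rng.random() for _ in range(200))]

memorized = dict(train)  # fits every training point exactly, noise and all

def memorizer(x):
    return memorized.get(x, False)  # unseen inputs: a fixed default guess

def simple_rule(x):
    return x > 0.5

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train_acc_memo = accuracy(memorizer, train)  # 1.0: it also "learned" the noise
test_acc_memo = accuracy(memorizer, test)    # near chance on unseen inputs
test_acc_rule = accuracy(simple_rule, test)  # ~0.8, limited only by noise
```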
A shift in the input feature distribution over time; models trained on the old distribution can silently degrade.
Maliciously inserting or altering training data to implant backdoors or degrade performance.
Protecting data during network transfer and while stored; essential for ML pipelines handling sensitive data.
Central system to store model versions, metadata, approvals, and deployment state.
Reinforcement learning that plans or learns using a known or learned model of the environment's dynamics.
Generative model that learns to reverse a gradual noise process.
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
A parameterized mapping from inputs to outputs; includes architecture + learned parameters.
When a model cannot capture underlying structure, performing poorly on both training and test data.
Competitive advantage from proprietary models/data.
Learning where data arrives sequentially and the model updates continuously, often under changing distributions.
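The sequential-update idea can be sketched with stochastic gradient descent on a stream, including a mid-stream distribution change; the 1-D linear model and the particular shift are illustrative choices.

```python
# Sketch of online learning: examples arrive one at a time and the model
# updates immediately. Midway through the stream, the data-generating
# slope changes, and the running SGD update tracks the new distribution
# without any retraining from scratch.

def sgd_step(w, x, y, lr=0.1):
    return w - lr * 2 * (w * x - y) * x  # squared-error gradient step

w = 0.0
stream = [(1.0, 3.0)] * 30 + [(1.0, -1.0)] * 30  # target slope: 3, then -1
for x, y in stream:
    w = sgd_step(w, x, y)
# w ends near -1, having first converged near 3 and then adapted
```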
Separating data into training (fit), validation (tune), and test (final estimate) to avoid leakage and optimism bias.
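The three-way split can be sketched directly; the 70/15/15 fractions are a common but arbitrary choice, and `train_val_test_split` is an illustrative helper, not a library function.

```python
import random

# Sketch of a train/validation/test split: shuffle once with a fixed
# seed, then carve out test (touched once, for the final estimate),
# validation (hyperparameter tuning), and training (fitting). Shuffling
# before splitting avoids ordering bias; keeping the splits disjoint
# avoids leakage.

def train_val_test_split(data, val_frac=0.15, test_frac=0.15, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
# 70 / 15 / 15 examples, with no example appearing in two splits
```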
A shift in the distribution of a model's outputs over time; monitored to catch silent degradation even when inputs look unchanged.
Learning from data by constructing “pseudo-labels” (e.g., next-token prediction, masked modeling) without manual annotation.
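The next-token-prediction pseudo-labels mentioned above can be sketched in a few lines: the raw sequence itself supplies every (context, target) pair, so no manual annotation is involved. The helper name is illustrative.

```python
# Sketch of self-supervised pseudo-labels via next-token prediction:
# slide a fixed-size context window over the sequence and treat the
# following token as the training target.

def next_token_pairs(tokens, context=2):
    return [
        (tuple(tokens[i:i + context]), tokens[i + context])
        for i in range(len(tokens) - context)
    ]

corpus = "the cat sat on the mat".split()
pairs = next_token_pairs(corpus)
# e.g. (("the", "cat"), "sat"), (("cat", "sat"), "on"), ...
```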
The learned numeric values of a model adjusted during training to minimize a loss function.
Learns the score (∇ log p(x)) for generative sampling.
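Once the score is available, samples can be drawn with Langevin dynamics. Below is a minimal sketch for the easiest possible case, a standard normal, where the score ∇ log p(x) = -x is known in closed form rather than learned; the step size, chain length, and helper name are all illustrative.

```python
import random

# Sketch of score-based sampling: Langevin dynamics uses only the score
# s(x) = d/dx log p(x) to draw samples from p. Each step nudges x uphill
# on log-density and adds calibrated noise.

def langevin_sample(score, steps=1000, eps=0.01, seed=0):
    rng = random.Random(seed)
    x = 5.0  # arbitrary start far from the mode
    for _ in range(steps):
        x += 0.5 * eps * score(x) + (eps ** 0.5) * rng.gauss(0.0, 1.0)
    return x

# For N(0, 1) the score is -x; the chain's samples approximate N(0, 1).
samples = [langevin_sample(lambda x: -x, seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```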
The end-to-end sequence of automated steps (data ingestion, preprocessing, training, evaluation, packaging) that turns raw data into a trained model.
Degradation that occurs when a model is trained on its own (or other models') generated outputs, progressively losing diversity and fidelity.
A structured assessment of privacy risks (e.g., a Data Protection Impact Assessment) required under GDPR-like laws for high-risk data processing.
Local surrogate explanation method approximating model behavior near a specific input.
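The local-surrogate idea can be sketched for a 1-D black box: sample perturbations near the input, weight them by proximity, and fit a weighted linear model whose slope serves as the local explanation. This is an illustrative sketch of the idea, not the `lime` package's API; `black_box` and `local_slope` are hypothetical names.

```python
import math
import random

# Sketch of a LIME-style local surrogate: near x0, approximate a
# nonlinear black box by the proximity-weighted least-squares line
# through (x0, f(x0)); the fitted slope is the local explanation.

def black_box(x):
    return x * x  # nonlinear model to be explained at a point

def local_slope(f, x0, n=500, width=0.1, seed=0):
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = x0 + rng.gauss(0.0, width)                      # perturbation
        w = math.exp(-((x - x0) ** 2) / (2 * width ** 2))   # proximity kernel
        num += w * (x - x0) * (f(x) - f(x0))
        den += w * (x - x0) ** 2
    return num / den  # weighted least-squares slope through (x0, f(x0))

slope = local_slope(black_box, x0=3.0)
# Near x0 = 3, x*x behaves like a line of slope ~2 * x0 = 6
```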