Results for "data-driven"
Scaling laws describing how to trade off compute, model size, and data for best performance.
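As one concrete instance (the Chinchilla compute-optimal result; the constant 6 and the exponents are approximate empirical fits, not exact):

```latex
C \approx 6ND, \qquad N_{\text{opt}} \propto C^{a}, \quad D_{\text{opt}} \propto C^{b}, \quad a \approx b \approx 0.5
```

where C is training compute (FLOPs), N is parameter count, and D is training tokens.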
Competitive advantage from proprietary models/data.
Belief before observing data.
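A minimal sketch of updating such a belief with data, using a conjugate Beta-Binomial model (all parameter values are illustrative assumptions):

```python
# Beta-Binomial conjugate update: prior Beta(a, b) over a coin's heads
# probability, then observe k heads in n flips.

def posterior(a, b, k, n):
    """Return posterior Beta parameters after k successes in n trials."""
    return a + k, b + (n - k)

# Prior belief: coin roughly fair -> Beta(2, 2); then observe 8 heads in 10 flips.
a_post, b_post = posterior(2, 2, 8, 10)
post_mean = a_post / (a_post + b_post)  # posterior mean estimate of heads probability
```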
Mismatch between training and deployment environments (distribution shift).
Model trained on its own outputs degrades quality.
Startup latency for services.
Storing results to reduce compute.
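A minimal caching sketch using memoization, where stored results replace repeated recomputation (the Fibonacci function is just a stand-in workload):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize: store each result, reuse it on repeat calls
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(80)  # fast despite the exponential naive recursion
```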
Software pipeline converting raw sensor data into structured representations.
Models estimating recidivism risk.
Learning physical parameters from data.
Finding mathematical equations from data.
Learning a function from input-output pairs (labeled data), with the goal of accurately predicting outputs for unseen inputs.
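A minimal sketch of the idea: fit y ≈ w·x + b from labeled pairs via closed-form least squares, then predict for an unseen input (toy data, no libraries):

```python
# Fit a line to labeled (x, y) pairs by ordinary least squares.

def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]      # generated by y = 2x + 1
w, b = fit_linear(xs, ys)
pred = w * 4.0 + b             # predict the output for an unseen input x = 4
```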
Reusing knowledge from a source task/domain to improve learning on a target task/domain, typically via pretrained models.
Methods that learn training procedures or initializations so models can adapt quickly to new tasks with little data.
Techniques that discourage overly complex solutions to improve generalization (reduce overfitting).
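A minimal sketch of one such technique, 1-D ridge regression (an L2 penalty on the weight, no intercept; data values are illustrative):

```python
# Ridge regression in one dimension: larger lam shrinks the weight toward 0,
# discouraging complex (large-weight) solutions at the cost of some bias.

def ridge_fit(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]               # true weight is 2
w_unreg = ridge_fit(xs, ys, 0.0)   # exactly 2.0
w_reg = ridge_fit(xs, ys, 1.0)     # shrunk below 2.0
```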
A conceptual framework describing error as the sum of systematic error (bias) and sensitivity to data (variance).
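For squared error with observations y = f(x) + ε and noise variance σ², the decomposition can be written explicitly:

```latex
\mathbb{E}\!\left[(y - \hat f(x))^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat f(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat f(x) - \mathbb{E}[\hat f(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```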
One complete traversal of the training dataset during training.
A parameterized function composed of interconnected units organized in layers with nonlinear activations.
Networks using convolution operations with weight sharing and locality, effective for images and signals.
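A minimal sketch of the core operation: the same kernel (shared weights) slides over the input, so each output depends only on a local window (toy values, "valid" padding, no learned parameters):

```python
# 1-D convolution (cross-correlation form): one kernel reused at every position.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edges = conv1d([0, 0, 1, 1, 1, 0], [1, -1])  # difference kernel highlights edges
```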
Networks with recurrent connections for sequences; largely supplanted by Transformers for many tasks.
Converting text into discrete units (tokens) for modeling; subword tokenizers balance vocabulary size and coverage.
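A toy sketch of subword tokenization: greedy longest-match against a fixed vocabulary (the vocabulary and fallback token are illustrative assumptions, not a real tokenizer's merge rules):

```python
# Greedy longest-match subword tokenizer over a tiny fixed vocabulary.

VOCAB = {"un", "break", "able", "b", "r", "e", "a", "k", "<unk>"}

def tokenize(word, vocab=VOCAB):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:                               # no vocabulary piece matched here
            tokens.append("<unk>")
            i += 1
    return tokens

tokens = tokenize("unbreakable")
```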
A high-capacity language model trained on massive corpora, exhibiting broad generalization and emergent behaviors.
Letting an LLM call external functions/APIs to fetch data, compute, or take actions, improving reliability.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
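A minimal sketch of the retrieval step: brute-force cosine similarity over stored embeddings (a real vector database adds approximate-nearest-neighbor indexing for scale; the vectors and document names are toy data):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

def search(query, k=2):
    """Return the k stored keys whose embeddings are most similar to query."""
    ranked = sorted(store, key=lambda d: cosine(store[d], query), reverse=True)
    return ranked[:k]

top = search([1.0, 0.05, 0.0])
```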
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Central system to store model versions, metadata, approvals, and deployment state.
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Ability to replicate results given the same code and data; harder with distributed training and nondeterministic operations.
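A minimal sketch of the simplest lever: fixing the RNG seed makes a "random" step repeatable (the sampling function is an illustrative stand-in for a data pipeline step):

```python
import random

def sample_batch(seed, population, k=3):
    rng = random.Random(seed)      # local RNG, unaffected by global state
    return rng.sample(population, k)

a = sample_batch(42, list(range(100)))
b = sample_batch(42, list(range(100)))
assert a == b                      # same seed -> identical "random" batch
```

Note this only covers RNG state; distributed training and nondeterministic kernels need additional controls.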
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.