Results for "model-based"
Model-Based RL
Advanced
RL using learned or known environment models.
Model-based reinforcement learning is like having a map while exploring a new city. Instead of wandering around aimlessly, you can look at the map to plan your route and make better decisions about where to go next. In this type of learning, an AI agent first learns how the environment works—like...
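To make the map analogy concrete, here is a minimal sketch of one common tabular pattern: estimate a model from experience, then plan on that model instead of the real environment. The five-state chain environment, the function names (`step`, `learn_model`, `plan`), and all constants are hypothetical and chosen only for illustration. Value iteration is one of several possible planners.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a chain of states 0..4 with the goal at state 4.
N_STATES = 5
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor (assumed value)


def step(state, action):
    """True environment dynamics; the planner never calls this directly."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def learn_model(n_samples=2000, seed=0):
    """Build the 'map': estimate transition probabilities and mean rewards
    from randomly sampled (state, action) interactions."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for _ in range(n_samples):
        s = rng.randrange(N_STATES)
        a = rng.choice(ACTIONS)
        s2, r = step(s, a)
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, next_counts in counts.items():
        total = sum(next_counts.values())
        probs = {s2: c / total for s2, c in next_counts.items()}
        model[sa] = (probs, reward_sum[sa] / total)
    return model


def plan(model, n_iters=100):
    """Use the 'map': run value iteration on the learned model only."""
    values = [0.0] * N_STATES

    def q(s, a):
        probs, mean_reward = model[(s, a)]
        return mean_reward + GAMMA * sum(p * values[s2] for s2, p in probs.items())

    for _ in range(n_iters):
        values = [max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    policy = [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]
    return policy, values


if __name__ == "__main__":
    learned = learn_model()
    policy, values = plan(learned)
    print("greedy action per state:", policy)
```

In this toy setup the planned policy moves right toward the goal from every state. A model-free method would instead have to discover that behavior by trial and error in the environment itself.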
AI selecting next experiments.
Agents optimize collective outcomes.
Agents have opposing objectives.
Rules governing auctions.
Designing systems where rational agents behave as desired.
Truthful bidding is the optimal strategy.
Competition arises without explicit design.
AI tacitly coordinating prices.
Decisions dependent on others’ actions.
Some agents know more than others.
Collective behavior without central control.
Awareness and regulation of internal processes.
Training one model on multiple tasks simultaneously to improve generalization through shared structure.
A scalar measure optimized during training, typically expected loss over data, sometimes with regularization terms.
When information from evaluation data improperly influences training, inflating reported performance.
Constraining outputs to retrieved or provided sources, often with citation, to improve factual reliability.
A model that assigns probabilities to sequences of tokens; often trained by next-token prediction.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Practices for operationalizing ML: versioning, CI/CD, monitoring, retraining, and reliable production management.
Automated testing and deployment processes for models and data workflows, extending DevOps to ML artifacts.
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
Inputs crafted to cause model errors or unsafe behavior, often imperceptible in vision or subtle in text.
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
The shape of the loss function over parameter space.
Capabilities that appear only beyond certain model sizes.
Classical statistical time-series model.
Model execution path in production.
Train/test environment mismatch.