Results for "demonstration-based"
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
Model trained to predict human preferences (or utility) for candidate outputs; used in RLHF-style pipelines.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
Ordering training samples from easier to harder to improve convergence or generalization.
Artificially created data used to train/test models; helpful for privacy and coverage, risky if unrealistic.
Central system to store model versions, metadata, approvals, and deployment state.
Search algorithm for generation that keeps top-k partial sequences; can improve likelihood but reduce diversity.
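The entry above reads as a description of beam search. A minimal sketch follows, assuming a hypothetical `toy_model` function that returns next-token log-probabilities from a hand-written table; it is illustrative only and not tied to any particular library.

```python
import math

def beam_search(next_log_probs, start, steps, beam_width):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Expand every surviving sequence by every possible next token.
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Rank all expansions and prune back to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

def toy_model(seq):
    """Hypothetical model: next-token distribution depends on the last token only."""
    table = {
        "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"a": math.log(0.5), "b": math.log(0.5)},
        "b": {"a": math.log(0.1), "b": math.log(0.9)},
    }
    return table[seq[-1]]

best = beam_search(toy_model, "<s>", steps=2, beam_width=2)
greedy = beam_search(toy_model, "<s>", steps=2, beam_width=1)
```

With a beam width of 1 the search is greedy and commits to "a" first, ending at probability 0.30; with a width of 2 it keeps "b" alive and finds the sequence ending "b", "b" at probability 0.36. This is the likelihood gain the entry mentions, at the cost of exploring fewer distinct continuations than sampling would.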
Raw model outputs before converting to probabilities; manipulated during decoding and calibration.
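The entry above appears to define logits. As a rough sketch of how they become probabilities, the function below applies a softmax with an optional temperature; the `temperature` parameter is an assumption added here to show one common way logits are manipulated during decoding.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
sharper = softmax([2.0, 1.0, 0.1], temperature=0.5)  # lower temperature, peakier output
```

Dividing by a temperature below 1 concentrates probability on the largest logit; a temperature above 1 flattens the distribution.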
Reconstructing a model or its capabilities via API queries or leaked artifacts.
A system that perceives state, selects actions, and pursues goals—often combining LLM reasoning with tools and memory.
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
Constraining model outputs into a schema used to call external APIs/tools safely and deterministically.
Models that process or generate multiple modalities, enabling vision-language tasks, speech, video understanding, etc.
A measure of a model class’s expressive capacity: the size of the largest set of points the class can shatter, i.e. label in every possible way.
AI subfield dealing with understanding and generating human language, including syntax, semantics, and pragmatics.
Limiting gradient magnitude to prevent exploding gradients.
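The entry above reads as gradient clipping. One common variant rescales the whole gradient when its norm exceeds a threshold; the sketch below assumes a flat list of floats stands in for the gradient, and `clip_by_global_norm` is a hypothetical helper name rather than a library function.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; direction is preserved."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm  # already within bounds
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Here the original gradient has norm 5, so every component is multiplied by 1/5 and the clipped gradient has norm 1.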
Built-in assumptions guiding learning efficiency and generalization.
Prevents attention to future tokens during training/inference.
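The entry above appears to describe a causal attention mask. A minimal sketch, assuming attention scores are held in a plain nested list: position i may attend to position j only when j <= i, and disallowed scores are set to negative infinity so that a subsequent softmax would assign them zero weight.

```python
def causal_mask(n):
    """Lower-triangular mask: entry [i][j] is True when position i may attend to j."""
    return [[j <= i for j in range(n)] for i in range(n)]

def apply_mask(scores, mask):
    """Replace disallowed attention scores with -inf."""
    return [
        [s if keep else float("-inf") for s, keep in zip(row, mask_row)]
        for row, mask_row in zip(scores, mask)
    ]

mask = causal_mask(3)
masked = apply_mask([[1.0, 2.0], [3.0, 4.0]], causal_mask(2))
```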
A single attention mechanism within multi-head attention.
Encodes token position explicitly, often via sinusoids.
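The entry above reads as positional encoding. The sketch below follows the sinusoidal scheme from the original Transformer paper, where even dimensions use a sine and odd dimensions a cosine at geometrically spaced frequencies; the function name `sinusoidal_encoding` is chosen here for illustration.

```python
import math

def sinusoidal_encoding(position, d_model):
    """Return the d_model-dimensional sinusoidal encoding of one position."""
    enc = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
        freq = 1.0 / (10000 ** ((i // 2) * 2 / d_model))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

first = sinusoidal_encoding(0, 4)
later = sinusoidal_encoding(5, 4)
```

Every component lies in [-1, 1], and position 0 always maps to alternating zeros and ones because sin(0) = 0 and cos(0) = 1.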
Routes inputs to subsets of parameters for scalable capacity.
Strategy mapping states to actions.
Expected return of taking a given action in a state and following the policy thereafter.
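The two entries above read as the definitions of a policy and an action-value (Q) function. As a small sketch of how they relate, the code below computes a discounted return and derives a greedy policy from a hand-written Q-table; the table values, state names, and the discount factor `gamma` are all hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each discounted by gamma per time step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def greedy_policy(q_table, state):
    """Map a state to the action with the highest estimated Q-value."""
    actions = q_table[state]
    return max(actions, key=actions.get)

# Hypothetical Q-value estimates for a single state with two actions.
q_table = {"s0": {"left": 0.2, "right": 1.5}}

ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
action = greedy_policy(q_table, "s0")
```

A Q-value is the expectation of such returns over trajectories that start with the given state and action; the greedy policy simply picks the action whose estimate is largest.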
Extending agents with long-term memory stores.
Optimizing policies directly via gradient ascent on expected reward.
Coordination arising without explicit programming.
Learning from data generated by a different policy.
Categorizing AI applications by impact and regulatory risk.
Models trained to decide when to call tools.
Extracting system prompts or hidden instructions.