AI applied to scientific problems.
Collective behavior without central control.
AI limited to specific domains.
Internal representation of the agent itself.
Risk threatening humanity’s survival.
Tradeoff between safety and performance.
Ensuring an AI system allows itself to be shut down or corrected by its operators.
The thesis that an agent's level of intelligence and its final goals vary independently.
Research ensuring AI remains safe.
Inferring and aligning with human preferences.
Training objective where the model predicts the next token given all previous tokens (causal language modeling).
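The shift between inputs and targets in this objective can be sketched in a few lines; `make_lm_pairs` is an illustrative name, not a standard API.

```python
# Minimal sketch of the causal (next-token) objective: each position's
# training target is simply the following token in the sequence.
def make_lm_pairs(tokens):
    """Shift a token sequence into (input, target) pairs for next-token prediction."""
    inputs = tokens[:-1]   # context the model conditions on
    targets = tokens[1:]   # the token it must predict at each step
    return list(zip(inputs, targets))
```

In a real training loop the model conditions on the full prefix, not just the single previous token; this shows only the one-step shift between inputs and targets.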
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
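A minimal sketch of assembling such a prompt from a role, constraints, and few-shot examples; the template and function name are illustrative, not a prescribed format.

```python
def build_prompt(role, task, constraints, examples):
    """Assemble a prompt with an explicit role, constraints, and few-shot examples."""
    lines = [f"You are {role}.", "", task, "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines.append("")
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines.append("Input:")  # the user's actual input is appended after this
    return "\n".join(lines)
```

Role framing, explicit constraints, and worked examples each nudge the model toward the desired behavior; the same structure works regardless of the underlying API.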
A high-priority instruction layer setting overarching behavior constraints for a chat model.
A datastore optimized for similarity search over embeddings, enabling semantic retrieval at scale.
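At small scale, similarity search is just brute-force cosine similarity over stored embeddings; production vector databases add approximate-nearest-neighbor indexes on top. A minimal sketch with toy 2-D embeddings and illustrative names:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query, store, k=2):
    """Return the k most similar (doc_id, score) entries from an embedding store."""
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in store.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

Real systems use high-dimensional embeddings from a trained encoder and index structures (e.g. HNSW) to avoid scanning every stored vector.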
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
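A minimal character-based chunker with overlap; real pipelines usually split on tokens or sentence boundaries, and the sizes here are illustrative.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each chunk
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger chunks preserve more context per retrieval hit; more overlap reduces the chance that an answer is split across a chunk boundary, at the cost of index size.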
Model-generated content that is fluent but unsupported by evidence or incorrect; mitigated by grounding and verification.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
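The reward model is typically trained on pairwise preferences under a Bradley-Terry model: the probability that the preferred response wins is a sigmoid of the reward gap. A minimal sketch:

```python
import math

def preference_prob(reward_chosen, reward_rejected):
    """Bradley-Terry probability that the human-preferred response is ranked first."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
```

Training maximizes the log of this probability over labeled preference pairs; the resulting reward model then scores candidate outputs during policy optimization.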
Ensuring model behavior matches human goals, norms, and constraints, including reducing harmful or deceptive outputs.
Automated detection/prevention of disallowed outputs (toxicity, self-harm, illegal instructions, etc.).
Rules and controls around generation (filters, validators, structured outputs) to reduce unsafe or invalid behavior.
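One common validator pattern: require the model to emit JSON and reject anything unparseable or missing required fields. The function name and schema below are illustrative.

```python
import json

def validate_json_output(raw, required_keys):
    """Return parsed model output only if it is valid JSON with all required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model emitted something other than JSON
    if not isinstance(data, dict) or not required_keys.issubset(data):
        return None  # structurally valid JSON, but missing required fields
    return data
```

A rejected output can trigger a retry, a repair prompt, or a safe fallback, so invalid generations never reach downstream code.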
Systematic differences in model outcomes across groups; arises from data, labels, and deployment context.
Standardized documentation describing intended use, performance, limitations, data, and ethical considerations.
Central system to store model versions, metadata, approvals, and deployment state.
Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
The capability to infer a system's internal state from its external telemetry (logs, metrics, traces); broader than simple monitoring and crucial for AI services and agents.
Systematic review of model/data processes to ensure performance, fairness, security, and policy compliance.
The number of requests or tokens a system can process per unit time; determines scalability and cost.
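A back-of-the-envelope capacity calculation that follows directly from this definition; the numbers and names are illustrative.

```python
import math

def replicas_needed(target_tokens_per_sec, per_replica_tokens_per_sec):
    """Replicas required to sustain a target aggregate token throughput."""
    return math.ceil(target_tokens_per_sec / per_replica_tokens_per_sec)
```

For example, serving 5,000 tokens/s when each replica sustains 1,200 tokens/s requires five replicas; cost scales with that count.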
Observing model inputs/outputs, latency, cost, and quality over time to catch regressions and drift.
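A toy regression check in that spirit: compare a recent window of latencies against a baseline window and flag when the average degrades past a threshold. Window sizes and the threshold are illustrative; production systems use proper statistical drift tests.

```python
def detect_regression(baseline, recent, threshold=1.5):
    """Flag a regression if recent average latency exceeds baseline by a factor."""
    base_avg = sum(baseline) / len(baseline)
    recent_avg = sum(recent) / len(recent)
    return recent_avg > threshold * base_avg
```

The same pattern applies to cost per request or quality scores: keep a rolling baseline, compare recent values, and alert on sustained deviation rather than single outliers.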
Reconstructing a model or its capabilities via API queries or leaked artifacts.
Hidden behavior activated by specific triggers, causing targeted mispredictions or undesired outputs.