Results for "collective behavior"
Decisions that depend on the actions of others.
Research aimed at ensuring AI systems remain safe.
A mismatch between training and deployment data distributions that can degrade model performance.
The learned numeric values of a model adjusted during training to minimize a loss function.
Configuration choices not learned directly (or not typically learned) that govern training or architecture.
A table summarizing classification outcomes, foundational for metrics like precision, recall, specificity.
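A minimal sketch of those counts and derived metrics, assuming binary 0/1 labels and plain Python lists (illustrative only; libraries such as scikit-learn provide this directly):

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)    # of predicted positives, how many were right
recall = tp / (tp + fn)       # of actual positives, how many were found
specificity = tn / (tn + fp)  # of actual negatives, how many were found
```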
Number of samples per gradient update; impacts compute efficiency, generalization, and stability.
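An illustrative sketch of how a dataset splits into minibatches of a chosen size (hypothetical helper; each batch would drive one gradient update):

```python
def minibatches(data, batch_size):
    """Yield successive batches of the data; the last batch may be smaller."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# 10 samples with batch_size=4 -> batches of 4, 4, 2: three updates per epoch
batches = list(minibatches(list(range(10)), batch_size=4))
```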
Crafting prompts to elicit desired behavior, often using role, structure, constraints, and examples.
Updating a pretrained model’s weights on task-specific data to improve performance or adapt style/behavior.
Fine-tuning on (prompt, response) pairs to align a model with instruction-following behaviors.
Reinforcement learning from human feedback: uses preference data to train a reward model and optimize the policy.
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Studying internal mechanisms or input influence on outputs (e.g., saliency maps, SHAP, attention analysis).
A broader capability than interpretability alone: inferring internal system state from telemetry, crucial for AI services and agents.
Techniques that fine-tune small additional components rather than all weights to reduce compute and storage.
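A LoRA-style sketch of the idea, in pure Python with tiny matrices (the dimensions, scaling, and helper functions here are illustrative assumptions, not a production implementation): the full weight matrix W stays frozen while two small low-rank factors A and B are trained, and the effective weights are W + (alpha / r) * (B @ A).

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * v for v in row] for row in X]

d, r, alpha = 3, 1, 2.0                                    # full dim 3, rank-1 update
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # frozen base weights
B = [[1.0], [0.0], [0.0]]                                  # d x r, trainable
A = [[0.0, 2.0, 0.0]]                                      # r x d, trainable
W_eff = add(W, scale(matmul(B, A), alpha / r))
# Only 2*d*r = 6 values are trained instead of d*d = 9; the saving grows
# dramatically at realistic dimensions.
```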
Training a smaller “student” model to mimic a larger “teacher,” often improving efficiency while retaining performance.
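A sketch of the distillation target, assuming the common temperature-scaled softmax formulation: the student is trained to match the teacher's softened output distribution (via a KL-divergence loss) rather than hard labels. The logits and temperature below are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): distillation loss between teacher p and student q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [4.0, 1.0, 0.5]
student_logits = [3.0, 1.5, 0.2]
T = 2.0
soft_teacher = softmax(teacher_logits, T)  # softened "dark knowledge"
soft_student = softmax(student_logits, T)
loss = kl_divergence(soft_teacher, soft_student)  # 0 iff distributions match
```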
Attacks that manipulate model instructions (especially via retrieved content) to override system goals or exfiltrate data.
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
The shape of the loss function over parameter space.
Adjusting the learning rate over the course of training to improve convergence.
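One common schedule, sketched as cosine decay between assumed lr_max and lr_min endpoints (many alternatives exist: step decay, linear warmup, etc.):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Smoothly decay the learning rate from lr_max to lr_min."""
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Starts at lr_max, ends at lr_min, decreasing smoothly in between.
lrs = [cosine_lr(s, 100) for s in range(101)]
```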
A strategy that maps states to actions.
Learning only from data generated by the current policy.
Learning from data generated by a different policy.
Logged record of model inputs, outputs, and decisions.
Extracting system prompts or hidden instructions.
Models a system's time evolution via hidden states.
Persistent directional movement over time.
Shift in feature distribution over time.
Interleaving reasoning and tool use.
Matrix of first-order derivatives for vector-valued functions.
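A numerical sketch: approximating such a matrix by central finite differences (the step size h and the test function are illustrative assumptions; analytic derivatives or autodiff would be used in practice).

```python
def numerical_jacobian(f, x, h=1e-6):
    """Approximate J[i][j] ~= d f_i / d x_j at the point x."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        x_fwd = list(x); x_fwd[j] += h
        x_bwd = list(x); x_bwd[j] -= h
        f_fwd, f_bwd = f(x_fwd), f(x_bwd)
        for i in range(len(fx)):
            jac[i][j] = (f_fwd[i] - f_bwd[i]) / (2 * h)
    return jac

def f(v):  # f(x, y) = (x * y, x + y)
    x, y = v
    return [x * y, x + y]

J = numerical_jacobian(f, [2.0, 3.0])
# J ~= [[3, 2], [1, 1]]: row i holds the partial derivatives of f_i
```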