Results for "overall correctness"
Constraining outputs to retrieved or provided sources, often with citation, to improve factual reliability.
Coordinating tools, models, and steps (retrieval, calls, validation) to deliver reliable end-to-end behavior.
Predicted probabilities do not match the observed frequency of correct outcomes.
Mathematical guarantees of system behavior.
Scalar summary of the ROC curve (the area under it); measures ranking ability, not calibration.
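The "ranking ability, not calibration" point can be illustrated with a small sketch. It assumes the pairwise definition of the area under the ROC curve (the probability that a random positive is scored above a random negative, ties counted as one half); the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = positive class, 0 = negative class.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]

# A strictly increasing transform changes every probability value
# (so calibration changes) but leaves the ordering untouched.
squashed = [s ** 4 for s in scores]

auc_raw = roc_auc(labels, scores)
auc_squashed = roc_auc(labels, squashed)
print(auc_raw, auc_squashed)
```

Because only the ordering of scores enters the computation, both calls return the same value even though the second set of scores would be badly miscalibrated as probabilities.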
Architecture that retrieves relevant documents (e.g., from a vector DB) and conditions generation on them to reduce hallucinations.
Breaking documents into pieces for retrieval; chunk size/overlap strongly affect RAG quality.
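As a minimal sketch of the chunk size and overlap knobs, the following splits text into fixed-size character windows that share a configurable overlap. The function name `chunk_text` and its parameters are assumptions made for illustration; real pipelines often split on tokens, sentences, or document structure instead of raw characters.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap reduces the chance that a relevant passage is cut at a chunk boundary, at the cost of storing and retrieving more redundant text.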
Techniques to understand model decisions (global or local), important in high-stakes and regulated settings.
Human or automated process of assigning targets; quality, consistency, and guidelines matter heavily.
Tracking where data came from and how it was transformed; key for debugging and compliance.
Structured dataset documentation covering collection, composition, recommended uses, biases, and maintenance.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
Scales logits before sampling; higher values increase randomness/diversity, lower values increase determinism.
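The effect of scaling logits before sampling can be sketched as dividing them by a temperature and then applying a softmax. The helper names `softmax_with_temperature` and `sample_index`, and the example logits, are hypothetical and not taken from any particular library.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize with a stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # illustrative values only
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)

rng = random.Random(0)
picked = sample_index(flat, rng)
```

With a temperature below 1 the most likely token takes a larger share of the probability mass (closer to greedy decoding); above 1 the distribution flattens and less likely tokens are sampled more often.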
Controls the amount of noise added at each diffusion step.
Maps audio signals to linguistic units.
Reversal of a trend seen within subgroups when the data is aggregated across them.
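A small numeric sketch can make the reversal concrete. The counts below are illustrative, chosen to exhibit the effect: option A has the higher success rate inside each subgroup, yet option B looks better once the subgroups are pooled, because the two options are spread unevenly across the subgroups. The names `counts`, `rate`, and `pooled_rate` are hypothetical.

```python
# (successes, trials) per option and subgroup; illustrative counts only.
counts = {
    "A": {"g1": (81, 87), "g2": (192, 263)},
    "B": {"g1": (234, 270), "g2": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one cell."""
    return successes / trials


def pooled_rate(groups):
    """Success rate after summing successes and trials over all subgroups."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


# Best option within each subgroup, compared cell by cell.
per_group_winner = {
    group: max(counts, key=lambda option: rate(*counts[option][group]))
    for group in ("g1", "g2")
}

# Best option after pooling the subgroups together.
pooled_winner = max(counts, key=lambda option: pooled_rate(counts[option]))

print(per_group_winner, pooled_winner)
```

Here A wins in both `g1` and `g2`, while the pooled comparison favors B, since most of A's trials fall in the harder subgroup `g2`.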
Minimum relative to nearby points.
Detecting and avoiding obstacles.
Designing systems where rational agents behave as desired.
Supplying buy/sell orders.
Goals useful regardless of final objective.
Accelerating safety relative to capabilities.