Results for "scaling effects"
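A scaling law for the compute-optimal trade-off between model size and the amount of training data.
Increasing model capacity by spending more compute.
Improving performance by training on more data.
Empirical laws, often of power-law form, linking model size, dataset size, and compute to performance.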
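Because such laws are usually power laws, they appear as straight lines on a log-log plot, and the exponent can be estimated with ordinary least squares on log-transformed values. The sketch below assumes a pure power law, loss ≈ a · N^(−α), with no irreducible-loss term, and uses synthetic, noise-free data; the function name and all numbers are hypothetical.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares in log-log space.

    Returns (a, alpha). Assumes a pure power law with no irreducible-loss term.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the least-squares line through (log size, log loss).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data generated from loss = 10 * N**-0.1 (illustrative values only).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [10 * n ** -0.1 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(round(a, 3), round(alpha, 3))
```

Real fits typically add a constant floor term and must cope with noisy measurements, which this sketch ignores.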
Models the effects of interventions, written do(X=x), as opposed to passive observation.
Dynamic resource allocation.
The degree to which predicted probabilities match observed frequencies (e.g., among predictions made with confidence 0.8, about 80% should be correct).
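One common way to quantify calibration is expected calibration error (ECE): group predictions into confidence bins and average the gap between mean confidence and accuracy in each bin, weighted by bin size. This is a minimal sketch assuming equal-width bins; the function name and bin count are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |mean confidence - accuracy| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Ten predictions at confidence 0.8 with eight correct: well calibrated (ECE near 0).
calibrated = expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2)
# Same confidence, but all ten correct: underconfident by 0.2.
underconfident = expected_calibration_error([0.8] * 10, [True] * 10)
```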
Techniques that stabilize and speed training by normalizing activations; LayerNorm is common in Transformers.
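As a sketch of what LayerNorm computes for a single activation vector: subtract the mean, divide by the standard deviation (with a small epsilon for stability), then apply a learned per-feature scale and shift. Here plain Python lists stand in for tensors, and the default scale of 1 and shift of 0 are assumptions for illustration.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one activation vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    # Population (biased) variance over the features of this one example.
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma if gamma is not None else [1.0] * n  # learned scale (identity here)
    beta = beta if beta is not None else [0.0] * n     # learned shift (zero here)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]

out = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Unlike batch normalization, the statistics are computed per example across features, so the result does not depend on batch size.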
Expanding training data via transformations (flips, noise, paraphrases) to improve robustness.
Stochastic generation strategies that trade determinism for diversity; key knobs include temperature and nucleus sampling.
Divides the logits by a temperature before sampling; higher temperatures increase randomness and diversity, lower temperatures increase determinism.
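The two knobs described above, temperature and nucleus (top-p) sampling, can be combined in one sampling step: scale the logits, convert them to probabilities with a softmax, keep the smallest set of most-likely tokens whose cumulative probability reaches p, and draw from that set. This is a minimal sketch over a plain list of logits; the function name and defaults are hypothetical, not any particular library's API.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample a token index using temperature scaling followed by nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # random.choices renormalizes the kept weights implicitly.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token = sample_token([2.0, 1.0, 0.5], temperature=0.7, top_p=0.9, rng=random.Random(0))
```

With a very low temperature the distribution collapses onto the highest logit; with a small top_p only the head of the distribution can be sampled.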
Capabilities that appear only beyond certain model sizes.
Cost to run models in production.
Cost of model training.
Optimizers, such as Adam, that adapt per-parameter step sizes during training.
Predicted probabilities do not match how often the predictions are actually correct (e.g., systematic overconfidence).
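Adam's adaptivity comes from two running averages: a first moment (mean of gradients) and a second moment (mean of squared gradients), each bias-corrected, with the step divided by the square root of the second moment. This sketch applies the standard Adam update to a list of scalar parameters; the dictionary-based `state` and the toy quadratic objective are illustrative assumptions.

```python
import math

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a list of scalar parameters; `state` holds moments and the step count."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # first moment (mean)
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment (uncentered)
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x = [0.0]
state = {"t": 0, "m": [0.0], "v": [0.0]}
for _ in range(2000):
    x = adam_step(x, [2 * (x[0] - 3)], state, lr=0.05)
```

Note that the first step has magnitude close to `lr` regardless of the gradient's scale, because the bias-corrected ratio m̂/√v̂ is then about ±1.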
Framework for reasoning about cause-effect relationships beyond correlation, often using structural assumptions and experiments.
Controlled experiment comparing variants by random assignment to estimate causal effects of changes.
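For a binary metric such as conversion, one standard analysis of an A/B test is a two-proportion z-test: random assignment makes the groups comparable, so the difference in rates estimates the causal effect, and the pooled rate gives its standard error under the null of no difference. This sketch relies on the normal approximation (reasonable for large samples); the function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_b - p_a, z, p_value

# Hypothetical counts: 200/2000 conversions for A, 260/2000 for B.
lift, z, p = two_proportion_z_test(200, 2000, 260, 2000)
```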
Methods for breaking goals into steps; can be classical (A*, STRIPS) or LLM-driven with tool calls.
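As an example of the classical side, A* expands states in order of f = g + h, where g is the cost so far and h is a heuristic estimate of the remaining cost. This sketch plans on a hypothetical 4-connected grid with unit step costs, where Manhattan distance is an admissible and consistent heuristic, so the first time the goal is popped its cost is optimal.

```python
import heapq

def a_star(grid, start, goal):
    """Shortest path length on a 4-connected grid ('#' = wall) using A* with a Manhattan heuristic."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this cell was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # goal unreachable

grid = [
    "....",
    ".##.",
    "....",
]
steps = a_star(grid, (0, 0), (2, 3))
```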
Variability introduced by minibatch sampling during SGD.
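The noise can be measured directly: draw many minibatches, average the per-example gradients in each, and look at how the estimates spread around the full-batch gradient. This sketch uses a hypothetical 1-D least-squares problem (y = 2x plus Gaussian noise, gradient evaluated at w = 0); for sampling from a large dataset, the variance of the estimate shrinks roughly in proportion to 1/batch_size.

```python
import random
import statistics

def minibatch_grad_spread(batch_size, trials=2000, seed=0):
    """Return (full gradient, mean of minibatch estimates, variance of minibatch estimates)."""
    rng = random.Random(seed)
    # Hypothetical dataset: y = 2x + noise, with x uniform on [-1, 1].
    data = [(x, 2 * x + rng.gauss(0, 0.5)) for x in (rng.uniform(-1, 1) for _ in range(1000))]
    w = 0.0
    per_example = [(w * x - y) * x for x, y in data]  # d/dw of 0.5 * (w*x - y)^2
    full = sum(per_example) / len(per_example)
    estimates = [statistics.fmean(rng.sample(per_example, batch_size)) for _ in range(trials)]
    return full, statistics.fmean(estimates), statistics.pvariance(estimates)

full4, mean4, var4 = minibatch_grad_spread(4)
full64, mean64, var64 = minibatch_grad_spread(64)
```

The minibatch estimates are unbiased (their mean tracks the full gradient), while their variance drops as the batch grows.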
Directed acyclic graph encoding causal relationships.
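A causal DAG can be stored as a map from each variable to its direct causes (parents). Acyclicity guarantees a topological order in which every cause precedes its effects. This sketch uses Python's standard `graphlib` module (Python 3.9+); the variables and edges form a hypothetical example, not a claim about any real causal structure.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical causal DAG, written as node -> set of direct causes (parents).
parents = {
    "genotype": set(),
    "smoking": {"genotype"},
    "tar": {"smoking"},
    "cancer": {"tar", "genotype"},
}

def causal_order(parents):
    """Return a topological order (causes before effects); raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError("graph is not acyclic") from exc

order = causal_order(parents)
```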
Formal model linking causal mechanisms and variables.
What would have happened under different conditions.
Probability of treatment assignment given covariates.
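Propensity scores are often used for inverse propensity weighting (IPW): each unit's outcome is weighted by 1/e(X) if treated and 1/(1 − e(X)) if not, which reweights the sample so that treatment looks randomly assigned with respect to X. In this sketch the propensities are known by construction in a hypothetical confounded simulation (true effect = 1); in practice they would be estimated, for example with a logistic regression.

```python
import random

def ipw_ate(data):
    """Inverse-propensity-weighted average treatment effect from (treated, outcome, propensity) triples."""
    n = len(data)
    return sum(t * y / e - (1 - t) * y / (1 - e) for t, y, e in data) / n

def simulate(n=20000, seed=0):
    """Hypothetical data: X raises both the chance of treatment and the outcome; true effect = 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        e = 0.8 if x else 0.2  # propensity P(T=1 | X), known here by construction
        t = 1 if rng.random() < e else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0, 0.5)
        rows.append((t, y, e))
    return rows

rows = simulate()
treated = [y for t, y, _ in rows if t == 1]
control = [y for t, y, _ in rows if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)  # confounded; expected value 2.2
ipw = ipw_ate(rows)  # reweighted; expected value 1.0
```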
A point whose value is no greater than at any nearby point, though not necessarily the lowest value overall.
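The distinction matters for gradient-based optimization, which can only find a minimum relative to nearby points. In this sketch, plain gradient descent on the hypothetical function f(x) = x⁴ − 3x² + x lands in different minima depending on the starting point; only one of them is the global minimum.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def df(x):
    return 4 * x**3 - 6 * x + 1  # derivative of f

def gradient_descent(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * df(x)
    return x

right = gradient_descent(2.0)   # settles in the local, non-global minimum near x = 1.13
left = gradient_descent(-2.0)   # settles in the global minimum near x = -1.30
```

Here f(left) ≈ −3.5 while f(right) ≈ −1.1, so the descent started at x = 2 stops at a point that is lower than all its neighbours but not the lowest overall.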
Control that remains stable under model uncertainty.
Motion analyzed in terms of the forces and masses that produce it.
Mathematical representation of friction forces.
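A common starting point is the Coulomb model with separate static and kinetic coefficients, optionally plus a viscous term proportional to velocity. This 1-D sketch assumes a fixed normal force and illustrative coefficients; the velocity threshold used to decide "sticking" is a hypothetical numerical choice, since exact zero velocity is rare in simulation.

```python
def friction_force(v, applied, normal, mu_s=0.5, mu_k=0.4, viscous=0.0, v_eps=1e-6):
    """1-D Coulomb friction (static and kinetic) plus an optional viscous term."""
    if abs(v) < v_eps:
        # Sticking: static friction cancels the applied force, up to mu_s * N.
        if abs(applied) <= mu_s * normal:
            return -applied
        # Breakaway: kinetic friction opposes the direction of the applied force.
        return -mu_k * normal * (1 if applied > 0 else -1)
    # Sliding: kinetic friction opposes velocity; viscous damping grows with speed.
    sign = 1 if v > 0 else -1
    return -mu_k * normal * sign - viscous * v
```

For example, with N = 10, a push of 3 at rest is fully resisted (friction −3), while a push of 8 exceeds the static limit of 5 and meets kinetic friction of −4.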
Systems whose failure can cause physical harm.
Mechanics of price formation.