
By Nov 12 2025, will there be a model that meets all of these criteria:
- >84.6% on the Artificial Analysis Quality Index, i.e. the average of benchmark scores on:
  - MMLU
  - GPQA
  - MATH
  - HumanEval
  - MGSM
 
- with no regressions on any individual benchmark 
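
To make the two accuracy conditions concrete, here is a minimal sketch of how they could be checked together. It assumes the index is a plain unweighted mean of the five scores and that "no regressions" means scoring at least as high as some reference model on every benchmark. The question text does not say which model is the reference or what its scores are, so the `baseline` numbers below are purely hypothetical placeholders, as are the function names.

```python
# Benchmarks named in the question; the index is assumed to be their unweighted mean.
BENCHMARKS = ("MMLU", "GPQA", "MATH", "HumanEval", "MGSM")
INDEX_THRESHOLD = 84.6  # threshold taken from the question text


def quality_index(scores):
    """Average of the five benchmark scores (in percent)."""
    return sum(scores[b] for b in BENCHMARKS) / len(BENCHMARKS)


def meets_criteria(candidate, baseline):
    """True if the candidate beats the index threshold and does not score
    below the baseline on any individual benchmark (ties are allowed here,
    which is an assumption)."""
    beats_index = quality_index(candidate) > INDEX_THRESHOLD
    no_regressions = all(candidate[b] >= baseline[b] for b in BENCHMARKS)
    return beats_index and no_regressions


# Hypothetical placeholder scores, NOT real benchmark results.
baseline = {"MMLU": 90, "GPQA": 70, "MATH": 85, "HumanEval": 92, "MGSM": 90}
better_everywhere = {b: s + 1 for b, s in baseline.items()}
# Higher average than the baseline, but one point lower on GPQA.
one_regression = {"MMLU": 95, "GPQA": 69, "MATH": 95, "HumanEval": 95, "MGSM": 95}

print(meets_criteria(better_everywhere, baseline))
print(meets_criteria(one_regression, baseline))
```

Under these assumptions the second candidate fails even though its average is higher, because a single lower score counts as a regression.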
 
Note:
- does not need to be an OpenAI model 
- open weights or free models will count as cheaper 
- quantized/distilled versions count, as long as they also beat the same accuracy thresholds 
@JoshYou imo we've only just started realizing algorithmic speedups; there still seems to be plenty of low-hanging fruit. In fact, if there /isn't/ a reasoning model that is faster than that (regardless of accuracy) by this time next year, I would be extremely surprised. Also, we can discuss whether speedups attributable to Blackwell hardware should count, i.e. should speed be measured with respect to current H100s?