By Nov 12 2025, will there be a model that meets all of these criteria:
>84.6% on the Artificial Analysis Quality Index
i.e. the average of benchmark scores on
MMLU
GPQA
MATH
HumanEval
MGSM
with no regressions on any individual benchmark
Note:
does not need to be an OpenAI model
open weights or free models will count as cheaper
quantized/distilled versions count, as long as they also beat the same accuracy thresholds
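For anyone wanting to check a candidate against the quality part of the criteria, here is a minimal sketch. It assumes the index is the unweighted mean of the five listed benchmarks, and that "no regressions" means each score is at least as high as some baseline model's score on that benchmark. The question text does not give baseline values, so `baseline` is a hypothetical input and the numbers in the usage example are placeholders, not real results.

```python
from statistics import mean

# The five benchmarks listed in the question.
BENCHMARKS = ("MMLU", "GPQA", "MATH", "HumanEval", "MGSM")

# Threshold from the question text: the index must be strictly above 84.6.
INDEX_THRESHOLD = 84.6


def meets_quality_criteria(candidate, baseline):
    """Return True if the candidate beats the index threshold with no regressions.

    Both arguments map benchmark name -> score in percent.
    `baseline` is an assumption of this sketch: the question does not
    state which per-benchmark scores regressions are measured against.
    """
    missing = [b for b in BENCHMARKS if b not in candidate or b not in baseline]
    if missing:
        raise KeyError(f"missing benchmark scores: {missing}")

    # Assumed definition: unweighted average of the five scores.
    index = mean(candidate[b] for b in BENCHMARKS)

    # No individual benchmark may fall below the baseline's score.
    no_regressions = all(candidate[b] >= baseline[b] for b in BENCHMARKS)

    return index > INDEX_THRESHOLD and no_regressions
```

Usage with placeholder numbers: a model scoring 95 on four benchmarks but 79 on MMLU against a baseline of 80 everywhere would fail, even though its average is above the threshold, because of the MMLU regression.

```python
baseline = {b: 80.0 for b in BENCHMARKS}      # placeholder values
candidate = {b: 95.0 for b in BENCHMARKS}
candidate["MMLU"] = 79.0                      # regression on one benchmark
print(meets_quality_criteria(candidate, baseline))
```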
@JoshYou imo we’ve only just started realizing algorithmic speedups; there still seems to be plenty of low-hanging fruit. In fact, if there /isn’t/ a reasoning model that is faster than that (regardless of accuracy) by this time next year, I would be extremely surprised. Also, whether speedups attributable to Blackwell hardware should count is something we can discuss, i.e. should speed be measured with respect to current H100s?