Benchmark Gap #8: Once a single AI gets >= 80% on FrontierMath Tier 4, how long until an AI publishes a math paper?


This question is meant to measure the gap between solving the main math-based benchmarks at the time of market creation and contributing to real-world mathematics.

FrontierMath Tier 4 is an even harder version of FrontierMath. Do we need something even harder to fully close the benchmark gap?

I will accept the AI being a (co-)first author, or an AI being credited with significant contributions to both deciding what to prove and the actual proof. Merely contributing to the proof is not enough: I am trying to get at "the AI does the work of a mathematician", not "the AI does the work of a proof assistant". I would also accept, for instance, the human author of the paper expressing that they would have named the AI as a co-first author if it were human, or saying that the result could not have been obtained without the assistance of the AI.

If a model publishes a paper before it achieves this score, I'll resolve to the 0 bucket.
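As a rough illustration of the rule above, here is a minimal sketch of how the gap could be computed. The function name `gap_in_days` and its arguments are hypothetical, and the market's actual answer buckets are not listed in this description, so the sketch only returns the raw gap in days and leaves the mapping to buckets open.

```python
from datetime import date
from typing import Optional


def gap_in_days(benchmark_date: Optional[date],
                paper_date: Optional[date]) -> Optional[int]:
    """Days from the >= 80% FrontierMath Tier 4 score to the first qualifying paper.

    benchmark_date: day a single AI first reached the score (None if not yet).
    paper_date: day the first qualifying paper was published (None if not yet).
    Returns None while no qualifying paper exists, since the gap is still open.
    """
    if paper_date is None:
        # No qualifying paper yet, so nothing can be resolved.
        return None
    if benchmark_date is None or paper_date <= benchmark_date:
        # The paper came before (or with) the score: this is the "0 bucket" case.
        return 0
    return (paper_date - benchmark_date).days
```

For example, with made-up dates, `gap_in_days(date(2026, 1, 1), date(2027, 1, 1))` gives a one-year gap, while swapping the two dates gives 0.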

  • Update 2025-07-16 (PST) (AI summary of creator comment): In response to user feedback, the creator has acknowledged that the resolution criterion "or saying that the result could not have been obtained without the assistance of the AI" may be interpreted differently than its literal meaning implies.


Hasn't an AI already been credited as an author (blocked by journal policy)? I get what you're trying to do here, but it feels like it needs a bit of work.

@WilliamGunn I'm not aware of any math paper where an AI has been a first author - do you have a link?

@vluzko Not first author, but I think there have been authors who wanted to make an AI a coauthor.

If you have a specific example in mind I can look at it and decide if I need to update the description, but I'm pretty sure no AI has actually been a first author or anything like it on a math paper.

@vluzko If what you care about is that an AI is first author, perhaps remove "... or saying that the result could not have been obtained without the assistance of the AI."

It's your market, do whatever you want, but I predict that phrase will cause trouble.

@WilliamGunn Yeah I can see how that will get interpreted differently than I mean it.

© Manifold Markets, Inc.