I (Tamay Besiroglu) bet the mathematician Daniel Litt that the best AI models in March 2030 would be capable of generating math papers in Number Theory at the quality level of papers published in the Annals of Mathematics today (i.e. in 2025). https://x.com/tamaybes/status/1899262088369106953?s=46
The AI would receive no detailed guidance relevant to the mathematics research and would be required to accomplish this task autonomously.
The AI system(s) are granted a total budget of $100k in inference compute per paper.
This bet would be resolved on the basis of Daniel Litt’s judgment.
Update 2025-03-21 (PST) (AI summary of creator comment): Novel Research Requirement Clarification:
For a YES resolution, the AI must perform novel research autonomously, not just produce a paper that could pass as research.
Update 2025-03-23 (PST) (AI summary of creator comment): Budget Currency: The $100k inference compute budget is expressed in nominal dollars (current currency) with no inflation adjustment.
Update 2025-05-17 (PST) (AI summary of creator comment): The creator endorsed an interpretation (via a previously posted ChatGPT response to a user's question) regarding the market's resolution. This endorsement suggests:
The market generally requires demonstrating repeatable capability in generating Annals-quality math papers.
However, a single, exceptionally significant autonomous achievement by an AI (such as proving the Riemann hypothesis) before 2030 would also be considered sufficient for a YES resolution.
I think AlphaEvolve is positive news for this market.
I think one possible way of producing a qualifying result is to first find (in a similar fashion to AlphaEvolve) some construction that was previously considered likely impossible, and then write a paper about the implications of this new construction (not sufficient by itself, as it is likely too mechanistic and not very insightful) for various things.
I am not a mathematician, so I am not sure whether my idea makes sense. But in any case, I feel like AlphaEvolve is progress towards YES.
@qumeric It is definitely progress, but the approach feels fundamentally expensive. 2030 is a long way away, though.
@TamayBesiroglu Is this bet about a single occurrence, i.e. does a single published paper resolve this market YES, or does the capability need to be repeatable (and have been shown to be repeatable)? I suppose it's the latter, since you wrote 'capable of generating math papers' (plural).
@CalibratedNeutral I understand it to be about capabilities, so repeatable. The bet is meant to be a proxy for “do the best models have an absolute advantage over top mathematicians in doing math research?” That said, if an AI autonomously proves the Riemann hypothesis or something before 2030, I think you should expect the market to resolve YES.
@LocalGlobal This feels to me a bit like "Those sea lions that were trained to build motorcycles a few years back can source parts and draft designs, sure, but their welding technique is still terrible!"
Without a deeper model of what drives AI progress, the phrase "currently models cannot" is just not predictive of anything.
@LocalGlobal https://sakana.ai/ai-scientist-first-publication/ Sakana AI recently produced a paper that passed peer review for an ICLR workshop.
@jgyou I agree with this. The paper should not have passed the first round of peer review. I was most impressed by the fact that it could autonomously construct such a paper, graphics and all. This is another case where the exaggerated claim serves as a good prediction of near-future capabilities.
5 years is a very long time, even if the exponential improvement we have consistently seen for the past 6 years doesn't hold the whole way through.
@Haiku I think these responses might just be overestimating the standards that are expected of a workshop paper.
@SimoneRomeo Five years ago was 2020, the year GPT-3 came out. We had models writing poems in a given style in 2016, I believe.
@mathvc If it were the beginning of 2020, right before the release of GPT-3, would you have thought that five years later a model would be able to write this:
https://chatgpt.com/share/680b6c9d-d638-8012-bc32-0cf36cae46eb
@SimoneRomeo I would not be surprised, no. Most of the text is essentially copied from the training data and the rest is pure bluffing. For example, part 4 of the proof sketches is nonsense: there is by definition no Siegel zero in a fixed finite range of moduli (see the sketch at the end of this comment). There is certainly much more nonsense, but I don't want to bother checking. I have seen much more impressive LLM output, tbh.
Also, it does not even look like a math paper: the introduction, background, main results, future work layout is quite common in the experimental sciences but is never used in (pure) mathematics. There are also no actual proofs, which are the actual content of any math paper. If this had been written by a human, that alone would be enough to conclude it is obvious crankery. This is almost certainly fixable with better prompting and/or scale, but it shows some issues.
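For context, a rough sketch of why the Siegel-zero claim holds by definition (standard definition, my paraphrase):
% A Siegel (exceptional) zero is a real zero \beta of a Dirichlet
% L-function L(s, \chi), for a real primitive character \chi mod q, with
\[
  \beta > 1 - \frac{c}{\log q}
\]
% for some small absolute constant c > 0. Because c is adjustable, for
% any fixed finite range of moduli q \le Q one can take c small enough
% that no zero satisfies the inequality, so the notion only has content
% asymptotically as q \to \infty.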
The Sakana example actually increased my odds, since it wrote a coherent multi-page technical text, which is the first time I've seen something like this.
I stand by my belief that 80% is crazy high. I guess the people betting like this think a singularity is very likely.