Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2027?
Ṁ175 · 2027 · 76% chance

Background

Humanity's Last Exam (HLE) is a benchmark designed to evaluate AI systems' reasoning and problem-solving capabilities across a wide range of academic disciplines, including mathematics, humanities, and natural sciences. Developed collaboratively by the Center for AI Safety and Scale AI, HLE comprises 3,000 unambiguous and verifiable academic questions contributed by nearly 1,000 subject-matter experts from over 500 institutions across 50 countries. The dataset is multimodal, with approximately 10% of the questions requiring both image and text comprehension, while the remaining 90% are text-based.
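The stated split implies a concrete question count. A minimal sketch of the arithmetic (the variable names are illustrative, not taken from the HLE release):

```python
# Approximate composition of the HLE dataset as described above.
TOTAL_QUESTIONS = 3000
MULTIMODAL_FRACTION = 0.10  # questions requiring both image and text comprehension

multimodal = round(TOTAL_QUESTIONS * MULTIMODAL_FRACTION)  # about 300 questions
text_only = TOTAL_QUESTIONS - multimodal                   # about 2,700 questions

print(multimodal, text_only)
```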

As of early 2025, state-of-the-art AI models have demonstrated limited success on the HLE benchmark. For instance, OpenAI's o3-mini (high) model achieved an accuracy of 13% when evaluated solely on text-based questions. OpenAI's Deep Research agent, which leverages the o3 model for extensive web browsing and data analysis, reached an accuracy of 26.6% on the HLE benchmark.

Resolution Criteria

This question resolves to YES if a fully automated AI system achieves an average accuracy score of 85% or higher on the Humanity's Last Exam benchmark before January 1, 2027.

• Verification: The score must be verified by credible sources such as peer-reviewed research papers, arXiv preprints, or independent evaluations from reputable AI research institutions.

• Autonomy: The AI must solve problems without any human intervention, external assistance, or reliance on pre-existing solution datasets.

• Compute Resources: There is no limitation on computational resources; AI systems may use unlimited resources to attempt solutions.
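Taken together, the criteria reduce to comparing a verified average accuracy against the 85% threshold before the deadline. A minimal sketch of that check (the function and its parameters are hypothetical, not part of any official resolution tooling):

```python
from datetime import date

THRESHOLD = 0.85          # minimum average accuracy on HLE
DEADLINE = date(2027, 1, 1)  # score must be achieved before this date

def resolves_yes(reported_accuracy: float, verification_date: date) -> bool:
    """YES iff a verified average accuracy of at least 85% is
    achieved strictly before January 1, 2027."""
    return reported_accuracy >= THRESHOLD and verification_date < DEADLINE

# Example: the early-2025 Deep Research score would not resolve YES.
print(resolves_yes(0.266, date(2025, 2, 1)))  # False: below the 85% threshold
```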
