Humanity’s Last Exam lists grok 4 at 45%+?
160
Ṁ130k
2 hours ago
3% chance

This market resolves on the first HLE score for grok 4 reasoning (or, if multiple are released at once, the highest), unless that score appears a month or more after the grok 4 release. Will it show up on

https://scale.com/leaderboard/humanitys_last_exam

as a score of 45% or more, after rounding, for the reasoning version of grok 4?

  • Update 2025-07-06 (PST) (AI summary of creator comment): This market is about the model currently expected to be called grok 4, not strictly any model with that specific name.

  • Update 2025-07-11 (PST) (AI summary of creator comment): If the reasoning score reported on the linked website includes tool use, it will count for this market's resolution.
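The resolution rule above can be sketched as a small check. This is only an illustration: the market says "after rounding" without naming a convention, so rounding to the nearest whole percent with halves rounded up is an assumption, and `resolves_yes` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

THRESHOLD = 45  # percent, from the market description


def resolves_yes(reported_score: str) -> bool:
    """Return True if the reported HLE score rounds to 45% or more.

    Assumption: rounding is to the nearest whole percent, halves up.
    The score is passed as a string to avoid float rounding surprises.
    """
    rounded = Decimal(reported_score).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return rounded >= THRESHOLD


if __name__ == "__main__":
    for score in ("44.4", "44.5", "50.7"):
        print(score, resolves_yes(score))
```

Under this assumed convention, 44.4 falls just short of the threshold while 44.5 would count.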

bought Ṁ3 YES

As far as I understand, this market may resolve YES or NO depending on whether HLE decides to allow tool use. That sounds like a coin flip, so why are the odds so low?

It is known from the grok 4 livestream graph that 2.5 pro scores 25%+ with tool use. It is not presently on the leaderboard. Simple inference.

it might end up wrong but coinflip seems more wrong

@SimoneRomeo even with coin flip, the number given was 44.4% on their slides. The 50.1% is a little misleading, and not actually the final value, I think. I guess there's like a 20% chance that the tool use version somehow qualifies, within that a 50% chance that the heavy version actually gets rated this month, and within that a 20% chance that by some luck, HLE calculates the final value as 45% rather than 44%. So... about 2% all together, maybe.
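The chained estimate in the comment above multiplies three conditional guesses. A minimal sketch of that arithmetic, where the three probabilities are the commenter's own rough figures rather than data and the variable names are made up for illustration:

```python
# Rough guesses quoted in the comment above (not measured values).
p_tool_use_qualifies = 0.20      # the tool-use version somehow qualifies
p_heavy_rated_this_month = 0.50  # given that, Heavy is rated within the month
p_scored_45_not_44 = 0.20        # given that, HLE reports 45% rather than 44%

# Each guess is conditional on the previous one, so the chance of YES
# under these assumptions is their product.
p_yes = p_tool_use_qualifies * p_heavy_rated_this_month * p_scored_45_not_44

print(f"{p_yes:.0%}")
```

The product works out to 0.02, which matches the "about 2%" in the comment.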

Does this include tool use or not?

@KJW_01294 it's whatever they decide to put on their website at the link provided

I suspect the website doesn't allow tools but if it does, it counts!

bought Ṁ50 YES

@Trazyn grok 4 reasoning (or if multiple are released at once, the highest),

@SCS this one is base, not Heavy, and probably lines up with Grok's estimate of 25% with no tools. Ultimately this makes me slightly optimistic because it suggests there's some variability and perhaps when HLE tests it independently, they might get 45% instead of 44% for Heavy with tools

lmao this is cursed

@bens @jim congrats hahahahaah

@bens wow

@jim I feel like my live trading was rational, tbh, but kudos

@bens I'm mad

@bens feckin brutal!!!

bought Ṁ50 YES
opened a Ṁ5,000 NO at 40% order
opened a Ṁ11,000 NO at 45% order

@bens 11% seems low but IDK

opened a Ṁ7,000 NO at 43% order

@jim I feel like I'm being adversely selected on this market, but idk

No idea what this is. Rooting for Jim because he’s in Platinum.

@Panfilo I don't even know what's going on, tbh

@bens Would love to read a brief postmortem here or on discord!

bought Ṁ750 YES

Grok 4 Heavy 50.7% HLE

bought Ṁ50 NO at 4%
© Manifold Markets, Inc.