Will an LLM be able to solve confusing but elementary geometric reasoning problems in 2024? (strict LLM version)
Ṁ23k · Jan 1 · 25% chance

This is a variant of the following market:

https://manifold.markets/dreev/will-an-llm-be-able-to-solve-confus

In this version, the problem has to be solved purely by the LLM itself.

Possible clarification from creator (AI generated):

  • Market will be resolved based on whether ChatGPT-4 (or equivalent) can solve confusing but elementary geometric reasoning problems purely by itself

  • Creator will make the judgment call on whether solutions that involve ChatGPT-4 count as being solved 'purely by the LLM itself'

sold Ṁ768 NO

I've just sold my position in this market. Hopefully that means I can now just make the judgment call on whether, assuming ChatGPT o1 can do this, that counts as being solved "purely by the LLM itself". Can I hear people's arguments for and against?

Sanity check: if GPT-o1 were to pull this off in time, that still counts as a strict LLM, right?

@dreev i think so

bought Ṁ250 NO

@dreev this will be increasingly challenging as more and more models are integrated into a single system, which is in part why I don't bet much on the "Will LLMs do [task] by [future year]?" markets, but yeah, I think it's reasonable to call GPT-o1 an LLM for the rest of 2024.

@Jacy Thank you, that makes a ton of sense. I shall avoid trying to single out LLMs in the future and hope that this one won't turn out too painful to adjudicate over the remaining 3 months of 2024. If anyone has any counterpoints about GPT-o1, chime in! (Not that it matters yet, with GPT-o1 still failing our flagship geometric reasoning problem, but it does seem to be getting closer... 😬)

bought Ṁ100 YES

@dreev Bet up on this one, since if o1 counts as an LLM, I think this also resolves YES?

@JohnCarpenter Oh, yeah, great question. Does o1 count as purely an LLM?

@dreev /shrug idk what the word 'pure' means here. It's one model that queries itself a lot of times in a row.

@JohnCarpenter Yeah this is very nonobvious to me. There's presumably additional code for constructing the chain-of-reasoning prompts.
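For concreteness, here's a minimal sketch (Python, with a hypothetical `llm()` stand-in; o1's actual internals aren't public, so every name here is illustrative) of the kind of scaffolding being debated: ordinary code outside the model assembling each chain-of-reasoning prompt and querying the same model repeatedly.

```python
# Hypothetical sketch of self-querying scaffolding around a single LLM.
# Not o1's real implementation -- just an illustration of where "extra code"
# could sit outside the model weights.

def llm(prompt: str) -> str:
    """Stand-in for one forward call to a single fixed LLM."""
    return "…model output…"  # placeholder; a real system would call the model here

def solve_with_self_queries(problem: str, max_steps: int = 5) -> str:
    """Query the same model repeatedly, feeding each reply back into the next prompt."""
    transcript = f"Problem: {problem}\n"
    for step in range(max_steps):
        # This prompt construction is plain code outside the model --
        # the part that makes "purely an LLM" a judgment call.
        prompt = transcript + f"Step {step + 1}: think out loud, then say DONE if finished.\n"
        reply = llm(prompt)
        transcript += reply + "\n"
        if "DONE" in reply:
            break
    # One last call to produce the final answer from the accumulated reasoning.
    return llm(transcript + "Final answer:")

if __name__ == "__main__":
    print(solve_with_self_queries("A confusing but elementary geometry problem."))
```

Whether that outer loop disqualifies the system from counting as "purely an LLM" is exactly the judgment call being discussed above.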
