Will I think that the top Chatbot Arena scores accurately reflect which LLMs are most capable and useful at EOY 2024? | Manifold

Jan 1

41% chance

Many markets on capabilities are being resolved via Chatbot Arena, but I think that Chatbot Arena scores might not be a very good measure of capabilities. See https://manifold.markets/DanielKokotajlo/gpt4-or-better-model-available-for#Cd09AsavvH6UVdLod4N6 for some discussion.

Another reason why Chatbot Arena could fail is that as models get more powerful, chat use cases become less representative of what the models can do. At a minimum, a large fraction of current chat use cases are very easy, so performance on them saturates.

Note that this market only concerns whether I think the top few (e.g. top 10) Chatbot Arena scores reflect actual capabilities reasonably well, not whether Chatbot Arena can be gamed. So if (e.g.) the best chatbots don't game Chatbot Arena (even if they could), then the scores could still be sufficiently representative.

I'm open to comments trying to convince me either way, but I don't promise to keep up with this market.

This question is managed and resolved by Manifold.

do you currently?

@jacksonpolack Yes, I think current scores track reasonably well, but not amazingly well. So I would resolve to yes if the market closed today.

@RyanGreenblatt do you still hold the same opinion?

@Soli Seems more dismal every day lol. Note that this market was created before style control was introduced and just reflects the pre-style-control results.
