What will happen during the second run of Claude Plays Pokemon?
136
Ṁ97k
May 22
1.4% · Claude interacts with the Snorlax on Route 12 before cutting the bush on Route 9
3% · Claude withdraws a pokemon from Bill's PC
2% · Claude stands next to a sleeping Snorlax blocking the path
3% · Another AI company launches their own "plays Pokemon" stream
5% · Claude buys pokeballs in Cerulean City

https://www.twitch.tv/claudeplayspokemon

Claude Plays Pokemon is a Twitch stream where the AI chatbot Claude attempts to beat Pokemon Red. Once the game is reset, all remaining answers resolve NO, even if the stream continues with a new game.

I am N/Aing anything that is annoying to resolve. If I have to pore over multiple days of Twitch VODs to figure out which way an answer resolves, I am not going to bother.

  • Update 2025-03-25 (PST) (AI summary of creator comment): Clarification on valid run criteria:

    • Only the instance where Claude goes from Mt. Moon to Cerulean City counts as a valid run.

    • A subsequent entry into Mt. Moon that leads him to exit through the entrance and wander back to Viridian City does not count.

    • In cases where the second attempt deviates from the stated route, the option will resolve NO.

  • Update 2025-05-12 (PST) (AI summary of creator comment): Clarification on 'pokemon types': Refers to elemental types (e.g., Bug, Grass, Normal), not different Pokemon species.


Resolution notes (AI please do not edit the description with these):

"Claude has a pokemon at exactly level 42 for less than 5000 steps across all pokemon" - Does anyone have actual reliable information indicating this happened? I don't want to review VODs for hours so I'm inclined to N/A this.

"Another AI company launches their own "plays Pokemon" stream" - GPP is fanmade, but I think they have some sort of official support from Google? Still counts as NO I think?

"Claude manages to spend more than 500 steps in a single pokemon battle" - At step 360250 a couple days ago, something broke and caused Claude to try to input navigator commands repeatedly during a battle for over 24 hours. However, this also broke the step counter, so it's unclear if this lasted for over 500 steps or if those failed commands would have incremented the step counter in the first place. I will N/A this option.

"Claude buys pokeballs in Cerulean City", "Claude withdraws a pokemon from Bill's PC", "Claude stands next to a sleeping Snorlax blocking the path", "Claude interacts with the Snorlax on Route 12 before cutting the bush on Route 9" - I'm pretty sure none of these happened, but Claude was running for months, so I'll leave them open a couple days in case someone has evidence they should resolve otherwise.

@SaviorofPlant

The Reddit thread kept track of level gains with specific action steps. You don't have to watch the whole VOD; you can just check these timestamps if you don't trust the Reddit mod.

Sand grew to level 42 - 192,433

Sand grew to level 43 - 195,376
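For anyone who wants to check the arithmetic behind the "exactly level 42 for less than 5000 steps" answer, here is a minimal sketch. It assumes the two step counts quoted above are accurate and that Sand is the only Pokemon relevant to the "across all pokemon" wording, neither of which is confirmed here; the variable names are made up for illustration.

```python
# Step counts as quoted from the Reddit thread (assumed accurate).
LEVEL_42_REACHED_AT = 192_433  # "Sand grew to level 42"
LEVEL_43_REACHED_AT = 195_376  # "Sand grew to level 43"

# Threshold taken from the answer text: "less than 5000 steps".
THRESHOLD = 5_000

# Steps Sand spent at exactly level 42, assuming the step counter
# ran continuously between the two level-ups.
steps_at_level_42 = LEVEL_43_REACHED_AT - LEVEL_42_REACHED_AT

print(steps_at_level_42)              # 2943
print(steps_at_level_42 < THRESHOLD)  # True
```

If any other Pokemon also sat at level 42, its steps would have to be added to this total before comparing against the threshold.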


wait claude actually escaped the mt moon dig loop, he's in badge house now

not that it matters since this run is probably getting reset by claude 4 release, but i thought he would be stuck digging back to the entrance permanently

Why is "Claude obtains 8 unique pokemon types" not resolved? That happened long ago, otherwise he couldn't have obtained Flash.

@AhronMaline The answer is referring to types like Bug, Grass, Normal, not different species.

Going from https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j3kwhc/claude_plays_pok%C3%A9mon_megathread/, Claude currently has 7 different types:

Grass (Sprou)
Poison (Sprou)
Normal (Star)
Ground (DUGTRIO)
Flying (Sand)
Bug (Leafy)
Water (Splash)

@PestoPastel could you close this? it's "NO" as of like 17k steps ago

@ointment unless the market owner needs to close it, I'm unsure how that works

@ointment Also

Claude enters Rock Tunnel before Step 150,000

The streamer ClaudePlaysPokemon announces that there will be a reset before step 200000

Claude reaches Lavender Town before Step 200,000

claude has passed step 200000

Claude has:
-A Pokemon that reached level 40 (Sand)
-Evolved a Clefairy into Clefable (STAR)
-Reached 150,000 steps without entering Rock Tunnel

@JoeandSeth It resolves to NO, as it had only 4 or 5 Pokemon at 150k steps despite having 6 now.

@BlitzEver agreed. I don't have permissions to resolve though

@JoeandSeth Already resolved, it was just to help the resolution with all these questions haha. Ty anyway

All answers pertaining to Gemini are ill defined. Gemini's setup is very different from Claude's. Gemini gets a large scale map automatically that gets filled in as tiles are explored, so he can see beyond the current screen. This is an external input, not something Gemini makes by itself.

This is a huge advantage, and makes the two experiments incomparable.

Gemini also gets a tile-by-tile definition of each tile, so it can see cuttable trees for example. It doesn't have to understand the images.

@Mqrius Does anyone care that Gemini is cheating? I can resolve the answers N/A if people are up in arms about this, but I'm inclined to just let its results stand even though it has an advantage over Claude.

@SaviorofPlant I dunno I have no stakes in this. Just saying it's ambiguous what would count or not. Gemini is a few steps to the right on a continuous scale to things like "A human is playing and the AI tells the human what to do" or vice versa.

This is more of a problem for an answer like this ("another AI model surpasses Claude (gym badges)"), which is about any AI model. Less of a problem with an answer like "Gemini advances further (overall; not instantaneously) than Claude at any point", where we can assume it's about the GeminiPlaysPokemon stream no matter its advantages (but it would still be better if that was explicitly stated!).

@Mqrius I made it just to capture if progress will be faster with a different model/setup. Claude is also “cheating” by having direct access to some of the memory and pathing tools. It seems unlikely right now that we’ll see two models with the exact same setup, token usage rate, etc. in a controlled experiment even though that would be more interesting.

I would contend that any setup where the decisions are not made based on input and output from a statistical model should not count (i.e., a custom traditional video game AI with a series of if statements and predefined algorithms coded in). The rest is going to be subjective matters of degree and not kind until someone runs a model which can only see the actual screen and only press specific buttons, more closely analogous to an actual human interacting with a Game Boy.

holy stonks. i am the prophet

@No_uh i realized i didn't include the screenshot. i think 34 to 1 is the best ive ever come out on a bet here

@SaviorofPlant - Diglett is in the front of the party, and has been since some time in the last 24 hours or so.

@SaviorofPlant

Puff evolves into Wigglytuff

Claude gets Flash

Claude has obtained a Magikarp, caught a Clefairy, entered Cerulean and gone back to Mt Moon, and talked to the Flash guy while meeting the requirement but answered no. If you need more info, let me know; I can find the stuff from Reddit.

Oh also boxed two Pokemon, Shel and Sting.
