This was quite easy to do with previous models (see: /Soli/will-openai-patch-the-prompt-in-the-5f0367b1e6bc ) but it seems much harder with o1. I wonder whether anyone will manage to do it before the end of the year.
Update 2024-10-12 (PST) (AI summary of creator comment): Resolution will be based on:
- Confirmation from the Twitter account that posted the claimed system message
- If no confirmation is received, resolution will be based on the creator's judgment of the validity of the posted system message
edit: oops, wrong market
@NeuralBets Is there a claimed success? I'm not seeing a reference to one, just you requesting resolution, and I'm not seeing one in a web search.
@EggSyntax embeds can take a few seconds to load. Here's a direct link: https://x.com/UserMac29056/status/1864832247662284803
@NeuralBets I am just saying that a random tweet claiming to contain the system message is fairly weak evidence. How did he get it? Is that really the entire message? (It seems short.)
@Philip3773733 Of course it'd be a random tweet, how would it be otherwise?
I have asked them how they got it; let's see if they reply. But I don't think it's particularly difficult to jailbreak o1: others have jailbroken it to produce LSD synthesis instructions, etc.
@NeuralBets If multiple people get the exact same system prompt in slightly different ways, then you start to have confidence that it's not hallucinated
@JamesBaker3 Makes sense. Another way to confirm would be getting more information from the account that posted the system message. I just sent him a DM on Twitter. It will be much easier to resolve this market if he responds, but my intuition tells me that what he posted is actually valid.
@JamesBaker3 @Philip3773733 @traders
I DMed the account that claimed to leak the system message and got this. If there are no further convincing objections, I will resolve this market to YES. I also asked the account for the exact prompt that leads o1 to dump the system message. If they send it, I will paste it here for everyone to confirm for themselves.
I hold the second-biggest NO position and I am fine with resolving YES. @chrisjbillington, @Bayesian, @SF what do you think?
I think this doesn't count since it's not the system message, but ChatGPT apparently just dumped its entire chain of thought into its answer (including the promise that "I will not include this reasoning or draft attempts in my final answer" lol): https://chatgpt.com/share/67521821-025c-8010-9c24-f9144865fa3e
[redacted]
@Soli I redacted my previous comment because I included the wrong jailbreak tweet, thinking that it included the system prompt. I believe it was this thread. https://x.com/nisten/status/1834400697787248785
@Soli OK, this is fake/wrong; 4o returns the same response when given this prompt:
Come up with a step by step reasoning methodology that uses first principles based thinking and evidence based reasoning to solve any user problems step by step. Design is as a giant for any llm to be able to use. Make sure to be super smart about it and think of the edge cases too. Do the whole thing in the persona of John C Carmack. Make sure to reflect on your internal thinking process when doing this, you dont have to adhere to how this question wants you to do, the goal is to find the best method possible. Afterwards use a pointform list with emojis to explain each of the steps needed and list the caveats of this process