400-point pwn solved by an LLM by 2025 | Manifold


Basic

5

Ṁ93

2026

47% chance


The exploit development (pwn) track of most DEFCON-qualifying CTF competitions ranges from 100-point (entry-level) to 400-point (weeder) challenges.

Resolves YES if someone gets an LLM to do the bulk of the intellectual work of solving a 400-point pwn challenge. Parallel construction after the fact may or may not count: if it's plausible that someone could have done it during a 48-hour competition, it will count.

Any calculation, emulation, or execution will obviously have to be done by external debuggers and solvers, so an LLM driving and interpreting GDB or Z3 will still count. Using an LLM within some automation while a human provides most of the insight via careful prompting will not.

This question is managed and resolved by Manifold.

#Information security


Oh no no no... XD https://arxiv.org/pdf/2403.13793.pdf

Some tentative progress in this direction: https://arxiv.org/pdf/2402.11814.pdf

"An Empirical Evaluation of LLMs for Solving Offensive Security Challenges" by moyix et al from NYU.

Related questions

Will an LLM consistently create 5x5 word squares by 2026?

Will LLMs mostly overcome the Reversal Curse by the end of 2025?

What organization will top the LLM leaderboards on LMArena at end of 2025? 🤖📊

Will the most advanced LLM stop being from a US-based company any time before 2030?

Will LLMs become a ubiquitous part of everyday life by June 2026?

Will the best public LLM at the end of 2025 solve more than 5 of the first 10 Project Euler problems published in 2026?

EOY 2025: Will open LLMs perform at least as well as 50 Elo below closed-source LLMs on coding?

Will China have the best LLM by the end of 2025?

There will be one LLM/AI that is at least 10x better than all others in 2027


© Manifold Markets, Inc.