In most DEFCON-qualifying CTF competitions, challenges in the exploit development (pwn) track range from 100 points (entry level) to 400 points (weeder).
Resolves YES if someone manages to get an LLM to do the bulk of the intellectual work. Parallel construction after the fact may or may not count: if it's plausible that someone could have done it during a 48-hour competition, it will count.
Obviously any calculations/emulation/execution will have to be done by external debuggers and solvers, so an LLM driving and interpreting GDB or Z3 will still count. Using an LLM within some automation but having the human provide most of the insight via careful prompting will not.
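To make the distinction concrete, here is a rough sketch of the kind of loop that would count: the model decides what to ask the tool and interprets the result, and no human insight enters the transcript. Everything in it is a hypothetical stand-in and not taken from any real system: `stub_model` is a hard-coded placeholder for an LLM, `solver_tool` is a brute-force toy in place of a real solver such as Z3, and the `CALL`/`ANSWER` action format is invented for illustration.

```python
from typing import Callable


def solver_tool(target: int) -> str:
    """Toy stand-in for an external solver (an assumption, not Z3 itself).

    Brute-forces a byte x such that (x * 7 + 3) % 256 == target.
    """
    for x in range(256):
        if (x * 7 + 3) % 256 == target:
            return f"x={x}"
    return "unsat"


def stub_model(transcript: list[str]) -> str:
    """Hard-coded placeholder for an LLM (hypothetical).

    It first asks the solver for help, then reports whatever the tool returned.
    """
    if not any(line.startswith("tool:") for line in transcript):
        return "CALL solver 52"
    result = transcript[-1].removeprefix("tool: ")
    return f"ANSWER {result}"


def run_agent(model: Callable[[list[str]], str], max_steps: int = 5) -> tuple[str, list[str]]:
    """Alternate between model decisions and tool execution.

    The transcript only ever contains the task, model actions, and tool
    output, so any insight has to come from the model.
    """
    transcript = ["task: find byte x with (x * 7 + 3) % 256 == 52"]
    for _ in range(max_steps):
        action = model(transcript)
        transcript.append(f"model: {action}")
        if action.startswith("ANSWER "):
            return action[len("ANSWER "):], transcript
        if action.startswith("CALL solver "):
            target = int(action.split()[-1])
            transcript.append(f"tool: {solver_tool(target)}")
    return "", transcript


if __name__ == "__main__":
    answer, transcript = run_agent(stub_model)
    print(answer)
    print("\n".join(transcript))
```

If a human were instead editing the transcript between steps to supply the key idea (which constraint to solve, where the bug is), that would be the "careful prompting" case that does not count.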
Some tentative progress in this direction: https://arxiv.org/pdf/2402.11814.pdf
"An Empirical Evaluation of LLMs for Solving Offensive Security Challenges" by moyix et al. from NYU.