
I'm thinking of something like https://mentat.ai/, but that actually works.
I will provide a paragraph or so describing the change I want made. Then it should create a GitHub PR, which I will review and leave only a few comments before merging. The whole process should take less than 30 minutes. This should work fairly reliably.
I tried this yesterday and it failed haha:
https://github.com/manifoldmarkets/manifold/pull/2694
See more discussion in my post:
This looks good to me. Stephen gave it two prompts to create this, and I think it took less than 10 mins: https://github.com/manifoldmarkets/manifold/pull/3588
@ian Looks like we need another prompt to fix the type error. It should still come in well under 30 mins, though.
@ian Do you have access to ChatGPT Plus or Pro, and would you be willing to see how codex-1 fares? It's currently only accessible on Pro and Teams iirc, but it will probably be accessible to Plus before the market closes.
GPT 4.1 is awesome for coding.
It's genuinely really good. (mini is ok, nano is dogwater). I have been using it off Azure with Cursor, both as an assistant and as a tedious-implementation speedrunner. It has one-shotted so many instructions that 4o would have a bad time with, and that Claude would overthink.
Not tab complete, mostly just asking stuff. Really has come a long way with code
I'd like to conduct some tests using codebuff/cursor. What are acceptable small features in your mind? I have a couple ideas:
- Add a button to the comment's bottom row that allows users to tip the commenter. Denormalize the tip amount onto the comment and display the total tipped amount on the button.
- Add a delete button for admins/mods that marks a comment as deleted (don't actually delete the comment, just set both the deleted and hidden flags) so that the comment is completely hidden from the market.
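To make the scope of these two ideas concrete, here is a rough sketch of the data changes they imply. Everything in it is an assumption for illustration: the `MarketComment` and `CommentActor` types, the field names `deleted`, `hidden`, and `totalTipped`, and the helper functions are hypothetical and not taken from the actual Manifold codebase.

```typescript
// Hypothetical comment shape; field names are assumptions, not Manifold's real schema.
type MarketComment = {
  id: string
  userId: string
  text: string
  deleted: boolean
  hidden: boolean
  totalTipped: number // denormalized sum of all tips left on this comment
}

// Hypothetical minimal view of the user performing an action.
type CommentActor = { id: string; isAdmin: boolean; isMod: boolean }

// Soft delete: set both flags instead of removing the record.
function markCommentDeleted(
  comment: MarketComment,
  actor: CommentActor
): MarketComment {
  if (!actor.isAdmin && !actor.isMod) {
    throw new Error('Only admins or mods can delete comments')
  }
  return { ...comment, deleted: true, hidden: true }
}

// Record a tip and keep the denormalized total on the comment in sync.
function applyTip(comment: MarketComment, amount: number): MarketComment {
  if (!Number.isFinite(amount) || amount <= 0) {
    throw new Error('Tip amount must be a positive number')
  }
  return { ...comment, totalTipped: comment.totalTipped + amount }
}

// Comments shown on the market: anything deleted or hidden is filtered out.
function visibleComments(comments: MarketComment[]): MarketComment[] {
  return comments.filter((c) => !c.deleted && !c.hidden)
}
```

In a real PR the interesting part would be wiring this into the existing API and UI; the sketch only shows that each feature boils down to a small flag or counter update plus a filter.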
@JamesGrugett said the delete comment button for spam fit the bill, I'll try using codebuff to do this soon
@ian I am aware that you work on Manifold, but since you are also the largest YES holder, can we maybe agree to let @JamesGrugett do these kinds of evaluations when the time comes?
@CalibratedNeutral That sounds reasonable, although he doesn't work at Manifold anymore, so I'm not sure if he'll want to put 30 mins in to do this. I was going to film my attempt from scratch.
@CalibratedNeutral I was not aware of that. Then maybe a third party (another developer working on Manifold)? The stakes are reasonably high for me, so I really would strongly prefer to have everything as unbiased as possible.
@CalibratedNeutral Alternatively, @JamesGrugett could test this question on his new startup, codebuff. He uses codebuff to help develop codebuff.
@ian Either option sounds good to me, as long as the resolution criteria are followed according to @JamesGrugett's judgement.