[Carlini questions] Most improvements in best AI systems direct result of the prior generation of AI systems
By Jan 1st 2030: 66%
By Jan 1st 2027: 41%

Resolution Criteria:

Today, AI systems improve from one generation to the next mainly because we spend more money training a larger model, and because we as humans have learned more about how to train better models. This question asks whether the main reason future AI systems will be better is that we used the previous generation of AI systems to help build the next generation. For example: if we're using the prior generation to design better model architectures, write faster training code, or otherwise discover new ideas, this counts. If we're using the prior generation to curate training data, synthetically generate more data, or use some form of reinforcement learning, this counts. (In both cases, this holds only as long as the improvement from doing so is significant.) But if the main reason the next generation is better is that we spent more money training it on faster chips, this doesn't count.

Motivation and Context:

Some processes lead to recursive improvement, others do not. For example: if you're going to build faster computer chips, it really helps to be able to use the previous generation of computer chips to build the next set. This gives you Moore's Law. Or, if you're trying to design high-precision machinery, you need "mostly high-precision machinery" to build the "more high-precision machinery". But for other processes this is not the case. If you're making dinner plates, then your ability to make dinner plates is not significantly improved by having a bunch of other dinner plates. It's constrained by your plate-making machines. Which will be the case for AI systems? Will they be primarily improved because we've spent more money and thrown more data at them? Or because we've used the prior generation of AI systems to help us build the next generation?

Question copied from: https://nicholas.carlini.com/writing/2024/forecasting-ai-future.html


"If we're using the prior generation to curate training data, synthetically generate more data, or use some form or reinforcement learning, this counts."

Synthetic training data has been used to train LLMs for at least several years at this point. (Mostly to "generate more data".)

Here's a random paper about it. https://arxiv.org/abs/2410.15226

It's not the best or earliest resource on the topic, but if you read through the beginning, you'll see from the references to other works alone how commonly this approach is used.
