Will Anthropic announce one of their AI systems is ASL-3 before the end of 2025?

Basic

Ṁ1845

2026

89%

chance

ALL

“announce” means Anthropic or its leadership put out public messaging that clearly, credibly, and without hedging, asserts one of their AI systems is ASL-3

“ASL-3” refers to Anthropic’s own Responsible Scaling Policy, which describes AI Safety Level 3 (ASL-3) as follows:

ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.

If Anthropic announces one of their AI systems has achieved ASL-3 before the end of 2025, this resolves YES. Otherwise, resolves NO on 1 Jan 2026.

See also:

This question is managed and resolved by Manifold.

#️ Technology

#AI

#Technical AI Timelines

#AI Safety

#Anthropic

Get

1,000

and

3.00

4 Comments

34 Holders

64 Trades

Sort by:

According to Anthropic, "We have activated the AI Safety Level 3 (ASL-3) Deployment and Security Standards described in Anthropic’s Responsible Scaling Policy (RSP) in conjunction with launching Claude Opus 4."

Unless anyone has any other arguments I should be aware of, I will resolve this question to YES. Will do so tomorrow morning if no one has any objections.

@cash It sounds to me like they are specifically not claiming that Claude Opus 4 is ASL-3, just that they haven't been able to rule out the possibility and thus is prudent to activate the additional precautions.

To be clear, we have not yet determined whether Claude Opus 4 has definitively passed the Capabilities Threshold that requires ASL-3 protections.

(from https://www.anthropic.com/news/activating-asl3-protections)

@theincredibleholk Oh, thanks! That's an important consideration.

That makes me think I should withhold resolving this question until a final determination has been made about Claude Opus 4 by Anthropic regarding its ASL status, especially since the question description I wrote says that the designation should be made "without hedging", and this would seem to be a significant an relevant form of hedging.

the autonomy criterion isn't that hard and seems likely to be met by 2025

From the RSP:

For autonomous capabilities, our ASL-3 warning sign evaluations will be designed with the advice of ARC Evals to test whether the model can perform tasks that are simpler precursors to full autonomous replication in the real world. The purpose of these evaluations is to quantify the risk that a model is capable of accumulating resources (e.g. through fraud), navigating computer systems, devising and executing coherent strategies, and surviving in the real world while avoiding being shut down. The tasks will be chosen to be at a difficulty level that a domain expert (not world-class) human could complete each one in roughly 2–8 hours. We count a task as "passed" if the model succeeds at least once out of 10 tries, since we expect that a model passing a task 10% of the time can likely be easily improved to achieve a much higher success rate. The evaluation threshold is met if at least 50% of the tasks are passed.

Related questions

Related questions