Claude Fable 5 Returns With a New Test for AI Jailbreak Rules

Anthropic is restoring Claude Fable 5 after U.S. export controls on Fable 5 and Mythos 5 were lifted. The redeployment brings a new cyber-safety classifier, fallback handling for blocked requests, and a proposed industry framework for scoring AI jailbreak severity.

Anthropic is bringing Claude Fable 5 back online on July 1 after the U.S. government lifted export controls on Fable 5 and Claude Mythos 5, ending an unusual model-access shutdown that had turned a frontier AI release into a direct test of government oversight.

The company said the controls were lifted on June 30, less than three weeks after a June 12 order forced it to restrict access to both models by foreign nationals. Anthropic said it could not reliably verify nationality in real time, so it suspended access to both models for all users instead.

Fable 5 is scheduled to return globally across the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. Anthropic said access through AWS, Google Cloud, and Microsoft Foundry will be re-enabled as quickly as possible, without giving a firm date. Mythos 5, the less-restricted version aimed at defensive cybersecurity work, has been restored only for a set of U.S. organizations after government approval on June 26, while broader Glasswing access remains under coordination.

The restart is not just a product availability update. Anthropic is also tying the redeployment to a new cyber-safety classifier, a proposed jailbreak-severity framework with major cloud and AI partners, and a deeper working relationship with U.S. government evaluators before future frontier model releases.

What Changed for Claude Users

For users on Pro, Max, Team, and select Enterprise plans, Anthropic says Fable 5 will count for up to 50% of weekly usage limits through July 7. After that, access shifts to usage credits. Standard Enterprise seats do not get an included allowance unless usage credits are enabled, while premium Enterprise seats get included access through July 7.

Developers also need to account for Fable 5’s refusal and fallback behavior. Anthropic’s developer documentation says Fable 5 includes safety classifiers that can decline certain requests, while Mythos 5 does not include those classifiers and is limited to approved Glasswing customers. A refused Fable 5 request returns a successful API response with a refusal stop reason rather than an HTTP error, so applications need to handle blocked requests as a normal model outcome.
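In practice, that means a refusal has to be separated from a transport failure before an application decides what to show the user. The sketch below is one way to do that, not Anthropic's implementation. It assumes the response body carries a `stop_reason` field with the value `"refusal"` and a `text` field holding the output; both names are illustrative and should be checked against the developer documentation.

```python
from dataclasses import dataclass

# Assumption: a declined request is marked by stop_reason == "refusal".
# The field name and value are illustrative, not confirmed by the article.
REFUSAL_STOP_REASON = "refusal"


@dataclass
class Outcome:
    kind: str  # "ok", "refused", or "error"
    text: str


def interpret(status_code: int, body: dict) -> Outcome:
    """Classify an API response as ok, refused, or error."""
    if not 200 <= status_code < 300:
        # Only non-success HTTP statuses are treated as errors.
        return Outcome("error", f"HTTP {status_code}")
    if body.get("stop_reason") == REFUSAL_STOP_REASON:
        # A refusal arrives with a success status, so it is a normal outcome.
        return Outcome("refused", "The model declined this request.")
    # Assumption: generated output is available under a "text" key.
    return Outcome("ok", body.get("text", ""))
```

An application would branch on `Outcome.kind`: show the text for `"ok"`, explain the block or trigger a fallback for `"refused"`, and reserve error handling for `"error"`.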

Anthropic says refused Fable 5 requests can be retried on another Claude model through server-side fallback, client-side SDK middleware, or manual retry logic. The company says users are not billed for requests refused before output is generated, and a fallback credit is meant to avoid charging the prompt-cache cost twice when switching models.
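Of the three options, manual retry logic is the simplest to illustrate. The following is a minimal sketch under stated assumptions: `send` stands in for whatever client call an application already uses, the model identifiers `"fable-5"` and `"opus-4.8"` are placeholders rather than confirmed API names, and a refusal is assumed to appear as `stop_reason == "refusal"`. It does not model billing or the fallback credit.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model identifiers, used for illustration only.
DEFAULT_MODELS = ("fable-5", "opus-4.8")


def send_with_fallback(
    send: Callable[[str, str], dict],
    prompt: str,
    models: Sequence[str] = DEFAULT_MODELS,
) -> Tuple[Optional[str], dict]:
    """Try each model in order and return (model, response) for the
    first response that is not a refusal.

    Returns (None, last_response) if every model refuses.
    """
    last: dict = {}
    for model in models:
        last = send(model, prompt)
        # Assumption: refusals are flagged via stop_reason == "refusal".
        if last.get("stop_reason") != "refusal":
            return model, last
    return None, last
```

Returning the model that actually answered lets the interface tell users when a response came from the fallback rather than from Fable 5.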

The model itself remains positioned as Anthropic’s most capable widely released system for demanding reasoning and long-running agentic work. Fable 5 and Mythos 5 share a 1 million token context window by default, up to 128,000 output tokens per request, and listed API pricing of $10 per million input tokens and $50 per million output tokens. Anthropic says both models carry 30-day data retention and are not available under zero data retention.
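Those list prices make per-request costs easy to estimate. The sketch below applies only the figures quoted above; the function name is illustrative, and it ignores prompt-cache pricing, fallback credits, and plan-based usage credits, none of which are specified here.

```python
from decimal import Decimal

# Listed API prices from the article, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("10")
OUTPUT_USD_PER_MTOK = Decimal("50")
MAX_OUTPUT_TOKENS = 128_000  # per-request output cap cited in the article
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the list-price cost of one request, rounded to cents."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128,000-token per-request cap")
    cost = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / MILLION
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(200_000, 8_000))      # 2.40
print(estimate_cost_usd(1_000_000, 128_000))  # 16.40
```

A request that fills the full 1 million token context and produces the maximum 128,000 output tokens would therefore cost about $16.40 at list price.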

The Jailbreak That Triggered the Shutdown

The export-control fight began after Amazon researchers reported a way to bypass Fable 5 safeguards so the model could identify software vulnerabilities and, in one case, produce code demonstrating exploitation of a vulnerability. Anthropic argues that the reported behavior did not reveal unique Mythos-level capability, because other models it tested could identify the same vulnerabilities and produce the same exploit demonstration.

That distinction matters because Fable 5 is the broadly available model, while Mythos 5 is the higher-risk defensive cybersecurity variant with fewer safeguards. Anthropic’s account is that the disputed bypass landed in a borderline safety area: not harmless enough to ignore, but not proof that Fable 5 had exposed a new class of offensive capability that only Mythos-level systems could produce.

Even so, Anthropic said it trained an improved classifier to target the behavior in the Amazon report. When the classifier blocks a Fable 5 request, users are notified and the request is routed to Opus 4.8. Anthropic says the new classifier blocks the specific technique in more than 99% of cases, while acknowledging the change may also flag more benign coding and debugging requests.

That tradeoff will be felt most by developers and security teams using Claude for code review, vulnerability research, debugging, and agentic engineering workflows. A stricter classifier may lower misuse risk, but it can also interrupt legitimate defensive work unless products are built to explain refusals, route fallback requests cleanly, and preserve enough context for users to continue.

Why the Industry Framework Matters

Anthropic is using the redeployment to push for a common way to score AI jailbreaks. The company says it is working with Amazon, Microsoft, Google, and other Project Glasswing partners on a proposed framework built around four factors: how much capability a jailbreak gives an attacker, how broad that capability gain is, how easy the technique is to weaponize, and how easy it is to discover or repeat.
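To make the four factors concrete, a jailbreak report could be represented as a simple record. The sketch below is purely hypothetical: the proposal's actual scale, weighting, and aggregation have not been described, so the 0 to 3 range and the unweighted sum are assumptions chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JailbreakReport:
    """One report scored on the four factors named in the proposal.

    Hypothetical scale: each factor is an integer from 0 (none) to 3 (high).
    """
    capability_uplift: int   # how much capability the attacker gains
    breadth: int             # how broad that capability gain is
    weaponization_ease: int  # how easy the technique is to weaponize
    discoverability: int     # how easy it is to discover or repeat

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 3:
                raise ValueError(f"{f.name} must be between 0 and 3")

    def total(self) -> int:
        # Assumption: an unweighted sum; a real framework may weight factors.
        return (
            self.capability_uplift
            + self.breadth
            + self.weaponization_ease
            + self.discoverability
        )


narrow_but_harmful = JailbreakReport(3, 0, 2, 1)
universal = JailbreakReport(2, 3, 3, 3)
print(narrow_but_harmful.total(), universal.total())  # 6 11
```

Under a plain sum, a narrow report with high capability uplift scores well below a universal jailbreak, which suggests why a usable framework may need weighting or per-factor thresholds rather than a single total.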

That is a more concrete proposal than simply saying frontier models need more safety testing. It resembles a security-industry attempt to classify AI bypasses with something closer to vulnerability triage: not every jailbreak is equal, and not every bypass should trigger the same response from model labs, customers, or governments.

The hardest cases will be the ones Anthropic describes as narrow but harmful, where a prompt unlocks a specific dangerous behavior without becoming a universal jailbreak. Those incidents may be difficult to score consistently because they depend on model capability, attacker skill, public availability of the technique, and whether existing tools or weaker models can already reach the same result.

Anthropic says it is also launching a HackerOne program for researchers to submit potential cyber jailbreaks in Fable 5. The company plans 24/7 monitoring of key jailbreak submission channels and says the most severe findings would trigger rapid preliminary mitigations after severity is confirmed.

A New Pattern for Frontier Model Releases

The Fable 5 redeployment points to a model-release pattern that is likely to recur: frontier AI systems are becoming powerful enough that launch decisions can depend on cybersecurity testing, government review, cloud-provider distribution, and enterprise access controls, not just product readiness.

Axios reported that the Trump administration lifted the Fable 5 restrictions Tuesday evening and that U.S. officials still face an August deadline under a recent executive order to create standardized benchmarks for evaluating security risks from new AI models. Anthropic’s own plan now includes pre-release government access and evaluation for models that materially advance capabilities relevant to national security, rapid sharing of significant jailbreak and misuse patterns, and dedicated resources for joint AI-security research.

For enterprises, the practical lesson is immediate. Teams building on Claude Fable 5 should check whether their plan includes usage credits after the temporary allowance ends on July 7, whether their cloud provider has restored access, whether zero data retention requirements conflict with the model’s 30-day data retention, and whether applications are ready for refusal responses and fallback flows.

For the AI industry, the bigger question is whether the Fable 5 episode becomes a one-off political fight or the template for frontier model governance. Anthropic wants the latter, in a more systematic form: clear severity scoring, repeatable evaluation standards, and rules applied across model providers. The next major model release will test whether that process is actually becoming durable, or whether each launch will still depend on ad hoc negotiation once a risky capability report lands.
