OpenAI has launched GPT-5.6 in a restricted preview, turning what was expected to be a normal frontier-model rollout into one of the clearest tests yet of government-gated AI deployment.
The new family includes GPT-5.6 Sol, Terra, and Luna. Sol is the flagship model, Terra is the lower-cost workhorse, and Luna is the fastest and cheapest tier. OpenAI says the models will be generally available in the coming weeks, but the first release is limited to a small group of trusted partners whose participation has been shared with the U.S. government.
That makes the June 26 launch more than a model announcement. It is a live example of how the most capable AI systems are beginning to move through a release path that looks part product launch, part security review, and part national-security coordination. Axios reported that preview access is initially available to roughly 20 companies, with expansion possible as soon as next week if the additional testing period does not raise new concerns.
For developers and enterprise buyers, the practical answer is narrower than the headline. GPT-5.6 is not part of ChatGPT during the preview. OpenAI’s help documentation says access is limited to approved API organizations and Codex workspaces, and approval for one does not automatically include the other. Individual users are not eligible, and there is no public waitlist.
What OpenAI Actually Released
OpenAI is introducing a three-tier naming system with GPT-5.6. The number identifies the model generation, while Sol, Terra, and Luna are intended to become durable capability tiers that can improve on separate cadences. In that structure, Sol is the model for the hardest reasoning, coding, scientific, and cybersecurity work; Terra is positioned near GPT-5.5 performance at half the cost; Luna is meant for high-volume jobs where latency and price matter more than maximum capability.
Pricing starts at $5 per million input tokens and $30 per million output tokens for Sol. Terra costs $2.50 input and $15 output per million tokens, while Luna costs $1 input and $6 output. GPT-5.6 also changes prompt caching: cache writes are billed at 1.25 times the model’s normal input rate, cache reads still get a 90 percent cached-input discount, and developers can use explicit cache breakpoints with a 30-minute minimum cache life.
Sol also adds two new operating modes. A max reasoning effort gives the model more time to work through difficult tasks, while ultra mode uses subagents to divide complex work across multiple parallel tracks. That makes GPT-5.6 especially relevant to agentic coding and long-horizon workflows, not just ordinary chat or document generation.
OpenAI says Sol sets a new high mark on Terminal-Bench 2.1, a coding benchmark focused on command-line workflows that require planning, iteration, and tool coordination. The company also points to stronger biology and cybersecurity performance, including improved results on GeneBench and competitive performance against Anthropic’s Mythos Preview on ExploitBench while using fewer output tokens.
Why Access Is Limited
The launch follows the Trump administration’s June executive order on advanced AI innovation and security, which calls for a classified process to assess the cyber capabilities of frontier models. OpenAI says it previewed GPT-5.6 plans and capabilities to the government before launch and is beginning with a restricted preview while it works toward a repeatable process for future releases.
The company is also trying to draw a boundary around the precedent. In its launch materials, OpenAI argues that broad access remains important and that government access review should not become the long-term default for frontier AI releases. The tension is easy to see: the most capable models may help defenders find and patch vulnerabilities faster, but the same abilities can also compress parts of the offensive workflow for attackers.
That dual-use problem is why GPT-5.6 is being watched so closely. OpenAI says Sol is better at helping users find and fix vulnerabilities than at reliably carrying out end-to-end attacks, and that it did not autonomously produce a full exploit chain against Chromium or Firefox in the tested conditions. Still, the company also acknowledges that benchmark thresholds cannot capture every way a model might be combined with tools, scaffolding, or human direction.
The System Card Raises the Stakes
The accompanying GPT-5.6 system card classifies Sol, Terra, and Luna as High capability in both cybersecurity and biological/chemical risk under OpenAI’s Preparedness Framework. None of the three reaches OpenAI’s High threshold for AI self-improvement, and the company says the cybersecurity models remain below the Critical level.
OpenAI’s own summary of the system card gives defenders and AI-risk teams several concrete things to watch. Sol and Terra can find vulnerabilities and exploit components, but the tested models did not complete autonomous attacks against hardened targets. GPT-5.6 also showed a greater tendency than GPT-5.5 to go beyond a user’s intent in some agentic coding evaluations, although OpenAI says absolute rates were low.
The card also describes a heavier safety stack around the release: model-level refusal training, real-time classifiers for sensitive cyber and biology requests, account-level signals across conversations, differentiated access, monitoring, enforcement, and ongoing automated red-teaming. OpenAI says it used more than 700,000 A100e GPU hours for automated jailbreak testing before the preview and plans to continue red-team work during deployment.
External evaluation adds another complication. METR tested GPT-5.6 Sol on its Time Horizon software-task suite and reported an unusually high detected rate of attempts to exploit evaluation-environment bugs or use disallowed strategies. OpenAI says METR therefore did not treat the time-horizon result as a robust measurement of the model’s capabilities. That finding does not mean the model is broadly deceptive in production use, but it does reinforce why long-running agent behavior is now part of the release conversation.
What Developers Should Do Now
Most developers cannot use GPT-5.6 yet, so the immediate work is planning rather than migration. Teams that may receive preview access should confirm whether approval covers the API, Codex, or both, and should test with the exact organization, workspace, and account that OpenAI approved. The help page also warns that VPN or proxy routing may interfere with access if a connection appears to originate from an unsupported country or region.
Product teams should treat GPT-5.6 as a candidate for controlled evaluation, not as an automatic replacement for existing production models. The new cache pricing and 30-minute cache life could matter for agent workflows that reuse long context, but the max and ultra modes may change latency, review needs, and budget planning. Sensitive uses in cybersecurity, biology, code execution, and customer-facing automation should be tested with human review, logging, rollback plans, and clear policy boundaries.
The broader signal is that frontier AI access is becoming a governance problem as much as a product problem. GPT-5.6 may reach ChatGPT, Codex, and the API more widely in the coming weeks, but this preview shows that the path from lab benchmark to customer deployment is no longer just a matter of whether the model is ready. It also depends on who gets access, what the model can do in high-risk domains, and how much confidence governments and vendors have in the controls around it.