Claude Opus 4.7 Is Good. Here Is What Actually Changed.

Gabriel Anhaia, a Go developer and open-source toolmaker, tested Claude Opus 4.7 for six hours on release day. He threw dense terminal screenshots at it. Small fonts, grayed-out zsh prompts, error codes crammed into a single window. Opus 4.6 got about 70% of the lines right on those. Opus 4.7 read every single one. Every line, every timestamp, every dimmed color.
That is not a small improvement. That is a different model.
Anthropic shipped Claude Opus 4.7 on April 16, 2026. Same pricing as 4.6: $5 per million input tokens, $25 per million output tokens. Available across Claude's API, Amazon Bedrock, Google Vertex AI, Microsoft Foundry, and GitHub Copilot. If you use Claude for anything serious, this is the version you switch to.
what actually changed
The headline numbers from Anthropic and early testers:
- Visual acuity: 54.5% to 98.5%. That is not a typo.
- Image resolution: 1.15MP to 3.75MP (2576px on the long edge, up from 1568px)
- Coding benchmark (93 tasks): +13% task resolution rate over Opus 4.6
- SWE-bench Pro: +11% over Opus 4.6
- Document reasoning errors: down 21%
- Tool call accuracy: up 10-15%
- New xhigh effort level for maximum reasoning depth
- Task budgets (beta) so agentic loops manage their own token spend
Here is the catch nobody mentions upfront: tokenizer changes mean the same request now consumes 1.0x to 1.35x as many input tokens as before. Same per-token price. Bigger actual bill. Anthropic recommends downsampling images if you do not need the full 3.75MP resolution.
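A simple way to act on that advice is to compute a downsampled size client-side before sending. A minimal sketch, where the 1568px default mirrors the old long-edge limit mentioned above; pass the result to whatever image library you use for the actual resize:

```python
def downsampled_size(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Compute a new (width, height) whose longest edge is at most
    max_edge pixels, preserving aspect ratio. Returns the original
    size unchanged if it is already small enough."""
    longest = max(width, height)
    if longest <= max_edge:
        return (width, height)
    scale = max_edge / longest
    return (round(width * scale), round(height * scale))
```

For a 5152x2000 screenshot this yields 1568x609, which keeps you under the old token footprint while still being readable for most terminal text.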
why vision matters more than you think
Previous Claude models were bad at reading screens. Not a little bad. "I will guess the text in the corner" bad. At 1.15 megapixels, a lot of detail got crushed before the model even saw it.
At 3.75MP, Opus 4.7 can read terminal output, technical diagrams, and dense documents without hallucinating. The coordinates are now 1:1 with actual pixels. No more scale-factor math when you map clicks to screenshots.
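To make the scale-factor point concrete, here is the kind of conversion helper older screen agents needed (the names and numbers are illustrative, not from any SDK). With 1:1 coordinates it reduces to the identity:

```python
def to_screen_coords(x: int, y: int,
                     model_size: tuple[int, int],
                     screen_size: tuple[int, int]) -> tuple[int, int]:
    """Map a click from the model's (possibly downscaled) view of a
    screenshot back to actual screen pixels.
    model_size and screen_size are (width, height) pairs."""
    mw, mh = model_size
    sw, sh = screen_size
    return (round(x * sw / mw), round(y * sh / mh))

# Before: the model saw a downscaled image, so every click had to be
# rescaled. Now model_size == screen_size and the mapping is identity.
```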
Here is what that looks like in practice. Gabriel set up a test where he fed Opus 4.7 a busy terminal screenshot and asked it to list every error and warning with exact line numbers. Previously, small font sizes were a coin flip. Grayed-out text got ignored. Exit codes got invented.
Opus 4.7 nailed it. Every line. Correct timestamps. Correct exit codes. Even the dimmed gray text in his zsh prompt that most humans would squint at.
For computer-use agents, this is the blocker removal. You could not reliably build screen-reading automation on Claude before. Now you can.
"The previous visual limitation was a real blocker for production use cases. It's not anymore." - Gabriel Anhaia
And it is not just screenshots. Think about what this means for document-heavy workflows. Contracts, invoices, medical records, anything where small text accuracy matters. Anthropic reports 21% fewer document reasoning errors. That is not a marketing number. That is the difference between catching a clause and missing it.
the effort level that actually helps
Claude has had effort levels for a while. Low, medium, high. The idea: pay fewer tokens for easier tasks, spend more when it matters.
Opus 4.7 adds xhigh. A new ceiling for maximum reasoning depth.
Here is how Hex, one of Anthropic's early testers, put it: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. That means you can get last-gen quality at a lower cost. Or crank it to xhigh and get something noticeably smarter.
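In practice, selecting the effort level is a per-request knob. A hypothetical request-builder sketch; the `effort` field name, the model id string, and the validation are assumptions for illustration, not a documented API:

```python
def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a hypothetical chat request with an effort level.
    The field names here are illustrative, not Anthropic's actual schema."""
    levels = ("low", "medium", "high", "xhigh")  # xhigh is new in Opus 4.7
    if effort not in levels:
        raise ValueError(f"effort must be one of {levels}")
    return {
        "model": "claude-opus-4-7",  # placeholder model id
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }
```

The point of the knob: start at low for routine work (roughly last-gen medium quality, per Hex), and reserve xhigh for the tasks where reasoning depth pays for itself.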
The model also catches its own mistakes now. One tester from a fintech company said Opus 4.7 "catches its own logical faults during the planning phase." It does not just execute your instructions blindly. It questions them, sometimes.
Look, I know "the model argues with you" sounds like a feature nobody asked for. But when you are debugging a 40-file refactor at 2am, you want something that says "wait, this import is wrong" before you push broken code. That is what this feels like.
task budgets: Claude now paces itself
This is the quiet feature that could matter most for agent builders.
Task budgets give Claude a rough token budget for an entire agentic loop: thinking, tool calls, tool results, and final output. The model sees a running countdown. As it gets low, it prioritizes what matters and finishes gracefully instead of cutting off mid-task.
If you have ever watched an agent burn through tokens on irrelevant subtasks and then timeout, you know why this matters. It is still in beta. But it addresses a real problem that nobody else is solving cleanly.
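Conceptually, a budget-aware loop looks something like this toy sketch. This is an illustration of the idea, not Anthropic's API; the step tuples and the 20% low-water mark are stand-ins for real token accounting:

```python
def run_with_budget(steps, budget_tokens: int, reserve: int = 500) -> list[str]:
    """Toy budget-aware agent loop. `steps` is an iterable of
    (name, token_cost, is_essential) tuples standing in for tool calls.
    Skips non-essential work when funds run low and always keeps
    `reserve` tokens for a graceful final summary."""
    remaining = budget_tokens - reserve
    done = []
    for name, cost, essential in steps:
        if not essential and remaining < budget_tokens * 0.2:
            continue  # prioritize: drop nice-to-have work when running low
        if cost > remaining:
            break  # cannot afford this step; finish with what we have
        remaining -= cost
        done.append(name)
    done.append("final_summary")  # the reserved tokens pay for this
    return done
```

The key behavior is the last line: instead of dying mid-task when tokens run out, the loop degrades by shedding optional work first and always lands on a finished answer.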
the cybersecurity question nobody wants to answer
Anthropic announced Mythos Preview last week. A model specifically good at finding security flaws. They held it back from release, citing safety concerns.
Opus 4.7 is the first "less capable" model tested with the same cyber safeguards they plan to use for Mythos. It ships with automatic detection and blocking for high-risk cybersecurity requests.
Security researchers can apply to a Cyber Verification Program for legitimate work. Everyone else gets blocked.
The honest question: does releasing a "safer" version of a security-focused model make the eventual full release of Mythos more acceptable? Or does it normalize the idea of AI-powered offensive security tools?
The industry is not asking this loudly enough.
the "0.1 release" fatigue is real
Every few months, another point release. Another set of benchmarks. Another blog post with "our most capable model yet."
One HN commenter asked the real question: "For example, SWE-bench Pro improved ~11% compared with Opus 4.6. Should one interpret it as 4.7 is able to solve more difficult problems? Or 11% less hallucinations?"
Nobody has a clean answer. Benchmarks keep climbing. The gap between benchmark gains and what developers actually feel day to day keeps widening.
Someone on the same thread noted that DeepSeek has been quiet for months. "They're either stuck or sitting on something fantastic." Meanwhile, Anthropic, OpenAI, and Google keep shipping .1 increments that massage numbers and generate press cycles.
That does not mean Opus 4.7 is bad. It is good. But the pace of these releases makes it hard to tell which updates matter and which are noise.
the honest take
If you build agentic systems, use Claude Code, or work with vision-heavy tasks, switch to Opus 4.7 today. The vision upgrade alone justifies it.
If you use Claude for simple chat or basic coding help, Opus 4.7 will feel nice but you will not see a dramatic difference. Save your money. Use Sonnet.
And if you are building computer-use agents that read screens, diagrams, or documents, this is the first Claude model where that actually works reliably. That is not hype. That is a factual capability shift from "barely usable" to "production viable."
The tokenizer cost bump is annoying. Anthropic should be upfront about it in the headline pricing, not buried in the docs. But the capability-per-dollar still improved.
I do not know if the Mythos safety framing is genuine caution or marketing groundwork. Probably both. That is how this industry works now.
What I do know: Opus 4.7 is the first Claude update in a while where a developer tested it for six hours and wrote "this made me rethink my workflow." Not many point releases earn that.