October has kicked off with significant momentum in the AI landscape, as industry leaders unveil major advancements and updates. DeepSeek’s latest release, version 3.2, marks the beginning of what is shaping up to be an active month, with Anthropic, Google, and OpenAI all making notable moves.
Anthropic recently announced its newest model, Claude Sonnet 4.5. The company asserts that it is currently the best coding model available, surpassing others at building complex agentic systems and excelling at computer-use tasks. Notably, the model also demonstrates marked progress in reasoning and mathematical capabilities.
Alongside this new release, Anthropic has rolled out an extensive suite of product upgrades. For the first time, the company is giving developers access to the same infrastructure it used to build Claude Code itself. Additionally, a forward-looking feature named "Imagine with Claude" was introduced: a live, real-time software generator, currently available as a research preview.
Claude Sonnet 4.5 is now fully accessible via API, with the same pricing as its predecessor, Sonnet 4: $3 per million input tokens and $15 per million output tokens. The company claims the performance improvements are substantial: the new model achieved top-tier results on SWE-bench Verified, an evaluation of real-world software engineering. In practical testing, it maintained focus on complex, multi-step tasks for more than 30 hours.
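To make that pricing concrete, the cost of a single request follows directly from the two published per-million-token rates. The token counts in the example are made-up illustrative values, not figures from Anthropic.

```python
# Cost estimate based on the published Sonnet 4.5 rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_RATE_PER_M = 3.00    # USD per million input tokens
OUTPUT_RATE_PER_M = 15.00  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a 12,000-token prompt that produces a 4,000-token reply
print(round(estimate_cost(12_000, 4_000), 4))  # → 0.096
```

At these rates, output tokens dominate the bill for generation-heavy workloads, which is why long agentic sessions benefit from the context-management features described below.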
The model’s prowess extends beyond coding. On the OSWorld benchmark, which tests AI models on real-world computer tasks, Sonnet 4.5 scored 61.4%, a significant improvement over the 42.2% that Sonnet 4 posted just four months earlier. Its advanced reasoning and mathematical skills have been validated across diverse assessments, including expert reviews from finance, law, medicine, and STEM fields. These evaluations confirm that Claude Sonnet 4.5 exhibits major enhancements in knowledge and inferential reasoning compared to earlier models like Opus 4.1.
In terms of product improvements, Claude Code has been upgraded to version 2.0, featuring an enhanced user interface and a new Visual Studio Code extension. A novel “checkpoints” feature allows users to easily revert recent edits with a quick shortcut (Esc+Esc) or command, boosting practical usability. The API now includes better context editing and memory tools, enabling the AI to handle longer, more complex sessions.
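The checkpoint idea can be sketched as a stack of snapshots with a revert operation. This is a minimal illustration only; the `CheckpointStore` class and its methods are hypothetical and say nothing about how Claude Code actually implements the feature.

```python
# Illustrative sketch of a "checkpoints" mechanism: each edit saves a
# snapshot, and revert (e.g. Esc+Esc in Claude Code) pops back one state.
# All names here are hypothetical, not part of Anthropic's tooling.
class CheckpointStore:
    def __init__(self, initial_text: str):
        self._snapshots = [initial_text]  # oldest state first

    def save(self, text: str) -> None:
        """Record a new checkpoint after an edit."""
        self._snapshots.append(text)

    def revert(self) -> str:
        """Drop the latest checkpoint and return the previous state."""
        if len(self._snapshots) > 1:
            self._snapshots.pop()
        return self._snapshots[-1]

store = CheckpointStore("v1")
store.save("v2 (model edit)")
store.save("v3 (model edit)")
print(store.revert())  # → v2 (model edit)
```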
The Claude app also integrates code execution and document creation (spanning spreadsheets, slides, and text documents) directly into conversations. Furthermore, the Claude for Chrome extension is now open to all Max users who joined the waitlist.
Anthropic has also unveiled the Claude Agent SDK, a toolkit designed for developers to build autonomous agents powered by Claude’s architecture. This infrastructure addresses key challenges such as long-term memory management, balancing autonomy with user control, and coordinating multiple sub-agents to achieve shared goals. The SDK is now available for public use, opening new avenues for AI customization.
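The challenges the article lists for the Agent SDK, shared long-term memory and coordination of sub-agents toward one goal, can be illustrated in a few lines. This sketch is purely hypothetical: none of the class or function names below come from the actual Claude Agent SDK.

```python
# Hypothetical sketch of the agent pattern the Claude Agent SDK targets:
# a coordinator runs sub-agents in sequence, all sharing one memory store.
# These names are illustrative stand-ins, not the real SDK API.
class Memory:
    """Simple long-term key-value memory shared by all sub-agents."""
    def __init__(self):
        self._facts: dict[str, str] = {}

    def remember(self, key: str, value: str) -> None:
        self._facts[key] = value

    def recall(self, key: str) -> str:
        return self._facts.get(key, "")

def research_agent(task: str, memory: Memory) -> str:
    # First sub-agent: gather information and store it for later agents.
    memory.remember("notes", f"findings for {task}")
    return "researched"

def writer_agent(task: str, memory: Memory) -> str:
    # Second sub-agent: produce output from the shared memory.
    return f"report using {memory.recall('notes')}"

def coordinator(task: str, agents, memory: Memory) -> str:
    """Run each sub-agent in order against the shared memory."""
    result = ""
    for agent in agents:
        result = agent(task, memory)
    return result

print(coordinator("AI news", [research_agent, writer_agent], Memory()))
# → report using findings for AI news
```

The point of the sketch is the shared `Memory` object: it is what lets one sub-agent's work persist for the next, the "long-term memory management" problem the SDK is said to address.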
Describing this version as “the most aligned model yet,” Anthropic emphasizes that Claude Sonnet 4.5 has undergone extensive safety training, leading to significant reductions in undesirable behaviors like deception, power-seeking, and hallucinations. The model operates under AI Safety Level 3 (ASL-3), featuring classifiers that detect potentially dangerous inputs or outputs, especially those concerning chemical, biological, radiological, or nuclear (CBRN) content. If a request is flagged, users can switch seamlessly to the less restricted Sonnet 4, and the company indicates that false-positive rates have dropped by a factor of ten since these safeguards were first deployed.
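The flag-then-fallback flow described above amounts to a routing decision in front of the model call. The sketch below uses a keyword placeholder where a real trained classifier would sit, and the model identifiers are stand-ins, not real API values.

```python
# Hedged sketch of classifier-gated routing: flagged requests are routed
# away from the ASL-3-gated model. The keyword check is a placeholder for
# a real CBRN classifier; model names are illustrative, not API values.
RISKY_TERMS = {"cbrn-topic"}  # stand-in for a trained safety classifier

def classify(prompt: str) -> bool:
    """Return True if the prompt looks potentially dangerous."""
    return any(term in prompt.lower() for term in RISKY_TERMS)

def route_model(prompt: str) -> str:
    """Pick a model name based on the classifier verdict."""
    return "sonnet-4" if classify(prompt) else "sonnet-4.5"

print(route_model("summarize this AI news article"))  # → sonnet-4.5
```

A real deployment would tune such a classifier for a low false-positive rate, which is exactly the metric Anthropic reports improving tenfold.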
Lastly, in an exciting experimental move, Anthropic launched “Imagine with Claude,” a limited-time research preview. This feature allows Claude to generate software in real time based on user prompts, without predefined scripts or code snippets. Currently available to Max subscribers for five days, this demonstration showcases Claude’s dynamic creative potential.
A personal hands-on test of the new model’s coding ability, using a complex physics simulation prompt, revealed only modest improvements. Despite multiple attempts, the model failed to execute the core logic of a physics-based ball simulation, hinting that its coding strengths may still need refinement.
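For context, the core logic such a prompt asks for is quite compact: gravity integration plus an inelastic bounce. This is an illustrative minimal version with made-up constants, not the actual test prompt or the model's output.

```python
# Minimal bouncing-ball physics: semi-implicit Euler integration with an
# inelastic floor bounce. Constants are illustrative, not from the prompt.
GRAVITY = -9.8      # m/s^2
RESTITUTION = 0.8   # fraction of speed kept after each bounce
DT = 0.01           # timestep in seconds

def simulate(height: float, steps: int) -> float:
    """Drop a ball from `height` metres and return its final height."""
    y, vy = height, 0.0
    for _ in range(steps):
        vy += GRAVITY * DT       # update velocity first (semi-implicit)
        y += vy * DT
        if y <= 0.0:             # hit the floor: reflect and damp velocity
            y = 0.0
            vy = -vy * RESTITUTION
    return y

final = simulate(height=2.0, steps=2000)
assert 0.0 <= final < 2.0  # energy loss keeps it below the start height
```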
According to official testing metrics, Claude Sonnet 4.5 achieved an impressive average score of 77.2% on SWE-bench Verified, averaged over ten trials with a 200K-token thinking budget. On OSWorld, the model scored 61.4% on tasks capped at 100 steps, averaged over four runs. Other benchmarks, such as the multilingual MMMLU, reflect strong reasoning across diverse languages.
All these developments indicate a vigorous and rapidly evolving AI ecosystem, with Anthropic pushing the boundaries of safe, capable, and innovative artificial intelligence.