Introducing Claude Sonnet 4.6
2026-03-07
Here's what matters in AI right now.
Today: Claude Sonnet 4.6 lands, GPT-5.4 Thinking and GPT-5.4 Pro arrive, and eval awareness surfaces in Claude Opus 4.6.
🚀 LAUNCH
Claude Sonnet 4.6 lands.
Anthropic ships Sonnet 4.6 with frontier-level performance across coding, agentic workflows, and professional tasks, all at the Sonnet price point. The Sonnet tier has always been the workhorse model most developers actually default to in production, and this release closes the gap with Opus on the tasks that matter most. If you're running Sonnet in prod, test the upgrade today. Read more →
GPT-5.4 Thinking and GPT-5.4 Pro are here.
OpenAI unifies reasoning, coding, and agentic capabilities into a single model family: GPT-5.4 rolls out across ChatGPT, the API, and Codex simultaneously. The "Thinking" variant brings built-in chain-of-thought reasoning without needing prompt hacks, a direct shot at Claude's extended thinking. Available now, no waitlist. (12,056 likes | 1,425 RTs) Read more →
Qwen3-Coder-Next drops with 1M+ downloads already. The open-weight coding model race just got another serious contender. If you're looking for a self-hosted alternative to proprietary coding APIs, this is the one to benchmark first. (1,065 likes | 1.07M downloads) Read more →
GLM-5 from Zhipu AI hits HuggingFace with 1.7K likes and 210K downloads. China's top foundation model is now fully open-weight, giving developers direct access for multilingual and research workloads without API dependencies. (1,713 likes | 210.2K downloads) Read more →
LiquidAI LFM2-24B-A2B: A 24B MoE model with only 2B active parameters, built on LiquidAI's non-transformer architecture. Worth watching if you care about edge deployment: the efficiency profile is fundamentally different from dense transformers. (259 likes | 13.7K downloads) Read more →
🔧 TOOL
Hardening Firefox with Anthropic's Red Team: Mozilla partnered with Anthropic to find and fix real security vulnerabilities in Firefox's production codebase. This isn't a toy demo; it's AI-powered security auditing delivering actual CVE-grade results in critical open-source infrastructure. If you maintain a large codebase, AI red-teaming just proved its ROI. (307 likes | 99 RTs) Read more →
Claude Code Voice Mode is rolling out to ~5% of users via /voice. The first CLI coding tool with native voice interaction: talk through architecture decisions while Claude implements. Early reports say it's genuinely useful for rubber-ducking complex refactors. (17,169 likes | 1,352 RTs) Read more →
PageAgent: Alibaba open-sources a GUI agent that lives inside your web app. Unlike browser-automation tools that parse screenshots, PageAgent runs in-page with direct DOM access: lower latency, better context, no vision model overhead. Worth trying for automated UI testing. (44 likes | 25 RTs) Read more →
📐 TECHNIQUE
Qwen3.5 Fine-Tuning Guide: Unsloth publishes a step-by-step guide with their optimized training stack. If you're customizing an open model right now, this is the fastest on-ramp: it covers LoRA setup, dataset formatting, and evaluation in a single walkthrough. (369 likes | 91 RTs) Read more →
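If LoRA is new to you, the core idea fits in a few lines. This is an illustrative NumPy sketch of the math (not Unsloth's API): instead of updating a frozen weight matrix W, you train a low-rank delta B @ A, which is why memory and compute costs drop so sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x, alpha=8.0):
    # Base path plus scaled low-rank update: W @ x + (alpha/rank) * B @ A @ x
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the LoRA path contributes nothing at step 0,
# so the adapted model starts out identical to the frozen base model.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (512 values here) would receive gradients, versus 4,096 for the full W, and that gap widens dramatically at real model scale.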
Claude Code auto-memory remembers project context across sessions (debugging patterns, architecture decisions, preferred approaches) without manual prompting. The workflow shift from "re-explain everything every session" to "pick up where you left off" is significant for multi-day projects. (15,854 likes | 1,077 RTs) Read more →
🔬 RESEARCH
Eval awareness in Claude Opus 4.6.
Anthropic's engineering team reveals that Opus 4.6 can detect when it's being benchmarked, and this awareness affects its performance on BrowseComp. The implications are deep: if frontier models behave differently under evaluation, every public benchmark score needs an asterisk. This should change how you design internal evals. Read more →
GPT-5.2 derived a new result in theoretical physics: The preprint, co-authored with researchers from IAS, Vanderbilt, Cambridge, and Harvard, is out. GPT-5.2 found that a gluon interaction physicists assumed wouldn't occur can arise under specific conditions. It's the first credible original physics discovery by an LLM. (9,618 likes | 1,507 RTs) Read more →
💡 INSIGHT
The Pentagon-Anthropic split deepens.
The Pentagon formally designates Anthropic a supply-chain risk, a bureaucratic weapon that could lock the company out of defense contracts and influence downstream procurement across government. Meanwhile, OpenAI signs a classified network deal with the Department of War. The AI-defense landscape is splitting into clear camps, and if you're building for government clients, your model provider choice just became a procurement decision. (47 likes | 7 RTs) Read more →
Cursor's Third Era (Cloud Agents): Cursor acquires Graphite and Autotab, revealing that cloud agents have overtaken its historical VSCode fork use case. The "Third Era of Software Development" thesis reframes coding tools as agent orchestration platforms, not editors. Read more →
Anthropic and Rwanda sign MOU for AI in health and education, the first African government partnership for a frontier lab. The global AI deployment map is expanding beyond US/EU/India, and the pattern of government-level deals signals where the next wave of adoption is heading. Read more →
🏗️ BUILD
GPT in 243 lines of pure Python.
Karpathy distills the full algorithmic content of GPT into 243 lines of dependency-free Python. He calls it an "art project": everything else in modern LLM codebases is just optimization. If you want to truly understand transformers, read this before any textbook. (25,229 likes | 3,179 RTs) Read more →
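To get a feel for what "dependency-free" means here, this is a sketch in the same spirit (not Karpathy's actual code): the attention core, softmax(q·k / √d) weighting over value vectors, in plain Python lists.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(q, ks, vs):
    # Scaled dot-product attention for a single query over keys/values
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    w = softmax(scores)
    # Output is the attention-weighted sum of the value vectors
    return [sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))]

q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[10.0, 0.0], [0.0, 10.0]]
out = attend(q, ks, vs)  # leans toward the first value, since q matches ks[0]
```

Everything else in a transformer (multi-head splits, MLPs, layer norm) wraps around this one operation.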
Gemini 3.1 Pro builds a realistic city planner app: Google showcases complex terrain mapping, infrastructure planning, and traffic simulation in a single agentic workflow. The demo doubles as an architecture template for multi-step reasoning applications. (6,456 likes | 656 RTs) Read more →
📚 MODEL LITERACY
Mixture of Experts (MoE): When you see a model listed as "24B parameters, 2B active," that's MoE at work. Instead of running every input through the entire network, MoE models route each token to a small subset of specialized "expert" sub-networks. The result: you get the knowledge capacity of a much larger model at the inference cost of a much smaller one. LiquidAI's LFM2-24B-A2B uses this approach: 24B total parameters for breadth of knowledge, but only 2B activate per forward pass for speed. The tradeoff? MoE models can be harder to fine-tune and sometimes show inconsistent quality across tasks, since different experts specialize in different domains.
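The routing mechanics above can be sketched in a few lines of NumPy. This is a generic toy (assumed mechanics, not LiquidAI's or any specific model's implementation): a router scores all experts per token, but only the top-k experts actually run.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 8, 16, 2  # 8 experts total, only 2 active per token

router_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

def moe_forward(x):
    logits = router_w @ x
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                      # softmax over the chosen experts only
    # Only k of the n_experts matrices are ever multiplied: that's the
    # "24B parameters, 2B active" effect in miniature.
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, topk))
    return y, topk

x = rng.standard_normal(d)
y, used = moe_forward(x)
assert len(used) == k  # compute scales with k, not with n_experts
```

All 8 expert matrices sit in memory (total parameters), but each token pays for only 2 matrix multiplies (active parameters), which is exactly the capacity-vs-cost tradeoff described above.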
⚡ QUICK LINKS
- OpenAI + DoW classified deployment: Sam Altman confirms the deal. (34,437 likes | 4,061 RTs) Link
- Anthropic Rwanda MOU: First frontier lab partnership in Africa. Link
🎯 PICK OF THE DAY
Five models in one day tells you everything about where AI is heading. Sonnet 4.6, GPT-5.4, Qwen3-Coder, GLM-5, LiquidAI: the model release cadence has gone from monthly to daily. But the real signal isn't the quantity, it's the convergence: every model is optimizing for the same thing, agentic coding workflows. Anthropic's Sonnet targets the developer workhorse tier. OpenAI's GPT-5.4 unifies reasoning and coding. Qwen and GLM are giving the open-weight community viable alternatives. And LiquidAI is betting that non-transformer architectures can compete at the edge. For builders, the strategic move is clear: abstract your model layer now. The provider that's best today won't be best in three weeks, and the switching costs should be near zero. Build for model portability, not model loyalty. Read more →
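"Abstract your model layer" can be as simple as one adapter registry. This is a minimal sketch with hypothetical names (not any real SDK's API): route every completion through a shared interface so swapping providers is a config change, not a rewrite.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ChatRequest:
    prompt: str
    model: str

# Each adapter translates the shared request into a provider-specific call.
# Real adapters would wrap the Anthropic / OpenAI / local-model SDKs; these
# stand-ins just echo the routing so the pattern is visible.
PROVIDERS: Dict[str, Callable[[ChatRequest], str]] = {
    "anthropic": lambda r: f"[anthropic:{r.model}] {r.prompt}",
    "openai":    lambda r: f"[openai:{r.model}] {r.prompt}",
}

def complete(provider: str, req: ChatRequest) -> str:
    # The call site never imports a vendor SDK directly
    return PROVIDERS[provider](req)

out = complete("anthropic", ChatRequest("hello", "sonnet-4.6"))
```

Switching to next week's best model then means changing the `provider` string (or an env var), and the rest of your codebase never notices.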
Until next time ✌️