Anthropic Exposes Industrial-Scale Model Distillation Attacks by DeepSeek, Moonshot AI, and MiniMax

Anthropic just dropped a bombshell: three AI labs — DeepSeek, Moonshot AI, and MiniMax — orchestrated a massive, coordinated effort to extract Claude's capabilities through model distillation. The operation involved over 24,000 fraudulent accounts and more than 16 million exchanges with Claude, all designed to systematically harvest its outputs for training competing models. This isn't casual API misuse — it's industrial-scale intellectual property extraction, and it raises urgent questions about how frontier AI labs protect their models in an increasingly competitive landscape.

What Happened

Anthropic disclosed on X that its security teams identified a coordinated distillation campaign targeting Claude across multiple product surfaces. The three labs — all China-linked AI companies — created networks of fake accounts totaling over 24,000, then systematically queried Claude to generate training data for their own models.

The scale is staggering: 16 million exchanges represents an enormous corpus of high-quality model outputs. At typical conversation lengths, that's potentially billions of tokens of Claude-generated text, carefully curated across diverse tasks and domains. The attackers weren't just chatting — they were running structured extraction pipelines designed to capture Claude's reasoning patterns, coding abilities, and instruction-following behavior.

DeepSeek has been one of the most talked-about AI labs in the past year, gaining attention for producing competitive models at reportedly lower costs. Moonshot AI, the company behind Kimi, has raised billions in funding and competes aggressively in the Chinese LLM market. MiniMax operates the Hailuo AI platform and has similarly positioned itself as a serious contender in the foundation model space.

All three labs have released models that compete directly with Claude and GPT-4 class systems. The distillation attacks suggest at least part of their capability gains may have been bootstrapped from Anthropic's own technology.

Why It Matters

Model distillation — training a smaller or different model on the outputs of a stronger model — has always existed in a gray area. Researchers do it openly in academic settings. But doing it at industrial scale, through fraudulent accounts, against a commercial API's terms of service, crosses a clear line from research technique to intellectual property theft.

The economic implications are significant. Anthropic has invested billions in compute, research talent, and safety work to build Claude. If competitors can extract a meaningful fraction of those capabilities through API queries costing a few hundred thousand dollars, the incentive structure for frontier AI research breaks down. Why spend $500 million on a training run when you can distill the results for $500,000?

This also exposes a fundamental asymmetry in the AI industry. Companies that offer public API access — Anthropic, OpenAI, Google — are inherently vulnerable to distillation. Labs that keep their models behind closed interfaces or don't offer commercial APIs face no such risk. The openness that makes these models useful also makes them targets.

For the broader AI ecosystem, this incident validates long-standing concerns about model theft at scale. OpenAI previously accused DeepSeek of similar distillation practices. With Anthropic now independently confirming attacks from the same actor (plus two additional labs), a pattern is emerging that the industry will need to address collectively.

The timing matters, too. This disclosure comes during a period of heightened US-China tensions around AI technology, adding a geopolitical dimension to what might otherwise be framed as a commercial dispute.

Technical Deep-Dive

Distillation attacks at this scale require significant operational sophistication. Creating 24,000 fake accounts means bypassing identity verification, managing payment methods, rotating IP addresses, and coordinating query patterns to avoid detection.

The extraction methodology likely involved several techniques:

Structured prompting: Systematically querying Claude across thousands of task categories — coding, math, reasoning, creative writing, instruction following — to build a comprehensive training dataset. Each response becomes a labeled example: prompt → high-quality completion.

Chain-of-thought harvesting: Eliciting Claude's step-by-step reasoning, which is particularly valuable for training models to replicate complex problem-solving behavior. Claude's reasoning traces contain implicit knowledge about how to decompose and solve problems.

Capability probing: Testing Claude's performance boundaries across domains to identify where it excels, then focusing extraction efforts on those high-value areas.

The defense challenge is real. Distinguishing between a legitimate power user making thousands of API calls and a distillation bot running automated extractions is non-trivial. Both generate high volumes of diverse queries. Detection likely relies on behavioral fingerprinting — identifying patterns in query timing, topic distribution, prompt structure, and account creation metadata that distinguish organic usage from systematic extraction.

Anthropic hasn't disclosed its specific detection methods, which is prudent — revealing them would help attackers adapt. But the 24,000-account figure suggests the attackers distributed their queries thinly across many accounts, attempting to stay under per-account rate limits and anomaly detection thresholds.

One open question: how much capability can actually be transferred through distillation alone? Research suggests that while distillation can replicate surface-level behaviors effectively, deeper capabilities like robust reasoning under distribution shift are harder to capture. The distilled model often appears capable on standard benchmarks but fails on novel, out-of-distribution problems. Still, 16 million exchanges provide a substantial training signal.

What You Should Do

If you're building on Claude's API: Review your own terms of service and output usage policies. Your customers may be doing the same thing to your product's outputs at a smaller scale.
If you're evaluating AI models: Factor in provenance. A model that achieves benchmark parity through distillation may lack the robustness and generalization of the source model. Test on your own tasks, not just public benchmarks.
If you're in AI policy: This case strengthens the argument for legal frameworks around model output ownership and automated extraction. Current IP law is poorly equipped for this scenario.
If you're at a frontier lab: Invest in output watermarking and behavioral fingerprinting. The arms race between distillation attackers and defenders is just beginning.

Related: Today's newsletter covers this and other Anthropic developments. See also: Claude vs DeepSeek.

Found this useful? Subscribe to AI News for daily AI briefings.