Chinese Labs Flood the Open-Source Arena While DeepSeek Keeps Climbing
Week of 2026-03-02 to 2026-03-06
This was the week Chinese AI labs made their clearest statement yet: the open-weight frontier is theirs to contest. Three major model families dominated HuggingFace's charts, and the sheer breadth of releases — spanning reasoning, multimodal, coding, and even text-to-speech — signals a strategy shift from catching up to setting the pace.
1. Qwen3.5 Drops a Full Model Army — and It's Not Just About Size
Alibaba's Qwen team didn't release a model this week. They released an entire ecosystem. The Qwen3.5 family includes a 397B-parameter MoE flagship (with only 17B parameters active per token), a lightweight 35B-A3B multimodal variant, a coding-specialized Coder-Next, and — in a move nobody expected — a text-to-speech model. Combined, the family has already pulled over a million downloads on HuggingFace. (1,152 likes | 1.03M downloads)
The MoE architecture is the real story here. With only 17B parameters active per token, the 397B model generates at the compute cost of a dense model a fraction of its size, though the full weight set still has to fit in memory, which is exactly why the quantized builds matter: NVIDIA already dropped an FP4-quantized version, and the Unsloth community had GGUFs up within hours. The 35B-A3B variant — just 3B active parameters for multimodal image-text tasks — is targeting the edge deployment sweet spot that most Western labs have ignored.
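The arithmetic behind that tradeoff is worth making concrete. The back-of-envelope sketch below uses the parameter counts from the announcement (397B total, 17B active) plus two common rules of thumb that are our assumptions, not Qwen's published figures: roughly 2 FLOPs per active parameter per generated token, and 4 bits per weight for FP4.

```python
# Back-of-envelope MoE economics for a 397B-total / 17B-active model.
# Assumptions (not official Qwen numbers): ~2 FLOPs per active
# parameter per generated token; FP4 = 4 bits per weight.

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed just to hold the weights, in GB."""
    return params * bits_per_weight / 8 / 1e9

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

TOTAL, ACTIVE = 397e9, 17e9

# Memory scales with TOTAL parameters: every expert must be resident,
# so quantization is what makes the model fit anywhere.
bf16 = weight_memory_gb(TOTAL, 16)  # 794.0 GB
fp4 = weight_memory_gb(TOTAL, 4)    # 198.5 GB

# Compute scales with ACTIVE parameters: per-token cost matches a
# dense 17B model, not a dense 397B one.
moe_flops = flops_per_token(ACTIVE)
dense_flops = flops_per_token(TOTAL)

print(f"BF16 weights: {bf16:.1f} GB, FP4 weights: {fp4:.1f} GB")
print(f"Per-token compute vs dense 397B: {dense_flops / moe_flops:.1f}x less")
```

The asymmetry is the whole point: FP4 cuts the memory bill 4x versus BF16, while the sparse routing cuts the per-token compute bill by the total-to-active ratio.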
Why it matters: This isn't a single model competing on a single benchmark. It's a full-stack play: reasoning, vision, code, voice. Alibaba is positioning Qwen as the default open-weight foundation for builders who need coverage across modalities without stitching together five different model providers. For startups building multimodal products, this family might be the most practical release of the quarter.
What's next: Watch whether Qwen3.5-397B-A17B holds up on independent community evals. The architecture is compelling, but MoE routing quality varies wildly — and self-reported benchmarks from Chinese labs have been generous before.
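"Routing quality" is easier to evaluate once you see how little machinery it involves. In a sparse MoE layer, a small gating network scores every expert per token and only the top-k run; the sketch below is a minimal top-k softmax router with illustrative shapes, not Qwen's actual router or hyperparameters.

```python
import numpy as np

def topk_route(x, W_gate, k=2):
    """Top-k softmax gating: each token selects k of n_experts.

    x:      (tokens, d_model) token activations
    W_gate: (d_model, n_experts) router weights
    Returns (expert indices, renormalized gate weights) per token.
    """
    logits = x @ W_gate                           # (tokens, n_experts)
    idx = np.argsort(logits, axis=-1)[:, -k:]     # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)
    w = np.exp(top - top.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # gate weights sum to 1
    return idx, w

rng = np.random.default_rng(0)
tokens, d_model, n_experts = 4, 8, 16
x = rng.standard_normal((tokens, d_model))
W = rng.standard_normal((d_model, n_experts))

idx, w = topk_route(x, W, k=2)
# Each token activates only 2 of 16 experts; the other 14 cost nothing.
print(idx.shape, w.sum(axis=-1))
```

Everything interesting about MoE quality lives in how well those learned gate scores match tokens to experts, which is why it only shows up under independent evals, not in parameter counts.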
2. Moonshot AI's Kimi-K2.5 Quietly Racks Up 1.7 Million Downloads
While Qwen grabbed the headlines with breadth, Moonshot AI grabbed the download charts with depth. Kimi-K2.5, the latest multimodal model from the Beijing-based lab, crossed 1.7M downloads this week — making it one of the most downloaded new models on HuggingFace, period. (2,200 likes | 1.71M downloads)
Kimi-K2.5 builds on Moonshot's strength in long-context understanding and visual reasoning. The download-to-like ratio here is striking: 1.7M downloads against 2.2K likes suggests this model is being pulled into production pipelines, not just starred and forgotten. Developers are actually using it.
Why it matters: Moonshot AI has been the quieter of the Chinese frontier labs, overshadowed by DeepSeek's reasoning hype and Qwen's breadth. But raw adoption numbers don't lie. If Kimi-K2.5 is pulling 1.7M downloads in a week, it's solving a practical problem that existing models aren't — likely in multimodal workflows where vision quality matters more than chat fluency.
What's next: Moonshot has been hinting at an agent-focused release. If they pair Kimi-K2.5's multimodal capabilities with robust tool use, they could carve out a real niche in agentic vision tasks — think automated UI testing, document processing, and visual QA pipelines.
3. DeepSeek-R1 Won't Stop Growing — 13K Likes and Counting
DeepSeek-R1 isn't new. But it's still growing. The reasoning-focused model passed 13,000 likes this week, a number that puts it in rare company on HuggingFace, with download figures showing sustained adoption months after release. (13,096 likes | 928.3K downloads)
What's driving the continued traction? Community fine-tunes and distillations keep extending R1's reach into new use cases. The model's chain-of-thought approach to reasoning has become a de facto template for open-source reasoning research, with dozens of derivative models now referencing R1's architecture and training methodology.
Why it matters: In a week dominated by new releases, DeepSeek-R1's staying power is the contrarian signal. Most open-weight models spike and fade within two weeks. R1 is building a compounding ecosystem — and that's harder to replicate than a benchmark score. It suggests that reasoning-first architectures have genuine staying power, not just launch-day hype.
What's next: DeepSeek has been quiet about R2, but the sustained interest in R1 gives them pricing power and community leverage whenever they choose to drop the successor. The longer they wait, the more the ecosystem locks in around their approach.
4. Qwen3.5-35B-A3B: The Multimodal Model That Actually Fits on Your GPU
Buried in the Qwen3.5 family announcement is a model that deserves its own spotlight. Qwen3.5-35B-A3B is a 35B-parameter multimodal model with just 3B active parameters — meaning it handles image-text tasks at a fraction of the compute cost of comparable dense models. (846 likes | 680.5K downloads)
The "A3B" designation is the key detail. With only 3 billion parameters active per token, the model runs at the speed of a small dense model, and once quantized, the full 35B weight set fits on a single consumer GPU while still leveraging the knowledge encoded in all 35 billion parameters. For developers building vision-language applications — product image understanding, document parsing, visual question answering — this is a practical breakthrough in cost-per-query.
Why it matters: The multimodal gap between closed and open models has been one of the last remaining moats for API providers. A 3B-active multimodal model that performs competitively demolishes the economics argument for paying per-token to GPT-4o or Claude for vision tasks. If you're processing thousands of images per day, the cost difference is orders of magnitude.
What's next: Edge deployment is the obvious play. A 3B-active model is small enough for on-device inference on high-end phones and embedded systems. Expect to see Qwen3.5-35B-A3B show up in mobile apps before the quarter ends.
5. TensorFlow Resurges on GitHub Trending — But Why Now?
In a week dominated by new model releases, an unexpected name climbed GitHub's trending charts: TensorFlow. Google's original ML framework, which many had written off in favor of PyTorch, hit 193K stars and showed renewed activity. (193,955 stars | 75,215 forks)
The timing isn't random. Google has been quietly rebuilding TensorFlow's relevance through its TPU ecosystem and on-device ML story. With the explosion of smaller, efficient models (like the Qwen 3B-active variants above), TensorFlow's strengths in production deployment and mobile inference — TensorFlow Lite, TensorFlow.js — are suddenly relevant again. The framework never lost its edge in serving; it lost the research community. But if the industry is shifting from training to deployment, TensorFlow's bet may finally pay off.
Why it matters: Framework choice matters less for research prototypes and more for production systems. If you're deploying multimodal models to mobile devices or edge servers, TensorFlow's mature serving infrastructure is a genuine advantage over PyTorch's still-maturing deployment story.
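That deployment story centers on the TensorFlow Lite converter. As a minimal sketch of the export flow (a tiny stand-in Keras model, not a tuned deployment recipe), converting and quantizing a model for on-device inference looks like this:

```python
import tensorflow as tf

# A tiny stand-in model; the conversion path is the same for real ones.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Default optimizations apply post-training quantization, shrinking
# the flatbuffer for mobile and edge targets.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()  # serialized .tflite flatbuffer

print(f"TFLite model size: {len(tflite_bytes)} bytes")
```

The resulting bytes are what ships inside a mobile app and runs through the TFLite interpreter on-device, which is the serving edge the section above argues PyTorch hasn't matched yet.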
What's next: Watch for Google I/O. If Google pairs TensorFlow's deployment strengths with Gemma model optimizations, they could recapture a meaningful slice of the production ML workflow — even if PyTorch keeps the research crown.
Quick Takes
- NVIDIA Qwen3.5-397B-A17B-NVFP4: NVIDIA wasted no time quantizing Qwen's flagship to FP4 — making the 397B MoE model runnable on significantly less hardware. If you were waiting for a practical way to self-host a frontier-class model, this is it. Link
- Unsloth Qwen3.5-27B-GGUF: The Unsloth community delivered GGUF quantizations for local inference within hours of the Qwen release. Open-source velocity at its finest. Link
- Qwen Coder-Next: Alibaba's coding-specialized variant targets the Copilot and Cursor use case directly. Early reports suggest strong performance on multi-file editing tasks, but independent benchmarks are still pending. Link
That's the week in AI. Subscribe to AI News to get daily briefings.