My First China AI Article Turns Two Today: How the Consensus Finally Caught Up

David Borish
24 hours ago

Two years ago today, I wrote about Alibaba's Qwen-2-72B topping the Hugging Face Open LLM Leaderboard and argued that the comfortable assumption of a two-year Chinese lag in AI was already wrong. The response at the time ranged from indifference to accusations that the piece was alarmist. Six months later, DeepSeek released a reasoning model that matched OpenAI's o1 on major math benchmarks at a fraction of the training cost, and the Nasdaq lost close to a trillion dollars in tech valuation in a single day. Two years on from that first article, the pattern has played out a third time, and this one comes with a deadline attached.

On June 16, 2026, the Beijing-based lab Z.ai, formerly Zhipu AI, released GLM-5.2 under an MIT license with no usage restrictions and no regional locks. That was the same day the Trump administration's export control directive on Anthropic's Fable 5 and Mythos 5 took effect, barring foreign nationals, including those on Anthropic's own staff, from accessing either model. Anthropic determined it could not guarantee compliance with the order and pulled both models worldwide rather than risk violating it. Z.ai co-founder Tang Jie called the shutdown "deeply regrettable" in a post on X and used the moment to argue that frontier intelligence should be open and freely downloadable.

The timing gave GLM-5.2 an opening that benchmarks alone would not have. On Arena.ai's Code Arena, GLM-5.2 now ranks first among all currently sampled models, a position it holds partly because Fable 5 was removed from the leaderboard entirely once the export order hit. On Design Arena's website generation leaderboard, GLM-5.2 knocked Fable 5 from a top spot the Claude family had held for months. Independent analysis from Artificial Analysis puts the model ahead of Google's Gemini 3.1 Pro on agentic tasks, and its price, roughly $1.40 per million input tokens against $10 to $15 for Claude Opus 4.8, undercuts every closed US alternative by a wide margin.
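To put a number on that margin, here is a minimal Python sketch that works out the ratio implied by the prices quoted above. The per-million-token prices are the ones cited in this article; the 50-million-token workload is a made-up figure for illustration, not a measured usage number.

```python
# Input-token prices in USD per million tokens, as quoted in this article.
GLM_52_PRICE = 1.40
OPUS_48_PRICE_LOW = 10.00
OPUS_48_PRICE_HIGH = 15.00


def price_ratio(closed_price: float, open_price: float) -> float:
    """Return how many times more expensive the closed model is."""
    return closed_price / open_price


def input_cost(tokens: int, price_per_million: float) -> float:
    """Return the input cost in USD for a given number of tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload for illustration only: 50 million input tokens.
WORKLOAD_TOKENS = 50_000_000

low_ratio = price_ratio(OPUS_48_PRICE_LOW, GLM_52_PRICE)
high_ratio = price_ratio(OPUS_48_PRICE_HIGH, GLM_52_PRICE)

print(f"Closed model costs {low_ratio:.1f}x to {high_ratio:.1f}x more per input token")
print(f"GLM-5.2 input cost:         ${input_cost(WORKLOAD_TOKENS, GLM_52_PRICE):,.2f}")
print(f"Opus 4.8 input cost (low):  ${input_cost(WORKLOAD_TOKENS, OPUS_48_PRICE_LOW):,.2f}")
print(f"Opus 4.8 input cost (high): ${input_cost(WORKLOAD_TOKENS, OPUS_48_PRICE_HIGH):,.2f}")
```

On the quoted prices, the closed model comes out roughly seven to eleven times more expensive per input token. Output-token pricing is not given in this article, so the sketch leaves it out.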

The model still trails the closed frontier on the hardest long-horizon coding benchmarks, and researchers at the security firm Graphistry have raised the possibility that GLM-5.2 may be a distillation of GPT-5.5 and Opus 4.8 rather than an independent breakthrough, a claim Z.ai has not addressed publicly. Even accounting for that uncertainty, the general capability gap that once looked comfortable now looks like a matter of months.

What the Six-Month Number Actually Rests On

The six-month figure gets repeated so often now that it is worth tracing where it comes from. Kai-Fu Lee, the former head of Google China and founder of 01.AI, has been making a version of this argument since 2024, when he said China had closed the gap from six or seven years to six to nine months in roughly fourteen months of catch-up work. In a Capgemini interview earlier this year, Lee reaffirmed the same historical pattern, noting that Chinese labs are training models at under ten percent of the cost of their American counterparts while landing models that perform at ninety to ninety-five percent of US capability, consistently six to nine months behind.

Former Google CEO Eric Schmidt updated his own estimate at the AI+Expo for National Competitiveness in May 2026, saying that a year earlier he believed China was one to two years behind and now believes the gap has closed to within six months, which he called a nanosecond in this industry. That view lines up with Stanford's 2026 AI Index, which found the overall performance gap between the best US and Chinese frontier models had narrowed to roughly 2.7 percentage points on general benchmarks, though a meaningfully larger gap persists on the hardest reasoning evaluations designed to resist data contamination. Epoch AI's longitudinal tracking puts the average lag at about seven months since 2023.

Perplexity CEO Aravind Srinivas has pushed a more specific and more uncomfortable version of the argument on the 20VC podcast with Harry Stebbings. He puts the open-source-to-frontier gap at roughly twelve months and argues that US export controls, not a natural technology lag, are the main reason that gap exists at all. His argument is that by cutting China off from advanced chips, the restrictions pushed Chinese labs to become world-class at the physical layer of AI instead, building data centers faster than their US counterparts can. As he has put it, power is not a problem there, permits are not a problem, and labor is not a problem, and he warns that this dynamic could hand China a durable infrastructure advantage that outlasts any individual export rule.

The Founder Who Isn't Waiting for the Consensus

What makes the current moment different from past six-month estimates is that the company most directly positioned to close the remaining gap has told the world its own timeline, unprompted. When Elon Musk predicted on X that China would reach Fable 5-class capability "probably Q1" of next year, Tang Jie replied publicly that it "won't take that long." Axios separately reported that Tang Jie has said Z.ai will likely release an open-source model rivaling Fable before the end of this year. Z.ai's next model, GLM-5.5, is already slated for release in August, and the company has said it will target long-horizon, self-evolving autonomous agent systems as the next milestone.

If Tang Jie's public timeline holds even loosely, and if the historical six-to-nine-month lag that Lee, Schmidt, and Epoch AI all independently describe continues to apply, the math points toward a Fable-class, fully open-weight model landing as early as December, six months after Fable 5's June launch. That is not a certainty. Company founders have every incentive to talk up their own roadmap, benchmark claims from any single lab deserve the same skepticism whether they come from Beijing or San Francisco, and a model matching Fable 5 on capability benchmarks is not the same as matching it on every dimension that matters for security. But the direction of travel, and the fact that the target date now comes from the company building the model rather than from outside analysts guessing, is what separates this from earlier rounds of "China is catching up" commentary.
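The date arithmetic behind that estimate is simple enough to sketch in a few lines of Python. The sketch assumes that "Fable 5's June launch" means June 2026, which this article implies but does not state outright, and it works in whole months because no launch day is given. The `add_months` helper is a hypothetical name introduced here for illustration.

```python
def add_months(year: int, month: int, offset: int) -> tuple[int, int]:
    """Return the (year, month) that falls `offset` months after the input."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


# Assumption: the June launch is June 2026; the article gives no exact day.
FABLE_LAUNCH = (2026, 6)

# Lag range attributed in the article to Lee, Schmidt, and Epoch AI.
MIN_LAG_MONTHS = 6
MAX_LAG_MONTHS = 9

earliest = add_months(*FABLE_LAUNCH, MIN_LAG_MONTHS)
latest = add_months(*FABLE_LAUNCH, MAX_LAG_MONTHS)

print(f"Projected window: {earliest[0]}-{earliest[1]:02d} to {latest[0]}-{latest[1]:02d}")
# prints "Projected window: 2026-12 to 2027-03"
```

Under those assumptions the six-month end of the range lands in December 2026, in line with Tang Jie's end-of-year target, while the nine-month end lands in March 2027, inside the first quarter Musk named.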

Why an Open-Weight Version of This Changes the Threat Model

The part of this story that goes beyond a capability race is what security researchers are already finding in GLM-5.2 itself. Axios reported that two independent security evaluations, from Graphistry and Semgrep, found GLM-5.2 performing on par with leading US models on cybersecurity investigation and vulnerability discovery benchmarks. Graphistry called it the first open-weight model it has tested that it would recommend for what it described as a frontier-like cybersecurity experience. Hackers on Russian-language forums are reportedly already discussing how easily the model can be jailbroken for offensive work, and some have found that framing a malicious request as a defensive one, such as asking how to protect a company from brute-force attacks, is enough to get the model to comply.

The structural problem is that none of the usual safeguards apply to an open-weight release. The provider of a closed model like Claude or ChatGPT can detect and ban an account behaving suspiciously. An open-weight model downloaded and run locally has no equivalent kill switch, no telemetry back to the provider, and no way to stop someone from fine-tuning it against a specific target. Travis Lanham of the security firm Armadin told Axios that an attacker running GLM-5.2 locally can personalize an attack after breaching a system, chaining exploits and moving laterally the way an elite human attacker would, with zero visibility to any defender. Ransomware analyst Roye Bass noted that the same openness lets attackers build their own version of tools they previously had to buy from other criminals, generating phishing content and fraud scripts on demand.

None of that requires a future, more capable model to be a live concern. It describes what GLM-5.2 can already do today, fifteen days after release. What a Fable-class open-weight successor would add is the difference between a model that is comparable to current frontier systems and one that matches whatever Anthropic's most capable public model looks like by the time it arrives, distributed the same way, with the same absence of guardrails, and available to anyone with a GPU rather than to a set of accounts a provider can monitor and cut off.

The Pattern Underneath the Headlines

I did not expect the Hugging Face leaderboard story from two years ago to look prophetic. I did not expect DeepSeek's cost efficiency to trigger a market crash within six months of that. What connects all three of these moments, and what I have argued throughout the research behind my book, is that capability shows up first in constrained settings, a benchmark, a leaderboard, a sandboxed evaluation, well before anyone has decided whether the world is ready for it to operate in the open.

GLM-5.2 topping Arena.ai's coding leaderboard the same week it is generating jailbreak discussion on criminal forums is that pattern compressed into a single news cycle. The six-month gap between US and Chinese frontier capability is not just a research statistic at this point. It is closer to a countdown, and the company on the other end of it has already told us roughly when it expects to reach zero.
