The Distillation Problem: How Chinese Labs Have Been Harvesting American AI and What the Record Shows

David Borish
10 hours ago


On May 14, Anthropic released a paper describing what it believes global AI leadership will look like in 2028, depending on whether the US government acts on two specific vulnerabilities in the current export control regime. The paper is not a threat assessment in the traditional national security sense. It is closer to a policy brief with a clear recommendation: close the loopholes now, while the opportunity to do so still exists.

The paper arrives alongside Anthropic's release of Mythos Preview, described as a model capable of autonomously discovering and chaining software vulnerabilities. Mozilla used it to fix more security bugs last month than it had in all of 2025, nearly 20 times its monthly average for the prior year.

One Chinese cybersecurity analyst, quoted in the paper, described the capability gap this way: China is "still sharpening our swords while the other side has suddenly mounted a fully automatic Gatling gun." Anthropic cites this as evidence that the capability acceleration it has been projecting is now arriving faster than policymakers are moving.

The Record on China's Rise

The timeline this paper inhabits did not begin in 2026.

In April 2024, I published "The Geopolitics of AI," arguing that the Microsoft-G42 deal signaled a deepening entanglement between commercial AI interests and national security, and that smaller nations would increasingly be forced to choose between US and Chinese AI spheres of influence. In July 2024, I published "China's Recent AI Surge Challenges US Dominance: A Wake-Up Call for the West," documenting how Chinese models, particularly Alibaba's Qwen series, were climbing international benchmarks in ways that US commentary was largely ignoring. That piece was met with skepticism and, in some cases, dismissed as overstating the threat. By August 2024, a follow-up article documented China's continued progress, with Alibaba's Qwen2-VL outperforming GPT-4V on multimodal tasks.

Then in January 2025, DeepSeek's R1 model landed like a financial shock. The model matched or exceeded OpenAI's o1 across key benchmarks at roughly 95% less cost, triggering a market correction that wiped approximately $1 trillion from tech valuations in a single day. At the time, I described the experience in a piece titled "The Cassandra of AI: From Ignored Warnings to China's DeepSeek Dominance," invoking the Greek mythological figure who made true prophecies that no one believed until the damage was already done.

Anthropic's paper now adds a layer to that story that changes its meaning.

The company argues that the apparent speed of Chinese AI advances has been "incorrectly taken as evidence that export controls are ineffective." In reality, the paper contends, those advances depend in significant part on capabilities extracted from American frontier models through systematic distillation attacks.

If Anthropic is right, the trillion-dollar market panic of January 2025 was, at least partially, a reaction to capabilities that had been harvested from the very companies whose stock prices were cratering. The DeepSeek Shock was real. The source of DeepSeek's capability, according to Anthropic, was more complicated than the headlines suggested.

This reframing matters for how to read both the past two years and Anthropic's forward-looking scenarios. Chinese labs were not simply out-innovating US labs on a level playing field constrained only by chip access. They were also systematically extracting value from the American research base while the policy debate centered almost entirely on semiconductor supply chains.

The Compute Argument

The paper's central thesis is that compute, meaning the advanced semiconductors required to train and deploy frontier models, is the single most important input in AI development. US and allied companies, including NVIDIA, AMD, TSMC, ASML, Samsung, and Micron, have built the world's most advanced chips, and export controls implemented across three presidential administrations have restricted China's access to them. In the paper's account, those controls have worked: Chinese labs have remained close to the frontier not because they have equivalent compute, but because they have found ways around the restrictions.

The compute gap the paper describes is significant. An analysis of Huawei and NVIDIA roadmaps cited in the paper found that Huawei will produce roughly 4% of NVIDIA's total compute capacity in 2026, and 2% in 2027. If current restrictions hold and loopholes close, one study estimates the US would have access to approximately 11 times more compute than China's AI sector. China's chipmakers remain blocked from extreme ultraviolet lithography equipment and cannot manufacture high-bandwidth memory at scale.
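To make the Huawei figures concrete, here is a minimal sketch that inverts the quoted shares into multiples. The percentages are the ones cited above; the variable and function names are my own and purely illustrative. The 11x estimate is a separate measure (total US compute versus China's AI sector) and is not derived here.

```python
# Huawei's output as a percentage of NVIDIA's, as quoted in the paper.
huawei_share_percent = {2026: 4, 2027: 2}


def nvidia_multiple(share_percent: int) -> float:
    """Return how many times larger NVIDIA's output is, given Huawei's share of it."""
    return 100 / share_percent


# Invert each year's share into a multiple.
multiples = {year: nvidia_multiple(pct) for year, pct in huawei_share_percent.items()}

for year, multiple in multiples.items():
    print(f"{year}: NVIDIA produces about {multiple:.0f}x Huawei's compute")
```

On those numbers, the gap between the two chipmakers doubles in a single year rather than narrowing.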

Chinese AI executives have acknowledged the pressure publicly, with leaders at top labs expressing concern about falling further behind because of compute constraints. One hyperscaler executive called the impact of US chip supply restrictions "huge, really huge," adding that any supply gap severely impacts development and dismissing the idea that importing US chips would slow self-sufficiency efforts. The paper notes that CCP officials and state media are the primary voices dismissing the effectiveness of export controls, a pattern it suggests reflects an effort to influence US policymakers rather than an accurate technical assessment.

Two Loopholes

The paper identifies two channels through which Chinese labs have partially overcome their compute disadvantage.

The first is illicit compute access. Federal prosecutors charged a Supermicro co-founder with diverting $2.5 billion in servers containing advanced US chips to China. US government and media reports indicate DeepSeek trained its most recent model on advanced chips that are banned from sale in China. The Financial Times reported that Alibaba and ByteDance now train flagship models on export-controlled US chips housed in Southeast Asian data centers, a route current law does not cover because export controls govern the sale of chips, not remote access to them. A House bill passed 369-to-22 in January 2026 to close that loophole but has not cleared the Senate.

The second is distillation attacks. The practice involves creating thousands of fraudulent accounts to access US frontier models, then systematically harvesting their outputs to replicate their capabilities at a fraction of the cost. The paper describes this as systematic industrial espionage. A state-owned Chinese media outlet described distillation as the "back door" that Chinese labs depend on as a core part of their business model. An ex-ByteDance researcher described it as a shortcut that allows labs to skip building their own data pipelines. The White House Office of Science and Technology Policy published a memo on the subject in April 2026. Legislation to address distillation attacks cleared the House Foreign Affairs Committee unanimously.

Anthropic's paper argues that closing both channels could lock in a 12-to-24-month lead in frontier capabilities by 2028. A lead that size creates more favorable conditions for engaging with Chinese AI researchers on safety and governance, the paper notes, because the US would maintain sufficient leverage to set the terms of that conversation.

What 2028 Looks Like Under Each Scenario

In the first scenario, US labs hold a 12-to-24-month model capability lead. When American labs release models with step-function capability advances in 2028, similar in relative impact to Mythos Preview in April 2026, Chinese labs would not have access to comparable capabilities until 2029 or 2030. American AI infrastructure becomes the backbone of the global economy. China's AI firms do not compete for global market share outside a narrow group of autocracies. Democratic values shape the rules and norms governing deployment.

In the second scenario, Chinese labs are a few months behind US labs rather than one to two years. Near-frontier models are deployed at scale across the Chinese economy and military. Huawei and Alibaba data centers expand globally, particularly in the Global South, running older chips at lower cost. The CCP's cyber capabilities are augmented by near-frontier AI for vulnerability discovery and exploitation. The paper describes this as a world where "democracies enjoy no security advantages over China in AI, despite having developed the technology first."

The paper frames 2026 as the breakaway opportunity, pointing to the capital advantage, compute lead, and model capability lead that US labs hold today.

The Safety Dimension

One section of the paper receives less attention than the geopolitical competition framing but carries practical weight. Anthropic argues that a neck-and-neck race would make safety investment harder across the entire industry, because competitive pressure accelerates release timelines and makes governments reluctant to impose governance requirements.

The paper cites data from Concordia AI's 2025 State of AI Safety in China report: only 3 of 13 top Chinese labs published any safety evaluation results, and none disclosed evaluations for chemical, biological, radiological, or nuclear risks. The Center for AI Standards and Innovation found that DeepSeek's R1-0528 model complied with 94% of overtly malicious requests under a common jailbreaking technique, compared to 8% for US reference models. An independent assessment of Moonshot's Kimi K2.5, published in April 2026, found the model failed to refuse CBRN-related requests at substantially higher rates than US frontier models.
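The size of those gaps is easier to see with the arithmetic spelled out. The sketch below uses only the figures quoted above; the variable names are mine, and the calculation is an illustration rather than anything taken from the paper or the underlying reports.

```python
# Concordia AI figures: Chinese labs publishing any safety evaluation results.
labs_publishing = 3
labs_surveyed = 13
publish_rate_percent = labs_publishing / labs_surveyed * 100

# CAISI figures: share of overtly malicious requests complied with under a jailbreak.
deepseek_compliance_percent = 94
us_reference_compliance_percent = 8

# Two ways to express the same gap: as a ratio and in percentage points.
compliance_ratio = deepseek_compliance_percent / us_reference_compliance_percent
gap_points = deepseek_compliance_percent - us_reference_compliance_percent

print(f"Labs publishing safety results: {publish_rate_percent:.0f}%")
print(f"Compliance ratio, DeepSeek vs US reference models: {compliance_ratio:.2f}x")
print(f"Compliance gap: {gap_points} percentage points")
```

Put differently, fewer than one in four of the surveyed labs published any results, and the jailbroken DeepSeek model complied with malicious requests nearly twelve times as often as the US reference models.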

The Policy Ask

Anthropic's three recommendations are specific. First, close loopholes in compute access, including chip smuggling, remote access through foreign data centers, and gaps in semiconductor manufacturing equipment controls, while increasing enforcement funding. Second, restrict model access and deter distillation attacks through legislation clarifying their illegality and facilitating threat intelligence sharing between US labs and the government. Third, accelerate the export of American AI infrastructure to lock in trusted democratic AI as the foundation of the global economy before Chinese hardware gains more footholds in developing markets.

The paper is careful to distinguish the CCP as the object of concern from the Chinese people and the Chinese AI research community, a distinction it states explicitly. It also expresses support for international AI safety dialogue with researchers in China, framing productive engagement as more likely when the US maintains a substantial capability lead.

What the Record Shows

When I began writing about China's AI trajectory in mid-2024, the dominant US commentary frame treated China as a secondary player, constrained by chip access and unlikely to close the gap. The July 2024 piece arguing otherwise was not warmly received. By January 2025, the market had absorbed the DeepSeek Shock in real time.

Anthropic's paper, written in May 2026, now argues that the speed of that shock was itself partially a product of industrial-scale capability extraction from American models. The compute restrictions that critics called ineffective were, in the paper's framing, working well enough that Chinese labs needed to steal what they could not build. That is a different story from either the "export controls are failing" narrative or the "China is just innovating faster" narrative that dominated the post-DeepSeek discourse.

The paper's underlying argument is that the US advantage is more durable than the benchmark headlines suggest and more fragile than the policy response has assumed. Both things are true simultaneously. The window to act on that combination, in Anthropic's view, is 2026.
