Will China Surpass the US in 2025?



This year could be the year that China surpasses the US in AI.

Worse, prominent figures in the space are convinced this will happen soon. For instance, Clement Delangue, CEO of Hugging Face, the leading open-source AI company, has predicted the inevitable ‘sorpasso’.

The panic is so real that recent US Government reports suggest a Manhattan Project-type effort to develop AGI before China does, signaling that losing the AI race is now viewed as seriously as losing the race to build nuclear weapons, adding a new dimension to the growing Cold War between the two superpowers.

But is this just based on vibes or real data?

Today, we're delving into the AI Cold War between the US and China to see whether the predictions are based on facts or fearmongering. We will study the war from all crucial angles (data, compute, and technology), unveil the historical precedent (which does not look good for the US), and reveal China’s AI strategy through the key strategic levers designed to make this precise vision a reality.

How Each Superpower Stands on Each Pillar

As you probably know by now, AI has three pillars: data, compute, and technology (the algorithms). However, if we view AI as a matter of strategic importance to any country, we must also include government readiness.

And how strong or weak are both superpowers in each category?

The Data & Government Readiness Pillar

Without data, there’s no AI. And with garbage data, the resulting AI will be garbage, too, so high-quality data is the most important asset.

Western think tanks like Oxford Analytics rank the US ahead of China in data availability in their annual AI government readiness report.

The researchers examine the availability of high-quality and representative data, which is crucial for avoiding bias and error.

  • On the one hand, the report portrays the US as a country with enviable access to high-quality data, citing the clear overrepresentation of English, the most commonly used language in training data, as a key advantage.

  • On the other hand, China is shown as a country that, due to its obvious lack of press freedom and free speech, has less access to its people's thoughts.

The amount of Chinese text available is indeed considerably smaller than English text: Chinese represents just over 5% of the data in CommonCrawl, the largest public dataset of Internet data and one of the most commonly used datasets for training AI models, while English represents around 46%, a stark contrast.
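If you’re curious how such numbers are estimated, here’s a minimal sketch of measuring a corpus’s language mix with fastText’s publicly available lid.176 language-identification model; the `docs.txt` sample file is a placeholder, not part of CommonCrawl’s actual tooling.

```python
# Minimal sketch: estimate the language mix of a sample of web documents.
# Assumes fastText's public language-ID model (lid.176.bin) has been downloaded
# and that docs.txt (one document per line) is an illustrative local sample.
from collections import Counter

import fasttext

model = fasttext.load_model("lid.176.bin")  # pretrained language-ID model

counts = Counter()
with open("docs.txt", encoding="utf-8") as f:
    for line in f:
        text = line.strip()
        if not text:
            continue
        labels, _ = model.predict(text.replace("\n", " "))
        lang = labels[0].replace("__label__", "")  # e.g. 'en', 'zh'
        counts[lang] += 1

total = sum(counts.values())
for lang, n in counts.most_common(10):
    print(f"{lang}: {100 * n / total:.1f}%")  # rough per-language share of documents
```

Language shares in web-scale datasets are typically reported from exactly this kind of per-document tally, though the specific classifier varies.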

However, in my opinion, the report is heavily ‘Western-biased,’ not because it gives higher scores to Western countries, but because it focuses too much on things like data privacy, which, although very important, are of little relevance when arguing over which country can take the lead in the AI Cold War (the end justifies the means).

If we are talking about raw access to data to achieve the best possible outcomes, things might not be as straightforward, because we also need to factor in synthetic data. Today, AI-generated data represents an ever-growing share of total AI training data.

In other words, China can—and does—close this data gap by simply creating entire training datasets: sampling frontier AI models like GPT-4o or Claude 3.5 Sonnet and using that data to train its own frontier models.
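As a rough illustration of what ‘sampling a frontier model’ means in practice, here’s a minimal sketch of synthetic data generation by distillation, assuming the OpenAI Python client; the seed prompts, model name, and output file are placeholders, not a description of any lab’s actual pipeline.

```python
# Minimal sketch: build a synthetic fine-tuning dataset by sampling a frontier model.
# Assumes the OpenAI Python client and an illustrative list of seed prompts;
# real pipelines add filtering, deduplication, and quality scoring on top.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_prompts = [
    "Explain the difference between supervised and reinforcement learning.",
    "Write a Python function that merges two sorted lists.",
]

with open("synthetic_sft.jsonl", "w", encoding="utf-8") as out:
    for prompt in seed_prompts:
        response = client.chat.completions.create(
            model="gpt-4o",  # the 'teacher' frontier model being sampled
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        answer = response.choices[0].message.content
        # One prompt/response pair per line, in a typical instruction-tuning format.
        out.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

The resulting JSONL file is what a smaller ‘student’ model would then be fine-tuned on.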

We also know that post-training data, created by humans to refine the performance of AI models further, has gained asymmetric importance lately despite being smaller than the pre-training data (the massive datasets used to train the base AI model). AI labs have to curate these datasets themselves, and they require extensive human effort from top experts in their fields, making them wildly expensive.

Here, China has an edge, with two factors in its favor:

  1. Access to cheap expert data. Even at the PhD level (the cohort of contractors AI labs prefer for assembling these datasets), Chinese wages are much, much lower.

  2. Talent pool. While the US can claim to have the largest number of top researchers, we can’t forget that most of these AI researchers are, in fact, Chinese or Indian. If we look at country of origin, China produces almost a third of the world’s top AI researchers, according to the Financial Times, a point also echoed by the New York Times.

Chinese researchers play a crucial role in AI.

The Compute Pillar

Next, we have compute. While data and algorithms are great, without compute you aren’t building—or serving—your AI models.

This pillar seems to be the most straightforward, but it’s tricky, to say the least. At first glance, regarding access to compute, the US seems head and shoulders above China.

The US has a prominent lead in chip design, with companies like NVIDIA, Qualcomm, AMD, Groq, Apple, and Amazon, among others. Many of these companies are already well into the 4-nanometer process node and below (a measure of the chip’s compute density, based on the size of its transistors; the smaller the node, the more transistors you can fit per unit of chip area), while China is still—officially at least—in the 7nm range through Huawei’s designs.
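For a back-of-the-envelope sense of why the node number matters, here’s a tiny sketch under the naive assumption that transistor density scales with the inverse square of the nominal node size; in reality, modern node names are partly marketing labels, so actual density gains per node are smaller than this math suggests.

```python
# Back-of-the-envelope sketch: idealized transistor-density scaling between process nodes.
# Assumes density ~ 1 / (node size)^2, a simplification; real node names are partly
# marketing terms, so actual per-node gains are smaller than this suggests.
def relative_density(old_node_nm: float, new_node_nm: float) -> float:
    """How many times more transistors fit per unit area, under the naive 1/L^2 model."""
    return (old_node_nm / new_node_nm) ** 2

print(relative_density(7, 4))  # ~3.1x going from a '7nm' to a '4nm' class node (idealized)
```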

However, while the US leads in design, it relies on Taiwan for manufacturing (Taiwanese companies account for over 60% of total chip production and more than 90% of advanced chips).

As the US lacks direct control over chip manufacturing, it has used export controls to effectively ban China from accessing state-of-the-art GPUs (although smuggling of this hardware into mainland China has happened), prohibiting Taiwanese companies like TSMC from manufacturing the chips that Chinese companies like Huawei design.

This has forced Huawei to rely on SMIC (Semiconductor Manufacturing International Corporation), a Chinese foundry that is nowhere close to TSMC in quality or production capacity, to build its chips.

From this perspective, it seems that the US can choke China indefinitely, but two nuances quickly emerge to suggest otherwise:

  1. The People’s Republic of China (mainland China) has been clear about its intention to invade Taiwan (the Republic of China), which it considers part of ‘One China.’

  2. China is the world’s biggest producer of the rare earths and other critical materials needed to build chips, so it has an equally strong grip on the semiconductor supply chain.

All things considered, the fight for compute represents the main battle in the AI Cold War, with the US recently escalating its restrictions on China (this time on HBM memory chips, another critical GPU component, mainly sourced from South Korea, another US ally) and China retaliating by restricting exports of critical materials like gallium, germanium, and antimony.

The problem for the US is that time is running out. ASML, the world’s leading manufacturer of EUV lithography equipment, an essential link in the chip supply chain, already treats the outcome as inevitable in its business forecasts: China will become semiconductor-independent sooner rather than later, which is exactly what the US wants to prevent.

For instance, many insiders firmly believe that Huawei’s latest chip, the Ascend 910C, is on par with the Blackwell B20, the cut-down version of NVIDIA’s new GPU platform. Moreover, according to Huawei itself, the chip is ‘as good’ as NVIDIA’s H100, the current SOTA GPU. If this is true (with Chinese propaganda, you never know), China can already self-manufacture GPUs that are only one generation behind best-in-class.

Moving on, we get to the pillar that has Silicon Valley's incumbents most worried: Are Chinese LLMs/LRMs taking the lead over the multi-billion-dollar efforts of OpenAI, Anthropic, and Google?

The Algorithmic Pillar, the Holy Moat That is No More?

Not so long ago, when GPT-4 launched in March 2023, China was considered years behind US companies. Chinese models were rarely discussed, and some of today’s most important Chinese AI companies, like 01.ai, were just being born.

Today, several Chinese models paint a completely different picture at the Large Language Model (LLM) level:

  • Yi Lightning, an LLM from 01.ai, sits only behind GPT-4o, o1, Grok-2, and Gemini in the overall LMSYS Chatbot Arena leaderboard, ahead even of Claude 3.5 Sonnet (New) (informally known as Claude 3.6). That makes this Chinese model a top-5 LLM overall.

  • GLM-4 Plus, from Zhipu AI, is almost a top-10 model, ahead of all Llama 3.1 and 3.2 models (although Llama 3.3 is yet to be classified at the time of writing).

  • In the Aider benchmark, considered the best for coding tasks, Alibaba's Qwen 2.5 Coder 32B cracks the top five, surpassing GPT-4o. DeepSeek Coder V2, from another Chinese AI lab, DeepSeek, is also ahead of GPT-4o.

  • Tencent, another Chinese megacorp, recently released a fully open-source video generation model, HunyuanVideo, that is on par with, if not better than, state-of-the-art models like OpenAI’s Sora or Google’s Veo. Their Hunyuan-Large LLM is also fully open-source and performs better in benchmarks than Llama 3.1 405B, considered the best open-source model (again, Llama 3.3’s results are pending).

But if you think this is impressive, brace yourself, as things have gotten even crazier (and scarier) in the large reasoner model (LRM) category, to the point that some firmly believe China is already ahead.

And they have plenty of reasons for it.
