Silicon Tears, A Preventable Crash, New Rounds, & More

In partnership with

An entirely new way to present ideas

Gamma’s AI creates beautiful presentations, websites, and more. No design or coding skills required. Try it free today.

THEWHITEBOX
TLDR;

This week, you’ll understand why DeepSeek’s cost efficiencies are massively overblown {🥵 OpenAI Accuses DeepSeek of Stealing} but also how SV incumbents are nevertheless panicking and openly asking the US Gov to save them {🌏 Anthropic CEO Calls For Cold War Escalation}.

We will expose the hypocrisy some tech tycoons have regarding AI safety while actively helping the DoD make war with AI {🫣 Pentagon Claims AI Speeding its ‘Kill Chain’}, and I’ll also give you my reasons as to why OpenAI’s latest valuation feels very hard to justify {🌚 OpenAI Eyeing a $300 Billion Valuation}.

Finally, we will see how the open-source community, both Chinese and from the US, continue to push the entire space forward {📦 Alibaba Ups the Ante} and how, with just $30, you can prove that DeepSeek’s recipe just works {🤩 Building Zero at Home for $30}.

Enjoy!

CHINA
OpenAI Accuses DeepSeek of Stealing

The word is out. OpenAI claims to have proof that DeepSeek used distillation to train V3, which would explain the extremely low training budget. David Sacks, the Trump Administration’s AI and Crypto Czar, made this very explicit accusation on live TV based on OpenAI’s alleged proof.

TheWhiteBox’s takeaway:

How does one word, distillation, change the entire picture? NVIDIA’s crash can be summarized as: 'China has built a model as good as Western ones for 20x less money, so GPU overspending is finally proven. Sell NVIDIA!'

First and foremost, even if distillation weren't in the picture, that statement would still be false due to inference-time compute, but I digress. More importantly, this turn of events takes the opinions of the AI influencers and investors who compared both models to a new level of public embarrassment, because it was an apples-to-oranges comparison all along.

But why? And what is distillation?

Also known as the teacher-student method, distillation introduces a new term into the loss formula (KL Divergence) that 'forces' the model (the student) to imitate the teacher's behavior by cloning its output distribution (accessible through the OpenAI API's top_logprobs parameter) while still learning to predict the next word in a sequence (the standard cross-entropy objective).
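To make that concrete, here is a minimal PyTorch sketch of such a combined objective; the shapes, the 50/50 weighting, and the temperature are illustrative assumptions, not DeepSeek's actual setup:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    """Standard next-token loss plus a KL term that pulls the student's
    output distribution toward the teacher's.

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    targets: (batch, seq_len) ground-truth next-token ids
    alpha: balance between imitating the teacher and plain prediction
    T: softmax temperature; higher values expose more of the tail
    """
    # Cross-entropy: keep learning to predict the next word.
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        targets.reshape(-1),
    )
    # KL divergence: clone the teacher's full output distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * kl + (1 - alpha) * ce
```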

And what does all that even mean?

As you can see below, we can access the 'top k most likely words' at every step of the sequence. What distillation ensures is that, to use an illustrative example, 'measuring' is not only chosen by v3 whenever ChatGPT is likely to choose it, but also that 'standing' comes second, 'with' third, and so on, with probabilities as close to the teacher's as possible.
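For reference, this is roughly what accessing those distributions looks like through OpenAI's Python SDK (the prompt and model name are placeholders; the logprobs and top_logprobs parameters are the real ones):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "The scientist was"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,  # ask for the 5 most likely tokens at each step
)

# Each generated token comes with its top-k alternatives and their
# log-probabilities: exactly the distribution a student model is
# trained to imitate during distillation.
for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    print(candidate.token, candidate.logprob)
```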



In other words, it teaches the student model to handle tasks like the teacher would, to generate sequences 'in the style' of the teacher model. And why is this a training shortcut?

Think of it this way: comparing the training of v3 with GPT-4 is like comparing the learning curve of a student who learns entirely on their own (GPT-4) with that of a student (v3) who has a teacher guiding them (ironically, GPT-4 itself).

Thus, while DeepSeek's results are remarkable and clearly innovative (especially regarding memory management during inference), you cannot train a GPT-4-level model with just $6 million without resorting to tricks like distillation.

You just can’t with today’s hardware.

Entertaining this possibility is even more embarrassing when you realize most of these "experts" are confusing DeepSeek's V3 and R1 models, as if R1 hadn't endured a massive GRPO (Group Relative Policy Optimization) training run on top of V3 (the $6 million model). They also conveniently ignored other costs: dataset building, human rejection sampling, ablation studies, failed runs, labor costs, the Zero model... heck, even having the data center to run the GPUs is part of the total cost of ownership (TCO).

Funnily enough, all these AI influencers had to do was read the paper, as DeepSeek researchers were clear on this: “Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.”

But does this invalidate DeepSeek's results?

No. DeepSeek R1 is still a clear blow to OpenAI, Anthropic, and Google, which approached reasoning models as a system of various components (generators, verifiers, Monte Carlo search, symbolic engines, and so on). They may be overspending in that regard.

However, DeepSeek's results do not attack GPU spending. In fact, they tell everyone to focus solely on ways to unlock more compute, the most prominent bullish signal for NVIDIA you could ever wish for if you were Jensen.

On the topic of DeepSeek R1, Jay Alammar has released an insightful and visual explanation of how R1 was trained. Please feel free to check it here.

CHINA
Anthropic CEO Calls for Cold War Escalation

Dario Amodei, Anthropic’s CEO, has published a blog post laying out his thoughts on the DeepSeek topic. In essence, he claims Chinese models are not on par with Western ones, putting them at least seven months behind. Yet he demands much stricter GPU export controls to prevent these companies from competing, as he considers compute the key to determining who prevails.

TheWhiteBox’s takeaway:

If I were an Anthropic investor, I would be very worried because this blog post is just pathetic and should raise all the alarms about the CEO’s confidence in their company's future without the power of the US Government behind it.

In simple terms, while he officially says that we need to increase GPU export controls to prevent China from beating the US, what he’s really saying is: "As long as China manages to release models almost as good as ours for free, we don’t have a business model."

But here’s the thing: I could not care less. It’s not my problem that you can’t make money from your models when you decided to closed-source them in the first place. US companies should be much more thoughtful about their AI strategy and, like Meta, should always release their models in open-source form, so that the US open-source ecosystem flourishes and China stops being the industry’s main source of innovation, which it unequivocally is right now and which is, of course, the main driver behind China’s progress.

That’s not China’s fault, my friend, that’s on you.

You can’t beat open-source. And if you’re an Anthropic investor, congratulations: you have invested in a company whose value, in the words of its own CEO, is 95% access to compute.

In a way, you are paying a huge markup for a bunch of GPUs with an Anthropic sticker. Instead, what Anthropic should be doing (and so should OpenAI) is releasing their models in open-source form and luring people into their platform, creating goodwill around them.

However, they are playing a game they can’t win without regulatory capture or geopolitics, as open-source will take their billion-dollar efforts and make them free within months, before they see any meaningful return. I mean, even Microsoft is doing them dirty, releasing o1 models for free!

Similarly, Mark Chen, OpenAI’s Frontier Research Lead (the man behind o-type models, basically), congratulated DeepSeek but also argued that the results were exaggerated (and quite frankly, they are) and that they were “happy” to see that DeepSeek had independently arrived at, and published, most of the innovations OpenAI had found internally.

Now, my dear reader, do you realize what he’s implying? He’s saying that, despite these innovations already existing in the US, it’s thanks to a Chinese company, not to OpenAI, that such knowledge has been democratized to the world. OpenAI, a company that exists thanks to open-source (embeddings, attention, Transformers, mixture-of-experts, speculative decoding, and a long tail of other crucial aspects of current LLMs/LRMs are ALL contributions from the open-source community), has decided the world should not progress at the level they do.

I never thought I would say this, but it’s becoming clear who this story's villains are. And it’s not the Chinese.

WARFARE
The Pentagon Claims AI is Speeding its ‘Kill Chain’

A TechCrunch article describes how the Pentagon has reported that artificial intelligence (AI) is accelerating its “kill chain,” the process by which military targets are identified and engaged.

By integrating AI into various stages of this process, the Department of Defense (DoD) aims to enhance decision-making speed and precision, thereby increasing operational effectiveness.

TheWhiteBox’s takeaway:

It’s just funny how the same closed-source labs are not opening their research because “it’s too dangerous” (in reality, it’s all about competitiveness) while collaborating with the DoD to turn AI into a tool of war.

Listen, I’m all for it; I genuinely believe AI could prevent countries from deploying humans to the front lines. But I just can’t bear the extreme hypocrisy and word salads tech incumbents resort to in order to justify the unjustifiable. You not making money is not China’s fault, and the AI existential threats are basically zero, so stop gaslighting people and just say it: you are simply trying to make money and prevent China from competing because you can’t compete with open-source.

OPENAI
OpenAI Eyeing a $300 Billion Valuation

According to The Wall Street Journal, OpenAI is currently discussing raising up to $40 billion in a new funding round, potentially valuing the company at approximately $300 billion. SoftBank is expected to lead this investment, contributing between $15 billion and $25 billion, with the remaining funds anticipated from other investors. This development would significantly increase OpenAI’s valuation from its previous $157 billion in October 2024, when it raised $6.6 billion. 

If successful, this funding round would make OpenAI the second-most valuable startup globally, behind only SpaceX. As sources clarify, the discussions are ongoing and could still change.

TheWhiteBox’s takeaway:

If these numbers are true, this would introduce OpenAI to a completely new dimension. According to internal sources cited by The Information, OpenAI is projected to generate $12 billion in revenue in 2025, assuming it manages to triple its revenues from the previous year.

If that number turns out to be true, that means that OpenAI’s new valuation sets them at 25 times those revenues, which seems kind of acceptable. However, my problem here is answering the question:

What is the OpenAI investor really buying?

From this week’s Chinese drama and the ensuing panic from Anthropic’s CEO, we can confirm that there is no such thing as a technological moat, and companies like DeepSeek or Meta will continue to play the cat-and-mouse game.

Sure, the $20/month ChatGPT subscription still has appeal, but you must be a diehard OpenAI fan to pay $200/month for o1-pro when you can get DeepSeek R1 for free. Sure, R1 is not at o1-pro’s level, but it gets you 80% of the way there while being at least three orders of magnitude cheaper; there are very few use cases that o1-pro can handle that R1, with enough patience, can’t.

So, if OpenAI can’t develop a long-lasting tech moat, what moat justifies the investment?

Based on where the money is going, it seems that the moat must be compute, and OpenAI is essentially becoming an infrastructure company like the Hyperscalers. But that’s a hell of a business to be in, competing with behemoths like Microsoft or Amazon.

Unlike them, OpenAI doesn’t have $100-billion-plus revenues from a non-AI business generating billions of dollars in free cash flow to sustain the insane investments. Thus, OpenAI’s only way to fund them is through never-ending fundraising.

But if so, what’s the appeal for the investor?

The OpenAI investor could simply buy Hyperscaler stocks at a similar forward P/E with better cash flows, or directly buy NVIDIA stock and sit on the collecting side of the enormous markup it charges for its hardware. And all of this without having to deal with OpenAI’s current transition to a for-profit business, a journey packed with challenges.

Put another way, if AI is a game where the winner is whoever has the most access to compute, investing in OpenAI/Anthropic makes no sense. Heck, invest in Google, which has by far the largest compute available and a top-3 AI research lab in DeepMind.

Could talent be the moat, then?

Certainly, talent is critical, but if this week has proven anything, it’s that talent is spread across at least six or seven well-funded labs. Key top talent is not the property of OpenAI; far from it. Most of their original star researchers have left (Ilya Sutskever, Edward Hu, John Schulman, Andrej Karpathy, Mira Murati, Jan Leike, Tim Brooks, Alec Radford, etc.), some of them for rival companies. I wouldn’t call this a long-lasting moat.

Therefore, in my view, all roads lead to two options that may justify investing in OpenAI:

  1. Brand: If OpenAI maintains its leadership (at least in the public eye, because its technological leadership is more than contested by now), investors could still see a very successful IPO from this company that grants them a good exit.

  2. AGI singularity: If AGI is truly a singular moment, a moment where the creation of a model leads to the AGI era instead of being a progressive arrival, then the company that builds that ‘thing’ wins, period. OpenAI has a decent chance of being that company, that’s for sure.

However, neither of these two sounds practical.

As mentioned, the first is already highly disputed. The sentiment toward OpenAI is not good based on its closed nature, and its leadership is very much in question among tech-savvy groups. As for the latter, I don’t think AGI will emerge from a singularity moment; it will be a progression of improved models that slowly but steadily lead to a new world.

All in all, this is a very long way of saying that it’s not about the investment number, but it appears to me that investors in AI frontier labs are simply subsidizing massive GPU CAPEX investments at a hefty premium.

I just don’t get it. And I think no one better than DeepSeek’s CEO to summarise this:

"In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed-source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat."

OPEN-SOURCE
Alibaba Ups the Ante

While DeepSeek is stealing the spotlight, one of the best Chinese AI labs continues to deliver impressive models: Alibaba. They have now presented Qwen2.5-Max, their new state-of-the-art LLM, which reaches comparable or superior performance to GPT-4o, Claude 3.5 Sonnet (so much for Dario’s cries, seen above, that this is not the case), Llama 3.1 405B, and, of course, DeepSeek V3.

The great thing about Alibaba’s models is that they are accessible through their cloud (share data at your own risk, as with any cloud platform), and their APIs and libraries are OpenAI-compatible, meaning they mimic the structure of OpenAI’s APIs. In other words, switching from OpenAI’s models to Qwen is very easy code-wise, as shown below.
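As a rough sketch of what that switch looks like; the base URL and model name here are my assumptions based on Alibaba's DashScope 'OpenAI-compatible mode', so verify them against their docs before relying on this:

```python
from openai import OpenAI

# Same OpenAI SDK; only the endpoint and the API key change.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # placeholder
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen-max",  # assumed model id for the Qwen2.5-Max family
    messages=[{"role": "user", "content": "Give me three uses for an LLM."}],
)
print(response.choices[0].message.content)
```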

TheWhiteBox’s takeaway:

No matter how much Silicon Valley tries to gaslight us, the truth is that, in terms of LLMs, open-source is on par with closed models. While this is more debatable regarding reasoning models, as R1 isn’t quite as ‘smart’ as o1, let alone o3 (although we can make a powerful case that it’s more efficiently intelligent), there are really no doubts at the LLM level.

Thanks to DeepSeek V3, this was already well-known, and Qwen’s new model confirms it further.

Who would have thought three years ago that open-source could remain competitive only thanks to China (and honorable mentions in the US like Meta or the Allen Institute)?

OPEN-SOURCE
Replicating Zero for $30

A group of researchers from UC Berkeley has replicated the results DeepSeek obtained with Zero on a countdown game (a game where players combine numbers using basic arithmetic to reach a target number).

The model autonomously learned several reasoning priors, like self-verification and search, all for under $30, proving that DeepSeek’s recipe works.

TheWhiteBox’s takeaway:

So, you may ask, what’s DeepSeek’s recipe?

The biggest takeaway from DeepSeek’s recent research was that we do not need fewer GPUs; on the contrary, compute is what really moves the needle.

In layman’s terms, instead of stacking several expensive post-training stages on top of the LLM to teach it the necessary reasoning priors, DeepSeek built a dataset of verifiable tests (tests where the answer can be checked as correct or incorrect) and just let compute do the talking.
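To illustrate what 'verifiable' means, here is a toy reward function for the countdown game; this is my own sketch, not the Berkeley team's actual code:

```python
import re

def countdown_reward(expression: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if the proposed expression is valid, 0.0 otherwise.

    Valid means: it only uses the allowed numbers (each at most once),
    only basic arithmetic, and it evaluates to the target.
    """
    # Allow only digits, whitespace, parentheses, and + - * /.
    if not re.fullmatch(r"[\d\s()+\-*/]+", expression):
        return 0.0
    # Every number used must come from the allowed pool.
    pool = list(numbers)
    for n in (int(tok) for tok in re.findall(r"\d+", expression)):
        if n not in pool:
            return 0.0
        pool.remove(n)
    try:
        result = eval(expression)  # character set restricted above
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if result == target else 0.0

# Example: reach 24 using 4, 6, and 1 -> 4 * 6 * 1 = 24
print(countdown_reward("4 * 6 * 1", [4, 6, 1], 24))  # prints 1.0
```

Because the check is exact and automatic, the reward signal scales with compute rather than with human labeling, which is the whole point of the recipe.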

In their R1 research, DeepSeek proves that, with enough compute, models develop reasoning priors like self-correction, verification, search, and thinking for longer, all staples of human reasoning (again, this is why DeepSeek’s research is bullish for NVIDIA, not the opposite).

Moreover, the overarching conclusion here is that there are no tech moats in AI and that what really sets you apart is how much compute you have.

THEWHITEBOX
Premium

If you like this content, by joining Premium, you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.

Until next time!