In partnership with

THEWHITEBOX
TLDR;

Welcome back! This week, we have news across a wide spectrum of topics, including NVIDIA’s record-setting milestone, GSI Technology’s historic stock surge, and the first commercially available Western humanoid, Neo.

We’ll also take a look at a recipe that might solve continual learning, a giant step by two very hot AI startups, and consulting’s “Kodak moment”.

Enjoy!

THEWHITEBOX
Things You’ve Missed for not being Premium

On Tuesday, we covered interesting news from OpenAI, NVIDIA, Qualcomm’s new AI servers (which hide a surprise), and Alibaba, the latter of which may not be welcome news for investors, as the company claims a huge 82% drop in the GPUs needed for the same performance. Problems for NVIDIA?

Additionally, we took a look at Elon Musk’s new tool in his crusade against the left, Anthropic’s new Excel add-in, and the next big potential AI IPO, this time the first AI model company.

Simplify Training with AI-Generated Video Guides

Are you tired of repeating the same instructions to your team? Guidde revolutionizes how you document and share processes with AI-powered how-to videos.

Here’s how:

1️⃣ Instant Creation: Turn complex tasks into stunning step-by-step video guides in seconds.
2️⃣ Fully Automated: Capture workflows with a browser extension that generates visuals, voiceovers, and call-to-actions.
3️⃣ Seamless Sharing: Share or embed guides anywhere effortlessly.

The best part? The browser extension is 100% free.

FUNDING
Consulting’s Kodak Moment?

Public consulting companies are having one hell of a year, but not in a good way. With a severely flawed AI narrative and the US Government going berserk and canceling public contracts (particularly at Booz Allen, which is leading to layoffs), the markets are starting to price in the new reality of consulting: they are being devoured by AI.

This is something we have touched on multiple times in this newsletter: knowledge-based white-collar jobs are less valuable with AI; they just are. That is, if your job is to productionize knowledge, i.e., to make knowledge reach your client faster, that’s precisely where AI delivers.

AI doesn’t shine in many enterprise areas these days, but this is certainly one (alongside software, as we discussed on Tuesday with Bain’s very impressive due diligence story).

TheWhiteBox’s takeaway:

I’m not the type of guy who will tell you ‘post-labor economics is coming’ or ‘AI will destroy all jobs’ because I don’t believe that at all. Time and time again, industrial revolutions don’t eliminate jobs; they transform them.

And one of the jobs about to be transformed is management consulting. Now, does that mean we won't need human consultants?

Of course not, but this industry has been on a steady decline in ‘provided value’ over the years. Once crucial for insights, it’s now a suit-up scam where fresh-out-of-college 20-year-olds from Ivy League schools tell Norah, a top executive with 30 years of experience in the field, what she needs to do.

Both sides know this is a scam, but these young management consultants can process a lot of information in a very short amount of time by working like modern slaves (trust me, I was one), so the value comes from the exploitation you’re buying when you hire these guys, not from anything insightful they have to say.

Thus, consulting has become a pay-for-hours game more than a pay-for-insights game.

And that’s the issue: AI makes the value of hours in knowledge-based tasks worth little because GPT-5 Pro will do a better job searching the Internet in ten minutes than Jamie from MIT working at McKinsey in a day.

Funnily enough, McKinsey consultants are the first to leverage AI to do the dirty work, but the client knows that, so the actual value the client perceives they are getting is much smaller because, well, “it’s all mostly ChatGPT wrapped in a Brioni suit”.

To put it another way, consultants aren’t being disrupted for no reason: their value has long stopped being about industry insight and has instead been about putting in endless hours at a low effective wage (low once you factor in the hours actually worked).

With all that said, there will still be room for consultants with insightful things to say: boutique firms that know what they are talking about and can search beyond what ChatGPT can see. But that’s clearly not the current state of the industry, and it is paying the price.

HARDWARE
GSI Technology Stock Surge, Explained

A few days ago, a lesser-known AI company, GSI Technology, had one of the best recorded days for a stock in history (percentage-wise). At one point, the stock had tripled in value, and it is up 215% for the month and 255% for the year.

The question is: why?

And the answer is a study from Cornell that independently validates the thesis behind their proprietary APU chip as an alternative to NVIDIA’s GPUs, reporting a 98% decrease in energy costs, an incredible result. The short answer to how this is possible is that the chip uses a technology known as “in-memory computing.”

In GPUs, compute and memory are totally separated. So, once the processing cores have calculated a value, such as a mathematical multiplication, this value may have to travel to memory for future use, which takes some time.

The amount of data you can transfer depends on the memory bandwidth, which on modern GPUs is measured in terabytes (trillions of bytes) per second thanks to HBM.

Thus, unless you somehow overlap compute processing and memory transfers (which you can and do, especially during training, but it is very hard), GPU cores are essentially idle during these transfers.

Importantly, during inference, these memory transfers become the main bottleneck, and you essentially see GPU utilization (GPUs don’t make money when they are moving data) fall off a cliff.
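
To make this concrete, here’s a minimal back-of-envelope sketch in Python. The numbers are rough figures I’m assuming for an H100-class GPU and a 70B-parameter model, not official specs, but they show why batch-size-1 inference is capped by bandwidth rather than compute:

```python
# Back-of-envelope: why single-batch LLM inference is memory-bound.
# All numbers below are illustrative assumptions, not vendor specs.

WEIGHT_BYTES = 70e9 * 2          # a 70B-parameter model in FP16/BF16 (2 bytes/param)
MEM_BANDWIDTH = 3.35e12          # ~3.35 TB/s of HBM bandwidth (H100-class, rough)
PEAK_FLOPS = 1e15                # ~1 PFLOP/s of dense FP16 compute (rough)

# Each generated token (batch size 1) must stream every weight from memory once.
time_per_token_memory = WEIGHT_BYTES / MEM_BANDWIDTH     # seconds spent moving weights
time_per_token_compute = (2 * 70e9) / PEAK_FLOPS         # ~2 FLOPs per weight per token

print(f"memory-bound limit : {1 / time_per_token_memory:,.0f} tokens/s")
print(f"compute-bound limit: {1 / time_per_token_compute:,.0f} tokens/s")
# The memory limit is far lower than the compute limit, so the cores sit idle
# waiting on data, which is exactly the utilization cliff described above.
```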

Therefore, as the name suggests, “in-memory computing” aims to avoid this data travel by computing inside the memory cells themselves. For those of you who have been around a little longer, this may sound very similar to the analog computers we discussed a few months ago, where we manipulated the properties of wires and resistors to perform computations in the same place where the answers are stored.

For example, using Kirchhoff’s Law, which states that the total current entering a wire junction equals the total current leaving it, you can get an automatic summation by simply connecting the multiplication cells to the same wire.

In this case, however, we don’t quite know how they are doing the multiplications in memory. Still, the point is that, by arranging memory cells in specific ways, they can create logic circuits for addition and multiplication inside the memory itself, and this is cheaper and faster than modern GPUs because it saves energy by not having to move data at all.
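
As a toy illustration of the idea (and only of the idea; as said, we don’t know how GSI’s APU actually implements its multiplications), here’s a minimal Python simulation of an analog in-memory dot product, where Ohm’s law does the multiplying and Kirchhoff’s current law does the summing:

```python
import numpy as np

# Toy simulation of an analog in-memory dot product.
# Weights are stored as conductances G (in siemens) in a column of memory cells;
# the input vector is applied as voltages V on the rows.
# Ohm's law gives a per-cell current I = G * V (the "multiplication"),
# and Kirchhoff's current law sums the currents flowing into the shared
# output wire (the "addition"), so the column reads out a dot product
# without ever moving the stored weights.

rng = np.random.default_rng(0)
weights = rng.uniform(0.1, 1.0, size=8)    # stored as conductances
inputs = rng.uniform(0.0, 1.0, size=8)     # applied as voltages

cell_currents = weights * inputs           # Ohm's law in every cell, in parallel
column_current = cell_currents.sum()       # Kirchhoff: currents merge on the wire

print("analog readout     :", column_current)
print("digital dot product:", np.dot(weights, inputs))  # same value
```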

TheWhiteBox’s takeaway:

This is pretty fascinating stuff, and the narrative is excellent: “We know AI is facing the inevitability of an energy bottleneck, and a public company offers a way out.” But saying it is one thing; actually selling the thing is another.

Also, one of the main areas where this architecture shines is retrieval, and it’s unclear whether retrieval will remain a key component of modern inference workloads (I think it will, but there’s a non-zero chance models will simply hold everything they need in the context window, requiring no retrieval at all).

Beyond this, does this warrant the stock almost quadrupling in value? Obviously not; this is just a technological promise that has yet to deliver revenues, or even a commercially available product. Thus, this is pure speculation, but what isn’t speculation in the market these days?

On a final note, yes, there’s clearly a lot to be said about energy in AI, especially when trying to guess where we’ll stand in a few years. On Sunday, we’ll talk more about it.

HARDWARE
NVIDIA Makes History… Again

For a short while, NVIDIA crossed another historic milestone: a $5 trillion market cap, the first company ever to do so (it was also the first to cross the $4 trillion mark).

For reference, it took NVIDIA 25 years to reach the $1 trillion mark and only around three months to go from $4 trillion to $5 trillion, representing truly unprecedented levels of shareholder value creation. At the time of writing, the company sits right below that historical mark.

TheWhiteBox’s takeaway:

But is this just unprecedented bubbly behavior? As incredible as it may sound, if you look a little into the future, the company doesn’t even look overvalued compared to the other members of the S&P 500 index.

I’m not kidding.

The reason is the $500 billion in bookings over the next five quarters that Jensen mentioned during the GTC Keynote. He’s essentially announcing that the Blackwell and Rubin (next-generation) GPUs will generate $400 billion in revenue alone in 2026, putting NVIDIA at the same revenue levels as Apple, but with significantly better margins.

They currently sit at ~50% net margin, meaning they would generate $200 billion in profits in 2026, which would put the $5 trillion valuation at a forward P/E of 25, just slightly higher than the index’s average.
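
For reference, here’s the arithmetic behind that claim, using the figures quoted above rather than any official guidance:

```python
# Forward P/E sketch using the figures quoted above (not official guidance).
market_cap = 5.0e12        # $5 trillion valuation
revenue_2026 = 400e9       # $400B in projected Blackwell/Rubin revenue
net_margin = 0.50          # ~50% net margin

profit_2026 = revenue_2026 * net_margin   # $200B in projected profit
forward_pe = market_cap / profit_2026     # 5e12 / 2e11 = 25

print(f"forward P/E = {forward_pe:.0f}")
```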

I’m reasonably certain margins with Rubin will fall below 50% unless NVIDIA passes the higher HBM and packaging costs to clients.

So, as incredible as that sounds, NVIDIA may be as expensive or cheap as any other average S&P stock, even while commanding a higher value than all economies on the planet besides the US and China.

The funny thing is that everyone puts NVIDIA at the epicenter of the AI bubble when, in fact, if there’s a company that is not in a bubble, it’s these guys. Sure, they are reaping the fruits of the bubble, the extreme CAPEX of other Big Tech companies, but at least they are generating the cash, so don’t blame the player, blame the game.

What I will say is that the concentration is very, very concerning. For instance, the top ten companies by market cap in the S&P 500 weigh around 40% of the index, which means that for every ten dollars you invest in your Vanguard S&P 500 index fund, four go into those ten companies.

Worse still, yesterday was the worst day since 1990 in terms of breadth, a measure of how many individual stocks are advancing versus declining on a given day: 104 stocks were up, 398 were down.

At this point, are we index investors or Big Tech investors pretending to be diversified?

MODELS
Has Thinking Machines Lab solved continual learning?

Thinking Machines Lab, a superstar-packed AI Lab based in the US, has published a blog post describing a potential new approach to continual learning.

Continual learning, considered one of the holy grails of AI, is the capability that would theoretically let AIs learn forever. This is not the case today: most AIs have a clear separation between training and inference, and once trained, the model does not learn anything new unless it is retrained.

To mitigate this problem, we rely on in-context learning, which enables AI models to learn on the spot from data provided as part of the user’s prompt. The issue is that this learning is not parametric; the model does not encode it, so the moment that information is no longer in the model’s context window (the data it can process at any given time), it immediately forgets it.

Of course, the solution would be to retrain the model repeatedly, right? Sure, but this isn’t free. Whenever you retrain a model on new data, it’s guaranteed to ‘forget’ data that is not present in the new training distribution.

In layman’s terms, if the model’s initial training included basketball, baseball, and soccer, and the new training distribution does not mention basketball at all, the model will likely forget it over time unless basketball is reintroduced to the training menu.

This is known as ‘catastrophic forgetting’ in AI parlance.
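
To see the effect in code, here’s a toy sketch of my own (not from TML’s post): a tiny classifier is trained on four classes, then fine-tuned only on two of them, and its accuracy on the two classes missing from the new training menu collapses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(classes, n=200):
    """Synthetic 2-D blobs, one per class, centered on distinct corners."""
    centers = {0: (-2, -2), 1: (2, -2), 2: (-2, 2), 3: (2, 2)}
    xs, ys = [], []
    for c in classes:
        xs.append(torch.randn(n, 2) * 0.4 + torch.tensor(centers[c], dtype=torch.float))
        ys.append(torch.full((n,), c))
    return torch.cat(xs), torch.cat(ys)

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

def train(model, x, y, steps=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 4))

# Phase 1: train on all four "sports" (classes 0-3).
x_all, y_all = make_data([0, 1, 2, 3])
train(model, x_all, y_all)
x_old, y_old = make_data([0, 1])   # held-out data for the old classes
print("old classes before fine-tuning:", accuracy(model, x_old, y_old))

# Phase 2: fine-tune only on classes 2 and 3 ("basketball" leaves the menu).
x_new, y_new = make_data([2, 3])
train(model, x_new, y_new)
print("old classes after fine-tuning :", accuracy(model, x_old, y_old))
# The second number drops sharply: the model has catastrophically forgotten
# the classes absent from the new training distribution.
```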

So, while there’s a clear incentive to retrain models so that they acquire new skills, or even to build a model that continuously updates its knowledge and “beliefs,” as Rich Sutton would tell you, this usually comes at the deal-breaking cost of forgetting not only important factual knowledge but even basic skills like instruction-following.

For example, a simple fine-tuning of a model on a particular dataset increases performance on that data tremendously… at the expense of a total collapse in the model’s capacity to follow instructions, something that wouldn’t have been obvious prior to training.

So, we are at a crossroads: we want to increase the model’s knowledge over time, but we can’t do so without often degrading performance to an unjustifiable degree. Here’s where TML proposes ‘on-policy distillation’, a combination of two things: training ‘on-policy’ and using distillation.

Nowadays, it’s increasingly popular to post-train LLMs using Reinforcement Learning (RL), which is essentially a trial-and-error learning approach. Instead of making a model imitate a specific behavior, we incentivize it to find it, leading to better performance.

However, RL also implies two ‘bad’ things:

  1. What I call ‘catastrophic narrowing’, where the model becomes “perfect” at the task it was trained on and much worse everywhere else (the collapse in instruction-following described above).

  2. It’s a very sparse training regime: the model takes forever to train because the learning signal has a very low signal-to-noise ratio, meaning it has to spend an enormous amount of compute for every bit of learning it gets (e.g., an LLM trained with RL may generate thousands of tokens before it reaches a single correct answer).

RL takes the most GPU hours of training by far

On the other hand, distillation is the process by which a student model (the one we are training) imitates a teacher model. Hence, we ‘distill’ the teacher’s skills and knowledge into the student.

This is easier for the model, so it learns ‘worse’ (in the same way a student who tries and retries a problem until they solve it learns more than one who copies the solution from the teacher’s blackboard), but it learns faster; we get ‘good’, teacher-like responses from the student model sooner.

And what about the term ‘on-policy’? What does that mean?

On-policy training occurs when the model learns from its own predictions. That is, the model itself generates the data we use to update it (think of it as learning from your own mistakes).

Traditionally, AI has been off-policy, training models either on the training data itself or on data generated by other AIs, but never on the generations executed by the model itself.

So, to summarise:

  1. On-policy training is the most effective way to teach AIs, as they can learn from their own mistakes. We use RL for it, but RL implies catastrophic forgetting and narrowing.

  2. Off-policy learning is faster and cheaper (more feedback for the model), and the model learns new skills, but it’s basically imitating rather than actually “thinking”, which results in models that imitate well but can’t really go beyond the data they’ve imitated.

Thus, TML proposes on-policy distillation as the best of both worlds: a procedure that allows models to learn better (on-policy) while also benefiting from faster learning and retaining important information.

But how? It’s a very simple procedure:

  1. Starting from the base model we want to improve, we train a copy of it on new data with RL.

  2. This gives us a smarter new model, but one that has catastrophically forgotten other skills. But here’s the key: this is not our model, it’s the teacher.

  3. We then run a distillation training on the original model (the one not trained with RL), teaching it to imitate the RL-trained version.

As distillation prevents catastrophic forgetting, and imitating a “smarter” model nudges it toward smarter responses, we can essentially build a continual-learning flywheel where we protect the model’s existing knowledge while still allowing it to ‘inherit’ new skills from the teacher.
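
For the technically curious, here’s a minimal sketch of what that distillation step can look like in PyTorch, assuming, as in TML’s post, a per-token reverse-KL loss where the student generates the sequences and the teacher scores them; the model names, prompts, and hyperparameters below are placeholders of my own, not TML’s code:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: swap in your own base model and its RL-tuned copy.
STUDENT_NAME = "path/to/original-base-model"
TEACHER_NAME = "path/to/rl-tuned-teacher"

tokenizer = AutoTokenizer.from_pretrained(STUDENT_NAME)
student = AutoModelForCausalLM.from_pretrained(STUDENT_NAME)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER_NAME).eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
prompts = ["Solve: 17 * 24 =", "Summarize the following text: ..."]  # illustrative

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs.input_ids.shape[1]

    # 1) On-policy: the *student* samples its own continuation.
    with torch.no_grad():
        sequence = student.generate(**inputs, max_new_tokens=64, do_sample=True)

    # 2) Score every sampled token under both models.
    student_logits = student(sequence).logits[:, prompt_len - 1 : -1]
    with torch.no_grad():
        teacher_logits = teacher(sequence).logits[:, prompt_len - 1 : -1]

    # 3) Per-token reverse KL(student || teacher): pull the student toward the
    #    teacher's distribution on the tokens the student actually produced.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    reverse_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()

    optimizer.zero_grad()
    reverse_kl.backward()
    optimizer.step()
```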

The progression can be observed in TML’s results, where the model’s performance on the target benchmark increases from 18% to 41% while maintaining its performance on IF-eval (instruction following), the skill we aimed to protect.

TheWhiteBox’s takeaway:

Continual learning is considered by many, including myself, to be absolutely key to driving further AI progress. So, if TML’s proposal truly scales, we could soon see one of AI’s largest mysteries solved.

Next step? Learning from experience: incorporating real-time feedback and model updates into AI inference. But boy, is that problem still far from being solved.

ROBOTICS
1X Presents Neo

Finally, the US has a commercially available humanoid robot. Called Neo, it can be ordered for $20,000 (surprisingly cheap considering it’s a US product) or for $500/month.

The video is incredibly impressive, and this “thing” runs on just 200 watts, with a four-hour battery life. The purpose? It’s a chores robot, one that can help you with all things around the house, from washing dishes and doing laundry to simply tidying up your living room.

Additionally, it features a teleoperating system, allowing you to control it (or it can be remotely operated by the company itself, too, not scary at all having some random dude teleoperating a robot in your home, right?).

TheWhiteBox’s takeaway:

Sounds like the most impressive release, probably ever. The problem? It’s simply not real. As Marques Brownlee covers in his personal take on the matter, most of the robot's movements are teleoperated, either by you or by someone else.

Because let’s be serious for a moment. AIs, even frontier ones (Neo uses OpenAI models), aren’t remotely general enough to handle the kind of open-ended environment your home is. They just aren’t there yet. And as Marques mentions, “there seems to be a lost art in finishing products before actually shipping.”

Then, what’s the point?

Simple: data gathering. Generalization emerges from data. They need your home to gather data and eventually train better models. You’re being taken advantage of, with all due respect.

As Andrej Karpathy explained last week, there’s a vast gap between demos and reality in most AI applications, and no other application exemplifies this better than robotics. Robotics is, by far, the most complex problem in AI; it’s not even close.

And my gut tells me that we are still several software breakthroughs away from generalizable robots. Is that a year from now? Ten years? Or ten months? I don’t know, but what I can tell you is that it’s certainly not today.

On a final note, however, nobody seems to be addressing another significant concern: cybersecurity. Personally, I’m not buying one of these (being brutally honest, I’m yet to be convinced of the need for a humanoid robot at home) until the AI runs locally.

Having a 100-pound AI robot in your home that can be hijacked over the Internet is not something I’m willing to deal with.

And it’s not only that; the AI itself, the LLM, is also among the easiest software to hack.

In this regard, I believe the approach taken by Unitree (which excels particularly at quadrupeds) and Figure AI, focusing on highly predictable factory and manufacturing settings, is a much more promising avenue than trying to solve the most complex problem I know of in AI today: chores.

SPECIALIZED MODELS
When Deep Beats Breadth

In a single day, two of the hottest application-layer startups, companies that were built on top of AI models from OpenAI, Anthropic, or Google, have released proprietary models.

Cursor has released Cursor 2, the new, more agentic version of the product (less about typing code, more about asking an AI to do it), and Windsurf has released SWE 1.5.

The former includes Composer, a model they describe as “frontier” and that presents pretty compelling benchmark results, while the latter is also a coding model, one that is almost as good as Claude Sonnet 4.5 on SWE Pro (the toughest coding benchmark).

We’ll let the community be the judge of whether those claims hold, but the reason I’m discussing both of these is that I believe they are starting to illustrate what I envision as a true AI startup: one that trains its own models.

TheWhiteBox’s takeaway:

Most current AI startups are just wrappers around third-party models, a recipe for guaranteed disaster. No moat, no competitive advantage, no nothing; just a startup riding rapid initial growth to land valuations from VCs who know no better, valuations they’ll never grow into.

However, building powerful models for your own task is very hard, something other companies can’t simply replicate. Both companies, Anysphere and Cognition (creators of Devin and acquirers of Windsurf a couple of months ago), have gathered trillions of data points from users over the past year or so, and they can now use that data to build an actual moat.

But the question is: can’t competitors compete by using Anthropic models that offer similar performance?

Sure, but now Cursor and Cognition will provide comparable performance at much better margins, as these models are much more task-specific and thus smaller and cheaper to run, which means that all these two companies need to do next is to drop prices and outcompete rivals when the time comes.

The reason for this is that every industry AI touches will eventually become commoditized, whether that's consulting, as we discussed above, or coding IDEs.

In commoditized industries, price is the name of the game, so companies running specialized, small, and fast models can drop prices significantly more than those using third-party providers.

I was once skeptical of Cursor’s survival chances, but unless OpenAI and Anthropic start dropping prices like crazy, the data that Cursor has, combined with the newfound independence of running internal models, might be enough to earn them, at the very least, a healthy acquisition.

Closing Thoughts

This has been a week of records. NVIDIA’s claim on the $5 trillion mark is certainly hard to fathom for a company once seen as a gaming hardware maker.

However, this is also a week that helps us see the changes taking place right before our eyes: AI companies rising above everything and everyone while the rest, exemplified by the management consulting firms, lag behind. The breadth data point, which shows a growing concentration of the US’s broadest index into a handful of companies, is far more concerning than I believe most realize.

This has also been a week of hope: hope that we might have found a way to solve continual learning. The folks at TML are some of the smartest on the planet, so their making such a claim is worth noting. Furthermore, companies like Cursor and Cognition are going from future meme companies to businesses that could actually endure.

But this has also been a week of hype, with the flashy release of the first commercially available Western humanoid, Neo. Behind all the bells and whistles, though, I can only see a cool robot that needs a person somewhere in the US to teleoperate it in my personal space. I’m not having it.

Neo seems to be the latest exemplar of this industry’s unwavering commitment to appearing to be what it’s not. This robot is not autonomous; it’s a long way from that. So why all the hype? If it’s a pure data play, give it to researchers instead of charging people $20k for something they aren’t getting. This performative tendency of most AI startups screams bubble like nothing else.

And it’s making me worry.

For business inquiries, reach out to me at [email protected]
