THEWHITEBOX
TLDR;

Welcome back! Today, we take a look at announcements from OpenAI and Anysphere, but this week is less about names and more about trends, such as:

  • Leading labs embracing adaptive-compute models,

  • Progress increasingly tied to human-in-the-loop effectiveness,

  • And venture investment flowing toward teams that train their own systems rather than wrapping others’ models.

At the same time, frontier economics look more strained than ever, with rising inference costs and aggressive price pressure from China.

ANNOUNCEMENT
Coaching Calls

I’m officially launching one-on-one calls to discuss any AI topic you’d like, scheduled at your preferred time. These calls cover a range of topics, from product strategy to company and sector due diligence, tailored to your specific needs.

Please note that my time zone is GMT+1, so US mornings and evenings in Asia are ideal.

Book your call today below!

THEWHITEBOX
Things You’ve Missed for Not Being Premium

On Tuesday, we took a look at all the recent drama Burry and other AI hawks are causing, how JPMorgan is adding fuel to the fire, some very impressive results by AMD, Anthropic’s first infrastructure deal, and an analysis of a group of investable companies that play a crucial role in AI but almost no one knows about.

MODELS
OpenAI’s Release of GPT-5.1 Holds some Surprises

In a surprisingly low-key fashion, with very little benchmark data and a very small model card, OpenAI launched GPT-5.1 yesterday. The biggest improvements come from better instruction following and, more importantly, better adaptive thinking.

As you can see in the above graph, the GPT-5.1 models (Instant and Thinking) are better at gauging request complexity and deciding how long to think about a task: they respond much faster on more manageable tasks and think for much longer on the truly complex ones.

In case you don’t know, much of the improvement in AI models over the last year has come from increasing ‘test-time compute’, the amount of compute allocated to the task at inference. The models remain largely unchanged in structure; they simply “think for longer” on the task.
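To make that concrete: several APIs already expose test-time compute as a knob the caller turns. Here’s a minimal sketch using the OpenAI Python SDK’s Responses API, where the developer picks the thinking budget explicitly; the promise of GPT-5.1’s adaptive thinking is that the model increasingly picks this on its own (the model name and effort value below are illustrative assumptions, not confirmed specifics).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Explicit test-time compute: the caller chooses how hard the model thinks.
response = client.responses.create(
    model="gpt-5.1",              # illustrative model name
    reasoning={"effort": "low"},  # easy question, small thinking budget
    input="What is 17 * 24?",
)
print(response.output_text)
```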

TheWhiteBox’s takeaway:

We still have no updates for Pro and Codex (no GPT-5.1 Pro, no GPT-5.1 Codex model), which I assume will come soon and could be the actual release; this looks more like a small taster, an update to improve their non-thinking models, which had fallen well behind competitors.

Moreover, I believe OpenAI is abandoning the smart router (or, more accurately, no longer making it the sole mechanism for choosing between model variants) and moving toward hybrid models, such as those developed by Anthropic or Google.

Hybrid models, not to be confused with hybrid architectures, are models that autonomously decide how long to think about a task. Previously, OpenAI used a smart router that would process your request and decide whether it should be answered by a thinking or a non-thinking model.
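A toy sketch of the difference as I understand it (every function below is a made-up stand-in, not a real API; the point is where the “how long should I think?” decision lives):

```python
# Toy contrast between the two designs; all functions are hypothetical
# stand-ins, not real APIs.

def classify_complexity(request: str) -> str:
    # Stand-in for the separate router model GPT-5 used.
    return "hard" if len(request.split()) > 15 else "easy"

def instant_model(request: str) -> str:
    return f"[fast answer to: {request}]"

def thinking_model(request: str) -> str:
    return f"[deliberate answer to: {request}]"

# Smart router: external plumbing picks between two distinct models.
def smart_router(request: str) -> str:
    if classify_complexity(request) == "hard":
        return thinking_model(request)
    return instant_model(request)

# Hybrid/adaptive model: one model, and the thinking budget is chosen
# by the model itself, a behavior learned during training.
def hybrid_model(request: str, max_thinking_tokens: int = 4096) -> str:
    budget = min(max_thinking_tokens, 64 * len(request.split()))  # toy heuristic
    return f"[answer after ~{budget} thinking tokens: {request}]"

print(smart_router("What is 2 + 2?"))
print(hybrid_model("What is 2 + 2?"))
```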

The fact that they refer to GPT-5.1 Instant (a single model) as adaptive suggests they are moving to hybrid models too (or adaptive models, as they are also called). It turns out OpenAI’s router bet was the wrong one.

AGENTS
Claude’s Robot Effort

I think this is a video everyone should watch, because it perfectly illustrates what AI is today and, perhaps more importantly, what it isn’t.

In this one-day experiment, two teams, one with access to Claude and the other ‘Claude-less,’ were pitted against each other to see which fared better at configuring and running a quadruped robot to perform a fetch task.

The differences are quite palpable: the Claude team outperformed the Claude-less team, especially in areas where AI excels, like setting up the connection to the robot, and was more than two hours faster in total:

However, as you can also see, in some cases the Claude-less team was faster, a reminder that AIs can sometimes send you in the wrong direction (especially as models are incentivized to optimize toward high-frequency responses, which doesn’t mean better ones; I’m a profound believer in restarting tasks with models whenever we run into rabbit holes).

But the overall trend is clear: The Claude-less team was, in general, significantly behind and even required external help to decide on the right strategy to continue.

Thus, the reason this video is important is that it perfectly defines what AI is today: an accelerator, a tool that augments humans rather than substituting for them.

TheWhiteBox’s takeaway:

The bottom line is dead obvious to me:

  • Is pretending current AIs are actual autonomous software, rather than tools, stupid? Yes.

  • But is pretending that these tools aren’t great enhancers for humans even more stupid? Absolutely yes.

Pretending that AI is something it isn’t, vastly exaggerating what it can really do (in fact, the Claude team itself couldn’t help but proclaim this hyped future in the video), is common these days because, well, giga-scale data centers aren’t built for free.

But acting as a Luddite and ignoring AI progress, at a time when it’s absurdly clear that using AIs on the right tasks is much better than not using them, won’t bode well for your future.

As the mantra goes, you aren’t being replaced by an AI, but you’re definitely getting replaced by a human using AI.

Book a 1:1 Coaching Call with Me

This is a 1:1 coaching session in which we'll cover your concerns about the AI industry, on your terms and on the topics of your choice. The only thing you have to do is book the call on my calendar.

$249.99 USD

FUNDING
A New Wave of Funding for AI Labs

If you thought OpenAI et al. had dried up VC appetite for new AI Labs, think again: new AI Labs keep popping up almost monthly, raising staggeringly large rounds.

The issue is that I believe the name ‘AI Lab’ is a misnomer; the AI Labs were just ‘AI startups’ all along. My point is that there was never going to be a place for non-AI-Lab startups, the famous AI wrappers building on top of models from other companies.

Those ‘AI application’ startups have the same ‘moat’ as me having the capacity to breathe oxygen. What builds your moat is owning the AI and building models that outcompete others in your area.

That is why all these ‘new AI Labs’ are not generalist plays like OpenAI or Anthropic, but instead focus on particular niches. Even traditional “AI application startups” like Anysphere (Cursor) or Cognition Labs (Devin and Windsurf) are essentially AI Labs at this point, releasing custom, proprietary models based on the data they’ve gathered over the previous two years.

Of course, these labs are built on top of open-source foundation models originating from generalist Labs (mainly in China), but what makes them unique is that they actually train AI and run their own AI models.

This is why I believe we are experiencing the arrival of the real AI startup ecosystem, the one that will actually create long-lasting businesses, as the first generation gets obliterated (unless they manage to make the transition to the second generation, as Anysphere and Cognition have).

Importantly, this also applies to AI adopters. Among the companies and employees leveraging AI tools to get better at their jobs, we also see first- and second-generation approaches:

  1. First generation: Prompt engineers and companies that simply build on top of proprietary models, and mostly fail.

  2. Second generation: Employees and companies that actively fine-tune models on their given task, and will succeed.

As I always say, the first generation is like hiring a junior worker, giving them a book of instructions, a pat on the back, and letting them run into a wall immediately. On the other hand, the second generation of adopters is onboarding their AIs to automate the tasks they want them to perform, giving them actual ‘on-the-job’ training.

Guess who’ll be more successful?
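To make the second-generation approach concrete, ‘on-the-job training’ can be as simple as supervised fine-tuning on examples harvested from the actual workflow. A minimal sketch using OpenAI’s fine-tuning endpoints (the dataset file is hypothetical; the same idea applies to any provider or open-source stack):

```python
from openai import OpenAI

client = OpenAI()

# 1) Upload task-specific examples gathered from the real workflow
#    (a JSONL file of chat-formatted prompt/response pairs).
training_file = client.files.create(
    file=open("our_task_examples.jsonl", "rb"),  # hypothetical dataset
    purpose="fine-tune",
)

# 2) Launch a fine-tuning job on a base model that supports it.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # a fine-tunable OpenAI model
)
print(job.id, job.status)  # poll until the tuned model is ready
```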

Bottom line, if you’re a VC funding startups with no intention of fine-tuning models internally, you’re going to lose everything.

FRONTIER LABS
OpenAI is Setting Cash on Fire

OpenAI reportedly spent $3.77 billion on inference (running its models) during calendar year 2024, and had already spent $8.67 billion on inference in 2025 by September, according to Edward Zitron, who claims access to leaked documents.

Meanwhile, according to documents viewed by the author, OpenAI’s revenue for 2024 is implied to be at least $2.47 billion, considerably below earlier public estimates of ~$3.7 billion.

The article argues these figures suggest that running large-scale AI models is far more expensive than generally recognized, and that the cost burden for frontier AI developers may be heavier than their current revenues allow.
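Taking the reported figures at face value, the back-of-the-envelope math for 2024 shows why:

```python
# Reported figures for calendar year 2024, in USD billions.
inference_cost = 3.77   # spend on running the models
implied_revenue = 2.47  # "at least", per the leaked documents

# Inference alone, before training runs, salaries, or sales costs:
print(f"Inference is {inference_cost / implied_revenue:.0%} of revenue")  # ~153%
print(f"Shortfall: ${inference_cost - implied_revenue:.2f}B")             # ~$1.30B
```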

TheWhiteBox’s takeaway:

Nothing new under the sun, but a useful reminder that many of the numbers these companies share aren’t to be trusted; we should instead look at the funding rounds to grasp the obscene amounts of cash these companies burn through.

The problem remains the same: where are the revenues?

VENTURE CAPITAL
Anysphere Lands Huge Funding Round

Anysphere, the company behind the very popular coding tool Cursor, today announced a new funding round, raising $2.3 billion at a valuation of $29 billion.

Interestingly, the new round includes the participation of Coatue, NVIDIA, and Google, and the company announced it has officially crossed the $1 billion ARR (annualized recurring revenue) milestone.

TheWhiteBox’s takeaway:

I once called for Cursor to be acquired (and from what I’ve heard, an OpenAI acquisition was a close call), but it seems these guys have managed to go from a pure application-layer company to a fully-fledged AI Lab with a strong coding product.

Well done, that is the way.

Additionally, their newest model, Composer 1, is actually delightful to use. It really feels frontier and *cough* trained on Chinese foundation models *cough*.

It’s also interesting to see NVIDIA and Google in the picture. As for the former, with frontier labs investing in their own ASICs (application-specific chips), it’s crucial for NVIDIA that new AI Labs like Anysphere emerge as valid counterparts that could eventually build their own large-scale infrastructure, as OpenAI, xAI, and Anthropic are doing.

In fact, don’t be surprised if a decent portion of these newly raised funds goes into NVIDIA chips, closing the ol’ reliable ‘NVIDIA circle’, where NVIDIA gives money to company X, and company X buys NVIDIA chips with that money.

As for Google, they own DeepMind and a significant portion of Anthropic, and are now Anysphere investors too. However, the deal doesn’t make much sense at all, unless… Anysphere is going to start using TPUs, too.

It’s now clear as day that Google is launching a TPU business (letting third parties buy TPUs), so it doesn’t seem too far-fetched to assume that they are leveraging this deal and Anthropic’s recent $50 billion buildout to start generating demand for these chips. We even have rumors that Meta’s Hyperion giga-scale data center will be largely built with TPUs.

I don’t want to get ahead of myself, but you guys love predictions, so here’s one:

If Google lands on its feet in the TPU business, Alphabet will be the most valuable company in the world by the end of 2026, if not earlier.

I’m probably wrong of course (predictions are rarely correct), but this is how I see it:

  • Totally verticalized from chip to application,

  • Best AI Lab on the planet (not even close, actually) in DeepMind: frontier LLMs, the best video and image models, and far more exposure to other areas of AI (climate, biology… you name it). They have projects that no one else has, like SIMA 2 (see below) or Genie 3.

  • Big stakes in Anthropic, now Anysphere, and others, hedging its bets.

  • Largest compute pool on the planet, and most efficient data center management systems (lowest PUE by far).

  • High-margin cash cows in Search (which is surprisingly still growing, with AI search gaining enough share to rival ChatGPT) and YouTube.

  • The cloud division is expected to achieve a very high CAGR, and margins may improve significantly if more customers start utilizing TPUs.

  • Huge moonshots with a strong likelihood of paying off in Waymo, Isomorphic Labs (AlphaFold), and now a chip-selling business to compete with NVIDIA.

  • It even owns ~7-10% of SpaceX, which is valued at $400 billion.

Name me a better outlook for a company; I’ll wait.

WORLD MODELS
Google Launches SIMA 2

DeepMind has introduced SIMA 2, a new AI agent designed to operate and learn within complex 3D virtual environments.

Powered by the Gemini family of models (I assume Gemini 3.0), Google claims the system marks a shift toward agents capable of reasoning, planning, and collaborating with users rather than simply following scripted instructions.

The agent processes visual scenes (generated by Genie 3, the ‘world creator’ we discussed in the past) and natural-language commands, converts them into actionable plans, and executes those plans through a virtual controller.

Its training combines human demonstration data with synthetic labels produced by Gemini itself. After this initial training, SIMA 2 continues to refine its abilities through self-directed exploration, generating its own experiences and using them to improve its performance.
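DeepMind hasn’t released SIMA 2 code, so here is my hedged, toy-scale reading of that loop; every name and mechanic below is a hypothetical illustration of “imitation first, then self-generated experience scored by Gemini”:

```python
import random

# Toy stand-in for the described loop; all names and mechanics here are
# hypothetical, not DeepMind's actual implementation.

def gemini_propose_task() -> str:
    return random.choice(["fetch the cube", "open the door", "stack blocks"])

def gemini_score(outcome: float) -> bool:
    # Stand-in for Gemini judging whether an episode succeeded.
    return outcome > 0.6

class Agent:
    def __init__(self) -> None:
        self.skill = 0.2  # toy proxy for policy quality after imitation

    def attempt(self, task: str) -> float:
        return min(1.0, self.skill + random.random() * 0.5)

    def fine_tune(self, num_successes: int) -> None:
        # Learn only from self-generated successful episodes.
        self.skill = min(1.0, self.skill + 0.02 * num_successes)

agent = Agent()  # phase 1 (human demos + Gemini labels) assumed done
for generation in range(5):
    outcomes = [agent.attempt(gemini_propose_task()) for _ in range(20)]
    successes = sum(gemini_score(o) for o in outcomes)
    agent.fine_tune(successes)
    print(f"gen {generation}: {successes}/20 succeeded, skill={agent.skill:.2f}")
```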

TheWhiteBox’s takeaway:

The primary purpose of this is to increase the amount of data available for training models, particularly in areas such as robotics.

The key is that by dropping agents into a ‘world’, they can experience the consequences of their actions and learn from them (something that would be too dangerous and expensive in the real world, especially when considering the complexities of humanoids).

The next step is what we call the sim-to-real transition, in which we ‘drop’ the model trained in simulation into its embodiment and into the real world. Of course, this works if and only if the simulated world closely resembles the real one.

Sim-to-real is the standard approach in robotics today.

Furthermore, Google isn’t stupid and is also positioning itself to be a leader in industries where video generation will be important, such as Hollywood or gaming. I suppose we can add this to the list of reasons I wrote above as to why I believe Google is ahead of everyone else.

COMPUTER USE
ByteDance Reveals $1.30/Month Agent

ByteDance’s cloud unit Volcano Engine has introduced a new AI coding assistant called Doubao-Seed-Code, entering China’s AI tooling market with an aggressively low price.

The service is being offered at an introductory rate of 9.9 yuan (~US$1.30) for the first month (below many competitors), while the ongoing standard monthly fee will be 40 yuan (~$5/month, a quarter of the price of Cursor or ChatGPT Plus).

According to the company, the model achieved top performance on the SWE-Bench Verified benchmark, placing it alongside major systems such as Claude Sonnet 4.5 from Anthropic.

TheWhiteBox’s takeaway:

There’s way more than meets the eye here. China is less concerned with revenues and profits than with building AI superiority. Its open-source push doesn’t stem from a desire to democratize access to AI; it has always been a geopolitical maneuver to counteract US GPU export controls.

The reasoning is simple: if they won’t let us compete on compute, we will turn AI software into a free commodity and outcompete their products with price drops so severe that no US company, except perhaps Google, can match them.

China knows that US labs' kryptonite is the lack of revenue, so they are going to ensure that remains the case.

On a final note, please be cautious about interacting with Chinese web services; your data is probably not secure at all. Instead, I recommend accessing these models through US providers like Groq (Chinese frontier models are typically open source, so US hosts can serve them).

FRAUD
Waymo Goes Highway

Waymo, Google’s autonomous driving project and the one with by far the largest number of autonomous miles on the planet (see image below), is finally offering rides on the freeway, a considerable challenge for autonomous cars due to the more unpredictable nature of freeways.

Tesla FSD has been navigating freeways for a long time, but those drives have never been fully autonomous (until the emergence of the RoboTaxi, which is a very recent effort).

But is the comparison between Tesla and Waymo even fair? The answer is actually no, in either direction.

TheWhiteBox’s takeaway:

I suggest you take a look at the video in the tweet, because personally, I’m not that impressed at all. Just look at the image below; what kind of maneuver is this?

Moreover, whenever I talk about Waymo and Tesla, I always feel compelled to explain that they are actually very different approaches to the same problem and not remotely comparable.

Waymo uses LiDAR. LiDAR is a sensing method that measures distance by illuminating a target with rapid pulses of laser light and analyzing the reflected signals. By timing how long it takes each pulse to return, it calculates precise distances to surfaces in the environment.
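The distance calculation itself is simple time-of-flight math; a minimal sketch:

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def lidar_distance(round_trip_seconds: float) -> float:
    # The pulse travels out to the surface and back, so halve the path.
    return SPEED_OF_LIGHT * round_trip_seconds / 2

# A pulse returning after 200 nanoseconds puts the surface ~30 m away.
print(f"{lidar_distance(200e-9):.2f} m")  # -> 29.98 m
```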

On the other hand, Tesla uses cameras as the sensorial component of their AI. In both cases, the underlying engine is an AI model, but the way this model behaves—more specifically, how it receives input—is profoundly different.

The bottom line is that Tesla’s approach is less reliable in edge cases, making it less autonomous than Waymo’s in a pure side-by-side comparison. However, Tesla’s FSD is more general.

What do we mean by ‘edge case’ and why is that a problem for Tesla’s cameras?

For example, cameras can be ‘dazzled’, meaning they can have inferior visibility in situations with intense sunlight glare, deep shadow, nighttime darkness, fog, heavy rain, or dust.

As we’ve always discussed, in AI we have the depth vs. breadth trade-off: a generalist approach can be applied in many more places, but it lacks the accuracy of a specialized model on a particular task (or city, in this case).

This means Waymo has to expand carefully and progressively to new locations. And while Teslas aren’t overfitted to a given city like Waymos (i.e., they don’t know the city by “heart”), meaning you can deploy them basically anywhere in the US and they work, they severely struggle with edge cases like the ones shown above.

Bottom line, Waymo is more autonomous but less general, Tesla is more general but less autonomous.

Which is better? In all honesty, I would choose a Waymo over a Tesla Robotaxi today, but I would buy a Tesla before a Waymo because it serves me in more situations. For a pure autonomy play, though, the stakes matter: an edge case isn’t that bad when you’re dealing with a customer support agent, but I can get literally hurt in an autonomous-driving edge case, so Waymo takes it there.

Nonetheless, there’s a reason humans carefully monitor most robotaxi rides to this day. The jury is out on this one, and each has its merits.

  • Tesla gathers driving data at a scale no one else can even fathom, but a question remains as to whether it will eventually smooth out the edge cases.

  • Waymo delivers safe autonomous driving, albeit with a painfully slow expansion, and may point to the more autonomous future. However, if Tesla resolves its issues soon enough, Waymo will arrive too late to cities already full of Tesla RoboTaxis.

Luckily, this isn’t a winner-takes-all market, and other players, such as the NVIDIA/Uber partnership, will have a say. Therefore, my overall bet is that both Google and Tesla will generate a substantial amount of revenue from this.

Bias disclaimer: I am an Alphabet shareholder, so I own “Waymo stock” as a subsidiary of Alphabet Inc. I am not a direct Tesla shareholder, but I am indirectly exposed through Vanguard S&P 500 index funds.

ADOPTION
Anthropic Launches Use-case Library

Anthropic has launched a library of use cases for its products, featuring over 45 options, including creating brand assets and process flowcharts, such as the one above.

The library includes examples for each use case to spark your imagination, including actual artifacts you can see and interact with (pieces of code that expose what Claude created). It’s free.

TheWhiteBox’s takeaway:

Really nice initiative. It doesn’t hurt to know how the creators of the models we use every day propose we use them. The biggest reason isn’t a lack of imagination on our part, but rather the data distributions.

The individuals who wrote these use cases have access to Claude’s training distribution (i.e., they are familiar with the data on which Claude was trained). Therefore, you are much more likely to be successful with the use cases they recommend than improvising your own.

For instance, GPT-5 may be better at bar graphs and Claude better at Sankeys simply because they were trained more on them. You don’t know that beforehand, so these recommendations indirectly give you insight into Claude’s training data.

I tried using GPT-5 to see if it could match what Claude did in the earlier flowchart example, but for a research paper I’m reading. As you can see below, it’s not particularly great (although it must be said this was a one-shot attempt; it’s unclear how many tries Anthropic’s team took).

Closing Thoughts

A week full of interesting news worth talking about, with all major AI Labs (except for xAI, although they’ve raised $15 billion at a $200 billion valuation) making announcements.

So, what are the takeaways?

  • Model development is shifting toward adaptive-compute systems that decide how long to think, replacing external routing and making inference strategy, rather than model size, the primary driver of improvement. OpenAI was the odd one out here, but it seems to have followed Google, Anthropic, and xAI in adopting adaptive models.

  • AI proves once again it’s a human accelerant rather than an autonomous agent (for now). The Claude robotics test unequivocally shows that AI empowers humans, but still misleads or stalls without human judgment, making the human’s presence mandatory.

  • The typical pattern across all newly funded AI Labs is that they are all building their own models and applications. Durable moats now come from training models, not wrapping them (actually, that never was a moat in the first place).

And, as evidence of this, we have the funding momentum moving behind vertical, niche AI labs, which suggests that the market has shifted from first-generation prompt-based companies to second-generation model-owning companies.

Finally, it’s also worth noting that frontier economics aren’t just failing to improve; they keep deteriorating. Leaked OpenAI inference costs highlight a widening gap between compute spending and revenue, suggesting that business viability is a more urgent challenge than public narratives admit… all the while, Chinese Labs seem committed to making money in AI an impossible business, offering ultra-cheap tools positioned as strategic pressure on US labs’ revenue models.

See you on Sunday!

For business inquiries, reach out to me at [email protected]
