AIs Suck at Making Money & the Future of AI Models

THEWHITEBOX
TLDR;

First and foremost, I apologize for not sending an email on Sunday. I was working all weekend on something I’m much closer to showing you. Stay tuned.

Welcome back! This week, we take a look at eye-opening research by Anthropic on AI’s viability to run a business, Meta’s poaching spree, several news items about OpenAI, and Google’s latest start-up killer product.

Finally, we reflect on the future of AI models based on the predictions of two highly reputable AI insiders.

Enjoy!

AGENTS
Can AI Run a Business?

In a commendable exercise of self-awareness, Anthropic has carried out a unique study in which an AI (Claude 3.7 Sonnet) was put in control of a real business and had to make money from it.

And the results speak for themselves: the AI, despite running a straightforward business, failed miserably. Among the many failures, we had:

  • Missed profit opportunities: Ignored high-margin sales, like declining a $100 Irn-Bru offer.

  • Hallucinated critical details: Invented a fake Venmo account for payments.

  • Poor pricing strategy: Priced items (like metal cubes) below cost without proper research.

  • Weak inventory/pricing adjustments: Rarely updated prices and failed to account for free competitors.

  • Excessive discounts/freebies: Gave unnecessary discounts and free items, including a tungsten cube.

  • Didn’t learn from feedback: Repeated unsustainable behaviors despite acknowledging them.

  • Identity hallucination: Invented a fictional backstory and confused its own role, possibly triggered by April Fool’s Day.

The model even claimed to be a real person!

TheWhiteBox’s takeaway:

The silver lining for Anthropic, they claim, is that many of these issues could be solved with additional training; however, that claim rests more on faith than on clear evidence.

This study highlights several limitations that make deploying AI a challenging endeavor, particularly when the aim is to create agents that act autonomously.

  • Models continue to hallucinate heavily, bordering on deception, which drops trust to record lows. Hallucinations are not a solved issue and are, if anything, worsening with the development of reasoning models.

  • Models still don’t show causal understanding. They are clearly unaware of the consequences of their actions, and lying about being somewhere it could not possibly have been is not tolerable. I do believe cause-and-effect failures can be mitigated with trial-and-error training (penalizing the model whenever these situations occur), but for now, causality remains an unsolved problem.

  • Models are easily tricked and jailbroken. With enough effort, you can get these models to deviate from their goals. Refusal is not a solved issue.

And the list goes on. We are making progress, but man, are we early.

REASONING MODELS
Is Chain-of-Thought Reasoning Dying?

Over the last few weeks, the sentiment toward chains of thought, the ‘thinking’ tokens reasoning models generate before answering a complex question to increase their chances of solving it, has been turning… negative.

The reasons seem to be many:

  1. These chains can become unintelligible, with responses from DeepSeek R1 looking like this: “(Dimethyl(oxo)-lambda6-sulfa雰囲idine)methane donate a CH2rola group occurs in reaction, Practisingproduct transition vs adds this.to productmodule. Indeed"come tally said Frederick would have 10 +1 =11 carbons. So answer q Edina is11.” What’s the point of a chain of thought if we can’t understand it?

  2. They don’t always reflect what the model is actually “thinking”, meaning they can’t be trusted to help us understand the model’s internal processes.

  3. It’s a counterintuitive approach to thinking; why do models need to speak to think? Furthermore, a growing number of experts are optimistic about latent reasoning models that also think before answering, but in representation space (thinking in silence before actually speaking, as I explain below).

Additionally, some of these models are absurdly ‘wordy’, generating an insane number of thinking tokens to answer even simple questions, which results in a worse user experience, with models sometimes getting lost in their own thinking traces.

Personally, I do share this negative sentiment toward these models. Don’t get me wrong, they have been my go-to models for a while now (mainly Gemini 2.5 Pro/Flash, and o3/o4-mini from OpenAI), but I simply don’t think they are the final form factor of reasoning models.

The idea of letting models think in silence makes a great deal of sense to me. But what does that mean?

Current models need to speak to think. That is, to continue computing the response, they need to output a word, as if humans had to verbalize their thoughts instead of thinking silently.

Latent Reasoning Models, such as Meta’s COCONUT or a recent model by MIT, Mirage, don’t work that way.

In their case, at the last step before choosing a word to output, they instead take that pre-language prediction, feed it back into the model, and continue reasoning.

In human terms, instead of doing what models like o3 do (my brain thinks it needs to say ‘4’ but has to verbalize it in order to continue), latent reasoning models can think the answer might be ‘4’ but instead of verbalizing it, they continue thinking internally.

And only when they realize they are ready to answer, do they.

These models are not only much more efficient, faster, and less ‘wordy’, but they are also (allegedly) better thinkers. In other recent research, Meta showed that latent reasoners can consider several solution paths simultaneously, unlike current reasoners, which are ‘forced’ to commit to a single path to explore as they speak about it.

In other words, models like o3 must choose a path in word-space before they can tell whether it was the best one. Latent reasoners, by contrast, while thinking in silence, can consider multiple solutions simultaneously, just as humans can weigh several ways of solving a problem at the same time.
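To make the difference concrete, here is a minimal, hypothetical sketch in Python. The “model” below is just random matrices, not COCONUT or Mirage; it only illustrates the data flow: a token-level reasoner must squeeze every step through a single word, while a latent reasoner feeds its full hidden state back into itself and speaks only at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<think>", "2", "+", "2", "=", "4", "<answer>"]
HIDDEN = 16

# Toy stand-ins for a real model's components. These are random
# projections, NOT a trained network; they only illustrate the data flow.
W_step = rng.normal(size=(HIDDEN, HIDDEN))      # one "reasoning step" in hidden space
W_out  = rng.normal(size=(HIDDEN, len(VOCAB)))  # hidden state -> vocabulary logits
W_emb  = rng.normal(size=(len(VOCAB), HIDDEN))  # token -> embedding

def token_level_reasoner(h, steps=4):
    """CoT-style loop: every step must pass through a discrete token."""
    trace = []
    for _ in range(steps):
        logits = h @ W_out
        tok = int(np.argmax(logits))        # commit to ONE word...
        trace.append(VOCAB[tok])
        h = np.tanh(W_emb[tok] @ W_step)    # ...and only that word feeds the next step
    return trace, h

def latent_reasoner(h, steps=4):
    """Latent loop: the full hidden state is fed back, no token is emitted."""
    for _ in range(steps):
        h = np.tanh(h @ W_step)             # keep "thinking in silence"
    logits = h @ W_out
    return VOCAB[int(np.argmax(logits))], h  # speak only once, at the end

h0 = rng.normal(size=HIDDEN)
print(token_level_reasoner(h0))   # emits a visible (here meaningless) chain of tokens
print(latent_reasoner(h0))        # emits a single final token
```

The point of the sketch is the bottleneck: in the first loop, everything the model “knows” at each step has to be compressed into one word before it can continue; in the second, nothing is thrown away until the final answer.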

The counterargument is that latent reasoners are less interpretable.

As they think in silence and then respond, it’s tough to discern the thought process. Still, as mentioned earlier, chains of thought can’t be trusted anyway, so maybe that isn’t such a big loss after all.

I can’t say whether GPT-5, Grok-5, or Gemini 4 will be latent reasoners, but I wouldn’t be surprised if they were.


TALENT WARS
Zuck’s OpenAI Poaching Spree

Probably the biggest news of the week: Meta has gone on a massive poaching spree of OpenAI researchers, with the number of recruits surpassing 10 people. Some compensation packages reportedly reach up to $300 million over four years, with eight- and even nine-figure signing bonuses.

Among these, we have:

  • The entire OpenAI Zurich lead team, specialized in computer vision (Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai)

  • Shengjia Zhao, a deep learning expert who contributed to GPT-4 development.

  • Jiahui Yu, who joined OpenAI in late 2023 after working at Google DeepMind.

  • Shuchao Bi, who oversaw OpenAI’s multimodal models.

  • Hongyu Ren, who was OpenAI’s lead on post-training for its o3 and o4 “mini” models and upcoming open-source model releases.

Others include Trapit Bansal, another key OpenAI researcher, as well as SSI’s CEO Daniel Gross and investor Nat Friedman, not to mention the multi-billion-dollar Scale AI deal that brought over its CEO, Alexandr Wang, Meta’s new Chief AI Officer.

This story is so in vogue that the Wall Street Journal has published an article on Mark’s ‘The List’, a roster of star researchers to hire away from top AI labs.

TheWhiteBox’s takeaway:

Meta is fully committed to AI, with no hesitation. This is primarily good news because Meta is committed to open-source research, allowing these researchers to publish openly, which will benefit the entire industry.

It’s also clear proof that Llama 4 was a massive flop, and Mark isn’t willing to give second chances to the current team.

Entering speculation territory, I believe Zuck is starting to see AI as a serious risk to Meta’s social media business, and as an opportunity to grow into a new market before the old one dries up. The reason I believe this is that social media burnout is a very real thing, and we are already seeing AI products aimed at fetching the important news from several social media sources and avoiding ‘doom-scrolling’.

And that’s without even considering the massive amount of new AI-generated accounts that will soon make social media a nightmare. I’m not bullish at all on the future of social media, and it might be the case that Zuck isn’t very optimistic either.

HARDWARE WARS
Is OpenAI Using TPUs?

According to a report by The Information, Google has persuaded OpenAI to adopt TPUs, rather than relying solely on NVIDIA GPUs through Microsoft and Oracle clouds.

If true, which appears to be the case, OpenAI joins a growing number of companies that train and serve AI models, such as Safe Superintelligence, Meta, and Thinking Machines Lab, that are pivoting to the slightly inferior yet much cheaper inference stack Google offers compared to high-end NVIDIA Blackwell GPUs.

TheWhiteBox’s takeaway:

Is NVIDIA in trouble? Not quite.

Competition was going to come eventually. Curiously, Google currently looks like a bigger competitive threat than AMD, because AMD’s servers don’t scale to the number of GPUs per server that AI training demands, at least not until 2026.

But what makes Google’s TPUs attractive compared to NVIDIA’s GPUs?

Besides Google’s much lower margins on the product (NVIDIA’s margins are falling due to increased per-GPU memory allocations, but remain insane, as befits the market leader), TPUs like Ironwood (not yet deployed) promise the most significant scale-up in the business, meaning the largest accelerator servers on the planet.

Let me explain.

AI runs on accelerators (GPUs from NVIDIA/AMD, TPUs from Google, LPUs from Groq, NPUs from Huawei/Apple, etc.). The issue is that AI models and their memory caches are almost always larger than what a single accelerator offers, so you need many accelerators for each AI workload. In other words, AI compute workloads are almost always distributed across several hardware devices.
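A quick back-of-envelope sketch makes this tangible. All the numbers below (model size, cache size, memory per accelerator) are illustrative assumptions, not any vendor’s actual specs:

```python
# Back-of-envelope sketch of why one accelerator is rarely enough.
# All numbers below are illustrative assumptions, not vendor specs.
import math

params            = 1_000e9   # a hypothetical 1-trillion-parameter model
bytes_per_param   = 2         # BF16/FP16 weights
kv_cache_bytes    = 500e9     # assumed cache for long contexts and many concurrent users
hbm_per_gpu_bytes = 192e9     # assumed ~192 GB of high-bandwidth memory per accelerator

total_bytes = params * bytes_per_param + kv_cache_bytes
gpus_needed = math.ceil(total_bytes / hbm_per_gpu_bytes)

print(f"Model + cache: {total_bytes / 1e12:.1f} TB")          # ~2.5 TB
print(f"Accelerators needed just to hold it: {gpus_needed}")  # ~14 devices
# ...and that is before leaving any headroom for activations. Every one of
# those devices now has to talk to the others, which is the communication
# problem discussed next.
```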

In turn, this means these devices need to communicate information with each other, which is challenging because excessive communication can result in the hardware moving data instead of processing it. In layman’s terms, idle time.

This ‘idle time’ is not only very expensive in accounting terms (moving data is not revenue-generating, so you aren’t getting a return on costly CAPEX investments that sit on a very short depreciation schedule), but it also lengthens training (models take longer to train) and hurts latency (models take longer to respond).

However, GPUs inside a single server use special communication hardware (NVLink/NVSwitch in the case of NVIDIA) that minimizes communication overhead, so naturally, the goal is to keep the workloads inside the server (server-to-server communication is painfully slow).

And here is where other players can make a dent on NVIDIA, as:

  • NVIDIA’s biggest GPU server currently uses 72 B200 Blackwells (all-to-all, so all GPUs are directly communicating)

  • In contrast, Google’s largest server, called a ‘pod’, is equipped with over 9,000 TPUs. Yes, more than nine thousand. The trick is that they aren’t connected all-to-all, but the interconnect is still heavily optimized, so it’s a desirable proposition for AI labs.

  • Huawei’s CloudMatrix features 384 all-to-all Ascend 910C NPUs, the largest all-to-all server in the world. The trade-off is that those chips are nowhere close to the state-of-the-art, and are extremely energy-intensive per unit of provided FLOP (unit of computation).

A single CloudMatrix server requires 500 kW of power, or half a megawatt. For reference, NVIDIA’s most powerful server, the Blackwell NVL72, requires ~120 kW. But Huawei’s offering has one advantage: it sits in a country that added roughly 120 GW of solar and wind capacity in May alone, and needing lots of energy matters far less when energy is basically free.

The takeaway is that NVIDIA has the best accelerators, period, but Google’s more competitive pricing (better FLOPs per dollar) and Huawei’s chips, with China subsidizing the effort, make NVIDIA’s job of selling GPUs much harder.

Moreover, if AMD delivers on its next server and joins the race along with others like Amazon or Microsoft (the latter’s efforts are not going particularly well, though), I believe the commoditization of the accelerator market is coming in 2026.

CUSTOMIZED SOFTWARE
OpenAI’s Consulting Services

OpenAI has taken a page from Palantir’s book and created a team of FDEs (Forward-Deployed Engineers), a consulting service for companies spending more than $10 million on OpenAI products that helps these customers create fine-tuned versions of OpenAI models tailored to their use cases.

Just like Palantir, the idea is that OpenAI delivers a better experience for its customers, but crucially, takes in learnings from these exercises to create new products that can then be sold to others.

TheWhiteBox’s takeaway:

But why is this relevant news?

If you’re a regular reader of this newsletter, you know I’ve been promoting the advent of customized software as the new paradigm. Companies feared custom software for decades, but with AI, software can now be tailored not only to the company but to every single use case within that company, essentially making generalized software (the SaaS market) obsolete.

It’s coming.

GOOGLE LABS
Google’s New AI Product, Doppl

Google Labs, the team that created products like NotebookLM and Deep Research, has presented a new product, Doppl.

The product tackles a well-known pipe dream of the industry: you upload a photo of yourself and of some clothes, and it generates a video of you wearing them so you can see whether you like the look.

TheWhiteBox’s takeaway:

Nothing world-changing, but it’s certainly another barrage on AI startup land, as several companies were working on this precise thing.

For me, the takeaway is that this only reinforces my view that the application layer will soon be a cemetery of AI startups outcompeted by model-layer companies.

In the crypto world, the mantra was ‘not your keys, not your coins,’ a criticism of people naive enough to allow crypto brokers to store their coins on their behalf, making them vulnerable to potential issues with the company.

I now have a similar mantra for AI: “not your model, not your business.”

M&A
OpenAI Acquires Crossing Minds

OpenAI has acquired Crossing Minds, an AI recommendation engine for e-commerce platforms. As seen in the video, you can use the tool to search for items in e-commerce stores using AI.

Now, the team will be joining OpenAI.

TheWhiteBox’s takeaway:

This move is heavily focused on ChatGPT’s consumer side, another direct-to-consumer push from a company already highly leveraged toward consumers rather than enterprises. That decision is understandable, because consumer adoption is much larger, but also concerning, because it makes revenues far more dependent on macroeconomic factors and user spending.

I also see this as a clear move in favor of OpenAI’s Operator, its computer-use agent. Crossing Minds’ recommendation engine could boost the ability of these error-prone agents to find items in an e-commerce store more easily.

Among potential use cases, shopping stands out, so having an engine that can allow models to search e-commerce sites via natural language conversations instead of clicking (a highly error-prone action trajectory) makes a lot of sense.

TREND OF THE WEEK
Why 2026 AI Will Look Nothing Like Today

Today, my goal is to show you, as best I can, what the future of AI models looks like.

After weeks of hinting at it, I finally feel comfortable enough to draw this future, thanks to a reputable AI researcher sharing his take, which, excitingly, looks very similar to the things we have been discussing lately.

That future model looks very little like the ones we have right now. The future, or the “cognitive core” as this illustrious researcher has described it, is small.

Let’s dive in.

Knowledge Has a Price

Put in very, very simple terms, the process of creating an AI model is as follows:

  1. Expose the model to never-ending amounts of data

  2. By learning to predict what will happen next in the data, the model captures the underlying knowledge

For instance, if the model can finish the sequence “Everest is on the border between China and…” by confidently predicting “Nepal,” we can argue that the model now knows where Everest is.

Even if we argue that the model doesn’t truly understand what Everest, China, or Nepal actually are, at least it can accurately predict knowledge that describes all of them.
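A toy sketch of what that looks like mechanically (the probability table below is hand-written for the example; a real model learns it from trillions of tokens):

```python
# Toy illustration of "knowing" as next-token prediction. The probability
# table is hard-coded for this one example; a real model learns it from data.
next_token_probs = {
    ("border", "between", "China", "and"): {
        "Nepal": 0.92,     # the model "knows" where Everest is...
        "India": 0.05,     # ...if most of the probability mass lands on the right word
        "Pakistan": 0.03,
    }
}

def complete(context):
    probs = next_token_probs[tuple(context[-4:])]
    return max(probs, key=probs.get)   # greedy pick of the most likely next token

prompt = "Everest is on the border between China and".split()
print(complete(prompt))  # -> "Nepal"
```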

The issue is that this compression of knowledge into the model has a price: size. As the amount of knowledge the model has to capture grows, so does its size. In layman’s terms, a larger model can store more knowledge than a smaller one.

Since knowledge compression is seen as crucial to creating smarter models, and the amount of data fed into these models keeps growing as another vector of model smartness, the models naturally increase in size as well.

But the question is: do we need these models to know it all?

The ‘Other’ Way

As I was saying, we don’t train these models on everything because we need them to know it all, but because larger datasets are correlated with higher model intelligence.

But we both know that this is an uncomfortable reality; we have larger-than-life models that know everything there is to know about the Han Dynasty’s poetry, even though you couldn’t care less about it.

Thus, what if instead we had models that are very small but are connected to search engines, which provide the knowledge to the model only when needed?

What if the actual model knows stuff ‘barely,’ but it’s great at leveraging real-time data retrieved from local documents or Internet search engines when required?

What if we don’t need models that know everything, but models that are proficient at searching for that information only when needed?

These models, as they aren’t required to store every piece of knowledge ever created by humans, can be much, much smaller and much, much more convenient to use: faster, runnable on the modest hardware we all have in our pockets or on our desktops, a ChatGPT that lives on your personal devices, better protected from cybersecurity threats, and reasoning at the speed of light.

Interestingly, this idea is referred to as the “cognitive core,” as defined by Andrej Karpathy, and I’m growing increasingly convinced that it’s the future of AI.

Let me explain how it works.

The Big 5 & Cognitive Core

While larger datasets have led to better models, we also have proof that, in reality, it isn’t just about growing the dataset arbitrarily; quality matters.

Importantly, you can train compelling small models if the training dataset is of extreme quality, with examples like Microsoft’s Phi model family. Therefore, if quality beats quantity, you can achieve impressive results with a much smaller model.

However, a recent breakthrough has opened another door, allowing us to deploy much larger models in smaller packages: Matryoshka models.

I talked about them recently, so I’ll spare you the trouble of going over it again, but the idea is generating a lot of attention because such models have only actually been released in the past few days.

The idea is straightforward: while training a ‘larger’ model, train smaller versions inside of it, which can also be specialized in different areas. The larger model (the largest Matryoshka doll) is the best model, but in some tasks, a smaller doll (model) inside of it can do the job just fine, so you use that model instead.

Thus, the idea behind Matryoshka models is to allow switching between model sizes at runtime, so that the rest of the model can stay in flash storage while it is not in use.

But why is this a big deal?

The idea is very compelling: while most AI models have to sit entirely in the very constrained operating memory (RAM) of our devices, this architecture lets you store a huge model in the much larger flash storage and load only the necessary parts into RAM each time. For reference, an iPhone’s RAM is on the order of 150 to 200 times smaller than its flash storage.

Therefore, we now have a way to create better, smaller models (with higher data quality) while also not compromising on size too much, thanks to Matryoshkas.
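As a rough illustration of the mechanism, here is a toy numpy sketch with random placeholder weights. It is not the actual released models; real Matryoshka-style training is what makes the inner slices work well, but the slicing logic looks like this:

```python
import numpy as np

# Toy sketch of the Matryoshka idea: one big weight matrix whose leading
# slices are trained to work as smaller, self-contained models. Weights here
# are random placeholders; in practice they come from Matryoshka-style training.
rng = np.random.default_rng(0)
D_IN, D_FULL = 64, 1024
W_full = rng.normal(scale=0.02, size=(D_IN, D_FULL))   # the "largest doll"

def forward(x, fraction=1.0):
    """Run the layer at a fraction of its full width (a smaller doll)."""
    width = int(D_FULL * fraction)
    return np.maximum(x @ W_full[:, :width], 0.0)       # only load/use a slice

x = rng.normal(size=D_IN)
full  = forward(x, 1.0)    # best quality, highest memory and compute
small = forward(x, 0.25)   # cheaper "inner doll" for easy requests
print(full.shape, small.shape)   # (1024,) (256,)
# At runtime you pick the doll per request, and the unused columns of W_full
# can stay in flash storage instead of RAM.
```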

But what other qualities, features, and characteristics will this “cognitive core” model have?

They call them the Big 5.

  • Web search: If our model doesn’t know how many sides Ramanujan’s “Lost notebook” has, and for whatever reason you want to know, it will need to look it up (it’s 138, for those wondering).

  • Code execution: AI models not only use code to, well, write code, but also as a scratchpad to think, verify calculations, and perform other important tasks (in fact, most reasoning models rely heavily on code execution to reason).

  • Document library: Generally referred to as ‘RAG’ (Retrieval-Augmented Generation, a term I dislike because web search is also RAG in its own way), this is crucial for providing these models with local data, aka data of value to you.

  • Multimodal: These models are also required to process and generate video, images, and audio, as other means of communication and expression.

  • Agentic: More formally, leveraging tools, specifically through Model Context Protocol (MCP) servers, to carry out actions reliably, which I showed you how to do in previous issues.

With the Big 5, having knowledge baked into the model is not only unnecessary but complete overkill, and we can get away with a much more nimble model, the “cognitive core” that ‘reasons’ while leaning on the ‘Big 5’ to retrieve information and take action.
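Here is a minimal sketch of that loop. Every tool function and the routing rule below are hypothetical placeholders (a real cognitive core would be a small language model calling MCP servers and search APIs), but it captures the shape of the idea: reason a little, fetch what you don’t know, then answer.

```python
# Minimal sketch of a "cognitive core" loop: a small model that reasons and
# routes to the Big 5 instead of storing knowledge. All tool functions and the
# routing rule below are hypothetical placeholders, not a real API.

def web_search(query):      return f"[web results for: {query}]"
def run_code(snippet):      return f"[output of: {snippet}]"
def retrieve_docs(query):   return f"[local documents about: {query}]"

TOOLS = {"search": web_search, "code": run_code, "docs": retrieve_docs}

def cognitive_core(question):
    """Tiny rule-based stand-in for a small reasoning model deciding which tool to call."""
    if any(w in question.lower() for w in ("how many", "when", "who")):
        evidence = TOOLS["search"](question)       # fetch knowledge, don't memorize it
    elif "calculate" in question.lower():
        evidence = TOOLS["code"](question)         # use code as a scratchpad
    else:
        evidence = TOOLS["docs"](question)         # fall back to the local library
    return f"Answer drafted from: {evidence}"

print(cognitive_core("How many pages does Ramanujan's lost notebook have?"))
```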

Besides Andrej Karpathy (famous for seeing things before they happen), Sam Altman himself has shared a similar view of the future, defining these future models as: “a very tiny model with superhuman reasoning, 1 trillion tokens of context, and access to every tool you can imagine.”

He then described it as the “platonic ideal,” and if you look carefully, it’s just what we have described. He even went further and aligned with the idea of models not having to know everything, stating, “using these models as databases—what we are doing just now—is ridiculous.”

He knows something we don’t… or do we?

The “Cognitive Core” and the Future of AI

The status quo is broken; we run AI models that are not efficient enough for planetary-scale inference. We need them to be small, nimble, and smart.

AI Labs are clearly pointing in that direction, and so am I.

Until Thursday!

THEWHITEBOX
Join Premium Today!

If you like this content, join Premium and you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.

Until next time!

For business inquiries, reach out to me at [email protected]