THEWHITEBOX
My Honest & Sober Opinion on AI

Next week, I’m giving a keynote to executives from top companies here in Spain, and I’ve been tasked with giving a state of the union on AI, from markets to product.

It’s designed to be a review that keeps you instantly up to date on the industry and all the tricks it tries to play on you. It’s a sober, no-bullshit analysis that will help you make better decisions along the way.

If you ever wanted to ask me, “What’s your honest opinion on the state of AI?” this is what I would answer.

Let’s dive in.

The Macro Picture in 2 Minutes

First, we are going to recap the state of the industry from a macro perspective: how big it is and where the money is coming from, through a series of graphs you can go over in 2 minutes.

From the looks of it, the level of investment we’re seeing in AI is unprecedented, especially in terms of the steepness (how fast we are deploying so much capital).

Source: Dell’Oro, IEA

However, once we factor in global GDP, we realize that the situation is not that impressive or unique in history:

Source: Dell’Oro, IEA

Irrespective of that, the stock market has reacted in good measure, rallying to new all-time highs at the time of writing, catapulting the ‘winners’ (companies positively exposed to AI) and burying the ‘losers’ (those negatively affected by it).

Today, the stock market is a market of AI winners and losers.

Which is to say, in the eyes of the market, your correlation with AI, positive or negative, determines your fate. Perhaps there is no better example of this than the following, where I’ve plotted the performance of CPU and memory chip stocks (a weighted basket of the most popular stocks in each) relative to a SaaS index.

And while the latter is down 15% year-to-date, memory stocks are up an average of 200% (again, weighted), meaning they’ve tripled in value, and CPU stocks have, on a weighted average basis, doubled.

Source: Author

This has, of course, concentrated indices (and earnings) in many countries around AI (we have some extreme cases like Korea), but the US is an example of this too:

This has stoked fears of an imminent bubble burst (it is a bubble, but what that implies is harder to say). Yet if we compare it to other technology-driven bubbles, it’s nowhere near the same level of ‘bubbleness’, at least relative to the dot-com bubble that burst in the early 2000s:

Source: Pictet Asset Management

Seeing all of this, the natural question is: who’s paying?

The default answer everyone resorts to is US and Chinese Hyperscalers, especially the former (Meta, Amazon, Microsoft, Google, and Oracle), with capital expenditures that are more than impressive and grow incredibly year over year.

Source: Morgan Stanley

However, these guys don’t have infinite money, and the spending is emptying their reserves really fast.

Source: Bloomberg

Thus, if you dig deeper, you realize that the source is actually ‘sources’ with vastly different levels of associated risk: a mixture of Big Tech cash and debt, private equity, credit, and VCs.

The conversation most enthusiasts aren’t prepared for is that, above, there’s almost a trillion dollars of money that has to come from high-risk yields (~$800 billion) and dogshit securitizations, where we’re going to see some pretty large fuckups (~$150 billion).

Soon, though, we’ll have retail investors joining the party and providing liquidity to OpenAI and Anthropic; let’s see how that goes.

Some of you may be inclined to argue that revenues are finally piling in and financing much of this, given Anthropic’s impressive revenue growth. But as you probably know, revenues are trailing this massive investment by a considerable margin (no more than $100 billion/year by year’s end).

Yes, we’ve all heard the claims of massive AI revenues from the Hyperscalers, but every time I hear that argument, I just show them this:

Source: The Wall Street Journal

Guys, a lot of this money is “made up.” But what is holding enterprises back from adopting this amazing technology yesterday?

Oh boy, where do I start.

State of the Technology

Most people just assume AI is ready. It’s not. At least not for the most meaningful, economically valuable tasks.

Whether you’re a CEO or a tech enthusiast looking to adopt the technology, the same rules apply for knowing “where AI makes sense, and where it doesn’t”; the only thing that changes is scale (for good and bad).

There are three factors you should consider: the particularities of inference, the economics, and the technology itself.

Why AI inference is a real mess

To truly understand the situation in the AI industry, we need to grasp AI inference and the issues it entails.

AI inference, serving AI models to users, is what pays the bills for the entire industry. At this point, AI is almost synonymous with Generative AI, the field in which models generate responses back to you, with examples like ChatGPT or Claude, at least in terms of investment.

These responses are made up of ‘tokens’: words (or pieces of words) in text models, image patches for image and video generators, and so on.

Since everything is tokenized (both the input you provide and the model's response), AI is charged by the token, making your ‘AI costs’ a function of both processed and generated tokens.

In other words, the entire business is summarized in one simple inference (pun intended): tokens equal revenue; the more tokens I generate, the more revenue I make. Thus, the goal is to process and generate as many tokens as possible and charge more than they cost me.
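To make the billing logic concrete, here is a minimal sketch in Python. The prices and the names (`TokenPricing`, `request_cost`) are made up for illustration and are not any provider’s real price list or API; the only point is that a bill is a function of both processed and generated tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """HYPOTHETICAL per-token prices, quoted in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Bill for one request: processed (input) plus generated (output) tokens."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Illustrative prices only, not a real price list.
example_pricing = TokenPricing(input_per_million=3.0, output_per_million=15.0)
example_cost = request_cost(example_pricing, input_tokens=2_000, output_tokens=500)
print(f"cost of one request: ${example_cost:.4f}")
```

Multiply that by every request from every user and you get the provider’s revenue line; the open question is whether it covers what those same tokens cost to produce.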

Sounds simple. However, the problem lies in the “charge more” part: generating tokens in a way that earns a return. Once we account for all costs, the price I would have to charge to actually make money is way higher than what we can charge now. But we’ll get to that later.

For most consumers, this is all irrelevant because we pay for subscriptions, as AI Labs try to hide the complexity of it all.

But at heart, everything comes down to unit costs (the Labs incur a cost for each token they process and generate, and hope the subscription price stays above the total cost of those tokens).

And sadly, subscriptions are not profitable, and AI Labs lose money on the vast majority of them. For example, Anthropic recently claimed the average developer costs up to $13 per day, or ~$260 for a 20-day work month. In reality, nothing about AI is profitable these days.
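The arithmetic behind that figure is easy to reproduce. In the sketch below, the $13 per day and the 20-day month come from the claim above (and $13 is an upper bound, so this is the worst case); the plan price is a hypothetical number I picked only to show how the margin is computed.

```python
def monthly_serving_cost(cost_per_day: float, work_days: int) -> float:
    """Token cost a lab incurs for one user over a work month."""
    return cost_per_day * work_days


def monthly_margin(subscription_price: float, serving_cost: float) -> float:
    """What is left of the subscription after paying for the user's tokens."""
    return subscription_price - serving_cost


# From the text: up to $13 per day over a 20-day work month.
developer_cost = monthly_serving_cost(cost_per_day=13.0, work_days=20)

# ASSUMPTION: a hypothetical flat plan price, chosen only for illustration.
HYPOTHETICAL_PLAN_PRICE = 100.0
margin = monthly_margin(HYPOTHETICAL_PLAN_PRICE, developer_cost)

print(f"serving cost: {developer_cost:.0f} $/month, margin: {margin:.0f} $/month")
```

Any flat price below the serving cost means the lab loses money on that user, no matter how many such users it signs up.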

And before we tackle the actual implications, I think it’s important to give you an idea of why the economics of AI are so broken and why the price of the technology, compared to previous technological improvements, is actually going up.

Are the new models just too big and costly? Well, yes and no. Models are getting larger and costlier, but improved hardware largely eliminates that impact and, in fact, $/token is falling with each new hardware generation.

Which is to say, the problem is not so much the hardware itself (though that is only half true, as we see below) as the cost of purchasing it. Labs are spending so much money to grow that they have little option; there’s really nothing they can do about it because the hill of costs they are trying to climb is too steep.

Let me explain why the cost structure of AI providers is so broken. The real problem lies in the unit economics of what sits behind the AI, the hardware, especially in the two big hurdles below.

  1. Our hardware isn’t optimized for inference (serving AI models to users), which is the main driver of most compute demand. This means the cost of producing tokens is still too high relative to how much companies can charge for them.

  2. Capital costs: As mentioned, the cost of getting set up destroys the entire business case from the start.

You will have heard that AI inference is super profitable, with some people even quoting gross margins of 90%. That means that, for every dollar of sales, the gross profit is 90 cents.

The problem is that gross margin conveniently ignores operating costs such as R&D (Research and Development) and, most importantly, cash flows. The choice of metric is telling, because cash flows are where the real problems reside (capital costs). Once you account for those, the situation is much, much worse (huge losses, basically).
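Here is a rough sketch of how a 90% gross margin can coexist with heavy losses. Every figure is hypothetical and chosen only so the gross margin matches the 90% people quote; the cash line is a crude simplification that ignores depreciation, taxes, and working capital.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomeSketch:
    """All figures are HYPOTHETICAL, in the same currency unit."""
    revenue: float
    cost_of_serving: float  # direct cost of producing the tokens sold
    r_and_d: float          # research and training, an operating cost
    other_opex: float       # sales, admin, and so on
    capex: float            # cash spent on chips and data centers

    @property
    def gross_margin(self) -> float:
        return (self.revenue - self.cost_of_serving) / self.revenue

    @property
    def operating_income(self) -> float:
        gross_profit = self.revenue - self.cost_of_serving
        return gross_profit - self.r_and_d - self.other_opex

    @property
    def cash_after_capex(self) -> float:
        # Crude simplification: operating income minus capital spending.
        return self.operating_income - self.capex


lab = IncomeSketch(revenue=100.0, cost_of_serving=10.0,
                   r_and_d=80.0, other_opex=30.0, capex=150.0)
print(f"gross margin:     {lab.gross_margin:.0%}")
print(f"operating income: {lab.operating_income:+.0f}")
print(f"cash after capex: {lab.cash_after_capex:+.0f}")
```

The same business looks great on the first line and terrible on the last two, which is why the metric being quoted matters so much.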

But why is hardware suboptimized for inference?

The reason is the steepness of the curve that defines the inference trade-off:

  • If we maximize token throughput (generating tokens), we sacrifice user latency, which skyrockets, and users say goodbye to you.

  • If we minimize latency (optimize for tokens/second per user, also known as interactivity) to improve the user experience, we run our hardware at a massive discount relative to what we could be making with it.

This gives us the famous throughput vs. interactivity curves, like the one below, which basically shows that the more interactivity you want (higher tokens/second per user), the fewer tokens you get per GPU. As LLMs are billed by the token, that means less revenue.
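The shape of that trade-off can be reproduced with a toy model. This is a sketch under a strong assumption, not a benchmark: each decoding step costs a fixed amount of time plus a little extra per concurrent user, and both constants are invented.

```python
def decode_step_ms(batch_size: int, fixed_ms: float = 10.0, per_user_ms: float = 0.5) -> float:
    """Time for one decoding step that emits one token for every user in the batch.

    ASSUMPTION: a fixed cost per step (e.g. reading the model weights) plus a
    small extra cost per concurrent user. Both constants are invented.
    """
    return fixed_ms + per_user_ms * batch_size


def interactivity(batch_size: int) -> float:
    """Tokens/second seen by each individual user."""
    return 1000.0 / decode_step_ms(batch_size)


def gpu_throughput(batch_size: int) -> float:
    """Total tokens/second produced by the GPU across all users."""
    return batch_size * interactivity(batch_size)


batch_sizes = [1, 4, 16, 64, 256]
curve = [(b, interactivity(b), gpu_throughput(b)) for b in batch_sizes]
for b, per_user, total in curve:
    print(f"batch={b:>3}  tokens/s per user={per_user:6.1f}  tokens/s per GPU={total:7.1f}")
```

In this toy model, packing more users into a batch raises tokens per GPU but slows every individual user down, which is exactly the tension the curve describes.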

And can’t we just stay at the top of the curve? Well, it’s not that simple. Because AI models are commoditized, customers churn if your responses are slow.

Therefore, most inference providers operate at the lower end of the curve, ensuring a good user experience even if that means less revenue.

Of course, as I was just suggesting, one option is to make the workload more ‘GPU-friendly’. In inference, that means moving up the curve to the left, increasing the number of tokens each GPU produces, and thereby securing higher revenue relative to the hardware you have.

But if you do that, customers churn, so you’re forced to live in ‘suboptimized land’.

The problem with this decision is that you’re still running a race against time, because your chips depreciate really fast and your hardware utilization is worse; you’re paying a premium for hardware you’re then running at a discount, like buying a Ferrari to drive it only through downtown Tallahassee.

On the depreciation topic, people are now convinced accelerators last more than 5 years, but I’m not, given that newer chips are being fried to meet the huge inference demand. I’m kind of on Michael Burry’s side here, but I think it will take time to prove him right on this one (and he’s exaggerating a lil’ bit).

So, even if you want to offer the best user experience possible, you have to account for the fact that every such decision makes it harder for you to ever make money.

At this point, they have two options: premium inference and new hardware.

The former offers ultra-fast tokens at a huge premium. The logic: if I’m going to be forced to live in the lower end of the curve, producing way fewer tokens than I would need to make money at standard prices, I’m going to offer a premium service and charge many times the usual price.

For instance, Cursor and Anthropic both charge six times the price for tokens that are 2.5 times faster than the standard ones. And guess what? People are paying.

To offer those speeds, they simply batch your request with fewer concurrent users (or even alone). Basically, they go down the curve even more to a lower number of concurrent users, but charge a huge premium in return to each of these ‘premium users’.
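A quick back-of-the-envelope check shows why this can work. The sketch assumes revenue per GPU is simply concurrent users × tokens/second per user × price per token, with every user generating continuously; that is my simplification, not something the providers publish. The 6× and 2.5× multipliers are the ones quoted above.

```python
PRICE_MULTIPLIER = 6.0   # premium price per token vs standard (from the text)
SPEED_MULTIPLIER = 2.5   # premium tokens/second per user vs standard (from the text)


def premium_vs_standard_revenue(batch_ratio: float) -> float:
    """Revenue per GPU on the premium tier relative to the standard tier.

    batch_ratio = premium concurrent users / standard concurrent users.
    ASSUMPTION: revenue per GPU = users * tokens/s per user * price per token.
    """
    return PRICE_MULTIPLIER * SPEED_MULTIPLIER * batch_ratio


# Each premium user brings in 6 * 2.5 = 15 times the revenue per second.
revenue_per_user_multiplier = premium_vs_standard_revenue(1.0)

# The premium tier matches standard revenue even with 1/15th of the users.
break_even_batch_ratio = 1.0 / (PRICE_MULTIPLIER * SPEED_MULTIPLIER)

print(f"revenue per premium user: {revenue_per_user_multiplier:.0f}x")
print(f"break-even batch ratio:   {break_even_batch_ratio:.3f}")
```

Under that assumption, a GPU serving premium users earns as much as a standard one as long as the batch shrinks by less than 15 times.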

The other option is using hardware with a better design for such workloads. Although the inference trade-off is unavoidable (it’s just maths), the nature of our hardware makes it even worse, particularly GPUs, which were built for workloads very different from AI inference decoding.

One example of hardware designed to live in that part of the curve is Cerebras, the hottest recent IPO. It offers such high memory bandwidth per chip (the main bottleneck in inference) that even for extremely sparse workloads (e.g., using an entire cluster to serve a single user, the type of stuff you have to pull off in inference), the amount of compute you get out of the chip is enormous (left side of the graph below).

However, as you can see below, the issue is that for denser workloads (i.e., training or the inference prefill stage), Cerebras loses its appeal almost immediately (and Cerebras is highly custom hardware, so it's very expensive to build, making the use case particularly off-putting).

In fact, for workloads like training, a single R200 chip (NVIDIA’s Rubin chip, retailing at ~$50k) delivers better throughput than a $1 million Cerebras chip, offering 20 times the performance at 1/20th the price.

This means that, in reality, there’s really no perfect solution, and the ideal hardware depends on the situation. So, what do we do if there’s no one-size-fits-all solution?

Well, easy, general-purpose hardware is dying.

The fact that the picture changes so much across different hardware types makes it very clear that the future of AI inference lies in disaggregation, with solutions such as NVIDIA’s SuperPOD or Amazon’s deal with Cerebras to run inference on a mix of Trainium and WSE chips.

If you were wondering why NVIDIA paid $20 billion for a company, Groq, that posed close to zero threat to its business, well, now you know: it turns out it was not a matter of competition but of survival.

In both instances, ‘GPU-like’ accelerators are used for the denser parts of the workload, and the sparse parts are streamed to hardware such as Cerebras’s WSEs or Groq’s LPUs to keep throughput high (and thus, revenues piling in) while still offering a great user experience (fast and cheap tokens).

So, what’s the takeaway here? Well, to me, it’s that the picture is way more complicated than many investors and enthusiasts alike think.

Which leads nicely into what all this means for your wallet.

AI on the margins

If you’re a regular reader, you know I’ve been screaming from the rooftops for quite some time that AI is way more expensive than we think (and the section above shows why). In a way, we are getting spoiled as AI Labs are in full ‘Silicon Valley’ mode, where growth is all that matters, and profits can come later.

They are getting all of us hooked on this technology, and one day they’ll start raising prices, and there will be nothing you can do about it (well, as we’ll see later, there’s one thing).

In fact, that day has already come, and prices are going up, from Copilot to Claude Code. Frontier models are no longer guaranteed to be priced lower than their predecessors; if anything, prices are rising.

Besides pushing unit prices higher (charging more per token), they are also transitioning to usage-based pricing, ensuring they earn a margin on every token they provide instead of charging a flat subscription price.

This leads us to one of the most important graphs in all of AI: understanding how software’s cost curves change with this technology.

In traditional software, you were faced with an important capital investment every few years to buy equipment. After that purchase, even though user numbers increased, costs remained roughly stable and, importantly, predictable.

Cloud simplified things even more. You still had an upfront cost (the cloud migration), but from then on, you’re just dealing with stable costs all the way. Don’t get me wrong, they might have a slight upward trend over time, but not too crazy.

And crucially, onboarding new users incurs negligible additional cost. For instance, onboarding Jack from marketing to HubSpot represents an almost negligible increase in costs, just the price of the seat.

As most software is purchased through SaaS deals, you can’t see what’s going on under the hood, but for HubSpot, that new user adds almost nothing to the cost of the underlying hardware, which is why traditional software has such high margins.

For you, it’s still a good deal most of the time because while seat prices do go up almost religiously every year, you can predict the behavior of your IT hardware spending, even though you may not realize you’re paying a huge premium for that software.

Well, all this goes to shit with AI.

Because with this technology, every new user counts. And to prove this, I’m going to show you one of the wildest metrics in the history of this technology, and that’s saying something, because you’re not prepared to know how much some users are spending on AI.
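To see what “every new user counts” means in practice, here is a small sketch comparing the vendor’s marginal cost of one extra user in a traditional SaaS product and in an AI product billed by the token. All the numbers (fixed infrastructure cost, per-user overhead, tokens per user, cost per million tokens) are hypothetical placeholders, not measurements.

```python
from typing import Callable


def saas_vendor_cost(users: int, fixed_infra: float = 10_000.0,
                     cost_per_user: float = 0.5) -> float:
    """HYPOTHETICAL monthly cost: mostly fixed infrastructure, tiny per-user overhead."""
    return fixed_infra + cost_per_user * users


def ai_vendor_cost(users: int, fixed_infra: float = 10_000.0,
                   million_tokens_per_user: float = 20.0,
                   cost_per_million_tokens: float = 5.0) -> float:
    """HYPOTHETICAL monthly cost: every user's tokens have to be paid for."""
    return fixed_infra + users * million_tokens_per_user * cost_per_million_tokens


def marginal_cost(cost_fn: Callable[[int], float], users: int) -> float:
    """Extra monthly cost of onboarding one more user."""
    return cost_fn(users + 1) - cost_fn(users)


saas_extra = marginal_cost(saas_vendor_cost, users=1_000)
ai_extra = marginal_cost(ai_vendor_cost, users=1_000)
print(f"one more SaaS user: ${saas_extra:.2f}/month")
print(f"one more AI user:   ${ai_extra:.2f}/month")
```

With these made-up inputs, the extra SaaS user is a rounding error while the extra AI user adds a cost that scales directly with how much they use the product.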

Behind the paywall, we move from the industry-level picture to the practical reality of deploying AI: the cost traps that appear once usage scales, the technical limits that still make automation harder than many assume, the hidden fragilities companies need to design around, the operating principles that separate serious AI adoption from superficial experimentation, and a set of final recommendations to adopt AI for anyone remotely serious about this technology.
