Why Most AI Investors Will Lose Money


A few weeks ago, I gave you my reasons why I thought most AI startups were doomed. Today, with the excitement around AI startups continuing to grow, I want to take it a step further and see whether I was the one swimming naked. As it turns out… no, I wasn’t, because the picture is actually worse than I thought.

Most current AI software startups are bullshit, and most of them (and their investors) will lose a lot of money.

If not all.

And not because their products are bad, but because their business doesn’t make any sense and will never be profitable. Most venture capitalists are high on early successes and incapable of seeing how a train is about to steamroll their investment. And that train is not OpenAI. It’s called stagnant growth and low margins.

However, to understand why this is the case, we need to look at the AI industry as a whole.

Therefore:

  1. We will take a close look at CoreWeave after its IPO to explain why a company that shouldn’t exist does. Understanding this will make apparent several key aspects of what makes AI infrastructure different, and how, even at this layer, most companies are still not going to make it. With a simple risk-management and total-cost-of-ownership (TCO) framework, you will be able to easily tell for yourself who’s doomed.

  2. We will then dissect the infrastructure business, understanding how customer dynamics completely change in the GPU cloud era, which explains why most billion-dollar-backed startups with $100 million in annual recurring revenue (ARR) will still fail and never turn a profit, despite early success.

  3. And, finally, we will take a look at AI software and, well, I really don’t have a different way to put it, so, quoting Steve Carell in The Big Short: “call bullshit on every fucking thing.”

Let’s dive in.

The Company That Shouldn’t Exist

I firmly believe that if you understand CoreWeave’s business, you already understand the AI industry better than most VCs.

So, what is CoreWeave?

CoreWeave, the latest AI company to IPO (or, dare I say, the only one so far), is a company that builds, manages, and rents accelerated-hardware data centers for others.

Accelerated hardware is any hardware that parallelizes compute to increase throughput. GPUs, TPUs, NPUs, LPUs, WSEs… all fall in this category. As AI requires performing a lot of computations per second, these pieces of hardware are the only option.

Originally an Ethereum mining company, they executed a perfect pivot with the Generative AI craze and, since then, have become a go-to provider of data center buildout and management for various key customers like NVIDIA, Microsoft, or OpenAI.

From blockchains to AI? How?

Although that is no longer the case, there was a time when Ethereum behaved like Bitcoin. By behave, I mean that it was a ‘mining’ blockchain.

Miners compete to solve a very complex mathematical problem, and the one that solves it gets to add the next block to the blockchain, receiving Bitcoin or Ether as payment. While participating in this ‘competition,’ they also validate the new transactions to secure the blockchain.

The key thing here is that participating in that game requires hardware similar to GPUs (not quite the same, but enough), making the transition quite easy for them.

But seeing CoreWeave’s market and potential competition, you can’t help but wonder: How do they even exist?

NVIDIA’s Baby Boy

Besides being the best accelerator cloud (more on that shortly), they are also NVIDIA’s darling.

Since the Generative AI explosion, NVIDIA even managed to become the most valuable company in the world for a brief period. However, they are currently very concentrated in a single business: selling GPUs.

Naturally, they want to diversify, and there’s only one natural way to do so: building GPU data centers and renting them. They sell GPUs, so while they're at it, why not rent them on a large scale?

The problem is that, from a customer-relationship perspective, that’s probably not the best idea. Not only is their revenue concentrated in one stream, but that stream is also concentrated in a handful of companies that would become competitors if NVIDIA entered the accelerated-cloud business.

Competing with your own customers doesn’t sound great. So what did NVIDIA do? They saw CoreWeave’s play and thought to themselves: if we can’t compete directly, we are going to back a GPU cloud of our own.

They do have their own cloud computing offering, DGX Cloud, but it isn’t close to the scale of CoreWeave or the Hyperscalers.

Thus, although NVIDIA owns “just” 3.8% of CoreWeave today, that figure is the dilution of a much larger stake that has been reduced over time. This clearly isn’t an equity play, but a revenue-diversification one:

They get to own their own GPU customer, diversifying the pool of customers while also holding a meaningful stake in what has turned out to be a winner in the market.

Therefore, CoreWeave has been allowed to exist because NVIDIA ensured they could buy NVIDIA GPUs, something they could not have done otherwise (the Hyperscalers would have easily outcompeted them on price). CoreWeave exists because NVIDIA wanted it to.

But does that alone explain CoreWeave’s existence and success? Of course not.

Brilliant Execution

Currently, CoreWeave provides services to top labs like OpenAI or Meta, to trading firms like Jane Street for other types of accelerator workloads, and even manages some of NVIDIA’s internal clusters, among many others:

Source: CoreWeave

I recommend this link for a deep dive into why CW is such a good GPU cloud provider, but here’s the summary:

  • Flexible deployments, crucial as each client will have its own needs and constraints

  • Automated node health checks (passive + active) ensure high goodput and minimize downtime.

  • Deep monitoring, early GPU access, and top MLPerf results make them a leader, though focused mainly on long-term, large-scale rentals.

In short, they offer a great customer experience, almost no downtime, and higher hardware utilization than other clouds by proactively running continuous health checks on the cluster.

It sounds like the bare minimum you would expect, but you would be surprised how rare this is because running these data centers is actually very complicated.

But funnily enough, this isn’t the biggest reason to be bullish on this company. What makes CoreWeave great as a business, beyond the operational aspects, and the reason I’ve chosen them to build my investment thesis, is that, unlike most AI companies today, their business actually makes sense and will make money.

Yes, that “low bar” is literally the bar for the industry. And most investors have yet to realize it.

And here is where we enter our first “great forgotten” in AI investing: risk management.

Risk Management Still Matters

Most of the mortgage speculation that led to the last big financial crisis 17 years ago rested upon poor risk management of mortgage-backed securities… or lack thereof.

And without the overarching drama, poor risk management remains an issue very much ignored today.

But why?

It’s no secret that Generative AI is expensive if you want to run it at scale. However, AI introduces a clear 1:1 correlation between infrastructure investment and revenues; if you have higher demand from your clients for Generative AI products, you have to invest more in your GenAI cloud products (this was not the case in the previous cloud era).

And while your customers having to pay more to generate more value from GenAI appears to be a great thing for GPU providers, it also introduces customer risk, as much of your expected revenue from customers will be disguised debt or subject to more stringent payment conditions.

You may be puzzled by what I’m saying right now, but I promise everything will make sense in a minute.

A good example is NVIDIA. They charge billions at a time for every purchase order they receive. Nobody, not even their clients, who happen to be the world's richest companies, pays upfront. In fact, they often won’t pay a single dime until the cluster is up and running.

How do we know this?

Simple: we only need to take a look at NVIDIA’s accounts receivable (money owed to them by customers). In just a year, accounts receivable has grown considerably. More concerningly, the time it takes them to get paid (Days Sales Outstanding, or DSO) has grown 32% to 53 days, according to their last earnings.
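For intuition, DSO is just accounts receivable divided by the period’s revenue, scaled to the number of days in the period. Here’s a minimal sketch; the dollar figures are illustrative placeholders I picked to land near the 53-day mark, not NVIDIA’s actual reported numbers:

```python
def days_sales_outstanding(accounts_receivable: float,
                           revenue: float,
                           period_days: int = 90) -> float:
    """Average number of days it takes to collect payment after a sale."""
    return accounts_receivable / revenue * period_days

# Hypothetical quarterly figures: $23B owed by customers vs. $39B in revenue.
dso = days_sales_outstanding(23e9, 39e9, period_days=90)
print(round(dso))  # ~53 days
```

The ratio is what matters: receivables growing faster than revenue pushes DSO up, which is exactly the warning sign discussed above.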

No one has batted an eye at this because NVIDIA’s customers are as reliable as they get, but the point is that this is equivalent to the MBS issues that poisoned markets almost twenty years ago, but in today’s AI world.

People aren’t paying attention and it’s going to become a real problem.

That said, even though analysts have obsessively focused on NVIDIA’s declining gross margins (important, but inevitable due to increased HBM costs, as we saw in our AMD and Google deep dives), NVIDIA’s CFO, Colette Kress, did touch on the receivables question, blaming “shipment linearity and increased inventory to support our Blackwell product ramp,” which might be true.

Still, it’s most likely because customers aren’t as comfortable paying until everything is up and running, due to the macroeconomic situation and also because, concerningly, deploying Blackwell GPUs seems to be much trickier than previous generations (liquid cooling, off-the-chart per-rack power requirements exceeding 120 kW), which seems to be what the CFO is lowkey acknowledging in very carefully chosen words.

Although NVIDIA’s customers are some of the most liquid corporations on the planet, the alarming change is clear and thus risk profiling is necessary, as there’s a decent chance not only that customers will increase their DSO but, worse, that they may default on payments.

And when you factor this in, it’s where things start to get nasty.

Wasn’t the Point to Make Money?

Accelerated cloud providers can be broken into three groups:

  1. Bare metal. They buy the chips from NVIDIA, build the data center, and rent the bare metal (plus operations) to customers, who use the GPUs as they please.

  2. Software. Companies that rent GPUs at a low level from the companies in group 1 and offer managed accelerator services on top: post-training, fine-tuning, and serverless inference.

  3. Verticalized. They do some combination of groups 1 and 2. They handle the underlying hardware, but also offer software on top.

But at the end of the day, they are all in the business of managing AI data centers. This is much easier said than done.

The assets you’re investing in are not only expensive, but they are also failure-prone and wear out quickly; every three to five years, you need new GPUs.

Therefore, they lose value over time, and quite rapidly. If your depreciation schedule is three years, for example, your GPUs decline in book value by a third every year, so your capacity to charge for renting them falls proportionally (or at least to a certain degree).

Therefore, you are basically running a race against time in which your goal is to have your ‘GPUs go brrr’ (generating as many FLOPs as possible), while every year having to drop prices to account for the performance loss, all the while having invested hundreds of millions of dollars (or billions).
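To make that race against time concrete, here’s a toy straight-line depreciation sketch. The $30,000 GPU price, the $4/hour starting rate, and the assumption that rental prices track book value one-to-one are all mine, for illustration only:

```python
def book_value(cost: float, age_years: float, schedule_years: int = 3) -> float:
    """Straight-line depreciation: book value falls linearly to zero
    over the depreciation schedule."""
    return max(cost - cost * age_years / schedule_years, 0.0)

def rental_rate(initial_rate: float, age_years: float,
                schedule_years: int = 3) -> float:
    """Toy assumption: the achievable hourly rental price tracks
    book value proportionally."""
    return initial_rate * book_value(1.0, age_years, schedule_years)

# A hypothetical $30,000 GPU rented at $4/hour, on a 3-year schedule:
for year in range(4):
    print(year, book_value(30_000, year), round(rental_rate(4.0, year), 2))
```

Every dollar of revenue you fail to capture while the card sits idle is gone for good, because the asset keeps losing value whether it is utilized or not.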

So, if we’re saying that risk management is very important, how do we define risk here? 

While banks usually use the graph below to set the interest rate of a loan based on the risk of default, you can adapt it to evaluate customer default risk and tailor your payment flexibility along two factors:

  • Duration and size of the renting contract

  • Paid upfront or over time

This gives us a visualization of customer risk, where the more vibrant the color (moving up and to the right), the higher the default risk.

Specifically, we can more or less define four groups based on the two variables (contract duration and payment conditions):
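As a toy illustration of those four groups, here’s a sketch of the two-variable grid in code. The three-year cutoff and the labels are my own assumptions, not from any provider’s actual credit model:

```python
def customer_risk(contract_years: float, pays_upfront: bool) -> str:
    """Toy quadrant classifier: longer contracts and upfront payment mean
    lower default/re-leasing risk; short, pay-over-time deals mean higher
    risk. The 3-year cutoff is an illustrative assumption."""
    long_contract = contract_years >= 3
    if pays_upfront and long_contract:
        return "low"     # CoreWeave's zone: multi-year contracts paid upfront
    if pays_upfront or long_contract:
        return "medium"
    return "high"        # short contracts paid over time

print(customer_risk(5, pays_upfront=True))     # low
print(customer_risk(0.5, pays_upfront=False))  # high
```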

Of course, ideally, you want to be in the bottom-left part. This is basically where CoreWeave is.

As the customer is paying upfront, they will hold you accountable for the depreciating value of the assets and thus force you to decrease prices, so they can also offset the risks of paying a multi-year, billion-dollar contract upfront.

It’s a win-win; customers get lower prices, and CW has zero payment risk. However, to play this game, you are still going to need rich customers who can pay upfront, which means that customers concentrate on a handful of candidates.

Other companies, startups like TogetherAI, Replicate, Lambda, and so on, have no option but to move around the graph to attract the rest of the customer pool by increasing flexibility along the two variables:

  1. Allowing shorter contracts

  2. Allowing more pay-over-time flexibility

Naturally, this increases the risk for you as a GPU provider, and the price for the customer, as you have to offset the risk somehow. This is quite possibly one of the worst businesses you can be in, as it requires:

  1. Massive investment, hundreds of millions of dollars, or billions, to buy the accelerators and build (or rent) the data centers.

  2. Taking up a lot of customer risk by being flexible in either contract duration (the shorter, the more risk of not finding new customers once the contract ends) or payment flexibility (you will allow delayed payments, but you raise prices to offset the risk of payment).

  3. But you can raise prices only so much, as it’s not like you’re offering something wildly different from what others offer; in the eyes of the customer, your offering is a commodity (all providers are selling the same damn thing: GPUs).

  4. All the while, you’re competing with an NVIDIA-backed, operationally excellent, and trusted player, CoreWeave, and with the Hyperscalers and their tens of billions in free cash flow.

  5. And to top it off, we are entering an inference-based compute world where per-GPU arithmetic intensity collapses, so most of your GPUs’ running time won’t be spent doing computations.

If you’re puzzled by the last point, reading Thursday’s newsletter will solve all your doubts.

For those reasons, I believe investors in companies that aren’t in the low-risk zone, i.e., the majority, will never make money as they will be outcompeted on prices, struggle to maintain high hardware usage, or simply run out of money once customers start increasing their days to payment.

But this is something obvious to the executives in these startups. Therefore, they are doing what anyone would do in their place: verticalizing up to software, offering LLM platforms for training, fine-tuning, and so on, to increase the value of their offering.

This leads us to the biggest realization that dawned on me a few weeks ago and which sustains my entire investment thesis that most investors will lose their money.

That is not only a mistake but also explains perfectly why AI software companies are mostly dead on arrival.

But why? 
