Oracle Makes History, Hardware Fails, & More

THEWHITEBOX
TLDR;

Hello! This week, we have a lot of insights to share, from your first-ever look at a visualization of hallucinations (and what to do about them) to a detailed overview of NVIDIA’s first inference chip, a product that changes NVIDIA’s roadmap forever.

We also discuss other news like AI’s latest hardware fail, Microsoft and OpenAI’s rivalry, and Oracle’s stellar stock day.

Enjoy!

THEWHITEBOX
Things You’ve Missed By Not Being Premium

On Tuesday, we had a very eventful start to the week, with news that should matter to everyone. We had it all: billion-dollar settlements, great new models, more stellar fundraisings, and even a near-telepathic AI product (yes, you read that right).

MARKETS
Oracle’s Best Day Since 1992

Oracle has just had its best trading day since 1992, no less, adding more than $200 billion in market value on a single-day stock gain of roughly 30-40%.

The reason is simple: OpenAI, plus quite a lot of wishful claims. Oracle has secured multiple multi-billion-dollar contracts recently, including a major deal with OpenAI worth $300 billion. These deals have sharply increased Oracle’s backlog (“Remaining Performance Obligations,” or RPO), which is up around fourfold year-on-year.

RPO represents the total amount customers have committed to spend under signed contracts. However, the company can’t book this revenue until the corresponding services or products are actually delivered.

Unrealized-yet-guaranteed revenue, basically.

What this essentially means is that the market has realized Oracle’s AI play wasn’t remotely priced in, and the stock was significantly undervalued.

But is all this warranted, or are investors playing with fire?

TheWhiteBox’s takeaway:

In my view, this is just investors adjusting Oracle’s valuation to Hyperscaler levels; it’s the kind of pop you could expect from AMD if it issued similarly remarkable guidance and investors suddenly repriced it toward NVIDIA’s levels.

However, there’s a catch: this valuation now rests heavily (too heavily for my liking) on guidance.

Oracle stock is now more expensive, on a multiple basis, than any of the other Hyperscalers, meaning investors are pricing in substantial future revenue growth. In other words, they are proclaiming Oracle the new Hyperscaler, based on nothing but promises. That said, it’s understandable, considering:

  1. Oracle’s “guaranteed” future revenues, the RPO metric, have truly skyrocketed to $455 billion.

  2. Oracle’s guidance for the coming years is extremely ambitious: they forecast cloud revenue growing from ~$18 billion to $144 billion by fiscal year 2029, which would represent a ~1,300% increase over last year’s AI business (~$10 billion).

Saying “we are about to increase our AI business by 1,300% in five years” will obviously make your stock pop. My concern is the same one that has been worrying me for months: debt.
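For reference, here’s the back-of-envelope math behind that headline figure (a quick sketch using only the numbers reported above):

```python
ai_business_last_year = 10   # ~$10B, last year's AI business (as reported above)
cloud_revenue_target = 144   # $144B, Oracle's guided cloud revenue

growth = (cloud_revenue_target - ai_business_last_year) / ai_business_last_year
print(f"Implied growth: {growth:.0%}")  # ~1340%, i.e. the "~1,300%" headline
```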

The AI capex run is now unwaveringly dependent on debt, and most of the new lending is happening outside the banking system: private credit funds, which often involve insurers (meaning a potential debacle could spread to other sectors fast), are becoming ever more prominent lenders.

In a recent report we covered, JP Morgan announced it was entering the private credit space while also highlighting the risk of the sector becoming too large a proportion of overall debt, putting the current figure, as of July 2025, at a staggering 9%.

I must say that AI capex borrowers are mostly hyper-rich companies, so the risk of default is low, but many of the other players involved, from the AI labs (OpenAI, xAI, etc.) to the data center builders and renters (Crusoe, CoreWeave, and Oracle itself), are massively in debt.

And, very importantly for this particular case, Oracle does not have the other Hyperscalers’ free cash flow; its own is literally negative (-$5.88 billion over the trailing four quarters). It simply doesn’t have the money it wants to spend, so this is a far more leveraged bet than any other Hyperscaler’s, and it’s not even close.

Let me put it this way: OpenAI doesn’t have $300 billion to give to Oracle, and Oracle doesn’t have $300 billion in compute to give OpenAI.

All in all, Oracle has the largest upside among the Hyperscalers (although much less so after this pop), but it also carries the largest risk.

PRODUCT
Microsoft Jumps into Anthropic’s Hands

According to reporting by The Information, Microsoft has reached an agreement with Anthropic so that the latter provides its models to many tools in the MS suite, like Word or Excel.

As the report clarifies, this has nothing to do with the ongoing battles with OpenAI; it is simply an acknowledgment that Claude performs better than GPT-5 in some particular instances, quite a change in discourse nevertheless.

In the same week, OpenAI announced a LinkedIn competitor, signaling that it will now compete with Microsoft (LinkedIn’s owner) in the software sector.

TheWhiteBox’s takeaway:

No matter how cordial they are about it publicly, there’s no denying these two organizations really do not like each other. Besides, it’s wise for both of them to reduce their dependence on the other:

  • OpenAI no longer relies solely on Microsoft for GPUs: Oracle is a crucial part of the ongoing Stargate project, the company has even discussed using Google’s TPUs, and it is also preparing an inference chip with Broadcom for 2026.

  • Conversely, Microsoft no longer relies solely on OpenAI’s models for its products; it now also taps Anthropic and other players’ models, such as Grok and Gemini.

The truth is that OpenAI is no longer just another startup; it’s a behemoth in its own right, so believing Microsoft’s tight grip would hold was naive in my view. Importantly, Microsoft’s Generative AI implementations suck big time compared to ChatGPT or Gemini, so I hope this really pushes them to improve their suite offerings (Copilot, Power Automate, and the like).

Unmatched Quality. Proven Results. Momentous Creatine.

Momentous Creatine delivers Creapure®—the purest, pharmaceutical-grade creatine from Germany—supporting strength, recovery, and cognitive performance. NSF Certified for Sport® and free of fillers or artificial additives, it’s trusted by professional athletes, Olympians, and the military’s top performers.

Head to livemomentous.com and use code HIVE for up to 35% off your first order.

INSIGHT OF THE WEEK
Visualizing Hallucinations & What To Do About Them

This great piece of research shows hallucinations in broad daylight. Although one could argue that every single prediction is a hallucination, we typically reserve the term for instances where the model makes an error.

But what we do know for sure is that hallucination is intrinsically tied to model certainty, that is, how confident the model is about its next prediction. Hallucinations show up where the model assigns a low probability to its prediction, and that probability can be measured.

The idea of this project by Oscar Balcells, shown above, is to highlight in color the words where the model is basically “winging it”: it isn’t very certain, but it has been trained to produce an output anyway, so it does.
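To make this concrete, here’s a minimal sketch of the underlying idea (not the project’s actual code; it assumes GPT-2 via Hugging Face’s transformers, and the project may compute things differently): score each token of a text with the probability the model assigned to it, plus the entropy of its next-token distribution, and flag the low-confidence spots, which is essentially what the visualization colors in red.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The capital of France is Berlin, home of the iconic Brandenburg Gate."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # (1, seq_len, vocab_size)

# For each position: probability the model gave to the token that actually follows,
# plus the entropy of its full next-token distribution (higher = more "winging it").
probs = torch.softmax(logits[0, :-1], dim=-1)
next_ids = ids[0, 1:]
p_next = probs.gather(-1, next_ids.unsqueeze(-1)).squeeze(-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)

for token, p, h in zip(tok.convert_ids_to_tokens(next_ids.tolist()), p_next, entropy):
    flag = "  <-- uncertain" if p < 0.1 else ""
    print(f"{token:>15}  p={p.item():.3f}  H={h.item():.2f}{flag}")
```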

A critical pattern to notice in the GIF above is how the hallucination rate basically explodes as the sequence grows longer, which is an excellent visualization of one of the main issues with autoregressive language models (i.e., basically all modern models): error accumulation.

Think of a model’s response to you as the model traversing the space of what it knows. Whenever it hesitates, it is entering a region of that space where its knowledge is thinner and its certainty lower.

Sometimes, that uncertainty is nothing but a conversation pivot. If the model says “I like…” while improvising a story you asked for, the options are basically endless (‘sugar’, ‘chocolate’, and ‘cats’ are all valid continuations), so uncertainty will be high either way. Mathematically, that looks like a hallucination even though it isn’t (which is part of why the term is confusing: every prediction can be considered a hallucination in its own right).

At other times, though, it really is a hallucination (i.e., the model is uncertain about something that has a clear answer). And once it commits that first true hallucination, the model basically enters ‘error mode’ and starts hallucinating at an accelerated rate, because it has wandered into unknown territory. This is error accumulation in action: every prediction is conditioned on the previous ones, so having already made one mistake makes the next one far more likely.

Consider, for example, asking a model “What’s the capital of France?” and, instead of predicting “Paris,” it predicts “Berlin.” At that point, the model enters its ‘Berlin space of things’ and starts hallucinating in that direction of its knowledge, even reaching conclusions like the Brandenburg Gate being an “iconic French monument.”

Smarter models today can ‘self-correct’ and acknowledge a previous mistake, but it’s very, very rare.

If you open a new conversation with that same model and ask it, “Is the Brandenburg Gate French?”, it will almost certainly say “No.” But in the previous conversation, the model faced that same question after having already hallucinated Berlin as France’s capital, and so it was much more likely to “lose its sense of reality” and overcommit to the mistake.

In mathematics, the order of the factors in a multiplication does not affect the result (the commutative property), but in AI, order matters a great deal: you can make models say totally different things depending on how, and in what sequence, you ask.

This is why, besides preventing that first hallucination (which requires additional gatekeeping software or verifier models), it’s crucial that you put enough effort into the prompt. Be extremely detailed and deliberate, and know from the very beginning what you want from the model. That will take it where you want it to go; otherwise, it’s only a matter of time before you start seeing everything in red.

AGENTS
Replit, $250 Million & Agent 3

Replit has raised $250 million in its latest funding round, boosting its valuation to $3 billion, almost triple its 2023 valuation.

The company’s annualized revenue has skyrocketed from $2.8 million to $150 million in under a year, fueled by a global user base exceeding 40 million, making it one of the great victors of the “vibe coding” paradigm.

On the product side, Replit introduced Agent 3, its most advanced AI coding assistant yet.

Unlike earlier versions, Agent 3 can autonomously test and fix code, build custom workflows, and handle longer and more complex tasks, running for up to 200 minutes at a time. The system also boasts a testing engine that is three times faster and ten times cheaper than before.

TheWhiteBox’s takeaway:

Let me be clear: I kind of dislike the CEO; not my kind of guy. But credit where credit is due: this is one smart company with a very good product, possibly the best of all the vibe-coding platforms.

My issue is that I don’t believe in the sector’s vision. I don’t believe in a future where people become serial app builders, so I think these platforms’ retention metrics will always, always suck.

Importantly, I also believe they are playing a game that is too competitive, and they will eventually be cannibalized (or, hopefully, acquired) by the likes of OpenAI, Anthropic, or Gemini.

To me, vibe-coding platforms are the perfect example of short-lived success: companies with a very appealing product that, in reality, never becomes more than a toy someone marvels at for ten minutes and then never uses again.

Case in point: two startups in more or less the same arena, Lovable and StackBlitz (Bolt), have four- and six-month retention rates of 61% and 48%, respectively, meaning they lose roughly 11.6% and 11.5% of their customers every month, or around 80% of their customer base every year. For reference, ChatGPT retains almost 80% of users over six months (roughly a 4% monthly churn rate, which is tolerable). The implied monthly churn math is sketched below.
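For the curious, the implied monthly churn above follows from assuming a constant monthly retention rate; a quick sketch using the figures as reported:

```python
def implied_monthly_churn(retention: float, months: int) -> float:
    """Constant monthly churn implied by an N-month retention figure."""
    return 1 - retention ** (1 / months)

print(f"Lovable (61% at 4 months): {implied_monthly_churn(0.61, 4):.1%}/month")  # ~11.6%
print(f"Bolt    (48% at 6 months): {implied_monthly_churn(0.48, 6):.1%}/month")  # ~11.5%
print(f"ChatGPT (80% at 6 months): {implied_monthly_churn(0.80, 6):.1%}/month")  # ~3.7%
```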

Such high churn forces these companies to acquire new customers constantly, implying huge marketing spend and weak cash generation (assuming they’re profitable at all, which is another story). Not a fan from a business perspective, but the product does what it’s meant to do.

LAWSUITS
Judge Puts $1.5 Billion Settlement On Hold

On Tuesday, we discussed Anthropic’s historic $1.5 billion settlement agreement with authors (read the original post for a detailed overview of what happened and why they lost).

Now, fearing the process won’t be clean, the same judge has put the entire settlement on hold, amid concerns that the lawyers could reach agreements that end up pushing the deal “down the authors’ throats.”

TheWhiteBox’s takeaway:

Fair. We are talking about $1.5 billion paid out at an average of $3,000 per book (500,000 books in total), so the payment process will be tricky at best.

If you want to set a precedent, you'd better make sure you do it right. However, as we argued on Tuesday, this does not really set a precedent, and we may well never see another billion-dollar copyright-infringement settlement, because Anthropic was simply very naive in its approach.

HARDWARE
NVIDIA Announces First Inference GPU

Finally, NVIDIA has unveiled its first inference-only GPU, the Rubin CPX, offering incredibly impressive performance for inference (the phase when you and I are actually using the model).

But what makes this GPU different from traditional GPUs? This gets a bit technical (in layman’s terms, of course), but by the end, you will understand NVIDIA’s future better than most.

To understand this, we need to understand what’s going on when we say “AI inference.” We have two types of AI workloads:

  1. Training:
    We run batches of sequences through a model that tries its best to predict the next word out of all the words it knows, and each prediction is compared to a ground-truth word. Per-sequence latency does not matter here, so we pack as many sequences as we can through our GPU system. Most of the mathematical operations executed (that’s all an AI model is: a lot of math, mainly additions and multiplications) are packaged into large matrix-matrix multiplications, which is exactly the kind of workload where general-purpose GPUs shine.

  2. Inference:
    Here, the AI model serves responses to users. Since users are sensitive to latency, we employ engineering tricks, such as caching, to avoid redundant work and reduce the number of operations required for each new prediction. Without going into much detail, the predominant operation here is matrix-vector multiplication, which is not ideal for GPUs because it leaves a lot of compute sitting idle and results in excessive data movement.

But where am I going with this?

  • In the former, we are mostly “compute-bound”: our GPUs run at close to their maximum computational capacity.

  • In inference, on the contrary, we are mostly “memory-bound,” meaning GPUs spend a much larger portion of their time moving data in and out of memory (or to and from other GPUs) than actually computing anything (see the sketch below).
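To make the compute-bound vs. memory-bound distinction concrete, here’s a rough sketch of “arithmetic intensity” (FLOPs per byte moved) for the two shapes involved; the dimensions and helper function are illustrative assumptions, not real model or GPU specs:

```python
def arithmetic_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul in fp16/bf16."""
    flops = 2 * m * k * n                                    # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / bytes_moved

d = 4096  # hypothetical hidden dimension of a model layer

# Training / pre-filling: thousands of tokens at once -> matrix-matrix, compute-bound
print(f"matrix-matrix: {arithmetic_intensity(2048, d, d):.0f} FLOPs/byte")  # ~1000
# Decoding: one new token at a time -> matrix-vector, memory-bound
print(f"matrix-vector: {arithmetic_intensity(1, d, d):.1f} FLOPs/byte")     # ~1
```

The gap of roughly three orders of magnitude is why the same GPU that is saturated during training sits mostly idle, waiting on memory, during decoding.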

To make inference go faster, the process itself is split into two parts: pre-filling and decoding.

  1. Pre-filling:
    Also called the “context loading” phase. Here we process the entire sequence the user has sent and build the cache, benefiting from matrix-matrix computation (where GPUs excel). In this phase, it’s mostly about having as much compute as possible; data-movement speed is less relevant.

  2. Decoding:
    This is the famous next-word prediction mechanism that comes to mind when you think of inference. Here, the model generates the response word by word, as you can see when you interact with ChatGPT. Each prediction takes approximately the same amount of time precisely because the pre-filling stage has already loaded and cached the context.

If you’re struggling to follow, consider this: pre-filling is like being shown a paragraph of text and reading it in full before answering; that’s the context-building phase. Decoding is when you start to answer, and every new word comes quickly because you took the time to read the entire text thoroughly beforehand.
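Here’s a minimal sketch of those two phases in code, assuming GPT-2 via Hugging Face transformers (any autoregressive model works the same way; this is an illustration, not how production inference servers are written): one big pass over the prompt builds the KV cache (pre-filling), then generation proceeds one token at a time reusing that cache (decoding).

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Pre-filling reads the whole prompt at once; decoding then generates"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Pre-filling: one compute-heavy, matrix-matrix pass over the full prompt builds the KV cache.
    out = model(ids, use_cache=True)
    cache = out.past_key_values

    # Decoding: each step feeds only ONE new token plus the cache (matrix-vector work,
    # dominated by moving cached keys/values through memory), so each step takes similar time.
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    for _ in range(20):
        out = model(next_id, past_key_values=cache, use_cache=True)
        cache = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        print(tok.decode(next_id[0]), end="")
```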

Knowing all this, we can now explain NVIDIA’s new GPU. The Rubin CPX focuses on the pre-filling stage, something that wouldn’t have made much sense without the previous explanation. It dedicates a much larger share of the chip’s surface area to compute by giving less to memory: the memory type used, GDDR7, is much slower (we don’t care; in this phase it’s mostly about the GPU going brr) but takes up less space, leaving more room for compute.

This is why this GPU is considered inference-only: it focuses on a very particular part of the process that occurs during inference. This is important for many reasons:

  1. It’s NVIDIA’s first inference-only chip, a response to rivals like Groq and Cerebras, which are entirely inference-focused chipmakers. Extremely welcome news for NVIDIA investors.

  2. It focuses on 4-bit precision, indicating that AI models are moving to even lower precisions and that the current 8-bit paradigm should evolve toward 4-bit.

  3. It proves NVIDIA’s unequivocal belief that AI models will continue to be Transformers. At this point, NVIDIA’s future is tied to the Transformer one way or another, which feels equally reassuring and uncomfortable to me.

  4. It’s also telling that NVIDIA believes the future is video: they are including video encoders and decoders on the chip (die area dedicated solely to video compression and decompression), which suggests the future of AI models is not text, but video (expected if you’re a regular reader of this newsletter).

Long story short, we can use NVIDIA’s roadmap as a predictor of the AI industry’s future (NVIDIA designs its chips based on what the AI labs say they will need).

Thus, NVIDIA is saying the future is generative, video-focused, low-precision, and built on highly specialized chips, meaning the days of general-purpose GPUs are coming to an end.

But is NVIDIA a good oracle? I believe it is.

HARDWARE
Wired Hates Friends

Wired has published a review of the ‘Friend’ pendant and, to the surprise of no one, they hated it. Titled ‘I Hate My Friend’, the piece goes off on the hardware, explaining that, while it aims to offer constant companionship, the journalists’ sentiment toward the product is mostly negative.

For instance, its tone comes across as snarky and judgmental rather than helpful or friendly, leading to irritation rather than comfort.

Privacy concerns also loom large since it is always listening, raising questions about data use and legality in public settings. On top of that, technical glitches, like mishearing in noisy environments, connectivity dependencies, and occasional memory resets, clearly undermine its reliability.

Overall, in their view, the product feels intrusive, socially awkward, and not yet polished enough to justify its promise of AI friendship.

TheWhiteBox’s takeaway:

Besides the technical limitations, which can be addressed, the reality is that no one wants to be over-analyzed by their friend’s ever-vigilant “Friend” AI.

I believe many people in Silicon Valley don’t go outside enough and, for lack of a better term and excusing my language, have their heads too much up their asses.

This is a product built on the assumption that people will welcome you recording everything they say, especially at a time when people can get canceled over almost anything.

Maybe I’m being too harsh on the idea, but the Wired journalists’ reactions are exactly the ones I would have expected. It’s one thing to have a device like the telepathic headset we saw on Tuesday, which records your own thoughts or lets you hold conversations in public that you don’t want others to hear (the literal definition of a privacy-centric product); it’s quite another to have a product that forces you to relinquish your privacy, and that of everyone around you.

This won’t work, and OpenAI needs to be very careful about how it presents its upcoming hardware, built in collaboration with Jony Ive; the world doesn’t want spy products. If you feel lonely, sign up at a CrossFit box and you won’t feel alone anymore.

Closing Thoughts

This week has brought a wealth of educational content, from learning how to interpret hallucinations to understanding NVIDIA’s roadmap, which I hope you found valuable.

We’ve also seen the rise of a new star, Oracle, and the growing grievances between Microsoft and OpenAI.

But the biggest takeaway for me is AI’s latest hardware fail, the Friend pendant. Every single AI hardware release so far has been a tremendous failure, one after another. The near-telepathic product we saw on Tuesday holds much greater promise, but the feeling is that, when push comes to shove, people don’t actually want AI everywhere in their lives.

I certainly don’t. I just hope the AI future dawning on us is one where interactions remain mostly professional, and AIs don’t become our best friend, our lover, or, who knows, our silent killer.

Until Sunday! 

THEWHITEBOX
Join Premium Today!

If you like this content, join Premium and you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answered.

Until next time!

For business inquiries, reach out to me at [email protected]