THEWHITEBOX
TLDR;

Hey there! This week, we have another claim from a new lab that has, allegedly, found AI’s “Holy Grail”, Anthropic’s super interesting and meaningful dreaming feature, why AI Washing is no longer in vogue, why semiconductor stocks are all doubling in value this past month, a new string of very powerful OpenAI releases, and a textbook example of what I believe is the future of enterprise AI.

Enjoy!

RESEARCH
Have we finally found the Holy Grail?

If you ask AI researchers what the holy grail of Large Language Models (LLMs) is, it’s probably subquadratic attention that actually scales.

I know, if you aren’t an AI geek like me, that sounded like gibberish, but that’s precisely why you’re reading this newsletter, because I can explain this stuff in simple terms.

I went into excruciating detail here recently, but to simplify: at the heart of every AI model you’ve used lately sits a component known as the Transformer, which does two things: it processes the sequence (how words in the input interact with each other, like determining which noun an adjective refers to), and it adds internal model knowledge to enrich the context.

That is, a prediction is made by considering both the input context and the model's prior knowledge of the topic. The issue is that the sequence-level processing, which we call ‘attention’, is extremely computationally intensive.

As every word has to talk to every other word, attention requires a lot of computation and grows quadratically with sequence length (i.e., if the sequence doubles in length, computational requirements quadruple; if the sequence triples, they increase ninefold).
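To make the quadratic growth concrete, here’s a toy calculation of how many word-to-word interactions attention computes (illustrative only; real attention kernels are far more involved):

```python
def attention_pairs(seq_len: int) -> int:
    # Every token attends to every token, so the attention
    # score matrix has seq_len * seq_len entries to compute.
    return seq_len * seq_len

base = attention_pairs(1_000)
print(attention_pairs(2_000) / base)  # doubling the sequence -> 4.0x the work
print(attention_pairs(3_000) / base)  # tripling it -> 9.0x the work
```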

Say you have the sequence “Everyone feared the Ottoman red-bearded pirate”. The word “pirate” will attend to all previous words and decide which ones carry more valuable information, and enrich its meaning with those.

For example, it may attend to “red-bearded” and build the concept ‘red pirate’, which ‘evolves’ the meaning of pirate from a standard one to one that had a red beard.

With that context, the LLM might induce that we’re talking about Barbarossa, which it has read a lot about, and start adding information from its internal knowledge about that particular pirate to enrich the prediction.

That is why LLMs can talk about things that aren’t in the context; they can retrieve them from their own memory.

To build such intuitions, every single word in the sequence goes through this “What should I attend to?” process, explaining why this gets very expensive very fast as sequences grow.

Therefore, finding a way to minimize the computational requirements of this procedure has long been the most coveted algorithmic breakthrough in AI.

Much like humans, we would expect there to be a way for AIs to conveniently decide which parts to attend to, thereby considerably compressing computational requirements. Now, a new company thinks it has cracked it.

They are promising numbers that feel absurdly good to be true: a 12-million-token context window (meaning their models can handle between 12 and 72 times more context at any given time than most models), reasonable speeds despite the long context, and one-fifth the cost of leading LLMs.

Really a win everywhere. No trade-offs, no nothing.

TheWhiteBox’s takeaway:

This is, of course, extremely likely to be too good to be true. How is a Lab that has only raised $29 million cracking the code that trillion-dollar Labs have been after for years?

I can’t say it’s impossible, but it’s very, very unlikely.

They show great scores on two very popular benchmarks, but we can’t rule out that those benchmarks were simply ‘benchmaxxed’ (which you can do with a subquadratic model, at the expense of losing performance everywhere else).

If true, this is a tectonic shift in the space that could crash entire markets because our intuitions about how much compute we need would be challenged. The truth? It’s probably a sham.

CONTEXT ENGINEERING
Anthropic’s models can now dream

Completely overshadowed by the company’s deal with SpaceX, Anthropic’s new Dreaming feature lets agents read their past interactions with users, identify useful patterns worth remembering, and store them, effectively acting as dreams that let AIs recap their day, so that the next time the user talks to them, performance has improved thanks to a better set of memories.

But let me explain to you why this is a much bigger deal than it seems.

TheWhiteBox’s takeaway:

Super neat feature that makes a lot of sense for two big reasons. The first one is obviously the adaptation it implies: models become progressively more bespoke, forming these memories in a task-specific setting rather than having to deal generically with whatever task you throw at them, while at the same time assessing what’s worth remembering.

But it also has a very strong engineering advantage: batching.

When we think about AI inference, GPUs can serve responses to multiple users in parallel. That means, for “roughly” the same effort, the GPU can serve more users, which better uses the hardware (e.g., for a single model forward pass, it can make predictions for 10 people instead of 1).

However, as you might guess, this doesn’t come free; the more batched users you have, the slower individual responses become because the computations take longer.

In other words, we are making a more efficient use of hardware in exchange for a worse individual experience. This is the main trade-off in AI inference.

The issue is that speed is one of the most, if not the most, important variables in user experience (people hate waiting), so inference has to be run at artificially low batches (meaning we could run inference at much, much larger batches) so that each individual response is blazingly fast.
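A toy model of the trade-off (the latency numbers are made up for illustration; real serving stacks are far more complex):

```python
def serve(batch_size: int,
          base_latency_s: float = 0.020,
          slowdown_per_user_s: float = 0.005):
    """Toy model: each extra user in the batch slows every
    response a little, but total throughput still goes up."""
    latency = base_latency_s + slowdown_per_user_s * (batch_size - 1)
    throughput = batch_size / latency  # users served per second
    return latency, throughput

for b in (1, 8, 32):
    lat, thr = serve(b)
    print(f"batch={b:2d}  per-user latency={lat:.3f}s  users/s={thr:.0f}")
# Latency climbs with batch size, but hardware utilization
# (users served per second) climbs too: the core inference trade-off.
```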

In fact, the “fast modes” both Anthropic and OpenAI have introduced are nothing but queuing your requests into very small batches (I wouldn’t rule out even a batch size of 1), so that the responses are incredibly fast at the expense of paying much more for them (because, well, they are serving an entire server to you).

Therefore, a crucial point about this dreaming feature is that models can generate memories during demand troughs, outside your usage zone (while you’re sleeping), and run them at artificially high batch sizes.

Why? Simple: because you are getting batched with other users who also aren’t using Claude at the time, so nobody is waiting on those responses.

This gives Anthropic the capacity to generate useful memories while being extremely efficient with hardware, reducing the cost per token by an order of magnitude (if not more).

The endgame? Although not explicitly mentioned by Anthropic, it’s clear to me: Actual dreaming.

You see, many scientists believe that dreams help our brains consolidate memories. I myself like to read things in the afternoon, “sleep on them” during the night, and only write about them the next day; the mental clarity with which I then understand things that the previous night I didn’t is incredible, so I just never write about anything remotely complicated if I haven’t slept on it first, period.

But how do we apply this to AIs?

In a future with enough compute (it will take time naturally), AI Labs will be able to create bespoke versions of their models for us by applying parametric dreaming. In layman’s terms, right now these “memories” are simply text snippets attached to the model’s prompt, so that it not only sees your request, but also the set of memories it now has as additional context.

Here, we’re talking about a future where these memories are used to fine-tune the model, becoming part of what it knows rather than relying on external contextual knowledge.
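To make the distinction concrete, here’s a hypothetical sketch of both approaches; the function names and memory format are mine, not Anthropic’s:

```python
# Today: memories are text snippets prepended to the model's context.
def build_prompt(user_request: str, memories: list[str]) -> str:
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"Relevant memories:\n{memory_block}\n\nUser: {user_request}"

# The speculated future: the same memories become training examples,
# baked into the weights instead of consuming the context window.
def to_finetune_examples(memories: list[str]) -> list[dict]:
    return [{"messages": [{"role": "system", "content": m}]} for m in memories]

memories = ["User prefers concise answers", "User's reports use EUR, not USD"]
print(build_prompt("Summarize this quarter's spend", memories))
```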

To be clear, this is not a super original idea I thought about; it’s something that’s already being explored as a way to approach continual learning, the idea that models never stop learning.

CYBERSECURITY
Mythos Already Making a Statement

Although Mythos’ risk to the world was clearly exaggerated, it’s becoming clear that the model does represent a step change, not in catastrophism, but certainly in utility regarding cybersecurity.

As an example, the Mozilla team behind Firefox fixed more bugs in April 2026 using Mythos than in the previous 15 months combined, and they’ve explained how Mythos has helped them in this very interesting blog post.

The biggest change they mention with respect to previous model generations is the rate of false positives. For the longest time, AI bug reports were the worst kind: confidently wrong.

They looked promising, but were actually shit. The Mozilla team now claims this has changed over the last few months, and AI models can be trusted way more.

Similarly, they also blame better harnesses, meaning they can now better control and steer models by better understanding how to treat them.

TheWhiteBox’s takeaway:

Mythos was obviously a step-function improvement. Everyone and their grandmother knew it, so it’s not surprising to see confirmation (though this particular statistic is undeniably incredible). It’s a pity that it was all framed as world-ending instead of “world-improving.”

The key thing people miss is that AI remains a tool that empowers the experts the most. We frame the technology as something a layman on cyber like me could use to hack NASA, and that’s simply outrageously wrong.

By the way, this is also the reason why I believe the narrative “AI kills the need for knowledge” is hilariously misguided. If anything, it’s more important than ever.

The more of an expert you are on a topic, the more juice you can squeeze out of the AI bottle.

All things considered, once again, it’s pretty clear that Mythos will do more good than harm. So why don’t we stop presenting AI as a negative thing for humanity when almost every single thing it does is good?

AI WASHING
Why The AI-Led Layoffs Narrative is Dying

Ever since Block opened the floodgates of AI washing, using AI as a scapegoat for mass layoffs, many companies have followed, as not taking the blame for overhiring and simply accusing AI of your issues is a great way to improve your margins without taking a hit to your ego.

However, investors aren’t buying it anymore.

As I explain in this Medium post, which you can read for free here, the latest members of these clubs are probably regretting their decision right now, as the stock prices of companies like Coinbase, Cloudflare, and Upwork are having a really bad time since they announced AI-led layoffs.

But the key takeaway is what all this now implies.

TheWhiteBox’s takeaway:

It turns out, not only do investors not believe the narrative, which immediately suggests the CEOs are hiding something, but there’s a growing concern around the “good use” of this technology, to the point that being an AI adopter can be detrimental if the use is, to put it mildly, “unsophisticated”.

It surely doesn’t help to learn that the CEO of Coinbase, one of the companies joining the AI washing club, claimed, and I quote: “Non-technical teams are now shipping production code and many of our workflows are being automated.”

Hilariously, later that evening, the product went down. And while I can’t confirm causality, that’s all many investors need to know about how unsophisticated, or dare I say clumsy, Coinbase is being about its use of this very powerful and equally complex technology.

Will AI now be unpopular to adopt? I don’t think so, but I’m sure good CEOs are taking notes on two fronts: AI washing is no longer in vogue, and unsophisticated use of AI leads to worse products.

THEWHITEBOX
Anthropic’s Deal with SpaceX

In what’s possibly the weirdest and most unlikely AI partnership ever, SpaceX and Anthropic have agreed for the latter to rent the entire Colossus 1 data center from SpaceX (formerly xAI), which represents 300 MW of top-tier accelerator compute (mostly NVIDIA GPUs, if not all).

Considering Anthropic and Elon supposedly hated each other’s guts not long ago (for political reasons: Anthropic is considered considerably left-leaning, while Elon is considered considerably right-leaning), this is an extremely bizarre deal that Elon himself had to clarify, explaining that he was impressed when he met the Anthropic leadership and believes they are a net good for society, but that SpaceX reserves the right to break the contract if it assesses that Anthropic is not doing “what’s best for humanity”.

As part of the agreement, Anthropic also expressed interest in partnering to develop multiple gigawatts of orbital AI compute capacity. That is, space GPUs are once again on the menu.

Interestingly, it seems SpaceX won’t have access to the Claude models.

TheWhiteBox’s takeaway:

I’m not going to comment on the space GPU thing because, to be honest, nobody knows whether it’s the next trillion-dollar idea or like hoping pigs could fly someday, so there’s little I can say on that front.

As for the terrestrial deal, the terms were not disclosed, but we can make a rough estimate of who gets what.

Colossus 1 is 300 MW. Using Jensen’s math of $50 billion in capital costs per GW, this is approximately a $15 billion capital investment from xAI. At a 6-year depreciation schedule, that means $2.5 billion in depreciation costs alone per year, plus electricity costs that could reach $500 million/year.

At 0.083 $/kWh, it amounts to ~$218 million/year on electricity costs, but these are rising, so it could very well be closer to $500 million.

Thus, to make money, SpaceX has to rent this out at, at the very least, $3 billion per year. Adding some less-optimistic depreciation (Anthropic will fry these GPUs), that number grows to $4 billion or more.

I know not the entire data center depreciates as quickly, but GPUs are the majority of the cost, and I think I'm being optimistic about depreciation, so between $2.5 and $4 billion a year in depreciation costs makes sense to me.

Assuming a 15% margin, which SA considers to be a good Neocloud margin, and adding an ‘Anthropic is desperate for compute and will sign anything’ premium, we can assume a 25% return, meaning SpaceX could be getting up to $5 or even $6 billion per year if it was aggressive in the negotiation.
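The back-of-envelope above can be reproduced in a few lines (every input is a rough assumption from this section, not a disclosed figure):

```python
capex_per_gw = 50e9        # Jensen's ~$50B/GW capital cost figure
capacity_gw = 0.3          # Colossus 1 is 300 MW
depreciation_years = 6
price_per_kwh = 0.083

capex = capex_per_gw * capacity_gw            # ~$15B build-out
depreciation = capex / depreciation_years     # ~$2.5B/year

kw = capacity_gw * 1e6                        # 300,000 kW
electricity = kw * 8760 * price_per_kwh       # ~$218M/year at full load

print(f"capex:        ${capex / 1e9:.1f}B")
print(f"depreciation: ${depreciation / 1e9:.2f}B/year")
print(f"electricity:  ${electricity / 1e6:.0f}M/year")
```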

As for what I think of the deal, it shows how desperate for compute Anthropic is, a company that considerably underestimated its own growth and is literally “dying from success.” In fact, Dario just acknowledged they underestimated compute requirements by 8x. On the other hand, it shows that xAI’s models are not really being used much.

I’ve always argued xAI could live just off internal demand from “Elon Corp,” but right now they’re probably having those GPUs sit unused while incurring $2.5 billion/year or more in accounting losses, in the very year they go public.

Seeing that Elon hates OpenAI even more, there were few other options that could work, so I think this is a good deal for both sides: Anthropic gets “immediate” access to a third of a gigawatt of compute at a time when, as we saw on Monday, getting compute live is a challenge, and SpaceX just got ~$2 billion/year in new ARR “out of thin air” months away from going public, which, considering they “just” have $16 billion, mostly from Starlink, is a nice addition in terms of both size and diversification.

THEWHITEBOX
Semis Go Ballistic

While the world viewed AI with moderate suspicion, most investors poured into NVIDIA as the better option to balance risk and reward. But now that the world thinks AI really is it, value is pouring into other hardware players in spectacular fashion.

Over the past month, the rally has moved into almost every layer of the AI supply chain: memory, CPUs, chip design, substrates, storage, and manufacturing exposure.

The PHLX Semiconductor Index has risen nearly 54% over the past 25 trading days, while the VanEck Semiconductor ETF is up about 44%, a pace now being compared with the late-1990s chip boom.

The biggest stars are the companies closest to AI infrastructure. SK Hynix and Samsung have led the Asian memory surge, with Samsung crossing the $1 trillion valuation mark and Hynix hitting record highs as demand for high-bandwidth memory continues to tighten. Micron and Sandisk have become the US memory/storage winners, with investors treating memory less like a commodity cycle and more like scarce AI infrastructure.

AMD has been another standout, jumping after strong earnings and a more bullish view of AI server CPU demand. That helped lift Intel and Arm, too.

And this is just a shortlist of all the semiconductor stocks that have massively rallied these past few weeks, which raises the question: Why?

TheWhiteBox’s takeaway:

The rally can be summarised into two narratives I’ve talked about a lot, and which have been central to my own investments: memory becoming secular and CPUs becoming an AI power play.

Ever since I invested in Hynix and Samsung (which Premium subscribers were made aware of), one has tripled in value in less than six months, and the other has doubled year-to-date.

My reasons at the time were that memory stocks, traditionally the living embodiment of a cyclical business (huge revenues at the peak, terrible earnings at the troughs), were about to become secular: companies with predictable demand, which has huge implications.

A cyclical business has, by definition, lower valuation multiples because you never know when you might run into a drought and see your stock cut in half, with investors valuing your business more on a multiple-to-book basis than on earnings, naturally depressing the valuation.
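A toy illustration of what a rerating means (the earnings and multiples are purely illustrative, not Samsung’s or Hynix’s actual numbers):

```python
earnings = 30e9      # hypothetical annual earnings
cyclical_pe = 8      # trough-fearing multiple for a cyclical business
secular_pe = 20      # multiple the market pays for predictable earners

cyclical_value = earnings * cyclical_pe
secular_value = earnings * secular_pe

print(f"Cyclical valuation: ${cyclical_value / 1e9:.0f}B")
print(f"Secular valuation:  ${secular_value / 1e9:.0f}B")
# Same earnings, 2.5x the market cap, purely from the multiple rerating.
```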

But with AI’s omnipresence and its increasing reliance on compute, compute is the most important resource in the world right now and will enjoy that prominent position for years, which in turn guarantees demand for memory and means these stocks stop being cyclical.

That is forcing a valuation rerating, which is huge given how much money they are making right now, with Samsung set to be the most profitable company on the planet in 2027.

The CPU story is less evident but obvious in my book. Since I invested in AMD and Intel, both have tripled in value. Back in February, I predicted that the time of the CPU had come because AI agents rely on tools that must run on CPUs.

If AI servers traditionally had GPU:CPU ratios of 4:1 or even 8:1, those are now converging to 1:1, and in some cases, particularly tool-heavy agent workloads require more CPUs than GPUs.

I’ve talked multiple times about the four current bottlenecks in AI: advanced packaging, memory, CPUs, and power, and placed all my investments around them (except for power, I don’t own any utilities, to my despair).

The next big bottleneck? Optics, but we’ll leave that for another time.

However, my last point is a word of caution. These stocks are rallying as if there were no limit to how much they can grow, and everyone is assuming AI has entered a parabolic growth phase.

But there could be limitations on revenue growth on the horizon.

One is AI reliability, which I outlined in the last newsletter; the other is revenue concentration, which looks like it’s growing by the day, given how concentrated Hyperscaler revenues are between just OpenAI and Anthropic.

AGENTS
OpenAI’s Codex Chrome Extension

OpenAI has released a Chrome extension for Codex.

The extension lets Codex work with websites where the user is already signed in, similar to the WebMCP Google released a month or so ago, but much more neatly executed. This means Codex can use browser-based tools that require authentication, such as internal dashboards, web apps, or other online services.

OpenAI says users can control which websites Codex can access. The extension is designed to run in the background across browser tabs, using approved sites when needed.

TheWhiteBox’s takeaway:

Really awesome integration that works incredibly well. It still feels like we’re once again forcing AIs to live in our world (which makes me believe these solutions are just transitory until we build the Internet for agents), but the fact that you can have AIs interact with websites without having to deal with login issues is a blessing when used well.

That said, there’s no way around it: you should be very careful about how you use these solutions, as the agent is literally acting on your behalf. Agents make mistakes, only this time they make them in your name.

REAL-TIME GENERATION
OpenAI Also Releases New Voice Model

Quite possibly the most impressive of all OpenAI releases today, the company has released three new voice models, allegedly scaling their intelligence to GPT-5 levels.

The first is GPT-Realtime-2. It is designed for live voice agents that can listen, respond, reason through requests, and use tools during a conversation. As mentioned, OpenAI says it has GPT-5-class reasoning, better long-session context handling, and more natural conversations.

The second is GPT-Realtime-Translate. It is built for live speech-to-speech translation. Developers choose the target language, and the model can detect the speaker’s language, translate the speech, and return both the translated audio and the text transcript.

The third is GPT-Realtime-Whisper. It is a live speech-to-text transcription model. OpenAI describes it as useful for meeting captions, workflow documentation, and other cases where spoken language needs to be converted into text in real time.

This is actually quite big. Let me explain.

TheWhiteBox’s takeaway:

By far one of the most impressive demos I’ve seen lately; the video in the link above is really worth watching.

The models are notably smarter, and they also handle conversation really well; talking to an AI is no longer a rigid you-talk-then-I-talk exchange, and the AI knows when to shut up while continuing to listen; it feels incredibly more natural.

The other element from the demo that shocked me was the real-time translation, which even supports several voices… speaking several languages at the same time, and even jumping back and forth between them.

I don’t know how on Earth they did this, but it’s amazing.

My gut tells me it’s simply a larger model, period. Audio models have been quite small for the longest time because they were not prioritized. Thus, it’s very likely that these are all derivatives of the huge base pretraining (codenamed ‘Spud’) that gave us the incredible model GPT-5.5.

Let’s not forget that OpenAI spent more than 2 years shipping new models from the same base model. With the new base model, Spud, every new model coming out feels like great progress.

CHATBOTS
OpenAI Also Releases GPT-5.5 Instant

The company says the model is faster, more accurate, and better at giving concise, useful answers. Its biggest reported improvement is factuality: OpenAI says GPT-5.5 Instant produced 52.5% fewer hallucinated claims than GPT-5.3 Instant on internal high-stakes evaluations across areas such as medicine, law, and finance.

The model is also meant to be better at image analysis, STEM questions, and deciding when a web search is needed. OpenAI says it should produce less cluttered responses, with fewer unnecessary follow-up questions, less excessive formatting, and fewer gratuitous emojis.

GPT-5.5 Instant also expands personalization. It can make better use of past chats, uploaded files, and connected Gmail when relevant, while OpenAI is adding “memory sources” so users can see some of the context used to personalize an answer.

The rollout begins for all ChatGPT users, while paid users can still access GPT-5.3 Instant for three months before it is retired. GPT-5.5 Instant will also be available in the API as chat-latest.

TheWhiteBox’s takeaway:

One key advantage of having a much smarter model on a forward pass basis (i.e., each individual prediction from the model is smarter than previous versions) is that the non-reasoning models, those that do not think for longer on tasks and immediately respond, get a particularly strong boost in performance.

OpenAI seems to think GPT-5.5 Instant represents such an increase, and from what I’ve seen, responses from the ChatGPT app lately are much snappier, so I suspect the Instant models, which were clearly an issue for OpenAI until recently, are now capable of absorbing many of the requests I send to the app.

DEPTH
This is the Future of Enterprise AI

Ramp, a corporate expense management company that looks more like an AI Lab these days, has just shown us what the future of enterprise AI looks like.

And I promise you I can’t overstate how true this is.

The idea is that enterprises don’t need a customer support agent that’s good at sales, nor a marketing agent with PhD-level maths skills; they need models that do their job as well as possible.

The issue is that AI, historically task-specific, has evolved into foundation models that are good at many things but great at none, the exact opposite of what companies truly need.

And while all these companies are using these foundation models and will adopt them for a myriad of cases, for most enterprise workflows, what they need is depth; a model that may not be able to do anything else, but does a particular task incredibly well.

This is what Ramp has done.

They have released Fast Ask, a specialized AI agent inside their Ramp Sheets product, designed to intelligently retrieve data from complex spreadsheets.

General AI models often struggle with spreadsheets, reading too much or too little, wasting time, and tokens. Fast Ask solves this by acting as a fast, lightweight “librarian” that quickly navigates workbooks, reads only the relevant data, and delivers clean answers to the main reasoning agent.

Built in partnership with Prime Intellect, Ramp post-trained a small open-source model (Qwen-based, ~3B active parameters) using reinforcement learning on finance-specific tasks.

The result?

Higher accuracy than much larger models, at Claude Haiku speed, while being significantly cheaper and faster than Opus, the much larger model.

TheWhiteBox’s takeaway:

Despite being perhaps even 100 times smaller than the models it’s competing with, it beats them.

But how?

The key is that the model is trained on the task. It’s not just a very large model which happens to be ‘good enough’ at any given task, it’s a smaller model, thus broadly dumber, but one that receives “on-the-job training”, becoming great at the task.

Once enterprises realize this, OpenAI and others will have a hard time selling AIs to enterprises unless they start offering on-premises, fine-tuned deployments to customers, something that Labs like Mistral are already doing with Forge.

Closing Thoughts

The acceleration is palpable, and every week I feel like I’ve left out many things I would like to talk about despite this newsletter already being almost 4,600 words long.

The big takeaways this week are:

  1. Markets are awakening to semiconductors being a much bigger deal than people realized

  2. Simultaneously, they are realizing many companies are “full of it” in their outrageous justifications blaming AI for mass layoffs

  3. Product releases are increasing their frequency, with OpenAI and Anthropic releasing what’s basically a new feature every day

But if there’s one piece of news from all of the above I want you to remember, it’s the last one. The future of enterprise is deep open-source models, not general-purpose private models.

I fear that if markets realize this, which they will, they will crash because Anthropic and OpenAI’s narratives hinge on open-source continuing to be irrelevant.

In fact, open source will probably capture most of the enterprise value once people realize it’s the best option, because it allows you to train models on your company data.

For business inquiries, reach out to me at [email protected]

Keep Reading