
THEWHITEBOX
TLDR;
Hello from Washington DC, my home again for another three weeks. This week, we have several huge announcements across all layers: technology, with releases like GPT-5.5 and DeepSeek v4 and one of the coolest AI projects I’ve ever seen in Flipbook; markets; and product.
Enjoy!

FRONTIER
OpenAI Launches GPT-5.5, New SOTA?

OpenAI has finally launched GPT-5.5, describing it as its most capable model yet and positioning it around longer, more agentic work rather than simple chat interactions.
The model is designed to handle coding, research, document creation, spreadsheet work, computer use, and multi-step workflows with less user guidance. It is available in ChatGPT and Codex for paid users, and in the API, where GPT-5.5 is priced at $5 per million input tokens and $30 per million output tokens.
Surprise, a price hike! Who would have expected it? Well, if you’re a reader of this newsletter, you knew this was coming.
OpenAI reports gains across coding, tool use, professional work, and browser-based tasks. It says GPT-5.5 scores 82.7% on Terminal-Bench 2.0, 84.4% on BrowseComp, 75.3% on MCP Atlas, and 84.9% on GDPval, all focused on agentic and real-world tasks.
The company frames the model as better at understanding ambiguous requests, using tools, checking its own work, and completing complex tasks across several steps.
TheWhiteBox’s takeaway:
I’ve been using this model for the past day or so, and my impressions are great. It’s snappier, its thinking adaptiveness has improved a lot (i.e., it knows when to think for longer and when it can answer directly), and the model is a blessing to use for coding.
The big question here is how this model compares to Anthropic’s Mythos. And the answer is that it depends.
While GPT-5.5 seems to be somewhat above Opus 4.7 (haven’t seen per-prediction comparisons, meaning GPT-5.5 might be smarter simply because it has higher thinking budgets), it’s not quite at the level of Mythos, as seen in the thumbnail.
However, GPT-5.5 Pro is another story: it’s basically Mythos-level, and it’s actually available to use today. The catch is that this isn’t a fair comparison. The reason is that, unbeknownst to many, OpenAI’s Pro family isn’t a different model at all; it’s a best-of-N sampling of the base model.
In other words, GPT-5.5 Pro is GPT-5.5 run multiple times on the task, and the answer you get is the “best” response, with “best” being the most common answer across the multiple attempts. This is unfair to Mythos, which gets similar results in a single run.
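To make the mechanism concrete, here’s a minimal sketch of best-of-N with majority voting, often called self-consistency. To be clear, `query_model` is a hypothetical placeholder, and OpenAI hasn’t published its exact selection rule:

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for one GPT-5.5 API call at non-zero temperature."""
    ...

def best_of_n(prompt: str, n: int = 8) -> str:
    # Run the same base model N times; sampling noise makes the runs differ.
    answers = [query_model(prompt) for _ in range(n)]
    # Keep the most common final answer (majority voting / self-consistency).
    return Counter(answers).most_common(1)[0][0]
```

Note the cost implication: with N = 8, you pay for eight full runs to produce one answer.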
On the flip side, this is also probably unfair to OpenAI, whose GPT-5.5 is again a distillation of a larger model (theirs is called ‘Spud’ rather than Mythos), distilled more effectively than Anthropic managed with Opus 4.7. So the true equivalent to Mythos is almost certainly not GPT-5.5, which feels more like a model of Opus 4.7’s size.
In a nutshell, you can already tell from how convoluted this explanation is that it’s almost never an apples-to-apples comparison, and who’s really technologically ahead is very hard to say.
But what’s not hard to say is who’s ahead product-wise, and that’s OpenAI, for a very simple reason: they have the best available model right now and/or the most deployed compute, which remains the key lever.
The biggest overall takeaway? Models keep getting better.
CHINA
Finally, DeepSeek has dropped DSv4

More than a year after they crashed markets with the release of DeepSeek R1, DeepSeek has released its new frontier model: DeepSeek v4. And it hasn’t disappointed.
We’re talking about the most advanced model we have deep architecture data on, so I’ll provide a deep analysis in our next Leaders post. In a nutshell, it’s the largest open-source model ever released (almost two trillion parameters) and includes several incredibly promising avenues of research I must dive into. Stay tuned!
RESEARCH
Meta’s Employees are being Tracked… by Meta
Reuters reports that Meta is starting to install new tracking software on US-based employees’ computers to capture mouse movements, clicks, keystrokes, and occasional screen snapshots for AI training.
According to internal memos seen by Reuters, the system, called Model Capability Initiative, runs on work-related apps and websites and is meant to give Meta’s models examples of how humans actually use computers, especially for tasks such as navigating dropdown menus and using keyboard shortcuts.
A Meta spokesperson tells Reuters that the data is used for model training rather than for performance reviews and says safeguards are in place to protect sensitive content.
TheWhiteBox’s takeaway:
This is just the new normal. As AIs become increasingly pervasive and data-hungry, AI Labs need to find any possible way to get more data to advance new model capabilities.
Don’t expect ethics or privacy to get in the way of the $1.5 trillion in AI spending waiting to be justified by revenues that, today, are easily 30 times smaller.
THEWHITEBOX
The Infinite Visual Browser

A group of researchers has just reminded us that AI doesn’t have to be all about chatbot assistants and ad-targeting engines, but can also be an amazing way to change how we consume the Internet.
Flipbook is a prototype "infinite visual browser" where your entire screen/UI is generated on the fly as AI-produced pixels (images or live video), with zero HTML, CSS, JavaScript, or traditional layout engine; an AI model produces everything as images.
As there’s no layout per se, just frames, the video model reacts to your interactions, like clicking on a certain element for more information, by generating the next frames accordingly.
This was done by fine-tuning LTX’s video model, which we covered a while back, to produce 1080p images at 24 frames per second.
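Conceptually, the interaction loop looks something like this sketch (purely illustrative, not Flipbook’s actual code; the model call, event reader, and render call are all hypothetical stand-ins):

```python
def next_frames(history, event):
    """Hypothetical call into the fine-tuned LTX video model: conditioned on
    recent frames plus the latest interaction, it returns the next frames
    directly as 1080p pixels at 24 fps (no DOM, no layout engine)."""
    ...

def read_event():
    """Hypothetical: returns the latest click/scroll/keypress, or None."""
    ...

def display(frame):
    """Hypothetical render call that blits a frame to the screen."""
    ...

history = []  # recent frames keep the generated "UI" visually consistent
while True:
    event = read_event()
    for frame in next_frames(history, event):
        display(frame)
        history.append(frame)
```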
Fascinatingly, at least at the time of writing, you can try Flipbook for free here (although the model isn’t as fast in this demo and there’s a lot of demand to test it).
TheWhiteBox’s takeaway:
I highly, highly recommend you click the link and see the videos for yourself. It’s incredible to think this is just a video model streaming images back to you in real time.
This is genuinely incredible, and the more of an AI expert you are, the more impressive it is.
Of course, you can argue against the point of this; it’s not something the world was desperate for, as websites are already very performant. Instead, it’s about making art out of the ordinary.
CHINA
New Week, New SOTA Local Model

Over the past month or so, we’ve been on a streak where every new week gives us a new state-of-the-art local model. This time it’s Alibaba again, with Qwen3.6 27B, a dense model that packs enormous intelligence per parameter: it matches the performance of Qwen3.5-397, a model just one generation older yet 15 times larger, and it’s fairly competitive with Claude Opus 4.5, a model that was frontier just four months ago, despite fitting on my MacBook Pro.
Truly incredible progress that raises the question: when will we have today’s frontier models running on local hardware?
TheWhiteBox’s takeaway:
But how real is this? I’ve been testing the model locally on my MacBook, and it’s safe to say it’s very smart.
But we must be very careful with claims here, because this model could very well be benchmaxxed, as is most often the case with local models. In other words, since AIs learn by training on data, you can “hide” the limitations of your model by training it specifically on the benchmarks you want it to do well on.
In AI parlance, the model could be overfit to the eval set and not perform as well elsewhere, like a kid memorizing the answers to a maths test and scoring 10/10 without actually understanding a single thing about maths outside the test.
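Here’s a toy illustration of that failure mode (obviously not how labs train; it just shows the statistical shape of the problem):

```python
# A toy "model" that has simply memorized the public benchmark verbatim.
benchmark = {"2+2?": "4", "Capital of France?": "Paris"}
memorizer = dict(benchmark)  # training data == eval data

def accuracy(model: dict, eval_set: dict) -> float:
    return sum(model.get(q) == a for q, a in eval_set.items()) / len(eval_set)

print(accuracy(memorizer, benchmark))       # 1.0: looks state-of-the-art
print(accuracy(memorizer, {"2+3?": "5"}))   # 0.0: held-out reality
```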
But what’s also safe to say is that small models are getting better, and eventually we’ll have today’s frontier models running on our consumer hardware.
Of course, those models will not be frontier by then, as ChatGPT Turbo3000 running on a 1,000 GPU cluster will be smarter. But here’s the thing: you might need ChatGPT Turbo3000 to create a complicated app, or cure cancer, but not to read your emails.
In other words, the small models of the future will be “enough” for most tasks, and that realization doesn’t get remotely factored into the revenue and profit projections of the big Labs, which assume the world will always prefer top models for even the simplest stuff. I disagree.
RESEARCH
Jeff Bezos’ New AI Lab
Jeff Bezos is officially an AI bro now, raising up to $10 billion for a new AI Lab designing models that understand the real world. Like most recently founded AI Labs, such as Yann LeCun’s AMI Lab, it is focused on the same problem: improving AI’s capacity to understand the physical world.
The crux of the issue is that frontier models today, models like ChatGPT or Claude, have pretty great coding, textual, and speech capabilities, but severely lack spatial intelligence.
The reasons are many; it could be a fundamental limitation of the current prominent architecture, the Transformer, but that seems very unlikely. Instead, the most likely reason is data.
For AIs to capture spatial patterns, you have to feed the AI spatial data, which is very scarce in the digital world.
Put simply, you can’t teach gravity using text. Seeing this fundamental limitation, several companies have tried to capitalize on it, like World Labs and the aforementioned AMI, with Bezos’s new pet project being the latest in a long line of startups trying to give AIs a sense of our world.
FRONTIER RESEARCH
The end of reasoning models?
This article by The Information is very timely, considering it discusses, based on what they are hearing among SF bros, exactly what I described in our last Leaders segment: the return to models that “think less, but better.”
As I explained in the last newsletter, Opus 4.7 behaves considerably differently from how models have behaved over the last two years, putting more effort into each prediction rather than improving performance by doing many of them, which led to the advent of reasoning models that “think for longer on tasks” by verbalizing very long reasoning sequences.
Instead, Opus 4.7 and GPT-5.5 obtain results as good as or better than their predecessors’, with up to ten times fewer tokens required, reducing overall latency and most likely cost too (I say most likely because making models much smarter per prediction, or “better thinkers,” has required making them much, much larger).
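To see why “most likely,” here’s the tradeoff with some illustrative numbers (made-up prices, not actual rates): a better thinker can cost more per token yet less per task.

```python
# Illustrative only: a pricier-per-token model that needs 10x fewer tokens.
long_reasoner  = {"output_tokens": 20_000, "usd_per_mtok": 10.0}
better_thinker = {"output_tokens": 2_000,  "usd_per_mtok": 30.0}  # 3x the price

for name, m in [("long reasoner", long_reasoner), ("better thinker", better_thinker)]:
    cost = m["output_tokens"] / 1e6 * m["usd_per_mtok"]
    print(f"{name}: ${cost:.2f} per task")
# long reasoner: $0.20 per task; better thinker: $0.06 per task
```

The flip side is that if the price per token grows faster than the token savings, the per-task advantage disappears, which is exactly the caveat above.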
But is this a good move?
TheWhiteBox’s takeaway:
This turn toward better-quality thinkers was long overdue, but the AI Labs were forced to change by the sheer amount of compute required to serve reasoning models to users.
Don’t get me wrong, inference-time compute, the ability of models to think for longer on tasks, isn’t going anywhere, but intelligence per dollar is going to be the metric moving forward.

MARKETS
The CPU Supercycle is here
Yesterday, all the major CPU stocks saw historic surges: Intel led with a 19% single-day jump, AMD netted a 15% gain, ARM another 15%, NVIDIA a massive 5%, and Amazon almost 4%, telling us that the stock market has finally realized CPUs are back in vogue and in a massive shortage.
Don’t mean to imply I told you, but…
TheWhiteBox’s takeaway:
On February 23rd, I wrote about the massive incoming CPU shortage and how it would lead to huge stock gains for chip stocks (I’d been talking about this shortage earlier).
Since then, AMD is up 76.5%, Intel 83.5%, and ARM 88%. To be fair, while I did flag ARM as a key beneficiary, I wasn’t as convinced of the play because many Hyperscalers were steering away from ARM’s own Neoverse CPUs and only licensing the ISA. It turns out there’s simply too much demand for CPUs.
This will only get worse due to agentic workloads. But what’s with this “all-of-a-sudden” CPU craze?
I went into great detail in the CPU post, but the idea is that until now, Generative AI workloads have been heavily skewed toward the model itself. In other words, although GPUs need CPUs just to know what work needs to be done, the majority of the work is done by the GPUs, where the models run.
However, with agents, AIs that can use tools, these tools mostly run on CPUs, so the compute balance of the workload shifts massively in favor of CPUs in the most extreme cases, and can be 50/50 for the average agentic trace.
In plain English, this means AI providers now need not only a huge number of GPUs but also a huge number of CPUs.
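A stylized sketch of the mechanics (the function names are hypothetical stand-ins): every model call runs on GPUs, but everything between calls, like sandboxes, browsers, and database queries, runs on ordinary CPUs.

```python
def llm(messages):
    """Hypothetical model call: the GPU-bound part of the loop."""
    ...

def run_tool(call):
    """Hypothetical tool execution (code sandbox, browser, SQL query):
    this part runs on plain CPUs, not on the accelerator."""
    ...

messages = [{"role": "user", "content": "Refactor this repo and run the tests"}]
while True:
    reply = llm(messages)                       # GPU time
    if not reply.get("tool_call"):              # final answer: loop ends
        break
    result = run_tool(reply["tool_call"])       # CPU time, often the majority
    messages.append({"role": "tool", "content": result})
```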
And as happened with GPUs, or perhaps even worse, there aren’t remotely enough CPUs to sustain the demand, which has already led to considerable price hikes from companies like AMD and Intel, turning their core business, the CPU, into a fully-fledged and reinstated ‘AI play.’
Is the market exaggerating with yesterday’s incredible gains? Probably, but it’s clear that markets are recalibrating their intuitions of what ‘AI hardware’ means, and that’s a whole lotta CPUs.
MEMORY
SK Hynix Scores Historical Earnings
SK hynix presented its quarterly results yesterday, and it was a massive blowout. Revenue came in at 52.5763 trillion won ($35 billion), operating profit at 37.6103 trillion won (~$25 billion), and net profit at 40.3459 trillion won (~$27 billion).
The net profit was higher than the operating profit (profit from its core business) due to non-operating income from areas like exchange-rate gains.
Revenue was up about 198% year over year, while operating profit rose a little over 400%, setting record quarterly results.
Reuters reports the figures broadly met or slightly beat expectations, and the company’s operating margin reached roughly 72%, which is extremely high for a hardware company.
The company said demand for AI chips remains above its current production capacity, which supports pricing and keeps the market tight.
TheWhiteBox’s takeaway:
Exceptional results that exceeded even our very optimistic expectations; what an absolute killer of a company.
The question is, of course, the sustainability of these results, which do look quite unsustainable if I’m being honest. However, people are too quick to assume Hynix will just go back to pre-ChatGPT levels, as in most previous memory-market cycles, and I don’t think that theory will hold at all in this era.
AI is extremely memory-hungry and also a very hardware-intensive industry that burns through hardware like fire through paper. GPUs depreciate rapidly, so the need for memory will persist for years to come.
And as I always mention, you can think Hynix’s business is cyclical, but making that case while simultaneously not making the same case for NVIDIA, whose GPUs need memory like humans need oxygen, or for any other semiconductor company for that matter, is nonsensical.
As these companies put more supply into the market over the next few years, margins will fall, but anyone who thinks Hynix will just go back to how it was before hasn’t actually studied AI’s computational demands enough.
I understand why investors aren't yet fully ready to embrace the new state of the semiconductor industry, but when they do, it’s going to be fun to see this company turn into a rocket ship.
CODING
SpaceX to Acquire Cursor for $60 Billion
And just as we predicted days ago, when SpaceX announced they would give Cursor compute to train its next coding model, the company has now announced what’s basically an acquisition of Cursor (or the right to execute it, at the risk of paying a $10 billion fine otherwise) for $60 billion.
If it materializes (which surely it will, considering how much xAI, which now belongs to SpaceX, needs a powerful coding model to continue to fight the AI wars), Elon Musk gains access to one of the most-used AI coding products in the world (probably fourth only to Codex, Claude Code, and VSCode Copilot) and the huge amounts of data it has accumulated over its three years of existence.
For Cursor, it would be an impressive run, going from a fork (a copy) of Microsoft’s free open-source Visual Studio Code to a $60 billion acquisition in just three years. Which begs the question: what has Microsoft been doing for the last three years?
TheWhiteBox’s takeaway:
As I’ve explained several times, Anysphere, the company behind Cursor, was building its product the right way. They first started as a provider of third-party models, but they’ve slowly become capable of training truly great models, offering what’s basically frontier-level performance in many coding tasks.
To be very clear, that is precisely what SpaceX is buying; not the software, but the talent and data Cursor has amassed at training coding models, which have become the primary killer use case for this technology so far.
HARDWARE
OpenAI Shares Compute Targets

OpenAI has shared its compute targets for 2030, and they are astronomically high: 30 GW, or roughly $1.5 trillion in spend at current prices.
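For scale, some simple arithmetic on the stated figures:

```python
spend_usd = 1.5e12   # ~$1.5 trillion in spend at current prices
capacity_w = 30e9    # 30 GW expressed in watts

print(f"${spend_usd / capacity_w:.0f} per watt of AI compute")  # $50 per watt
```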
Back in January 2025, they shared similar estimates that were rumored to have been cut down, but it seems that wasn’t the case, because the targets still stand. And we can’t blame them, considering compute has proven, time and time again, to be the most important thing.
TheWhiteBox’s takeaway:
Sam Altman’s much more aggressive approach to securing compute, compared to Dario’s, might prove decisive in the war between OpenAI and Anthropic.
The latter completely underestimated the need for compute and is now scrambling to find more where there isn’t any, probably signing far-from-ideal deals with Amazon (below) or Google, the latter of which is planning a new $40 billion investment into Anthropic.
Anthropic is in a really bad place right now in terms of compute (not in terms of valuation, where it’s even hitting $1 trillion in secondary markets), because it can’t possibly meet the demand it is seeing (rejecting that demand is not an option) and has become the staple for poor uptimes and unjustifiable behavior toward consumers, like suddenly cutting Claude Code from the Claude Pro subscription ($20/month), an unprecedented decision that is morally unacceptable knowing some people paid for the yearly Claude Pro subscription thinking Claude Code came in the pack.
Meanwhile, I have yet to hit a single rate limit in ChatGPT or Codex in my more than three years of membership (I’ve been paying for the $200/month subscription for over a year now, previously the $20/month one).
COMPUTE
Anthropic’s new multi-GW deal
Anthropic has expanded its deal with Amazon by signing a new long-term agreement centered on cloud infrastructure, chips, and financing.
The company, which might have just been compromised (a set of unauthorized users may have gained access to the ‘too-powerful-to-release’ model Mythos), says it has committed to spending more than $100 billion on AWS technologies over the next ten years in exchange for up to 5 gigawatts of compute capacity.
Amazon separately says it will invest $5 billion immediately, with another $20 billion tied to commercial milestones, on top of the $8 billion it had already committed.
TheWhiteBox’s takeaway:
At this point, Anthropic basically belongs to Amazon and Google, something I’m not sure Anthropic’s leadership was too keen on. But the desperation to access compute puts you into situations you would have wanted to avoid.
HARDWARE
Google Announces New Flagship Hardware Products

Source: Sundar Pichai, Google’s CEO
Google made several AI hardware announcements at Cloud Next 2026, centered on a broader push to build infrastructure for what it called the “agentic era.”
The company unveiled its eighth-generation TPUs as two separate chips for the first time: TPU 8t for large-scale model training and TPU 8i for low-latency inference. This is Google's first try at a new specialization layer: if TPUs were born for AI, now they are being specialized into training and inference chips.
As you may guess, if you’re a regular, the biggest difference between these chips is memory, as inference is extremely memory-bottlenecked (how fast a model is served is most of the time a function of how fast memory can move data). The inference chip has three times as much SRAM (on-chip memory) as the previous TPU generation and greater memory capacity.
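A back-of-the-envelope sketch of why memory dominates (all numbers illustrative, not Google’s specs): to generate one token, a model must stream essentially all of its active weights through the chip, so bandwidth, not FLOPs, sets the ceiling for a single stream.

```python
weight_bytes = 70e9 * 1.0   # e.g., a 70B-parameter model at 8-bit weights
bandwidth = 3e12            # ~3 TB/s of HBM bandwidth (illustrative)

# Each generated token reads (roughly) every weight once:
max_tokens_per_sec = bandwidth / weight_bytes
print(f"~{max_tokens_per_sec:.0f} tokens/s upper bound per stream")  # ~43
```

That’s exactly the bottleneck the inference variant attacks by piling on memory rather than raw compute.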
The other big changes come at the network level. Google now claims it can synchronize training over one million TPUs. That is more than half a gigawatt of Blackwell-equivalent compute on a single training run; an absolute monstrosity of a cluster.
And speaking of clusters, alongside the new TPUs, Google also expanded its NVIDIA-based lineup. It announced A5X bare-metal instances built on NVIDIA Vera Rubin NVL72 systems, a potential multi-site cluster with around 960k NVIDIA Rubin chips in NVL72 racks, another monster data center that will become available eventually, most likely not before 2027. We’re talking about several gigawatts of compute in that cluster.

TheWhiteBox’s takeaway:
The narrative around Google is simple: as long as AI is a compute game, Google should win, because nobody has more computing power than they do.
And I say ‘should’ because with Google, you just never know; the fact that they are by far the richest Lab on the planet and still trail OpenAI and Anthropic in key money-generating areas like coding or agents says it all.

IMAGE GENERATION
OpenAI Launches GPT-Image 2

OpenAI is back as the state of the art in image generation… and by a long shot. They have announced GPT-Image 2 as a major upgrade to image generation in ChatGPT.
The company’s examples emphasize better prompt precision, stronger typography and multilingual text rendering, improved realism and stylistic range, more coherent multi-panel or editorial-style compositions, and support for practical design tasks such as posters, infographics, brochures, bookmarks, and marketing assets.
Furthermore, the model seems particularly strong at writing, generating graphs, slides, and whatever you ask, with no mistakes and pristine quality, as shown in the image in the OpenAI compute-targets news item above, which was generated using this model.
TheWhiteBox’s takeaway:
Saying this is the new SOTA of image generation feels almost like an understatement. By far the most impressive example I’ve seen is this one, where the AI is capable of generating valid barcodes; literally mind-blowing.
Now it’s Google’s turn to answer back.
But I do have to say I did not expect Google to ever lose the lead in image generation, given the huge data gap they have thanks to YouTube. Turns out, I was wrong.
AGENTS
“Always-On” Agents
OpenAI has launched “workspace agents” in ChatGPT, a new shared-agent feature aimed at teams on ChatGPT Business, Enterprise, Edu, and Teachers plans.
The agents are designed to handle repeatable work across connected apps and can be created, previewed, published within a workspace, and run inside ChatGPT or Slack.
OpenAI describes them as Codex-powered agents, and the company says they are rolling out in research preview, with Business availability arriving gradually over the next few weeks.
TheWhiteBox’s takeaway:
In another episode of ‘Silicon Valley rediscovers decades-old tech’, it seems all Labs are now pushing cron-based products: first, Anthropic with Routines, and now OpenAI with this.
Jokes aside, this looks part of the broader push toward “always-on” agents that OpenAI is currently testing internally via the codename ‘Hermes’.
This is basically a text-to-agent system that you can configure to run at specific times via cron jobs. It’s a good feature that, with the right integrations, can be quite powerful, but it’s not remotely as fancy as it looks.
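Mechanically, it’s the decades-old pattern: a scheduler fires a prompt at fixed times. A minimal, stdlib-only sketch (`run_agent` is a hypothetical stand-in for the actual product):

```python
import time
from datetime import datetime

def run_agent(prompt: str):
    """Hypothetical stand-in for kicking off a workspace agent run."""
    ...

SCHEDULE = {"09:00": "Summarize overnight emails and flag anything urgent"}

while True:
    now = datetime.now().strftime("%H:%M")
    if now in SCHEDULE:
        run_agent(SCHEDULE[now])
        time.sleep(60)   # avoid firing twice within the same minute
    time.sleep(1)
```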
That said, there is an obvious application for agents eventually: running an AI in parallel with you, executing tasks that will need your review while you do other stuff. Naturally, the issue here is finding the balance between reliability and cost.
Running agents 24/7 consumes tokens like few things in life; it’s literally bankrupting Anthropic, to the point that they had to close subscriptions for customers using agent products like OpenClaw. Worse, agents aren’t very reliable yet, so failures happen, which makes token use even more inefficient.
Still, this will eventually work; there’s too much at stake not to. And personally, I’d prefer to have those agents run locally.
CONTEXT ENGINEERING
Google Workspace Intelligence
Google introduced Workspace Intelligence for Gemini Workspace at Cloud Next 2026, a new semantic layer that connects emails, chats, files, calendar context, collaborators, and projects across Google Workspace.
The company said the goal was to let Gemini understand a user’s broader work context in real time, so it could retrieve relevant information, rank priorities, and support more agent-like actions across Docs, Sheets, Slides, Drive, Gmail, and Chat.
The launch also came with a broader set of Workspace updates tied to that same strategy.
Google said Chat would become a more central interface for interacting with Gemini; Sheets gained natural-language spreadsheet creation and richer dashboarding tools; Docs gained infographic generation and image-editing features; Slides gained one-pass deck generation using company templates; Gmail gained an AI inbox and search overviews; and Drive added shared project context hubs.
TheWhiteBox’s takeaway:
The way Google is completely humiliating Microsoft in the enterprise workspace business with AI features is truly unacceptable, given that Microsoft has direct access to OpenAI’s entire IP and models.
The end game here is, of course, turning Google’s enterprise products into an “all-things-Gemini” experience, a truly declarative form factor in which humans communicate directly with agents, and the agents have access to Google’s entire enterprise platform to retrieve the context needed to ground their answers.
As for the per-product features, some are a delight to use, particularly Gemini for Google Slides, which I consider as good as or better than Claude for PowerPoint or Claude Design.
While Google has yet to nail coding agents, in enterprise terms it’s head and shoulders above the Microsoft system. Unless Microsoft massively improves Copilot, it is going to start losing customers to Google, and fast. In fact, I’m already seeing several companies in the Microsoft ecosystem sign workspace deals with Google.

Closing Thoughts
What an eventful week. From the releases of the two best closed and open models, GPT-5.5 and DeepSeek v4, to a huge step-function improvement in image generation with GPT-Image 2.
Furthermore, small local models also continue to make huge progress, with the current generation being as good as the frontier we had 6-12 months ago. Just picture yourself when you first interacted with GPT-4 back in 2023. Today, superior models run on an iPhone.
The other big takeaway for me is the renewed conviction that AI spending is here to stay. For one, AI is beginning to deliver on its promises; the things I’m capable of today are nothing short of extraordinary relative to what I was able to do just six months ago.
But spending will only accelerate; Google’s plans look set to cross $200 billion in investments in 2027, and Hynix, Samsung, and now the CPU players are all smashing previous records on both top line and margins; it really feels like we’re in a golden age of semiconductors, one that is just getting started.
Stay tuned for the DeepSeek deep dive, where we’re going to explain in incredible detail what the state-of-the-art of AI research looks like.

Give a Rating to Today's Newsletter
For business inquiries, reach out to me at [email protected]
