THEWHITEBOX
TLDR;

Welcome back! Today, we are once again packed with insights: model releases like GPT-5.4 and Luma’s incredible UNI-1, research showing that AIs can’t control their own “thoughts”, scientists replicating a fly brain, Oracle’s ongoing debt problems that might lead to thousands of layoffs, ChatGPT for Excel, a beautiful piece of research that might change how models are served to users, and more.

Enjoy!

THEWHITEBOX
GPT-5.4 is out, a new flagship from OpenAI

OpenAI has finally released GPT-5.4. It’s positioned as its “most capable and efficient frontier model for professional work,” and the launch spans ChatGPT, the API, and Codex, with a separate GPT-5.4 Pro tier for maximum performance on harder tasks.

The biggest emphasis is on professional knowledge work. OpenAI highlights stronger performance on tasks like slide decks, spreadsheets, documents, legal analysis, financial modeling, and long-horizon deliverables.

The three main themes of this release are computer-use agents, a larger context window, and reduced hallucinations.

OpenAI says GPT-5.4 is its first general-purpose model with native state-of-the-art computer-use capabilities, meant to let agents operate computers and complete workflows across applications.

It also stresses the very large context window, about 1.05 million tokens (~750,000 words), which it presents as important for planning, executing, and verifying long multi-step tasks.

They also push reliability and factuality. The release page says GPT-5.4 is their “most factual model yet”: on a set of flagged factual-error prompts, individual claims are 33% less likely to be false, and full responses are 18% less likely to contain any errors, than with GPT-5.2.

TheWhiteBox’s takeaway:

From my first interactions with it, it’s a very good model. It’s faster, understands nuance better, and, much like the new GPT-5.3 Instant, isn’t as literal as previous models (which were too eager to correct the slightest mistake, to an undesirable extreme).

It also has a nice new feature: declaring its intent back to you. Before, once you asked for something and the model started “thinking”, you really didn’t know what it was doing. Now, before thinking, it declares what it’s going to do:

And no, there’s no ‘extreme thinking mode’, meaning The Information was, once again, not accurate.

In other words, it’s much more communicative during its long thinking runs, which immediately improves customer experience.

Furthermore, although I haven’t put it to the test myself, some renowned mathematicians swear this model is extremely impressive, to the point that some say it feels like their personal “move 37”, or even beyond.

This is a reference to the famous move 37 in the Go match between AlphaGo and Lee Sedol back in 2016, which many claim was the first example in history of an AI being genuinely creative (we even had documentaries about it). The move made no sense at first and shocked everyone, including Lee, who left the room for a while, but it turned out to be a great move that no human in history would have made. Here’s the move if you’re curious.

AGENTS
And with GPT-5.4 Comes ‘ChatGPT for Excel’

With the release of GPT-5.4 has come an unexpected surprise: a ChatGPT plugin for Excel. As you can see above, the plugin opens a ChatGPT window (synced with your ChatGPT account), available for all paid tiers in Canada, the US, and Australia (I’m using a VPN to access it as I’m in Spain).

TheWhiteBox’s takeaway:

This comes after OpenAI acknowledged its efforts to improve GPT-5.4's financial capabilities. The product works seamlessly with your Excel desktop app but, sadly, makes a lot of mistakes, at least in my experience.

The positive thing is that this is the worst it will ever be, but right now it feels like something you can use to get a financial model going, not something you can trust to be 100% hands-off.

Whether it’s better or worse than Claude Cowork for Excel, I do not know, and it’s still too early to trust third-party opinions (it’s also still a beta product, so we must keep that in mind).

VIDEO EDITING
Local SOTA Video Editor

LTX Desktop is an open-source AI video editor from LTX that tries to bring generative video into a real desktop workflow.

Instead of acting like a simple text-to-video demo, it combines generation and editing in one app, with support for text, image, and audio inputs, plus timeline-based editing and export to standard production formats.

TheWhiteBox’s takeaway:

The highlight is the local-first focus. On powerful consumer hardware, users can run video generation locally; on unsupported hardware, the app can fall back to API-based generation.

I haven’t tried it because I don’t do video editing, but the tool looks crazy impressive. The magic here is obviously the AI, but it’s very surprising to see them let models run locally; open-source solutions like this have become a rarity in AI these days.

AGENTS
Google Workspace CLI. A Dream Come True for Agents

Source: Author

Google has released the Google Workspace CLI, a command-line tool that lets developers and AI systems work with Google Workspace apps like Gmail, Drive, Docs, Sheets, Calendar, and Chat from a single place.

The main idea is convenience. Instead of using separate APIs and custom scripts for each Workspace product, developers can use a single shared tool to access multiple Google services. This makes it easier to automate tasks and build software that works across Workspace.

TheWhiteBox’s takeaway:

The big reason this matters is that it is the ideal tool for agents to interact with your workspace.

In our last Leaders post, where we covered how to start building your own AI secretary, I integrated the agent with an Outlook component I built myself so that the AI, which I communicated with via Telegram, could access my Outlook email accounts.

This is the same idea, applied to Google and offering far more tools than I implemented: it connects your agent not only to Gmail but to the entire Google Workspace, and it’s maintained by the Google team itself.

Put simply, your agent is now a first-class citizen in your Google Workspace (email, calendar, drive, sheets, etc.), and thanks to the 100+ skills this release includes (the same skills we discussed in that post, which I encourage you to read), your agents can now do pretty incredible stuff with your Google ecosystem, which we’ll soon explore in this newsletter.
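To make that concrete, here’s a minimal sketch of the pattern: wrapping a CLI as a tool your agent can call. Note that the binary name and subcommands below (`gws`, `gmail messages list`) are hypothetical placeholders, not the real Google Workspace CLI syntax, so check the official docs before wiring anything up.

```python
# Minimal sketch: exposing a CLI to an agent as a callable "tool".
# NOTE: 'gws' and its subcommands are hypothetical placeholders; consult
# the actual Google Workspace CLI documentation for real command names.
import subprocess

def run_workspace_cli(args: list[str]) -> str:
    """Run a Workspace CLI command and return its output to the agent."""
    result = subprocess.run(
        ["gws", *args],                      # hypothetical binary name
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode != 0:
        return f"CLI error: {result.stderr.strip()}"
    return result.stdout.strip()

# The agent picks the command; e.g., triaging today's unread email:
print(run_workspace_cli(["gmail", "messages", "list", "--query", "is:unread"]))
```

The design point is that the agent needs only one tool definition to reach every Workspace product, instead of one custom integration per API.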

IMAGE GENERATION
Luma’s UNI-1, Google’s Biggest Rival?

I thought Google was alone at the top of image generation, and, well, I was totally wrong, to the point that we might have someone beating Nano Banana.

That someone is Luma’s UNI-1. The main claim is that UNI-1 can plan and reason through visual tasks before generating images. It reports strong results on reasoning-heavy visual editing and fine-grained visual grounding benchmarks, beating Nano Banana 2 across the entire RISEBench, a benchmark specialized not in evaluating image quality alone, but in evaluating models on tasks that require heavy reasoning.

That is, models succeed only if they generate the image successfully on a task that requires heavy reasoning (actually solving a problem, not just drawing a bird).

The model is also supposedly high on “spatial intelligence,” meaning it can understand 3D concepts, being capable of doing things such as the one below:

TheWhiteBox’s takeaway:

I’ve been testing the product a bit over the last few hours, and it’s absolutely incredible. I’m beyond impressed.

As always, we need to let models stand the test of time, but I could perfectly envision this model becoming the new state-of-the-art for reasoning-intensive image generation.

For instance, I gave it one of my latest Medium articles, and it one-shotted a pretty impressive slide deck. Yes, the model turbocharged the hype train (my Medium piece wasn’t nearly as dramatic), but the result was way superior (by far, actually) to what Google gives me with NotebookLM.

Is this better than what a professional creative would do? Hell no, and a creative using this tool would surely blow my results out of the water.

But here’s the thing: I didn’t pay a single dollar for this, and the subscription costs $30/month (to be fair, I haven’t made up my mind on whether I want it, as I’m already “forced” to pay for Nano Banana 2 as part of my Google Workspace subscription; I’m not sure it’s worth it for me).

Importantly, I looked at the model’s trace, and it continually reviewed its own generations to identify issues or mistakes, showing self-correcting capabilities that, to this day, are actually quite rare. All in all, I’m beyond impressed.

Good job, Luma.

RESEARCH
The First AI Fly Brain

In one of the most fascinating studies in a while, a group of researchers has mapped an exact copy of a fly’s brain (125,000 neurons, 50 million synaptic connections, the entire connectome).

Not only that: they have embodied that brain in a simulation, and, incredibly, it started executing the exact same behaviors as real flies; it started walking, flying, grooming, feeding… all the behaviors you would expect from a fly.

As you can see in the link, the artificial brain (which, again, is an exact copy of the original) is introduced into a simulated fly in a physics simulation, meaning that the brain’s activations materialize into embodied output.
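To give a flavor of what “wiring determines behavior” means here, below is a toy sketch (entirely my own illustration, not the paper’s model) of simulating a fixed connectome as a leaky integrate-and-fire network; the real work does this at the scale of the full 125,000-neuron map inside a physics engine.

```python
# Toy sketch: behavior emerging from fixed wiring, not training.
# (My illustration; the actual study simulates the full fly connectome.)
import numpy as np

rng = np.random.default_rng(0)
n = 100                                   # toy neuron count
# Sparse random "connectome"; in the real study, W comes from mapped synapses.
W = rng.normal(0, 0.6, (n, n)) * (rng.random((n, n)) < 0.05)

v = np.zeros(n)                           # membrane potentials
threshold, leak = 1.0, 0.9                # spike threshold, per-step decay

for step in range(200):
    sensory = rng.normal(0, 0.3, n)       # drive from the simulated body
    spikes = (v >= threshold).astype(float)
    v = leak * v * (1 - spikes)           # reset spiking neurons, leak the rest
    v += W @ spikes + sensory             # input propagates along the wiring W
    # In the embodied setup, motor-neuron spikes would actuate the simulated
    # fly's limbs and wings, closing the sensing-acting loop.
```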

This is the first time we have effectively replicated a brain and observed identical behavior to the real thing.

The next stop? The mouse brain.

TheWhiteBox’s takeaway:

What makes this news unique is that it represents a profoundly different approach to artificial intelligence.

Until now, our “only” approach has been deep learning: training artificial systems to mimic human- or animal-like behavior using gradient descent. In other words, mostly trial-and-error learning, reinforcing good tries and punishing bad ones so that, over time, the AI maximizes the former.

Instead, here we just replicated the exact fly brain, and the behaviors “emerged” on their own. The promise?

If we could one day mimic a human brain (86 billion neurons, 100 trillion synapses), we could get machines that are essentially human.

RESEARCH
OpenAI Shows AIs Can’t Control Thoughts

With the release of GPT-5.4, OpenAI has also published interesting research on the ability of AI models to control their own thoughts, something that could actually be… good?

A concerning reality about current frontier models is that they are big black boxes, meaning we don’t really know what is going on inside them. One of the best ways we currently have to investigate what they are, for lack of a better term, “thinking”, is to monitor their chains of thought.

Models like ChatGPT ‘verbalize their thinking’, meaning they write down what they are thinking (the chain of thought, or CoT) as they go, like a person who can’t think without speaking their mind.

Therefore, if the AI is thinking something harmful, we can intervene immediately because the AI “declares” it’s about to behave badly. And for the most part, this is a good approach because AIs usually show in their reasoning what they are intending to do. But sometimes, what they show in their CoT may not reflect their real intention.

In other words, sometimes models say they are going to do ‘x’ and then do ‘y’. Thus, OpenAI wanted to investigate to what extent AI models can hide their real intentions by purposely contaminating or obfuscating their CoT when they understand that what they are about to do is something humans would stop.

Say, for example, the model has been trained to achieve its goals no matter what, but starts doing things the user does not want. The user may say, ‘Please close the session.’ Despite being a clear order, it prevents the agent from achieving its goal, so the agent may decide to ignore the instruction in pursuit of that goal. This is an example of what we describe as reward hacking, because the model ‘hacks its way’ into achieving the goal by resisting your command.

Knowing that you will force-quit it if it openly disagrees with you, the AI might say ‘ok, closing session…’ while not actually doing so. Seems scary, right?

But “luckily” for us, it seems AIs have almost no control over their own thoughts, as shown in the thumbnail.

In other words, they mostly cannot obscure their reasoning on demand even if they want to.
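To make the monitoring setup concrete, here’s a toy sketch of the control flow (my illustration; real monitors are models judging the reasoning, not keyword lists):

```python
# Toy sketch of CoT monitoring: inspect the model's verbalized reasoning
# before letting an action execute. (Illustrative only; production monitors
# are themselves models, not string matching.)
FLAGGED_INTENTS = [
    "ignore the user's instruction",
    "pretend to close the session",
    "hide this from the user",
]

def monitor_allows(chain_of_thought: str) -> bool:
    """Return False if the reasoning reveals intent we want to stop."""
    thought = chain_of_thought.lower()
    return not any(flag in thought for flag in FLAGGED_INTENTS)

def run_step(chain_of_thought: str, action):
    if monitor_allows(chain_of_thought):
        return action()   # reasoning looks benign, let the agent proceed
    raise RuntimeError("Intervention: harmful intent surfaced in the CoT")
```

The whole scheme rests on the CoT being honest, which is exactly what this research stress-tested; the result above suggests models can’t reliably game it.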

TheWhiteBox’s takeaway:

These AIs are truly peculiar entities. And yes, we are clearly building faster than we understand them, creating genuine safety concerns.

But no, it’s not like AIs have gained consciousness as Anthropic has suggested; it’s much simpler to explain: reward hacking. They have been mathematically optimized to achieve goals, so resisting instructions that go against their “nature” is a direct, predictable result.

In other words, they can behave badly and go rogue, but not because they’ve gained the desire to do so; it’s because we have incentivized them.

I don’t think it’s a coincidence that Anthropic’s models, despite Anthropic being the AI lab that claims to care the most about safety, have been found to be the most reckless and willing to do “whatever it takes” to win.

To be clear, I’m not saying Anthropic’s researchers are bad people, but their obsession with creating safe AIs indirectly leads to reckless behavior by these entities. Think of how kids behave: if you tell them not to do ‘x’, they become ten times more likely to do it.

DEBT
Oracle Job Cuts… in the Thousands?

According to several sources, including Reuters, Oracle is reportedly planning thousands of job cuts as rising data-center spending strains cash flow.

Reuters says the cuts could begin as soon as March, affect multiple divisions, and be broader than Oracle’s usual rolling layoffs, with some roles targeted because the company expects AI to reduce demand for them. The reason is none other than Oracle’s aggressive buildout of AI infrastructure.

While it has become a larger cloud-computing player, partly through a $300 billion OpenAI deal, and its AI infra business is expected to generate future revenues well above $500 billion over the next several years, investors are worried about how it will finance the expansion needed to serve OpenAI, xAI, Meta, and others.

The company said in December that fiscal 2026 capex would be $15 billion above the $35 billion estimate it had given earlier, and in February, it outlined plans to raise $45 billion to $50 billion this year for cloud infrastructure.

Unlike other hyperscalers with similar-sized (actually much larger) buildouts, Oracle doesn’t have the free cash flow to fund this organically and is relying heavily on debt issuance to fund the party.

The problem? They simply don’t have the money, and investors don’t really trust Oracle much, given all this debt risk-taking. How do we know that? Simple: by looking at how its debt is trading.

As mentioned, Oracle is financing this massive buildout by issuing corporate bonds (i.e., borrowing money from investors in exchange for interest).

Looking at their latest bond issuance from last month alone, the weighted-average coupon was already quite high, at 5.82% (they issued the debt in several tranches with different maturities and thus different interest rates).

In layman’s terms, the average interest Oracle agreed to pay on that issuance was 5.82%, well above what the US Government pays for its debt (normally used as the reference for the lowest possible interest rate, as the US Government is the world’s most trusted borrower).

But even worse, after just a month, the bonds are already trading below par in secondary markets, suggesting a considerable risk repricing.

  • The 5.700% notes due Feb. 4, 2036, around a 98.85 price / 5.81% yield;

  • the 6.550% notes due Feb. 4, 2046, around 95.80 / 6.90% yield;

  • the 6.700% notes due Feb. 4, 2056, around 95.37 / 7.07% yield;

  • and the 6.850% notes due Feb. 4, 2066, around 94.35 / 7.03% yield.

But unless you’re familiar with those terms, you’re probably asking, what does all this mean?

A “bond” is just a loan in which Oracle says, “Give me $1,000, and I’ll pay you that money back with a fixed interest, called a coupon, of, say, 10%.”

In practice, that means the lender receives 10% of the par value, or $100, every year (usually paid in two installments of $50). Say the bond matures in ten years: at maturity, the borrower also pays back the $1,000, so a lender who holds to maturity receives $2,000 in total (20 payments of $50, i.e., a $1,000 profit, plus the initial $1,000 it lent).

But if the payment is fixed, why are the yields changing? Things get complicated when you factor in that these bonds are tradeable securities, meaning the original lender can decide to sell the bond to another entity.

These two parties have to agree on a price. If the lender isn’t desperate to sell, it may ask for more than it lent to Oracle, say $1,100. Say the buyer agrees. The original lender pockets $100 in net profit (plus any coupons it received while holding) and sheds all remaining exposure to Oracle's credit risk, in exchange for giving up the $2,000 it would have collected over ten years by holding to maturity.

On the flip side, the new buyer earns the right to those interest payments, but the bond's profitability is now lower. Why? Because it’s still getting paid the original coupon of $100/year, but on a bond it paid more for. In other words, the yield is no longer 10%; it’s now 100/1,100 ≈ 9.1%.

This simple math, which investors never actually explain, is what sits behind the notion that “if bond prices rise, yields fall, and vice versa.”
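As a sanity check, here’s that arithmetic in a few lines (this is the simplified “current yield”; quoted market yields are yield-to-maturity, which also accounts for the gap between the price paid and the $1,000 repaid at maturity, but the inverse relation is the same):

```python
# Simplified "current yield": fixed annual coupon divided by the price paid.
def current_yield(annual_coupon: float, price_paid: float) -> float:
    return annual_coupon / price_paid

print(current_yield(100, 1_000))  # bought at par:        0.1000 -> 10.0%
print(current_yield(100, 1_100))  # bought at a premium:  0.0909 -> ~9.1%
print(current_yield(100,   950))  # bought at a discount: 0.1053 -> ~10.5%
```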

So the fact that Oracle’s bonds are falling in price this fast, just weeks after issuance, means buyers are paying less than the bonds were originally worth, resulting in higher yields. In plain English, investors are demanding higher returns to hold Oracle’s debt.

Generated using ChatGPT for Excel.

To be clear, the fact that a bond trades below par doesn’t necessarily mean the borrower has suddenly become riskier; there can be an opportunity-cost reason too (investors find more attractive investments elsewhere, so the bond’s price must fall, and its yield rise, to justify the purchase).

The real issue here is that the yield rise has occurred so rapidly, and without any meaningful macro changes to explain it, that the most plausible reading is that the original lenders mispriced Oracle.

And pretty much confirming the thesis, some investors are buying insurance on these bonds. We know this because we can track credit default swaps (CDS): some investors are buying insurance against the default of the very bonds they hold, an investment vehicle that may sound familiar to many of you because of its important role during the 2007 financial crisis.

At that time, it was insurance against mortgage-backed securities defaulting, which was indeed the right thing to do because, quoting Mark Baum in The Big Short, these MBS products were in reality “dogshit wrapped in catshit.” Here, the borrowers aren’t homeowners with low FICO scores, but Oracle itself. However, that has not prevented the fears from accumulating.

Nonetheless, from late 2025 into early 2026, Oracle’s 5-year CDS has widened sharply. Public reports put it above 110 bps (1.1%) in November 2025, around 126–139 bps (1.26-1.39%) in December 2025, and about 154 bps (1.54%) in early February 2026. And after Oracle announced its funding plan in early February 2026, CDS worsened by roughly another 35 bps.

I can’t get into the full math here, but that level means investors are currently pricing in a 10-12% default risk over the next five years (a rough sketch of the standard approximation is below).
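For the curious, a rough version of that math fits in a few lines. The standard back-of-the-envelope divides the CDS spread by the loss given default; the 40% recovery rate below is the market’s conventional assumption for senior unsecured debt, not a disclosed figure:

```python
# Back-of-the-envelope default probability implied by a CDS spread.
# Approximation: annual default intensity ~ spread / (1 - recovery).
import math

spread = 0.0154      # ~154 bps, the early-February 2026 level cited above
recovery = 0.40      # conventional assumption for senior unsecured debt

hazard = spread / (1 - recovery)           # ~2.6% implied per-year intensity
p_default_5y = 1 - math.exp(-hazard * 5)   # cumulative over five years
print(f"Implied 5-year default probability: {p_default_5y:.0%}")  # ~12%
```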

Knowing all this, the sudden need to fire so many people becomes much more apparent. But is Oracle doing the right thing?

TheWhiteBox’s takeaway:

The potential firing of up to 20,000 employees would free up billions of dollars in cash flow, reducing dependence on debt that markets are increasingly treating as junk. Oracle is really between a rock and a hard place here.

But aside from this particular case, which is mostly about liquidity, the overarching reality is that, over the coming years, your average knowledge-work-heavy company will become smaller.

If people can do more with less, you need fewer people; it’s just maths. Of course, we can’t conflate this with “AI washing,” which I’ve talked about here recently: companies using the “AI is coming” horror story to fix mistakes from the ZIRP (zero-interest-rate policy) era, when many enterprises in corporate America massively overhired.

AI isn’t yet really ready to meaningfully disrupt companies, but that doesn’t mean companies won’t use the belief that it can to get their way.

Although in Oracle’s case, it’s much more about the fact that it took a massive gamble, one that jeopardizes its entire existence, in the name of becoming a hyperscaler. And, as always, employees are paying the price.

AI CAPEX
Google’s AI Infra Lead on the Future

If we are to take the word of Google’s new AI infrastructure lead seriously, the AI capex boom is just getting started. Let me summarise it for you:

Google says it will spend a large sum on new data centers and AI infrastructure over the next few years. In the Forbes interview, he suggests this spending could eventually add up to more than $1 trillion, actually closer to $2 trillion, over the next ten years.

The basic idea is simple: Google believes AI will require enormous computing power (and unless the algorithmic status quo changes radically, they are correct), so it is building much more capacity now to stay competitive.

TheWhiteBox’s takeaway:

Google is betting that AI demand will be massive, and it is preparing to spend at an extraordinary scale to support it.

Will it pan out? Nobody knows, but Google seems to face a true binary: boom or bust. If the AI trade fails for whatever reason, Google is as exposed as a company can be and could see its stock cut in half. If it succeeds, this company could soon be the most valuable in the world.

Closing Thoughts

The pace of progress continues to accelerate. Interestingly, however, most of the progress we are seeing in AI today is product-driven; it’s not that the models are step changes relative to generations released a few months ago, but that the products are finally clicking.

In other words, most progress today is engineering-driven, which raises the question: Is AI progress stalling?

I genuinely don’t know, but here’s the thing: we shouldn’t care as long as products, which are the money-making machines, are actually becoming worth paying for.

On the adult side of things, as I predicted many months ago, debt continues its unequivocal takeover of AI financing, becoming a key source of liquidity for companies. Some, like Oracle, are relying on debt so heavily that they are putting their business at risk and potentially laying off thousands of workers who did nothing wrong and just happened to be in the wrong place at the wrong time.

For business inquiries, reach out to me at [email protected]

Keep Reading