THEWHITEBOX
TLDR;

Welcome back! Today, we tackle a broad list of topics.

On the menu: your weekly ration of AI solving maths problems, Huawei’s new chip breakthrough, Uber sounding the token alarm, a trillion-dollar company going up by roughly 20% in a single day, and Opus 4.8, among other interesting news.

Enjoy!

THEWHITEBOX
Google’s AlphaProof Nexus Solves 9 Erdős Problems

As published on arXiv, a Google DeepMind-led paper introduces AlphaProof Nexus, a framework that uses large language models and the Lean proof assistant to search for formally verified mathematical proofs.

The authors report that their strongest agent solved 9 of 353 open Erdős problems, including two questions that had been open for 56 years, at an inference cost of “a few hundred dollars” per problem. It also proved 44 of 492 OEIS conjectures after autoformalization and manual review.

The idea is that AlphaProof Nexus takes a Lean theorem with missing proof steps, lets LLM-based prover subagents revise proof sketches, and checks progress through Lean. In layman’s terms, it lets an agent “guess and verify” different approaches to solving maths problems.
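To make the “guess and verify” idea concrete, here is a minimal Python sketch of such a loop. It is a toy under stated assumptions, not AlphaProof Nexus’s actual code: the names `propose`, `verify`, and `guess_and_verify` are hypothetical, the “problem” is factoring a small integer rather than proving a theorem, and a plain exhaustive generator stands in for the LLM subagents. What it shares with the real setup is only the shape: an untrusted proposer, an exact checker (the role Lean plays), and a compute budget.

```python
from typing import Callable, Iterable, Optional, Tuple

Candidate = Tuple[int, int]


def verify(n: int, candidate: Candidate) -> bool:
    """Exact checker: accept only a genuine non-trivial factorization of n."""
    a, b = candidate
    return 1 < a < n and 1 < b < n and a * b == n


def propose(n: int) -> Iterable[Candidate]:
    """Stand-in for the 'guessing' side: yields candidates, most of them wrong."""
    for a in range(2, n):
        yield (a, n // a)


def guess_and_verify(
    n: int,
    proposer: Callable[[int], Iterable[Candidate]],
    verifier: Callable[[int, Candidate], bool],
    budget: int,
) -> Optional[Candidate]:
    """Try at most `budget` guesses and return the first one the verifier accepts."""
    for attempt, candidate in enumerate(proposer(n)):
        if attempt >= budget:
            break  # out of compute: give up without a verified answer
        if verifier(n, candidate):
            return candidate  # trusted because the checker said so, not the guesser
    return None


result = guess_and_verify(91, propose, verify, budget=100)
print(result)
```

Because the checker is exact, a wrong guess costs compute but can never be accepted as a solution, which is why the quality of the guesser affects cost rather than correctness.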

To be clear, Google’s proof solver, AlphaProof, has existed for a while now. The difference is that Nexus hands AlphaProof to another AI agent as a tool.

But why does this work so amazingly well?

TheWhiteBox’s takeaway:

The key is the Lean proof assistant, whose compiler can automatically verify correctness. This is the perfect counterbalance to an LLM’s biggest problem, hallucinations, and it also lets the model improve against an automatic verification signal.

Remember that in AI, we can only learn what can be measured. AIs excel in areas where verification is simple, that is, where judging the quality of a model’s response is easy or even automatic, as is the case in maths.

Lean enables the AI to engage in a learning loop in which it can try new ways of solving problems and receive automatic feedback, making learning a matter of computation.

As I always say, there’s a reason we have amazing coding agents while AIs are terrible at writing. It’s not magic; it’s one task that is easily verifiable versus one that is very hard (what is great writing? It’s highly subjective).

But to me, the highlight here is the cost: only a few hundred dollars per solved problem. That is great news because it means the price of frontier performance on hard problems is falling.

On the flip side, AI getting better at verifiable domains is not surprising; it’s maths, a matter of optimizing against a known and measurable objective. There’s zero reason to believe an AI can’t optimize against such problems.

But finding a way to train AIs on non-verifiable domains? Well, that’s still a mystery to this day.

HARDWARE
Huawei’s New Chip Breakthrough

Huawei has published a paper that is making quite a bit of noise. The claim is that it has found a new way to scale chips despite US export controls: LogicFolding.

The problem China faces is simple. Advanced chips have historically improved by shrinking transistors, which lets companies pack more compute into the same chip area. But shrinking transistors below the 7–5 nanometer range requires extremely advanced EUV lithography tools, made exclusively by ASML, which China cannot access due to US-imposed export controls.

Therefore, Huawei’s answer is not to shrink the transistor, but to change the chip’s geometry. Instead of spreading circuits only across a flat surface, LogicFolding stacks compute logic vertically. In theory, this increases density, shortens some wires, reduces power lost to wiring delays, and allows the chip to do more work without needing a more advanced manufacturing node.

In other words, Huawei says it achieved a major increase in density without moving to a smaller transistor node, something the US can’t prevent with export controls this time. This is not the same as catching up to TSMC or NVIDIA, nor does it make export controls irrelevant, but it surely hurts in Washington.

This is a bigger deal than what I can describe in a few words. Thus, I have written a much longer and more detailed piece about this, which you can read for free here.

TheWhiteBox’s takeaway:

The bottom line is that China now has a way to competitively scale compute on-chip without ASML's EUV tools, at least for smartphones.

Huawei’s reported density numbers for its 2026 smartphone CPU (graph above) appear competitive with, or even better than, Apple’s 2024 A18 Pro chip (iPhone 16 and MacBook Neo) despite Apple using a much more advanced TSMC node (3 nanometers).

Huawei is targeting AI processors by 2030-2031 with density comparable to future 1.4A-class chips. We can't jump to conclusions so soon, but if true, China’s 2030 chips could be competitive with US 2028–2030 chips on a chip-by-chip basis, massively closing the gap.

As we have discussed multiple times, China is already competitive at the server/system level today, so if that event materializes, it could be hard to see who's ahead.

All of this, again, without access to EUV litho tools (yet).

COSTS
Uber Sounds the Token Alarm

In a recent podcast, Uber’s COO said the ride-hailing company isn’t seeing a clear increase in productivity from using AI coding services despite their use by its engineering teams. That has prompted executives to discuss how to get a handle on token consumption costs. 

“If you’re not actually able to draw a direct line to how much useful features and functionality you’re shipping to your users, [the costs become] harder to justify,” he said.

TheWhiteBox’s takeaway:

Next up, rain is wet. Jokes aside, who could’ve known that ‘tokenmaxxing’, generating as many tokens as you can as a sign of productivity, was not a well-thought-out strategy?

Needless to say, it was the CTO of that same company who first raised the alarm that their AI spending had skyrocketed far beyond what they had anticipated, burning through an entire yearly budget by March.

Now, the world is realizing what readers of this newsletter have known for months: AI is much more expensive than we thought, and not only that, but it will get worse as companies stop subsidizing tokens and start charging the real value.

As I’ve also explained, the issue is not that inference is not profitable, but that capital costs are too high. Inference is not the issue; NVIDIA’s and SK Hynix’s gross margins are.

And to be clear, this is hardly a call to stop using AI.

Instead, it’s about being smart about your AI use. Start measuring return on investment, as you would with any other technology you’ve ever used. The “selling intelligence” pitch makes it look like you have to ‘tokenmaxx’.

Well, no, because if you do that, you’re going to start ‘bankruptmaxxing’.
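As a rough illustration of what measuring return on AI spend could look like, the Python sketch below computes token cost per shipped feature. Everything in it is an assumption for illustration: the `TeamUsage` fields, the team names, the token counts, and the price of $10 per million tokens are hypothetical, not Uber’s numbers or any provider’s price list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamUsage:
    """Hypothetical monthly usage record for one engineering team."""
    name: str
    tokens_used: int
    features_shipped: int


def token_cost(tokens_used: int, price_per_million: float) -> float:
    """Dollar cost of a token count at a flat price per million tokens."""
    return tokens_used / 1_000_000 * price_per_million


def cost_per_feature(usage: TeamUsage, price_per_million: float) -> Optional[float]:
    """Token spend per shipped feature, or None if nothing shipped."""
    if usage.features_shipped == 0:
        return None  # spend with no output: the case that is hardest to justify
    return token_cost(usage.tokens_used, price_per_million) / usage.features_shipped


# Made-up numbers, for illustration only.
PRICE_PER_MILLION = 10.0
teams = [
    TeamUsage("payments", tokens_used=500_000_000, features_shipped=10),
    TeamUsage("maps", tokens_used=2_000_000_000, features_shipped=4),
    TeamUsage("growth", tokens_used=300_000_000, features_shipped=0),
]

for team in teams:
    cost = cost_per_feature(team, PRICE_PER_MILLION)
    label = "no features shipped" if cost is None else f"${cost:,.0f} per feature"
    print(f"{team.name}: {label}")
```

“Features shipped” is a crude proxy for value, but even a crude denominator turns raw token counts into something you can compare across teams and over time.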

MEMORY
Micron Goes Up 20% in a Single Day

As I write these words, the three DRAM companies, Samsung, SK Hynix, and Micron, have crossed the psychological threshold of one trillion dollars, a remarkable rate of growth. For instance, Hynix has grown 11x in a year.

But Micron set all the records three days ago when it went up by 20%, or $200 billion, in a single day after UBS raised its price target to $1.6k, roughly double where the stock was trading at the time. And the stock flew.

TheWhiteBox’s takeaway:

We’ve talked a lot about the importance of memory in the AI trade. Memory is key to both lines of progress: making models bigger requires more memory capacity, and making sequences longer requires more memory capacity and bandwidth.

But I can’t help but feel uneasy that a stock already worth $800 billion at the time can go up 20% on a single analyst note. Are we approaching the peak?

Honestly, who knows? I will only say this: I have more than 25% of my liquid net worth on Samsung and Hynix, so we'd better not be at the peak.

Interestingly, both Samsung and Hynix trade at lower price-to-earnings multiples than the average S&P 500 stock (Micron has a much higher PE), and their forward PEs (multiples over projected future earnings) are still insultingly low because these companies are going to make so much money next year.

How much, you say? Just look at Morgan Stanley’s very high-level estimate of the Bill of Materials for the upcoming VR200 chip (Vera Rubin from NVIDIA). Memory has grown by 6x from one generation to the next.

And that memory does not include HBM, which is included inside the GPU line item. At least for the next two years, these three companies are going to print money.

Source: Morgan Stanley

The question, as always with these stocks, is whether we should be valuing them based on PE or PB. Historically, due to their cyclical nature, semiconductors have been valued on a multiple of book value (assets minus liabilities), because their future cash flows were too uncertain to discount.

At PB, their multiples sit around 10 on average. But here’s the thing: even if you insist on valuing them by book, you’re still going to have to rerate them because their books are growing incredibly fast due to the amount of cash they are receiving.

Importantly, new investments are not being financed, so the asset base is growing rapidly while liabilities grow very little, and thus the book value will continue to grow.
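The rerating argument is simple arithmetic, sketched below in Python. All figures are hypothetical and in billions of dollars; only the multiple of 10 comes from the text above. As a simplification, the sketch assumes that all profits are retained and that liabilities stay flat, so each year’s profit adds directly to book value.

```python
def project_book_value(book_value: float, annual_profit: float, years: int) -> float:
    """Book value after `years`, assuming all profit is retained and no new debt."""
    return book_value + annual_profit * years


def implied_market_cap(book_value: float, pb_multiple: float) -> float:
    """Market cap implied by a fixed price-to-book multiple."""
    return book_value * pb_multiple


# Hypothetical company, figures in billions of dollars.
PB_MULTIPLE = 10.0     # the rough average multiple mentioned in the text
book_today = 100.0     # made-up starting book value
annual_profit = 50.0   # made-up retained profit per year

cap_today = implied_market_cap(book_today, PB_MULTIPLE)
book_later = project_book_value(book_today, annual_profit, years=2)
cap_later = implied_market_cap(book_later, PB_MULTIPLE)

print(f"Today:      book {book_today:.0f}, implied cap {cap_today:.0f}")
print(f"In 2 years: book {book_later:.0f}, implied cap {cap_later:.0f}")
```

With the multiple held constant, the implied valuation moves one-for-one with book value, so a fast-growing book forces a higher price even under the most conservative valuation method.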

However, many people now believe memory has become a secular market and that these companies will finally have predictable revenues (unlikely, considering it’s still hardware). Here’s my two cents: it will remain cyclical, but with a much higher frequency, meaning new cycles will come faster.

DEBT
SoftBank is Playing a Dangerous Game

I’ve talked about AI leaning more and more into debt. But the following news just sets a new degenerate standard. New Bloomberg reporting explains that SoftBank has borrowed against its own OpenAI shares… to buy more of them.

It’s like the start of a joke, except that it’s not one.

SoftBank has committed roughly $60 billion to OpenAI, and internal advisors who questioned the size of the bet say founder Masayoshi Son shut them down. Former SoftBank insider Habib Imam described the position as "a bet on a worldview about AGI" and added, "you can't hedge a worldview." 

To fund the commitment, SoftBank sold its remaining Nvidia stake, took out a $40 billion bridge facility, and layered on a margin loan, all at around 8% interest. SoftBank's last bet of this scale was WeWork, which imploded in 2019.

TheWhiteBox’s takeaway:

These bets make sense (not that I would be making them myself) as long as the price of the underlying shares keeps going up. Because if it starts going down, lenders are going to come crashing down on you.

This is just one additional reason to view the IPOs of Anthropic and OpenAI as the ‘it moment’ for the industry. If they are a resounding success, we’re onto something. If they show the slightest flakiness, oh boy.

And to be clear, the issue is not with the Hyperscalers; they could cut back on investments somewhat if those IPOs fail, but they will survive, and investors know it.

The biggest problem is the debt side of things; what happens to all the AI companies that are largely dependent on external financing to survive, the CoreWeaves of the world?

Because guess what, they are great businesses, but no matter how great they are, they are still massively cash-flow negative. My enduring feeling is that there’s really too much money at stake here to let all this come down; people just don’t want to hold cash and will go to unprecedented lengths to justify these investments.

I could be wrong, though.

LAW
Law AIs Aren’t Really That Good Yet

Harvey has released some early results of its Legal Agent Benchmark (LAB), which show that frontier AI agents still complete fewer than 10% of complex legal tasks end-to-end under LAB’s strict “all-pass” grading standard.

Interestingly, there’s a clear winner in this category: Claude Opus 4.7 led the tested models at 7.1%, followed by Sonnet 4.6 at 5.4%, Opus 4.6 at 4.2%, GPT-5.5 at 2.1%, and Gemini 3.5 Flash at 0.8%, with the latter two showing really poor performance.
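To see why a strict “all-pass” standard produces such low numbers, consider the hedged Python sketch below. It assumes, purely for illustration, that each task is graded against a list of pass/fail criteria and that a task only counts as complete when every criterion passes. The function names and sample results are made up; this is not Harvey’s grading code or data.

```python
from typing import List


def all_pass_rate(task_results: List[List[bool]]) -> float:
    """Share of tasks where every single criterion passed."""
    if not task_results:
        return 0.0
    completed = sum(1 for criteria in task_results if all(criteria))
    return completed / len(task_results)


def criterion_pass_rate(task_results: List[List[bool]]) -> float:
    """Share of individual criteria passed, pooled across all tasks."""
    flat = [passed for criteria in task_results for passed in criteria]
    if not flat:
        return 0.0
    return sum(flat) / len(flat)


# Made-up results: four tasks, four criteria each.
results = [
    [True, True, True, True],    # fully correct
    [True, True, True, False],   # one miss: counts as a failure under all-pass
    [True, False, True, True],
    [True, True, False, True],
]

print(f"all-pass rate:       {all_pass_rate(results):.2%}")
print(f"criterion pass rate: {criterion_pass_rate(results):.2%}")
```

In this toy data the agent gets most individual criteria right yet completes only a quarter of the tasks, which is the gap between “okay-ish” work and work you can ship without checking.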

TheWhiteBox’s takeaway:

It’s important that these task- and domain-specific benchmarks start to emerge to show how hard reality can hit. These models can do an okay-ish job, but fail miserably if you don’t hold their hand, at least today.

For now, they remain an interesting coworking tool that, used well, can really push what one can do. The other side of the coin is costs. Opus 4.7, the top scorer, cost about $50.90 per task and took roughly 22 minutes per run, while faster or cheaper models scored lower.

This doesn’t sound cheap at all, and continued use of these models can be a nightmarish expense.

And talking about Opus…

MODELS
Anthropic’s Opus 4.8 Out. Mythos Soon?

Minutes ago, as I was writing these words, Anthropic launched Claude Opus 4.8, an upgraded version of its flagship AI model, while also preparing to release a more advanced model, the long-awaited Claude Mythos.

Anthropic says Opus 4.8 improves on earlier Opus models in areas such as coding, reasoning, financial analysis, computer use, and browser-agent tasks. The company highlights “honesty” as a key change, saying the model is more likely to flag uncertainty and avoid unsupported claims.

Unsurprisingly, Anthropic is offering Opus 4.8 at the same price as its predecessor, despite performance improvements. The Verge also reports that Anthropic is adding controls for how much “effort” Claude applies to tasks, affecting token use and cost.

TheWhiteBox’s takeaway:

Minutes later, they announced a monstrous $60 billion Series H round, valuing the company at a whopping $963 billion post-money. As I explained in a recent post, the discussions were about a new $30 billion round, but it has turned out to be twice that size.

As for Opus 4.8, it seems like it's fully state-of-the-art, but come on, at this point we both know better: benchmarks rarely tell the whole story. For instance, Opus 4.7 looks better on benchmarks than GPT-5.5, and that couldn’t be further from the truth in practice.

The next few days will tell us how much better Opus 4.8 really is. But one thing’s for sure: it’s not going to be cheaper.

Closing Thoughts

AI is truly hitting a hard wall when it comes to costs, which have risen to the point where people simply can’t ignore them. I believe Uber is the first of many companies to realize that either you become sophisticated and frugal in your use of AI, or your AI bills will become a problem.

And the fact that Anthropic has released Opus 4.8 without a price reduction, despite the heat they are getting, tells me they just can’t.

Yes, they now command a $47 billion annual run rate, but if people truly start believing AI is expensive, these companies’ revenue growth frenzy may come to a halt. And right quick.

Remember, as I always say, the marginal cost of software is no longer zero once you include AI, which means every user counts.

Beyond the tech, we continue to see basic market degeneracy to the point where people are borrowing against their own shares to buy more. This is the type of behavior that, if things go south, we will all regret.

The silver lining is that AI continues to make progress in maths, but we can’t pretend to be eternally excited when all good news comes from verifiable domains like coding or maths.

The world is much more than coding and maths; we need results elsewhere, too, like in law. And the truth is that, once you evaluate AI in those domains, the news isn’t that exciting anymore.

For business inquiries, reach out to me at [email protected]
