THEWHITEBOX
TLDR;

Welcome back! This week, we have a loooot to talk about. A lot.

One of the most commented-on releases in AI history (Claude Mythos, and what’s true and not true about it), Meta’s comeback, the unaffordable Blackwell AI era, and more, ending with a deep analysis to answer the question most investors are begging to know the answer to:

Will OpenAI and Anthropic be good investments once they IPO this year?

Let’s dive in.

THEWHITEBOX
Claude Mythos… The Destroyer of Worlds?

Probably the AI news of the week, month, and even the year. Anthropic has announced something they are too scared to release. Yes, that’s literally the headline.

Anthropic has a new model, called ‘Mythos’, which is allegedly extremely capable in cybersecurity, finding loopholes and bugs in almost every piece of software it examines.

Of course, that means they are too scared to make it generally available, and are just releasing it to “family and friends,” that is, investors and big clients, who will be charged $25/$125 per million input and output tokens, respectively.

For what it’s worth, the model is definitely more capable in cybersecurity than its predecessors, with a much higher rate of successful exploits against software that both you and I use daily.

Beyond that, the model is allegedly absurdly strong at coding, agents, and basically everything, representing what’s clearly the next frontier of AI capabilities.

Perhaps most noteworthy is that it seems to be a much more efficient “thinker”, as shown in the graph below, where it scores better than all other Claude models whilst requiring much fewer tokens.

So, even if it seems more expensive for you to run (if you could, naturally), it might not be in practice.

TheWhiteBox’s takeaway:

I go much deeper into my thoughts on this in this Medium article that is free for you, but the point is that you would be wise to apply a very healthy dose of skepticism to this release.

If the risks are actually real, it was a good decision not to release the model until most of those risks are patched by those affected (Anthropic is sending receipts to these companies). Still, it’s very likely the claims are blown extremely out of proportion, and the effects are not that impactful.

Nonetheless, as the researchers behind this blog post demonstrate, models already available can be quite successful at these exploits too, including one with just 3.6 billion active parameters.

I quote, “But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis.”

This doesn’t mean Mythos can’t be dangerous, but it implies that much of the risk is already here.

Let’s not forget that, as I discuss in more detail below, this company is IPOing this year, it’s growing extremely fast, and needs to keep the momentum going. However, the honest realization, and the reason I believe it was not released, is that this new model frontier is simply too expensive to run.

I ran some back-of-the-envelope math in the Medium article to give you a sense of how expensive it is to serve this model.

So, to me, this is closer to a marketing stunt than anything else, a way to save face with investors despite having spent at least $10 billion on an AI model they cannot serve to users.

MODELS
Meta’s Muse Spark, Finally Here

And finally, Meta joins the party again. Almost exactly a year after the failed release of Llama 4, which led to one of the most aggressive poaching sprees in history, with some researchers receiving $100 million signing bonuses, Meta has unveiled the first result from the team it created (immodestly named Meta SuperIntelligence Labs): Muse Spark.

The model is competitive with pre-Mythos state-of-the-art, meaning it’s in the same ballpark as Claude Opus 4.6, GPT-5.4 (non-Pro), or Gemini 3.1 Pro, but naturally behind (allegedly) Anthropic’s new beast.

TheWhiteBox’s takeaway:

The first noteworthy thing is that it’s not open source, confirming the well-known reality that Meta is no longer open-sourcing its efforts (at least not on release).

Overall, it feels like a really good model: it’s less about how much it’ll be used and more about Meta making a statement that they’re back (in fact, Alexandr Wang, MSL’s CEO, already announced that much larger models are in training).

That said, I can’t comment much on the tech side of things beyond the point they make on conciseness; the model has been trained to be less of a ‘yapper’, which seems to have become a staple of model training (as seen with Mythos above and SWE-1.6 below).

To do so, during Reinforcement Learning (the phase in which models are no longer trained to imitate good responses, but are instead pushed to try things out on their own until they reach the answer), they first let the model generate all the thinking it needs to get to the result, and then introduce a ‘length penalty’ that punishes good responses that are too long.

In fact, they show how the model managed to reach an almost perfect score on AIME 2025 (a very hard maths benchmark) with far fewer tokens than it originally required for much worse performance:
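The length-penalty idea above can be sketched in a few lines. Everything here (the function name, the linear penalty form, the token budget) is an illustrative assumption of mine, not Meta’s actual recipe:

```python
# Hypothetical sketch of a length penalty applied to RL rewards.
# The penalty form and all parameters are illustrative assumptions.

def length_penalized_reward(correct: bool, n_tokens: int,
                            budget: int = 4096, alpha: float = 0.5) -> float:
    """Reward 1.0 for a correct answer, scaled down the further the
    response runs past a token budget; wrong answers get 0 either way."""
    if not correct:
        return 0.0
    # Only penalize correct-but-verbose responses, as described above.
    overshoot = max(0, n_tokens - budget) / budget
    return max(0.0, 1.0 - alpha * overshoot)

# A concise correct answer keeps full reward...
assert length_penalized_reward(True, 3000) == 1.0
# ...while a rambling correct one is discounted.
assert length_penalized_reward(True, 8192) == 0.5
```

The key design choice, per the description above, is that only *correct* responses get penalized for length, so the model first learns to solve the problem, then learns to solve it tersely.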

This is relevant because getting models to reach an “intelligence level” where they can get to the point faster not only provides a better user experience but also makes it much cheaper to serve users.

As models are indeed getting enormous, it’s vital for Labs that the budget for thinking drops without impacting performance.

MODELS
We Need a New Definition of Large Language Models

Although most people aren’t aware yet, no matter how exaggerated Claude Mythos’ release was, the AI industry has changed. Why? Because we’re seeing the first examples of AI models trained on Blackwell-generation compute, and that’s a huge deal.

Claude Mythos was, in fact, trained on Trainium chips, as AWS’s CEO confirmed.

In other words, all the models we’ve been using until now were trained on the previous generation of chips. Succinctly put, those models were way smaller than the new wave of models; we’re talking at least an order of magnitude smaller.

For example, we know for a fact that xAI’s Grok 4.20 model, a frontier model in every sense of the word, even if it isn’t as good as GPT-5.4 or Opus 4.6, is “just” 500 billion parameters (Elon confirmed it), which is around twenty times smaller than Mythos’s alleged size.

And the reason is not that we didn’t want to train larger models, but that we couldn’t because we couldn’t serve them (in fact, we still can’t).

The giveaway is the scale-up domain size (i.e., the number of GPUs per server). The reason I can say this with extreme confidence is that in AI inference, you cannot serve models across multiple servers. That is, the AI model must fit inside a single server and still leave room for the KV cache.

The reason is simple: memory bandwidth. AI inference is memory-bound, meaning the bottleneck that determines how fast the user sees the response is not compute power but memory speed.

In an NVIDIA GB300 NVL72 (comparable to the Trn2 servers where Mythos was likely trained), you have package-level memory speeds of 7.3 TB/s between the compute dies and HBM.

GPU-to-GPU communication already drops this roughly 4x to 1.8 TB/s (which slows your model accordingly, as this becomes the bottleneck), but GPU-to-GPU communication between GPUs on different servers drops to around 100 GB/s, meaning each GPU sees waiting times 18x longer than with GPUs on the same server, and roughly 73x longer than reading its own memory.

Don’t worry about the exact numbers; the only thing you need to internalize here is that the further apart the GPUs that have to communicate are, the slower they are at serving user responses.
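As a sanity check, here is a quick script recomputing those ratios from the raw bandwidth figures. This is back-of-the-envelope arithmetic on the numbers quoted above, not an official vendor spec:

```python
# Back-of-the-envelope bandwidth ratios for a GB300 NVL72-class server,
# using the raw figures quoted above (rounded).

HBM_TBPS = 7.3           # compute die <-> HBM, per package
NVLINK_TBPS = 1.8        # GPU <-> GPU, same server (scale-up domain)
CROSS_SERVER_TBPS = 0.1  # GPU <-> GPU, different servers (~100 GB/s)

hbm_vs_nvlink = HBM_TBPS / NVLINK_TBPS             # ~4x slower
nvlink_vs_cross = NVLINK_TBPS / CROSS_SERVER_TBPS  # 18x slower again
hbm_vs_cross = HBM_TBPS / CROSS_SERVER_TBPS        # ~73x slower overall

print(f"HBM vs NVLink:        {hbm_vs_nvlink:.1f}x")
print(f"NVLink vs cross-node: {nvlink_vs_cross:.0f}x")
print(f"HBM vs cross-node:    {hbm_vs_cross:.0f}x")
```

Each hop down the memory hierarchy multiplies waiting time, which is why keeping the whole model inside one scale-up domain matters so much.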

Solution? Keep them close.

Therefore, inference is simply not possible across servers. Knowing this, we now have an upper bound on how much memory an AI model can use: the amount of HBM in the server. For a GB300 NVL72, that is 21 TB.

So if we have a ten-trillion-parameter model at FP8 precision (one byte per parameter), that’s a 10 TB model, leaving 11 TB to handle the huge KV cache each user generates. This seems like a lot, but it’s not; a 100,000-token sequence, which is peanuts in today’s world, requires 50 GB of cache, so you can fit around 200 users on a good day.
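That serving math can be reproduced in a few lines; the figures are the same back-of-the-envelope numbers as above, not vendor specs:

```python
# Sketch of the serving math above: a 10T-parameter model at FP8 on a
# 21 TB HBM server, with ~50 GB of KV cache per 100k-token user.
# All figures are back-of-the-envelope, not measured values.

HBM_TB = 21
PARAMS_T = 10              # trillions of parameters
BYTES_PER_PARAM = 1        # FP8 quantization
KV_CACHE_GB_PER_USER = 50  # one 100,000-token sequence

weights_tb = PARAMS_T * BYTES_PER_PARAM  # 10 TB of weights
free_tb = HBM_TB - weights_tb            # 11 TB left for KV cache
max_users = int(free_tb * 1000 / KV_CACHE_GB_PER_USER)

print(f"Weights: {weights_tb} TB, free HBM: {free_tb} TB")
print(f"Max concurrent 100k-token users: ~{max_users}")  # ~220
```

And note that 220 is a memory-only ceiling; as the interactivity curves discussed next show, hitting acceptable tokens/second cuts the real batch size far below it.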

The problem? If we look at interactivity curves for much smaller models, we see that getting 100 tokens/second or more (the bare minimum users tolerate; any less and they bounce) means we can only serve 120 users… with DeepSeek R1, a model 20 times smaller than Mythos’s alleged size.

Interactivity curves for GB300s with DeepSeek R1. Source: SemiAnalysis

You might need to read this a couple of times, but the point is that the math ain’t mathing: serving a model the size of Mythos would limit batches to, maybe, just 10 users.

That’s literally hundreds of dollars per hour in revenue on a server you spent $3 million on. Repeat after me: the math ain’t mathing, and thus these huge models desperately need an entirely new platform to be served on (or a historic price hike).

The likely outcome? You know it: these models are enterprise-only territory until Vera Rubin-level servers come online (most likely not before the end of 2026), and will remain that way for months to come. You and I will have to make do with the distillations of those models.

Luckily, as I always insist, most tasks you need AI for do not require Mythos-level models.

VENTURE & IPOs
Anthropic Goes All In

Anthropic is having one hell of a 2026 so far.

The company has gone from a $9 billion run rate in December to a $30 billion one, meaning the company has gone from a monthly revenue of $750 million to $2.5 billion in less than three months, probably unparalleled in history and potentially even exceeding OpenAI’s run rate for the first time (more on this later).

But this has come at a cost; they have more demand than they can meet, forcing them to cut corners everywhere, impacting consumer users the most.

Users are seeing much worse performance as Anthropic is forced to deploy compute elsewhere (with that ‘elsewhere’ meaning ‘enterprise’), and are even seeing subscription bans on products like OpenClaw (more on this in the product section).

OpenAI is smelling blood and being super aggressive with user rate limits, especially on Codex, where they have reset limits 3 times in a single week (meaning they are giving you extra compute for free).

But this has had another big impact on compute spending.

While Dario has famously been cautious about committing to large amounts of compute relative to how OpenAI operates, this no longer seems to be the case: Anthropic is now clearly, desperately looking for new deals, which has pushed it further and further into the arms of Google (and Broadcom).

According to a filing from Broadcom, Anthropic will access 3.5 gigawatts of TPU-based AI compute capacity beginning in 2027.

Interestingly, we now have much more detail into the financials of these two companies, considering both are aiming for a 2026 public IPO. Becoming a public company means we can all become investors, which implies much greater financial transparency requirements.

In the portfolio section of this newsletter below, I analyze both companies in depth to explain which one I would invest in (could be both, could be none).

AI & JOB LOSS
AIs Can’t Read Pitch Decks

In a very interesting blog post, Mercor explains how top AI models still struggle with realistic financial work when they have to read messy charts, tables, and slides rather than clean text.

To test this, they first ensured that the models lacked the parametric knowledge to solve the questions (by testing whether they had seen the data beforehand and preventing memorization contamination).

But even when providing the document, if it wasn’t in a fully textual form, the models saw a striking loss in performance, with up to a 20% performance drop in the case of Opus 4.6.

TheWhiteBox’s takeaway:

The theory is that AI will destroy all jobs in the near future. But the evidence keeps showing that this is far from evident, and when you truly scrutinize these models in memorization-free tests, they break like glass.

Nothing fancy, just my weekly reminder to you all that models still can’t work well in areas they don’t know, and that is a crucial element any job requires, making it a pretty good predictor of job loss in the short term.

ASSISTANTS
OpenAI Releases $100/month Subscription

Just moments before finishing this newsletter, OpenAI said, “Wait, there’s one more thing, buddy.” They have finally announced the $100/month tier.

This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions.  In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models.

They are also being very aggressive. Quoting them, "to celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.”

TheWhiteBox’s takeaway:

As I was saying above, OpenAI smells blood on Anthropic’s compute issues, and they are trying to take their lunch money by being extremely aggressive with rate limits (basically losing a ton of money to get you as a customer).

All in all, it’s a great time to be in the higher tiers of ChatGPT. In fact, just in the last week, they’ve reset Codex’s rate limits three times; I have yet to drop below 60% usage before they push it back up to 100%.

I’m pretty sure this won’t last, but oh boy, does it feel good to get spoiled. Right now, ChatGPT is the best deal in AI by far. And although $200/month may be too much unless you use this all the time as I do, $100/month is very attractive.

One of the reasons Anthropic gained such huge popularity was its $100/month Max offer, but I’m pretty sure OpenAI’s rate limits will be much, much higher.

CODING
Cognition’s Ultra-Fast Coding Model & the Lessons It Gives

After reading my article above on the new scale of the upcoming model frontier with Mythos, you may feel disillusioned and left out.

But you don’t have to.

As I hinted at, most tasks don’t require frontier-level models, and non-frontier models are already becoming crazy good.

A perfect example is Cognition’s new SWE-1.6 coding model, which is extraordinarily fast (up to 950 tokens/second on the premium tier, 200 tokens/second on the free tier) while being extraordinarily capable too.

The main speedup seems to come from parallel tool execution, meaning the model can run tools in parallel rather than sequentially, as shown here.
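The parallel-tool idea can be sketched with plain asyncio. The tool names and timings below are hypothetical stand-ins, not Cognition’s actual API:

```python
# Illustrative sketch of parallel vs sequential tool execution, the
# mechanism credited for much of the speedup. Tool names/latencies
# are made up for the example.
import asyncio
import time

async def run_tool(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for a real tool call
    return f"{name}: done"

async def sequential() -> float:
    start = time.monotonic()
    for name in ("grep", "read_file", "run_tests"):
        await run_tool(name, 0.2)  # each call waits for the previous one
    return time.monotonic() - start  # ~0.6s total

async def parallel() -> float:
    start = time.monotonic()
    # Independent tool calls are issued together and awaited at once.
    await asyncio.gather(*(run_tool(n, 0.2)
                           for n in ("grep", "read_file", "run_tests")))
    return time.monotonic() - start  # ~0.2s: bounded by the slowest tool

print(asyncio.run(parallel()) < asyncio.run(sequential()))  # True
```

The catch, of course, is that only *independent* tool calls can be batched this way; the model has to learn which calls don’t depend on each other’s output.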

Interestingly, they seem to be toying with the same approach I was alluding to with Meta’s model above, incentivizing model conciseness, which also implies faster responses because, well, they are shorter.

But I won’t sugarcoat this; no, this model is by no means in the same category as Claude Mythos, but let me instead frame it this way: does it have to be?

TheWhiteBox’s takeaway:

Do you need all your thinking effort to buy food on DoorDash? No, your brain adapts to the task's complexity.

And while a year ago the answer to most problems was always the frontier models, because the smaller models were unusable without fine-tuning, now smaller models like Gemma 4 are good enough for most tasks.

In fact, you probably soon won’t have an option, as we’re going to get priced out of the frontier tiers. But my point is, you’re still going to be fine… if you become operationally excellent at using AIs.

Last year, I emphasized operational excellence as the secret to startups' survival. If your company were to become a massive user of AI, like everybody else, products and services would become commoditized, and price would become the deciding factor. When that happens, spending the least amount possible on AI will define your margins.

And by the least amount, I do not mean not using AI, but using it smarter.

But this idea applies to you and me, too. If AI becomes omnipresent in our lives, those who are smarter about AI usage will prevail, because you won’t be able to run your entire workflow repertoire on Claude Mythos, let me tell you that with confidence.

I predict consumer AI hardware companies like Apple will soon see much greater demand for AI laptops and consumer GPUs (we’re already seeing this), because running local models might soon be the only way for AI to make economic sense.

AGENTS
Claude Managed Agents

Anthropic has released a new enterprise product called Claude Managed Agents that, as the name suggests, lets you create and manage agents for given tasks, which you describe to it in English.

Recall that agents are nothing more than an LLM with access to data sources and tools, so the idea of this platform is to help you manage both, turning agent creation into basically a drag-and-drop + prompt engineering task.
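For the uninitiated, that “LLM + tools” loop can be sketched in a few lines. The LLM is mocked here, and every name is illustrative, not Claude Managed Agents’ real interface:

```python
# Minimal sketch of the "agent = LLM + data + tools" idea above.
# mock_llm stands in for a real model API; all names are hypothetical.

def mock_llm(task: str, observations: list[str]) -> dict:
    """Stand-in for an LLM deciding the next step from what it has seen."""
    if not observations:
        return {"action": "search_docs", "input": task}
    return {"action": "finish", "input": observations[-1]}

TOOLS = {
    "search_docs": lambda q: f"found 3 documents about '{q}'",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = mock_llm(task, observations)             # LLM picks a tool
        if step["action"] == "finish":
            return step["input"]                        # final answer
        result = TOOLS[step["action"]](step["input"])   # execute tool
        observations.append(result)                     # feed result back
    return "gave up"

print(run_agent("summarize Q3 invoices"))
# → found 3 documents about 'summarize Q3 invoices'
```

A platform like this mostly manages the boring parts around that loop: which tools and data sources are wired into `TOOLS`, step limits, and permissions.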

TheWhiteBox’s takeaway:

As with all Anthropic releases, it looks super powerful, and it’s beautifully presented, which of course means it’s gone super viral.

But this isn’t really new; OpenAI released an agent builder a while back, and it miserably failed. The timing of this release is surely better (agents are now much better), but so far, most generative AI products have been very underwhelming relative to what the demos promised.

Until further notice, agents are easy to demo, hard to productionize, and the jury is out on this one.

And on the topic of Anthropic, their shift toward enterprise is pretty extreme. Not only are they releasing enterprise products, but they are literally killing customer subscription experiences, with users reporting a 67% performance degradation to allocate more compute to enterprise clients.

For what it’s worth, although the practices they employ are pretty immoral, they’ve been pretty open about their priorities, but if you’re an individual user, maybe it’s time you accept that you’re not a priority for Anthropic.

Still, we have to give credit where credit is due, and this company surely knows how to ship appealing products.

IMAGE GENERATION
OpenAI’s New Image Model, Very Close

OpenAI might be nearing the release of a new image model. According to the report, early testers think Image V2 is noticeably better at prompt-following, composition, and, especially, rendering realistic user interfaces with correctly spelled text, which has been a weak spot for image models.

TheWhiteBox’s takeaway:

Based on testers' results, this is an important potential improvement over OpenAI’s current image system.

In a part of this industry firmly under the control of Google, OpenAI seems to be willing to put up a fight. But until further notice, Google remains king.

Closing Thoughts

Extremely eventful week with tons of model releases (and non-releases), news on markets, and products.

Of course, Mythos took all the spotlight: humanity’s first taste of the next generation of frontier AI models. The issue is that the world’s compute isn’t ready for these beasts, and all we can do for now is wait and see what the lucky ones with access tell us.

It’s also the week Meta officially came back into the spotlight, with a model that isn’t blowing users’ minds but looks very solid, and is a statement that Meta will compete (and has the compute to do so).

But going back to Mythos, it has been an awakening for me. Not because I have tested it or am afraid of what it can do, but because it has dawned on me that the days of “cheap” AI are coming to an end, and that, I believe, pretty soon the AI Labs are going to start charging what they really should be charging us.

And let me tell you, it’s not going to be pretty.

I can see this coming from a mile away, so I’m going to start seriously putting an effort into embedding local models into my workflows. Because for many things, local models are enough, so I’m not about to let a $1,000/month subscription price me out of AI.

And talking about AI Labs, in the following section, we analyze their leaked financials and cross-reference them with other sources for revenues and compute, to build a complete picture of these companies and answer whether they will be worth investing in when their IPOs finally arrive in a few months’ time.

The most expected IPOs ever, boom or bust?

For business inquiries, reach out to me at [email protected]

Keep Reading