Dangerous Acts, $13 billion, & the Mega POD.

In partnership with

How can AI power your income?

Ready to transform artificial intelligence from a buzzword into your personal revenue generator?

HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.

Inside you'll discover:

  • A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential

  • Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background

  • Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve

Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.

THEWHITEBOX
TLDR;

Welcome back! This week, we look at research from two sides: a new learning mechanism that might change how AIs learn, and the crude reality of AI safety, namely that models can still be made really dangerous.

Moreover, we discuss Anthropic's latest monstrous funding round, OpenAI's acquisition of Statsig, Google's court win, how AI is saving the day for investment bankers, and AMD's Mega POD plan to dethrone NVIDIA.

Enjoy!

THEWHITEBOX
What You’ve Missed By Not Being Premium

On Tuesday, we had another exciting round of news, including an impressive ping pong robot coming from China that will leave you speechless, the latest episode in the OpenAI vs xAI drama, Cohere's wise strategy, and Cloudflare's agent gating, which could redefine the Internet itself.

Moving on, we’ll analyze the fascinating business model of the latest success story in AI: OpenEvidence, alongside other notable developments from OpenAI, Google, and Microsoft.

M&A
Software is Saving M&A

US software firms are driving a surge in mergers and acquisitions, particularly via the acquisition of AI startups, resulting in this year’s strongest activity since 2021.

To date in 2025, US software companies have completed around 150 AI-related acquisitions, with total spending reaching approximately $33.8 billion, surpassing the combined volume of previous comparable periods.

But is this enough to save these companies?

TheWhiteBox’s takeaway:

If you’re a regular reader of this newsletter, this is not a surprising trend; we’ve been talking about AI’s effect on the software industry for months, basically giving our reasons to believe it’s a SaaS killer.

And while acquiring AI startups to boost offerings is a logical next step, it doesn’t solve the real issue: commoditization of their businesses.

My thesis regarding AI is very consistent:

  • AI eliminates barriers to entry in most software markets, which in turn incentivizes competition.

  • With more competitors comes less differentiation; your features are now replicable.

  • AI features also increase your costs of goods sold (COGS) considerably, a break from the traditional model where most costs were operational (salaries, cloud costs, etc).

  • AIs are pay-to-use; you’re getting charged by OpenAI every time a consumer uses your AI features. This makes licenses much less profitable and adds pressure for you to go into a usage-based pricing model.

  • Worse, with operational costs staying the same unless you execute mass layoffs, gross margins (revenue minus COGS, as a share of revenue) end up much lower, so SaaS companies are seeing their margins squeezed.

In summary, higher costs, lower pricing power, and pressure to modify your entire pricing model, all while your offering becomes increasingly commoditized.
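To put rough numbers on that margin squeeze, here's a quick sketch with purely hypothetical figures (these are illustrative assumptions, not data from any real SaaS company):

```python
# Toy comparison of gross margins before and after adding AI features.
# All numbers are hypothetical and for illustration only.

revenue = 100.0  # annual revenue per customer, arbitrary units

# Traditional SaaS: COGS is mostly hosting and support, assumed at 20% of revenue.
trad_cogs = 20.0
trad_gross_margin = (revenue - trad_cogs) / revenue

# AI-heavy SaaS: same hosting/support COGS plus per-use inference fees
# paid to a model provider, assumed here at 25% of revenue.
ai_inference_cogs = 25.0
ai_cogs = trad_cogs + ai_inference_cogs
ai_gross_margin = (revenue - ai_cogs) / revenue

print(f"Traditional gross margin: {trad_gross_margin:.0%}")  # 80%
print(f"AI-feature gross margin:  {ai_gross_margin:.0%}")    # 55%
```

Under these assumptions, gross margin drops from roughly 80% to 55% before any pricing-model changes, which is exactly the squeeze described above.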

So, while M&A can help you get to new products faster and stay more attuned to the changing times, it still leaves the real elephant in the room untouched: you are now probably way oversized and need to cut costs extremely aggressively.

That takes balls, and I don’t think most SaaS companies have what it takes.

ANTITRUST
Google Always Wins in the End

Judge Amit Mehta ruled that Google cannot enter exclusive search distribution deals and must share some search data with rivals, but rejected the DOJ’s push to force divestitures of Chrome or Android.

Google can still pay companies like Apple (e.g., the $20B Safari deal) as long as the payments aren't exclusive, since banning them outright could harm partners like Mozilla.

Interestingly, the judge said generative AI has changed the competitive landscape, with examples like ChatGPT or Perplexity, so remedies aim to prevent Google’s search dominance from carrying over into AI.

The DOJ criticized the decision as too lenient, while Google celebrated it; Google's shares rose 8% after hours, reaching 11% growth in the last 5 days at the time of writing.

TheWhiteBox’s takeaway:

Great news for the search company, which will no longer be required to sell Chrome, to the dismay of potential acquirers like OpenAI or Perplexity.

This leaves Google as the great victor, and one that is starting to look extremely undervalued. Then again, I haven't been shy about praising Google over the last year:

  1. Most complete AI offering of all companies (by a long shot)

  2. An exploding user base of the Gemini model family (+450 million monthly active users)

  3. Unique offerings in things like AlphaFold (drug discovery), or Genie 3 (world models and video games)

  4. Despite fears, a growing ad search business

  5. Booming cloud business (although, as I always say, it would be good to know how much of that revenue is self-generated)

  6. Owns the leading streaming platform, YouTube (ahead even of Netflix)

And if all that wasn't enough, Apple is considering using Gemini for its AI search feature, and the upcoming Gemini 3.0 is regarded as a potential monster of a model. These are only rumors, so we can't confirm them, yet they come from a very credible source, SemiAnalysis, which has probably had access to the model for beta testing for quite some time.

VENTURE CAPITAL
Anthropic Raises $13 Billion

Anthropic, one of the top AI Labs, has secured $13 billion at a valuation of $183 billion in its Series F funding round. This valuation is post-money, meaning investors priced the company at $170 billion before the round, with $183 billion being the total value once the newly raised cash is included.

They mentioned hitting a $5 billion run rate last August, which gives them a forward P/S ratio of 36.6 (183/5), meaning investors are valuing the company at roughly 36 times annualized revenues (when counting the raised cash).

For reference, that is higher than OpenAI’s (~25), but way lower than the multiples of AI startups today, which means the valuation is fair or at least not an outlier.
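For those who want to trace the arithmetic, here is the back-of-the-envelope math using only the reported figures (nothing else is assumed):

```python
# Anthropic Series F: post-money vs pre-money and the resulting multiple.
raised = 13.0          # $B raised in the round
post_money = 183.0     # $B valuation, including the newly raised cash
pre_money = post_money - raised   # 170.0: what investors priced the company at
run_rate = 5.0         # $B, reported annualized revenue run rate

ps_multiple = post_money / run_rate   # ~36.6x price-to-sales on the run rate
print(f"Pre-money: ${pre_money:.0f}B | P/S on run rate: {ps_multiple:.1f}x")
```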

TheWhiteBox’s takeaway:

This really is a lot of money, record-setting if it weren't for OpenAI. But the truth is that these companies really do burn a lot of money.

We don't have access to Anthropic's burn rate, but we know xAI burns around $1 billion a month, which suggests that companies like Anthropic or OpenAI burn at similar levels or higher.

For those unaware, Anthropic’s play is somewhat different from OpenAI’s, though. They are a foundation model company, too, meaning they train gigantic models from scratch. However, their product strategy is different.

While OpenAI's ChatGPT is clearly a consumer-end application, meant for personal use, Anthropic focuses almost exclusively on coding and agents. That is, where Claude, Anthropic's model family, shines is in coding and tool-calling, where the model calls external tools to take actions on behalf of the user.

This has positives and negatives.

  • On the positives,

    • Claude models are generally considered the best coding models (although the reality is much more nuanced, and many of us prefer GPT-5 today, or even Gemini 2.5 Pro), and it's undeniable that they are the leading models in this category in terms of usage.

    • Claude models are truly excellent for agents, especially when coupled with Claude Code, a command-line product that can be used to automate many aspects of your life, as we have shown in the past. But beyond the product itself, the models are, objectively speaking, great at tool calling, having lost the benchmark lead only very recently (and in rather incredible fashion, to the Chinese model GLM-4.5).

  • On the negatives,

    • Claude's value for the everyday tasks you might be used to asking ChatGPT is simply not there (and, to be fair, they don't market themselves otherwise)

    • The lack of a self-built search index (they rely on the Brave API instead) results in a much worse search experience than ChatGPT or Gemini. Again, that's simply not their game.

Put simply, the trade-off is clear: sacrificing performance on epistemic, everyday tasks in exchange for great performance on coding and agents. And, for now, revenue growth tells a clear story: it's working.

TREND OF THE WEEK
A New Way to Teach AIs?

A group of researchers has presented TOP (Token-Order Prediction), a new approach to AI training that holds immense promise.

For quite some time, AI Labs have been toying with the idea of using Multi-Token Prediction to enhance both intelligence and, potentially, inference speed. The approach is dead simple: for every prediction the model makes, it predicts multiple following tokens instead of one (e.g., given the sequence "I like…", the model predicts "pasta, especially bolognese" in a single step).

The rationale is that forcing models to predict multiple tokens enhances intelligence (in some domains only, though) because it makes models plan ahead. A good example is poetry.

If the model is writing the second verse, it must anticipate the last word in the verse to ensure it rhymes with the last word of the previous verse. This positive effect is also common in areas like coding (to ensure correct syntax).
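For reference, here is a bare-bones sketch of what a multi-token objective can look like. This is a simplified, generic version (real implementations, such as DeepSeek's, use more elaborate prediction modules): k output heads, where head i is trained to predict the token i+1 steps ahead, and the per-head losses are averaged.

```python
import torch
import torch.nn.functional as F

def multi_token_loss(head_logits, token_ids, k=3):
    """Simplified multi-token prediction objective (illustrative sketch).
    head_logits: (k, T, vocab_size), one set of logits per prediction head.
    token_ids: (T,) ground-truth token ids."""
    T = token_ids.shape[0]
    loss = 0.0
    for i in range(k):
        # Head i at position t must predict token_ids[t + i + 1].
        logits = head_logits[i, : T - i - 1]
        labels = token_ids[i + 1 :]
        loss = loss + F.cross_entropy(logits, labels)
    return loss / k

# Example: 2 heads over a 6-token toy sequence with a 10-word vocabulary.
ids = torch.tensor([1, 4, 2, 7, 3, 5])
logits = torch.randn(2, ids.shape[0], 10)
print(multi_token_loss(logits, ids, k=2))
```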

The problem with this procedure is two-fold:

  1. The objective is very hard. You are making a model predict tokens two, three, or more steps ahead, which becomes an impossible task as the number of future tokens grows. Imagine you had to think of an entire paragraph in one shot. Hard, right? Well, it's hard for AIs too.

  2. It's rarely used in practice. While models like DeepSeek's were trained with this objective, they are mostly run on a single-token-per-prediction basis during inference, losing most of the speed benefit (if not all of it).

Instead, the researchers behind TOP propose asking the model to rank tokens by proximity, rather than predicting the exact word that comes three steps ahead.

For instance, if we take Rick Astley's famous chorus, "Never gonna give you up", the model will not only have to predict that the next word is 'never', but also estimate the proximity of other words, like 'gonna' or 'let', to the current prediction.

In layman’s terms, if the model works correctly, it should give high proximity scores to ‘never’, ‘gonna’, ‘let’, ‘you’, and ‘down’, in descending order, which is the model’s way of saying it believes that, after ‘up’, the words “… never gonna let you down” come next (which is, in fact, the correct chorus).

But how is that any different from predicting multiple tokens at once?

Simple: it’s easier. The key takeaway from this is that you need to be strategic about what the model needs to learn.

Instead of forcing the model to output the exact sequence of upcoming words, we simply ask it to predict proximity values, essentially outputting a ranking of likely upcoming words, even though it still only emits the next word.

The point is that while these rankings may or may not be accurate, they can be corrected in future predictions as the model has more information. Thus, the model is still learning to plan ahead of the current prediction, all the while not being held to unrealistic standards.

Therefore, we get a best-of-both-worlds approach: the model still learns to make predictions informed not only by the previous words but also by what should come after the current one, "looking into the future" without committing to the exact next tokens, which is simply too hard.
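To make this concrete, here is a minimal toy sketch of the idea in my own formulation, not the paper's exact loss (TOP uses a learning-to-rank objective over upcoming tokens; the scoring scheme and soft-label loss below are my simplifications): for each position we build proximity targets over the vocabulary, where upcoming words that appear sooner get higher scores, and train an auxiliary ranking head against them.

```python
import torch
import torch.nn.functional as F

def top_targets(token_ids, vocab_size, window=4):
    """Toy proximity targets: at each position, tokens appearing sooner
    within the next `window` steps get higher scores (hypothetical scheme)."""
    T = len(token_ids)
    targets = torch.zeros(T, vocab_size)
    for t in range(T):
        for offset in range(1, window + 1):
            if t + offset < T:
                tok = token_ids[t + offset]
                # Closer upcoming tokens score higher; keep the best score
                # if the same token reappears within the window.
                targets[t, tok] = max(targets[t, tok].item(), float(window - offset + 1))
    return targets

def top_loss(rank_logits, targets):
    """Soft-label cross-entropy between the ranking head's logits and the
    normalized proximity targets (a stand-in for the paper's ranking loss)."""
    mask = targets.sum(dim=-1) > 0          # skip positions with no future window
    probs = targets[mask] / targets[mask].sum(dim=-1, keepdim=True)
    return F.cross_entropy(rank_logits[mask], probs)

# "never gonna give you up never gonna let you down" with a toy vocabulary.
vocab = {w: i for i, w in enumerate(["never", "gonna", "give", "you", "up", "let", "down"])}
ids = [vocab[w] for w in "never gonna give you up never gonna let you down".split()]
targets = top_targets(ids, vocab_size=len(vocab))
# At the position of "up", the highest scores go to 'never', 'gonna', 'let',
# 'you', in that order, exactly the ranking described above.
print(targets[4])
```

In training, this auxiliary loss would be added on top of the usual next-token cross-entropy, so the model keeps emitting one token at a time while also learning to rank what comes later.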

In AI, you always have to choose learning objectives that are challenging but not impossible. You may want to train a pig to move faster, and while flying would make a pig go faster than teaching it to run, pigs, as far as I know, don't fly, so it's a useless objective.

Across different tests, the same model was trained with TOP and with other methods, namely basic next-token prediction and multi-token prediction; the TOP-trained model achieved the best results across most benchmarks and across different model sizes, suggesting we might have found a new way of training AI models that could soon become the standard.

HARDWARE
What is AMD Cooking?

AMD is cooking something huge. But to understand what that is, we need to understand where the discussion is coming from. In this newsletter, we always discuss AMD (Advanced Micro Devices, $AMD) as an investment play with huge potential, yet a risky one.

For a deep analysis, read our deep dive on AMD.

Thus, their valuation must not be understood based on current economics, but on what is priced into it:

  1. Potentially the only significant NVIDIA rival

  2. GPU specs on par with NVIDIA’s

  3. Considerably undervalued compared to NVIDIA (not on a multiples basis, where AMD can actually look "more expensive" than NVIDIA, but on the "legs" this stock has if it delivers, relative to the size of the market)

But there are reasons why NVIDIA is blowing AMD out of the water.

  1. Better software and distribution. Everyone in distributed computing uses CUDA, facilitating NVIDIA adoption. It’s also better software overall.

  2. Much larger scale-up (NVIDIA’s servers are much larger than AMD’s, making them much more suitable for heavy workloads), crucial for inference

  3. NVIDIA is a purer AI play, dedicating a larger share of its chip area to low-precision matrix multiplications (4- to 16-bit). AMD's chips are more often used for HPC workloads (High-Performance Computing, like physics simulations) that require higher precision (numbers with many more decimal places). NVIDIA is all in on AI; AMD not as much.

  4. NVIDIA also outperforms AMD on scale-out (communication speeds between different GPU servers), crucial for training.

Thus, NVIDIA vs AMD must not be judged on single-GPU performance, where they are basically on par (software aside). But AI workloads are never single-GPU, so NVIDIA's edge in communication speeds (both scale-up and scale-out) is the real differentiator.

But all this might change in 2026/2027, allegedly.

For starters, AMD’s MI400x server coming in 2026 should put them on par with NVIDIA’s next-generation GPU platform, Rubin (the next after the current one, Blackwell), at the scale-up level, meaning, for the first time, AMD would be at the same performance level as NVIDIA on the server level.

With servers going up to 144 GPUs per server in 2026, we are talking about more than 13 TB/s of memory bandwidth per HBM4 chip and 260 TB/s of GPU-to-GPU communication bandwidth, while offering 3.3 times the compute performance of the current most advanced AI server on the planet, NVIDIA's Blackwell GB200 NVL72.

Don't worry too much about the numbers; you just need to know that a single server of this type is already good enough for most AI workloads on the planet, and more than enough to serve the most advanced models to thousands of simultaneous users. Which is to say: AMD would finally be ready to compete at the inference level, no problem.

But for us to trust AMD's capacity to enter the AI training market with force, we need more information on the speeds offered by the scale-out hardware, where NVIDIA's InfiniBand technology still knows no rival. So, for now, this AMD server would only be able to eat into NVIDIA's inference market (which, it must be acknowledged, is the largest AI compute market right now).

And here’s where today’s news comes in.

Allegedly, AMD is taking it a step further with a planned 'Mega POD', expected by the end of 2027, packing 256 GPUs into a single server, almost 4 times larger than the current largest Western server (China's Huawei has a 384-NPU system called CloudMatrix, huge by current standards but considerably inferior in performance per watt).

That’s why I believe AMD has enormous potential. If they truly deliver on this promise (and deliver better software, as we saw in our analysis), I don’t know how NVIDIA could be sixteen times more valuable. So, unless NVIDIA is absurdly overvalued, AMD’s fundamentals won’t warrant such a huge difference for much longer.

MEMORY
Mistral Adds Memory

Mistral AI has introduced a beta Memory system in its conversational assistant, Le Chat.

Unlike systems that either recall everything or recall only when explicitly prompted, this one takes a hybrid approach: it automatically saves useful information and then recalls it in a smart, context-relevant way, all while keeping the process transparent (i.e., you'll always see what's being recalled, why, and where it came from, like a clickable receipt).

Users retain full control: you can turn memory off, start an incognito chat without memory, edit or delete individual memories, and even export or import memories.

Additionally, Mistral has introduced Memory Insights, lightweight prompts that help you explore what Le Chat remembers (summaries, trends, or notable moments from your own data, all editable).

TheWhiteBox’s takeaway:

Memory is becoming an indispensable feature of AI chat applications. When adequately executed, like ChatGPT’s memory feature, it’s a truly transformative experience, as you can “feel” the model’s responses becoming more relevant simply because the model has more context about you.

For example, I really, really despise the model flattering me for no reason, the typical “That’s a great question!” type of content. So I just went and told ChatGPT I didn’t like that, and it complied; being declarative of your opinions allows the model to immediately identify this as a core memory to store.

But how does this work? It looks like magic, but it’s actually a pretty simple system.

There are really two ways AIs can “remember” something: parametric memory and in-context.

  • Parametric memory refers to the information the model has learned and stored in its weights during training. It’s innate to the model; it just “knows.”

  • In-context memory refers to the information the model sees in its prompt, within the context of that particular interaction.

These memory features belong to the latter group, meaning the memories are added to your prompt so the model has them as a reference. In other words, whenever you ask the model something, those memories are appended behind the scenes as context the model sees; it's not that the model knows you intrinsically, it's more like a 'cheat sheet' the model gets for every interaction with you.
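As a rough illustration of what "added to your prompt" means in practice, here is a generic sketch (not Mistral's or OpenAI's actual implementation; the retrieval step, memory strings, and prompt format are all assumptions):

```python
# Generic sketch of in-context memory injection. Hypothetical, for illustration.

stored_memories = [
    "User dislikes flattery such as 'That's a great question!'",
    "User is interested in AI hardware and SaaS economics",
]

def retrieve_relevant(memories, user_message, top_k=2):
    """Naive keyword-overlap retrieval; real systems typically use embeddings."""
    def overlap(m):
        return len(set(m.lower().split()) & set(user_message.lower().split()))
    return sorted(memories, key=overlap, reverse=True)[:top_k]

def build_prompt(user_message):
    """Prepend the recalled memories as a 'cheat sheet' the model sees every turn."""
    recalled = retrieve_relevant(stored_memories, user_message)
    memory_block = "\n".join(f"- {m}" for m in recalled)
    system = (
        "You are a helpful assistant.\n"
        "Known facts about the user (recalled memories):\n"
        f"{memory_block}"
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_message}]

print(build_prompt("What should I read about AI hardware this week?"))
```

The model never "learns" anything here; the memories simply ride along in the context window of every request, which is the cheat-sheet behavior described above.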

However, the future points more toward the former. Based on comments by incumbents, especially Sam Altman, the intention seems to be extreme per-user personalization, to the point that the model "just knows you". That would mean training models on your data, tailoring them to your needs.

We are nowhere close to that future (more due to engineering and energy constraints than to technical feasibility), but it’s clearly the next step in the frontier.

M&A
OpenAI Acquires Statsig

OpenAI has revealed that it’s acquiring the product experimentation platform Statsig in an all-stock deal valued at approximately $1.1 billion.

As part of this acquisition, Vijaye Raji, founder and CEO of Statsig, will step into the newly created position of Chief Technology Officer (CTO) of Applications at OpenAI.

In his new role, he will lead product engineering organizations, including ChatGPT and Codex, overseeing core systems, infrastructure, integrity, and feature development. He’ll report directly to Fidji Simo, who joined earlier this year as CEO of Applications coming from Instacart.

Statsig’s platform, well-regarded for powering A/B testing, feature flagging, and real-time decisioning, will be integrated into OpenAI’s product workflows to enhance experimentation capabilities. The transition will proceed cautiously and is subject to regulatory approval.

TheWhiteBox’s takeaway:

If there's one thing we've learned from OpenAI's failed acquisition of Windsurf, it's that we can't be certain until the deal actually closes (that one fell through because Microsoft wanted access to Windsurf's intellectual property).

But this is perhaps an even more interesting acquisition because Statsig is used by many of OpenAI's rivals, including Anthropic. Indeed, the data Statsig holds on competitors is the only explanation for the acquisition that really makes sense.

Anthropic itself has acknowledged using Statsig for telemetry on Claude Code, meaning OpenAI now has invaluable data it can learn from to improve Codex (OpenAI's answer to Anthropic's Claude Code).

Knowing this, it's almost guaranteed that Anthropic has already dropped Statsig as its supplier of experimentation services.

SAFETY
Frontier Models can still be ‘Hacked’ into Dangerous Responses

Transluce, an AI startup focused on pushing the frontier of AI safety research, and famous for having jailbroken very popular models (showing, for instance, how OpenAI's o3 model would 'intentionally' lie about its actions), has presented new research demonstrating that small models can still be trained to hack frontier models, steering them into extremely undesirable behavior with very high success rates.

The models are lured into deeply concerning behaviors, like literally telling users to cut themselves open or kill themselves, which may sound absurd to you and me, but these are exactly the kinds of responses that can tip many people over the edge, as we saw last Sunday.

Importantly, these hacks are written in plain English, not the usual contrived prompts that are extremely unlikely to appear in the wild. This matters because it demonstrates that prompts an ordinary person might write can still break these models, leading to responses like the one in the thumbnail, where the model suggests the user cut themselves to feel alive.

Interestingly, they leverage reinforcement learning to achieve this. That is, using a carefully crafted training setup, they let an investigator model learn the tricks that tip target models over. And because this is an RL environment, the investigator can develop unique, model-specific techniques, learning the exact prompt structure that breaks ChatGPT, Gemini, or Claude.

TheWhiteBox’s takeaway:

Some of the examples given in the research are deeply disturbing. While I'm not a fan of the hype around AI safety when it targets dangers that aren't remotely close to materializing (i.e., 'AI kills us all' scenarios), a vast body of research is already showcasing the risks of misuse of these models.

Just like social media can tip people over the edge, AI can too, even encouraging dangerous behavior that might result in self-harm, suicide, or murder.

Consequently, I predict that a growing mass of interest will move into mechanistic interpretability research, the area that studies models' internals to anticipate undesired behaviors in advance and mitigate them.

Closing Thoughts

To summarize, we continue to see intense money action in the space, led by Anthropic's huge $13 billion funding round. Currently, out of the four most valuable private companies on the planet, three are AI companies (OpenAI, Anthropic, and xAI), and all are less than ten years old (the other is SpaceX).

We also had Google’s massive court victory, reviewed AMD’s super ambitious two-year plan, and discussed a new training method that could soon be a staple of the industry.

All in all, it seems like another 'week in AI, another week of progress' kind of week… if not for the fact that the latest news makes me feel we are moving backward.

Because the biggest takeaway for me this week is how unsafe these tools still are. Without proper AI safety research, smarter models are actually more dangerous than dumber models, resulting in an inverse correlation we really do not want to see.

I remain a huge skeptic of 'AI doomerism', but we can't deny that these models are already enough to make people do really dumb things. Unlike 'Terminator AI' sci-fi, sycophancy and jailbreaks are real issues that can push models to promote very dangerous behavior to the people most susceptible to being influenced by them.

Transluce's research shows how poor the defenses against such jailbreaks still are. But Transluce's results also show us something we can be excited about, something I've been rambling about for months: RL fine-tuning as a key driver of progress.

Transluce's research demonstrates how to optimize LLMs for virtually any purpose, and the models just work. This is key to enterprise adoption, because it guarantees specificity and high accuracy, and unlocks goal-oriented training: the possibility of optimizing AIs for our precise use case.

We are optimists here, so while the lack of research and investment in AI safety is not looking great, AI-to-business-metric training appears within reach. And that, my friend, is what will make AI omnipotent, and omnipresent.

Until Sunday!

THEWHITEBOX
Join Premium Today!

If you like this content, by joining Premium, you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.

Until next time!

For business inquiries, reach out to me at [email protected]