OpenAI's $20k/month Agent, GTC 2025, & More

THEWHITEBOX
TLDR;

Today, we cover the upcoming AI event of the year: NVIDIA’s GTC 2025.

We then discuss several technological announcements, like a state-of-the-art reasoning model from China that beats DeepSeek R1 despite being 20 times smaller, a discovery made by Google’s Co-Scientist, and a new LLM-as-a-Judge model.

Furthermore, we discuss Apple’s $14k consumer-end supercomputer, as well as OpenAI’s latest rumored agent plans offering tiers up to $20,000/month (yes, that number is real).

Finally, I will present a demo of what’s probably the most viral AI product these past few days, Sesame, which you can try for free.

EVENT OF THE WEEK
NVIDIA Deep Dive 2.0

Welcome back! This week, we have a special edition. Our newsletter, packed with business leaders and AI insiders, has caught the attention of none other than NVIDIA.

Thus, I’m happy to announce that they have decided to support this week’s newsletter ahead of GTC 2025, the event that sets the pulse for the AI industry every year, which is coming up in a few weeks and offers free registration.

But why should you care about this event as much as I do?

NVIDIA’s stock (NVDA) represents the market's pulse toward AI. Understanding NVIDIA is understanding AI public investing. Despite this, the company’s value proposition is deeply misunderstood. It’s seen as a chip designer and nothing more, which is terribly misleading. Today, we are diving deep into NVIDIA’s true value proposition and strategic plans for the future.

But if you wish to understand NVIDIA’s strategy and the industry's direction from the words of its leaders, you must attend GTC 2025. Needless to say, GTC 2024 saw the announcement of Blackwell GPUs, InfiniBand and Spectrum-X networking, Project GR00T for robotics, and edge AI computers like Jetson, all staples of NVIDIA’s record-setting past year.

Consequently, the world will once again pay unwavering attention to the more than 1,100 free sessions the event offers. Later on, I will point out the exact sessions I plan to attend.

And without further ado, let me explain to you NVIDIA’s strategy for 2025.

Every Paradigm Needs Its Thing

When we covered NVIDIA in our deep dive a few months ago, we went beyond the usual analysis. That day, we learned that NVIDIA’s strategy went well beyond chips.

Specifically, NVIDIA is positioning itself in five “new” categories: consumer-end hardware, physical AI, simulations, autonomous vehicles, and cloud computing.

And while all five are worth discussing, today, I’ll focus on those that impact you the most: physical AI, simulations, and cloud computing, also known as the agentic plays.

But why?

The thing with agents, which are all the rage right now, is that… they mostly don’t work. In fact, they are a distant mirage of what their marketing teams advertise.

Exploring AI’s Limits

Everybody says 2025 is the year of agents, but I prefer the following view: 2025 is the year of robotics and reward engineering.

But why? Because agents are a lie right now due to their limitations.

If you look at most ‘agentic’ products on the market, they are mostly terrible. Even the best-known successful agent implementation, OpenAI’s Operator, feels a lot like the Apple Vision Pro: you use it once, say ‘it’s cool,’ and never use it again.

And at least Operator works, unlike many others.

The list goes on. So, if this is the year of agents, which it still could be, what are we missing? Well, three main things:

  1. We suck at reward engineering.

  2. There’s no embodiment.

  3. Agent frameworks, the main tools for developers to create agents, are terrible, rushed products.

Acknowledging these limitations, and ahead of its highly anticipated GTC event, NVIDIA has been forced to evolve from being merely a hardware provider to taking proactive steps to unlock the next AI paradigms: robots and software agents.

Rewards, Simulations, and Blueprints

Regarding reward engineering, we must first clarify what a reward is. As we have discussed previously in this newsletter, Reinforcement Learning, or RL, is the ‘niña bonita’ (the ‘cute girl’ in Spanish, an expression meaning the center of attention) of the AI industry over the last few months, because it just works and has allowed us to transform limited models like GPT-4o into a totally different beast in o3.

Reinforcement Learning (RL) is a training method where you reward good behavior and punish bad actions. Think of it as puppy training; we reward the puppy with treats until it learns the pattern.

Therefore, to make RL work, we need a good way to drive the model’s learning. In other words, we need good rewards. Sure, you can let the model do its thing by exploring ways to solve the problem, but we need a way to measure whether the outcome is right or wrong so the model can learn from it.

And while defining a reward in areas like maths (where 2+2 always equals 4) is easy, it is basically impossible for a human in domains where automatic verification isn’t available (like creative writing) or, worse, in robotics, where we would need to define “correct actions” for a robot with hundreds of independent joints.

This perfectly illustrates why most frontier AI reasoning models are focused on maths or coding; it’s not “just because”, it’s mainly due to the fact that we don’t know how to design rewards in areas where they can’t be automatically computed.
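To make the contrast concrete, here is a minimal sketch (my own illustration, not any lab’s actual pipeline) of what a verifiable reward looks like next to a non-verifiable one:

```python
# Minimal sketch of why rewards are easy to define in verifiable domains (math)
# and hard everywhere else. Illustrative only; the function names are hypothetical
# and not taken from any real training pipeline.

def math_reward(model_answer: str, ground_truth: str) -> float:
    """Verifiable: the answer can be checked automatically."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def creative_writing_reward(story: str) -> float:
    """Non-verifiable: there is no ground truth to compare against.
    Any score returned here encodes someone's subjective taste."""
    raise NotImplementedError("No automatic way to verify 'good' writing")

print(math_reward("4", "4"))  # 1.0 -> directly usable as an RL training signal
```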

And what does all this have to do with NVIDIA? Here, the company is innovating by using AIs themselves as reward engineers.

Using examples like Project Eureka, which we covered here in the past, NVIDIA proved that using AIs in iterative loops allows the creation of reward functions so impressive that they can make robots keep their balance on a yoga ball. If you think that’s not hard, try describing the position of every robot joint during this exercise as the robot balances the ball across a sidewalk. Good luck!

In simple terms, AI reward engineers involve an accelerated loop (only possible with simulated environments; more on that in a second), where an AI consistently tunes rewards until the desired outcome is observed, leading to remarkable examples like a robot performing pen-spinning movements that are a real challenge even for the best CGI experts.

The AI reward engineering loop
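In sketch form, here is roughly what that loop looks like. Everything below is a toy illustration with made-up stand-ins (llm_propose_reward, train_and_evaluate); it is not Eureka’s actual code, just the shape of the idea:

```python
import random

# Toy stand-ins for the real components (hypothetical, illustrative only):
def llm_propose_reward(task: str, feedback: str) -> str:
    """Pretend LLM call that writes reward-function code for the task."""
    return f"# reward code for: {task} (given feedback: {feedback!r})"

def train_and_evaluate(reward_code: str) -> float:
    """Pretend simulator run: train a policy with this reward and return a score
    (e.g., seconds the robot stays balanced on the yoga ball)."""
    return random.random()

def reward_engineering_loop(task: str, iterations: int = 5) -> str:
    """Eureka-style loop: propose a reward, train in simulation, measure,
    feed the results back to the LLM, and keep the best reward found."""
    best_code, best_score, feedback = "", float("-inf"), ""
    for i in range(iterations):
        code = llm_propose_reward(task, feedback)       # 1. LLM proposes a reward
        score = train_and_evaluate(code)                # 2. train & measure in sim
        if score > best_score:                          # 3. keep the best so far
            best_code, best_score = code, score
        feedback = f"iteration {i}: score={score:.2f}"  # 4. tell the LLM how it went
    return best_code

print(reward_engineering_loop("robot dog balancing on a yoga ball"))
```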

Moving on to the next agent issue: the agents defined by recent Turing awardees (‘the Nobel Prize of computer science’) Rich Sutton and Andrew Barto, which are still regarded as the agents we aim to build today, included perception. In other words, the agent could not only observe the environment but perceive it, too.

However, our current agents mostly lack a body and, with that, the perceptive capabilities to learn from the physical environment. NVIDIA argues that we need physical AI to make agents a reality (as do the abovementioned scientists). And to make physical AI real, these robots require simulated environments for training, which NVIDIA provides.

But why do we need simulations?

Training a robot in real life is difficult because its physical embodiment breaks frequently during training. This can be dangerous to engineers and, above all, painfully expensive. Therefore, we build physics-aware simulations that mimic the real environment for seamless training.

Additionally, these environments and their digital twin robots can be parallelized, drastically reducing training times (hence the term ‘accelerated loop’ I mentioned earlier).
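A toy NumPy example conveys the idea (this is not Isaac Sim’s API, just a sketch under my own simplifying assumptions): instead of stepping one robot through physics, you step thousands of digital twins at once as a single batched operation.

```python
import numpy as np

# Toy illustration of parallelized simulation: thousands of "digital twin"
# robots are stepped simultaneously as one batched array operation.
NUM_ENVS = 4096     # number of simulated robots running in parallel
STATE_DIM = 24      # e.g., joint angles + velocities per robot

states = np.zeros((NUM_ENVS, STATE_DIM))

def step(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """One physics step applied to every environment at once (placeholder dynamics)."""
    return states + 0.01 * actions

for _ in range(1_000):  # 1,000 steps x 4,096 envs = ~4.1M robot-steps of experience
    actions = np.random.uniform(-1.0, 1.0, size=(NUM_ENVS, STATE_DIM))
    states = step(states, actions)

print(states.shape)     # (4096, 24)
```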

Consequently, most robots, including the Figure AI robots we discussed on Thursday, are trained entirely in NVIDIA-powered environments.

In the robotics paradigm, NVIDIA isn’t only the hardware provider, it’s also the training environment provider.

After simulation training, the model is ‘transferred’ into its physical self, as when Hollywood films depict people downloading their cognition into a physical entity (a process called ‘sim-to-real’ in AI parlance). This method is how most robots you see in the news are being trained.

Finally, the third great agent problem is the tools for building them: agentic frameworks. These frameworks, LangGraph being a well-known example, allow developers to abstract much of the complexity of working with AI agents and concentrate on high-level logic.

The issue is that these products are generally rushed and painful to debug. The experience is so bad that Anthropic, the company behind Claude models, openly recommends avoiding them entirely unless necessary.

Instead, NVIDIA has decided to take matters into its own hands and offer blueprints, where it prepares much of the scaffolding necessary for agents (tool calling, retrieval, and so on) and makes it all accessible via an API that connects directly to the GPU node.

Some blueprints offer integration with agentic frameworks, but in my experience, the more low-level your team stays (i.e., the closer to the model API), the better.

But what makes blueprints more appealing?

Well, the fact that NVIDIA designs the GPUs allows it to prepare custom kernels (pieces of software that govern the behavior of the GPU) and abstractions that companies building agentic frameworks can’t, ensuring a seamless, and cheaper, experience.

What’s more, NVIDIA can offer them via a serverless offering called NIMs (NVIDIA Inference Microservices). This means the solution is basically a turnkey deployment flywheel for enterprises and customers who want to use AI with all the necessary agentic abstractions. This last point turns NVIDIA into another of the so-called Hyperscalers, eating into their market share.
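To give a sense of what ‘turnkey’ means in practice, here is a minimal sketch of calling an LLM served as a NIM, assuming the OpenAI-compatible endpoint these microservices typically expose; the URL, model name, and environment variable below are illustrative examples, not prescriptions from NVIDIA’s docs:

```python
import os
from openai import OpenAI

# Minimal sketch: NIM LLM microservices typically expose an OpenAI-compatible
# API, so a plain client call is often all the "framework" you need.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # or your self-hosted NIM node
    api_key=os.environ["NVIDIA_API_KEY"],            # example env var name
)

response = client.chat.completions.create(
    model="meta/llama-3.1-70b-instruct",             # example model served as a NIM
    messages=[{"role": "user", "content": "Summarize this week's support tickets."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```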

Long story short, NVIDIA is much more than a fabless company (chip designer); it is also a robotics, agents, and cloud computing play.

But enough of my opinions. There are some questions I can’t answer for you, which brings us back to NVIDIA GTC.

Getting Enterprise/Investing Answers

The event will take place from the 17th to the 21st of this month and feature 1,162 sessions covering all things NVIDIA, including over 800 technical and more than 300 business sessions, ensuring that plenty of them will align perfectly with your interests and expertise.

Even if you can’t attend yourself, you should really make the effort to have someone on your team do so.

What I like most about this event, and what convinced me to do an open collaboration (which I never do), comes down to three things:

  1. Registration is completely free

  2. Their customers, mostly Hyperscalers and AI labs (Microsoft, Meta, and so on), will also participate, ensuring this is not a thousand hours of ‘look how great NVIDIA is’ but instead ‘look how customer X is solving problem Y with AI.’

  3. It’s industry-verticalized. Whether you’re interested in healthcare, cybersecurity, agriculture, aerospace, energy, or financial services (the list goes on), there’s a session for you. I would bet a lot of money there are at least five you will find a must-attend.

Of course, I’m planning to walk the talk, so here are a handful of sessions I will attend:

Closing Thoughts

With a potential recession looming over our heads due to fears of a tariff war, war escalation in Europe, decreasing consumer spending, and growing tensions in the Taiwan Strait, staying up to date with what’s going on in AI is crucial so you don’t overreact to the expected volatility.

Moreover, we have seen how agents are still a far cry from what incumbents claim. They need expertise to work; they are not a plug-and-play solution and probably never will be. NVIDIA knows this and is building tools to prepare for that reality. Don’t listen to marketing teams; let the experts do the talking.

GTC 2024 provided me with many of the answers I sought back then, so I sincerely believe GTC 2025 will be worth every moment. Register today, and please get in touch with your best takeaways from the event; I would love to chat about them!

Now, on to next week’s news.

HEALTHCARE
Google Co-Scientist Is Really Powerful…

A few weeks ago, we discussed Google’s Co-Scientist tool, a Gemini-powered Google product that helps scientists speed up their research. And boy, does it speed up research.

Professor Jose Penadés's team had been working for years to explain why certain superbugs are immune to antibiotics, finally reaching a conclusion a few weeks ago.

But before publishing, he sent a short prompt to the Co-Scientist… which reached the same conclusion in 48 hours. The Professor was lost for words, as he was adamant that the research was unpublished, meaning the AI had arrived at the same conclusion without being inspired by the Professor’s study.

TheWhiteBox’s takeaway:

In this newsletter, we are great defenders of what I believe is AI’s biggest superpower: the ability to find new patterns in data that lead to new discoveries.

In our 2025 predictions, we predicted an AI-based scientific discovery. Although this doesn’t quite meet the prediction, as humanity found that pattern first, it’s encouraging to see that some AIs can perform this pattern matching for us. I feel pretty good about my odds of being right.

LLM EVALUATION
The World’s Best LLM-as-a-Judge

In the NVIDIA story above, we discussed RL rewards as crucial for frontier AI model learning. However, as mentioned, they are difficult to define in non-verifiable domains, unlike math or coding. For example, how do we define a reward to teach a model to write more creative science fiction? That is subjective at best and “impossible” to measure in most instances.

One common way to circumvent this is to use LLM-as-a-judge systems, in which the same AI (self-judge) or an auxiliary one critiques the reasoning model's thoughts to give it feedback and steer it during problem-solving.
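The pattern itself is simple. Below is a minimal, hedged sketch of a generic LLM-as-a-judge call; the rubric, model name, and 1-5 scale are my own illustrative choices, not Selene’s actual interface:

```python
from openai import OpenAI

# Sketch of the LLM-as-a-judge pattern: a second model scores a response against
# a rubric and returns a number usable as a reward signal or quality filter.
client = OpenAI()  # assumes OPENAI_API_KEY is set; any capable judge model works

JUDGE_PROMPT = """You are an evaluator. Score the RESPONSE to the PROMPT from 1 (poor)
to 5 (excellent) for creativity and coherence. Reply with only the number.

PROMPT: {prompt}
RESPONSE: {response}"""

def judge(prompt: str, response: str, judge_model: str = "gpt-4o-mini") -> int:
    result = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(prompt=prompt, response=response)}],
    )
    return int(result.choices[0].message.content.strip())

score = judge("Write a sci-fi opening line.",
              "The last lighthouse keeper on Mars lit the beacon for no one.")
print(score)  # e.g., 4 -> feedback in a domain with no automatic verifier
```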

Now, startup Atla has released Selene 1, the world’s best LLM-as-a-judge model explicitly trained to serve as an evaluator. It beats every single frontier model (for this particular task, of course).

TheWhiteBox’s takeaway:

LLM-as-a-judge models are the industry’s best bet in non-verifiable domains beyond maths or coding. However, judging by how intently top AI labs are focusing on verifiable domains, it seems judges aren’t really that compelling yet.

But maybe the answer all along was to develop better LLM response evaluators. If so, if AIs truly start conquering non-verifiable domains, then the ‘feel the AGI’ claims could start making some sense.

CHINESE AI
Alibaba’s New Amazing Reasoning Model

After some time hinting at its new model, Alibaba has hit a home run. It has presented Qwen QwQ 32B, its first reasoning model release, which, by the way, is Apache 2.0 licensed (totally free to use and commercialize, an absolute gift).

The model, despite its smallish size (just 32 billion parameters, more than twenty times smaller than DeepSeek R1), fares incredibly well against the latter, just a month after its release.

With hardware like the Apple Mac Studio (more on that below), you can run a frontier reasoning model from home without paying hefty OpenAI API fees.

TheWhiteBox’s takeaway:

Despite actively comparing it with DeepSeek’s model, Alibaba gives credit where credit is due and acknowledges that the training method was identical to DeepSeek’s: outcome-supervised, verifiable-reward reinforcement learning.

That was a mouthful, but what does that even mean?

The DeepSeek (and, to be fair, Allen Institute for AI) playbook is to define an RL training pipeline in which you feed the model problems whose answers can be verified as right or wrong, and simply let the model try stuff (guess), verify, and repeat.

As we discussed in the NVIDIA story above, the reward here is automatically verifiable (the problems mainly involve math questions with checkable answers and coding tasks that can be automatically executed and tested), so we have a reliable way to conduct lengthy explorations where the model investigates various ways to solve a problem until it arrives at the solution.
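In sketch form, the whole playbook fits in a few lines. The snippet below uses toy stand-ins for the model and the verifier; real pipelines (DeepSeek’s GRPO, PPO variants, and so on) add grouped baselines, KL penalties, and massive infrastructure on top of this skeleton:

```python
import random
from dataclasses import dataclass

# Toy sketch of outcome-supervised, verifiable-reward RL: "guess, verify, repeat".
@dataclass
class Problem:
    prompt: str
    answer: str  # verifiable ground truth (math answer, passing unit tests, ...)

    def verify(self, attempt: str) -> bool:
        return attempt.strip() == self.answer

class ToyModel:
    def generate(self, prompt: str) -> str:
        return random.choice(["4", "5"])  # 1. guess an answer

    def reinforce(self, trajectories) -> None:
        rewarded = [t for t in trajectories if t[2] > 0]
        print(f"updating on {len(rewarded)} rewarded attempts")  # 3. repeat

problems = [Problem(prompt="2 + 2 = ?", answer="4")]
model = ToyModel()

for step in range(3):
    trajectories = []
    for p in problems:
        for _ in range(8):                               # sample several attempts
            attempt = model.generate(p.prompt)
            reward = 1.0 if p.verify(attempt) else 0.0   # 2. verify automatically
            trajectories.append((p.prompt, attempt, reward))
    model.reinforce(trajectories)
```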

QwQ 32B is further proof that ‘guess, verify, repeat’ is the prime training method for frontier AI reasoning models right now. But what’s most remarkable about this release is how small the model is. This opens the ‘SOTA at home’ door, where people can run frontier models on their own computers without requiring Internet access or paying hefty API fees.

And if this paradigm is happening (which I firmly believe it is), this is great news for five companies offering consumer-end AI hardware: Apple, NVIDIA, Qualcomm, ARM, and Microsoft, but mainly the first two.

HARDWARE
Apple Presents a Monster AI Computer

Apple has presented a new computer, the Mac Studio, which can cost upwards of $14k for the most maxed-out specs and aims to capture the growing consumer-end AI market.

Interestingly, Apple offers two chip options depending on your needs. The M4 Max is for more CPU-bound tasks (less parallelizable), while the M3 Ultra is for AI workloads (less powerful individual CPU cores, but a more parallelizable and powerful GPU architecture).

TheWhiteBox’s takeaway:

Don’t be misled by the numbering; the M3 Ultra is actually a much more powerful computer (and more expensive), at least for AI. Unlike gaming, where workloads are much more bottlenecked by total core throughput, AI workloads (particularly inference) are much more memory-bottlenecked.

But why?

Without going into too much detail, AI workloads require constant data transfers between compute cores and memory. Here, the most important number is memory bandwidth, which measures how many gigabytes can be moved per second into and out of memory.

Additionally, because of their size and the fact that, by design, an LLM’s weights are read in full for every prediction, models have to be stored in volatile memory (RAM) instead of on the hard disk. Thus, RAM size is the other key factor in whether a model can run on your computer.

And in both cases, Apple achieves an astonishing record for consumer hardware, offering up to 512 GB of RAM (for reference, smartphones typically have 6-12 GB, and laptops rarely exceed 16 GB). Moreover, this is unified memory (meaning the CPU and GPU share the same memory pool, so data doesn’t need to be copied back and forth), enhancing read and write speeds.

To make matters more incredible, it offers an insane 800+ GB/s of memory bandwidth, which, if sources are correct, is higher than what NVIDIA plans to offer with its Digits supercomputer, expected to be released in May (although Digits’ price point will be roughly four times lower).
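To see why memory bandwidth is the number that matters, here is a rough back-of-envelope calculation under simplifying assumptions of mine (a dense model, batch size of one, and every weight read once per generated token):

```python
# Back-of-envelope: for a dense LLM generating one token at a time, every weight
# must be read from memory per token, so memory bandwidth caps generation speed.
# This is a rough upper bound; real throughput is lower (KV cache, overheads).

bandwidth_gb_s = 800      # ~M3 Ultra memory bandwidth (GB/s)
params_billion = 32       # e.g., QwQ 32B (a dense model)
bytes_per_param = 0.5     # 4-bit quantization

model_size_gb = params_billion * bytes_per_param     # ~16 GB of weights
max_tokens_per_s = bandwidth_gb_s / model_size_gb    # bandwidth-limited ceiling

print(f"Model size: ~{model_size_gb:.0f} GB")
print(f"Upper bound: ~{max_tokens_per_s:.0f} tokens/s")  # ~50 tokens/s
```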

But is this computer worth it, based on the price tag?

As outrageous as the price seems… it could be? The number of people running AI workloads at home is growing quickly (including myself).

The fascinating thing is that the fully specced Mac Studio can run the entire DeepSeek R1 model (quantized) without a problem, and naturally the QwQ 32B model we have just covered, too. This means you can run state-of-the-art models at home, something no one would have believed a few months ago.

And the crucial thing is that model sizes continue to decrease. The days of running a swarm of agents at home to help with your work aren’t far off, and this hardware is as good as it gets for that (personally, I still want to wait until NVIDIA ships the Digits supercomputer to decide what’s best for me).

CHATGPT
A $20,000/month ChatGPT

According to rumors, OpenAI is planning three new pricing tiers for ChatGPT agents, ranging from $2,000/month to $10,000/month and even $20,000/month. Yes, those numbers seem to be real.

The Microsoft-backed company seems to be readying the release of new agent products that will allegedly be powerful enough to justify the insane prices:

  • Business professionals ($2k/mo)

  • Advanced developers ($10k/mo)

  • PhD-level researchers ($20k/mo)

TheWhiteBox’s takeaway:

Until now, nothing OpenAI (or anyone, for that matter) has shown justifies a $2,000/month price tag, let alone $10k or $20k (I still haven’t felt the urge to pay the $200/month Pro subscription either). Without more information, the ‘PhD-level researcher’ claim is embarrassing, like saying that a database is as smart as a PhD simply because it knows more.

Are they hiding a breakthrough model powered by GPT-4.5, an ‘o4’ or something similar (the way GPT-4o powers o3), that finally automates a full-time developer’s job?

Or are they simply trying to squeeze as much juice as they can out of the ‘OpenAI remains the best product’ narrative before they are commoditized (which, unless such an internal breakthrough occurs, feels totally inevitable)?

We’ll have to wait and see.

VOICE AI
Sesame Will Leave You Speechless.

A new product named Sesame has gone extremely viral in recent days. The reason? It feels scarily human. This AI is a conversational chatbot that matches how humans speak with uncanny accuracy, making it almost impossible to guess whether it’s an AI or not.

And you don’t need to trust me; you can try it for yourself for free.

TheWhiteBox’s takeaway:

As someone who earns a living with AI, I’ve dabbled with numerous conversational AI tools. But none comes close to this one, not even OpenAI’s Advanced Voice Mode, which feels buggy compared to this model.

I’m honestly speechless (pun intended).

That said, please remember to resist the temptation to anthropomorphize these things. They are simply sequence-to-sequence models, like ChatGPT, but with an encoder/vocoder architecture that lets them speak. This one has been particularly well-trained to capture the nuances of voice, but remember, it’s still an emotionless machine!

Thanks for reading, and see you on Sunday!

THEWHITEBOX
Join Premium Today!

If you like this content, join Premium to receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.

Until next time!

For business inquiries, reach out to me at [email protected]