In partnership with

THEWHITEBOX
TLDR;

Welcome back! This week, we take a look at stellar releases, including OpenAI’s Codex Max and Google’s Nano Banana Pro. Trust me, their capabilities are going to impress you, particularly the latter.

We also cover NVIDIA’s earnings, going way further than most analysts and headlines to show that NVIDIA continues to rock, but its foundation and competitive advantage may be eroding.

Additionally, we’ll take a look at other news, including regulation, new ways to measure progress, and more.

This week comes packed with insights. Enjoy!

THEWHITEBOX
Things You’ve Missed for not being Premium

On Wednesday, Gemini 3 Pro made its debut and stole the show, and we covered the key takeaways.

But we also talked about another big release, Grok 4.1, the lies being told about data center water consumption, and several other product and market news items of interest.

Simplify Training with AI-Generated Video Guides

Are you tired of repeating the same instructions to your team? Guidde revolutionizes how you document and share processes with AI-powered how-to videos.

Here’s how:

1️⃣ Instant Creation: Turn complex tasks into stunning step-by-step video guides in seconds.
2️⃣ Fully Automated: Capture workflows with a browser extension that generates visuals, voiceovers, and call-to-actions.
3️⃣ Seamless Sharing: Share or embed guides anywhere effortlessly.

The best part? The browser extension is 100% free.

PUBLIC MARKETS
NVIDIA Presents Quarterly Earnings. Crushes Them.

Two days ago, NVIDIA reported its quarterly earnings, the seminal event in public markets these days because most investors use it as the market’s compass to decide whether to run for the hills or keep believing in the AI buildout (recall that the AI buildout happening in the US doesn’t only affect US companies; it has a profound impact on Asian markets too).

Luckily, the results were record-setting again. NVIDIA reported record revenue of $57.0 billion for the quarter, up 22% from the previous quarter and 62% from the same period a year ago.

  • Gross margin (top-line revenues minus direct costs of building their products) on a GAAP basis was 73.4%, up about one percentage point from the previous quarter but modestly down year-over-year.

GAAP stands for Generally Accepted Accounting Principles, a set of accounting rules public companies must follow when presenting results. Companies also share non-GAAP results they believe better reflect their business's real revenue/cost structure.

  • Net income reached approximately $31.91 billion, and diluted earnings per share were $1.30, both significantly improved from the prior year (i.e., the company is more profitable).

  • They also generated strong cash flows, with operating activities bringing in $23.8 billion in the quarter.

Unsurprisingly, the data center segment once again drove the majority of growth. Revenue in that segment reached $51.2 billion, up 25% from the previous quarter and up 66% year-over-year.

Within that, compute revenue was $43.0 billion (up 27% sequentially and 56% year-over-year), and networking revenue was $8.2 billion (up 13% from the prior quarter and 162% year-over-year).

The gaming segment, NVIDIA’s traditional main business, grew 30% year-over-year to $4.265 billion, though it slipped slightly (-1%) versus the prior quarter.

Professional visualization revenue (GPUs for design and other rendering stuff) was $760 million, up 56% year-over-year and 26% sequentially, and automotive revenue hit $592 million, up 32% year-over-year and 1% from the prior quarter.

Recall that NVIDIA has partnered with Uber to create a fleet of autonomous driving cars, so this is a critical segment to follow over the next several quarters.

On the cost side, operating expenses rose:

  • GAAP operating expenses (e.g., wages, R&D, etc.) increased 36% year-over-year and 8% sequentially; non-GAAP operating expenses were up 38% year-over-year and 11% sequentially.

  • Inventory holdings rose to $19.8 billion as of quarter end, up from prior periods, reflecting extended commitments for supply and lead-time parts.

  • The company also noted that multi-year cloud service agreements stood at about $26.0 billion, up substantially from $12.6 billion in the prior quarter, supporting long-term revenue commitments.

But the important parts came with the guidance data, i.e., how they believe they’ll perform over the coming quarters (recall that NVIDIA is famous for only ever offering next-quarter guidance).

For forward guidance in the fourth quarter of fiscal 2026, NVIDIA expects revenue of approximately $65.0 billion, plus or minus 2%.

But the real change came when they introduced guidance for the full calendar year 2026. More formally, the company said it has “visibility” into roughly $500 billion of revenue tied to Blackwell and Rubin through the end of calendar 2026.

In layman’s terms, they are projecting $500 billion in revenues from just Blackwell and Rubin GPUs (their current and next year’s generations of data center GPUs), a value that would put them well over Apple’s revenues, while being the most profitable company by far amongst Big Tech.

To put this into perspective, that puts their forward PS (price-to-sales) at just ~10 and their forward PE (price-to-earnings) at ~20 (assuming they maintain their net margin).

In other words, NVIDIA’s current valuation is “just” 10 times its projected sales and 20 times its projected profits (if the profitability margin holds), making the stock look cheap at current market valuations relative to the average large-cap US company (forward PE of 21).
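If you want to check that arithmetic yourself, here is the back-of-the-envelope version. The market cap and net margin below are my assumptions for illustration (roughly $5 trillion and ~52%, respectively), not figures from the earnings report:

```python
# Back-of-the-envelope forward multiples for NVIDIA.
# ASSUMPTIONS (not from the earnings report): ~$5T market cap, ~52% net margin.
market_cap_bn = 5_000            # assumed market capitalization, in $bn
forward_revenue_bn = 500         # Blackwell + Rubin "visibility" through end of 2026
assumed_net_margin = 0.52        # assumes margins hold near current levels

forward_earnings_bn = forward_revenue_bn * assumed_net_margin

forward_ps = market_cap_bn / forward_revenue_bn       # price-to-sales
forward_pe = market_cap_bn / forward_earnings_bn      # price-to-earnings

print(f"Forward PS: {forward_ps:.1f}")   # -> 10.0
print(f"Forward PE: {forward_pe:.1f}")   # -> ~19, i.e. roughly 20
```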

In a nutshell, these appear to be excellent numbers (the stock rose a lot in the aftermath of the earnings call).

But are they?

TheWhiteBox’s takeaway:

The first thing we must highlight wasn’t even mentioned: Google. 2026 is poised to be the TPU’s breakout year, the year Google starts officially competing with NVIDIA and AMD in the data center business.

TPUs, Google’s alternative to NVIDIA/AMD GPUs, are cheaper and more performant on a cost-adjusted basis, and Google also boasts the most powerful system-level supercluster on the planet, Ironwood.

In layman’s terms, the most powerful AI hardware system you can use today, measured in raw performance, is not NVIDIA’s Blackwell servers; it’s Google’s TPUv7 Ironwood pods, which scale up to more than 9,000 TPUs working together.

This isn’t a true apples-to-apples comparison due to topology differences (the way chips in the cluster communicate), but NVIDIA’s largest scale-up cluster on sale ties together 72 GPUs, quite the difference.

But beyond this potential new entrant in Google, there are NVIDIA-specific things that should be flagged as concerning.

For one, inventories are rising quite a bit. Inventories stood at $19.8 billion as of 26 October 2025, versus $10.1 billion at the start of the fiscal year, so they have roughly doubled in nine months.

Larger inventories aren’t the end of the world if those products are eventually delivered and billed, but when we’re talking about this much money, it carries undeniable risks and repercussions:

  1. NVIDIA is at higher risk of incurring costs on goods that are never sold (like the H20 write-offs after China canceled its orders).

  2. NVIDIA is being forced to commit capital to secure manufacturing supply from TSMC much further in advance (TSMC “forces” NVIDIA to book capacity far earlier than it used to), significantly widening the gap between when NVIDIA pays for those allocations and when it gets paid for the finished products.

  3. It leaves NVIDIA less liquid.

Accounts receivable (products delivered but NVIDIA has yet to receive payment) are $33.4 billion at quarter‑end, up from $23.1 billion at the prior fiscal year‑end and $17.7 billion in the same quarter a year ago. This is 59% of the current quarter’s revenue.

This isn’t dramatic because NVIDIA is still increasing free cash flow, but it signals the increasing dependence on a few customers that are simply telling NVIDIA, “I’ll take longer to pay you back, deal with it.” Hold this thought for later.

As for gross margins, they are slightly better than the previous quarter but worse than last year’s, at ~73% versus 75% in 2024.

This is expected, as Blackwell is a rack-scale system (meaning it’s much larger and with many more parts) and, importantly, includes a higher percentage of HBM memory for every chip, which gives the HBM suppliers (mainly SK Hynix in NVIDIA’s case) much more power to increase prices, and NVIDIA has to live with it.

NVIDIA doesn’t design or manufacture memory. HBM allocations (the amount of memory per chip) are increasing because reasoning models, the frontier of AI today, all require much more memory during generation due to longer average response length.

And with higher competition, both from Google and from AMD potentially “catching up” with its first rack-scale server, the MI450X series, gross margins will undoubtedly continue to fall (I’m pretty sure this is already priced into AMD’s and Google’s share prices, so I’m skeptical of this being a winning trade, unlike HBM, which still appears terribly mispriced).

And finally, what’s possibly the most significant risk: customer concentration. Four direct customers account for 22%, 17%, 14% and 12% of accounts receivable, a combined 65% of outstanding trade receivables—we both know who these companies are.

The company also discloses that it generates “a significant amount” of revenue from a small number of indirect customers buying through cloud or OEM partners, and that one unnamed AI research and deployment company contributed a “meaningful amount” of revenue this quarter via cloud channels.

In a nutshell, this is bad because when a company is so exposed to a small subset of customers, things go very wrong if even one of them fumbles. We just saw this with CoreWeave: a single customer’s delay caused the company to miss the full-year guidance analysts expected, sending the stock crashing. And that was just one delayed customer.

The problem is that I believe 2026 could very well be the year NVIDIA experiences a situation like that, so it'll be interesting to see what happens if it does. But for now, NVIDIA remains strong enough to sustain the weight of the entire market on its shoulders.

REGULATION
Trump Administration to Ban State AI Regulation

As reported by The Information, the White House is preparing an executive order directing federal agencies to challenge or override state laws regulating AI.

The draft order would establish an “AI Litigation Task Force” within the US Department of Justice to sue states for laws that the federal government argues interfere with interstate commerce or are preempted by federal regulation.

The US Department of Commerce would be tasked with reviewing state AI-regulation laws within 90 days and could withhold federal broadband and other funding from states whose laws are judged overly burdensome.

Meanwhile, the Federal Trade Commission would assess whether state laws that require AI systems to alter their truthful outputs or compel disclosures conflict with the FTC Act, and the Federal Communications Commission would initiate proceedings to establish a uniform federal standard for AI disclosures, potentially pre-empting conflicting state rules.

The objective is to prevent a patchwork of divergent state regulations and instead promote a unified national AI policy. The plan reflects significant support from the tech industry, which views varying state laws as a compliance burden. However, the order remains in draft form and has not been officially announced by the White House.

TheWhiteBox’s takeaway:

The heavy influence of Silicon Valley billionaires in the Trump Administration is undeniable. That said, personally, I think they are right on this one. Let me explain.

I understand the importance of respecting states’ freedom to do what they think is best for their citizens. But it doesn’t help their case that the first innings of some of these regulations, especially California’s, come across as nothing but outrageously stupid, arbitrary, naive, and clearly overly political.

In California, the Frontier Artificial Intelligence Act, signed into law by Governor Newsom on September 29th, is obsessively x-risk biased, meaning it sets out to protect against AI’s catastrophic risks: it is literally trying to “protect” humanity from AI taking over the world, based on fears nobody can fully explain how or why they would materialize, but that sound very scary.

This is obviously just a pathetic way of laying the groundwork for a future open-source ban.

Put simply, this type of regulation can be translated as: “We are going to design laws that prevent all but a few selected organizations from building this hazardous technology that is so dangerous that only a few selected beacons of light should create it.”

Put another way, they want to ban open-source so they can have free rein over AI without having to compete on price with free software.

If that’s the way California intends to protect its citizens, making them pay more by creating artificial barriers to entry so that Anthropic (an AI Lab that isn’t shy about wanting regulatory capture to protect its business) and others can hike up prices for eternity, that’s a very “interesting” way of protecting Californians.

That’s my core issue with AI regulation in general: for the most part, it’s just regulatory capture branded as protection.

By the way, Democrats aren’t the only ones in favor of state regulation. Ron DeSantis, Florida’s Governor, has shared the same bias, so this is a bipartisan view.

Instead, if the US is truly in an AI war with China, it needs to let companies thrive, because that’s precisely what China is doing with its Labs. China’s CCP has a firm grip on its AI startups and can influence things like which chips are used for training or inference, yet it gets out of the way when it comes to letting its companies grow and compete.

Yes, I know, this gives considerable leverage and power to already-mighty entities like Big Tech, but I don’t see another way to truly have a chance of defeating state-backed Chinese AI if US counterparts have to deal with 50 different state regulations, some left-leaning, others right-leaning, all at varying degrees of permissiveness and accountability, all while competing with strongly underappreciated Chinese Labs that are much more sophisticated than you may realize.

If that scenario materializes, AI Labs like OpenAI would likely have to train AI models for specific states, an absolute nightmare and a very reckless way to drain even more of the already extremely scarce cash reserves these companies hold.

And it’s not like the US doesn’t have a clear example of how you can regulate yourself out of the race. There’s a reason Europe has been left behind.

Regulation has killed Europe’s chances in the most critical race of the last century. The US must not repeat the same mistakes. That said, I would call for quite stringent federal-level regulation that makes sure AI Labs can’t trample on US citizens’ rights (like an ‘AI tax’ that covers the revenue losses from potential massive AI-led job displacement), while also preventing clearly unfair scenarios like OpenAI getting bailed out if it overextends itself.

But doing so in 50 different ways is not the best approach.

Book a 1:1 Coaching Call with Me

This is a 1:1 coaching session in which we'll cover your concerns over the AI industry, on your terms and on the topics of your choice. The only thing you have to do is book the call on my calen...

$249.99 usd

AINOMICS
Stanford’s Hazy Research Launches IPW Benchmark

A new metric, intelligence-per-watt (IPW), has been introduced by a Stanford group to measure how efficiently local inference systems (model + hardware) turn energy into useful “intelligence”.

The idea is simple: as demand for AI skyrockets, our centralized data-centres face scaling, cost, and energy bottlenecks. This raises the question of shifting more inference to local devices.

In their study, the researchers evaluated over 20 modern local-language models (<= 20B parameters) across real-world single-turn chat and reasoning queries—about 1 million in total. They ran the models on both local accelerators and enterprise-grade hardware, always with batch size = 1 to reflect real-world local deployment (meaning the model is serving a single user—Hyperscalers can serve hundreds of users simultaneously per cluster).

Their findings are noteworthy: local language models (LMs) already correctly answer about 88.7% of the chat/reasoning queries, with accuracy improving about 3.1× from 2023 to 2025.

Importantly, local inference efficiency (IPW) improved about 5.3× over the same period—split roughly into 3.1× from better models and 1.7× from better hardware. However, local accelerators still lag: one comparison showed running the same model on a laptop-/device-level accelerator had about 1.5× lower IPW than on enterprise hardware.
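For intuition, here is a minimal sketch of what an IPW-style measurement could look like for a single local run. The helper objects (`model`, `queries`, `power_meter`) are hypothetical stand-ins, and treating IPW as task accuracy divided by average power draw is my reading of the metric, not the Stanford group’s actual code:

```python
import time

def intelligence_per_watt(model, queries, power_meter):
    """Toy IPW estimate: task accuracy divided by average power draw (assumed definition).

    `model`, `queries`, and `power_meter` are hypothetical stand-ins for a local LM,
    a list of (prompt, expected_answer) pairs, and a hardware power sensor.
    """
    correct = 0
    power_samples = []
    start = time.time()

    for prompt, expected in queries:                    # batch size = 1, as in the study
        answer = model.generate(prompt)                 # hypothetical single-user inference call
        correct += int(answer.strip() == expected.strip())
        power_samples.append(power_meter.read_watts())  # hypothetical sensor reading

    accuracy = correct / len(queries)
    avg_power_w = sum(power_samples) / len(power_samples)

    return {"accuracy": accuracy,
            "avg_power_w": avg_power_w,
            "ipw": accuracy / avg_power_w,
            "elapsed_s": time.time() - start}
```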

But what to make of this?

TheWhiteBox’s takeaway:

The bottom line is that, by making “intelligence per watt” a north-star metric, model designers and hardware makers can help shift more AI workloads from data centres to local devices (phones, earbuds, glasses, laptops), and alleviate some of the energy constraints we are going to face over the following years.

The truth is that we soon won’t have a choice in this regard. I’m a firm believer that local inference hardware is going to become a precious asset in the following years (especially as markets like DRAM are extremely tight, which means that consumer-end memory hardware is going to increase in price by a lot), as AI clouds struggle to keep up with demand.

Please make no mistake, I’m not saying there is no bubble; my primary concern with the industry is not demand for AI workloads, but monetization, which is far less clear than incumbents would have you believe (e.g., ChatGPT or Gemini may be seeing huge demand (Gemini processes more than 1,300 trillion tokens per month), but only single-digit monetization of that demand).

REAL-WORLD USE CASES
Predicting the Weather

Few prediction use cases are as important or affect our daily lives as much as weather forecasting.

Now, Google DeepMind (with Google Research) has developed a new weather-forecasting AI model called WeatherNext 2 that is faster (about 8 times) and higher-resolution (down to hourly) than their previous model (which was already SOTA).

WeatherNext 2 can generate hundreds of possible weather scenarios from a single starting point, using just one TPU (which means it’s pretty small), while traditional physics-based models would take many hours on a supercomputer.

The model uses a novel architecture called a Functional Generative Network (FGN). The FGN injects “noise” into the model’s internal representations, ensuring forecasts remain physically realistic while capturing a wide range of possible outcomes.

In other words, they make the model’s life “harder” by introducing noise in its internal representations (this is like telling a human who’s trying to predict that tomorrow is going to be sunny just because today it is too, with things like “but what if the evening gets cloudy?”, “What if the wind picks up?”), forcing it not to jump to conclusions too fast and instead consider that the weather is a messy thing to predict.
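DeepMind hasn’t published WeatherNext 2’s internals alongside the announcement, but the core trick described above (injecting noise into internal representations so that repeated forward passes yield an ensemble of distinct yet coherent forecasts) can be sketched in a few lines. Everything below (layer sizes, where the noise enters, variable counts) is illustrative, not the real FGN:

```python
import torch
import torch.nn as nn

class ToyFunctionalGenerativeNet(nn.Module):
    """Illustrative only: a tiny network that perturbs its hidden state with noise,
    so each forward pass with a different noise sample yields a different, but
    internally consistent, forecast."""

    def __init__(self, n_vars=8, hidden=64, noise_dim=16):
        super().__init__()
        self.encode = nn.Linear(n_vars, hidden)
        self.noise_proj = nn.Linear(noise_dim, hidden)  # noise enters the hidden state
        self.decode = nn.Linear(hidden, n_vars)
        self.noise_dim = noise_dim

    def forward(self, current_state, noise):
        h = torch.relu(self.encode(current_state))
        h = h + self.noise_proj(noise)       # perturb the internal representation
        return self.decode(torch.relu(h))    # next-step forecast for all variables

# Generate an ensemble of scenarios from a single starting point.
model = ToyFunctionalGenerativeNet()
today = torch.randn(1, 8)  # stand-in for today's weather state
ensemble = [model(today, torch.randn(1, model.noise_dim)) for _ in range(100)]
```

Sampling the model many times with different noise vectors is what produces the “hundreds of possible weather scenarios” from one starting point.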

Even though the model is trained only on individual (“marginal”) weather variables (like temperature at a location, wind speed at a height, or humidity), it learns to generate coherent joint predictions (how multiple variables vary together across space and time), which is important for forecasting complex events (for example, region-wide heat waves or wind-farm power output).

In testing, WeatherNext 2 outperforms the previous WeatherNext model on 99.9% of weather variables and lead times from 0 to 15 days.

The technology is already being integrated into products such as Pixel Weather, Google Search, and Google Maps (including Maps Platform’s Weather API), with wider roll-out expected in the coming weeks.

TheWhiteBox’s takeaway:

This is the kind of thing that underpins my overall bullish sentiment about Google: no AI Lab has a deeper reach than DeepMind. Unlike OpenAI, Anthropic, or xAI, their work extends beyond Generative AI text models, and they release frontier-level models across many other areas, from weather to protein folding.

I feel like I’m beating the drum too much already, but I just want to make sure the message goes through: Google is dominating.

MODELS
Testing the Longest Possible Task on LLMs

Source: METR

One of the leading indicators of progress in frontier Generative AI, at least as assessed by incumbents, is measuring a model’s time horizon, the longest task models can do successfully, measured as the amount of ‘human time’ it “saves”.

In other words, we are measuring the longest human task that can be automated using AI (e.g., if the model’s score is 2 hours, it means the model can automate a task worth 2 hours of work).
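For context, METR’s headline number is typically the task length (in human time) at which the model succeeds about half the time. Below is a minimal sketch of how such a horizon could be estimated from per-task results; the data points are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: (human task length in minutes, did the model succeed?)
# These numbers are invented; METR uses hundreds of real tasks.
task_minutes = np.array([2, 5, 10, 15, 30, 45, 60, 90, 120, 180, 240])
success      = np.array([1, 1,  1,  1,  1,  1,  0,  1,   0,   0,   0])

X = np.log(task_minutes).reshape(-1, 1)   # success falls off with log task length
clf = LogisticRegression().fit(X, success)

# Time horizon: the task length where predicted success probability crosses 50%.
# At p = 0.5 the logit is zero, so solve w * log(t) + b = 0.
w, b = clf.coef_[0][0], clf.intercept_[0]
horizon_minutes = np.exp(-b / w)
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```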

And GPT-5.1 Codex Max, which we dive into in the next section, sets a new record of more than two hours (remember, these are tasks with zero human involvement), while Kimi K2 Thinking, arguably the best open-source model to date, scores just below an hour.

We have yet to see the results that Gemini 3 Pro got.

But why is this important?

TheWhiteBox’s takeaway:

Stupid maths benchmarks aside (which nobody really cares about and which barely define what progress actually is; I talk about this more here), real progress should be measured by the degree of human job automation an AI achieves (not that we should celebrate humans getting rapidly automated, but it’s an undeniable measurement of economically valuable output from AIs).

And this metric does just that. It’s also a great way to discern the gap between closed- and open-source models, which in this case is quite large, as GPT-5.1 Codex Max (terrible name, by the way) doubles Kimi K2 Thinking’s score.

This is, of course, being championed by anti-open-source incumbents as clear proof that open-source models aren’t getting any closer to closed-source (a topic that has lately been used as a proxy to measure US vs China AI progress).

However, this doesn’t even get close to telling the complete picture. For one, it’s not proven at all that US models are more “advanced” than Chinese ones; the former are just bragging about having more computing power than the latter. In terms of algorithmic progress, you can even make an argument for Chinese models being “better” as they are way more efficient.

That said, today, compute per task is the most significant predictor of progress, so this is just a showcase of compute superiority, not algorithmic superiority.

And what’s worse, this comparison completely overlooks the most significant enabler of AI adoption: on-the-job training. That is, actual enterprise AI adoption will come as companies fine-tune open-source models by training them on their specific tasks, a recipe all but guaranteed to beat even the most cutting-edge closed-source model on those tasks.

Examples like Cursor’s Composer 1, or the business offering of Thinking Machines Lab (which lets you fine-tune open-source models on your task in a very straightforward way), are perfect illustrations of this. And guess what these examples all have in common:

They are all using Chinese open-source models.

So, if my thesis is true, and fine-tuning open-source models will be the primary driver of adoption, we’re about to see Fortune 500 companies running most of their AI workloads on Chinese models.

If that happens, if we really want to use closed/open as a proxy for AI geopolitics… who’s really ahead?

CODING
OpenAI’s New GPT-5.1 Codex Max Aims to Live Inside the Software Stack

OpenAI has released GPT-5.1 Codex Max, a new version of its coding-focused model designed to handle longer and more complex development work.

The model is trained with a “compaction-native” approach (more on this below), offers an “extra high” reasoning mode, and is built to run autonomously for more than 24 hours at a time.

OpenAI reports that Codex Max delivers around an 8% improvement over the standard GPT-5.1 model on internal code pull requests, suggesting it is tuned for reviewing and generating real-world production code rather than just solving isolated coding challenges.

It also shows a considerable improvement on the Terminal-Bench benchmark, rising to 58%, higher than the score Gemini 3 Pro posted at its release a few days ago, showcasing how fast things have been moving these past few weeks.

The model is also allegedly much faster, mainly thanks to a 30% reduction in its response length for the same performance, meaning it requires less “thinking” to reach the same response.

Here, “thinking” is measured as the number of tokens the model generates during a response. AI models “think more” by allocating more compute to the task, which means more tokens. Thus, a reduction for the same performance means the model “thinks smarter.”

The release is aimed at situations where a model is expected to work across entire codebases, manage multi-step changes, and operate as a persistent development assistant rather than a simple autocomplete tool.

Moreover, by emphasizing long-running autonomy and greater reasoning depth, OpenAI defines Codex Max as a system that can handle refactors, migrations, and other tasks that unfold over many hours or days, with less need for constant human guidance.

TheWhiteBox’s takeaway:

Probably the most worthwhile change is the introduction of a feature called “compaction,” which allows the AI to work across multiple context window equivalents (i.e., if the model’s context window is 200k tokens, it can work with 400k or even 600k tokens).

In a nutshell, it’s a system that autonomously compresses context by pruning its history while preserving crucial context over long horizons.

This isn’t something new; Anthropic has been doing “compactions” for quite some time, though it’s unclear who does it better. Either way, this is vital for coding tasks where the relevant context spans large amounts of code; without features like compaction, AI models struggle with tasks like refactoring (seeing the big picture of the code and making it more accurate, concise, and readable).
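Neither OpenAI nor Anthropic has detailed exactly how their compaction works, but the general shape of the idea is easy to sketch: once the conversation exceeds a token budget, compress the oldest part of the history into a summary (using the model itself) and keep working from that summary plus the recent turns. The function below is a hypothetical sketch under those assumptions; `count_tokens` and `summarize` stand in for whatever tokenizer and model call you use:

```python
def compact_history(messages, count_tokens, summarize, budget=200_000, keep_recent=20):
    """Hypothetical compaction loop: prune old history into a summary once the
    context exceeds `budget` tokens, preserving the most recent turns verbatim.

    `count_tokens` and `summarize` are assumed callables: a tokenizer-based counter
    and an LLM call that compresses a list of messages into a short text summary.
    """
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= budget:
        return messages                      # under budget, nothing to do yet

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)                 # e.g. key decisions, file paths, open TODOs
    compacted = [{"role": "system",
                  "content": f"Summary of earlier work (compacted): {summary}"}]
    return compacted + recent
```

The key design choice is what the summary must preserve: decisions already made, files touched, and open TODOs, so the model doesn’t “forget” the plan halfway through a refactor.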

Personally, I’ve grown to become a massive fan of OpenAI’s Codex models, and this is coming from someone who isn’t shy about criticising OpenAI for all its obvious flaws. However, this newsletter is not for me to showcase my biases, but to objectively call the shots as they actually are.

And honestly, they’ve become my go-to for coding, ahead of Claude or Gemini (even after the release of Gemini 3 Pro).

Gemini 3 Pro seems particularly good for highly specific, high-compute-requiring tasks, but I feel more comfortable with Codex as my daily driver for code.

For example, yesterday I added a fairly complex feature to my automatic invoice reconciliation software, which processes my invoices and bank transactions and pairs them in various ways.

This time, I wanted the transactions stored in a SQL database (they were previously being read directly from the bank’s Excel file), and I added two things:

  1. A unique hash identifier that takes date, invoice value, and transaction description fields, concatenates them, hashes the string using SHA256, and uses this as the unique identifier. This way, whenever my system tries to upload new transactions that may be duplicates, the hashing mechanism automatically identifies the duplicates and drops them.

Hashing is a deterministic, irreversible process: the same input string always produces the same hash (an ID such as ‘e220c723ff7f2bfa27e0c9ebf964bc3e0067b8557b908aa434249db95b965335‘), the hash can’t be reversed back into the original string, and in practice collisions are so unlikely that it works as a unique identifier (see the sketch right after this list).

If this sounds too esoteric, let Einstein explain it more visually:

Source: Author using Gemini 3 Pro Image (more on that below)

  2. Adding several LLM fields, including “llm_explanation,” where an LLM (Gemini 2.5 Flash Lite in my case, as it’s a simple task) examines the transaction description field and, with access to my supplier database, tries to guess which supplier might be receiving the payment. This way, I’m no longer matching solely on invoice value; I’m also matching on transaction descriptions without requiring hardcoded rules (e.g., I can now prevent an invoice for meat from being matched to a transaction for a beer payment).
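As promised, here is a sketch of the hashing/dedup idea. The column names, concatenation order, and use of SQLite are my assumptions for illustration, not necessarily what Codex generated, but the mechanism (SHA-256 over date + amount + description used as the primary key) is the one described above:

```python
import hashlib
import sqlite3

def transaction_id(date: str, amount: str, description: str) -> str:
    """Deterministic ID: the same transaction fields always hash to the same value."""
    raw = f"{date}|{amount}|{description}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

conn = sqlite3.connect("transactions.db")   # hypothetical local database
conn.execute("""CREATE TABLE IF NOT EXISTS transactions (
                    tx_hash TEXT PRIMARY KEY,   -- the SHA-256 identifier
                    tx_date TEXT, amount REAL, description TEXT)""")

def upsert(row):
    """Inserting a duplicate silently does nothing thanks to the PRIMARY KEY."""
    conn.execute("INSERT OR IGNORE INTO transactions VALUES (?, ?, ?, ?)",
                 (transaction_id(row["date"], str(row["amount"]), row["description"]),
                  row["date"], row["amount"], row["description"]))

upsert({"date": "2025-11-20", "amount": 84.50, "description": "BEER SUPPLIER SL"})
upsert({"date": "2025-11-20", "amount": 84.50, "description": "BEER SUPPLIER SL"})  # dropped
conn.commit()
```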

To do this, I first worked on the implementation plan with Codex (this wasn’t even Max, by the way). I had asked it to use UUIDs, and Codex itself suggested the hashing method instead to prevent duplicates.

Then, I fed the implementation plan into another Codex instance and guided it through the entire process (more than seven steps and thousands of line edits).

It worked like a charm on the first try. The model even wrote the unit tests (I checked if the model was bullshitting me, but they were legit). All tests passed, and the feature worked right out of the gate. I kid you not, I wrote zero lines of code.

A word of caution, though: I still don’t believe in vibe-coding platforms. I still had to intervene multiple times and call bullshit when I saw it (I understand Python code very, very well and know crap when I see it). I don’t think these models are remotely close to true, no-strings-attached, no-verification-needed models… yet.

Furthermore, I know how to prompt these models. The secret to good execution is nothing more than assembling a good plan and ensuring the model executes changes in a self-contained, step-by-step manner. That said, this is literally the definition of productivity: this would have taken me weeks (coding is not my primary job and my schedule is tight), and I pulled it off in a day.

IMAGE GENERATION
Nano Banana Pro is Here

Nano Banana Pro, the proxy name for Gemini 3 Pro Image, is Google’s new image-generation model and, to me, the most impressive AI release in a big, big while, way more impressive than Gemini 3 Pro itself.

This is, by far, and I can already say this without a doubt, the most advanced and smartest image-generation model the world has ever seen (and it’s also way faster than its closest rival, OpenAI’s GPT Image 1).

One of the most highlighted capabilities is infographics, such as the one above generated by Jeff Dean with the simple prompt “Show me a chart of the solar system and annotate each planet with one interesting fact”, or the one I generated earlier with Einstein (there, the prompt was “Using Einstein as the teacher, draw a set of comic vignettes where Einstein explains the hashing part in the following text: <text in the previous section>”).

You can try Nano Banana Pro in Google’s AI Studio (which requires an API key) or in the Gemini app.

TheWhiteBox’s takeaway:

No more text-rendering mistakes; extremely accurate instruction following… It’s truly a step change in what one can do with AI.

This model has profound implications for many sectors (marketing, UI/UX design, writing) because it’s the first image-generation model that can actually reason. This is a subtle yet significant change in how we build image-generation models.

Traditional image-generation models take in an input and perform “conditioned image generation”: the model runs a diffusion process (taking a noisy image and iteratively removing the noise while conditioning on your instruction) so that the end result semantically matches your input.

“Draw a cat.” Source: NVIDIA

This is not how Nano Banana (or OpenAI’s GPT Image 1) works. Instead, these models are autoregressive, just like ChatGPT, meaning the image is predicted chunk by chunk, where each chunk is made of pixels, not words.

This introduces a unique capability: the model can reason “during” generation. That is, Nano Banana will first reflect on the user’s input and then on its own outputs.

As the model can generate both text and images, it can first prepare an output plan, generate the output, and then review it, all in the same sequence of text and image tokens:
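A heavily simplified, hypothetical sketch of that interleaved loop is below. None of these method names or token markers exist in any real API (Google exposes nothing like this); it only illustrates the idea of plan, image, and review tokens living in one autoregressive sequence:

```python
def generate_with_reasoning(model, user_prompt, max_rounds=3):
    """Conceptual sketch of interleaved text/image autoregression (not a real API).

    One token sequence holds everything: plan tokens (text), image tokens
    (pixel chunks), and review tokens (text), so the model can critique and
    revise its own image before finishing.
    """
    sequence = model.tokenize(user_prompt)                      # hypothetical tokenizer

    for _ in range(max_rounds):
        sequence += model.generate(sequence, stop="</plan>")    # 1) reason about the request in text
        sequence += model.generate(sequence, stop="</image>")   # 2) emit image tokens chunk by chunk
        review = model.generate(sequence, stop="</review>")     # 3) critique the image it just drew
        sequence += review
        if "looks correct" in model.detokenize(review):         # crude, illustrative stopping rule
            break

    return model.decode_image(sequence)                         # hypothetical: tokens -> pixels
```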

This is the superpower that enables these models to truly “adhere” to the user’s request and ensure that the generated image (and any text inside it) is semantically valid. The king in text changes every week, but what’s clear by now is that Google is the king of every other modality (at the very least image and video), and it has been for a while.

Closing Thoughts

Another week packed with announcements, particularly the flashy releases of Gemini 3 Pro and Nano Banana Pro, as well as OpenAI’s top coding model.

This week has also renewed faith in the AI market, thanks to another stellar show by NVIDIA. However, we must not ignore the growing concerns hidden beneath the eye-catching headlines: the fate of the industry rests on the shoulders of a company with decreasing gross margins and extreme customer concentration, one that is also being weaponized by the US in its “war” with China. So far, it’s been a solid foundation, but it’s not as secure as one might think.

And if NVIDIA catches a cold, the public markets (and the US economy) will get pneumonia.

Finally, to end on a positive note, the recent plethora of model releases shows that progress is still intense; AI models are unequivocally getting better. But markets reminded all of us these past few weeks that patience is wearing thin, and they want results, aka revenues, fast.

For business inquiries, reach out to me at [email protected]
