AI's Gold Medals, Layoffs, & More


THEWHITEBOX
TLDR;
Welcome back! This week, we have OpenAI/GDM’s historic IMO gold medal, which has more to it than meets the eye, yet another SOTA model coming from China, and Decart’s unique proposal for real-time video editing.
We also discuss Meta’s AI-optimized concrete and CNBC’s report on the effect of AI on layoffs.
Finally, we take a look at an AI-powered OCR tool, Perplexity Comet’s impressive agentic capabilities, and how video generation models are transforming the ad market.
Enjoy!


VIDEO GENERATION
Decart’s New LSD Model is Incredible

Several months ago, I discussed a company that was attempting to create playable video games by utilizing next-frame prediction.
Similar to how Google’s Genie model worked, it would take a concatenation of past frames plus a user’s action (what to do next) and generate the following frames consistent with those past frames and the desired action (i.e., if the user wanted to turn right, the model would create frames depicting the world after moving to the right).
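To make that mechanism concrete, here is a minimal, purely illustrative Python sketch of such a generation loop. The `NextFrameModel` class, its `predict` method, and the action names are hypothetical stand-ins, not Decart’s or Google’s actual API.

```python
import numpy as np

class NextFrameModel:
    """Hypothetical stand-in for a trained next-frame prediction model."""
    def predict(self, past_frames: np.ndarray, action: str) -> np.ndarray:
        # In the real systems, a neural network generates the next frame
        # conditioned on the stacked past frames and the chosen action.
        raise NotImplementedError

def play(model: NextFrameModel, first_frame: np.ndarray,
         actions: list[str], context: int = 8) -> list[np.ndarray]:
    frames = [first_frame]
    for action in actions:                          # e.g. ["turn_right", "move_forward"]
        past = np.stack(frames[-context:])          # condition on the most recent frames
        frames.append(model.predict(past, action))  # the generated frame joins the context
    return frames
```

The point of the loop is that the model never executes game code: each new frame is predicted from the previous frames plus the player’s action, and then fed back in as context.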
That same company has now announced a Live-Stream Diffusion (LSD) model that takes in a video stream and edits it to your liking, as shown in the above GIF.
TheWhiteBox’s takeaway:
With the expected progression of these models, anyone with access to the Internet will soon be capable of creating amazing edits of their family and friends’ videos, marketing campaigns, and more.
But as I was saying earlier, Decart’s efforts go beyond video editing to actual video creation.
Last week, we discussed how Grok 4 has become proficient in creating video games, making the case for the video industry to be one of the most significantly impacted by AI developments.
However, it’s one thing for models like Grok 4 to write code that brings a game to life; it’s another for models like Decart’s to take an image and build an entire, playable world around it in real time.
That is, while Grok 4 might write the code just like humans would, Decart’s real-time video generation models create the game frame by frame, without code, and in real-time (sub-40ms latency). Feels almost like magic, and it’s getting better.
FRONTIER MODELS
OpenAI and Google DeepMind Take Gold in IMO
Models developed by OpenAI and GDM achieved a landmark result in the 2025 International Mathematical Olympiad (IMO), both earning gold (although only GDM’s result was verified by the organization).
The interesting aspect here was the approach used, as both were generalist models, meaning they were not exclusively trained for this competition.
TheWhiteBox’s takeaway:
This indicates that AIs are improving at solving complex problems they have not encountered before, but the claim needs some nuance. While there’s a lot to celebrate here, let’s not forget that:
- Humans still got better results.
- We can’t be sure whether this proves AIs are genuinely intelligent or just very sophisticated pattern matchers.
Regarding the second point, it’s worth clarifying what I mean. With Large Language Models (LLMs), the question is always whether they recalled the solution or actually created it from scratch.
In fact, Google acknowledged heavy use of domain-specific training, as well as prompt-engineering scaffolding that included tips and hints, pretty much confirming my suspicions.
Let me explain. In this case, the problems were at least somewhat novel. However, these models have seen, for lack of a better term, “everything,” so it’s highly likely they have encountered similar problems before, including every past IMO exam, and there’s a strong chance they are still relying heavily on memorization.
If you’re a purist like me, this is not undeniable proof of intelligence elicited by the AIs, but a very sophisticated composition of new solutions from the plethora of previous examples that are part of the model’s knowledge.
With that said, we must not discredit the achievement, which is highly impressive. In fact, the speed at which we are progressing is dizzying. As this tweet shows, in 2021, we expected AIs not to take gold in IMO until 2043. In 2022, that number dropped to 2029, and last year the prediction was 2026.
Well, it’s 2025, and we have already done it, which, philosophical conversations about intelligence aside, proves that if the data exists, an AI can most likely learn to solve the problem.
NON-REASONING MODELS
New Qwen 3 Takes Gold, Too

Alibaba has released a new version of its flagship Qwen3-235B-A22B (meaning the model has 235 billion parameters of which 22 billion activate per prediction), and it’s quite possibly the best non-reasoning model on the planet (yes, not a week after I said the same thing about Kimi K2).
And it’s not just hype: it outperforms not only Kimi K2 but also DeepSeek V3 and even Claude 4 Opus (Opus is a hybrid model that alternates between reasoning and non-reasoning; the comparison here is against its non-reasoning mode).
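The “A22B” part points to the sparse mixture-of-experts design: a router sends each token to only a few experts, so only a fraction of the total parameters run per prediction. Below is a tiny, illustrative PyTorch sketch of top-k routing; the layer sizes and expert counts are made up for clarity and are not Qwen 3’s actual configuration.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer (illustrative sizes, not Qwen 3's real config)."""
    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # keep only the top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for t, (w_row, i_row) in enumerate(zip(weights, idx)):
            for w, i in zip(w_row, i_row):
                # Only k of n_experts run for this token, which is why only a fraction
                # of the layer's parameters are "active" per prediction.
                out[t] += w * self.experts[int(i)](x[t])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```

Scale this idea up and you get a model whose total parameter count is huge while the per-token compute stays closer to that of a much smaller dense model.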
TheWhiteBox’s takeaway:
Notice the pattern? Every top open-source model is Chinese; not a single one comes from anywhere else.
I’ve already ranted about this several times, so I’ll spare you another one. But it’s very concerning that the US is so willingly surrendering the open-source battle without a fight.
It’s also interesting that they have abandoned the hybrid approach seen in models such as Claude 4 or Gemini 2.5, instead releasing a non-reasoning model now and a reasoning-specific model soon. It’s unclear why they made this decision.
The model supports tool calling and MCP servers, so it’s a particularly strong agent (just like Kimi K2). China is killing it.
FRONTIER MODELS
New Humiliating Benchmark for AI

EpochAI has released a new version of its FrontierMath benchmark, featuring several new, challenging math problems curated and vetted by some of the world's most accomplished mathematicians.
The results are poor, but not terrible: models like o4-mini or Grok 4 (not shown) score in the double digits, but below 20%.
Additionally, as noted by the research lab, some of the problems that were solved involved correct yet unjustified assumptions, indicating significant room for improvement.
TheWhiteBox’s takeaway:
I want to emphasize that in these types of tests, the process—how the models solve the problems—matters more than the solution.
Suppose a model is fed the solutions or exposed to very similar problems. In that case, the fact that it receives top marks is irrelevant as evidence of intelligence, because it is still relying on memorization. The previous story about the IMO gold medals illustrates this perfectly.
To test real intelligence, a model should be capable of saturating the benchmark without any data contamination and without having been trained exhaustively on similar problems (again, like the IMO gold models).
However, I want to highlight a key positive aspect of these benchmarks. If models solve them, it means that, even if researchers have exposed the model to similar problems to achieve this, in the end, we are still improving models; we are still seeing AI models solve extremely challenging problems.
Put another way, we must not confuse the interpretations of the result with the fact that the result is achieved. Having models that can solve research-level maths problems is already a victory because it shows progress.
Therefore, it’s key that benchmarks focus on tasks with economic value, so labs don’t waste time saturating benchmarks with tasks no one cares about.


DATA CENTERS
Meta’s AI-Optimized Concrete
Meta has partnered with Amrize, along with the University of Illinois Urbana-Champaign and general contractor Mortenson, to develop an AI-optimized, low‑carbon concrete mix for its Rosemount, Minnesota data center.
Utilizing Meta’s open-source Bayesian optimization tools (BoTorch and Ax), the model designs concrete formulations that balance short-term and long-term strengths, achieve faster curing times, and reduce the carbon footprint.
The customized ECOPact mix achieved a 35% reduction in embodied carbon while meeting stringent performance criteria, including compressive strength, set time, workability, and surface finish. Successful slab-on-grade tests enabled its deployment in an operational section of the data center.
Interestingly, Meta has made the AI model and underlying data open-source, available via GitHub.
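For the curious, here is a minimal sketch of what such a Bayesian-optimization loop looks like with Ax’s Service API (exact keyword arguments vary across Ax versions). The mix parameters, bounds, and the scoring function below are made-up placeholders, not the formulation variables or lab measurements Meta actually used.

```python
from ax.service.ax_client import AxClient

def evaluate(params: dict) -> float:
    """Placeholder objective: in reality this would be a lab test or surrogate model
    scoring strength, cure time, and embodied carbon for a candidate mix."""
    return -abs(params["cement_kg_m3"] - 300) - 0.5 * params["slag_fraction"]

ax_client = AxClient()
ax_client.create_experiment(
    name="low_carbon_concrete",
    parameters=[
        {"name": "cement_kg_m3", "type": "range", "bounds": [200.0, 400.0]},
        {"name": "slag_fraction", "type": "range", "bounds": [0.0, 0.6]},
    ],
    objective_name="mix_score",
    minimize=False,
)

for _ in range(20):                                   # each iteration proposes a new candidate mix
    params, trial_index = ax_client.get_next_trial()  # the Bayesian model suggests the next formulation
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(params))

best_params, _ = ax_client.get_best_parameters()
print(best_params)
```

The appeal of this approach is sample efficiency: every physical concrete test is expensive, so a Bayesian model that chooses the most informative next formulation beats blind trial and error.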
TheWhiteBox’s takeaway:
One of the most interesting news stories of the week sheds light on the fact that AI is more than just LLMs, and on how astute Big AI (my way of describing Big Tech) is with its AI efforts.
AI is extensively used in their own supply chains, especially when it comes to the trickiest part of them all, the data centers. Besides optimizing concrete, Google has been using Reinforcement Learning models to manage its data centers for years, to the point that an entire company, Phaidra, spun off from these efforts.
In any situation where you have a bunch of data and want to make predictions from it, AI will most likely have a say.
JOB MARKET
AI and Layoffs. A Death Duo?
A CNBC article casts a skeptical eye on the recent wave of layoffs, suggesting that AI may be playing a significant role, especially given where the cuts are concentrated in most of these companies: HR and customer support, areas where AI’s benefits are clear.
TheWhiteBox’s takeaway:
While I don’t doubt that AI is contributing to layoffs, I believe its effect is highly overblown, because executives are overestimating AI’s real current capacity for disruption. As I mentioned on Sunday, AI agents are already ‘impressive’ enough for ‘impressionable’ executives to treat this as the turning point, the moment AI has arrived.
The reality is quite different: AI agents still struggle with long-horizon tasks, the kind of tasks enterprises actually need automated.
That said, I understand that enterprises have to explore these tools extensively and start envisioning a future where most of their internal tools and jobs have ‘logic’ outsourced to AI.
Put simply, the more of your internal workings you eventually outsource to AI, the more operationally efficient you’ll be, because AI costs trend downward and will eventually drop to basically zero.
Enterprises’ most significant opportunity is not adopting AI to offer yet another AI product or service; it’s using AI to trim costs and thus become more competitive (by dropping prices). The more I examine this, the more I favor the idea that AI will be highly deflationary across all industries.
AI will not only drive higher productivity but also lower prices, which is huge for a world economy that is severely in debt, because it acts as a cooling mechanism that makes widespread spending cuts possible.
I am well aware that inflation helps reduce the cost of servicing that debt, so the picture isn’t entirely uniform. But I believe most Western countries are facing years of painful cuts and desperately need to lift purchasing power before those cuts begin.
Just to give you an example of the dire situation of some of the West’s largest economies: the Spanish government’s social security reserve fund, the liquidity ‘piggy bank’ it keeps to pay entitlements and benefits if contributions fall short, is measured not in years or months, but in days.
In fact, in the purest sense, it’s negative. Spain’s ‘contribution gap’, the difference between the money collected in social security taxes and the money paid out in pensions and other benefits, shows a yearly shortfall of an astonishing 63 billion dollars, a 27% deficit.
In plain English, that means a quarter of my dad’s pension (and soon, my mom’s) is paid with government debt. And for a country with no monetary sovereignty, that is a death sentence waiting to happen for us Spaniards.
France’s situation is far worse; it has been forced to announce a grim 51 billion dollars per year in spending cuts, all amid growing pressure to raise defense spending to a frankly unattainable 5% of GDP (not even the US spends that much).
In a nutshell, governments urgently need massive deflation so they can justify the huge spending cuts they will eventually have to make, and to offset some of the pain that pensioners, public workers, and citizens in general will face when push comes to shove.
Will AI arrive in time? For some, maybe; for others, it’s already too late. And where will ‘smart money’ go?
Simple, in three directions:
1. Service businesses with brand equity, acquired by PE and then “repackaged and sold.”
2. AI hardware and model companies, which will accrue most of the value of AI (not application-layer companies, except for agent-tooling startups).
3. AI-resistant assets like land, gold (if gold fusion doesn’t destroy gold’s value), lifestyle and leisure businesses, or AI compute.


OCR
LlamaParse OCR Works Great
Although models like GPT-4o, Gemini 2.5 Flash, and other multimodal models extract content from images very well, they can still struggle with certain layouts, tables, languages, or specific fields.
I’ve been testing LlamaParse by LlamaIndex, and I must say I’m convinced. I’m planning to integrate it into my OCR pipelines, extracting all the text first and then sending it to the LLM in text-only form, which should significantly reduce mistakes. This is not a sponsorship, just a product that works, and you can test it for free at this link.
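For reference, a rough sketch of that text-first pipeline. The file name, API keys, model choice, and extraction prompt are placeholders; check LlamaParse’s and your LLM provider’s docs for the exact options.

```python
from llama_parse import LlamaParse
from openai import OpenAI

# Step 1: parse the document into plain text/markdown (OCR + layout handling).
parser = LlamaParse(api_key="llx-...", result_type="markdown")  # "text" also works
documents = parser.load_data("invoice.pdf")
full_text = "\n\n".join(doc.text for doc in documents)

# Step 2: send only the extracted text to the LLM, not the original image/PDF.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract the vendor, date, and total as JSON."},
        {"role": "user", "content": full_text},
    ],
)
print(response.choices[0].message.content)
```

Keeping the extraction step separate from the reasoning step is the whole trick: the LLM never has to squint at pixels, it only has to interpret clean text.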
TheWhiteBox’s takeaway:
I’m fairly certain this system still relies on LLMs under the hood, but with the appropriate scaffolding and, most likely, some fine-tuning.
If you’re a regular reader of this newsletter, it’s no secret to you that even small models, given the appropriate prompt engineering and fine-tuning, can reliably outcompete even frontier models on narrow tasks.
AD CREATION
With Good Prompting, Video Models are Ad Generators

An AI prompter has released a series of incredibly impressive Veo 3 videos that could very well pass for professional advertising. The link includes the prompt used.
TheWhiteBox’s takeaway:
Are ad agencies doomed? The answer is it depends. This looks really good, but it’s just your average IKEA-type ad.
AI models can be creative within the boundaries of their knowledge. If the GIF above looks familiar, it is because that precise ad was already created in the past, and the AI retrieved it from its knowledge and recreated it.
AIs can’t create things they have not seen. At best, they can compose ‘known knowns’ to create new stuff, but this ‘new stuff’ is always a composition of two or more already-existing things.
If you want an AI to create something entirely new (i.e., a wholly original and innovative ad, unlike anything we have ever seen before), it won’t be able to.
That said, it’s safe to say that most of the new ads you see these days are copies of the same ten things ad marketers know work, so it’s a pretty unoriginal field, perfect for AI to make a substantial dent.
So, are ad agencies doomed?
Most are, but those that can create original content will be just fine (and handsomely rewarded, as everyone else recycles the same AI-generated ideas and people start to yearn for human creations).
DECLARATIVE BROWSERS
Perplexity Comet’s Impressive Agentic Capabilities
People are starting to share interesting examples of what you can do with Perplexity’s Comet browser, allegedly the first agentic browser.
The underlying model appears to be capable of interacting with the websites on your screen, performing several actions based on your commands.
TheWhiteBox’s takeaway:
We’ve been discussing the declarative paradigm (where you declare what you want done and let AI figure out how to do it, rather than doing it yourself) for quite some time, and we are finally starting to see products emerge from this principle.
ChatGPT Agent, Perplexity Comet, Manus, Genspark: the number of agentic tools is growing fast. Still, all of these products suffer from the same issue: they only work well on short-horizon tasks, where the productivity gains (the amount of time saved) aren’t substantial, if they exist at all.
However, I trust these tools will extend their task horizon; they are clearly the predecessors to real agents that can handle entire jobs, they simply aren’t ready yet.
Closing Thoughts
What a week of “historical achievements”!
Everywhere you look, AI models are getting better, more diverse (concrete optimizers), and even gold medalists!
But if there’s one thing I want you to take away from all this, it’s that, despite the incredible results some AIs are obtaining in humanity’s hardest exams, the same limitations remain: most of AI’s impressive capabilities come not from making AIs smarter, but from making them appear smarter thanks to higher-quality training data.
Simply put, you can make a teenager appear much brighter by having them memorize PhD-level problems instead of those expected at their age, but that doesn’t mean the kid is now more intelligent; we have just made them memorize ‘smarter data’.
Appearances matter a lot in our world!
Hold on to that thought, for this industry is going to try to fool you not once, not twice, but as many times as it can, to make you see AI not for what it really is, but for what it appears to be.


For business inquiries, reach out to me at [email protected]