AI Hackers, Weird Watch Phenomena, & More


THEWHITEBOX
TLDR;
This week, we have a lengthy list of important news worth covering: from Google’s landmark release of AlphaGenome to an AI super hacker, the rise of model merging, Replit’s insane revenue growth, and Zuckerberg’s shopping spree at OpenAI.
Finally, we also examine new products like Gemini CLI, the unusual behavior of AI models with watches, and how video tools can revive memories.
Enjoy!
THEWHITEBOX
Things You’ve Missed By Not Being Premium
On Tuesday, we explored amazing research and models emerging from Japan and China, Google’s new robotics model, and a planetary-scale inference stack.
We then moved on to discuss Thinking Machines’ strategy, Anthropic’s court win, and new products from ElevenLabs and Google, among other important news.


HEALTHCARE
Google Open-Sources AlphaGenome

Google has released another DNA-based model, called AlphaGenome.
The system ingests stretches of DNA up to one million base pairs long, meaning the model’s token granularity goes down to the individual nucleotide (every three consecutive nucleotides encode one amino acid, the basic building blocks of proteins and essential for life).
It simultaneously predicts thousands of regulatory properties, such as gene expression levels, chromatin accessibility, 3D contacts, splice junction usage, and more, at single-base resolution.
The advance matters because most disease-associated variants lie in the 98% of “non-coding” DNA whose function has remained opaque. For reference, Nature’s news coverage hails it as a way to probe the genome’s “dark matter”, likening the leap to what AlphaFold did for protein structure. And let’s not forget that that work led to a Nobel Prize.
TheWhiteBox’s takeaway:
Before we answer what the primary goal of AlphaGenome is, let’s first understand how it works because, surprisingly, the similarities with ChatGPT-type models are uncanny.
In fact, they are, in essence, the ‘same’ model just applied differently, or at least both are based on the same algorithmic principles.
AlphaGenome, like ChatGPT, is a sequence-to-sequence model that maps a sequence of input tokens to a sequence of output tokens. Here, tokens are the base-pairs in a DNA genome, while in the case of ChatGPT, the tokens are words/subwords.
Thus, tokens are the simplest unit of semantic information that the sequence carries.
Just like words carry ‘meaning’, base pairs do too. And just like words share information with each other to build a sentence’s meaning, base pairs do the same. Put another way, the order and presence of these tokens affect other tokens, just like ‘red’ in ‘the red carpet’ affects the meaning of ‘carpet’ (makes it red).
Hence, what AlphaGenome does (in the same way as ChatGPT) is make these base pairs talk to each other, so as to find patterns such as ‘when this base pair follows that base pair, this happens.’
In the case of AlphaGenome, ‘this happens’ could refer to illnesses and other aspects we are very interested in understanding.
This is why Google describes the model as ‘sequence-to-function’: by understanding the properties and dependencies of the genome sequence, it maps them to biological functions.
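To make the analogy concrete, here is a minimal, hypothetical PyTorch sketch of the sequence-to-function idea: nucleotide tokens go in, self-attention lets positions exchange information (the base pairs “talking to each other”), and a head predicts a value per base for each regulatory track. This is not AlphaGenome’s actual architecture; the layer sizes, track count, and names are invented for illustration.

```python
import torch
import torch.nn as nn

# Toy illustration of a "sequence-to-function" model: DNA tokens in,
# per-base regulatory predictions out. NOT AlphaGenome's real architecture;
# sizes and track names are invented for clarity.

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}   # one token per nucleotide
N_TRACKS = 4                                # e.g., expression, accessibility, ...

class ToySeqToFunction(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)                 # token -> vector
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)          # bases "talk" via attention
        self.head = nn.Linear(d_model, N_TRACKS)                       # per-base predictions

    def forward(self, tokens):              # tokens: (batch, seq_len) integer IDs
        x = self.embed(tokens)
        x = self.encoder(x)                 # attention mixes information across positions
        return self.head(x)                 # (batch, seq_len, N_TRACKS)

seq = "ACGTACGTAC"
tokens = torch.tensor([[VOCAB[c] for c in seq]])
preds = ToySeqToFunction()(tokens)
print(preds.shape)   # torch.Size([1, 10, 4]): one prediction per base, per track
```

The real system obviously has to scale this idea to the million-base-pair windows mentioned above, which is far from trivial, but the core mechanism is the same one ChatGPT uses over words.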
There’s a reason AlphaFold gave the AI industry its first-ever Nobel Prize in Chemistry (awarded to Demis Hassabis, Google DeepMind’s CEO). Now, AlphaGenome tries to deepen our understanding of a new area of biology and, with that, opens the door to curing many illnesses. And who knows what else.
CYBERSECURITY
XBOW’s AI Surpassed All Humans in HackerOne

Another day, another ‘job’ where AI is kind of better than us already.
XBOW, a cybersecurity company, has announced that its autonomous hacker AI bot has surpassed all humans on the HackerOne platform as the number one hacker, measured by the number of identified vulnerabilities in large company software.
As shown in the graph above, the model’s reputation on the platform has skyrocketed to the top in just nine months.
The tool, a fully autonomous AI penetration tester, can identify zero-day vulnerabilities at an unprecedented scale. The platform also includes verifiers, which can be other Large Language Models (LLMs) or programmatic engines, to automatically confirm that the identified vulnerabilities are real.
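XBOW hasn’t published its internals, so what follows is only a hypothetical sketch of the propose-and-verify pattern that description implies: an agent proposes candidate findings, and a programmatic verifier has to reproduce each one before it counts. Every name and check below is invented.

```python
from dataclasses import dataclass

# Hypothetical sketch of a propose-and-verify loop for an AI pentester.
# Nothing here is XBOW's real code; names, payloads, and checks are invented.

@dataclass
class Finding:
    url: str
    payload: str
    claim: str          # e.g., "reflected XSS in the 'q' parameter"

def propose_findings(target_url: str) -> list[Finding]:
    """Stand-in for an LLM agent scanning a target and proposing candidates."""
    return [Finding(url=f"{target_url}/search?q=",
                    payload="<script>alert(1)</script>",
                    claim="reflected XSS in 'q'")]

def simulate_request(url: str) -> str:
    """Toy stand-in for an HTTP request against a sandboxed copy of the target."""
    return "<html>results for <script>alert(1)</script></html>"

def verify(finding: Finding) -> bool:
    """Programmatic verifier: replay the payload and inspect the response.
    A real verifier would issue the request and check for unescaped reflection."""
    response_body = simulate_request(finding.url + finding.payload)
    return finding.payload in response_body   # unescaped reflection => likely exploitable

confirmed = [f for f in propose_findings("https://example.test") if verify(f)]
print(f"{len(confirmed)} verified finding(s)")   # only verified issues get reported
```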
AI will introduce bugs through the clumsy code of novice coders, but that doesn’t mean other AIs can’t catch and correct them.
TheWhiteBox’s takeaway:
Cybersecurity revolves a lot around the idea of pattern matching: when software deviates from what’s expected (doesn’t follow certain practices), it’s likely a bug, or worse. Thus, it’s not surprising that AIs are great at it.
Are pentesters the new software engineers, in the sense that they will be rapidly replaced?
As with all jobs, the answer is always the same: it depends. The best hackers will keep their jobs, because some bugs or cybersecurity threats will be too far out of distribution (too different from anything seen before) for an AI to identify.
But for those issues that are common and seen by an AI, there’s no reason to believe AI won’t identify them. And faster than us.
Thus, these AIs will soon displace mediocre hackers, who will offer similar or worse performance while, crucially, AI costs continue to decline.
At this point, it becomes an economic question. Why will AI costs decrease?
Let’s not forget that AI is built on top of two things:
Models
Energy
Models are rapidly commoditizing, and you know what happens to commodities: price is all that matters.
That’s the reason OpenAI cut o3’s price by 80% from one day to the next, and why ChatGPT’s first model, GPT-3.5, was hundreds of times more expensive than today’s top models despite being much worse.
Consequently, the price per token, the primary model economic metric, will continue to decline.
As for the latter, energy, not only will it continue to get cheaper thanks to AI-driven pressure, but it also faces intense deflationary pressure because it’s one of the main drivers of inflation, and people suffer when it rises.
Therefore, the world is pushing both token and watt prices toward zero, which means AI prices will fall to practically zero too, and that is why AI will have a huge deflationary effect across many industries.
So, whether you retain your job in an AI world won’t be a business-case question; that’s a lost cause. It will depend only on whether the AI can do the job. And if it can (which is saying something, because many jobs have edge cases no AI can dream of handling right now), then, unless anti-AI regulation saves you, you’re doomed.
FRONTIER RESEARCH
The Rise of Model Merging

TML’s CEO and Co-Founder: Mira Murati
Two weeks ago, while discussing the most exciting research, I mentioned model merging (I also explored other fascinating approaches) as one of the techniques gaining traction among labs, from Sakana to Cohere.
Now, it seems this will be one of the crucial techniques used by Thinking Machines Lab, the hot startup founded mostly by star OpenAI researchers, to bring its vision to fruition.
Model merging is essentially a way of combining parts of different models to build a new, more powerful, and more general model: a synergistic approach where the whole is greater than the sum of its parts (for more details, click the previous link).
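To ground the idea, here is a minimal sketch of the simplest flavor of merging: a weighted average of the parameters of two fine-tunes that share the same architecture (“model soup” style). Labs use far more sophisticated recipes (TIES, DARE, Sakana’s evolutionary merging), and TML’s actual approach isn’t public; the two “fine-tunes” below are stand-ins so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Minimal sketch of the simplest form of model merging: a weighted average of the
# parameters of two models that share the same architecture ("model soup" style).
# Real recipes are more selective about which parameters survive, but the core
# idea is the same. The two "fine-tunes" below are stand-ins; in practice you'd
# load real checkpoints of the same base model.

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys and shapes."""
    return {name: alpha * sd_a[name] + (1 - alpha) * sd_b[name] for name in sd_a}

# Stand-ins for two fine-tunes of the same base architecture (e.g., math vs. code).
finetune_math = nn.Linear(16, 16)
finetune_code = nn.Linear(16, 16)

merged_model = nn.Linear(16, 16)
merged_model.load_state_dict(
    merge_state_dicts(finetune_math.state_dict(), finetune_code.state_dict(), alpha=0.5)
)
print(sum(p.numel() for p in merged_model.parameters()))  # same size as either parent
```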
I guess I’ve earned some bragging rights for the prescience and timing of my publication.
Continuing my round of self-praise, as I previously discussed, TML also appears to be betting on customized Reinforcement Learning as another of its bread-and-butter techniques.
Essentially, it involves applying RL (trial-and-error learning) to specific areas of interest, such as an industry or a single business metric. This is the primary method behind some of AI’s most successful use cases, like Deep Research, but applied to a broader, more business-centric set of use cases.
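As a toy illustration of what “RL on a business metric” means, here is a minimal sketch in which the reward is simply a simulated conversion rate and the agent learns which action maximizes it by trial and error. Everything here (the actions, the numbers) is invented, and real pipelines apply the same idea to LLM policies rather than a three-armed bandit.

```python
import random

# Toy sketch of "customized RL": the reward IS the business metric you care about.
# The agent picks one of three email subject lines and learns, by trial and error,
# which one maximizes a simulated conversion rate. All numbers are invented.

ACTIONS = ["subject_a", "subject_b", "subject_c"]
TRUE_CONVERSION = {"subject_a": 0.02, "subject_b": 0.05, "subject_c": 0.03}

def business_metric(action: str) -> float:
    """Simulated environment: 1.0 if the user converts, else 0.0."""
    return 1.0 if random.random() < TRUE_CONVERSION[action] else 0.0

value = {a: 0.0 for a in ACTIONS}   # running estimate of each action's payoff
count = {a: 0 for a in ACTIONS}
epsilon = 0.1                       # exploration rate

for step in range(20_000):
    if random.random() < epsilon:
        action = random.choice(ACTIONS)          # explore
    else:
        action = max(ACTIONS, key=value.get)     # exploit the current best guess
    reward = business_metric(action)
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]   # incremental mean

print(max(ACTIONS, key=value.get))  # should converge on "subject_b"
```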
Combined, they could mark the advent of a totally new approach to AI, one that rejects the idea of a ‘God AI’ or a single model for all, and instead envisions progress as a cumulative effort between individualized models.


VIBECODING
Replit ARR Increases 10x in 6 Months

Amjad Masad, Replit’s Founder and CEO
Replit has announced that its annual recurring revenue (ARR) has increased to $100 million in just six months, up from $10 million at the end of 2024.
This awe-inspiring growth is driven by the Replit Agent, which we discussed nine months ago when it was released.
This vibe-coding agent enables you to build entire apps using text or your voice, with no coding knowledge required. Now, Replit joins the select group of AI companies displaying some of the fastest revenue growth in history.
TheWhiteBox’s takeaway:
Surprisingly, I’m bearish.
I stand by what I have been saying lately: these AI revenues will be short-lived. Vibe-coding is clearly in demand, which makes it a desirable market for model-layer companies like OpenAI, Anthropic, or Google, which will happily compete with these much smaller startups.
And when push comes to shove, I don’t believe Replit or any other vibe-coding company will manage to compete with these guys. Therefore, Replit appears to be the type of company that will either get acquired or outcompeted.
And with the massive adoption that background agents (software agents running in the background) are experiencing, I don’t see a world where these guys don’t eventually deploy their vibe-coding platforms and eat the Replits and Lovables of the world for breakfast.
For instance, incumbents already dominate the background-agent market, and it’s only a matter of time before they move into the no-code app-building market.

AI WARS
Zuck’s OpenAI Shopping Spree
According to the Wall Street Journal, Meta has successfully poached three top OpenAI researchers, including Lucas Beyer, all from OpenAI’s Zurich office.
All three are widely renowned computer vision researchers, behind some of the industry’s most important papers and models, such as PaliGemma.
Since Llama 4’s flop back in April, Zuck has gone on a massive buying spree, taking a large stake in Scale AI and attempting to poach researchers with offers that, in the words of Sam Altman, may have reached $100 million in signing bonuses, with total annual compensation above that.
AI researchers are now quite literally movie-star-caliber workers.
TheWhiteBox’s takeaway:
Again, another great example of bubbly behavior from the AI industry.
But if you’re really AI-pilled and believe AI is the defining factor of your success as a company, hiring key talent that can deliver on that promise for a fraction of a fraction of your yearly cash flow is a logical thing to do.


RESEARCH AND PRODUCT
The Weird Things Watches Say About Models
Recently, I came across an interview claiming that image-generation models always draw watches showing 10:10 and are incapable of displaying other times.
I was surprised and tested it.
And, well, in my case, the claim holds. As shown below, the watch does not show the time I requested and instead shows approximately 10:10:

I insisted on a couple more tries, to no avail (others have managed to generate the correct times, but it requires prompt engineering when it shouldn’t). I then tried with Grok’s image generation, simply asking to draw a watch.
And guess what time it showed:

And you may wonder, why?
TheWhiteBox’s takeaway:
To me, this is a neat way to illustrate how AI models, at their core, struggle severely to generate content that departs from their training data.
The ‘obsession’ these models have with that particular time isn’t a real ‘obsession’; it’s a training-diversity issue. Most watch photos out there are published by watch companies and consistently show that time because it’s considered the most aesthetically pleasing. Therefore, most watch images in the model’s training set display that particular time.

This is what Google showed me after searching for ‘watch’. All show 10:10 except, well, you know.
And as we have mentioned several times, AI models are highly constrained by their training data because they were trained through imitation.
If you train a model to imitate a training set, wouldn’t you expect that the model, when asked to generate data, would generate similar things to what it saw during training?
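Exactly. Here is a toy sketch of the statistical point: if a ‘generator’ does nothing but imitate a training set in which nearly every watch shows 10:10, sampling from it will almost always produce 10:10. The proportions below are invented, but they capture the skew.

```python
import random
from collections import Counter

# Toy illustration of why pure imitation collapses onto the training mode.
# Pretend we scraped 10,000 watch photos; the proportions are invented, but
# product photography really does overwhelmingly show ~10:10.

training_times = ["10:10"] * 9_600 + ["3:45"] * 150 + ["7:20"] * 150 + ["12:00"] * 100

def generate_watch_time() -> str:
    """A 'generator' that purely imitates its training distribution."""
    return random.choice(training_times)

samples = Counter(generate_watch_time() for _ in range(1_000))
print(samples.most_common(3))
# Roughly [('10:10', ~960), ('3:45', ~15), ('7:20', ~15)]:
# ask it for "a watch" and you'll almost always get 10:10.
```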
The takeaway is that this is a visual way of observing what is, to me, AI’s most significant limitation and would-be greatest achievement: going out-of-distribution (OOD). In this case, if the training data shows 10:10 watches all the time, that means understanding what a watch is and what the passage of time is, and thus learning that watches display other times besides that one.
OOD typically requires something truly different from what the model has seen. But if the model is truly overwhelmed with 10:10 watches, showing other times counts as OOD in my book, because it really is very different from anything it has seen before, as incredible as that may sound.
Put another way, to current models a watch is nothing but an object that shows 10:10 (which means they don’t really know what watches represent), even though they can trick you into believing they understand watches by reciting every single fact about them.
Fine, but when it comes to proving they know what they claim to know, all you see is a model that has not internalized what watches really are.
VIDEO GENERATION
VibemotionAI Looks Very Good
A new AI startup has emerged from stealth mode with Vibemotion, a tool that creates videos from assets, such as those used to explain a research paper. The link includes a video demonstration, and you can sign up for the waitlist.
I’m not sponsored or anything, I’m just showing it because it’s something I would consider using.
TheWhiteBox’s takeaway:
I’m not particularly interested in the resulting product (an explanation video), but the fact that the model generates impressive visuals, similar to what Napkin AI (a text-to-visual tool) offers, means the product does solve an actual problem.
CODING
Google Launches Gemini CLI

Google was late to the party, but it has arrived.
Just like Anthropic and OpenAI did earlier, it has launched a CLI coding agent (one you can interact with via the command line of your computer).
This form factor has become one of the most important revenue generators for these companies, and now Google is offering an extremely ambitious free plan for the tool, with up to 1,000 free daily requests, which is absolutely insane; the tool is basically free.
TheWhiteBox’s takeaway:
I’ve been testing the tool for the last couple of days, and it’s really, really good. For starters, it’s much faster than Claude Code and, importantly, basically free unless you’re a heavy user.
Google appears poised to wage a tough battle against its competitors via a price war, one that OpenAI may struggle to match and that Anthropic likely can’t.
We often assume OpenAI is Anthropic’s biggest enemy, but it’s not; it’s Google, which could render Anthropic’s entire business worthless (OpenAI has too much distribution and brand to be eliminated at this point).
At the current pace, AI could soon have us all living in a Google world.
VIDEO GENERATION
MidJourney Surprises For Good

MidJourney’s first video model has been officially benchmarked, and the results are pretty positive, ranking the model fifth overall, behind only the best Chinese models and Google’s Veo 3.
It is capable of doing some extraordinary things that I suggest you try; it has become very proficient at creating these videos. In the GIF above, Alexis Ohanian, Reddit’s founder, gave MidJourney a photo of him and his late mother, and the result is pretty touching, I have to say.
It’s also, very clearly, a ‘false memory’: an animation, not a reenactment of the past. Please be very careful about what you make of these models.
That said, if you have some cherished memories saved as photographs, you might want to consider giving this model a try.
In a similar fashion, Higgsfield has also released an image-to-video model that looks very good, too.
TheWhiteBox’s takeaway:
In all honesty, I couldn’t care less about the category; I’m not a video-generation guy. Besides, if you examine the generations, they all feel like hyper-realistic fantasies; something just doesn’t feel right.
With that said, we won’t need much better models to pose a serious threat to Hollywood, as these issues are less and less visible; my criticism is more around this notion that these models understand the world.
False, they don’t.
They do imitate the world fairly well, but clearly don’t understand cause and effect, simply because they generate outright impossible moments.
But you may ask: do we understand cause and effect?
For the most part, yes, our world model guarantees that, but it’s safe to say we can’t fully predict every movement of every muscle as a horse sprints. But here’s the thing: we don’t need to; we just need to predict what will happen (the horse will continue moving forward).
And this is why I feel increasingly compelled by Meta’s view that world models can never be generative (meaning video-generation models will never understand the world), because generation forces them to model every single detail, including the irrelevant ones (like the exact angle of the horse’s calf muscle), to predict what the horse will do next.
I apologize for the long rant, but here’s the takeaway: video-generation models will be helpful in films and videogames, but I don’t believe they hold the key to world models (which is the real objective of these labs with these models).

THEWHITEBOX
Join Premium Today!
If you like this content, by joining Premium, you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.
Until next time!
Give a Rating to Today's Newsletter
For business inquiries, reach out to me at [email protected]