- TheWhiteBox by Nacho de Gregorio
The Second 'ChatGPT Moment' is Here


FUTURE

While I’m not their biggest fan, OpenAI has just made history.
They have presented GPT-4o’s native image generation, allowing their model to express itself not only with words, but also with images (this is not what it was doing earlier, as you’ll learn today).
The lives of millions of people, not only artists and designers but workers in many other jobs, have changed forever. And most of them aren’t even aware of it yet.
Today, we’ll explore the following:
A variety of examples that illustrate how industry-leading these results are,
Why this is a pivotal moment for AI multimodality, double-clicking on their revolutionary approach (OpenAI has innovated significantly, and this time is notably more transparent about it),
Why this represents an extinction event for many AI startups, even billion-dollar ones,
How will it impact the lives of millions, and what can you do if you are one of them?
In the future, we will look back on this week the same way we look back on ChatGPT’s launch in 2022 (or even more so), as a seminal moment for technology.
Here’s why.
GPT-4o Native Image Generation
Okay, hold on. What has actually happened?
In short, the headline is that GPT-4o can now generate images natively. The keyword here is ‘natively’ because what ChatGPT was doing earlier is not what it’s doing now.
For the first time, AIs can express themselves using images. But what does that even mean?
As they say, one image is worth a thousand words, so I’ll let the images do the talking. The first powerful capability to note is that the model can transform the style of any image into the one you request.
For instance, it can transform a pixelated character into any style you request while respecting every minor detail, such as pose, colors, and even facial expressions:
But if that’s impressive to you, we are just getting started. It can also adapt visual templates to your product or theme. Below, the model takes in a cosmetics ad and transforms it into a bakery one.
In just one prompt, an advertisement for your bakery business. Of course, you may wonder:
The user must be trading control for speed and ease of use, right? Well, not quite, as you can send the model clear spatial instructions, which it follows closely:

I mean, your ad campaign image is really one prompt away:

But what if you want to crop objects out of an image? No problem, you’re just a prompt away from that too:

But what if you want to bring your drawings to life? ChatGPT has you covered here, too.

Impressive, right? But maybe you want the opposite: to see yourself as a Cars film character (notice how the car’s expression is eerily similar to that of the guy on the right, just as the user asked):

But we can go even crazier. People are literally creating films:
A Severance scene, according to several different animation styles
The Lord of the Rings in Ghibli style
These films are created using GPT-4o as the frame generator, with tools like Kling or Hedra then animating those frames into films.
And this is just the tip of the iceberg of what people are doing. Fascinatingly, GPT-4o image generation can also take HTML/CSS code as input and generate the resulting UI:

In other words, you can simply hand over your code, and the model will generate an image of the user interface that adheres closely to it.
Do you realize how impressive that is?
Furthermore, the model is also great at generating text on images, a tough problem that has bamboozled researchers for years:

And on and on.
I mean, I could continue for the remainder of this newsletter, but I think you get the point:
GPT-4o image generation has just elevated what can be done with AI by several orders of magnitude.
While ChatGPT democratised access to AI assistants, it offered little meaningful change to our lives beyond interesting conversations. GPT-4o image generation, by contrast, is actually life-changing for many jobs and industries.
Now, anyone can create impressive images, ads, films, outpaints, you name it, just using natural language prompts.
But wait.
If you’re into AI, you know that image generation has existed for years. Before, it was simply a fun tool, and a professional one only if you were already skilled at image/video editing, giving you okay-ish generations such as the one below:

Back then, tools like Photoshop or Figma were mandatory to do anything remotely valuable. But now, with GPT-4o image generation, that same prompt will give you this:

What has OpenAI done to make such a tectonic shift?
Simply put, as legendary companies like Bell Labs once did, they did not improve something existing; they redefined the technology and what can be done with it.
What this “Thing” Really Is
For once, OpenAI has shared information on what they did to achieve such industry-defining results. And they did so in the most OpenAI way possible, with an image generated by the new GPT-4o:
So much information in a single image. Most of it will sound like gibberish to you, but let me translate what this image says:
“Everyone was doing images one way. Well, they are wrong, and we are about to turn this industry upside down and, in the process, kill all image generation companies that aren’t us, Google, xAI, or Anthropic.”
Sounds bold, but that’s precisely what that single image implies. Here’s why.
