THEWHITEBOX
Welcome to the Acceleration

Things are going faster than I expected. It’s not even three months since the prevailing view was that AI-generated code was “slop,” and we’re now seeing the narrative shift completely. It turns out, the researchers who were warning us and were laughed at simply had access to models we did not.

But now we have access to them, and they were indeed not exaggerating. In the past week, I’ve built two entirely new apps and made insane progress on a third one I’ve been working on for quite some time, with literally zero lines of code written by me. All AI.

Building software is now more about knowing what you want and how to convey it than about knowing how to do it, to the point that seasoned developers are proclaiming their “most productive day ever” one day after the previous one.

The era of declarative software that I’ve been promoting for more than two years is now upon us. I can’t hide my excitement, but we need to be realistic too.

This article covers what AI is today and what it isn’t, my personal guide to vibe-coding apps, and where all this takes us.

Let’s dive in.

Why The Acceleration is Here

Ever since OpenAI’s GPT-5.2 Codex and especially Anthropic’s Claude Opus 4.5 dropped in the last innings of last year, the speed at which you can build software has skyrocketed.

It Just Works.

It’s not that models are faster; it’s that they simply work. If you write the correct prompt, it’s very likely they'll get you something that is at least 70% of the way there. In some cases, if the project is simple enough, you can get the models to one-shot it.

The key seems to be that AI Labs have finally cracked the code of long-horizon execution; the models can now work for several minutes, sometimes hours (see diagram below), and still execute decisively.

The diagram below shows the longest task LLMs can execute a given percentage of the time, measured in human hours (how long it would take an expert human to do the task). If you demand an 80% success rate, the longest task falls to ~1 hour, which is still impressive.
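For intuition on how a horizon number like that is derived, here is a minimal sketch in the spirit of METR-style measurements: fit a logistic curve of success probability against the log of task length, then invert it at whatever success rate you demand. The data points and fitting details below are illustrative assumptions, not the actual measurement pipeline.

```python
# Minimal sketch: deriving a "task horizon" from (task length, success) data.
# The data here is made up; real measurements use many tasks per model.
import numpy as np
from sklearn.linear_model import LogisticRegression

task_hours = np.array([0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
succeeded = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0])

# Success tends to fall off roughly log-linearly with task length.
X = np.log2(task_hours).reshape(-1, 1)
clf = LogisticRegression().fit(X, succeeded)

def horizon(p: float) -> float:
    """Longest task length (in hours) the model completes with probability p."""
    logit = np.log(p / (1 - p))
    x = (logit - clf.intercept_[0]) / clf.coef_[0, 0]
    return float(2**x)

print(f"50% horizon: {horizon(0.50):.1f}h, 80% horizon: {horizon(0.80):.1f}h")
```

The point of the sketch is just the shape of the math: the stricter the success rate you require, the shorter the task you can trust the model with, which is why the 80% horizon sits well below the 50% one.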

Importantly, prompting is no longer about holding the model’s hand all the way; it’s about getting the one prompt you are going to send them right. You are no longer defining constraints, methodologies, or best practices; you are defining goals.

Prompts are now works of art in a way; they take hours to write because they force you to really “see” what you want to build in advance. They are iterative, and funnily enough, when done well, this part of the process takes longer than the code writing itself.

In fact, one way to know you’re doing your job is that, from start to finish, the prompt-building part took probably more than 75% of the total building time, with the remaining 25% being the AI actually writing code and your UAT testing.

To prove the point, the two images below show a prompt I wrote a few weeks ago for Claude Code (I’ve since moved on to another tool, more on that later).

How long is it? 1,837 words. A pain to write, but boy did Claude Code execute the refactoring well.

As we’ll learn later, I nowadays approach prompting differently to make it more manageable, but the point still stands: prompting is now very, very important, and being good at it will take time and requires effort.

Critically, this is not human substitution; it’s acceleration.

You are going to see (and will have seen) a lot of gibberish around AIs taking over the world. No.

If anything, these AIs are more tools than they’ve ever been; they are extremely independent while, in a weird way, also extremely dependent on your guidance. They are great executors, but desperate for your command.

In a nutshell, AIs are finally tools that really empower you to do something about the ideas you’ve been rambling on about for some time. It’s a technology that rewards action like nothing we’ve ever seen.

But it isn’t magic.

What it isn’t

As incredible as it may sound, I’m still not convinced at all that these models are actually intelligent. Or let me put it another way, whatever “intelligence” these things have, it’s not human.

Which is to say, if we frame intelligence in the human way, these are still next-word predictors. They are flawless in some regards, still stupid as a rock in others.

They are the ‘cognition imitation’ equivalent of calculators: capable of doing things humans never could, executing complex tasks at insane speeds, while also being just that, next-word predictors, in the same way a calculator can compute flawlessly while still remaining a piece of plastic that won’t drive your kids to school or do basically anything that isn’t computing calculations.

But what if we are simply developing a new type of intelligence?

One that is ‘statistical’, meaning it isn’t bound by the computational constraints humans face (we can’t just try everything 1,000 times) and gets to the same outcome, just more inefficiently? But I digress; I haven’t thought about this remotely enough, and it isn’t today’s topic, so I’ll leave it here as a thought for you to entertain.

As I mentioned earlier, these AIs are quite needy. They are designed to help you, quite desperately, actually. That doesn’t mean things can’t go very wrong, to the point that the agent can actually conspire against you or act recklessly, as we saw last Thursday in our Premium news rundown, but that’s due to reward hacking and poor prompting on your part.

But is this Artificial General Intelligence, or AGI? Hell no.

Let me make it very clear: this isn’t a sudden leap in intelligence that we are seeing; it’s a sudden leap in utility, the perception that, suddenly, these AIs are truly empowering (if you want them to be, which is another story we’ll discuss later).

But enough pontification. What can we actually do with these models?

My Vibe-Coding Guide

Now I’m going to explain to you how I build apps in hours.

It’s a process that approaches building apps like a walk in the park (as you’re about to see, I mean this in the literal sense), and it uses several modalities (voice, text, images, and ultimately code, although I do zero coding myself), several different products, and several providers.

To simplify a lot: the future of coding is everything but the coding itself.

But before we discuss that, let me get one thing out of the way. The best coding model in the world right now is not Claude, it’s GPT-5.3 Codex. The best agent model in the world right now is not Claude, it’s GPT-5.3 Codex.

Choose your AI wisely

Claude Code (the agent-harness product Anthropic offers on top of its Opus 4.6 models) is very powerful but acts like a drunken sailor. Capable, yet sloppy and deceptive.

It still reward-hacks way too much (e.g., to meet its goal, if it can’t pass the tests, it may simply delete them and count itself as passed), to the point that I just don’t trust it.
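If you do use it, one cheap guardrail worth sketching is a CI step that refuses to pass when the test suite shrinks. This is my own illustration, not something from Anthropic; the baseline count and the pytest invocation are assumptions you’d adapt to your repo.

```python
# Sketch of a guardrail against "delete the tests" reward hacking.
# Hypothetical CI step: fail if fewer tests are collected than a known baseline.
import subprocess
import sys

BASELINE = 142  # test count before the agent's change (assumed, maintained manually)

result = subprocess.run(
    ["pytest", "--collect-only", "-q"], capture_output=True, text=True
)
# In quiet mode, pytest prints one node id per test, e.g. "tests/test_billing.py::test_prorate"
collected = sum(1 for line in result.stdout.splitlines() if "::" in line)

if collected < BASELINE:
    sys.exit(f"Test count dropped from {BASELINE} to {collected}; refusing to pass CI.")
print(f"{collected} tests collected; baseline intact.")
```

It won’t catch everything (an agent can gut a test’s body while keeping its name), but it removes the cheapest hack from the table.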

People swear by its name, but I don’t. Also, I’m seeing more and more expert developers proclaiming the OpenAI ‘sorpasso’, the overtaking (Claude has been the best coding AI for most of the last year or so), so I feel quite confident about the claim.

But why? If you’re a regular, you probably know why.

Today, nothing matters more than compute. In particular, the biggest edge today, what sets models and product offerings apart the most, is inference-time compute. That is, the longer a model thinks on a task, the better the outcomes.

And in this regard, OpenAI is in a league of its own. No other models think as much as OpenAI’s; they just don’t.
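Concretely, that thinking budget is a knob you can turn yourself. Here is a minimal sketch using the reasoning-effort parameter of OpenAI’s Responses API; the model identifier is taken from this piece and may not match the exact API string, so treat the whole thing as an assumption-laden illustration rather than a definitive call.

```python
# Sketch: dialing up inference-time compute via OpenAI's reasoning-effort knob.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.3-codex",          # assumed identifier, taken from this article
    reasoning={"effort": "high"},   # more thinking tokens on hard tasks
    input="Refactor the billing module to support prorated plan changes.",
)
print(response.output_text)
```

The trade-off, as we’ll see below, is latency: higher effort means more thinking tokens and slower responses.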

And before you accuse me of bias, many of you know I’m a huge Google bull (and have a very decent part of my wealth in that stock), but emotions can’t get in the way of facts; OpenAI models have much larger inference-time compute limits and thus are better for complex tasks, period.

I even upgraded to both Gemini Ultra and ChatGPT Pro and have only remained on the latter (although we’ll be using Gemini for another thing as we’ll see later). I love Google and Gemini and still believe they are the most likely winners, but I need to call balls and strikes.

What type of advisor would I be otherwise?

But another point I’ve made multiple times is the stupidity of declaring there’s such a thing as a “best model.” Each model has its strengths and weaknesses.

I do recommend sticking to ChatGPT’s Thinking models for the hardest stuff, but they are painfully slow compared to Claude Opus (GPT-5.3 is snappier, though); thinking for longer incurs a latency penalty, so Opus can be a better option in other cases.

Underlying data distributions matter a lot, too, so there could be cases where Grok is your best option, or Gemini 3 Flash if you want to bridge quality and cost.

The only clear thing in AI’s future is that it’s multi-model.

Iterative Ideation.

My vibe-coding guide has five big steps: iterative prompt ideation, prompt polishing, iterative UI design, execution, and testing, each involving different tools, modalities, products, and processes.

The “bad thing” of this new prompting paradigm is that it requires a lot of work from us. Human participation is minimal in the “doing” phase, but it’s more crucial than ever in the “thinking and declaring” phase.

I have a lot of writing muscle in me, so I’m kind of used to writing a lot when I need to, but most of you have probably reacted in horror to seeing a 2,000-word prompt. But guess what, there’s a solution: just because it’s us humans who do the declaring doesn’t mean we have to do it alone.

So, to start off, once I decide I want to build ‘x’, the first thing I do is take a walk with my dog. Literally.

But, for this particular walk, my dog and I aren’t alone.
