You're Overspending on AI. Let's Fix That

For business inquiries, reach out to me at [email protected]
PREMIUM CONTENT
Things You Have Missed…
Last Friday, in our Premium news rundown, we reflected on Sam Altman’s latest mysterious comment, analyzed the release of the new Gemini model, and discussed what Morgan Stanley’s AI success tells us about the future of software.
We also discussed Amazon’s ambitious robotics plans, OpenAI’s privacy concerns, and Meta’s substantial power contract, among other news.

TREND OF THE WEEK
You Might Be Considerably Overspending on AI. Let’s Fix That

You are overpaying for AI. Period.
And the reason is none other than your ChatGPT subscription. Not only does it lock you into a single provider, which is already severely suboptimal because there’s no such thing as “OpenAI is the best option for everything,” but you could also be getting better outcomes at a fifth of the price.
Or more.
You might say, well, it’s just 20 bucks, but what if I told you that you could be spending $3, or $5 at most, per month if you optimize your stack, while actually getting better results? That’s hundreds of dollars in yearly savings per user, or even thousands if you’re currently paying for several subscriptions (ChatGPT, Gemini, Grok), all while also giving your models the capacity to take action.
And let’s not open the enterprise subscription can of worms. In fact, if you don’t follow this approach, you will soon see yourself spending hundreds of dollars a month when you could be spending ten times less.
Thus, today I present what I believe is the best combination of models you can leverage, while ensuring that, combined, they cost less than your ChatGPT Plus subscription ($20/month). Furthermore, I will list the precise features that do warrant paying for some of these products, giving you a clear-cut algorithm for what to buy and when.
Let’s dive in.
The Goal? A Menagerie of Agents
The writing is on the wall. There will be no “winner takes all” event in AI, nor a model that is consistently better than all the others in every category.
It will be a menagerie.
All for one, and one for all.
Even when discussing the idea of Artificial General Intelligence, or AGI, the end goal of most AI Labs, they are starting to hint that this AGI, this ‘God AI,’ could be a conglomerate of different specialized agents rather than just one model, as posited by Greg Brockman, OpenAI’s President.
In other words, AGI could be a synergetic relationship between an AI super coder, a ‘know-it-all’ generalist agent, a maths prodigy, and so on.
The reason is that, as we’ve covered countless times, AI still suffers from a depth problem, where specialized models outcompete far smarter, generalized ones in their domain.
Disclaimer: By specialized models, I still mean foundation models, which are trained on a large amount of data across various topics, but are then, crucially, post-trained for a specific domain.
It has been widely proven that models trained from scratch on single-domain data are easily beaten by foundation models. So I’m not saying we shouldn’t train foundation models like the GPT or Gemini families; rather, that these generally smarter models can be outcompeted in a given domain by smaller foundation models with the right specialized training, despite the latter being, overall, ‘dumber.’
Using a human analogy: a woman trained in all the sciences who then gets a PhD in Physics largely outcompetes a woman trained solely in Physics, because the former brings key knowledge and intuitions from other domains (Chemistry, Maths, etc.) that transfer well to Physics. The point, though, is that the Physics PhD is better at Physics than any “all-sciences” generalist who never underwent specialized Physics training, even if the generalist is smarter overall.
Thus, it takes no genius to predict that being an AI user will mean using several models in your daily life.
And if this is true, and maximizing AI usage will be about wisely choosing your model per use case, there’s literally no chance an AI Lab controls all of these.
Nonetheless, we are already experiencing a precise categorization of use cases for each prime Lab.
Anthropic has committed to code and MCP-like agents (more on this in last week’s post and also later in this piece), with superior agentic capabilities (tool-calling et al.).
OpenAI is still putting up a fight in code (to little success, though), but it is the overwhelming leader in consumer use cases (Internet search, random questions, companionship, etc.) and knowledge-based use cases, while also having the most powerful models for edge cases in o3 and o4-mini.
OpenAI’s models are probably also the most widely used for education, although this is mere speculation on my part.
Perplexity is fully committed to search and computer agents, and it has the best search API (more on that later).
Google has a more general approach, offering the best workhorse models for day-to-day tasks, particularly in terms of performance-to-price ratio, while excelling in coding. Ah, and they are running away with it in video generation.
Black Forest Labs is focused solely on image generation and offers the best controllability by far (control over what you want the AI to create) with their Flux Kontext models.
Grok offers great performance-per-cost, like Google, but is a latecomer whose shining moment will come with “AI for engineering”; xAI has access to Tesla, SpaceX, Neuralink, Starlink, and other Elon Musk ventures, all heavily engineering-focused.
And the Chinese open-weights models offer a path to dirt-cheap daily AI if you have the hardware for it; they are the go-to models for open-source implementations right now.
The trend is clear. Nobody will take the entire market.
Yes, as I’ve stated multiple times, I’m particularly bullish on Google, and they have everything they need to win the general AI race, but that doesn’t mean Google will win it all; my point is simply that investors are severely undervaluing Google’s AI business.
Divide and Conquer
What all this means for you and me is that you're undercutting your success with AI tools if you’re only using one of them. Perhaps you are paying for multiple subscriptions, making the right move performance-wise, but you are overpaying significantly for that performance.
Thus, my job is to tell you that there’s a better way, one where you have access to all models from a single platform and, crucially, only pay for what you use, with no overspending, all the while turning your AIs from mere chatbots into actual agents.
I’m literally about to explain how to get more from AI with less. Sounds worth it, right?
Before proceeding, it’s highly recommended to first read last week’s Leaders’ post to gain a better understanding of concepts such as APIs, the ideal power-user setup, and MCP agents. Here, I’ll strictly focus on what I use each model for, but if you want to apply this yourself and actually run agents in your day-to-day work, it’s all explained in that post.
This Is What an AI Power User Looks Like
The first clear message I want you to internalize is that if you want to have the three things below, meaning AIs that are:
Agents (that can execute tools)
Optimized for each use case, meaning you can switch between providers,
And cheap,
You will need to access these models through the APIs instead of the apps each provider offers.
But what is an API?
In simple terms, it’s a programmatic way to access these models, one that, in theory, requires coding expertise on your side. In other words, you can access these models without downloading the ChatGPT app and paying its subscription, and instead embed ChatGPT models into different applications, paying only for what you use.
It’s a more effortful way of accessing the models, but one that allows you to switch providers easily while giving you greater control over your spending.

Luckily, we can make this transition seamless, requiring zero code.
You should pay only for what you use.
The key message of this piece is that the esoteric concept of an ‘API’, often reserved for savvy developers, can be easily leveraged and is key to saving a significant amount of money.
Using OpenAI as an example, you can access OpenAI’s models through the ChatGPT app by paying a fixed $20 or $200/month subscription, or access the same models through their APIs on a pay-only-for-what-you-use basis, with full control and usage metrics.
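To make this tangible, here is a minimal sketch of what a pay-per-use call looks like in Python, using OpenAI’s official ‘openai’ package; the model name and prompt are purely illustrative:

```python
# pip install openai
from openai import OpenAI

# The API key authenticates you; you are billed per token consumed,
# not a flat monthly fee.
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

response = client.chat.completions.create(
    model="gpt-4.1-mini",  # illustrative choice; any model you have access to works
    messages=[{"role": "user", "content": "Summarize what an MCP server is in two sentences."}],
)

print(response.choices[0].message.content)
# The usage object shows exactly what this request will be billed for
print(response.usage.prompt_tokens, "input tokens,", response.usage.completion_tokens, "output tokens")
```

Every interaction maps to a metered request like this one, billed by the token, which is what makes those per-request usage metrics possible.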
But as I was saying, it requires coding on our side, right? Luckily, as we learned last week, we don’t need a single line of code to access the APIs.
In fact, we simply need a frontend like Codename Goose and the model’s API key (a secret, very long string of characters that authenticates you to the provider).
Put another way, we already have available apps that connect to these APIs without requiring any technical effort on our part.
The crucial benefits of this approach are twofold:
Using a provider-agnostic application like Goose means you can access all models from the same application (imagine a ChatGPT app but with access to other models) and swiftly alternate between models with a few clicks. If you want to maximize performance, there’s really no better way.
This turns your AI workloads into a “pay only for what you use” pricing model, meaning that if you only make $2 worth of interactions with AI models, you are billed $2, not $20 (or $200 for ChatGPT Pro or Claude Max, or Gemini Ultra, which is even more expensive).
And the funny thing here is that you might think that $2 worth of AI interactions is spent quickly, but that intuition is actually dead wrong. Taking Gemini 2.5 Flash in non-thinking mode as an example, the model would have to send you more than three million output tokens before Google charged you $2.
In layman’s terms, Google’s ‘Gemini 2.5 Flash non-thinking’ model, which is an absolute beast, could send you more than two million words, roughly twice Harry Potter’s entire saga, and the charge would still be less than $2.
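As a sanity check, here is the back-of-the-envelope arithmetic as a tiny Python sketch; the $0.60-per-million-output-tokens figure is the published non-thinking output rate for Gemini 2.5 Flash at the time of writing, and the words-per-token ratio is a rough rule of thumb, so treat both as assumptions:

```python
# Rough cost check for Gemini 2.5 Flash (non-thinking) output pricing
PRICE_PER_MILLION_OUTPUT_TOKENS = 0.60  # USD, assumed current rate; check Google's pricing page
BUDGET = 2.00                           # USD

tokens = BUDGET / PRICE_PER_MILLION_OUTPUT_TOKENS * 1_000_000
words = tokens * 0.75                   # rule of thumb: ~0.75 English words per token

print(f"${BUDGET:.2f} buys roughly {tokens:,.0f} output tokens, or about {words:,.0f} words")
# -> roughly 3.3 million tokens, ~2.5 million words
```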
Yes, I can confidently say that you are overpaying for your ChatGPT Plus subscription ($20/month), let alone the Pro one ($200/month), unless you meet certain specific criteria that we’ll discuss later.
So, without further ado, let’s cover how I use AI daily.
One Interface, All Models
As I’ve explained, I now primarily use Goose as my chat interface; I covered how to install it last week.
As mentioned, it’s a desktop application that you can install on your computer, allowing you to connect to most providers, from OpenAI to Google, and access most reasoning and non-reasoning models.

Interacting with Google’s Gemini 2.5 Flash and Perplexity search with Goose.
Crucially, it also allows you to connect these same models to MCP Servers, covered in great detail in the post linked a couple of paragraphs above.
In a nutshell, MCP Servers are tools exposed to AIs in natural language, meaning the AI model can take action by making requests in plain English to those tools.
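To give a feel for what “a tool exposed in natural language” means, here is a minimal sketch of an MCP server written with the official Model Context Protocol Python SDK; the server name and tool are invented for illustration:

```python
# pip install mcp  (the official Model Context Protocol Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def count_words(text: str) -> int:
    """Count the number of words in a piece of text."""
    # The function name, docstring, and type hints are what the model "sees":
    # it decides in plain English when to call the tool and with what arguments.
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # exposes the tool to any MCP client, such as Goose
```

A client like Goose connects to servers like this, hands the tool descriptions to the model, and executes the calls the model requests.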
So, by using Goose instead of paying for one more subscription, you get:
Access to all models, consolidated into a single point of access,
With usage-based billing,
And native integration with tooling that turns the models into agents.
So, how do I optimize performance for cost?
Choosing What’s Best For the Task
Although I wrote an entire piece on how you can use AI the way I do, with all the tricks, best practices, bells, and whistles, when I say ‘I use AI,’ I mostly mean the following use cases:
Internet Search. I heavily rely on AI models to speed up searching, both for narrow and deep searches.
Local Search. I use AI to search for files on my computer.
Editorialization of my writing. I don’t use AI to write, because writing is a necessary thinking tool for me, but that doesn’t mean I can’t use AI to critique my stances, identify loose points, refine messages, and so forth.
Chat with my data. I use AI as a conversational peer to discuss esoteric topics, such as the latest research on AI.
Create images.
Code.
Business use cases, such as invoice reconciliation, using AIs for OCR, or text-to-SQL conversion to make natural language queries against my business’s database, among others (we’ll leave these for another day).
However, I can guarantee that the top four are, by far, the ones I rely on the most. And if that’s also you, which is very probable, let me tell you: you are definitely overpaying (as was I until only recently).
In my case, the models I use are as follows.
My Menagerie of AI Models
As you might expect, I will cite several models. Later in the piece, I’ll describe how to streamline access to all their APIs in one place to simplify things even more.
To start, you may be asking: which is my default model, the one I use for most tasks?
My default model: Gemini 2.5 Flash
In my case, my default model is Gemini 2.5 Flash, specifically the May 20th version (codenamed ‘gemini-2.5-flash-preview-05-20’ in the API).
It’s my default model for most interactions, especially those related to knowledge tasks, as it excels in these areas (second-best model overall in LMArena, surpassing all OpenAI models). Additionally, it's notable that it ranks first in Math, despite not being Google’s flagship model. Other good models for this are Claude 4 Sonnet or GPT-4o.
But where this model shines is in cost. It’s outrageously cheaper than frontier models like o3, and even than rough equivalents like GPT-4o or Claude 4 Sonnet.
In fact, if you use this model for most of your tasks, your monthly budget is guaranteed to stay well below the $20/month you’re paying for your app of choice. And not by a few cents; I’m talking multiple times cheaper.
For reference, I’ve been using this model for a couple of days, and I have yet to spend a single dollar.
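If you want to test it outside of Goose, a direct call to that exact checkpoint looks like the sketch below, using Google’s ‘google-generativeai’ Python package; the prompt is illustrative, and the preview model name may have been superseded by the time you read this:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")

# The May 20th preview checkpoint mentioned above (may be superseded by a newer name)
model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20")

response = model.generate_content("Explain the trade-off between thinking and non-thinking modes.")
print(response.text)
```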
My Search tool: Perplexity’s MCP Server
For search, I’ve become deeply enamored with Perplexity’s MCP Server. While I haven’t been shy about my skepticism toward Perplexity’s business, that doesn’t mean the product isn’t legit.
In fact, I would go so far as to say its search API is the best in the world, meaning it provides the best search results by far, surpassing even ChatGPT and Gemini (the most widely used ‘AI for search’ models), which is quite impressive.
Therefore, most of my daily interactions with AI involve asking Gemini 2.5 Flash things and letting it call Perplexity when necessary.

But what sets Perplexity apart for me is that it has mastered complex queries.
One of the great things about AI-powered search is that you can send the model highly specific requests, and these models manage to find all the required sources to meet the request, returning a full response with citations.

Or you can opt for much deeper, nuanced searches that get the model into deep research mode:

But isn’t Perplexity another subscription-based tool? How am I using a Gemini model to use Perplexity?
Simple: we are using Perplexity’s MCP Server. Thus, you don’t have to pay for Perplexity Pro (which would mean another $20/month subscription) and can instead use the best AI-powered search engine as a tool that your cheap workhorse model calls.
In fact, the entire setup takes just five minutes, and from then on, you'll enjoy the best search experience without paying the full subscription price and without sacrificing performance (quite the contrary).
Get a Perplexity API key and install Perplexity’s MCP Server into Goose following this detailed guide (Full Premium subscribers only).
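Independently of the MCP Server, if you want to probe Perplexity’s search quality straight from code, its API is OpenAI-compatible, so a minimal sketch looks like this (the ‘sonar-pro’ model name and endpoint reflect Perplexity’s documentation at the time of writing; treat both as assumptions):

```python
# pip install openai  (Perplexity exposes an OpenAI-compatible endpoint)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar-pro",  # Perplexity's search-grounded model (assumed name)
    messages=[{"role": "user", "content": "What did Morgan Stanley report about its AI coding tools?"}],
)
print(response.choices[0].message.content)
```

Inside Goose, you never write this yourself; the MCP Server wraps calls like this one, and your workhorse model decides when to make them.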
In terms of deep research tools, Perplexity is also a strong candidate here (as shown above), as the MCP Server includes a ‘Query Complexity Analyzer’ that automatically adapts the depth of the search to how demanding your request is.
Put another way, you don’t have to do anything; the Perplexity tool adjusts the search complexity, going deeper if necessary.

Perplexity MCP Server architecture. Source
Moreover, if you manage to run a local open-source model as your workhorse, you would only have to pay for the Perplexity service, making the end setup even cheaper, because Perplexity’s MCP Server is very cheap (it’s been days and I have yet to spend my first half dollar).
My Writing Editor: Gemini 2.5 Pro
As for which model is better for writing and editorialization, I must say that GPT-4.5 takes the crown.
Sadly, that model is not available in the API and will be sunsetted next month for being too expensive for OpenAI to serve.
But besides that model, I tend to prefer Gemini models over OpenAI’s. The reason is that they are less prone to sycophancy, which naturally leads to them being harsher critics, which is precisely what I need.
My Coding Partners: Claude, Gemini, GPT-4.1, and o3
For those interested, for coding I use many models via Cursor, paying the $20 subscription because the product offers much more than just a chat interface.
Specifically:
I find Claude 4 Sonnet and Opus work great for refactoring and bug finding, especially with Claude Code, an absolutely fabulous tool if used appropriately.
Gemini 2.5 Pro is my execution workhorse.
I use o3 for overall code planning or executing tough tasks.
If you’re not looking to code extensively and need a one-off code script, you can use GPT-4.1 in OpenAI’s API through Goose. It’s a great model.
However, you’re probably thinking: “Well, I understand the appeal of all this, but I don’t want to have to manage five different API keys, billing sources, and such.” Luckily, we have a solution for that: OpenRouter, a tool that allows you to access all models with a single key and offers native integration with Goose.
Put another way, instead of setting up separate connections to OpenAI, Anthropic, Google, and others in Goose, you can set up a single OpenRouter account and access all of their models from one place (albeit at a slightly higher price per token).
Here’s a step-by-step, detailed guide to setting up OpenRouter and configuring it in Goose (Full Premium Subs only).
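Under the hood, OpenRouter exposes an OpenAI-compatible endpoint, so switching providers becomes a matter of changing a model string; here is a minimal sketch, with the model identifiers given only as examples of OpenRouter’s provider/model naming scheme:

```python
# pip install openai  (OpenRouter speaks the OpenAI API format)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_API_KEY",
    base_url="https://openrouter.ai/api/v1",
)

# One key, many providers: just swap the model identifier per task.
for model in ["google/gemini-2.5-flash", "anthropic/claude-sonnet-4", "openai/gpt-4.1"]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "In one sentence, what are you best at?"}],
    )
    print(model, "->", reply.choices[0].message.content)
```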
However, as we saw last week, the entire idea of using Goose shines most when you combine several different models in a single app with connectivity to thousands of other tools through the MCP protocol.
Besides Perplexity’s MCP Server, which we have discussed in detail today, among the many servers we will discuss in upcoming posts are:
BrowserBase. We will learn how to create our first computer-use agent without having to pay for a $200 ChatGPT Pro subscription.
Blender. Create 3D objects and scenes with ease.
And others like Postgres, Google Maps, or from official ‘marketplaces’ like Zapier (Gmail, Salesforce, Excel…)
A word of caution before you go too crazy with MCP Servers: avoid non-official servers, as they may include prompt injection attacks, which could also affect official ones, such as GitHub’s.
In my case, until stronger security controls are in place, I stick to very popular servers only, ideally those from the Model Context Protocol team at Anthropic.
All things considered, there are some circumstances where you have no option but to pay for the subscription.
So, although today’s goal was to help you determine whether you actually need the various subscriptions, there are a handful of features that do warrant paying for them if you value them.
For these, it’s worth it.
Native image generation
Tools like ChatGPT or Gemini offer native image generation, meaning images are generated by the same model that generates text. This means the model understands images in a much more profound way, unlike image-only AI models, which simply create the image you ask for.
Instead, these models can generate diagrams, text on images, and other more sophisticated outputs than a simple image-generation model can. If it helps, think of these models as ‘as smart’ as a text model, with the only difference being that instead of generating text tokens (words), they are generating image tokens (pixels).
As an example, the diagram used in this post to describe how to access OpenAI’s models requires GPT-4o’s native image generation, but the thumbnail does not require that level of intelligence and can instead be generated by an image-only model.
Therefore, if you’re looking to convey complex messages with images (i.e., generating diagrams, architecture designs, etc.), paying $20/month for ChatGPT or Gemini is worth it.
Video generation and understanding
Currently, frontier video generation models like Veo 3 are only accessible through the app, not the API (which only supports Veo 2 for now). Both OpenAI and Google only offer these models in their higher-priced subscription tiers, such as ChatGPT Pro or Google AI Ultra, which cost upwards of $200 per month.
If you’re a creative who uses AI to generate video sequences, this is your only option. As for video understanding (sending AIs videos and asking questions about them), there’s only one player here, and that’s Google (both in the app and the API, though the latter requires coding).
Deep Research
One of the most powerful features of these models is currently available exclusively in the apps. Despite all major labs offering the feature (OpenAI, Anthropic, xAI, Google), they all gate it behind a subscription.
It’s a costly feature to serve to users, so naturally, they hide it inside an overpriced subscription (and with rate limits).
And that’s pretty much it; everything else these models offer can be achieved in some form or another.
Closing Thoughts
I see part of my job in this newsletter as leveling you up.
And I can’t think of a better way than introducing you to agents (last week) while also giving you direct insight into which models I use for each particular task, optimizing performance while cutting costs.
That said, at the end of the day, as these models are mere representations of their underlying data distribution (the data they were trained on), I will always encourage you to try for yourself and decide what’s best for you.
However, it is very likely that you are overspending, so I hope this post helps you cut the waste quickly.
Until next time!

THEWHITEBOX
Premium
If you like this content, by joining Premium, you will receive four times as much content weekly without saturating your inbox. You will even be able to ask the questions you need answers to.
