Go 0 to 100 with Agents Today

AGENTS & OPEN-SOURCE

Welcome back! Today, I want you to ‘feel the AI.’ For once, it’s not about reading about it but actively using it in ways you did not know were possible.

I’ve been championing open-source and covering agents for a long time, so it’s about time I provided you with the means to see it for yourself.

By the end of this piece, with a few simple steps, you will have built the following systems:

  • “Chat with your Local LLM”: The basics of running open-source models, including installing your first LLM on your computer and talking to it, using tools like Ollama and Codename Goose.

  • “Chat and Act on your Local System”: Using Goose and ChatGPT, you will do things like rearrange your local folders or have an AI traverse your files and see what's in them, such as checking whether a file is an invoice and extracting the invoice value, so you can find what you need in seconds. Your personal smart searcher.

  • “Agentic Gmail”: Using Goose and Zapier, you will have your first ever MCP interaction, where you instruct an LLM to draft an email for you and store it in your Gmail account.

  • “Play your Creation”: Finally, we will create a quick snake game you can run on your computer.

And did I mention that all this will be done with exactly zero lines of code and no technical expertise required?

We will also cover the broad theory behind why all of this works, giving you my best shot at visualizing what the future of software will be.

And even if you’re a business executive or investor who isn't particularly interested in running local models, by the end of this post, I will have opened your mind to what’s possible with AI with minimal effort on your part.

In the AI era, success won’t be a matter of intelligence but of high agency—the willingness to just do things.

Let’s dive in.

The Future of Work

If you’re subscribed to this newsletter, I assume you are at least mildly convinced of the importance AI will have in our future. But by the end of this post, you will be fully convinced.

But first, let’s establish the foundations of the future of work from a clear and understandable perspective: from tokens to agents.

Tokens.

To understand what we’re dealing with, to understand the ‘AI economy,’ we need to understand the most basic component: tokens.

Most AI models you interact with today are neural networks with a very particular structure: most of them, if not all (at least all Generative AI models), are sequence-to-sequence models.

But what is an ‘AI model’?

An ‘AI model,’ or more specifically, a ‘generative AI model,’ is just a map between inputs and outputs that takes in a sequence of words, a sequence of pixels (images/video), or spectrogram frames (audio), and returns you another sequence of words, pixels, or spectrogram frames.

Not all AI models are generative, but most AI models you read about these days are, like ChatGPT, because of how powerful they are.

But how do AIs read and write, see and draw, or listen and speak?

The point here is that AIs work by decomposing the data into semantic units of information, which we refer to as ‘tokens.’ They are simply a way of decomposing data into a form that a machine can understand.

This is how ChatGPT breaks sentences into tokens. Source
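To make the idea concrete, here is a toy sketch of tokenization in Python. It is an assumption-heavy simplification: models like ChatGPT use learned subword schemes (such as byte-pair encoding) rather than splitting on words, and the function names and sample sentences below are made up for illustration. The point is only that text becomes a list of integer IDs, and those IDs can be mapped back to pieces of text.

```python
import re

# Toy rule: a "piece" is a run of word characters or a single punctuation mark.
# Real tokenizers learn subword pieces from data; this is only an illustration.
PIECE = re.compile(r"\w+|[^\w\s]")


def split_pieces(text):
    """Lowercase the text and split it into word and punctuation pieces."""
    return PIECE.findall(text.lower())


def build_vocab(corpus):
    """Assign an integer ID to every distinct piece seen in the corpus."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for pieces never seen before
    for text in corpus:
        for piece in split_pieces(text):
            vocab.setdefault(piece, len(vocab))
    return vocab


def encode(text, vocab):
    """Text in, token IDs out."""
    return [vocab.get(piece, vocab["<unk>"]) for piece in split_pieces(text)]


def decode(ids, vocab):
    """Token IDs in, text pieces out."""
    inverse = {token_id: piece for piece, token_id in vocab.items()}
    return [inverse[token_id] for token_id in ids]


vocab = build_vocab(["The dog barks.", "The cat sleeps."])
ids = encode("The dog sleeps.", vocab)
print(ids)                 # [1, 2, 6, 4]
print(decode(ids, vocab))  # ['the', 'dog', 'sleeps', '.']
```

The model itself never sees the letters, only the IDs. Swap the splitting rule for image patches or audio frames and the rest of the pipeline stays the same.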

This allows them to process the data and also generate new forms of that data, such as when ChatGPT produces new words or when a robot’s AI brain generates new action tokens that represent actuator movements that lead to the robot actually moving.

The takeaway here is that modern AI models really do not care about the data type as long as it can be tokenized; as long as we can take data and break it into tokens, an AI should be capable of processing these tokens, learning about them, and generating new ones that are similar to the ones it learned about.

  • If we show ChatGPT a lot of text, it learns to recreate that text,

  • If we show it images or video, it learns to recreate these, too,

  • And if we show an AI humanoid a lot of robotic movements, it learns to imitate them very well.

Tokens in, tokens out.

Fun fact: The title of the film ‘The Imitation Game,’ which covers Alan Turing’s contributions to deciphering Enigma, has little to do with the actual plot. Instead, it refers to how Turing conceived what Artificial Intelligence would be.

He nailed it.

Knowing this, do you wonder how ChatGPT can read words, see images or videos, and listen to sounds?

The moment the AI knows how to break data into tokens, it can process and understand it, so the form in which you send that data to the model becomes irrelevant; what’s important is the message the data transmits, not the structure it has.

Just like we humans, the only thing that ChatGPT cares about is the underlying semantics, the meaning conveyed via words, images, or sound.

Put another way, we humans don’t care if we see a dog or hear its bark; both convey the same semantic information: a ‘dog.’ Tokens are the way to make models ‘data type agnostic’ and instead help them focus on what matters, answering: ‘What is this data telling me?’

What this means is that ‘tokens’ are a data-agnostic, universal unit of information; they are our way of telling AIs that the data type does not matter, only the underlying semantics (what the data says) do.

Therefore, a world where AI is woven into the fabric of all things leads us to the idea of ‘token factories’ that Jensen Huang, NVIDIA’s CEO, is so adamant about pushing: the idea that we have a set of GPUs (or other accelerated hardware) that produce tokens, and that everything around us, all “smart” devices, will be processing and generating tokens all the time.

In a way, you can consider them ‘intelligence factories’ that produce text or images for ChatGPT to respond with, videos for Veo to display, or action tokens for your future at-home chores robot to clean your dishes.

All things considered, it’s not surprising that everything you will read here today, and every piece of AI news you read these days, centers around this concept of a ‘token’.

  • Models are billed based on processed and generated tokens.

  • Billion-dollar data center contracts are signed based on the number of tokens that the ‘factory’ will produce per second.

  • Elon Musk’s dancing robots are generating thousands of action tokens that guide the movement of the actuators, controlling the humanoid’s movements.

ChatGPT reads and speaks tokens, and Optimus sees and moves using tokens. Tokens are literally taking over the world.

And one of my goals today is to make it very clear that you should own your tokens, or at least the majority of them, and that surrendering this token generation capability to megacorp third parties will backfire.

But more on that later.

But why is understanding tokens necessary to understand AI’s future?

Simple: besides the fact that AI is literally charged to you on a token basis, in a world where intelligence is produced at the rhythm of generated tokens whose net cost will converge to zero, the amount of waste and the opportunities for disruption are endless (you’ll see what I mean in a second).
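If you want to see what being charged on a token basis looks like in practice, here is a minimal sketch. The prices below are hypothetical placeholders, not any provider's real rates, and the function name is made up for this example. The only assumption about how billing works is the common pattern of quoting separate prices per million input (processed) and output (generated) tokens.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request when input and output tokens are priced separately."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices in dollars per million tokens (assumptions for illustration).
INPUT_PRICE = 1.00
OUTPUT_PRICE = 4.00

# A request that reads 2,000 tokens and writes 500 tokens.
cost = request_cost(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"${cost:.4f} per request")                 # $0.0040 per request
print(f"${cost * 10_000:.2f} per 10,000 requests")  # $40.00 per 10,000 requests
```

Plug in the real prices of whichever model you use and the same arithmetic tells you what a workload costs, and how much of that bill shrinks every time token prices fall.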

Understanding the concept of tokens is, in itself, an opportunity.

So, with a better intuition of AI’s most basic element, what does this AI-centered future look like?

When AI Takes the Pilot Seat

The biggest change that AI introduces to software and knowledge work, in general, is that we now have systems that can process instructions in natural language as our primary means of communication with them, and they can be programmed using English.

Or you can use your own native tongue, although models are currently extremely English-biased.

While most current software operates according to strict rules defined by the back-end engineers that determine what can be done in a program, we are now swiftly replacing this rule-based backend with an AI model that can take in instructions and execute them accordingly.

In layman’s terms, the ‘rule book’ that guides the behavior of software is now a digital file you can talk to, and which can take action on your behalf if you allow it to (the so-called ‘agents’).

This leads us to the following important concept. What is an agent?

Many people have misunderstood ‘agentic software’ as simply embedding AI into their workflows when, in reality, it is actually far different: it’s not embedding AI into our software; it’s providing our software as tools to the AI.

Understanding this is a trillion-dollar opportunity if you’re willing to take the risk: no software on this planet is built that way, and all of it will have to make that transition at some point. That screams opportunity for disruption of the multi-trillion-dollar software industry.

But what does this new software form factor look like? The closest thing we currently have to that vision is MCP Servers, but let me be more specific.

Real agentic software is when the AI is in charge of decision-making around how to approach a problem. Put another way, if we consider the roles of the human, the AI, and the software in the agentic paradigm, it’s more or less like this:

  • Human: In charge of goal-setting. It’s the high-level thinker. Our role is to declare what we want to be done. But importantly, while we decide what we want, we don’t determine how it’s achieved.

  • AI: Processes the human request, makes a plan on how to act on that request, and executes.

  • Non-AI Software: Provided as tools to the agent to execute the plan.

Instead, what most people currently envision as agentic software is actually AI workflows, where human-written software still defines what can be done or not done, and AI fills in the gaps in areas where we think it can do a good job.

For example,

  • one thing is for the AI to automate your company’s invoice reconciliation process based on prompt instructions (i.e., here are my invoices and bank transactions, please reconcile both);

  • another is to have a human organize the process and utilize AI only in specific instances, such as processing a PDF file.
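Here is a rough sketch of that split of roles, using the invoice example. Everything in it is hypothetical: the sample data, the tool names, and above all `stub_planner`, which is a hard-coded stand-in for the model. In a real agent, an LLM would read the goal and the list of available tools and decide which ones to call and in what order.

```python
# Hypothetical sample data standing in for real invoices and bank transactions.
INVOICES = {"INV-1": 120.0, "INV-2": 75.5, "INV-3": 300.0}
TRANSACTIONS = [120.0, 300.0]


# Non-AI software, exposed as tools. Each tool reads and writes a shared state.
def list_invoices(state):
    state["invoices"] = dict(INVOICES)


def list_transactions(state):
    state["transactions"] = list(TRANSACTIONS)


def reconcile(state):
    paid_amounts = set(state["transactions"])
    state["unpaid"] = sorted(
        number for number, amount in state["invoices"].items()
        if amount not in paid_amounts
    )


TOOLS = {
    "list_invoices": list_invoices,
    "list_transactions": list_transactions,
    "reconcile": reconcile,
}


def stub_planner(goal, tool_names):
    """Stand-in for an LLM call: returns an ordered plan of tool names."""
    if "reconcile" in goal.lower():
        return ["list_invoices", "list_transactions", "reconcile"]
    return []  # no plan for goals this stub does not recognize


def run_agent(goal, planner, tools):
    """The human supplies the goal; the planner decides how; tools do the work."""
    state = {}
    for name in planner(goal, list(tools)):
        tools[name](state)
    return state


result = run_agent("Please reconcile my invoices and bank transactions",
                   stub_planner, TOOLS)
print(result["unpaid"])  # ['INV-2']
```

Notice that `run_agent` contains no business logic at all. In an AI workflow, the order of those three calls would be written by a human in code; here it comes back from the planner, which is exactly the part a model takes over.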

But if both give the same result, why does this matter?

While both might yield the same result (what the human intended to achieve), true agentic software (the former) is significantly more cost-effective because we are relying more and more on AI, whose unit of cost, tokens, is decreasing in price by orders of magnitude per year.

On the other hand, AI workflows require a much more profound level of human participation, which translates to a higher demand for human capital, resulting in significantly higher operating costs, lower margins, and, ultimately, less margin for price decreases, aka less capacity to compete in a rapidly commoditizing sector, aka a one-way ticket to bankruptcy.

And that’s a terrible spot to be in when you are competing with a 10-person startup that has built a competitor to your product in 2 weeks and whose most meaningful operating cost is tokens… which are falling in price to the tune of 80% per year.

To be more specific, inference costs (running these models) have fallen by 99.7% in the last two years alone.
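A quick sanity check on those two figures, since percentage declines compound rather than add up. The sketch below only does the arithmetic on the numbers quoted above (80% per year, 99.7% over two years); the function names are made up for this example.

```python
def compounded_decline(annual_decline, years):
    """Total price drop after repeating the same yearly decline for `years` years."""
    return 1 - (1 - annual_decline) ** years


def annualized_decline(total_decline, years):
    """Constant yearly decline that produces `total_decline` over `years` years."""
    return 1 - (1 - total_decline) ** (1 / years)


# An 80% yearly drop, repeated for two years.
print(f"{compounded_decline(0.80, 2):.1%}")   # 96.0%

# A 99.7% drop over two years, expressed as a yearly rate.
print(f"{annualized_decline(0.997, 2):.1%}")  # 94.5%
```

In other words, a 99.7% fall over two years is steeper than 80% per year: it works out to roughly 94.5% per year, so the 80% figure reads as the conservative one of the two.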

So, if agentic software is it, which form factor does it have? What is the ideal personal agentic setup for this new era, and what can I build with it today?

My Ideal Agentic Setup

If we buy into this idea that agentic software is the future of software (again, not because we are changing the way we do things “just because,” but because it is a fundamentally cheaper way to get things done, which is what actually drives change), we need to understand what that means for you and me and how we get things done ourselves, even if we aren’t creating software but just using it.

In other words, agentic software isn’t only an opportunity for entrepreneurs; it might be a matter of survival, a basic skill, in the upcoming job market.

AI IT job postings are up 448%, while non-AI IT jobs are down 9% over 7 years. Source

In this future (which you are going to see and start building for yourself today), we are going to need five things: Powerful hardware, a local token generator, an agentic client, agentic resources, and agentic tools.

For more information on all of these, read Notion piece {🫡 My Ideal Agentic Setup}—only Full Premium subs.

You will be astonished by the numerous options I am presenting to you today regarding what you can already do with AI. Just the sensation of talking to your computer is already magical. And, again, with no coding or technical expertise required.

Let’s do it!
