In partnership with

Create How-to Videos in Seconds with AI

Stop wasting time on repetitive explanations. Guidde’s AI creates stunning video guides in seconds—11x faster.

  • Turn boring docs into visual masterpieces

  • Save hours with AI-powered automation

  • Share or embed your guide anywhere

How it works: Click capture on the browser extension, and Guidde auto-generates step-by-step video guides with visuals, voiceover, and a call to action.

FUTURE
Create Your First AI Personal Assistant Today

I hope you have a couple of hours to play because today is an important day.

Using a tool that requires no coding, you will build your very first AI-powered email assistant. It will read your emails, categorize them, and, using combinations of different models and tools, create ready-to-read Notion articles for daily reads on the latest news in the AI industry.

In addition to offering new daily content for you, you will:

  • Understand why the time is now, not tomorrow, to start with agents, understanding key terms such as “context engineering”, key agentic capabilities, and complex context management systems.

  • Learn several best practices being applied by incumbents, from Anthropic to Moonshot, for agentic applications

  • Understand exactly how I built one from scratch, including the configuration of all individual pieces, prompts, and strategies.

And did I mention no coding required?

Let’s dive in.

Setting the Stage. Why Now?

We’ve been talking about agents for months, so why only now am I putting my money where my mouth is?

They Finally Work.

Well, before, this industry was all talk, no action, a concoction of empty promises about different elements that sounded great individually but did not work well in combination.

But now, finally, I can safely say agents work. Let me first explain to you why now is the time, and then we’ll discuss our assistant in detail.

It’s all about “Context Engineering.”

When you hear the term ‘AI engineer,’ you must immediately think of context engineering.

Yes, no more prompt engineering; the new trendy term (and much more accurate, really) is ‘context engineering.’ Or as the saying in the industry goes, “everything is context engineering.”

The reason is that, well, it’s the only thing AIs need, actually.

In fact, I believe AIs are going to eat more and more pieces of the cake, meaning that a lot of the ‘scaffolding’ or ‘harness’ (technical terms to describe everything that is built on top of an AI model to make it work, the raison d’être of most of AI startup these days), will soon be not necessary.

In this interview, Noam Brown, Head of Reasoning at OpenAI, the Diplomacy world champion and one of OpenAI’s most valuable researchers shares this precise view, too, so my bearishness is not a wild hunch, the people at the bleeding edge of the technology are screaming it at you (and to clueless VCs).

Thus, to me, the future of humans in knowledge work jobs will resort to two things:

  1. Set goals as a user,

  2. Ensure the AI has the necessary context to solve the task.

What I’m implying is that most of the economic value will accrue to model-layer companies, and I genuinely believe application-layer startups (most AI startups) are simply living on borrowed time except for those that have understood that the only place AIs will consistently require external help is in ‘deterministic bubbles’ (areas of your business workflow where AI probabilistic prediction is not required and just introduces errors—why use o3 to compute the product of two large numbers when it can simply use a calculator).

I discuss this at length here if you want to understand better why I’m so bearish on the future of SaaS companies and AI startups.

Accepting this, it is clear that humans need to optimize for context engineering and only for context engineering.

Context engineering is indeed crucial. However, to avoid steering too far away from today’s topic, we’ll leave deeper reflections for another day. However, if you can’t wait, I want to highlight the 12-factor agent framework.

Fine, but… we kind of knew all this already, and we have just chosen a better way of describing it. However, we haven’t answered yet:

Why are agents finally ready?

All the Stars are Aligned

Despite being ‘the only thing that matters’ since September last year, for months, I struggled to find a reason to use reasoning models like OpenAI’s o3.

I saw how they solved incredibly complex math problems, but I don’t wake up every day asking myself what the remainder of 72023 divided by 1000 is.

But boy, was I wrong about what these models are for.

Their biggest use case is none other than agents, making their general stable releases (fully supported, high availability, and higher rate limits offered by AI Labs) a sign that they are now trustworthy backends for agentic software.

Besides reasoning models, another reason agents are now within reach is better context management.

Let’s cover both.

Reasoning Models Don’t Reason, But Are Incredible Agents

As you probably know, ‘reasoning models’ are AI models that ‘think for longer’ on tasks than your average AI model, generating chains of thought (step-by-step problem solving) that allow them to solve very complex problems, even for the smartest amongst us.

While they are called ‘reasoners,’ which only generates debate on whether they actually reason, to me, these models should be regarded as agents. That is, they are, no debate, the best AI agents on this planet, and it’s not even close.

Although I have covered in great technical detail what ‘agents’ are in the past, basically, they are AI models that have access to tools and task-specific context to take action on our behalf based on our instructions.

If you can ask a model to do something for you, like searching for cheap tickets to Honololu and buy them, that’s an AI agent because it will use travel tools to find that information for you and book the flight.

But, amongst all reasoning models, I want to highlight deep-research models in particular.

I’ve rarely been more excited about a release than when OpenAI recently announced the deep research API, meaning that you can now access these models programmatically.

I hate writing these words, but these models are, in fact, absolute game-changers.

I believe that, costs aside (they aren’t cheap), there’s really little reason for you to consider using other models than these for real complex tasks, which is why one of my goals with this personal assistant is that, under the hood, a deep research model is leading the way.

And the reason is hardly because they use hundreds of sources, that’s great for searching information but misses the point of what makes them so great. But more on that later.

But why am I so excited about these precise models?

Being more specific, the pillars of a viable agent (to which deep research models fulfill every possible premise) can be summarized as follows:

  • Long horizon. The agent can execute long-horizon tasks that may require minutes or hours, thousands of words of “thought,” and support for tool calls (the ability to call software tools), from using a search engine to find something on the Internet to Stripe’s tool for automatic payments, all carried out completely autonomously.

According to research by METR, models are following an exponential rate of progress in the length of the tasks they can perform successfully, surpassing 1.5 hours and doubling every 7 months.

Published in March, we now have o3-pro and o3/o4-mini deep research, so the length is surely longer by now.

  • MCP support. These models support the Model Context Protocol, or MCP. Having a standardized way to call tools is crucial. Otherwise, every tool is different, thus making the agent error-prone.

Think of MCP as a communication layer that allows AIs to communicate with tools using the English language instead of having bespoke, coded connections for each tool.

  • Goals. Deep research models, like any reasoning models, have been trained to achieve goals set by the user, making them great instruction followers (sometimes too good, but that’s another story).

  • Tool call chains. Probably the most underestimated capability of these models is that if your goal requires executing several tools, they can perform dozens of tool calls in the same chain of thought. What this means is that these models really go the extra mile to provide answers to your questions with a level of detail that, to me, exceeds that of most human analysts.

A multiple-tool-call chain of thought by o4-mini.

However, as I mentioned earlier, context management is also crucial, and we are making significant improvements in this area as well.

The Context Engineering Recipe Tastes Much Better Now

Our agentic systems are getting better at context via several improvements:

  • Better context handling. Agents will perform routine summarizations, pre-fetches, consistent user intent analysis, and content redactions, which significantly enhance model performance. Remember the mantra, “Everything is context engineering.”

A great example of this is another deep research model, Kimi Researcher, with the below context-management system that discards irrelevant information as the agent progresses throughout the reasoning chain, preventing the model from drowning in unnecessary detail and focusing it on what matters.

As mentioned by them, this allows the models to increase their iteration count (the number of times they can perform interactions, such as tool calls or user interactions) from 10 to more than 50, with an average of more than 70 search tool calls per trajectory.

In layman’s terms, the average request to this model will produce a report with, on average, 70 different sources, which is significantly better than what most people can produce in days in a matter of minutes.

  • Separation of concerns via multi-agents. Another big one is multi-agents. In other words, while you’re interacting with what appears to be a single model, that same model will instantiate ‘sub-agents’ to ensure separation of concerns (creating agents for each specific subtask). As the 12-factor agent framework suggests, despite growing context sizes, the shorter and specific the task, the better the model’s performance.

One great example of these multi-agent research systems was recently provided by Anthropic, explaining how their deep research capability works under the hood.

As you can see below, what appears to be a simple request actually spans several specialized agents and an orchestrator (the one you’re talking to in reality)

Divide et impera, or ‘divide and conquer’ was the prominent political and miliatry strategy employed by the Romans. Remarkable prescience, as that imperative remains perfectly attuned to AI more than 2,000 years later.

And, importantly, handling this complex diagram is abstracted from you (unless you want to build it), which makes agents better without requiring any additional action on your part (besides paying for it, of course).

There are other reasons why agents are starting to work, but let’s cut to the chase.

AI agents now have the “intelligence,” capabilities, and context management tools and best practices to work.

So, why don’t we walk the talk? Let’s build our first virtual assistant in less than an hour.

Your First AI Email Agent

Today, we are diving into the creation of an email AI agent that does the following:

  1. Retrieve, classify, and move emails between folders

  2. For a specific folder, read the emails, summarise them, ground them on verifiable sources with inline citations,

  3. Use o3 and other support tools like Perplexity to autonomously take all the context and write a complete article that summarises all the critical points to relieve you from the burden of having to go through those emails yourself with the confidence that the AI isn’t holding anything crucial back.

This is the first step in building a powerful virtual assistant to automate away all the tiresome, mundane tasks in your life and let you focus on what matters (in your job, or your family).

The time is now, what are you waiting for?

logo

Subscribe to Full Premium package to read the rest.

Become a paying subscriber of Full Premium package to get access to this post and other subscriber-only content.

A subscription gets you:

NO ADS

An additional insights email on Tuesdays

Gain access to TheWhiteBox's knowledge base to access four times more content than the free version on markets, cutting-edge research, company deep dives, AI engineering tips, & more

Keep Reading