THEWHITEBOX
The Agent Bible, Part 1

Nothing in AI is hotter today than OpenClaw-esque agents, these AIs that seem to run forever, know everything about you, and evolve with you.

I ignored them for a while, until I couldn’t. We talked about them here in healthy detail. But over the last week or so, I’ve been dabbling in agents more seriously, stretching them to their limits, trying to find what is real and what is not.

And now I have the answer for you, and I have to say, it’s actually pretty cool. Thus, today, I’m going to explain how I use AIs, covering the following key elements of agentic AI:

  1. How to optimize model selection (hint: it’s not Opus 4.6),

  2. How to understand context engineering, key context architecture and management ideas, and overall best practices for anyone who wants to get serious with agents. I will even explain how I structure prompts, with examples you can use yourself,

  3. Why I fear (or rather, know) we are soon going to see price hikes, and why you should be preparing for them,

  4. Tips, tricks, and recommendations I use to avoid the ugly side of these things (including a prompting guide), covering both my takes and, perhaps more interestingly, leaked lessons from a top closed AI Lab.

If context engineering is the most important skill you can learn in AI currently, by the end of this post, you’re going to be an expert yourself.

Let’s dive in.

Agents in First Principles

I’ve talked about agents more than enough, so I’ll cut to the chase here. Luckily for you, if you haven’t read my previous explanations, understanding what agents are isn't that hard.

Agents are AI models that execute actions. And, crucially, the quality of the action is a function of two things:

  1. The quality of the context we provide the model with. If the agent doesn’t have the correct context, it can’t execute well.

  2. The quality of the model’s knowledge and “intelligence”. If the agent doesn’t know what it doesn’t know, or isn’t smart enough to suggest good actions, the outcome won’t be good. As I always say, you can’t teach a dog to read, no matter how hard you or they try.

Handling the latter depends on choosing the right model for the task. You can, of course, choose to always use the smartest model possible, but as we’re going to see, AIs are not exactly cheap, so you’re going to bankrupt yourself eventually with that approach. The harsh reality is that you need to learn to match model choice to task complexity. We’ll tackle this in more detail below.

The former is much more under your control. As I always tell my clients, the only question you need to ask yourself whenever you interact with an agent is: Am I providing the model with the right context?

Besides having the right context at the right time, the other thing you must consider is whether the agent has the right tools to execute your request. Crucially, when we say agents execute actions, in reality, they just “declare intent”.

“All” the model does is process a sequence of text and previous actions and decide what to predict next, whether that’s a new word or a declaration that it needs to execute a tool, such as Google Search to search the Internet, or Stripe to create an invoice.

Then, the system takes in that declaration, executes the chosen tool, and provides the tool's response to the agent (the execution trace), which guides the model's next steps.

So, to summarise, an agent is just a combination of three things: an AI model, a context harness, and tools.
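That declare-intent loop can be sketched in a few lines. To be clear, this is a minimal illustration, not any lab’s real API: `call_model`, the tool registry, and the message format are all hypothetical stand-ins. The point is that the model only *proposes* a tool call; the harness executes it and feeds the result back into the trace.

```python
def run_agent(call_model, tools, user_prompt, max_steps=10):
    """Minimal agent loop sketch (hypothetical interfaces).

    call_model(messages) -> dict with either 'text' (final answer)
    or 'tool_call' ({'name': ..., 'args': {...}}) declaring intent.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = call_model(messages)
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["text"]  # plain answer, we're done
        # The harness, not the model, actually runs the tool...
        result = tools[tool_call["name"]](**tool_call["args"])
        # ...and appends the result to the execution trace,
        # which guides the model's next step.
        messages.append({"role": "tool", "content": str(result)})
    return "max steps reached"
```

Everything else — Claude Code, OpenClaw-style agents, the lot — is elaboration on this loop: better models, better context, better tools.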

And how do you choose the best model for you?

First Decision: Model

Ironically, although not the most important thing here, choosing the right model for the task is more crucial for your wallet than for performance.

Here, you need to search for models with the right price while still offering the characteristics an AI model must have to execute well as an agent.

Those are:

  1. Planning: The model must plan how to execute the task. This may read as obvious, but boy, would you be wrong to think this is easy for AI models. In fact, it’s one of the hardest capabilities to get out of them.

  2. Excellent tool-calling capabilities: The model needs to be able to see what the task is and what tools it has available, and correctly decide which tool is the right one (if any). This is surprisingly hard for models, so this automatically discards many, many of them.

  3. “Long-horizonness”: The model has to be able to sustain very long execution traces, sometimes calling dozens or hundreds of tools in sequence.

  4. Long context windows: Agentic workloads are extremely long, so a model with a short context window (a short working memory, limiting the amount of context it can handle at any given time) is useless.

  5. Reliability: The model must be resistant to hallucinations.

  6. Cost-effectiveness: The model must not bankrupt you in the process.

The list that comes out is surprisingly short: basically, Pareto frontier models. Notice that the keyword here is Pareto. You have to be out of your mind to be using frontier models for most agentic tasks, as most do not, and I can’t overstate this, require frontier-level intelligence.

That is, you should aim for middle-sized models with just-enough performance that don't waste money.

What models are these? Models like Gemini 3.1 Flash or GPT-5.4 mini among US options, and GLM-5, Kimi K2.5, or Qwen 3.5 27B (the Opus-distilled version) among Chinese ones.

What all these models have in common is that they are small enough to be reasonably priced, while being distilled directly from the frontier models, making them by far the best bang for your buck.

In fact, frontier models should be used rarely; they should be mostly used for research and particularly complex tasks like coding, which are most often not agentic.

Be that as it may, most agentic tasks do not require the spiky, borderline-savant capabilities that these models offer in areas like coding or maths. In other words, calling a tool to create an invoice does not require frontier-level intelligence, period.

You could push back on the planning side, though: for very ambitious agentic tasks (a model thinking for days on a task, or truly automating a considerable portion of our daily work), you would certainly need frontier-level models.

But here’s the thing, and one of the key takeaways from this piece: today's models plan things they actually cannot execute. Most of the tasks you can think of that would need frontier models are not viable… today.

Although we’ll talk about this in more detail next week, ironically, the reason is not that they can’t, but that the tools they need to execute aren’t ready. This will be a common theme over the next months and years; the digital world needs to become ‘agent-ready’.

But the biggest reason you probably want to avoid frontier models for agents is that their true costs are hidden from us, and you should definitely be getting used to frugality, because…

The inevitable price hike

One of the biggest lies we’re told is that AI is cheap. It’s not. You could, of course, argue that the “value” these tools provide is worth every cent, but that’s a position of faith that is by no means borne out by reality, aka revenues.

Uncontrolled use of AI is incredibly expensive, and that’s even considering the fact that we are being extremely subsidized.

The viral image below shows the extent to which some people are milking AI subscriptions beyond anything foreseeable: 9,200 deployed agents, 17,000 files touched, 1.1 billion tokens processed, and a total estimated expenditure of $27,000 on a single Claude Max subscription.

The value is calculated using API prices, which are $5 and $25 per million input and output tokens respectively for the most expensive model, Opus.

And as noted by Anthropic itself, the average subscription costs them $180/month to serve, even though the average subscription is the Pro one at $20/month.

In other words, AI is way, way more expensive than we realize; we are just being spoiled by the AI Labs burning cash to maintain their skyrocketing growth and market share.

But this begs the question: for how long? It’s one thing to subsidize to some extent; it’s another for Anthropic to take a 135x loss on a single subscription, as with the user above.
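The 135x figure falls straight out of the numbers above, assuming the Max plan costs $200/month (my assumption; the post only says “Max subscription”):

```python
# Back-of-the-envelope check of the 135x loss figure.
# The $200/month Max price is an assumption for illustration.
api_value_usd = 27_000            # estimated API-price value consumed
subscription_usd = 200            # assumed Claude Max monthly price
loss_multiple = api_value_usd / subscription_usd
print(loss_multiple)              # 135.0
```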

That won’t last long.

In other words, eventually, once they have us all completely hooked, they’ll start raising prices, and we’ll realize the true extent of the cost. This could happen in a year or tomorrow.

Thus, you need to get your act together and start choosing your models wisely, getting used to talking to “worse” models that are “good enough” for the task.

You need to get out of your comfort zone, assuming not every request you have can or should be handled by GPT-5.4 or Opus 4.6. The reason is that this affects how you prompt them, what you share with them, and your patience; using “worse” models requires skills that have to be trained.

If you get used to using the best of the best, you’ll soon face a decision: more frugality or bankruptcy?

This leads us to what I believe will be a common theme amongst agents: open source.

Open-source, the answer to agents?

I don’t know about you, but I have a clear plan with agents: although it's not a reality today, I will eventually make sure my personal agent runs on open-source models. Not only is that cheaper, but it is also way more secure than trusting your personal data to these Labs.

Besides, I don’t want to fear the day when they decide to double the cost of each subscription.

As I was saying before, most agentic tasks, things like managing your email inbox, researching news, or creating invoices for your company, are completely doable with moderately intelligent models.

But enough about models, because the biggest lessons I can teach you today have nothing to do with the AI itself, but the system around it.

The Context Problem, Clarified.

Regarding context, it’s without a doubt the component in the agent trifecta (model, context, tools) you have the most control over, and the one you should put the most care into.

This is called context engineering, and it presents a very interesting dichotomy: it’s pretty fucking simple to understand (and, more often than not, overcomplicated by everyone), but nightmarish to implement correctly.

You would be surprised how stupidly complex people make context engineering out to be. Vector databases with hybrid BM25 implementations coupled with fuzzy matching and whatnot, all competing to see who can cram more jargon into a single sentence in hopes of appearing sophisticated enough for investors to pour money into their startup.

But in reality, all that complexity is actually performance-degrading. Instead, the most successful ‘context harnesses’, examples like Claude Code or Hermes Agent, make it much simpler: markdown files and a context file management system.

Context engineering is just adding the relevant context to the model’s prompt. That is, what your AI agent actually sees is a prompt that looks like this:

<System_prompt>

   <Behavior>
   The set of rules and instructions for the agent's behavior.
   </Behavior>

   <Tool_definitions>
   The list of tools that the agent can decide to use. This can (and should) be dynamic.
   </Tool_definitions>

   <Formatting_rules>
   Set of rules that define how the agent should respond.
   </Formatting_rules>

   <Context>
      <user_md>
      Explanations of "who the user is"
      </user_md>
      <memory_md>
      AI's scratchpad to add memories for future reference
      </memory_md>
      <boot_md>
      References to past conversations
      </boot_md>
   </Context>

   <User_prompt>
   The user's prompt.
   </User_prompt>

</System_prompt>

Every prompt you send to an agent should look somewhat similar to the structure above. It’s not rigid; you can decide which clauses to use and how to organize them, but it should definitely be structured. You will later be able to download my literal prompts as guidance.

Of course, the key part here is what’s inside the ‘Context’ clause. Hence, “context engineering” is nothing but making sure that what goes in there is good context, simple as that.

Therefore, everything we are going to discuss below is just this: writing good prompts and markdown files for our agent, coupled with “separation of concerns”. This is the simplest form of context engineering and precisely the one that works best.
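The whole “markdown files plus a template” harness really can be this small. Here’s a sketch: the file names (user.md, memory.md, boot.md) follow the post, while the function signature and everything else are my assumptions for illustration:

```python
from pathlib import Path

def build_prompt(context_dir, behavior, tools, formatting, user_prompt):
    """Assemble the agent prompt from markdown files on disk.

    Sketch only: tag names mirror the example structure above;
    missing files simply contribute an empty clause.
    """
    def read(name):
        path = Path(context_dir) / name
        return path.read_text() if path.exists() else ""

    return (
        "<System_prompt>\n"
        f"<Behavior>{behavior}</Behavior>\n"
        f"<Tool_definitions>{tools}</Tool_definitions>\n"
        f"<Formatting_rules>{formatting}</Formatting_rules>\n"
        "<Context>\n"
        f"<user_md>{read('user.md')}</user_md>\n"
        f"<memory_md>{read('memory.md')}</memory_md>\n"
        f"<boot_md>{read('boot.md')}</boot_md>\n"
        "</Context>\n"
        f"<User_prompt>{user_prompt}</User_prompt>\n"
        "</System_prompt>"
    )
```

Notice there is no vector database in sight: editing a markdown file *is* the context-management operation, which is exactly why this approach is so hard to beat.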

Context engineering has two sides: the AI side (how do we ingest that context in a digestible way), and the most important one: the source side (how we prepare our sources).

Interestingly, nobody mentions the second one, even though it's way more important. But first, let’s tackle the former, which everyone thinks makes them a “context engineer.”

The AI side

The AI side is what most people think of as context engineering: how you mold the context once it's available. And while people love to complicate this, you actually only need… three files: user, memory, and boot.

As for the first one, user.md, personally, this is how I think about what to put into it:

  1. Emotional context.

This is who I am, and this is how I want to be treated. Disclaimer: There’s nothing sweet about my interactions with agents. For all the love and care I yearn for from my close family and friends, I couldn't care less about how sweet my models are to me.

To me, rigor is the most important thing, and I’m very clear about it, although I know some of you don’t feel the same way, and that’s okay. But please be aware that asking models to indulge in “human-like” conversations turns them into hallucination machines, as you’re forcing them into a part of their response distribution that's valued not for accuracy but for sycophancy.

Furthermore, avoiding an overly sweet demeanor plays a double role for me: it helps me refrain from anthropomorphizing these agents because, guess what: that’s what your mind will be tempted to do all the time.

  2. Recent relevant context.

It’s great to see agents figure things out about you by themselves, but it’s sure as hell a lot easier to just tell them and keep an up-to-date record of ‘what matters to you’. Use user.md to explain the things that matter right now.

You can also leave this for the AIs to discover on their own, but that’s tricky today: context is finite and every token of it matters, so agents will struggle.

For example, it may take the model several emails to realize you’re a regular Huel customer and that it should track Huel shipments, if that matters a lot to you. Several emails mean the agent will spend a lot of money figuring this out, despite it being literally a sentence in a markdown file.

  3. Work context.

This one is pretty straightforward. Describe what your current job life looks like, as well as what your aspirations and ongoing projects are.

  4. Financial context.

Here’s where you give models access to your financial data. These models are increasingly capable of handling Excel files, so you can send them directly. In my case, however, I prefer not to do so, for two reasons:

  1. Handling Excel does require pretty advanced capabilities. You’re going to have to use the top models.

  2. They still hallucinate like crazy. Even the top models make many mistakes: while frontier models rarely hallucinate in chatbot-assistant formats at this point, they are still considerably prone to making mistakes on spreadsheets.

Instead, although I haven’t fully implemented this yet, I’m planning to expose my financial data to these models via terminal commands, with background code computing the answers so that the model directly receives them. In other words, I’m planning to build a financial tool for my agent (more on this in the near future).
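As a sketch of what such a tool could look like (everything here is hypothetical: the script name, the CSV layout with date/amount columns, the command the agent would call), the agent runs a command and receives a computed number, never the raw spreadsheet:

```python
import csv
import sys

def monthly_total(rows, month):
    """Sum the 'amount' column for rows whose date starts with month.

    Assumes rows are dicts with 'date' (YYYY-MM-DD) and 'amount'
    keys; this layout is an assumption for illustration.
    """
    return sum(
        float(row["amount"])
        for row in rows
        if row["date"].startswith(month)
    )

if __name__ == "__main__" and len(sys.argv) == 3:
    # The agent would declare e.g.: run `python finances.py 2025-01 ledger.csv`
    # and receive only the final figure back in its context.
    month, path = sys.argv[1], sys.argv[2]
    with open(path, newline="") as f:
        print(monthly_total(csv.DictReader(f), month))
```

The design choice is the point: the model never parses the spreadsheet, so there’s nothing for it to hallucinate over, and a much cheaper model suffices.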

This goes to the core of one of the key recommendations I’m giving you today (I’ll discuss the others later): you should not force models to live in your world; instead, adapt your interfaces to them. This will make sense in a second.

All of this ends up in user.md, a written description of everything I’ve explained above.

But things get much more interesting once we discuss the other files, because it’s here that we start to see agents’ real powers.
