FUTURE
Key Emerging AI Trends

A year ago, OpenAI launched the o1 model, marking the beginning of the era of reasoning models and sparking a new wave of excitement and spending.

A year later, though, improvements have already been showing diminishing returns, with GPT-5 confirming that sentiment (although GPT-5 Pro’s “bad release” was deeply exaggerated, it was not the step function improvement we had hoped for).

We need a new thing. But what is it?

Today, we are looking at the five trends that are generating the most interest, excitement, and funding. We’ll first focus on the next frontier in AI, memory, and then move on to others, including ‘humility training’, ‘eval rethinking’, and ‘programmatic prompt engineering’, that are shaping research and the next wave of progress and opportunities.

In a nutshell, this is for those who want to know what’s next.

Let’s dive in.

The Next Frontier: Memory

I’ll just go and say it: memory is the next frontier in AI. That is, giving AI models an ample, dynamic set of memory features that let them know you better.

Because in a world of marginal improvements in “intelligence”, context makes a huge difference.

AI Engineering is Context Engineering

With AI Labs in charge of improving model priors (what a model can or can’t do, aka skill building), our job as practitioners is to ground the model.

Yes, GPT-5 appears very smart out of the box, but it’s a general model; it needs context to work on your task. Thus, AI engineering is not only about choosing what model works best, but making sure the model actually works.

People tend to summarise this as ‘prompt engineering’, which can easily mislead others into assuming AI engineers simply craft the best possible prompt (choosing the best words, the structure, etc.). In reality, whether an AI engineer succeeds at the job comes down to whether they get context right.

Context engineering is, therefore, a much more appropriate way of understanding what implementing AI means (in the Premium subscriber section, we’ll cover one of the most interesting projects in a while that allows you to automate prompt engineering using AI).

And when memory is done well, the model’s responses just feel different. Superior.

Just like your mom’s advice always feels more accurate than your friend’s, because your mom knows your deepest secrets and can tailor the advice to your particular context, AI models become truly transformative experiences when they have the appropriate context to help you.

The issue is that this is easier said than done.

Do we actually know how to implement memory?

As I briefly mentioned on Thursday, there are two memory types: parametric and in-context.

  • Parametric memory refers to the type of memory the model has learned during training or fine-tuning. The model has been optimized to learn this information, making it an intrinsic part of its knowledge.

  • In-context memory refers to the information we provide to the model in real-time, essentially serving as a ‘cheat sheet’ of what it needs to know about you or your task, which is included as part of the prompt. When that information is fetched from an external store at query time, the process is known as RAG (Retrieval Augmented Generation).

Context engineering mostly refers to the second, making sure the model has the appropriate context to tackle the task.

The issue is that, in both, this is more of an engineering problem than a technical one.

Focusing on in-context memory first, we have yet to determine the most effective way to apply RAG.

Everyone immediately links this to semantic search, connecting an LLM to a vector database where the model performs a semantic search. This essentially means we take the user’s query and find the context chunks in the database that are most semantically similar to it.

If the user’s query is “Best places to eat croissant in Paris”, this semantic search will—theoretically—retrieve context chunks regarding good croissant places in Paris based on semantic similarity. For example, one chunk could read as “La Parisienne Madame won 2024’s croissant contest in Paris…”

This sounds amazing, but it works terribly badly in isolation. More advanced systems today combine semantic search with keyword search using BM25, meaning the retrieval system also takes into account keyword matches (while also factoring in term ‘rarity’ to avoid words like ‘a’ being considered important because they appear everywhere).
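To make the hybrid idea concrete, here is a minimal sketch, not a production retriever. It scores chunks with a from-scratch BM25 and with a toy "semantic" score, then merges the two rankings using reciprocal rank fusion, which is my choice of combiner and only one of several options. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, and the second and third documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25: keyword matches weighted by term rarity (IDF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # in how many docs each term appears
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a high IDF; terms found everywhere get a low one.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def toy_embed(text):
    # ASSUMPTION: stand-in for a real embedding model. Just word counts.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranks(scores):
    # Map document index -> rank (1 = best) for a list of scores.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1)}


def hybrid_search(query, docs, rrf_k=60):
    """Fuse keyword and 'semantic' rankings with reciprocal rank fusion."""
    keyword = ranks(bm25_scores(query, docs))
    q_vec = toy_embed(query)
    semantic = ranks([cosine(q_vec, toy_embed(d)) for d in docs])
    fused = {
        i: 1 / (rrf_k + keyword[i]) + 1 / (rrf_k + semantic[i])
        for i in range(len(docs))
    }
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "La Parisienne Madame won 2024's croissant contest in Paris",
    "A guide to the Paris metro",            # hypothetical chunk
    "How to bake a sourdough loaf at home",  # hypothetical chunk
]
query = "Best places to eat croissant in Paris"
keyword_scores = bm25_scores(query, docs)
ranking = hybrid_search(query, docs)
print(ranking)
```

In this toy corpus, "croissant" appears in a single chunk, so it carries more weight than "to" or "Paris", which is the rarity effect described above. With a real embedding model in place of `toy_embed`, the two rankings would disagree more often, which is when fusing them starts to pay off.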

Other techniques include rewriting (semantic similarity is mostly a pipe dream if you don’t actually rewrite context chunks so that they resemble user queries).

Overall, RAG sounds incredible and is an extremely hyped approach because a significant amount of money has been raised in this particular area of AI. But reality is not that great.

For instance, both Cline and Claude Code, two extremely popular coding and agentic tools, do not even bother using semantic-based RAG and instead rely on a much less sexy approach: regex.

That is, they are literally augmenting model prompts using exact-string search. Using the Paris analogy, they literally search strings such as “best croissants in Paris” instead of trying the “fancy” semantic approach.
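As a rough illustration of exact-string retrieval, the sketch below greps a small set of in-memory notes and pastes the matching lines into a prompt. It is not how Cline or Claude Code are implemented; the `grep_context` helper, the file names, and the file contents are all hypothetical.

```python
import re


def grep_context(pattern, files, context_lines=1):
    """Return (file, line_number, snippet) for every line matching the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in files.items():
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Keep a little surrounding context, like `grep -C 1`.
                start = max(0, i - context_lines)
                end = min(len(lines), i + context_lines + 1)
                hits.append((name, i + 1, "\n".join(lines[start:end])))
    return hits


# Hypothetical notes standing in for a codebase or knowledge base.
files = {
    "paris.md": "Notes on Paris\nBest croissants in Paris: La Parisienne Madame\nMetro tips",
    "rome.md": "Best gelato in Rome",
}

hits = grep_context(r"best croissants in paris", files)

# Augment the prompt with whatever the search found.
context = "\n---\n".join(f"{name}:{line_no}\n{snippet}" for name, line_no, snippet in hits)
prompt = f"Context:\n{context}\n\nQuestion: Where can I eat the best croissants in Paris?"
print(prompt)
```

The trade-off is visible even at this scale: a query phrased as "top pastry spots" would find nothing, but when the strings do match, the result is exact and needs no index to maintain.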

Sounds much less optimal, right? Well, that’s because it is, but the irony is that it works better in many cases, which is all you need to know about the current state of memory systems.

All this was just a very long way of me saying we haven’t yet found the way to properly do memory retrieval, meaning whoever gets it right will have a massive advantage.

But another important part of memory is knowing what needs to be remembered.

What is Memorable?

Besides having the best way to retrieve meaningful context for the model, a clearly unsolved problem, the other side of things is how we train models to actively participate in memory building.

That is, how do we make models see a user prompt and say, ‘Oh, this is something I should remember in future interactions’? Here, OpenAI appears to be SOTA: their system remembers meaningful content and applies it well, offering a truly great experience.

In my personal case, this has been the reason I continue to rely on ChatGPT for many tasks. The other great reason is the ChatGPT desktop app, which I can invoke with a simple keyboard shortcut: ‘⌥’ + ‘Space’.

This is the type of product-oriented stuff that dictates research nowadays. You might think that most of OpenAI or DeepMind’s budgets are spent on new architectures. Far from that, most of their research is guided by finding ways to make their products better. Product-oriented research and data center-oriented research are what matter now.

The issue is that these are the types of things that benefit from telemetry coming from hundreds of millions of users. There’s a reason OpenAI has paid $1.1 billion for Statsig: product experimentation is the new moat because, while AI itself is commoditizing, product-level features aren’t.

Importantly, it’s going to be very hard for open-source AI to compete at the product level. For instance, open-source AI will be competitive at the skill level, but memory features could open a massive gap in terms of user experience.

Unless Western Labs commit more strongly to open-source, only Chinese open-source research could save the day.

But what about parametric memory? Anything to be said there?

The Eternal Promise of Continual Learning

Dwarkesh Patel, the popular AI and geopolitics podcaster (don’t get fooled by ‘podcaster’, this guy knows his stuff), regards ‘continual learning’ as the key to AGI.

But what is continual learning?

In simple terms, instead of having clearly differentiated AI training and inference regimes, meaning models are first trained, taught skills, and then run with no additional skill acquisition, continual learning refers to the methods that allow AIs to learn as they make inferences.

Sounds pretty straightforward, but remains one of the most important unsolved mysteries in AI. To date, no frontier model has this capacity, and no model in the pipeline is expected to have this for now.

GPT-5’s memory acquisition system, which we have just described, is a step in that direction, but most in the industry, including myself, believe true power comes when the model actually learns (modifies its internal weights, aka compression coming from fine-tuning).

Nonetheless, Sam Altman envisions this as GPT-6’s ultimate form, the world of hyperpersonalization, where the model becomes extremely tailored to your very particular needs.

The issue is engineering. Whenever you fine-tune a model with new data, you are automatically incentivizing the model to forget other stuff, as we are literally ‘moving’ the model toward a new distribution of data.

The reason is technical. We train models using MLE (Maximum Likelihood Estimation), which is a fancy way of saying ‘given the training data, what is the model that maximizes the log-likelihood of the data?’, which can be expressed in plain English as ‘what’s the model that maximizes the chances of generating the data it has observed in training?’.
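A deliberately extreme toy example shows why MLE invites forgetting. For a simple categorical model, the maximum-likelihood estimate is just the relative frequency of each item in the data, so refitting on new data erases whatever the new data lacks. The topics and counts below are made up, and a real fine-tune starts from existing weights rather than refitting from scratch, so treat this as a caricature of the effect, not a simulation of it.

```python
from collections import Counter


def mle_categorical(tokens):
    """MLE for a categorical distribution: probability = relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: count / total for tok, count in counts.items()}


# Hypothetical 'pretraining' data covering three topics.
pretraining = ["maths"] * 3 + ["cooking"] * 3 + ["travel"] * 4
# Hypothetical 'fine-tuning' data from a user who never asks about maths.
finetuning = ["cooking"] * 8 + ["travel"] * 2

before = mle_categorical(pretraining)
after = mle_categorical(finetuning)

print(before["maths"])          # 0.3
print(after.get("maths", 0.0))  # 0.0
```

Nothing in the objective rewards keeping ‘maths’ around once it stops appearing in the data. The best model for the new distribution is one that spends no probability on it.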

Therefore, in this new distribution, data that does not appear as frequently may be forgotten because it’s no longer required to achieve ‘an optimal model’. There are different ways we could fight parametric memory loss:

  1. Train larger models. The larger the model is, the more ‘space’ its latent space has to compress new information without having to forget other information. Put another way, small models are much more forgetful.

  2. Avoid complete fine-tunings. Using techniques like LoRA, heavily used on the edge (constrained hardware), as in Apple Intelligence, we freeze the original weights and train only a small set of added low-rank parameters, minimizing memory loss.

  3. In-context learning. The only way we truly know how to apply continual learning is by adding the memory to the prompt the model sees. The issue here is that we can run into ‘attention sinks’, where the model simply ignores data even though it appears in the prompt.

  4. A more sci-fi approach would be to map the model’s knowledge, finding the neuron circuits that store each piece of information, and simply freeze those weights so that the model doesn’t forget it. For now, this is a pipe dream.
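To put a number on option 2, here is a back-of-the-envelope sketch of how few parameters LoRA trains. LoRA keeps a weight matrix W of shape (d_out, d_in) frozen and learns a low-rank update B·A, where B is (d_out, rank) and A is (rank, d_in). The layer size and rank below are illustrative assumptions, not the settings of any particular model.

```python
def full_finetune_params(d_out, d_in):
    # Full fine-tuning updates every entry of the weight matrix.
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    # LoRA trains only B (d_out x rank) and A (rank x d_in); W stays frozen.
    return rank * (d_out + d_in)


# Assumed sizes: one 4096 x 4096 projection matrix, adapter rank 8.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625, i.e. about 0.4% of the weights
```

Because the original weights never change, removing the adapter recovers the base model exactly, which is part of why the approach limits forgetting.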

The other option is an engineering miracle: having enough compute to tailor model instances to each user and serve that model only to that user.

For instance, if that user never asks about maths, it’s no big deal if that model instance is fine-tuned toward the areas of interest for that user and forgets other stuff. This is a very likely future direction, because in all honesty, we know so little about these models that I’m not sure we will ever be able to decide what the models learn or forget.

That said, this is not feasible by today’s standards. As weird as it may seem, in this scenario we are massively underserved in terms of compute; we need dozens or even potentially hundreds of gigawatts of data center compute for OpenAI and others to have the capacity to serve user-specific models.

This is an extremely bullish case for NVIDIA and AMD, but the real threat, besides investment running out, is energy bottlenecks; either we start building power like there’s no tomorrow, or this ain’t happening in the next decade.

The threat of the AI industry stalling not because of a lack of research or engineering progress, but because of hard bottlenecks like investment and energy, is very, very real. That is, we could soon be in a world where we have the technology, but not the means (if we aren’t already there).

But beyond memory, which could be seen as a more obvious ‘next big thing’, there are other areas of the industry that are much less talked about by the media, but thoroughly addressed by experts consistently as trends to monitor.

These include ‘humility training’, ‘the growing importance of professionalizing prompt engineering’, the much-needed ‘reframing of evals’, and other insights you’ll hardly see elsewhere, starting with the Dead Internet Theory.

Dead Internet Theory

The Internet is undergoing a profound shift as AI-generated content and autonomous agents increasingly dominate online spaces.

A Growing Problem

Just to give a few examples of this:

  • 51% of all internet traffic in 2024 was automated, with 37% being malicious bots, a record high that continues into 2025 (Imperva Bad Bot Report 2025).

  • PerplexityBot traffic surged +157,490% (Cloudflare).

  • Real-time retrieval bots’ traffic grew 49% in Q1 2025 vs Q4 2024 (Washington Post).

  • One billion+ requests per month now come from OpenAI crawlers alone (TechRadar).

  • Over one-third of total web traffic in May 2025 came from APIs and autonomous agents, not browsers (TechRadar).

All in all, bots already account for the majority of web traffic, with AI crawlers consuming enormous bandwidth and even outperforming humans in tasks like persuasion.

Worse, we have been lured into believing we can reliably detect whether content is AI-generated, thinking the problem is solvable when in fact it’s not.

The issue is that many confuse their ability to detect ChatGPT’s particular “voice” with being able to detect AI content. In reality, you can identify AI content not because of unique traits that give away the source, but for a much simpler reason: millions of users leverage the same product, ChatGPT, which has a very distinct persona shaped by its training and fine-tuning, so most of the open Internet sounds the exact same way, and that is what makes us ‘click’.

Put another way, we have developed a capacity to identify ‘ChatGPT-sounding content’, not actual AI content.

In fact, if I started using Kimi K2 to write content on the Internet, almost no one would notice because it has a distinctly different tone compared to ChatGPT.

Case in point: University of Zurich researchers, who were forced not to publish the research because of its unethical approach, tricked thousands of Reddit users by using AI models to persuade them of a given idea or topic. These people were not only persuaded, but they also failed to realize they were being persuaded to change their minds by an AI model. The models achieved persuasion scores several times higher than those of humans.

Since most of these generations were created from carefully crafted prompts by researchers, this implies that even ChatGPT becomes unrecognizable if proper prompt engineering is employed (most AI-generated content stems from very lazy prompting).

The chances that AI-generated content undermines the very purpose of the Internet are high. So, what can we do? We can do something, but you’re not going to like it.
