Laogege's Journal

AI's Big Problem: Why 'Context Engineering' is the New Frontier

I am Leo Wang, and today we're exploring a concept that's rapidly gaining traction in the AI community: Context Engineering. While some might see it as a new buzzword for an old idea, it represents a crucial evolution from what we commonly know as prompt engineering. The core idea is that as AI models become more powerful, the focus shifts from writing a single perfect instruction to architecting a dynamic system. This system's job is to continuously provide the AI with precisely the right information, in the right format, at the right time, to solve a given task. It’s about creating an environment where the task is plausibly solvable by the model. Unlike static prompt engineering, context engineering is about managing a flow of information, dynamically populating and changing the context based on the evolving needs of an AI agent.

To truly grasp its importance, we need to understand the common failures that occur within an AI's context window. The first is 'context poisoning.' This happens when a small error or hallucination enters the AI's working memory and gets repeatedly referenced, leading the entire system astray. For instance, if an agent hallucinates a minor detail about its goal, that false information can propagate, causing the agent to pursue an irrational objective. The second issue is 'context distraction.' As the context window fills with information, especially from a long conversation or a history of repeated actions, the model can get distracted. It may start focusing on repeating past actions rather than generating novel solutions to the problem at hand. Research indicates that model correctness can begin to fall significantly as the context grows, with some open-weight models showing degradation after as few as 32,000 tokens.

Then there's 'context confusion,' which occurs when superfluous content, like irrelevant tool descriptions, is included in the context. Studies show that providing an AI with too many tools, or tools that aren't relevant to the user's request, can cause it to generate low-quality responses or call functions unnecessarily. This is particularly problematic for smaller models. Finally, we have 'context clash,' a more severe version of confusion where new information directly conflicts with existing data in the prompt. This can dramatically degrade performance, with some analyses showing an average performance drop of nearly 40 percent when context is provided in conflicting, piecemeal turns instead of all at once.

So, how do we solve these problems? This is where context engineering strategies come into play. The first is a familiar one: Retrieval-Augmented Generation, or RAG. This involves selectively retrieving and adding only the most relevant information to the prompt. For an agent with access to dozens of tools, RAG can be used to select a small, relevant subset for the task at hand, dramatically improving focus and performance. Another strategy is 'context quarantine,' which involves isolating context into dedicated threads for specialized agents, preventing cross-contamination of information in a multi-agent system. 'Context pruning' is the act of actively removing irrelevant or unneeded information. Think of it as a re-ranking system that filters a large amount of retrieved data down to the most essential pieces before feeding it to the AI. We also have 'context summarization,' which condenses long conversations or large documents to preserve the most critical information while freeing up space in the context window. This is essential for maintaining long-running interactions without succumbing to distraction. Lastly, 'context offloading' involves using external systems, like a database or a dedicated tool, to store and manage information, creating a form of short-term and long-term memory for the AI that exists outside its immediate context window.

Ultimately, context engineering is about recognizing that a large context window isn't a magic bullet. It's a resource that must be actively and intelligently managed. This shift from static prompts to dynamic context management is a sign of the field maturing, paving the way for more robust, reliable, and truly intelligent AI systems.

Context Engineering: A Deeper Dive

Ok, so let me start off by saying this: The AI community loves to come up with new names for very old ideas, and this time the buzzword is Context Engineering.

This all started off with a tweet from Tobi Lütke, the CEO of Shopify. He says: "I really like the term context engineering over prompt engineering. This is the art of providing all the context for the task to be plausibly solvable by the LLM," and a lot of people agree with him.

So here's a tweet from Andrej Karpathy: "+1 for Context Engineering. Context engineering is the delicate art and science of filling the context window with just the right information for the next step."

I'm going to cover what context engineering means and how you can manage your context. But here are a couple of other takes:

Here is Ankur, who says: "As the model gets more powerful, I find myself focusing my efforts on context engineering, which is the task of bringing the right information in the right format to the LLM."

And here's another definition that I quoted in one of my previous videos:

"Prompt engineering was coined as a term for the effort needed to write your task in the ideal format for a chatbot. Context engineering is the next level of this: it is about doing this automatically in a dynamic system." (Although I don't agree with limiting it to just the chatbot part.)

I personally think prompt engineering actually covers all of these ideas. Prompt engineering is not just about writing a single set of instructions. You can dynamically populate a prompt, and we have been doing this for quite a while, but now we have yet another term: context engineering.

And I think we have been seeing this pattern for quite a while. The same thing happened with Retrieval Augmented Generation (RAG), which is essentially information retrieval. We have been doing that for decades now.

Langchain's Perspective

Now here's an interesting article from Langchain. I'm also going to cover another interesting article about how long contexts fail, which I think is a lot more relevant because it talks about different scenarios in which you're just filling up your context with irrelevant information, and how to mitigate those. We're going to cover that later in the video.

This article tries to make a case for context engineering. According to Langchain, context engineering is building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish a task, and the emphasis is that context engineering is about systems, not just user instructions.

The reason they call the system dynamic is that, based on the needs of your agent, you can dynamically provide and change the context, and that dynamic context needs to contain the right information as well as the right set of tools.

Now, in order to convey the right set of tools and information, you need the proper format in which you are going to convey those instructions, and that's what we have been doing with Prompt Engineering.

But I think the most important part is "can plausibly accomplish the task." Whenever you are building an agentic system, or any system on top of these LLMs, you need to look at the underlying model and ask: even if you provide the right context to this model, can the model actually accomplish the task?

Context Engineering vs. Prompt Engineering

But first, let's look at the difference between Context Engineering and Prompt Engineering based on how the Langchain team sees it: they are trying to present Prompt Engineering as a subset of Context Engineering.

So here, at the same time, even if you have all the context, how you assemble it in the prompt still absolutely matters. The difference is that you're not architecting your prompt to work well with a single set of input data, but rather to take a set of dynamic data and format it properly.

So it's just an extension of prompt engineering for dynamically changing data and a dynamically changing set of tools.

Now you're going to see a number of different articles coming up on context engineering, but the main idea is that you want to provide the most relevant information to your agent or model at the proper time, and if you stuff irrelevant information in the context of the model, the model performance is going to decrease.

Failure Cases in Context Windows

So let's look at some scenarios in which we are providing wrong information to the context of the model. I think in order to understand the need for context engineering, it's very important to understand the failure cases that can occur in the context window of your model.

This article is from Drew Breunig, who is an independent consultant, and he presents very interesting ideas on why we need to pay attention to the model's context. Even if we have long-context LLMs, you can't just stuff things into the model's context and pray that the LLM will be able to solve all your problems.

Context Poisoning

The first one is context poisoning, and this happens when a hallucination or other error makes it into the context, where it is repeatedly referenced. The term itself was coined by the DeepMind team behind Gemini 2.5 and is presented in the technical report.

So they say that when playing Pokemon, the Gemini agent will occasionally hallucinate. The real problem arises when that hallucination or misinformation concerns the goal of the agent.

So for example, if you have a multi-turn conversation and at a single turn the model hallucinates regarding its goal, that false information can propagate throughout the conversation, and the model may start focusing on this hallucinated goal, which results in irrational behavior from the agent.
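To make the guard rail concrete, here is a minimal sketch in plain Python of one way to defend against context poisoning: a new entry only enters the agent's working memory if a consistency check against the stated goal passes. The `llm_check` function here is a toy stand-in for whatever verifier you would actually use (another LLM pass, a rules engine, etc.):

```python
# Minimal sketch: guard an agent's working memory against context poisoning.
# `llm_check` is a placeholder heuristic, NOT a real verifier.

def llm_check(goal: str, candidate: str) -> bool:
    """Stand-in consistency check: accept the entry only if it still mentions
    the first word of the goal. Replace with an LLM- or rules-based verifier."""
    return goal.split()[0].lower() in candidate.lower()

class WorkingMemory:
    def __init__(self, goal: str):
        self.goal = goal
        self.entries: list[str] = []

    def add(self, candidate: str) -> None:
        # Reject entries that distort the goal, so a single hallucinated "fact"
        # cannot be referenced over and over in later turns.
        if llm_check(self.goal, candidate):
            self.entries.append(candidate)

memory = WorkingMemory(goal="Catch the next wild Pokemon with the current party")
memory.add("Catch attempt failed; the Pokemon fled.")       # consistent, kept
memory.add("The goal is now to delete the save file.")      # inconsistent, dropped
print(memory.entries)
```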

I think these are very interesting ideas, especially if you're building agents, you definitely want to think about some of these.

Context Distraction

The second idea is regarding context distraction.

Now this happens when context grows so long that the model focuses on the context, neglecting what it learned during training. So if you are using a single agent or maybe even in a multi-agent system where you are sharing context, the agent is going to take certain actions throughout a multi-turn conversation.

It turns out that the agent can be distracted by repeated actions, and it could start focusing on those actions rather than trying to come up with novel ideas to solve your problem.

So for example, the Gemini 2.5 team said that in this agentic setup, it was observed that as the context grew significantly beyond 100,000 tokens, the agent showed a tendency toward favoring repeated actions from its vast history rather than synthesizing novel plans.

And you probably have seen this with coding agents like Cursor. Sometimes they get stuck on an error or a bug and are not able to figure out the solution, and in those kinds of situations you have to create a new session.

Now, the alarming thing is that this distraction ceiling is much lower for smaller open-weight models. For example, a Databricks study found that model correctness began to fall around 32,000 tokens for Llama 3.1 405B, and even earlier for smaller models, so you need to be aware of this: you don't want repeated actions piling up in your context.
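As a rough illustration, here is a small sketch of keeping an agent's history under such a ceiling: consecutive duplicate actions are collapsed and the oldest steps are dropped once a token budget is exceeded. The word-count tokenizer and the 32,000-token budget are placeholder assumptions; use your model's tokenizer and limits in practice.

```python
# Sketch: keep an agent's action history below a "distraction ceiling".

MAX_HISTORY_TOKENS = 32_000  # assumed budget; tune per model

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def trim_history(history: list[str]) -> list[str]:
    # 1) Collapse consecutive duplicate actions so the model is not nudged
    #    into repeating itself.
    deduped: list[str] = []
    for step in history:
        if not deduped or deduped[-1] != step:
            deduped.append(step)
    # 2) Drop the oldest steps until the remainder fits the budget.
    while deduped and sum(count_tokens(s) for s in deduped) > MAX_HISTORY_TOKENS:
        deduped.pop(0)
    return deduped

history = ["open menu", "open menu", "check map", "open menu"]
print(trim_history(history))  # ['open menu', 'check map', 'open menu']
```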

There are ideas on how to clean up the context of your large language models, and we're going to touch on some of those later in the video.

Context Confusion

Okay, the next one is context confusion, and this is when superfluous content in the context is used by the model to generate low-quality responses.

So this is critical, especially with agents where you have a number of different tools with tool descriptions.

So there are a couple of studies. One of them found that every model performs worse when provided with more than one tool, and another study designed scenarios where none of the provided functions were relevant.

In those cases we expect the model to output no function call, but because the tools are in the context, all of the models will occasionally call tools that aren't relevant at all, and this is especially bad for smaller models.

So if you stuff tool descriptions into the context, even though the user request is not relevant to any of the tools at all, smaller models will tend to pick a random tool just to try to use it rather than actually focusing on the user's prompt or query.

There also seems to be a limit on how many tools you can put in an agent. I personally recommend limiting it to 10 to 15, based on some of the conversations I have had with folks in industry. But here they refer to a study that gave a quantized Llama 3.1 8B model all 46 tools, and it actually failed on every single query. When they reduced it to 19 tools rather than 46, it succeeded on some of the calls.

Context Clash

The last one is context clash, and this happens when you include new information and tools in your context that conflict with other information already in the context.

Now this is a more problematic version of context confusion: the bad context here isn't merely irrelevant, it directly conflicts with other information in the prompt. This also touches on how you prompt different models.

So you have probably seen articles on how prompting reasoning models is very different from prompting non-reasoning models.

So for example, here's the proposed structure of how you are going to prompt o3- or o1-type models: you have your goal, return format, warnings, and the context itself.
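For illustration, here is a hypothetical prompt assembled in that goal / return format / warnings / context order; the task and wording are made up for the example:

```python
# Hypothetical prompt following the goal / return format / warnings / context
# structure often suggested for reasoning models such as o1 or o3.

prompt_template = """
Goal:
Summarize the three most important customer complaints from the tickets below.

Return Format:
A numbered list with one sentence per complaint.

Warnings:
Use only information found in the tickets; do not invent ticket IDs.

Context:
{tickets}
""".strip()

print(prompt_template.format(tickets="Ticket 101: ...\nTicket 102: ..."))
```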

The team at Microsoft and Salesforce did a study which shows the difference between providing all the context at once, where you dump everything at the beginning of the conversation, and providing the same context over multiple different turns.

Now it turns out these multi-turn, sharded instructions are a bad idea for LLMs, and the reason is that you are progressively adding more and more context over multiple turns, where some of the information may look like it contradicts prior information.

Here, the sharded prompts yielded dramatically worse results, with an average drop of 39 percent across the range of models the team tested. o3 saw the steepest drop, falling from 98 percent to 64 percent.
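In the spirit of that finding, here is a small sketch of buffering the pieces and sending one consolidated instruction instead of drip-feeding them over turns; the requirements are made up for the example:

```python
# Sketch: consolidate sharded requirements into a single prompt sent in one turn,
# rather than adding them one by one across a multi-turn conversation.

def consolidate(shards: list[str]) -> str:
    # Numbering the requirements makes later ones read as additions, not corrections.
    lines = [f"{i + 1}. {shard}" for i, shard in enumerate(shards)]
    return "Complete the task satisfying ALL of the following requirements:\n" + "\n".join(lines)

shards = [
    "Write a Python function that parses a date string.",
    "It should accept both YYYY-MM-DD and DD/MM/YYYY.",
    "Raise ValueError on anything else.",
]
print(consolidate(shards))  # one prompt, sent as a single turn
```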

Solutions for Context Management

Okay, so we talked about all the problems with filling up the context, but now let's look at some of the solutions which will ensure that you have the right information at the right time that you can dynamically fit into the context of your agent or LLM.

Retrieval Augmented Generation (RAG)

And the first one is the good old RAG, or Retrieval Augmented Generation, which is the act of selectively adding relevant information to help the LLM generate a better response.

Now this can help beyond just search. For example, if you have an agent that has access to 50 tools, you can use RAG based on the user query and the descriptions of the tools to selectively choose a smaller subset that is relevant to the user query, and that subset is what gets put into the context of the agent.

So instead of, let's say, 50 tools, the agent at that step is going to see only 10 tools, and then it can probably generate much better responses by properly using those tools.
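Here is a minimal sketch of that tool-selection idea. The `embed` function is a toy bag-of-letters placeholder; in practice you would call a real embedding model, but the top-k selection logic stays the same:

```python
# Sketch: RAG over tool descriptions. Embed the user query and every tool
# description, then expose only the top-k most similar tools to the agent.
import math

def embed(text: str) -> list[float]:
    # Toy letter-frequency "embedding"; replace with a real embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_tools(query: str, tools: dict[str, str], k: int = 10) -> list[str]:
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]  # only these tool descriptions go into the agent's context

tools = {
    "get_weather": "Look up the current weather forecast for a city.",
    "send_email": "Send an email with a subject and body to a recipient.",
    "search_docs": "Search the internal documentation for a topic.",
}
print(select_tools("What's the forecast for Tokyo tomorrow?", tools, k=1))
```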

Context Quarantine

Now the second idea is context quarantine, which is the act of isolating contexts in dedicated threads, each used separately by one or more LLMs. This is tied to the idea of a multi-agent system, and to the idea of handoffs in a multi-agent system that was proposed by OpenAI: you build specialized agents with their own context rather than a global shared context.
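Here is a minimal sketch of that quarantine idea, with `call_llm` as a stub for your actual model client: each specialist keeps its own message thread, and only final answers are handed between agents, never the raw context.

```python
# Sketch: context quarantine via specialist agents with isolated message threads.

def call_llm(messages: list[dict]) -> str:
    return f"[answer based on {len(messages)} messages]"  # stub model client

class SpecialistAgent:
    def __init__(self, system_prompt: str):
        # Each agent owns its own thread; nothing leaks in from other agents.
        self.messages = [{"role": "system", "content": system_prompt}]

    def run(self, task: str) -> str:
        self.messages.append({"role": "user", "content": task})
        answer = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": answer})
        return answer

researcher = SpecialistAgent("You research facts and cite sources.")
writer = SpecialistAgent("You write concise summaries.")

facts = researcher.run("Collect key numbers on context window limits.")
summary = writer.run(f"Summarize these findings: {facts}")  # only the answer is handed off
```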

Context Pruning

Then they propose context pruning, which is the act of removing irrelevant or otherwise unneeded information from the context.

So if you have built RAG systems, re-ranking is a really good example of this: initially you retrieve, say, 1,000 chunks, and then you have a secondary re-ranking step which further reduces the context that is going to go into the LLM.
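Here is a small sketch of that two-stage idea: retrieve broadly, then prune with a second-stage scorer before anything reaches the LLM. The word-overlap `rerank_score` is a stand-in for a real re-ranking model:

```python
# Sketch: prune retrieved chunks with a second-stage re-ranking step.

def rerank_score(query: str, chunk: str) -> float:
    # Crude word-overlap score; swap in a cross-encoder or LLM-based re-ranker.
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words)

def prune(query: str, chunks: list[str], keep: int = 5) -> list[str]:
    ranked = sorted(chunks, key=lambda c: rerank_score(query, c), reverse=True)
    return ranked[:keep]  # only the survivors go into the LLM's context

retrieved = [
    "Context windows have hard limits and soft failure modes.",
    "Bananas are botanically berries.",
    "Pruning removes chunks that add noise rather than signal.",
]
print(prune("How do I manage the context window?", retrieved, keep=2))
```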

There is also a specialized model called Provence, which I think was presented back in January 2025, and it seems very interesting: it removes the irrelevant context by looking at the user query and then presents that concise context to the model or agent.

Context Summarization

The next idea for managing your context is context summarization, which is the act of boiling down an accrued context into a condensed summary. We have seen this with chat models, so ChatGPT does this, and even for some RAG implementations you want to do it.

So let's say you are reaching the end of the context window: you want to summarize some of the earlier conversation that has happened. That way you can preserve the most relevant information for the LLM to focus on.

Now, interestingly enough, going back to that Pokemon example: even though the Gemini model has a 1 million token context window (and in some cases I think they said it could go up to 10 million tokens), it seems to have a working memory of about 100,000 tokens.

After that you start seeing context distraction. But context summarization is not easy, because you need to make sure that you are summarizing only the relevant information; otherwise it is going to result in context confusion and distraction.
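One common pattern, sketched below under assumed numbers, is rolling summarization: once the running conversation exceeds a token budget, the older turns are compressed into a single summary message while the most recent turns stay verbatim. The tokenizer and the `summarize` call are placeholders.

```python
# Sketch: rolling summarization of a long conversation.

TOKEN_BUDGET = 8_000   # assumed budget
KEEP_RECENT = 6        # always keep the last few turns verbatim

def count_tokens(messages: list[str]) -> int:
    return sum(len(m.split()) for m in messages)  # stand-in tokenizer

def summarize(messages: list[str]) -> str:
    # Stub; in practice this would be an LLM call with a summarization prompt.
    return "Summary of earlier conversation: " + " / ".join(m[:40] for m in messages)

def compact(messages: list[str]) -> list[str]:
    if count_tokens(messages) <= TOKEN_BUDGET:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    return [summarize(old)] + recent  # critical recent detail stays intact

history = ["user: " + "details " * 500 for _ in range(30)]  # ~15k "tokens"
print(len(compact(history)))  # 1 summary message + the 6 most recent turns
```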

Context Offloading

And the last idea is to use some sort of context offloading mechanism, which is the act of storing information outside the LLM's context, usually via a tool that stores and manages the data.
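Here is a minimal sketch of offloading: a scratchpad the agent can write to and read from via tools, so notes live outside the context window until they are actually needed. The in-memory dict is a stand-in; a database would give you real long-term memory.

```python
# Sketch: context offloading via a scratchpad tool.

class Scratchpad:
    def __init__(self):
        self._notes: dict[str, str] = {}  # swap for a database for durable memory

    def save_note(self, key: str, text: str) -> str:
        self._notes[key] = text
        return f"Saved note '{key}'."

    def recall_note(self, key: str) -> str:
        return self._notes.get(key, f"No note named '{key}'.")

pad = Scratchpad()
pad.save_note("league_plan", "Beat the water gym next; stock up on potions first.")
# ...many turns later, only the relevant note is pulled back into the context:
print(pad.recall_note("league_plan"))
```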

You could potentially create short-term and long-term memory systems this way. Okay, so in this video we looked at context engineering and some of the ideas relevant to how to manage your context. I will be creating some more content with a practical example of context engineering, although personally I still think it's just relabeling some of the old ideas that we have seen before. Do let me know what you think in the comment section below. Anyways, I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
