## Understanding Context Engineering in AI
> Hello everyone, I'm Wang Lijie. Today we're taking a deep dive into a concept that is heating up fast in AI circles: context engineering. Many people may think it's just old wine in a new bottle, a marketing term, but my analysis finds that it points to a crucial link in building advanced AI systems. It has already gone beyond the "prompt engineering" we know and is becoming a decisive factor in whether AI applications succeed or fail. So what exactly is context engineering? Put simply, it is not just about writing one perfect instruction. It is an art, and even more a science, whose core is building a dynamic system that supplies the large language model with the most accurate information and tools, in the most suitable format, at exactly the right moment. If prompt engineering is handing the AI a clear task sheet, then context engineering is building the AI an intelligent, dynamic working environment so that it can truly understand and complete complex tasks.
>
> To understand why context engineering matters, we first have to look at the major "context traps" that large models run into in real applications. The first is "context poisoning." Imagine a multi-turn conversation in which the model produces a small hallucination or error; that bad information stays in the conversation history like a virus, gets referenced again and again, and eventually drags the whole task off course, leading the AI into baffling, irrational behavior. The second is "context distraction." When the model's context window is stuffed with large amounts of information, especially repetitive action logs, the model starts to get distracted. Instead of focusing on the core problem, it tends to imitate the repeated actions in its history and loses its ability to solve problems creatively. Research shows that even strong models see their correctness start to decline once the context exceeds a certain length, and for small open-source models this performance knee arrives even earlier. The third is "context confusion," which is especially deadly for AI agent applications. If you give an agent too many tools and the current user request actually needs none of them, the model, especially a small one, will often get "confused" and force a call to an irrelevant tool, producing low-quality output. Research has even found that more tools are not always better: beyond a certain number, the model can fail on every task. The last and most direct is "context clash," which means the context contains contradictory information. For example, if you give instructions step by step over multiple turns, a later instruction may conflict with earlier information, and model performance drops sharply. Tests show that, compared with giving the complete, clear context all at once, delivering fragmented instructions in stages cuts the model's success rate by nearly 40% on average.
>
> Given how serious these problems are, how do we respond? That is the heart of what context engineering addresses, and the industry has already explored several effective strategies. First is one you may already know: retrieval-augmented generation, or RAG. Beyond information retrieval, it can also be used in agents with many tools to dynamically select the small subset of tools most relevant to the user request, greatly reducing the risk of "context confusion." Second is "context isolation." This concept is common in multi-agent systems: the core idea is to create separate, dedicated context environments for different specialized tasks instead of cramming everything into one shared global context, avoiding cross-contamination. The third strategy is "context pruning." This is like decluttering the AI's memory: actively identifying and removing information in the conversation history that is irrelevant or no longer needed, for example using a second re-ranking pass to distill the most essential content from a large set of retrieved results before feeding it to the model. The fourth is "context summarization." When a conversation grows very long and approaches the model's memory limit, the system can automatically summarize the earlier conversation and distill the key information, preserving the core memory while freeing space for new information. The last is "context offloading," which amounts to giving the AI external "long-term" and "short-term" memory: using tools to store information outside the model's context and retrieving it when needed, enabling effectively unlimited memory.
>
> In short, we are entering a new era of AI applications. Model scale is no longer the only deciding factor; the ability to manage and deliver information intelligently and efficiently, in other words "context engineering," is becoming the core competency for building powerful, reliable AI systems. This is not just a new buzzword; it represents a deeper understanding of how AI works and points all AI developers toward the next direction for optimization.
Ok, so let me start off by saying this: the AI community loves to come up with new names for very old ideas, and this time the buzzword is context engineering. Now, this all started with a tweet or post from Tobi, who is the CEO of Shopify. He says he really likes the term context engineering over prompt engineering: it is the art of providing all the context for the task to be plausibly solvable by the LLM, and a lot of people agree with him. Here's a tweet from Andrej Karpathy: plus one for context engineering, which he calls the delicate art and science of filling the context window with just the right information for the next step. So I'm going to cover what context engineering means and also how you can manage your context. But here are a couple of other takes. Here is Ankur; he says that as the models get more powerful, he finds himself focusing his efforts on context engineering, which is the task of bringing the right information, in the right format, to the LLM. And here's another definition that I quoted in one of my previous videos: prompt engineering was coined as a term for the effort needed to write your task in the ideal format for a chatbot (although I don't agree with just the chatbot part); context engineering is the next level of this, doing the same thing automatically in a dynamic system.

I personally think prompt engineering actually covers all of these ideas. Prompt engineering is not just about writing a single static set of instructions; you can dynamically populate it, and we have been doing this for quite a while. But now we have yet another term, context engineering, and I think we have seen this pattern before. The same thing happened with Retrieval Augmented Generation, which is essentially information retrieval, something we have been doing for decades. Now, here's an interesting article from LangChain. I'm also going to cover another interesting article on how long contexts fail, which I think is a lot more relevant because it talks about different scenarios in which you're just filling up your context with irrelevant information, and how to mitigate those, so we'll get to that later in the video.

The LangChain article tries to make a case for context engineering. According to LangChain, context engineering is building dynamic systems to provide the right information and tools, in the right format, such that the LLM can plausibly accomplish a task, and the focus is that context engineering is about systems, not only user instructions. The reason they say the system is dynamic is that, based on the needs of your agent, you can dynamically provide and change the context; that dynamic context has to bring in the right information, and it will need the right set of tools. Now, in order to convey the right set of tools and information, you need the proper format in which you are going to convey those instructions, and that's what we have been doing with prompt engineering. But I think the most important part is "can plausibly accomplish the task." Whenever you are building an agentic system, or any system on top of these LLMs, you need to look at the underlying model and figure out: even if you provide the right context to this model, can the model actually accomplish the task? Okay, but first let's look at the difference between context engineering and prompt engineering based on what the LangChain team thinks. They present prompt engineering as a subset of context engineering.
They say that even if you have all the context, how you assemble it in the prompt still absolutely matters. The difference is that you're not architecting your prompt to work well with a single set of input data, but rather to take a set of dynamic data and format it properly. So it's really an extension of prompt engineering to dynamically changing data and a dynamically changing set of tools. Now, you're going to see a number of different articles coming up on context engineering, but the main idea is that you want to provide the most relevant information to your agent or model at the proper time, and if you stuff irrelevant information into the context of the model, the model's performance is going to decrease. So let's look at some scenarios in which we provide the wrong information to the context of the model. I think in order to understand the need for context engineering, it's very important to understand the failure cases that can occur in the context window of your model. This article is from Drew, who is an independent consultant, and he presents very interesting ideas on why we need to look at the context of the model: even if we have long-context LLMs, you just can't stuff things into the context of the model and pray that the LLM will be able to solve all your problems.

The first one is context poisoning, and this happens when a hallucination or other error makes it into the context, where it is repeatedly referenced. The term itself was coined by the DeepMind team behind Gemini 2.5 and is presented in the technical report. They say that when playing Pokemon, the Gemini agent will occasionally hallucinate while playing. The reason this matters is that the hallucination or misinformation concerns the goal of the agent. So, for example, if you have a multi-turn conversation and at a single turn there is a hallucination about the model's goal, that hallucination propagates throughout the conversation, and the model may start focusing on this hallucinated goal, which is going to result in irrational behavior from the agent. I think these are very interesting ideas; especially if you're building agents, you definitely want to think about some of these.

The second idea is context distraction. This happens when the context grows so long that the model over-focuses on the context, neglecting what it learned during training. If you are using a single agent, or maybe even a multi-agent system where you are sharing context, the agent is going to take certain actions throughout a multi-turn conversation. It turns out that the agent can be distracted by repeated actions, and it can start focusing on those actions rather than trying to come up with novel ideas to solve your problem. For example, the Gemini team said that in this agentic setup, as the context grew significantly beyond one hundred thousand tokens, the agent showed a tendency toward favoring repeated actions from its vast history rather than synthesizing novel plans. You have probably seen this with coding agents like Cursor: sometimes they get stuck on an error or a bug and are not able to figure out the solution, and in those situations you have to create a new session. Now, the alarming thing is that this distraction ceiling is much lower for smaller open-weight models. For example, a Databricks study found that model correctness began to fall around 32,000 tokens for Llama 3.1 405B, and earlier for smaller models. So you need to be aware: you don't want repeated actions accumulating in your context. There are ideas on how to clean up the context of your large language models.
We're going to touch on some of those later in the video. Okay, the next one is context confusion, and this is when superfluous content in the context is used by the model to generate low-quality responses. This is critical especially with agents, where you have a number of different tools with tool descriptions. There are a couple of studies here: in one of them, they found that every model performs worse when provided with more than one tool, and another study designed scenarios where none of the provided functions are relevant. We expect the model's output to be no function call, but since the tools are in the context, all of the models will occasionally call tools that aren't relevant at all, and this is especially bad for smaller models. So if you stuff tool descriptions into the context, even though the user request is not relevant to any of the tools at all, smaller models will tend to pick a random tool and try to use it rather than actually focusing on the user's prompt or query. There also seems to be a limit on how many tools you can put in an agent. I personally recommend limiting it to 10 to 15; this is based on some of the conversations I have had with folks in industry. But here they refer to a study where they offered a quantized Llama 3.1 8B model 46 tools, and it actually failed on every single query. When they reduced it to 19 tools rather than 46, it had success on some of the calls.

The last one is context clash, and this happens when you accrue new information and tools in your context that conflict with other information in the context. This is a more problematic version of context confusion: the bad context here isn't merely irrelevant, it directly conflicts with other information in the prompt. This also touches on how you prompt different models. You have probably seen articles saying that prompting reasoning models is very different from prompting non-reasoning models. For example, here's the proposed structure for prompting o3- or o1-type models: you have your goal, return format, warnings, and the context itself. The team at Microsoft and, I think, Salesforce did a study which shows the difference between providing all the context at once, dumping everything at the beginning of the conversation, versus providing the same context over multiple turns. It turns out these multi-turn, sharded instructions are a bad idea for LLMs, and the reason is that you are progressively adding more and more context in multiple turns, where some of the information may look like it contradicts the prior information. Here, the sharded prompts yielded dramatically worse results, with an average drop of 39 percent, and the team tested a range of models. o3 was the worst because it dropped from 98 percent to 64 percent.

Okay, so we talked about all the problems with what you put in the context, but now let's look at some of the solutions which will ensure that you have the right information, at the right time, that you can dynamically fit into the context of your agent or LLM. The first one is good old RAG, or Retrieval Augmented Generation. This is the act of selectively adding relevant information to help the LLM generate a better response. Now, this can help beyond just search. For example, if you have an agent that has access to 50 tools, you can use RAG based on the user query and the descriptions of the tools to selectively choose a smaller subset which is relevant to the user query, and that is what goes into the context of the agent.
So instead of, let's say, 50 tools, the agent at that step is only going to see 10 tools, and then it can probably generate much better responses by properly using those tools. The second idea is context quarantine, which is the act of isolating contexts in dedicated threads, each used separately by one or more LLMs. This is tied to the idea of a multi-agent system, and to the idea of handoffs in a multi-agent system that was proposed by OpenAI: you build specialized agents with their own context rather than a globally shared context. Then they propose context pruning, which is the act of removing irrelevant or otherwise unneeded information from the context. If you have built RAG systems, re-ranking is a really good example of this: initially you retrieve, say, 1000 chunks, and then you have a secondary re-ranking step which further reduces the context that is going to go into the LLM. There is also a specialized model called Provence that was, I think, presented back in January 2025; it seems very interesting. Basically, it removes the irrelevant context by looking at the user query and then presents that concise context to the model or agent.

The next idea for managing your context is context summarization, the act of boiling down accrued context into a condensed summary. We have seen this with chat models, so ChatGPT does this, and even for some RAG implementations you want to do it. Let's say you are reaching the end of the context window: you want to summarize some of the earlier conversation that has happened, and that way you preserve the most relevant information for the LLM to focus on. Interestingly enough, going back to that Pokemon example, even though the Gemini model has a 1 million token context window, or in some cases I think they said it could go up to 10 million, it seems to have a working memory of about 100 thousand tokens; after that you start seeing context distraction. But context summarization is not easy, because you need to make sure that you are summarizing only the relevant information, otherwise that is going to result in context confusion and distraction. And the last idea is to use some sort of context offloading mechanism, which is the act of storing information outside the LLM's context, usually via a tool that stores and manages data. You could potentially create short-term and long-term memory systems.

Okay, so in this video we looked at context engineering and some of the ideas relevant to how to manage your context. I will be creating some more content on a practical example of context engineering, although personally I still think it's just a new label for some of the old ideas that we have seen before, but do let me know what you think in the comment section below. Anyways, I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
### Introduction to Context Engineering
The AI community often introduces new terms for existing concepts. Currently, the buzzword is **"Context Engineering."** The discussion took off with a post by Tobi Lütke, the CEO of Shopify, who prefers the term to prompt engineering: it describes the art of providing all the context needed for the task to be plausibly solvable by the LLM.
Andrej Karpathy echoed this sentiment, describing Context Engineering as the *delicate art and science* of filling the context window with just the right information for the next step.
### Defining Context Engineering
Several perspectives define Context Engineering:
* **Ankur:** As models grow more powerful, he finds himself focusing on context engineering: the task of bringing the right information, in the right format, to the Large Language Model (LLM).
* **Previous Definition:** Prompt engineering involved formatting tasks ideally for chatbots. Context engineering elevates this to an automatic, dynamic system.
It's important to remember that *prompt engineering isn't just about static instructions*. It involves dynamic population, which has been practiced for a while, albeit now under a new name. This pattern is similar to what happened with Retrieval Augmented Generation (RAG), which has roots in information retrieval techniques used for decades.
### Langchain's Perspective
Langchain defines Context Engineering as:
> Building dynamic systems to provide the right information and tools in the right format such that the LM can plausibly accomplish a task.
The emphasis is on **dynamic systems**, adapting context based on the agent's needs. This requires conveying the right information and tools in the proper format, building upon what Prompt Engineering already does. The crucial aspect is ensuring the model can actually accomplish the task, even with the right context.
### Context Engineering vs. Prompt Engineering
The Langchain team presents Prompt Engineering as a subset of Context Engineering. Assembling the prompt with all necessary context *remains crucial*. However, the key difference lies in architecting prompts for dynamic data and toolsets, not just a single input set.
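To make the distinction concrete, here is a minimal sketch of that kind of dynamic assembly: the prompt template stays fixed, but the retrieved documents, selected tools, and user request are filled in fresh on every turn. The function name, field layout, and example data are illustrative assumptions, not any particular framework's API.

```python
def build_prompt(instructions: str, documents: list[str], tools: list[str], user_request: str) -> str:
    # Dynamic pieces are formatted into a fixed template on each turn.
    doc_block = "\n".join(f"- {d}" for d in documents)
    tool_block = "\n".join(f"- {t}" for t in tools)
    return (
        f"{instructions}\n\n"
        f"Relevant documents:\n{doc_block}\n\n"
        f"Available tools:\n{tool_block}\n\n"
        f"User request: {user_request}"
    )

prompt = build_prompt(
    instructions="You are a support assistant. Answer using only the documents below.",
    documents=["Refunds are processed within 5 business days."],
    tools=["lookup_order", "issue_refund"],
    user_request="Where is my refund?",
)
print(prompt)
```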
### Key Takeaway
The primary objective is to provide the most relevant information to the model at the *appropriate time*. Irrelevant information hinders model performance.
### Failure Cases in Context
Understanding *failure cases* is essential to appreciate the need for Context Engineering.
#### 1. Context Poisoning
This occurs when hallucinations or errors are repeatedly referenced within the context. The DeepMind team behind Gemini 2.5 coined this term, observing that their agent would occasionally hallucinate while playing Pokemon.
A single hallucination or misinformation early in a multi-turn conversation can propagate, leading the model to focus on a *hallucinated goal*, resulting in irrational behavior.
#### 2. Context Distraction
As context grows excessively long, the model may focus on the context itself, neglecting what it learned during training. Repeated actions within the context can distract the agent, causing it to prioritize those actions over novel problem-solving.
The Gemini 2.5 team noted that once the context grew well beyond 100,000 tokens, their agent tended to favor repeating actions from its history rather than synthesizing new plans. Smaller open-weight models are even more susceptible: a Databricks study indicated that correctness began to decline around 32,000 tokens for Llama 3.1 405B, and earlier for smaller models.
#### 3. Context Confusion
Superfluous content within the context can lead to *low-quality responses*. This is especially problematic for agents with multiple tools and associated descriptions.
Studies have shown that every model performs worse when provided with more than one tool. In scenarios where none of the provided functions are relevant to a user's request, the expected output is no function call at all; yet models, smaller ones in particular, will occasionally call an irrelevant tool anyway, showcasing *context confusion*.
Limiting the number of tools to around 10-15 is generally recommended. One cited study gave a quantized Llama 3.1 8B model 46 tools and it failed on every query, but it showed some success when the set was reduced to 19 tools.
#### 4. Context Clash
This happens when newly added information or tools conflict with what is already in the context. It is a more severe form of context confusion: the offending context isn't merely irrelevant, it directly contradicts other information in the prompt.
Providing all context upfront versus in multiple turns impacts model performance. Multi-turn, sharded instructions can yield dramatically worse results: a study by Microsoft and Salesforce showed an average drop of 39% in success rates when using sharded prompts, with models like o3 experiencing significant drops (e.g., from 98% to 64%).
### Solutions for Effective Context Management
Here are some solutions to ensure the right information is provided at the right time:
#### 1. Retrieval Augmented Generation (RAG)
RAG involves selectively adding relevant information to improve the LM's response. This extends beyond simple search. For instance, with an agent having access to many tools, RAG can selectively choose a subset relevant to the user's query.
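As a rough illustration, the sketch below filters a tool registry down to the few entries most relevant to the current query before they ever reach the prompt. The tool names and the keyword-overlap scoring are made-up stand-ins; a real system would use an embedding model or re-ranker over the tool descriptions.

```python
def score(query: str, description: str) -> float:
    """Crude lexical relevance: fraction of query words that appear in the description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

TOOLS = {
    "get_weather": "look up the current weather forecast for a city",
    "send_email": "send an email with a subject and body to a contact",
    "query_sales_db": "run a sql query against the sales database",
    "create_calendar_event": "create a calendar event with a time and attendees",
}

def select_tools(query: str, k: int = 2) -> list[str]:
    """Keep only the k most relevant tool names for this turn."""
    ranked = sorted(TOOLS, key=lambda name: score(query, TOOLS[name]), reverse=True)
    return ranked[:k]

# Only these tool descriptions would go into the agent's context for this turn.
print(select_tools("what is the weather forecast for Berlin"))
```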
#### 2. Context Quarantine
Context Quarantine involves isolating context in dedicated threads used separately by one or more LMs. This relates to multi-agent systems and the idea of handoffs, creating specialized agents with independent contexts.
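A minimal sketch of the idea, assuming a generic chat-completion interface (`call_llm` is a placeholder, not a real API): each specialist keeps its own message history, and only its final answer crosses the boundary.

```python
def call_llm(messages: list[dict]) -> str:
    # Stand-in for a real model call.
    return f"<answer produced from {len(messages)} messages>"

class Agent:
    def __init__(self, system_prompt: str):
        # Dedicated context for this agent only; never shared globally.
        self.messages = [{"role": "system", "content": system_prompt}]

    def run(self, task: str) -> str:
        self.messages.append({"role": "user", "content": task})
        answer = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": answer})
        return answer  # only this result crosses the quarantine boundary

researcher = Agent("You research facts and cite sources.")
writer = Agent("You turn research notes into polished prose.")

notes = researcher.run("Collect key facts about context engineering.")
draft = writer.run(f"Write a short intro using these notes:\n{notes}")
print(draft)
```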
#### 3. Context Pruning
Context Pruning removes irrelevant or unneeded information from the context. Re-ranking in RAG systems exemplifies this. Initially, many chunks are retrieved, followed by a secondary re-ranking step to refine the context.
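Here is a minimal pruning sketch: over-retrieve, score each chunk against the query, and keep only the top few. The keyword-overlap scorer stands in for a real cross-encoder re-ranker or a dedicated pruning model; the documents and thresholds are illustrative.

```python
def rerank_score(query: str, chunk: str) -> float:
    # Stand-in for a real re-ranking model: simple keyword overlap.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def prune(query: str, chunks: list[str], keep: int = 3, min_score: float = 0.2) -> list[str]:
    """Keep at most `keep` chunks, and only those scoring above the threshold."""
    ranked = sorted(chunks, key=lambda c: rerank_score(query, c), reverse=True)
    return [c for c in ranked[:keep] if rerank_score(query, c) >= min_score]

docs = [
    "refund policy: customers can request a refund within 30 days",
    "shipping times vary by region and carrier",
    "refunds are issued to the original payment method",
]
print(prune("how do I request a refund", docs, keep=2))
```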
#### 4. Context Summarization
Context Summarization involves boiling down accumulated context into a concise summary. This is seen in chat models and some RAG implementations. By summarizing earlier conversations as the context window nears its limit, the most relevant information is preserved.
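A minimal sketch of the mechanism, assuming a character budget as a crude stand-in for token counting and a truncating `summarize` helper in place of a real LLM summarization call: once the history exceeds the budget, the oldest turns are folded into a single summary message.

```python
MAX_CHARS = 8_000  # crude stand-in for a token budget

def summarize(turns: list[dict]) -> str:
    # In practice this would be another LLM call; here we just truncate.
    text = " ".join(t["content"] for t in turns)
    return "Summary of earlier conversation: " + text[:400]

def compact(history: list[dict]) -> list[dict]:
    # Repeatedly fold the four oldest turns into one summary until under budget.
    while sum(len(t["content"]) for t in history) > MAX_CHARS and len(history) > 4:
        oldest, history = history[:4], history[4:]
        history = [{"role": "system", "content": summarize(oldest)}] + history
    return history

history = [
    {"role": "user", "content": "x" * 3000}, {"role": "assistant", "content": "y" * 3000},
    {"role": "user", "content": "z" * 3000}, {"role": "assistant", "content": "w" * 3000},
    {"role": "user", "content": "latest question"},
]
print(len(compact(history)))  # older turns collapsed into one summary message
```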
#### 5. Context Off-loading
This entails storing information outside the LM's context, typically using tools for data storage and management. This can enable the creation of short-term and long-term memory systems.
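A minimal sketch, assuming two illustrative tools (`save_note` and `recall_note`) backed by a plain dictionary; a production system would use a database or vector store, and only the note keys would need to live in the prompt.

```python
memory: dict[str, str] = {}

def save_note(key: str, value: str) -> str:
    """Tool exposed to the model: persist a note outside the context window."""
    memory[key] = value
    return f"saved note '{key}'"

def recall_note(key: str) -> str:
    """Tool exposed to the model: fetch a previously stored note when needed."""
    return memory.get(key, f"no note stored under '{key}'")

save_note("user_preferences", "prefers concise answers; timezone UTC+8")
print(recall_note("user_preferences"))
```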
### Conclusion
This video explored context engineering and relevant context management ideas. More content with practical examples will be created, although it can be argued that it’s just re-labeling old ideas. Share your thoughts in the comments.