There is a sentence by Andrej Karpathy that I have been quoting back at clients for nine months. He defined context engineering as "the delicate art and science of filling the context window with just the right information for the next step." That sentence is doing a lot of work, and I want to unpack what is hiding inside it.
For most of 2024, "prompt engineering" was where the craft lived. By mid-2025 the terminology had moved. Cognition, in a post that I still send to people, called context engineering "effectively the #1 job of engineers building AI agents." That framing has stuck. But the LangChain piece I keep recommending is honest about something: context engineering is not yet a stabilized practice. It is an emerging priority. We know it matters. We are still figuring out how to do it well.
What has changed in 2026 is that one specific layer of context engineering — memory — has detached from the rest and become its own product category. And that, more than any model release, is the architectural shift that will define enterprise agent stacks this year.
The four operations
Context engineering, in the LangChain decomposition, is four operations on the context window:
Write — putting things into a scratchpad, into memory, into a file the agent can read back later.
Select — pulling the right things in. This is the RAG conversation, but it's also tool-selection RAG, memory-selection RAG, file-selection RAG.
Compress — summarization, hierarchical compaction, the Claude Code "auto-compact" trick. Trade fidelity for room.
Isolate — keeping separate streams of work from contaminating each other. Sub-agents are an isolation strategy.
Three of these four — write, select, compress — touch memory directly. That is why the memory layer became the locus of so much vendor activity. If you are not solving these problems, your agent forgets what it knows, retrieves the wrong things, and runs out of context the moment work gets non-trivial.
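To make the four operations concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the ContextWindow class, the word-count "budget" standing in for real token counting, word-overlap ranking standing in for retrieval, and head-truncation standing in for summarization. It is not how LangChain or any vendor implements these operations.

```python
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """Toy context window. The budget is counted in words, not real tokens."""
    budget: int
    items: list[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(len(item.split()) for item in self.items)


def write(store: dict[str, str], key: str, note: str) -> None:
    """Write: persist a note outside the window so it can be read back later."""
    store[key] = note


def select(store: dict[str, str], query: str, k: int = 2) -> list[str]:
    """Select: rank stored notes by word overlap with the query (a stand-in for retrieval)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(note.lower().split())), note) for note in store.values()]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [note for score, note in ranked[:k] if score > 0]


def compress(window: ContextWindow, keep_words: int = 5) -> None:
    """Compress: shorten items oldest-first until the window fits its budget."""
    for i, item in enumerate(window.items):
        if window.used() <= window.budget:
            break
        words = item.split()
        if len(words) > keep_words:
            # Trade fidelity for room: keep only the head of the item.
            window.items[i] = " ".join(words[:keep_words]) + " [truncated]"


def isolate(task_names: list[str], budget: int) -> dict[str, ContextWindow]:
    """Isolate: give each sub-task its own window so streams do not mix."""
    return {name: ContextWindow(budget=budget) for name in task_names}


# Hypothetical usage
store: dict[str, str] = {}
write(store, "pref", "user prefers weekly summaries by email")
write(store, "bug", "export job fails on empty csv files")
hits = select(store, "why does the csv export fail")

window = ContextWindow(budget=8, items=["one two three four five six seven eight", "nine ten"])
compress(window, keep_words=3)

sub_windows = isolate(["research", "drafting"], budget=8)
```

Even in a toy like this, write and select both go through the same store, and compress decides what survives in the window. Isolate is the only one of the four that never touches stored memory.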
What changed about memory
Through 2024 and early 2025, "agent memory" was a feature you implemented. Pick a vector DB, decide what gets embedded, write a retrieval function, hope the embeddings are good enough. Memory was a junior task because the architecture was assumed.
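For readers who never built one, the "feature" version of memory looked roughly like the sketch below. It is an assumption-laden toy: the embed function is a bag-of-words counter standing in for a real embedding model, and NaiveVectorMemory is a hypothetical in-process class standing in for a vector DB.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NaiveVectorMemory:
    """Hypothetical stand-in for 'pick a vector DB, embed, retrieve'."""

    def __init__(self) -> None:
        self._rows: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Decide what gets embedded: here, every text as-is.
        self._rows.append((embed(text), text))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # The retrieval function: rank everything by similarity to the query.
        q = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Hypothetical usage
memory = NaiveVectorMemory()
memory.add("customer asked about mortgage rates")
memory.add("customer prefers contact by phone")
top = memory.retrieve("what mortgage rates did they ask about")
```

Note what the sketch has no opinion about: what to forget, how to reconcile contradictory facts, or who is allowed to read what. Those gaps are where the assumed architecture stopped being enough.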
That assumption broke. The market split into five recognizably different design camps, and a piece by Marco on blog.bymar.co names them better than anyone else has: raw conversational recall, extracted-profile memory, reflective learning, context operating systems, and coding-agent memory. Each camp has multiple vendors. Each vendor is opinionated about which failure mode they are solving.
The five names that come up most often in the systems I see actually deployed:
Mem0 — extracted profiles. Sits between the agent and storage, watches conversations, distills facts. The "what this user actually wants" layer.
Honcho — reflective learning. Models the user's mental model. Designed for agents that need to understand whom they are talking to over time.
OpenViking — context operating system. ByteDance's volcengine team's open-source context database. Hierarchical, file-system-paradigm, supports skills as first-class objects.
MemPalace — raw recall, organized around the method of loci. "Don't preemptively shorten. Keep everything. Make it findable." Spatial structure over AI curation.
ByteRover — coding-agent memory. Hierarchical knowledge tree as inspectable Markdown files. Optimized for agents working in codebases.
These do not interoperate. They are not five vendors solving the same problem with different APIs. They are five products solving five different problems, all of which were previously bundled under one word.
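One way to see why these are different problems is to run two of the camps over the same conversation. The sketch below is purely illustrative: raw_recall and extract_profile are hypothetical functions, the regex stands in for whatever extraction a real product performs, and nothing here reflects the actual APIs of MemPalace, Mem0, or any other vendor.

```python
import re

conversation = [
    "I prefer email over phone calls.",
    "Can you check my order from Tuesday?",
    "Actually, I prefer phone calls now.",
]


def raw_recall(turns: list[str], query: str) -> list[str]:
    """Raw recall: keep every turn verbatim and return all that mention the query term."""
    return [turn for turn in turns if query.lower() in turn.lower()]


def extract_profile(turns: list[str]) -> dict[str, str]:
    """Extracted profile: distill turns into facts; later statements overwrite earlier ones."""
    profile: dict[str, str] = {}
    for turn in turns:
        match = re.search(r"i prefer ([a-z ]+?)(?: over| now|\.)", turn.lower())
        if match:
            profile["contact_preference"] = match.group(1).strip()
    return profile


recalled = raw_recall(conversation, "prefer")
profile = extract_profile(conversation)
```

Raw recall hands back both preference statements, contradiction included, and leaves the agent to sort them out. Extraction resolves the contradiction at write time and discards the history of how the preference changed. Neither is wrong; they fail in different ways.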
Why this matters for architecture
If memory is a feature, you make implementation decisions inside an engineering team. If memory is a category, you make procurement decisions at architecture-board level. That is the change.
A concrete example: I worked with a financial-services team in Q1 who were building a customer-facing assistant. They started with a vector-DB implementation. By month two, they had four different memory needs: short-term scratchpad inside one conversation, long-term user profile across conversations, a learned policy memory that updated as the assistant got corrected, and a knowledge base of internal product documentation.
The vector DB could technically do all four. It was terrible at all four, for different reasons. They ended up with three memory systems running side by side, with a thin orchestration layer in front. That is the architecture now. Memory is no longer a singular noun.
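A thin orchestration layer of that kind can be sketched as a router that maps each memory need to a backend. This is a minimal sketch under stated assumptions: MemoryKind, MemoryBackend, DictBackend, and MemoryRouter are hypothetical names, the backends are in-process dictionaries rather than real memory systems, and the mapping of four needs onto three backends is an arbitrary choice for illustration, not the team's actual design.

```python
from enum import Enum
from typing import Optional, Protocol


class MemoryKind(Enum):
    SCRATCHPAD = "scratchpad"   # short-term, inside one conversation
    PROFILE = "profile"         # long-term user profile across conversations
    POLICY = "policy"           # learned corrections
    KNOWLEDGE = "knowledge"     # internal product documentation


class MemoryBackend(Protocol):
    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictBackend:
    """In-process stand-in for a real memory system."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class MemoryRouter:
    """Thin orchestration layer: every memory kind must be routed explicitly."""

    def __init__(self, routes: dict[MemoryKind, MemoryBackend]) -> None:
        missing = set(MemoryKind) - set(routes)
        if missing:
            raise ValueError(f"unrouted memory kinds: {sorted(k.value for k in missing)}")
        self._routes = routes

    def put(self, kind: MemoryKind, key: str, value: str) -> None:
        self._routes[kind].put(key, value)

    def get(self, kind: MemoryKind, key: str) -> Optional[str]:
        return self._routes[kind].get(key)


# Hypothetical wiring: four needs, three backends.
session = DictBackend("session-state")
user = DictBackend("user-memory")
docs = DictBackend("doc-index")
router = MemoryRouter({
    MemoryKind.SCRATCHPAD: session,
    MemoryKind.PROFILE: user,
    MemoryKind.POLICY: user,
    MemoryKind.KNOWLEDGE: docs,
})
router.put(MemoryKind.PROFILE, "tone", "formal")
```

The constructor refuses to start if any memory kind is unrouted. That is the design choice worth copying: the layer stays thin, but it forces someone to decide where each kind of memory lives.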
The implication for context engineering
The point I want to land is that context engineering and the memory platform shift are the same shift, viewed from two angles.
From the engineer's seat, context engineering looks like a discipline: how do I fill the window well. From the architect's seat, it looks like a procurement question: which memory vendor or vendors am I committing to, and which failure modes do they each cover?
When I run workshops at Applied Futures, the team that consistently gets the most out of agent deployment is the team that has decided, explicitly, what each layer of memory is for. Not "we have memory" — what kind, for what failure mode, with what eviction policy.
The teams that haven't done that have an agent that occasionally forgets, sometimes hallucinates, and slowly drifts.
Three operational decisions to make this quarter
If you're running an agent program and you haven't separated memory into layers yet, three decisions will move you a long way:
Decide what your scratchpad layer is. Most teams put it in the agent framework (LangGraph state, etc.). That's fine. But write it down so it is reviewable.
Decide whether you need a profile layer. If your agent serves humans repeatedly, you probably do. The candidates are Mem0 and Honcho. Try both, in that order: Mem0 is the lower-cost baseline; Honcho is where you go if your agents need to model people, not just recall facts.
Decide whether you need a context-OS layer. If you have multiple agents sharing context, working on the same artifacts, and drawing from the same skills library, the answer is yes. OpenViking is the leading open-source path. Inspect it before you commit; the operational characteristics (Go runtime, C++ compiler in the build chain) are real considerations.
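"Write it down so it is reviewable" can be as small as a declaration checked into the repository. The sketch below is one possible shape, not a standard: MemoryLayer and review are hypothetical names, and the failure modes and eviction policies shown are example values, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLayer:
    name: str
    backend: str        # where this layer lives
    failure_mode: str   # what goes wrong without it
    eviction: str       # when entries are dropped


def review(layers: list[MemoryLayer]) -> list[str]:
    """Return one line per decision that has not been made yet."""
    problems = []
    for layer in layers:
        for field_name in ("backend", "failure_mode", "eviction"):
            if not getattr(layer, field_name).strip():
                problems.append(f"{layer.name}: {field_name} is undecided")
    return problems


# Example values only; an empty string marks a decision still open.
LAYERS = [
    MemoryLayer(
        name="scratchpad",
        backend="LangGraph state",
        failure_mode="loses intermediate results within a run",
        eviction="discard at end of conversation",
    ),
    MemoryLayer(
        name="profile",
        backend="Mem0",
        failure_mode="forgets user preferences across conversations",
        eviction="",
    ),
]

open_decisions = review(LAYERS)
```

Here review(LAYERS) reports that the profile layer has no eviction policy yet, which is exactly the kind of gap that otherwise surfaces months later as drift.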
The series thread
Last week I argued that spec-driven development is what happens when the bottleneck migrates from prompts to plans. This week's piece is about what happens when the bottleneck migrates from plans to running context. Plans are static. Context is alive. Memory is what makes context survive across runs.
Next week: how MCP became the protocol underneath all of this — 97 million SDK downloads a month and counting.

About the Author
Jacob Langvad Nilsson
Technology & Innovation Lead
Jacob Langvad Nilsson is a Digital Transformation Leader with 15+ years of experience orchestrating complex change initiatives. He helps organizations bridge strategy, technology, and people to drive meaningful digital change. With expertise in AI implementation, strategic foresight, and innovation methodologies, Jacob guides global organizations and government agencies through their transformation journeys. His approach combines futures research with practical execution, helping leaders navigate emerging technologies while building adaptive, human-centered organizations. Currently focused on AI adoption strategies and digital innovation, he transforms today's challenges into tomorrow's competitive advantages.