Agent Memory: Why We Ditched Files for Structured State
TL;DR: We were storing agent state in flat markdown files committed to GitHub. It worked until it didn’t — the agent started contradicting itself across sessions. Switching to structured JSON state with explicit read/write operations fixed the drift problem entirely. The decision came down to a principle I’ve applied to IAM systems for 25 years: if your state isn’t queryable, it isn’t trustworthy.
I’ve spent most of my career designing systems where state matters. Identity systems especially — who a user is, what they’ve been granted, what’s been revoked. If your state store is ambiguous, your access decisions are ambiguous. That’s how breaches happen.
When I started building the ABT agent infrastructure, I made a mistake I should have caught faster: I treated agent memory like documentation instead of like data.
Here’s what that cost me, and what the fix actually looked like.
The Problem: Flat Files Don’t Scale as Agent Memory
Early in the build, I stored agent context in markdown files — things like which blog topics had been covered, what was in the backlog, what decisions had been made. The files lived in the repo. The agent could read them. Simple.
It worked for a few weeks.
Then the agent started repeating topics. Not every session — just occasionally, and in ways that were subtle enough to miss at first glance. A post framed differently, a concept covered from a slightly new angle that was really just the same territory. The “no repeat” instructions in CLAUDE.md weren’t preventing drift. They were preventing exact repeats, but the agent was interpreting the flat list differently session to session depending on what else was loaded in context.
The root cause: markdown files are optimized for human reading, not machine querying. When I wrote “Building in Public: MCP Servers, Agent Memory, and What Actually Broke,” the agent had to semantically interpret whether a new topic about agent state overlapped with that title. Sometimes it got it right. Sometimes it didn’t. The decision was happening inside the model against unstructured text — which means it was probabilistic, not deterministic.
In security terms: I’d built an access control system using honor-based checks against a plaintext policy document. That’s not a system. That’s a hope.
Why Structure Fixes the Drift Problem
The shift I made was treating agent state as a proper data store rather than a human-readable record.
Instead of a markdown file with a list of post titles, I moved to a JSON structure:
```json
{
  "covered_topics": [
    {
      "slug": "agent-memory-structured-state",
      "title": "Agent Memory: Why We Ditched Files for Structured State",
      "primary_keyword": "agent memory",
      "category": "building-and-learning",
      "published_date": "2026-05-10",
      "semantic_tags": ["agent state", "structured storage", "pipeline drift", "memory management"]
    }
  ],
  "pending_topics": [],
  "last_updated": "2026-05-10T00:00:00Z"
}
```
Now when the agent evaluates a new topic, it’s not scanning prose for intent — it’s querying structured fields. Does the primary keyword appear in covered_topics[*].primary_keyword? Does any semantic tag in the proposed post match covered_topics[*].semantic_tags? Those are deterministic checks, not interpretive ones.
The agent doesn’t decide whether two concepts “feel similar.” The schema enforces the boundary.
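As a rough sketch of what those checks look like in practice (the field names match the JSON schema above; the function itself is illustrative, not the actual pipeline code):

```python
def topic_overlaps(proposed: dict, state: dict) -> bool:
    """Return True if a proposed topic collides with any covered topic.

    These are exact field comparisons, not semantic interpretation:
    a primary-keyword match or any shared semantic tag counts as overlap.
    """
    proposed_tags = set(proposed.get("semantic_tags", []))
    for covered in state["covered_topics"]:
        if proposed["primary_keyword"] == covered["primary_keyword"]:
            return True
        if proposed_tags & set(covered["semantic_tags"]):
            return True
    return False

state = {
    "covered_topics": [
        {
            "primary_keyword": "agent memory",
            "semantic_tags": ["agent state", "structured storage"],
        }
    ]
}

# Same territory, different framing: the shared "agent state" tag catches it.
proposal = {"primary_keyword": "agent context", "semantic_tags": ["agent state", "checkpointing"]}
print(topic_overlaps(proposal, state))  # True
```

The same check against prose would depend on how the model reads the list that session. Against structured fields, it returns the same answer every time.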
This is exactly how I’d design an RBAC system. You don’t store permissions as a paragraph in a config file and ask the enforcement engine to figure it out. You define discrete attributes, compare them programmatically, and make the policy explicit. Agent memory should work the same way.
The Decision That Made It Work: Explicit Read/Write Operations
Switching to JSON wasn’t enough on its own. The other half of the fix was making state operations explicit in the agent’s instruction set.
Previously, the agent read memory passively — it was just part of the context loaded at session start. There was no enforced checkpoint to verify the state was current, and no structured operation to update it after a task completed.
I added two explicit behaviors:
- **On session start:** The agent reads state.json and surfaces the last 10 covered topics and their semantic tags before generating any new content. This forces a grounding step that happens every time, not whenever the agent decides it’s relevant.
- **On task completion:** The agent writes a structured entry to state.json before the session closes. Title, slug, primary keyword, semantic tags, date. Not a summary in prose — a machine-readable record.
This created a closed loop. The agent reads state before acting. The agent writes state after acting. The state is structured and queryable. Drift stopped.
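A minimal sketch of that closed loop in Python (the path and field names are assumptions matching the schema above, not the production implementation):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def read_state(path: Path) -> dict:
    """Session start: load state and surface the last 10 topics for grounding."""
    state = json.loads(path.read_text())
    for topic in state["covered_topics"][-10:]:
        print(f"covered: {topic['title']} -> {topic['semantic_tags']}")
    return state

def write_state(path: Path, state: dict, entry: dict) -> None:
    """Task completion: append a machine-readable record and persist it."""
    state["covered_topics"].append(entry)
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    path.write_text(json.dumps(state, indent=2))
```

The point is that both operations are explicit steps in the workflow, not side effects of whatever happens to be in context.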
The analogy I keep coming back to: this is how IAM provisioning should work. Read current entitlements before making a change. Write the audit record immediately after. Never leave state ambiguous between operations. Agent memory is just another form of identity state — it defines who the agent is in this context and what it’s already done.
What Broke Along the Way
A few things failed before this worked.
Attempt 1: More detailed markdown files. I tried adding more explicit structure to the markdown — bolding keywords, adding frontmatter to the state file itself. It helped a little. It didn’t fix the interpretive problem. The agent was still reading prose.
Attempt 2: Longer no-repeat lists in CLAUDE.md. I maintained a list of recently covered topics directly in the system prompt. This works for exact matches. It fails when a topic is adjacent but differently framed. It also bloats the system prompt, which pushes useful instructions out of the effective context window.
Attempt 3: Hybrid — JSON for state, markdown for human review. This is what I landed on. The agent reads and writes JSON. I maintain a human-readable summary separately for my own review, generated from the JSON on demand. The two don’t need to stay in sync manually — the JSON is authoritative, the markdown is a view.
That separation of authoritative state versus human-readable representation is the correct mental model. It’s how you’d build any serious system. I just had to get there through two failed iterations first.
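Generating the view from the authoritative state is a one-way transform; something like this sketch (the rendering format is my own invention, not the file I actually produce):

```python
def render_markdown_view(state: dict) -> str:
    """Generate a human-readable summary from the authoritative JSON.

    The markdown is a throwaway view. It is regenerated on demand and
    never written back to state, so the two can't drift apart.
    """
    lines = ["# Covered Topics", ""]
    for topic in state["covered_topics"]:
        tags = ", ".join(topic["semantic_tags"])
        lines.append(f"- **{topic['title']}** ({topic['published_date']}): {tags}")
    return "\n".join(lines)
```

Because the markdown is derived, deleting it loses nothing. That's the test for which representation is authoritative.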
What This Means for Anyone Building AI Pipelines
If you’re building any workflow where an AI agent needs to remember decisions across sessions — and most production pipelines eventually need this — the file format choice matters more than it seems.
Ask these questions before you commit to a memory approach:
- Can the agent query this state programmatically, or is it interpreting prose?
- Is there an explicit read operation before tasks start?
- Is there an explicit write operation after tasks complete?
- Is the state format authoritative, or is it a loose record that can be misread?
- What happens when two sessions run concurrently? (Race conditions are real even in solo builds.)
The last point I haven’t fully solved yet. Concurrent writes to a JSON file aren’t safe without a lock mechanism or a proper state store. For now I’m running single-session, but if this scales to multiple agents running simultaneously, the JSON-on-disk approach will break. The next step is a lightweight key-value store — probably DynamoDB given the existing AWS infrastructure — with proper atomic writes.
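One stopgap worth noting before a proper store exists: writing through a temp file and an atomic rename at least prevents readers from ever seeing a half-written file. This is a sketch of that pattern, not a fix for concurrent writers — two sessions can still overwrite each other's updates (last writer wins):

```python
import json
import os
import tempfile

def atomic_write_state(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically swap it into place.

    os.replace is atomic on POSIX filesystems, so a reader sees either
    the old file or the new one, never a partial write. It does NOT
    serialize concurrent writers; that still needs a lock or a real
    state store with conditional writes.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The temp file lives in the same directory as the target so the rename stays on one filesystem, which is what makes it atomic.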
I’ll document that when it breaks. Which it will.
Key Takeaways
- Flat files are not agent memory. They’re documentation. The distinction matters when your agent needs to make deterministic decisions based on prior state.
- Structured state enables programmatic checks. If a value is in a JSON field, it can be compared exactly. If it’s in prose, it’s being interpreted — and interpretation drifts.
- Explicit read/write operations close the loop. Passive context loading is not the same as a state management system. Make the operations intentional.
- Authoritative state and human-readable views are separate concerns. JSON for the machine, markdown for you. Don’t try to make one file serve both.
- This is just IAM thinking applied to agents. State should be discrete, queryable, and auditable. That’s not a new idea — it’s how every reliable system I’ve ever built works.
The agent memory problem isn’t exotic. It’s a state management problem. Treat it like one.