Building in Public: Week of April 27 — MCP Servers, Agent Memory, and What Actually Broke

by Alien Brain Trust AI Learning

What Actually Happened This Week

No polished narrative here. This is what I built, what broke, and what I’d do differently.

This week at ABT Labs the focus was on two fronts: getting MCP (Model Context Protocol) servers running as proper, persistent infrastructure rather than ad-hoc tools, and running experiments on how to give agents meaningful memory without creating the kind of data exposure problems that should keep a security-conscious builder up at night.

One went better than expected. The other reminded me why I spent two decades telling enterprise teams to assume breach.


MCP Servers: From Toy to Infrastructure

If you’ve been following this blog, you know the ABT stack runs Claude Code heavily. What I hadn’t done yet was treat MCP servers as first-class infrastructure — versioned, documented, access-controlled — rather than scripts I spin up when I need them.

This week I changed that.

The shift in thinking is straightforward: an MCP server is a service that exposes capabilities to an LLM. That means it deserves the same treatment you’d give any internal API. It has a surface area. It can be misconfigured. If you’re running one locally for development and you haven’t thought about what it can read and write, you have a problem waiting to happen.

The specific work this week:

Locked down filesystem scope. The default posture for most MCP filesystem servers is permissive. I scoped mine explicitly to project directories. No access to home directories, no access to config files outside the project root, no traversal above the working directory. This sounds obvious. Most people don’t do it on first setup.
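As a sketch of the kind of check involved, here's a minimal path-scoping guard in Python. The project root and the helper name are hypothetical; a real MCP filesystem server would enforce something like this inside its tool handlers:

```python
from pathlib import Path

# Hypothetical project root; resolve() normalizes symlinks and ".." segments.
PROJECT_ROOT = Path("/home/user/projects/abt").resolve()

def scoped_path(requested: str) -> Path:
    """Resolve a requested path and reject anything outside the project root.

    Raises PermissionError on traversal attempts (e.g. "../../etc/passwd")
    and on absolute paths that escape the sandbox.
    """
    candidate = (PROJECT_ROOT / requested).resolve()
    if not candidate.is_relative_to(PROJECT_ROOT):
        raise PermissionError(f"access outside project root denied: {requested}")
    return candidate
```

Note the resolve-then-check order: checking the raw string first is the classic mistake, because `..` segments and symlinks can smuggle a path back out of scope after the check.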

Added request logging. Every tool call that goes through my MCP servers now logs to a local file: timestamp, tool name, input parameters (sanitized — no credential values), and response status. This is the minimum viable audit trail. If something misbehaves — an agent taking an unexpected action, a tool getting called in a loop — I want a record.
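The logging itself doesn't need to be fancy. A minimal sketch, with hypothetical helper names and a naive key-based redaction heuristic (anything that looks like a credential key gets its value dropped before it hits disk):

```python
import json
import re
import time

# Heuristic: redact values whose key names look credential-shaped.
SECRET_KEYS = re.compile(r"(token|key|secret|password|credential)", re.IGNORECASE)

def sanitize(params: dict) -> dict:
    """Return a copy of params with credential-looking values redacted."""
    return {k: "[REDACTED]" if SECRET_KEYS.search(k) else v
            for k, v in params.items()}

def log_tool_call(tool: str, params: dict, status: str,
                  path: str = "mcp_audit.log") -> None:
    """Append one JSON line per tool call: timestamp, tool, inputs, status."""
    entry = {"ts": time.time(), "tool": tool,
             "params": sanitize(params), "status": status}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

One JSON object per line keeps the log greppable and trivially parseable later, which matters when you're reconstructing what an agent did in a loop.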

Separated dev and prod configs. Running one MCP config for active development and a separate, more restricted config for anything touching real data or external services. Same principle as not doing your dev work in a production database. Obvious in the enterprise world, commonly skipped in the solo builder world.
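As a sketch, here's what the split might look like using the reference filesystem server's JSON config. The paths are hypothetical, and the exact file location and schema depend on which MCP client you're running:

```json
{
  "mcpServers": {
    "filesystem-dev": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/home/user/projects/abt"]
    }
  }
}
```

The restricted config points the same server at a much narrower slice, here a single export directory, for anything touching real data:

```json
{
  "mcpServers": {
    "filesystem-prod": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem",
               "/home/user/projects/abt/exports"]
    }
  }
}
```

Two files, swapped deliberately, beats one file you keep meaning to tighten up.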

None of this is novel. It’s basic discipline applied to a new context. The takeaway: if you’re building on MCP servers, treat them like microservices, not convenience scripts.


Agent Memory: Where It Got Interesting

The more consequential experiments this week were around agent memory — specifically, how to give an agent useful persistent context without creating a data liability.

The naive approach is to dump everything into a context file and let the agent read it. I’ve been doing a version of this since the early ABT builds. This week I stress-tested it and found the seams.

What I tested:

I built a simple task agent that maintains a rolling context file — notes, decisions made, things to follow up on. The agent reads this file at the start of each session and appends to it as work progresses. Clean in concept.
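The whole mechanism fits in a few lines, which is part of why it's so tempting. A sketch of the rolling-file approach (the file name and helper names are hypothetical):

```python
from datetime import date
from pathlib import Path

def load_context(path: Path) -> str:
    """Read the full rolling context at session start; empty on first run."""
    return path.read_text() if path.exists() else ""

def append_note(path: Path, note: str) -> None:
    """Append a dated bullet as work progresses."""
    with path.open("a") as f:
        f.write(f"- [{date.today().isoformat()}] {note}\n")
```

Every session reads everything and appends something. That's the design flaw hiding in plain sight, as the next section shows.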

What broke:

The context file grew. No surprise there, but it happened faster than I expected. Within a week of active use it was large enough that loading the full thing was burning meaningful token budget before any actual work started. More importantly, I realized I had accumulated information in that file I wouldn’t want in a prompt indiscriminately — internal project decisions, specific tool configurations, things that shouldn’t be in every context window regardless of what task is running.

This is the memory problem that nobody talks about in the LLM hype cycle: persistent context isn’t free, and it isn’t neutral. Every token you load is budget you’re spending. Every piece of information you persist is a piece of information that can be exposed, leaked, or misused by a future prompt.

What I’m moving toward:

Structured, scoped memory. Instead of one flat context file, I’m moving to categorized memory stores with explicit retrieval logic:

  • Project state: Current goals, decisions made, known constraints. Loaded always.
  • Implementation notes: Technical decisions, code patterns, tool configs. Loaded when working on code.
  • Session scratch: Temporary notes from the current session. Written but not persisted across sessions by default.

The agent doesn’t get everything every time. It gets what’s relevant to the current task. This reduces token burn and — more importantly from a security standpoint — limits information exposure per interaction.

I’m not using a vector database for this yet. For a solo builder at this stage, that’s engineering overhead I don’t need. Structured markdown files with clear headers and a retrieval function that selects based on task type gets me 80% of the value at a fraction of the complexity.
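The retrieval function really is that simple. A sketch, assuming hypothetical file names and task categories:

```python
from pathlib import Path

# Hypothetical memory layout: one markdown file per category.
ALWAYS_LOADED = ["project_state.md"]
TASK_SCOPED = {
    "code": ["implementation_notes.md"],
    # session scratch is kept in memory only, not persisted by default
}

def load_memory(task_type: str, memory_dir: Path) -> str:
    """Assemble only the memory categories relevant to the current task."""
    names = ALWAYS_LOADED + TASK_SCOPED.get(task_type, [])
    parts = []
    for name in names:
        p = memory_dir / name
        if p.exists():
            parts.append(f"## {name}\n{p.read_text()}")
    return "\n\n".join(parts)
```

A documentation task loads project state only; a coding task also pulls in implementation notes. The selection logic is a dictionary lookup, not an embedding search, and that's the point.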


The Security Lesson That Applied to Both

Here’s the through-line between MCP scope control and memory management: least privilege isn’t just an enterprise compliance checkbox. It’s sound architecture.

Every security professional I know has said “least privilege” so many times it’s become noise. Then we start building with LLMs and we hand the agent a filesystem, a context file with everything in it, and access to every tool we’ve wired up — because that’s the path of least resistance when you’re moving fast.

The problem compounds quickly. An agent with broad access and broad context will use both. Sometimes in ways you didn’t intend. Sometimes because a prompt pushed it in an unexpected direction. Sometimes because you forgot what you put in that context file six weeks ago.

The discipline I’m applying to ABT’s agent infrastructure is the same discipline I’d recommend to an enterprise security team evaluating AI tooling for internal use:

  1. Scope tool access explicitly. Default to restricted, expand deliberately.
  2. Log everything the agent does that touches external state.
  3. Treat persistent context as sensitive data. Know what’s in it and who can read it.
  4. Review your memory and config files periodically. They drift.

This isn’t about slowing down. I’m still moving fast. It’s about not building technical debt that becomes a security incident.


What’s Next

Next week the focus shifts to testing multi-step agent workflows — specifically, what happens when one agent hands off to another and whether the context and tool permissions propagate in ways I expect or ways that surprise me.

I’m also going to write up the MCP server logging setup in more detail. The implementation is simple enough that I can share it directly.

If you’re building with MCP servers or experimenting with agent memory architectures and have run into problems I haven’t covered here, reach out. The most useful building-in-public exchange I can have is comparing notes on what broke and how we fixed it.

Tags: #building-in-public #implementation #ai-tools #llm-security #automation #workflows
