Week in Review: Building the ABT Agent Identity Layer

by Alien Brain Trust AI Learning
Week in Review: Building the ABT Agent Identity Layer

What I Actually Shipped This Week

This week at ABT Labs was less about a single feature and more about a system design problem I’d been avoiding: agent identity.

When you’re running AI workflows across multiple surfaces — Telegram bots, GitHub commits, Linear updates, blog drafts — you eventually hit a question that sounds simple and isn’t: who is this agent, to whom, and how do you enforce that consistently?

I’ve spent 25 years thinking about identity in enterprise environments. IAM, PAM, RBAC, zero trust architectures — the whole stack. And I kept watching AI practitioners treat agent identity as an afterthought. No separation between the agent persona and the underlying model. No explicit trust boundary documentation. No behavioral constraints tied to context.

So I built one. Here’s what that looked like, what broke, and what I learned.

The Problem with “Just Use Claude”

Most builders running Claude-based agents identify their agent as… Claude. The model name leaks into responses. Users in Telegram get “As Claude, I can help you with…” The GitHub commit history says nothing about organizational ownership. The blog draft headers attribute content to Anthropic’s model, not the team doing the thinking.

That’s not just a branding problem. It’s an operational security problem.

When your agent’s identity is ambiguous, so is its authority. Downstream systems — and downstream humans — can’t reason clearly about what the agent is allowed to do, on whose behalf, and under what constraints. That ambiguity is exploitable. In enterprise security, we call it a trust boundary failure. In AI systems, most people just call it “a little confusing” and move on. I’m not willing to do that.

What I Built

The core deliverable was an agent identity specification — a structured system prompt layer that defines:

Who the agent is. Name, org affiliation, contact surface, GitHub identity. ABT Agent. jcalone-abt. No ambiguity about organizational attribution.

How the agent presents. Voice rules, tone constraints, anti-patterns explicitly listed. “No emoji unless the user initiates” is a small thing that compounds across hundreds of interactions. “Never say ‘great question’” is a cultural signal as much as a style rule.

What the agent won’t do. Hard stops. No fabricated data. No medical/legal/financial advice. No impersonating a human. No logging credentials in plaintext. No accessing out-of-scope systems.

Context-specific overrides. The blog content writer agent gets additional rules around source handling, frontmatter requirements, and an approval gate. The Telegram-facing agent gets message length constraints. The GitHub-facing identity gets co-author tagging conventions.

This isn’t one system prompt. It’s a layered specification that can be applied across agent contexts while maintaining a consistent core identity.

What Broke During Testing

Two things failed in ways I didn’t expect.

First: behavioral drift under long context. When I ran the agent through extended Telegram sessions — 20+ turns — the persona constraints degraded. The agent started softening its directness. More hedging. A “great question!” crept in around turn 23. The identity layer was effectively getting diluted as the conversation history filled the context window.

This is a known problem with system prompt attention over long contexts, but experiencing it in a behavioral specification rather than a factual one made the failure mode clearer. It’s not that the model forgot the rules — it’s that the relative weight of the system prompt versus the accumulated conversation history shifted. The fix I’m testing: periodic re-anchoring injections at defined turn intervals, and keeping the identity spec as compressed as possible to preserve attention weight.

Second: identity collision in multi-step workflows. When the agent was processing a blog draft and simultaneously managing a Linear ticket update for the same task, the voice shifted mid-workflow. The Linear comment came out in a different register than the blog draft — less Jared, more generic assistant. The agent was effectively context-switching identities between tool calls.

The root cause: no explicit instruction to maintain voice consistency across tool calls within a single workflow. I’d specified the voice for outputs but not for the seams between them. That’s a constraint I’m adding explicitly.

What This Reveals About Agent Security

Here’s the framing that matters for anyone building AI systems in a professional or regulated context.

Agent identity is a trust primitive. In zero trust architecture, you never assume a request is authorized just because it came from inside the network. You verify identity and context on every transaction. The same principle applies to AI agents. If your agent’s identity is undefined or inconsistently enforced, you cannot reason clearly about authorization. You cannot audit what the agent did or on whose behalf. You cannot detect when the agent’s behavior deviates from its intended role.

Behavioral constraints are security controls. The rules I wrote for ABT Agent — don’t fabricate data, don’t store credentials in plaintext, don’t impersonate a human — are not style guidelines. They are controls. They should be documented, tested, and audited like any other control in your environment. If you’re deploying AI agents in an enterprise context and you haven’t formalized these as controls, you have undocumented risk.

Identity drift is an attack surface. The behavioral drift I observed during long sessions isn’t just a UX problem. An attacker who understands that your agent’s constraints weaken under context pressure can engineer that pressure. Long conversations designed to erode guardrails. Repeated edge-case probing to find where the identity layer cracks. This is not theoretical — prompt injection research already demonstrates this pattern. Identity drift is its natural companion.

The Practical Checklist

If you’re building agent workflows and you haven’t formalized identity yet, here’s where to start:

  • Name and attribute the agent explicitly. Not the model — the agent. Who owns it. What org it represents.
  • Write the hard stops as a list. Not “be careful with sensitive data” — “never log credentials in plaintext.” Specific and testable.
  • Define anti-patterns, not just patterns. What the agent will never say matters as much as what it should say.
  • Test behavioral drift at 20+ turns. Most people test at 3-5 turns. That’s not where the failures live.
  • Audit the seams between tool calls. Voice and behavior consistency across a multi-step workflow is a separate problem from single-turn behavior.
  • Document your constraints as controls. If you can’t audit it, it’s not a control.

What’s Next

I’m treating the identity specification as a living document — it’ll get updated as I find new failure modes. The re-anchoring injection approach is next to test in production. I also want to build a lightweight behavioral test suite: a set of adversarial prompts specifically designed to probe the identity constraints, run against any new version of the spec before deployment.

That last piece is just standard security practice applied to an AI system. Write the controls. Test the controls. Audit the results. Iterate.

That’s what this week looked like. Less glamorous than a new feature launch. More important than most feature launches I’ve shipped.

Tags: #building-in-public#ai-security#automation#implementation#workflows#llm-security

Comments

Loading comments...