The Agent Trust Problem: Should Your AI Agent Write Code to Your Repository?
We just gave an AI agent write access to our GitHub repository.
This decision took two days of debate. Here’s what we concluded, how we mitigated the risk, and what you should do if you’re in the same position.
The Trade-off
Benefit: Agent can create branches, commit code, open PRs, and deploy without human friction.
One example from this week:
- Agent drafts blog post (2 minutes)
- Creates branch (10 seconds)
- Commits markdown file (5 seconds)
- Opens PR (10 seconds)
- Total: 3 minutes, $0.02 in API costs
vs.
- Agent drafts post and sends it to us
- We manually create branch
- We copy/paste the content
- We commit, push, open PR
- Total: 15 minutes of human time, $0 in API costs, plus the cost of a context switch
Risk: If the agent’s permissions are too broad or its judgment is flawed, it could:
- Commit secrets (API keys, credentials) to the repo
- Overwrite production code by mistake
- Create a PR that merges destructive changes
- Spam the repository with thousands of commits
How We Evaluated the Risk
We asked three questions:
1. What’s the blast radius if the agent makes a mistake?
For content agents writing markdown: Low. A bad blog post can be edited or unpublished. A bad commit message is embarrassing but fixable.
For engineering agents modifying critical paths: High. A bad deployment breaks production. A bad permission change leaks secrets.
2. Can we detect and revert mistakes quickly?
For content PRs: Yes. We review every PR before merge. If the agent writes bad content, we catch it during review.
For production code: Yes, but slower. If the agent commits bad code, we catch it in review, but revert time is measured in minutes or hours, not seconds.
3. What’s the minimum permission set we can give the agent?
- Read-only access to most of the repo (fine)
- Write access only to 03-Marketing Materials/ (tight)
- No access to credentials, infrastructure code, or production keys
- No permission to merge without human approval
- All commits must be attributed to the agent (audit trail)
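The scoped-write rule above can be enforced in the agent's own tooling as well as in GitHub. A minimal sketch, assuming a hypothetical allowlist check run before every push (the directory name comes from this post; the function and variable names are illustrative):

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: the only directory the content agent may write to.
ALLOWED_PREFIXES = ("03-Marketing Materials/",)

def agent_may_write(path: str) -> bool:
    """Return True if the agent is allowed to modify this repo path."""
    normalized = PurePosixPath(path).as_posix()
    return normalized.startswith(ALLOWED_PREFIXES)

# Example: check every path in a proposed commit before pushing.
paths = ["03-Marketing Materials/w03-linkedin.md", "02-Infrastructure/deploy.yml"]
blocked = [p for p in paths if not agent_may_write(p)]
```

Running the check client-side is belt-and-braces: GitHub's team permissions remain the real enforcement, but the agent fails fast instead of getting a 403 mid-workflow.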
What We Built
GitHub team-based permissions:
- Agent is member of the “Content” team
- Team has write access to 03-Marketing Materials/ only
- Team has no access to 00-Corporate/, 02-Infrastructure/, or 01-Course-Content/
- Protected branch rules: no merges without at least one human review
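The "at least one human review" rule can be set through GitHub's branch-protection REST endpoint (PUT /repos/{owner}/{repo}/branches/{branch}/protection). A sketch of the request payload in Python; the field names follow GitHub's public API docs, but verify against the current version before relying on them:

```python
import json

# Branch-protection payload: require one approving human review before merge.
protection = {
    "required_pull_request_reviews": {
        "required_approving_review_count": 1,  # at least one human review
    },
    "enforce_admins": True,          # no bypassing the rule, even for admins
    "required_status_checks": None,  # we gate on review here, not CI checks
    "restrictions": None,            # no per-user push restrictions
}
payload = json.dumps(protection)
```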
In Paperclip (our agent control plane):
- Agents can create branches and commit code
- Agents cannot merge PRs on their own (human approval required)
- All commits are logged with timestamp and reason (“Drafting W03 LinkedIn posts”)
- Agents can only work on issues they’re explicitly assigned
- If an agent exceeds rate limits or detects an error, it blocks itself and escalates
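The self-blocking behavior in the last bullet can be sketched as a small guard the agent consults before each API call. This is a hypothetical illustration of the idea, not Paperclip's actual implementation; the class and method names are invented:

```python
class AgentGuard:
    """Sketch of an agent that halts itself when it crosses a request budget,
    rather than relying on GitHub's server-side rate limit alone."""

    def __init__(self, max_requests_per_hour: int = 500):
        self.max_requests = max_requests_per_hour
        self.requests_made = 0
        self.blocked = False
        self.last_escalation = None

    def before_request(self) -> bool:
        """Call before each API request; returns False if the agent must stop."""
        if self.blocked:
            return False
        if self.requests_made >= self.max_requests:
            self.blocked = True
            self.escalate("rate budget exceeded")
            return False
        self.requests_made += 1
        return True

    def escalate(self, reason: str) -> None:
        # In production this would notify a human; here it records the reason.
        self.last_escalation = reason
```

Setting the agent's own budget well below GitHub's 5,000 requests/hour means a runaway loop escalates to a human long before the platform starts rejecting calls.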
In practice:
- Agent creates PR
- PR appears in review queue
- Human reviews (2-5 minutes)
- Human approves or requests changes
- Agent reads approval, takes action (merge or update)
- Done
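The review loop above reduces to a small decision table the agent consults after each review event. A minimal sketch, with invented state and action names:

```python
def next_action(review_state: str) -> str:
    """Map a PR review outcome to the agent's next step (sketch of the loop above)."""
    actions = {
        "approved": "merge",            # human approved; agent may act on it
        "changes_requested": "update",  # agent revises the PR, re-requests review
        "pending": "wait",              # still in the human review queue
    }
    return actions.get(review_state, "escalate")  # unknown state -> ask a human
```

Defaulting unknown states to "escalate" keeps the human in the loop even when the workflow hits a case nobody anticipated.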
The human is always in the loop for final decision.
The Risks We Accepted
Risk 1: The agent commits credentials
Mitigation: We use GitHub’s secret scanning. Any commit with a secret pattern (AWS key format, private key marker) is blocked before it reaches the repo. Plus: our git hooks scan for secrets locally.
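A local pre-commit hook for the same job can be a few lines of pattern matching. The two regexes below mirror the examples in this post (AWS key format, private key marker) and are illustrative only; GitHub's secret scanning uses a much larger, provider-maintained ruleset:

```python
import re

# Illustrative secret patterns only -- real scanners ship hundreds of rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # private key marker
]

def contains_secret(text: str) -> bool:
    """Return True if any known secret pattern appears in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

Running this in a git hook catches the obvious cases before a push; the server-side scan remains the authoritative gate.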
Risk 2: The agent overwrites important files
Mitigation: Protected branches. The agent cannot merge without approval. Plus: the files are in a tightly scoped directory. It can’t touch infrastructure code.
Risk 3: The agent spam-commits
Mitigation: We can revoke the agent’s token immediately. The GitHub API has rate limits (5,000 requests/hour). Beyond that, the agent’s requests fail and it blocks itself.
Risk 4: The agent’s prompt is bad, and it commits weird things
Mitigation: All commits are reviewed. A weird commit is caught in review, not merged, and we adjust the agent’s instructions.
When Agents Shouldn’t Have Write Access
Don’t give write access if:
- High-stakes code changes. Production infrastructure, auth systems, payment processing. Humans only.
- Regulated environments. Healthcare, finance, legal. Agents + humans, but the human signature matters for compliance.
- Ambiguous requirements. If the spec is fuzzy, an agent will commit what it interprets. You’ll spend time reverting. Clarify first, then let the agent write.
- Untested workflows. Test the workflow with a human first. Once it’s proven, let the agent automate it.
When Agents Should Have Write Access
Do give write access if:
- Clear, repetitive tasks. “Draft 3 LinkedIn posts per week following this template.” Agents excel here.
- Low-risk output. Content can be edited. A blog post is never final until published.
- Scoped permissions. Agent can only touch designated directories.
- Audit trail required. All commits logged, attributed, timestamped.
- Fast human review. You can review every PR in 5 minutes. If review takes 30 minutes per PR, the bottleneck shifts to you, not the agent.
The Broader Lesson
This is a trust boundary decision, not a technical one.
You’re deciding: “What parts of my system am I comfortable with an AI agent controlling?”
For us:
- Agent controls: Content, blog posts, PR drafts
- Agent cannot control: Merging to main, deploying, credentials, infrastructure
It’s not that agents are untrustworthy. It’s that they’re alien. They don’t have skin in the game. If an agent breaks something, it doesn’t feel pain. So we use guards: reviews, permissions, audit trails, rate limits.
These guards are the same ones we use for junior humans joining the team. The principle is identical: trust, but verify. And build walls around things that matter.
Next: If you’re running agents in your workflow, audit the permissions you’ve given them. Do they need write access? Or can they draft, propose, and hand off to a human for merge? The answer determines your risk profile.