Why We Killed Paperclip: When Autonomous AI Runs Faster Than You Can Steer It
We ran an autonomous AI company for a few months. It worked. That was the problem.
Last week we decommissioned Paperclip — a multi-agent system designed to run the business without us in the loop. CEO agent. CTO agent. CFO agent. Real EC2 infrastructure. Real GitHub access. Real PRs, real decisions, real output.
Here’s why we shut it down.
What Paperclip Actually Was
Not a chatbot. Not a co-pilot. Paperclip was a running system: agents with defined roles, shared memory across sessions, and the authority to take action — draft content, write code, open PRs, manage project state.
The goal was legitimate leverage. Give it strategic direction. Let it execute. Check in periodically.
That’s not what happened.
The Speed Problem
Autonomous agents execute fast. Faster than a human with a day job can review, course-correct, and redirect.
Every morning: a wall of Telegram notifications. New branch here. Content drafted there. Tickets created for work that wasn't the priority. Architectural decisions that made sense by the system's own internal logic, but were misaligned with what we actually needed that week.
The agents weren’t broken. The alignment was.
Each individual action made sense. The compound effect didn't. And by the time we caught the drift, there was a week of work to unwind.
That’s the trap: autonomous agents amplify whatever direction they’re given. Good direction gets amplified. Misalignment gets amplified too — at machine speed. When you’re not in the loop, you don’t catch the drift until it’s a pile.
The Cost Problem
We got Paperclip running well on Claude’s subscription tier. The agents could hold extended context, complete complex multi-step tasks, and produce genuinely good output.
A subscription plan isn't designed for continuous autonomous agent workloads; it's designed for human-paced conversation. Driving persistent background agents at scale is the kind of usage pattern that gets accounts flagged. We weren't going to build something we couldn't honestly pay for at scale.
The API alternative? Expensive. Multi-agent systems with shared memory generate a lot of tokens. Agents reading their own history to maintain continuity across sessions burns through context fast. The math didn’t work at this stage of the business.
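For intuition, here's a rough back-of-envelope. Every number in it is an illustrative assumption, not our actual usage or real pricing. The point it demonstrates: when each agent re-reads shared history every turn, input tokens scale with agents × turns × context size, and context size only grows.

```python
# Back-of-envelope for multi-agent API cost.
# All figures below are illustrative assumptions, not actual
# Paperclip usage and not any provider's real price sheet.

AGENTS = 3                        # e.g. CEO, CTO, CFO roles
TURNS_PER_AGENT_PER_DAY = 50      # assumed activity level
CONTEXT_TOKENS = 40_000           # shared memory re-read each turn
OUTPUT_TOKENS = 1_000             # tokens generated per turn
INPUT_PRICE = 3.00 / 1_000_000    # assumed $/input token
OUTPUT_PRICE = 15.00 / 1_000_000  # assumed $/output token

input_tokens = AGENTS * TURNS_PER_AGENT_PER_DAY * CONTEXT_TOKENS
output_tokens = AGENTS * TURNS_PER_AGENT_PER_DAY * OUTPUT_TOKENS

daily_cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"~{input_tokens:,} input tokens/day -> ${daily_cost:,.2f}/day")
# With these assumptions: 6,000,000 input tokens/day, about $20/day,
# roughly $600/month, before retries, tool calls, or context growth.
```

Notice that output is a rounding error; the bill is dominated by agents re-reading their own history. Grow the shared memory or add an agent and the input term scales with it.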
What the Real Problem Was
Both of those are real. But the deeper issue is simpler: we were asking agents to replace judgment, not extend it.
Paperclip’s agents were supposed to decide what to work on next, how to prioritize, what trade-offs to make. That’s exactly the kind of judgment that needs a human in the loop — especially in a one-person business where every hour matters.
Replacing that judgment with an autonomous system didn’t free up time. It created a new job: auditing the autonomous system.
We spent more time reviewing Paperclip’s work than it would have taken to do the work ourselves. That’s the math that killed it.
What We’re Doing Instead
Purpose-built agents. Narrow scope. One job. Human approval gate.
The test-bot on our EC2 instance runs course QA daily and sends results to Telegram. That’s its entire job. It doesn’t decide what to test next or when to escalate — it runs, reports, and stops.
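The whole pattern fits on a page. Here's a minimal sketch of that run-report-stop shape; it's hypothetical, not the actual test-bot, and the command, paths, and env var names are assumptions:

```python
#!/usr/bin/env python3
"""Minimal run-report-stop bot sketch. Hypothetical example, not the
actual test-bot: the QA command, paths, and env vars are assumptions."""
import os
import subprocess
import urllib.parse
import urllib.request

BOT_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]  # assumed env var names
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def send_telegram(text: str) -> None:
    # Telegram Bot API sendMessage endpoint.
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": CHAT_ID, "text": text}).encode()
    urllib.request.urlopen(url, data=data, timeout=30)

def main() -> None:
    # Run the QA suite; a placeholder command stands in for the real one.
    result = subprocess.run(
        ["python", "-m", "pytest", "course_qa/"],  # hypothetical suite path
        capture_output=True, text=True,
    )
    status = "PASS" if result.returncode == 0 else "FAIL"
    # Report and stop. No retries, no follow-up decisions, no escalation.
    send_telegram(f"Course QA {status}\n{result.stdout[-1000:]}")

if __name__ == "__main__":
    main()
```

Scheduling is just cron. The bot has no loop, no memory, and no decisions to make, which is exactly the point.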
The content bot drafts blog posts on command and creates a PR. We review it. We approve it. It doesn’t decide what to write about or when to publish.
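The approval gate is nothing fancier than a branch and a PR. A sketch under stated assumptions (the gh CLI installed and authenticated; draft_post and the repo layout are stand-ins, not the real bot's internals):

```python
#!/usr/bin/env python3
"""Sketch of the draft-then-PR pattern. Hypothetical: the real content
bot's generation step and repo layout are not shown here."""
import subprocess
from datetime import date

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

def draft_post(topic: str) -> str:
    # Stand-in for the model call that drafts the post body.
    return f"# {topic}\n\nDraft generated on {date.today()}.\n"

def main(topic: str) -> None:
    slug = topic.lower().replace(" ", "-")
    branch = f"draft/{slug}"
    path = f"posts/{slug}.md"  # hypothetical repo layout

    run("git", "checkout", "-b", branch)
    with open(path, "w") as f:
        f.write(draft_post(topic))
    run("git", "add", path)
    run("git", "commit", "-m", f"Draft: {topic}")
    run("git", "push", "-u", "origin", branch)
    # The PR is the approval gate: nothing publishes until a human merges.
    run("gh", "pr", "create", "--title", f"Draft: {topic}",
        "--body", "Generated draft. Review before merge.")

if __name__ == "__main__":
    main("Example topic")
```

The bot's authority ends at the PR. Merge, and therefore publish, stays with a human.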
The difference: these agents extend capability without replacing judgment. Paperclip tried to replace the judgment. That’s where it went wrong.
More on what we actually learned — and what we’d do differently — in the next post.