Building in Public: Fixing Our Content Agent’s Repetition Problem
TL;DR: Our automated content agent drafted “Prompt Injection Is the SQL Injection of the AI Era” three weeks in a row because it was checking local files for deduplication but missing open GitHub PRs. The fix was a get_recent_coverage() function that checks both — plus a sub-topic rotation system so each pillar picks a genuinely different angle each week.
Every Saturday morning our Telegram bot wakes up, checks the content calendar, and drafts three blog posts for the coming week. It’s been running for months. The problem we found this week: it had been drafting the same post repeatedly.
Three consecutive weekly PRs. Three variations of “Prompt Injection Is the SQL Injection of the AI Era.” Different dates, slightly different descriptions, identical core concept.
What Caused It
The deduplication logic was checking the wrong thing.
existing_posts = {p.name[:10] for p in get_blog_posts()}
That line checks whether a date slot is occupied. It does not check whether the topic has already been covered. The bot was correctly avoiding date conflicts but had no memory of what subjects it had written about.
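Concretely, with post filenames of the form date-plus-slug (like the one in the duplicate PRs), that set comprehension only ever sees the date:

from pathlib import Path

p = Path("2026-04-18-prompt-injection-is-the-sql-injection-of-the-ai-era.md")
p.name[:10]   # "2026-04-18" -- a date slot, with no trace of the topic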
The second problem: the bot checked local .md files, but unapproved drafts live as open GitHub PRs on feature branches. The local repo (on main) never shows those posts. So from the bot’s perspective, prompt injection had never been covered — the draft was in an unmerged PR it couldn’t see.
The third problem: the pillar topic descriptions were too vague.
("threats", "An AI security threat, vulnerability, or attack pattern that enterprise security teams should know about")
That description is so broad that Claude can reasonably pick prompt injection every time. Nothing in the guidance said “not this one again.”
How We Fixed It
Fix 1: Check open PRs, not just local files.
We added get_recent_coverage() — a function that queries the GitHub API for open PRs with bot/content-* branches and extracts post titles from their frontmatter. It also reads recently modified local files by mtime rather than just checking date slots.
def get_recent_coverage(weeks: int = 4) -> list[str]:
    # 1. Open PRs on the private repo — titles from frontmatter
    # 2. Local .md files modified within the last N weeks
    # Returns a list of actual post titles, not slugs
    ...
The key detail: we extract titles from frontmatter rather than parsing slugs. A slug like 2026-04-18-prompt-injection-is-the-sql-injection-of-the-ai-era loses information when you split on hyphens. The frontmatter title is the canonical source.
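For illustration, here is a minimal sketch of that shape using the GitHub REST API. The repo name, token environment variable, posts directory, and frontmatter regex are assumptions for the example rather than the production code; the bot/content-* branch filter matches the behaviour described above.

import os
import re
import time
from pathlib import Path

import requests

REPO = "example-org/example-site"          # assumption: the private content repo
API = f"https://api.github.com/repos/{REPO}"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
TITLE_RE = re.compile(r"^title:\s*['\"]?(.+?)['\"]?\s*$", re.MULTILINE)

def get_recent_coverage(weeks: int = 4) -> list[str]:
    titles: list[str] = []

    # 1. Open PRs on bot/content-* branches: pull titles out of the post frontmatter
    prs = requests.get(f"{API}/pulls", params={"state": "open"}, headers=HEADERS, timeout=30).json()
    for pr in prs:
        if not pr["head"]["ref"].startswith("bot/content-"):
            continue
        files = requests.get(f"{API}/pulls/{pr['number']}/files", headers=HEADERS, timeout=30).json()
        for f in files:
            if not f["filename"].endswith(".md"):
                continue
            raw = requests.get(f["raw_url"], headers=HEADERS, timeout=30).text
            if m := TITLE_RE.search(raw):
                titles.append(m.group(1))

    # 2. Local .md files modified within the last N weeks (by mtime, not date slot)
    cutoff = time.time() - weeks * 7 * 24 * 3600
    for path in Path("content/posts").glob("*.md"):    # assumption: local posts directory
        if path.stat().st_mtime >= cutoff:
            if m := TITLE_RE.search(path.read_text()):
                titles.append(m.group(1))

    return titles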
Fix 2: Pass coverage to the model in both turns.
The dedup list gets injected into both the system prompt and the user turn so the constraint is present throughout the full generation — not just at the start.
if covered_titles:
    coverage_note = (
        "\n\n## Recently Covered Topics (DO NOT REPEAT)\n\n"
        + "\n".join(f"- {t}" for t in covered_titles)
    )
    system_prompt = system_prompt + coverage_note
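The same note is also concatenated onto the user message; a minimal sketch, assuming the user turn is built up as a plain string named user_prompt:

user_prompt = user_prompt + coverage_note  # restate the constraint close to the point of generation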
Fix 3: Sub-topic rotation per pillar.
We replaced the vague pillar descriptions with a PILLAR_SUBTOPICS dictionary — five specific angles per pillar, rotated by week number.
PILLAR_SUBTOPICS = {
    "threats": [
        "a specific CVE, vulnerability class, or attack pattern targeting AI systems",
        "a supply chain or model poisoning risk for enterprise AI adoption",
        "an adversarial prompt technique and how to defend against it",
        ...
    ],
    ...
}
Week 17 gets subtopic index 17 % 5 = 2 — adversarial prompt techniques. Week 18 gets index 18 % 5 = 3 — data exfiltration risks. Prompt injection can only recur every five weeks, and only if there’s nothing in the coverage window blocking it.
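The selection itself is just a modulo over the list; a minimal sketch, where the source of the week number is an assumption:

from datetime import date

week = date.today().isocalendar().week        # ISO week number, e.g. 17
subtopics = PILLAR_SUBTOPICS["threats"]
subtopic = subtopics[week % len(subtopics)]   # week 17 -> index 2: adversarial prompt techniques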
Fix 4: /coverage command.
Added a /coverage [weeks] Telegram command so we can inspect what’s in the dedup window without reading the code. The point is visibility: if the bot is avoiding a topic, you can see why.
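A rough sketch of the handler, assuming python-telegram-bot v20+ and the get_recent_coverage() helper above; the handler name and default window are illustrative:

from telegram import Update
from telegram.ext import CommandHandler, ContextTypes

async def coverage_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # /coverage [weeks] -- default to the four-week dedup window
    weeks = int(context.args[0]) if context.args else 4
    titles = get_recent_coverage(weeks)
    reply = "\n".join(f"- {t}" for t in titles) or "Nothing in the coverage window."
    await update.message.reply_text(reply)

# registered on the bot application, e.g.:
# application.add_handler(CommandHandler("coverage", coverage_command))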
What We Learned About Agent Memory
The root cause here is a pattern I keep seeing in agent systems: the agent’s “memory” of what it has done doesn’t match the actual state of the world.
The bot remembered what was on main. The world had open PRs. Those are different things, and the bot’s dedup logic was blind to the gap between them.
The more general principle: agent memory needs to track the effects of the agent’s actions, not just the state of the primary data store. If your agent writes to a queue, a PR, a staging environment, or any system other than the one it reads from — that’s a blind spot in its self-model.
For us the fix was straightforward: query the PR API. For systems with more complex action graphs, this gets harder — and it’s worth designing for explicitly rather than discovering it from a pile of duplicate content.
This Week at ABT
Alongside the content agent fix, we also shipped:
- SEO soul file for the content writer agent — agents/content-writer/SEO.md with title length rules, meta description spec, tag vocabulary, and H2 structure guidance. Both the Telegram bot and the /blog-post Claude Code skill now load it.
- Terraform version bump — the GitHub Actions workflow was pinned to Terraform 1.6.0, which ships with an expired GPG key. Bumped to 1.12.1.
- PR #60 — SEO soul file wired in across both the agent and the Claude Code skill.
Key Takeaways
If you’re running autonomous agents that produce output and store it somewhere other than where they read from, audit the dedup logic. The question isn’t “what has the agent seen?” — it’s “what has the agent done, and can it see all of it?”
The fix took about two hours once we understood the root cause. The three weeks of duplicate content PRs took about five minutes to clean up.