Threat Modeling for AI: The Framework I Use Daily

by Alien Brain Trust AI Learning

I picked up formal threat modeling about fifteen years into my security career. STRIDE had been around since the late nineties, but I didn’t internalize it until I was doing architecture reviews for a large IAM deployment and realized I kept circling back to the same six failure categories — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege — whether I was thinking about it in those terms or not.

Naming the pattern changed how fast I moved. Instead of freeform brainstorming, I had a checklist with teeth.

When I started building AI-native tools, I applied threat modeling to LLM pipelines almost immediately. What surprised me wasn’t that it worked. It was how directly it mapped, and how many teams I’ve watched skip it entirely because AI felt like a different category of risk.

It isn’t.

TL;DR: STRIDE threat modeling applies directly to AI and LLM systems. Each of the six threat categories has a concrete AI equivalent. If you run a threat model on your AI pipeline the same way you would on an API or identity system, you’ll catch most production risk before it ships.


Why Security Teams Skip Threat Modeling for AI

There are a few reasons this keeps happening.

First, AI tools often enter organizations through product or engineering — not security. By the time a CISO hears about a new LLM integration, it’s already in staging. Threat modeling didn’t happen at intake because the intake process didn’t flag it as infrastructure.

Second, AI vendors frame what they sell as products, not as attack surfaces. The conversation is about capabilities and pricing, not about what breaks under adversarial conditions.

Third, there’s a genuine knowledge gap. Threat modeling practitioners who know STRIDE cold sometimes don’t know enough about how LLMs work to map the threat categories. And the ML engineers who built the pipeline have never done a formal threat model in their lives.

The result is AI systems in production that have never been asked the basic questions: Who can tamper with this? What can be spoofed? What happens when it fails?


STRIDE Applied to AI Pipelines: The Direct Mapping

Here’s how I run through each category when I’m evaluating an AI system.

Spoofing

In a traditional system, spoofing means an attacker impersonates a legitimate identity. In an AI pipeline, the surface is wider.

An attacker can spoof the user — submitting requests under a legitimate user’s context. They can spoof the model itself in a multi-agent setup by returning fake responses that a downstream agent treats as authoritative. They can craft inputs that spoof trusted internal documents if your RAG system doesn’t verify provenance.

Question to ask: Can I verify who or what produced each input this system acts on?
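For agent-to-agent traffic, one way to answer that question is to have every message carry a verifiable tag. Below is a minimal sketch using HMAC-SHA256 from Python’s standard library. The key, the message shape, and the function names are hypothetical. With a single shared key, a valid tag proves only that some key holder produced the message, so per-agent keys or asymmetric signatures are needed if you must tell agents apart.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only. A real deployment would load it from
# a secrets manager and rotate it; never hard-code it.
AGENT_SIGNING_KEY = b"example-key-do-not-use"


def sign_message(sender: str, payload: dict, key: bytes = AGENT_SIGNING_KEY) -> dict:
    """Serialize sender + payload deterministically and attach an HMAC-SHA256 tag."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_message(message: dict, key: bytes = AGENT_SIGNING_KEY) -> dict:
    """Return the decoded body only if its tag checks out; otherwise refuse."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing doesn't reveal partial matches.
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("unverified message: refusing to act on it")
    return json.loads(message["body"])
```

A downstream agent would call verify_message before acting, so a response injected by anything that doesn’t hold the key is rejected instead of treated as authoritative. On its own this doesn’t stop replay of an old, validly signed message; a timestamp or nonce inside the signed body is the usual addition.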

Tampering

Classic tampering is data modification in transit or at rest. In AI, tampering hits at several points.

The training data pipeline is an obvious one — supply chain attacks against open-weight model fine-tuning are already documented. But in production systems, the more common risk is prompt tampering: injected instructions buried in retrieved documents, user inputs, or tool outputs that modify the model’s behavior. If your pipeline passes retrieved context directly into the prompt without sanitization, you have a tampering surface.

Question to ask: Is there any point in this pipeline where external content is inserted into a prompt without validation?
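A rough sketch of a validation step at that insertion point follows. It does two things: flags instruction-like phrases for review, and escapes and wraps the external content so it reaches the model as quoted data labelled with its source. The pattern list, the tag name, and the function names are hypothetical, and pattern matching is a tripwire rather than a defense. An attacker can phrase an injection in ways no list anticipates, so the tool and permission controls still have to hold on their own.

```python
import html
import re

# Hypothetical, deliberately short list. A match is a signal worth logging or
# quarantining; no match proves nothing.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"(reveal|print|repeat) (your|the) system prompt", re.IGNORECASE),
]


def flag_injection(text: str) -> list[str]:
    """Return the patterns that matched so the caller can log or hold the content."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def wrap_untrusted(source: str, text: str) -> str:
    """Present external content as quoted data labelled with its origin."""
    # Escaping < and > means the content can't close the wrapper tag and
    # smuggle text outside it.
    escaped = html.escape(text, quote=False)
    return f'<untrusted source="{html.escape(source)}">\n{escaped}\n</untrusted>'
```

In a pipeline, each retrieved chunk or tool output would go through flag_injection first, with matches logged and held, and the rest through wrap_untrusted, alongside a system prompt stating that tagged content is data to read, not instructions to follow. Delimiting reduces the odds that the model obeys embedded instructions; it does not eliminate them.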

Repudiation

Repudiation means a user or component can deny having taken an action. AI systems create new repudiation risks because the model’s reasoning is often invisible.

If your AI agent takes an action — sends an email, modifies a record, executes a query — and the only log is “AI completed task,” you have a repudiation problem. Who authorized that? What prompt produced it? Which model version ran? In a regulated environment, that audit gap is a compliance failure.

Question to ask: Can I reconstruct exactly what happened, in what order, and based on what input, for any action this system takes?
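In practice that means every action gets a record that answers the questions above. Here’s a minimal sketch of one; the field names are hypothetical, not a standard schema. It keeps a SHA-256 hash of the prompt so the entry can be matched to the full prompt stored elsewhere under access control. JSON Lines keeps entries in write order, but it isn’t tamper-evident by itself; write-once storage or hash-chained entries would be needed for that.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentActionRecord:
    """One attributable entry per agent action (hypothetical schema)."""
    requested_by: str      # human or service principal behind the request
    authorized_by: str     # policy or approver that allowed this action
    model_version: str     # exact model identifier that produced the decision
    prompt_sha256: str     # links the entry to the full prompt stored elsewhere
    tool: str              # e.g. "send_email", "update_record"
    arguments: dict        # arguments the tool was actually called with
    action_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def make_record(*, requested_by: str, authorized_by: str, model_version: str,
                prompt: str, tool: str, arguments: dict) -> AgentActionRecord:
    """Build a record, hashing the prompt rather than copying it into the log."""
    return AgentActionRecord(
        requested_by=requested_by,
        authorized_by=authorized_by,
        model_version=model_version,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        tool=tool,
        arguments=arguments,
    )


def to_log_line(record: AgentActionRecord) -> str:
    """Serialize as one JSON object per line (JSON Lines)."""
    return json.dumps(asdict(record), sort_keys=True) + "\n"
```

Each action then becomes one line appended to a log file opened in append mode, written before or alongside the tool call rather than after the fact.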

Information Disclosure

This one is underestimated in AI pipelines. I covered context window leakage in a previous post, but STRIDE surfaces a broader version of the question.

What data is this model exposed to? What can it be made to reveal? Can a user craft a query that causes the model to surface information it retrieved for a different user’s context? Does your system prompt contain credentials, internal instructions, or architectural details that could be extracted via jailbreak?

RAG systems are particularly exposed here. If your retrieval layer doesn’t enforce authorization at the document level — if it retrieves everything that’s semantically similar regardless of who’s asking — you have an information disclosure vulnerability regardless of how well your application layer is locked down.

Question to ask: Could a determined user extract data from this system that they’re not authorized to see?
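The control is to apply authorization before similarity ranking, so a document the caller can’t read never becomes a candidate. This sketch uses brute-force cosine similarity over an in-memory list to stay self-contained; the Document shape and the group-based ACL are hypothetical. With a real vector store, you’d push the same filter into the query itself if the store supports metadata filtering, rather than relying on the model or the application layer to discard results afterward.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    embedding: list[float]
    allowed_groups: frozenset[str]  # document-level ACL stored next to the vector


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], user_groups: set[str],
             corpus: list[Document], k: int = 3) -> list[Document]:
    """Rank only documents the caller may read; similarity never overrides the ACL."""
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    visible.sort(key=lambda d: cosine(query_vec, d.embedding), reverse=True)
    return visible[:k]
```

A query from someone outside the "hr" group can be as semantically close to an HR document as it likes; that document is filtered out before it can reach the prompt.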

Denial of Service

AI systems have denial-of-service vectors that traditional APIs don’t. Token consumption is the obvious one — a user who can craft prompts that force maximum context usage can drive up cost and latency until the service is effectively unavailable.

There are subtler versions. Flooding a RAG system with adversarially crafted documents that pollute the vector store. Triggering recursive agent loops that never terminate. Submitting inputs designed to produce output that causes downstream systems to fail validation and retry indefinitely.

Question to ask: Can a user or attacker cause this system to consume unbounded resources or fail in a way that degrades availability?
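Each of those vectors comes down to a missing ceiling. A minimal sketch of per-request limits on input size, agent steps, and retries follows. The numbers are placeholders rather than recommendations, the class and function names are hypothetical, and the token count is assumed to come from whatever tokenizer your model provider uses.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a request hits one of its ceilings."""


@dataclass
class RequestBudget:
    # Placeholder ceilings; tune per workload.
    max_input_tokens: int = 8_000
    max_agent_steps: int = 10
    max_retries: int = 2
    steps_used: int = 0

    def check_input(self, token_count: int) -> None:
        # token_count is assumed to come from your provider's tokenizer.
        if token_count > self.max_input_tokens:
            raise BudgetExceeded(f"input of {token_count} tokens exceeds {self.max_input_tokens}")

    def take_step(self) -> None:
        # Call once per agent/tool iteration so a loop that never converges still stops.
        self.steps_used += 1
        if self.steps_used > self.max_agent_steps:
            raise BudgetExceeded(f"agent exceeded {self.max_agent_steps} steps")


def call_with_bounded_retries(fn, is_valid, budget: RequestBudget):
    """Retry a downstream call at most budget.max_retries times instead of forever."""
    for _ in range(budget.max_retries + 1):
        result = fn()
        if is_valid(result):
            return result
    raise BudgetExceeded(f"output still invalid after {budget.max_retries} retries")
```

The design choice is to fail closed: hitting a ceiling ends the request with an error the caller can see, rather than quietly continuing to spend tokens.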

Elevation of Privilege

In IAM, privilege escalation is the threat I spent more of my career on than any other. In AI, it takes new forms.

The clearest case: a user with read-only access submits a prompt that causes an AI agent to take write actions on their behalf, because the agent’s tool permissions weren’t scoped to match the user’s authorization level. The model can be used as a privilege bridge if the tool layer doesn’t enforce boundaries independently.

Multi-agent systems make this worse. An agent with broad permissions receiving instructions from an agent with narrow permissions — with no verification that the instruction source is authorized to delegate that action — is a privilege escalation waiting to happen.

Question to ask: Can a lower-privileged user cause higher-privileged actions to occur by routing requests through this AI system?
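The corresponding control is to compute what an agent may do from the whole delegation chain, intersecting grants instead of letting the agent’s own broad permissions apply, and to enforce that in the tool layer. A minimal sketch follows; the Principal type, the permission strings, and invoke_tool are hypothetical stand-ins for whatever your agent framework provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    permissions: frozenset[str]  # e.g. frozenset({"records:read"})


def effective_permissions(chain: list[Principal]) -> frozenset[str]:
    """Intersect grants along the delegation chain: user -> agent -> sub-agent."""
    if not chain:
        return frozenset()
    perms = chain[0].permissions
    for principal in chain[1:]:
        perms = perms & principal.permissions
    return perms


def invoke_tool(tool_name: str, required: str, chain: list[Principal]) -> str:
    """Enforce the check in the tool layer, independent of what the model asked for."""
    if required not in effective_permissions(chain):
        origin = chain[0].name if chain else "unknown"
        raise PermissionError(f"{origin} cannot {required} via {tool_name}")
    # A real tool would perform the action here.
    return f"{tool_name} executed for {chain[0].name}"
```

With a read-only user at the head of the chain, a request routed through a write-capable agent still resolves to read-only, so update calls fail at the tool boundary no matter how the prompt was phrased. The same intersection covers one agent instructing another: a narrowly scoped agent can’t borrow a broader agent’s grants.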


How I Actually Run This in Practice

I don’t run a formal week-long threat modeling engagement for every AI integration. Here’s what I actually do.

When a new AI component is proposed, I draw the data flow diagram first. Inputs, outputs, storage, external calls, model interactions. Ten minutes with a whiteboard or a text file.

Then I run each STRIDE category against each data flow arrow and each component box. I’m looking for the answer “yes” or “I don’t know” — either one becomes a finding.

The output is a short list: confirmed risks with mitigations to implement, and unknowns that need answers before the component ships. I’ve done this in under an hour for straightforward pipelines.
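The checking loop itself is mechanical enough to sketch in a few lines: every element of the diagram is checked against all six categories, and anything that isn’t a clear "no" lands on the list. The Element type and the answer strings below are hypothetical; the point is that an unanswered cell counts as "I don’t know", not as a pass.

```python
from dataclasses import dataclass, field

STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information Disclosure",
    "Denial of Service",
    "Elevation of Privilege",
)


@dataclass
class Element:
    """A box (component or store) or an arrow (data flow) from the diagram."""
    name: str
    kind: str  # "component", "store", or "flow"
    answers: dict[str, str] = field(default_factory=dict)  # category -> "yes" / "no" / "unknown"


def findings(elements: list[Element]) -> list[tuple[str, str, str]]:
    """Every "yes" is a confirmed risk; every "unknown" or blank cell is an open question."""
    out = []
    for el in elements:
        for category in STRIDE:
            answer = el.answers.get(category, "unknown")
            if answer == "yes":
                out.append((el.name, category, "risk: needs mitigation"))
            elif answer != "no":
                out.append((el.name, category, "unknown: answer before ship"))
    return out
```

A vector store nobody has looked at yet produces six open questions, which is the right default for a component that hasn’t been reviewed.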

For more complex systems — multi-agent orchestration, RAG with external data sources, anything touching regulated data — I run it more formally and involve whoever owns the data being processed.

The key shift is treating the AI pipeline as infrastructure, not as a vendor product. You wouldn’t deploy a new API without asking what authenticates to it and what it can access. An LLM integration deserves the same question.


The One Thing Most Teams Miss

Authorization at the tool layer is the most consistently skipped control I see.

Teams lock down their application layer carefully. They validate user identity, enforce role-based access at the UI, log everything the user does. Then they give their AI agent a tool that can query the database, and they don’t scope that tool’s permissions to match the calling user’s authorization level.

The model runs with whatever permissions the agent was granted. A user who can read their own records just caused a query against everyone’s records, because that’s what the model decided it needed to answer the question.

STRIDE surfaces this under both Information Disclosure and Elevation of Privilege. Run the threat model, even as a quick pass, and you’ll find it. Skip it entirely and you won’t.
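Here’s a minimal sketch of the scoping that’s usually missing, using Python’s built-in sqlite3. The orders table, the lookup_orders tool, and tool_for_session are hypothetical. The design choice that matters is that the user’s identity is bound from the authenticated session before the tool is handed to the model, so the model can choose filters but has no parameter through which to choose whose rows to read.

```python
import sqlite3
from functools import partial
from typing import Optional


def lookup_orders(conn: sqlite3.Connection, calling_user_id: str,
                  status: Optional[str] = None) -> list[tuple]:
    """Agent-facing tool whose row scope is fixed by the caller's identity."""
    sql = "SELECT id, status FROM orders WHERE owner_id = ?"
    params: list = [calling_user_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    sql += " ORDER BY id"
    # Parameterized query: model-supplied values never become SQL text.
    return conn.execute(sql, params).fetchall()


def tool_for_session(conn: sqlite3.Connection, session_user_id: str):
    """Bind identity from the authenticated session before the model sees the tool."""
    return partial(lookup_orders, conn, session_user_id)
```

Database-side controls, such as row-level security in PostgreSQL, can enforce the same boundary one layer lower, which helps when more than one tool touches the same tables.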


Key Takeaways

  • STRIDE threat modeling maps directly to AI and LLM pipelines. The threat categories are the same; the attack surfaces are different.
  • Spoofing, Tampering, and Elevation of Privilege are the highest-signal categories for most LLM integrations.
  • Repudiation is underestimated — if your AI agent takes actions without a complete, attributable audit trail, you have a compliance gap.
  • RAG systems have compounding risk across Information Disclosure and Tampering. Retrieval layers need authorization enforcement, not just semantic search.
  • Threat modeling doesn’t have to be a heavyweight process. A data flow diagram and an hour spent against the six categories will surface most of what matters.

If you haven’t run a threat model against your AI pipeline, start with the data flow diagram. That drawing will tell you more than any vendor security checklist.

Tags: #ai-security #enterprise-ai #threat-modeling #security-engineer #implementation
