This document summarizes the security threat model for the Agent Governance Toolkit (AGT) using a STRIDE-oriented view of the main trust boundaries in the system.
For the current 10/10 OWASP Agentic Top 10 coverage mapping, see `packages/agent-compliance/docs/OWASP-COMPLIANCE.md`.
This threat model focuses on the runtime governance layer described in the repository README:
- Agent OS: deterministic policy enforcement, approvals, MCP governance, context and policy controls
- AgentMesh: identity, trust scoring, delegated trust, and inter-agent communication
- Agent Runtime: execution rings, kill switch, sandbox boundaries, and saga controls
- Agent SRE: circuit breakers, replay, error budgets, and cascade detection
Users, operators, or reviewers provide prompts, approvals, policies, and configuration. This is the main entry point for prompt injection, social engineering, and unsafe approvals.
Agents exchange requests, credentials, handoff context, and trust assertions. This boundary is vulnerable to spoofed identities, tampered trust signals, and over-broad delegation.
Agents call MCP tools, file operations, shell commands, APIs, plugins, and external services. This is the highest-risk execution boundary because a successful bypass can lead to code execution, data exfiltration, or destructive side effects.
Agents and services interact with package registries, CI/CD, release pipelines, audit systems, and deployment targets. This boundary matters for supply chain security, artifact provenance, and operational integrity.
Human / Operator
|
v
Agent OS policy + approval checks
|
+--> AgentMesh identity / trust validation
|
+--> Agent Runtime execution boundary
|
+--> Agent SRE monitoring / replay / rollback
|
v
Tools, plugins, APIs, storage, and external services
| Surface | Example threats |
|---|---|
| Prompts, retrieved context, memory | prompt injection, poisoned context, hidden instructions |
| Agent identity and delegation | spoofing, replay, forged credentials, trust laundering |
| Tool calls and plugins | code execution, shell abuse, dangerous file writes, privilege escalation |
| Policies and config files | unsafe defaults, policy drift, malformed policy documents |
| Audit and observability | log tampering, trace gaps, incomplete attribution |
| CI/CD and package publishing | supply chain tampering, unsigned artifacts, metadata confusion |
| STRIDE category | Example risk in AGT | Primary mitigations |
|---|---|---|
| Spoofing | Malicious agent impersonates a trusted peer | AgentMesh Ed25519 identity, DID-style identities, challenge-response handshakes, trust scoring |
| Tampering | Policies, audit logs, or artifacts are altered in transit or at rest | Agent OS policy interception, signed attestations, Merkle/hash-chain audit trails, ESRP-oriented publishing controls |
| Repudiation | A user or agent denies having taken a high-risk action | Immutable audit trail, replay tooling, trust and approval metadata, SRE event correlation |
| Information Disclosure | Agent leaks secrets, PII, or internal context through tools or messages | Capability scoping, MCP governance, VFS-style access control, prompt/content sanitization, least-privilege runtime boundaries |
| Denial of Service | Cascading failures, expensive loops, or runaway agents | Agent SRE circuit breakers, error budgets, runtime kill switch, bounded execution rings, rate and token controls |
| Elevation of Privilege | Agent escapes its intended scope or performs unauthorized actions | Agent Runtime rings, Agent OS allow/deny rules, approval workflows, trust decay, constrained delegation |
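The Tampering and Repudiation rows above both rely on hash-chain audit trails. The following is a minimal sketch of that idea, not AGT's actual implementation: the record fields, the `GENESIS_HASH` placeholder, and the function names are all hypothetical, and SHA-256 over canonical JSON is an assumed encoding.

```python
import hashlib
import json

# Hypothetical placeholder used as the "previous hash" of the first entry.
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"record": record, "prev": prev_hash, "hash": entry_hash(prev_hash, record)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every link; an edited, reordered, or mid-chain-deleted entry fails."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        expected = entry_hash(prev_hash, entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"agent": "agent-a", "action": "read_file", "target": "report.txt"})
append_entry(audit_log, {"agent": "agent-a", "action": "send_email", "target": "ops"})
assert verify_chain(audit_log)

# Rewriting an earlier record is detected on the next verification pass.
audit_log[0]["record"]["action"] = "delete_file"
assert not verify_chain(audit_log)
```

A chain like this does not detect truncation of the most recent entries on its own; that requires anchoring the latest hash somewhere the writer cannot modify, such as a signed attestation.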
- Prompt injection or goal hijack causes unsafe tool execution
- Agents call tools outside their approved scope
- Policies are too weak, too broad, or bypassed through aliases or malformed requests
- Hidden context or memory poisons future decisions
- Deterministic policy evaluation before action execution
- Capability allowlists / denylists and action interception
- Approval workflows for sensitive actions
- Prompt, tool-input, and context sanitization
- Read-only policy and context controls for critical data paths
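Deterministic policy evaluation with allowlists, denylists, and approval gates can be sketched as below. This is an illustrative model only: the `Policy` and `Decision` types, the tool names, and the precedence order (deny, then approval, then allow, then default deny) are assumptions, not the Agent OS policy schema.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape: glob patterns over tool names.
    allow: tuple = ()
    deny: tuple = ()
    needs_approval: tuple = ()


def evaluate(policy: Policy, tool: str) -> Decision:
    """Return the same decision for the same (policy, tool) input every time."""

    def matches(patterns) -> bool:
        return any(fnmatchcase(tool, pattern) for pattern in patterns)

    if matches(policy.deny):            # explicit denies always win
        return Decision.DENY
    if matches(policy.needs_approval):  # sensitive actions go to a human
        return Decision.REQUIRE_APPROVAL
    if matches(policy.allow):
        return Decision.ALLOW
    return Decision.DENY                # anything unlisted is denied by default


policy = Policy(
    allow=("fs.read", "http.get"),
    deny=("shell.*",),
    needs_approval=("fs.delete", "payments.*"),
)

assert evaluate(policy, "fs.read") is Decision.ALLOW
assert evaluate(policy, "shell.exec") is Decision.DENY
assert evaluate(policy, "payments.refund") is Decision.REQUIRE_APPROVAL
assert evaluate(policy, "fs.write") is Decision.DENY  # not listed anywhere
```

Evaluating denies first and falling through to a default deny means that a missing or misspelled allow rule fails closed rather than open.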
- Untrusted agents spoof trusted ones
- Delegation chains become too broad or unverifiable
- Inter-agent messages are replayed, forged, or accepted without validation
- Supply chain metadata about models, tools, or registries becomes untrustworthy
- Ed25519-backed identity and DID-style agent credentials
- Trust scoring, trust decay, and revocation
- Challenge-response handshake and signed trust attestations
- AI-BOM / provenance tracking for models, data, and packages
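Trust decay and revocation can be illustrated with a simple time-based model. The source does not specify how AgentMesh computes or decays trust, so the exponential half-life formula, the `TrustRecord` type, and the threshold below are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TrustRecord:
    score: float           # trust in [0, 1] as of last_verified
    last_verified: float   # timestamp in seconds, from any consistent clock
    revoked: bool = False


def current_trust(record: TrustRecord, now: float, half_life_s: float = 3600.0) -> float:
    """Assumed model: trust halves every half_life_s seconds without re-verification."""
    if record.revoked:
        return 0.0  # revocation overrides any remaining score
    elapsed = max(0.0, now - record.last_verified)
    return record.score * math.pow(0.5, elapsed / half_life_s)


def is_trusted(record: TrustRecord, now: float, threshold: float = 0.5,
               half_life_s: float = 3600.0) -> bool:
    """Gate a peer on its decayed score rather than on its last recorded score."""
    return current_trust(record, now, half_life_s) >= threshold


peer = TrustRecord(score=0.8, last_verified=0.0)
assert is_trusted(peer, now=0.0)            # freshly verified: 0.8
assert not is_trusted(peer, now=3600.0)     # one half-life later: about 0.4

peer.revoked = True
assert current_trust(peer, now=0.0) == 0.0  # revoked peers are never trusted
```

The point of the sketch is that a peer which has not re-proved its identity recently drops below the acceptance threshold without any explicit operator action, while revocation takes effect immediately.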
- Tool execution leads to code execution or destructive side effects
- Long-running sessions escape intended isolation
- Compromised agents keep running after unsafe behavior is detected
- Multi-step workflows leave partial state after failure
- Ring-based execution isolation
- Kill switch and termination controls
- Saga orchestration / compensation for partial failures
- Sandboxed runtime boundaries and auditable execution paths
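Saga-style compensation for partial failures follows a well-known pattern: each step is paired with an undo action, and a failure triggers the undo actions of completed steps in reverse order. The sketch below shows that general pattern; `run_saga`, the step tuple shape, and the example steps are hypothetical and are not the Agent Runtime API.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    log: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated:{done_name}")
            return False, log
        completed.append((name, compensate))
        log.append(f"done:{name}")
    return True, log


state = {"ticket": False, "vm": False}


def open_ticket() -> None:
    state["ticket"] = True


def close_ticket() -> None:
    state["ticket"] = False


def create_vm() -> None:
    state["vm"] = True


def delete_vm() -> None:
    state["vm"] = False


def deploy() -> None:
    raise RuntimeError("deploy failed")


ok, log = run_saga([
    ("open_ticket", open_ticket, close_ticket),
    ("create_vm", create_vm, delete_vm),
    ("deploy", deploy, lambda: None),
])

assert ok is False
assert state == {"ticket": False, "vm": False}  # no partial state left behind
```

This sketch assumes compensations themselves succeed. A production design would also need a policy for compensations that fail, such as retrying or escalating to an operator.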
- One compromised or degraded agent causes cascading failures elsewhere
- Operators lack sufficient telemetry to understand or contain incidents
- Slow drift or anomalous behavior goes unnoticed
- Circuit breakers and rollout controls
- Error budgets and SLO-driven enforcement
- Replay debugging and event correlation
- Anomaly and cascade detection across agent fleets
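A circuit breaker limits cascading failures by rejecting calls to a dependency that has failed repeatedly, then allowing a trial call after a cool-down. The following is a minimal single-threaded sketch of the general pattern; the class name, thresholds, and injectable `clock` parameter are assumptions and do not reflect the Agent SRE interface.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half_open"       # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call re-opens at once; otherwise open at the threshold.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def failing_tool():
    raise ValueError("downstream tool error")


fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0,
                         clock=lambda: fake_now[0])

for _ in range(2):
    try:
        breaker.call(failing_tool)
    except ValueError:
        pass
assert breaker.state == "open"

fake_now[0] = 10.0                      # cool-down has elapsed
assert breaker.state == "half_open"
assert breaker.call(lambda: "ok") == "ok"
assert breaker.state == "closed"
```

While the circuit is open, callers fail fast with `CircuitOpenError` instead of piling more load onto a degraded agent or tool, which is what keeps one failure from propagating across the fleet.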
| Threat | Agent OS | AgentMesh | Agent Runtime | Agent SRE |
|---|---|---|---|---|
| Prompt injection | Policy interception, approval gates | Trusted handoff context | Runtime containment | Replay + anomaly signals |
| Capability escalation | Policy rules, explicit denies | Scoped trust / delegation | Ring isolation | Detection of unusual call patterns |
| Identity spoofing | N/A | Signed identity + handshake | Runtime session binding | Cross-service correlation |
| Data exfiltration | MCP and policy controls | Trust-aware peer gating | Sandboxed execution | Alerting on unusual transfer patterns |
| Rogue behavior | Policy deny / approval | Trust decay and revocation | Kill switch | Error budgets + cascade detection |
| Supply chain compromise | Policy and config review | AI-BOM / provenance | Signed artifacts and controlled runtime | Operational change monitoring |
AGT reduces risk but does not eliminate it. The main residual risks are:
- Misconfigured policies that are syntactically valid but semantically too permissive
- Human approvers making unsafe decisions under time pressure
- External tools or plugins that behave unsafely inside their allowed scope
- Gaps between documented controls and the exact deployment posture of a given organization
- Keep policy scope narrow and prefer deny-by-default for high-risk tools
- Require explicit approval for destructive, financial, or identity-sensitive actions
- Rotate credentials regularly and revoke trust promptly when an agent's behavior changes unexpectedly
- Treat release metadata, package publishing, and provenance as part of the runtime security boundary
- Use SRE telemetry and replay tooling to investigate suspicious agent actions