Agent Governance Toolkit Threat Model

This document summarizes the security threat model for the Agent Governance Toolkit (AGT) using a STRIDE-oriented view of the main trust boundaries in the system.

For the current 10/10 OWASP Agentic Top 10 coverage mapping, see packages/agent-compliance/docs/OWASP-COMPLIANCE.md.

Scope

This threat model focuses on the runtime governance layer described in the repository README:

  • Agent OS: deterministic policy enforcement, approvals, MCP governance, context and policy controls
  • AgentMesh: identity, trust scoring, delegated trust, and inter-agent communication
  • Agent Runtime: execution rings, kill switch, sandbox boundaries, and saga controls
  • Agent SRE: circuit breakers, replay, error budgets, and cascade detection

Trust Boundaries

1. Human -> Agent

Users, operators, or reviewers provide prompts, approvals, policies, and configuration. This is the main entry point for prompt injection, social engineering, and unsafe approvals.

2. Agent -> Agent

Agents exchange requests, credentials, handoff context, and trust assertions. This boundary is vulnerable to spoofed identities, tampered trust signals, and over-broad delegation.

3. Agent -> Tool

Agents call MCP tools, file operations, shell commands, APIs, plugins, and external services. This is the highest-risk execution boundary because a successful bypass can lead to code execution, data exfiltration, or destructive side effects.

4. Agent -> Platform Control Plane

Agents and services interact with package registries, CI/CD, release pipelines, audit systems, and deployment targets. This boundary matters for supply chain security, artifact provenance, and operational integrity.

High-Level Data Flow

Human / Operator
    |
    v
Agent OS policy + approval checks
    |
    +--> AgentMesh identity / trust validation
    |
    +--> Agent Runtime execution boundary
    |
    +--> Agent SRE monitoring / replay / rollback
    |
    v
Tools, plugins, APIs, storage, and external services

Primary Attack Surfaces

| Surface | Example threats |
| --- | --- |
| Prompts, retrieved context, memory | Prompt injection, poisoned context, hidden instructions |
| Agent identity and delegation | Spoofing, replay, forged credentials, trust laundering |
| Tool calls and plugins | Code execution, shell abuse, dangerous file writes, privilege escalation |
| Policies and config files | Unsafe defaults, policy drift, malformed policy documents |
| Audit and observability | Log tampering, trace gaps, incomplete attribution |
| CI/CD and package publishing | Supply chain tampering, unsigned artifacts, metadata confusion |

STRIDE Analysis

| STRIDE category | Example risk in AGT | Primary mitigations |
| --- | --- | --- |
| Spoofing | Malicious agent impersonates a trusted peer | AgentMesh Ed25519 identity, DID-style identities, challenge-response handshakes, trust scoring |
| Tampering | Policies, audit logs, or artifacts are altered in transit or at rest | Agent OS policy interception, signed attestations, Merkle/hash-chain audit trails, ESRP-oriented publishing controls |
| Repudiation | A user or agent denies having taken a high-risk action | Immutable audit trail, replay tooling, trust and approval metadata, SRE event correlation |
| Information Disclosure | Agent leaks secrets, PII, or internal context through tools or messages | Capability scoping, MCP governance, VFS-style access control, prompt/content sanitization, least-privilege runtime boundaries |
| Denial of Service | Cascading failures, expensive loops, or runaway agents | Agent SRE circuit breakers, error budgets, runtime kill switch, bounded execution rings, rate and token controls |
| Elevation of Privilege | Agent escapes its intended scope or performs unauthorized actions | Agent Runtime rings, Agent OS allow/deny rules, approval workflows, trust decay, constrained delegation |
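
The Tampering and Repudiation rows rely on hash-chained audit trails. The sketch below illustrates the general technique only and is not AGT's implementation: the names `append_event` and `verify_chain`, the entry layout, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the predecessor of the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical events, for illustration only.
audit_log = []
append_event(audit_log, {"agent": "agent-a", "action": "read_file", "target": "report.txt"})
append_event(audit_log, {"agent": "agent-a", "action": "send_email", "target": "ops"})
intact = verify_chain(audit_log)

# Simulate tampering with a stored entry, then verify again.
audit_log[0]["event"]["target"] = "secrets.txt"
tampered_ok = verify_chain(audit_log)

print(intact, tampered_ok)
```

A plain hash chain like this does not detect truncation of the most recent entries. Detecting that would require anchoring the latest hash somewhere the writer cannot modify, for example in a signed attestation.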

Threats and Mitigations by Package

Agent OS

Main threats

  • Prompt injection or goal hijack causes unsafe tool execution
  • Agents call tools outside their approved scope
  • Policies are too weak, too broad, or bypassed through aliases or malformed requests
  • Hidden context or memory poisons future decisions

Mitigations

  • Deterministic policy evaluation before action execution
  • Capability allowlists / denylists and action interception
  • Approval workflows for sensitive actions
  • Prompt, tool-input, and context sanitization
  • Read-only policy and context controls for critical data paths

AgentMesh

Main threats

  • Untrusted agents spoof trusted ones
  • Delegation chains become too broad or unverifiable
  • Inter-agent messages are replayed, forged, or accepted without validation
  • Supply chain metadata about models, tools, or registries becomes untrustworthy

Mitigations

  • Ed25519-backed identity and DID-style agent credentials
  • Trust scoring, trust decay, and revocation
  • Challenge-response handshake and signed trust attestations
  • AI-BOM / provenance tracking for models, data, and packages

Agent Runtime

Main threats

  • Tool execution leads to code execution or destructive side effects
  • Long-running sessions escape intended isolation
  • Compromised agents keep running after exhibiting unsafe behavior
  • Multi-step workflows leave partial state after failure

Mitigations

  • Ring-based execution isolation
  • Kill switch and termination controls
  • Saga orchestration / compensation for partial failures
  • Sandboxed runtime boundaries and auditable execution paths
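
Saga compensation for partial failures can be sketched as follows. This is a generic illustration of the pattern under assumed names (`run_saga`, the step tuples, and the example steps are hypothetical), not the Agent Runtime orchestrator.

```python
from typing import Callable, List, Tuple

# Each step is (name, action, compensation); the compensation undoes the action.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    trace: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            trace.append(f"failed:{name}")
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                trace.append(f"compensated:{done_name}")
            return False, trace
        completed.append((name, compensate))
        trace.append(f"done:{name}")
    return True, trace


# Hypothetical workflow state and steps, for illustration only.
state = {"ticket": False, "branch": False}


def open_ticket() -> None:
    state["ticket"] = True


def close_ticket() -> None:
    state["ticket"] = False


def create_branch() -> None:
    state["branch"] = True


def delete_branch() -> None:
    state["branch"] = False


def deploy() -> None:
    raise RuntimeError("deploy rejected")


def noop() -> None:
    pass


ok, trace = run_saga([
    ("open_ticket", open_ticket, close_ticket),
    ("create_branch", create_branch, delete_branch),
    ("deploy", deploy, noop),
])
print(ok, trace, state)
```

The sketch assumes compensations themselves succeed. A production orchestrator would also need to persist the trace and handle a compensation that fails.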

Agent SRE

Main threats

  • One compromised or degraded agent causes cascading failures elsewhere
  • Operators lack enough telemetry to understand or contain incidents
  • Slow drift or anomalous behavior goes unnoticed

Mitigations

  • Circuit breakers and rollout controls
  • Error budgets and SLO-driven enforcement
  • Replay debugging and event correlation
  • Anomaly and cascade detection across agent fleets
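
A circuit breaker limits cascading failures by refusing calls to a dependency that keeps failing. The sketch below is a minimal, generic version with an injectable clock. The `CircuitBreaker` class, its parameters, and the thresholds are assumptions for illustration rather than Agent SRE's interface.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; allows a probe call once the cool-down has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: after the cool-down, let a probe call through.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed probe


# A fake clock keeps the example deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])

breaker.record_failure()
breaker.record_failure()
blocked = not breaker.allow()    # circuit is open, calls are refused

now[0] = 10.0
probe_allowed = breaker.allow()  # cool-down elapsed, a probe is allowed

breaker.record_success()
closed_again = breaker.allow()   # a successful probe closes the circuit

print(blocked, probe_allowed, closed_again)
```

Callers check `allow()` before invoking the downstream agent or tool and report the outcome afterwards, which keeps the breaker independent of how the call itself is made.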

Threat-to-Control Mapping

| Threat | Agent OS | AgentMesh | Agent Runtime | Agent SRE |
| --- | --- | --- | --- | --- |
| Prompt injection | Policy interception, approval gates | Trusted handoff context | Runtime containment | Replay + anomaly signals |
| Capability escalation | Policy rules, explicit denies | Scoped trust / delegation | Ring isolation | Detection of unusual call patterns |
| Identity spoofing | N/A | Signed identity + handshake | Runtime session binding | Cross-service correlation |
| Data exfiltration | MCP and policy controls | Trust-aware peer gating | Sandboxed execution | Alerting on unusual transfer patterns |
| Rogue behavior | Policy deny / approval | Trust decay and revocation | Kill switch | Error budgets + cascade detection |
| Supply chain compromise | Policy and config review | AI-BOM / provenance | Signed artifacts and controlled runtime | Operational change monitoring |

Residual Risks

AGT reduces risk but does not eliminate it. The main residual risks are:

  • Misconfigured policies that are syntactically valid but semantically too permissive
  • Human approvers making unsafe decisions under time pressure
  • External tools or plugins that behave unsafely inside their allowed scope
  • Gaps between documented controls and the exact deployment posture of a given organization

Recommended Operational Practices

  • Keep policy scope narrow and prefer deny-by-default for high-risk tools
  • Require explicit approval for destructive, financial, or identity-sensitive actions
  • Rotate credentials and revoke trust aggressively when behavior changes
  • Treat release metadata, package publishing, and provenance as part of the runtime security boundary
  • Use SRE telemetry and replay tooling to investigate suspicious agent actions