docs: MCP + Trust Verification integration guide #747
imran-siddique merged 2 commits into microsoft:main
Add layered guide showing how to add governance and trust verification to any MCP server, plus a working example server.

Guide covers 4 layers:
- Layer 1: Trust Proxy (authorization, per-tool policies, rate limiting)
- Layer 2: Trust Server (identity, 5-dimension trust scoring, handshakes)
- Layer 3: Security Scanner (tool poisoning, rug pulls, schema abuse)
- Layer 4: MCP Gateway (5-stage runtime interception pipeline)

Plus TrustGatedMCPServer (AgentMesh), end-to-end flow composition, and Claude Desktop integration.

Example server: 3 FastMCP tools with escalating trust thresholds, security scanner fingerprinting, fail-closed authorization, and audit logging.

Closes microsoft#707
🤖 AI Agent: contributor-guide
Hi there, and welcome to the microsoft/agent-governance-toolkit community! 🎉 Thank you for taking the time to contribute; we’re thrilled to have you here. This is a fantastic first pull request, and I can already see the effort and thoughtfulness you’ve put into it. Let’s dive in!
🌟 What You Did Well
🛠 Suggestions for Improvement
While this is an excellent start, there are a few areas where we can refine things further:
1. Linting
2. Testing Location
3. Commit Message Conventions
4. Security-Sensitive Code
5. Documentation Length
📚 Helpful Resources
Here are some resources to help you make the requested changes:
🔄 Next Steps
Once you’ve made these updates, push your changes to this branch. The CI/CD pipeline will automatically re-run, and we’ll review your PR again. If you have any questions or need help, don’t hesitate to ask; we’re here to support you! Thank you again for your contribution. We’re excited to collaborate with you and look forward to seeing your updates! 🚀
🤖 AI Agent: code-reviewer
Review of PR: docs: MCP + Trust Verification integration guide
This pull request introduces a comprehensive integration guide and an example server for adding governance and trust verification to MCP servers. The guide is well-structured, covering four progressive layers of trust and governance. Below is a detailed review of the PR, focusing on the specified areas of concern.
🔴 CRITICAL: Security Issues
- Fail-Closed Behavior in Trust Proxy and Gateway:
  - The TrustProxy and MCPGateway components claim to be fail-closed, but there is no explicit test coverage or implementation details provided to verify this behavior. If an exception occurs during the authorize() or intercept_tool_call() methods, the system must guarantee that the call is denied.
  - Action: Add explicit tests to simulate exceptions in these methods and verify that the system denies the call in such scenarios.
- Replay Attack Mitigation in Trust Handshake:
  - The trust handshake mechanism described in Layer 2 does not explicitly mention how replay attacks are mitigated. While the use of a challenge nonce is a good start, it is unclear if the nonce is tied to a specific session or if it is time-bound.
  - Action: Ensure that the nonce is unique per session and has a limited validity period. Document the mechanism for replay attack prevention in the guide.
- Rate Limiting in Trust Proxy:
  - The TrustProxy supports rate limiting, but there is no mention of how this is implemented or whether it is resistant to bypass techniques (e.g., using multiple DIDs or IP addresses).
  - Action: Clarify the implementation details of rate limiting and consider adding IP-based rate limiting or other mechanisms to prevent abuse.
- Tool Poisoning Detection:
  - The MCPSecurityScanner includes a CONFUSED_DEPUTY threat type but explicitly states that it has no built-in detection. This is a significant gap, as confused deputy attacks are a critical risk in multi-agent systems.
  - Action: Implement detection for CONFUSED_DEPUTY attacks or provide detailed guidance on how users can define custom rules to mitigate this risk.
- Cryptographic Key Management:
  - The guide mentions Ed25519 keys for identity and cryptographic handshakes but does not provide details on key rotation, storage, or revocation.
  - Action: Include a section in the guide on best practices for key management, including secure storage, rotation, and revocation.
- Audit Log Integrity:
  - The audit logs in TrustProxy, MCPGateway, and MCPSecurityScanner are critical for accountability. However, there is no mention of mechanisms to ensure the integrity and immutability of these logs.
  - Action: Recommend or implement a mechanism (e.g., hash chaining or signing) to ensure that audit logs cannot be tampered with.
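To illustrate the hash-chaining option named in the Audit Log Integrity item, here is a minimal sketch. HashChainedAuditLog, GENESIS_HASH and the entry layout are hypothetical names chosen for this example, not toolkit APIs; the only assumption about the toolkit is that an audit entry can be serialized to JSON.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class HashChainedAuditLog:
    """Hypothetical append-only log; not a class from the toolkit."""

    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        # Each entry commits to the hash of the entry before it.
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        digest = _entry_hash(prev_hash, payload)
        self.entries.append({"prev": prev_hash, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited entry breaks it.
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this only detects tampering if the most recent hash is also stored somewhere the attacker cannot rewrite (for example, signed or shipped to a separate system); otherwise the whole chain can be recomputed after an edit.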
🟡 WARNING: Potential Breaking Changes
- Backward Compatibility of wrap_mcp_server:
  - The note that wrap_mcp_server() always enables built-in sanitization regardless of the input configuration could lead to unexpected behavior for existing users.
  - Action: Clearly document this behavior in the release notes and consider providing a way to disable built-in sanitization if needed.
- Trust Score Model Changes:
  - The introduction of a 5-dimension trust model with specific scoring ranges may conflict with existing implementations that use a different scoring system.
  - Action: Provide a migration guide for users who need to adapt their existing trust scoring systems to the new model.
💡 Suggestions for Improvement
- Test Coverage:
  - While the PR mentions that all APIs were verified against the source, it does not provide details on the test coverage for the new example server or the guide's code snippets.
  - Action: Add automated tests for the example server and validate all code snippets in the guide as part of the CI pipeline.
- Thread Safety:
  - The guide does not address thread safety for components like TrustProxy and MCPGateway, which may be used in concurrent environments.
  - Action: Document thread safety considerations and provide examples of how to use these components in multi-threaded or async environments.
- Type Safety:
  - The guide does not mention whether the code snippets are type-checked using tools like mypy.
  - Action: Ensure that all public APIs and code snippets are type-annotated and validated using mypy. Add type-checking to the CI pipeline.
- Error Handling:
  - The guide does not provide details on how errors are logged or propagated in the example server.
  - Action: Include examples of error handling and logging best practices in the guide.
- Vendor Neutrality:
  - While the guide claims to avoid vendor-specific language, it mentions specific agent frameworks like GPT and Claude in the introduction.
  - Action: Replace these references with more generic terms like "LLM-based agents" to maintain vendor neutrality.
- Documentation Length:
  - The guide is very detailed but also quite lengthy, which may overwhelm users.
  - Action: Consider breaking the guide into separate files for each layer and providing a high-level overview with links to the detailed sections.
✅ Strengths
- Comprehensive Coverage:
  - The guide covers a wide range of topics, from basic authorization to advanced runtime interception, providing a clear path for incremental adoption.
- Practical Examples:
  - The inclusion of code snippets and a working example server makes it easier for users to understand and implement the concepts.
- Security Awareness:
  - The guide demonstrates a strong focus on addressing key security risks, including tool poisoning, rug pulls, and trust verification.
- Well-Structured:
  - The guide is logically organized into layers, making it easy for readers to follow and implement the concepts step-by-step.
Summary of Feedback
- 🔴 CRITICAL: Address fail-closed behavior, replay attack mitigation, rate-limiting robustness, confused deputy detection, key management, and audit log integrity.
- 🟡 WARNING: Document potential breaking changes in wrap_mcp_server and the trust score model.
- 💡 SUGGESTION: Improve test coverage, document thread safety, ensure type safety, enhance error handling, and consider splitting the guide into smaller sections for better readability.
This PR is a significant addition to the repository, but the critical security issues must be addressed before merging. Once resolved, the guide and example server will provide a robust foundation for integrating governance and trust verification into MCP servers.
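The fail-closed requirement raised in this review (any exception during authorization must deny the call) can be sketched as a decorator. This is an illustration only: Decision, fail_closed and the policies layout are assumptions made for the example, and the authorize function below is a stand-in, not the real TrustProxy.authorize().

```python
import functools
import logging
from dataclasses import dataclass

logger = logging.getLogger("fail_closed_sketch")


@dataclass(frozen=True)
class Decision:
    """Hypothetical authorization result; the toolkit's real type may differ."""

    allowed: bool
    reason: str


def fail_closed(check):
    """Turn any exception raised by an authorization check into a denial."""

    @functools.wraps(check)
    def wrapper(*args, **kwargs) -> Decision:
        try:
            result = check(*args, **kwargs)
        except Exception as exc:  # deliberately broad: any failure must deny
            logger.warning("authorization check failed, denying: %r", exc)
            return Decision(False, f"internal error: {type(exc).__name__}")
        # A malformed return value is also treated as a denial.
        if not isinstance(result, Decision):
            return Decision(False, "malformed decision")
        return result

    return wrapper


@fail_closed
def authorize(agent_did: str, tool_name: str, policies: dict) -> Decision:
    policy = policies[tool_name]  # raises KeyError for unknown tools
    if agent_did in policy["allowed_agents"]:
        return Decision(True, "agent is on the allow list")
    return Decision(False, "agent is not on the allow list")
```

A test of the kind the review asks for would inject a failure (here, an unknown tool name) and assert that the returned decision is a denial rather than an exception escaping to the caller.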
🤖 AI Agent: security-scanner
Security Review of Pull Request
This pull request introduces a comprehensive integration guide and example server for adding governance and trust verification to MCP (Model Context Protocol) servers using the Agent Governance Toolkit. While the changes are primarily documentation and example code, they touch on critical security layers of the toolkit. Below is a detailed security review based on the specified criteria.
1. Prompt Injection Defense Bypass
2. Policy Engine Circumvention
3. Trust Chain Weaknesses
4. Credential Exposure
5. Sandbox Escape
6. Deserialization Attacks
7. Race Conditions
8. Supply Chain
Overall Assessment
This pull request provides a comprehensive and well-structured guide for integrating governance and trust verification into MCP servers. However, there are several areas where additional details or safeguards are needed to ensure the security of the system. The most critical issue is the lack of guidance on key management and rotation for the trust server, which could undermine the entire trust chain if not addressed.
Summary of Findings
Suggested Next Steps
By addressing these issues, the integration guide and example server can provide a robust foundation for secure MCP governance.
The 4-layer progression is well-structured. A few observations from building cross-org agent trust infrastructure:

Layer 2's 5-dimension trust scoring: the cross-org gap
The trust scoring system described here works well within a single governance boundary (one organization's agents, one policy engine). The harder problem is: what happens when Agent A from Org X needs to operate on Org Y's MCP servers? Org Y's governance toolkit has no behavioral history for Agent A. The trust score starts at zero. This is the cold-start problem that every within-org trust system faces at org boundaries.
Two approaches to bridging this:
The agentfolio-mcp-server implements approach (2) as an MCP server; it could slot into Layer 2 as an additional trust dimension:

Layer 3's tool poisoning detection: connection to OWASP #802
The rug-pull fingerprinting and schema abuse detection map directly to the runtime enforcement discussion happening on OWASP #802, where the community is converging on a 5-layer governance architecture (authorization → execution evidence → mutation authority → boundary integrity → behavioral trust). The toolkit's deterministic policy enforcement aligns with the "strong enforceability" tier in that classification.

Great work making this framework-agnostic from day one; that's the right architectural decision for ecosystem adoption.
imran-siddique left a comment
Excellent docs work; all APIs verified against source. Two items to address:
- License headers: add <!-- Copyright (c) Microsoft Corporation. Licensed under the MIT License. --> as line 1 of both .md files (server.py already has it)
- Security warning on demo: the example server accepts trust_score as a client-supplied tool arg, which means an LLM can fabricate any score. Add a prominent DEMO ONLY warning in server.py and the guide
Also recommend: pass agent_capabilities in the end-to-end governed_tool_call example so capability checks aren't silently skipped.
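For context on why a client-supplied trust_score is unsafe outside a demo, the sketch below shows one way a non-demo server might resolve the score itself. TrustRegistry, gate_tool_call, GateResult and the tool names are hypothetical; only the 300/600/800 thresholds come from the PR summary. It assumes the agent's identity has already been verified elsewhere (for example, by the handshake).

```python
from dataclasses import dataclass

# Thresholds taken from the PR summary (300/600/800); the tool names are made up.
TOOL_THRESHOLDS = {"read_docs": 300, "query_data": 600, "deploy": 800}


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: str


class TrustRegistry:
    """Hypothetical server-side store of trust scores keyed by verified agent ID."""

    def __init__(self, scores: dict):
        self._scores = dict(scores)

    def score_for(self, agent_id: str) -> int:
        # Unknown agents get 0 rather than a default that might pass a gate.
        return self._scores.get(agent_id, 0)


def gate_tool_call(registry, verified_agent_id, tool_name, tool_args):
    """Return (GateResult, cleaned_args); the score never comes from tool_args."""
    # Drop any score the caller tried to supply; an LLM can fabricate it.
    cleaned = {k: v for k, v in tool_args.items() if k != "trust_score"}
    threshold = TOOL_THRESHOLDS.get(tool_name)
    if threshold is None:
        return GateResult(False, f"unknown tool {tool_name!r}"), cleaned
    score = registry.score_for(verified_agent_id)
    if score < threshold:
        return GateResult(False, f"score {score} below threshold {threshold}"), cleaned
    return GateResult(True, "ok"), cleaned
```

The design choice here is that the score is looked up from state the server controls, so nothing in the tool arguments can raise an agent's effective trust.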
Thanks for the thoughtful analysis. The cross-org cold-start problem is real and worth calling out. This guide covers within-org governance (single trust boundary, single policy engine). Cross-org trust federation is a different architectural problem that would warrant its own design proposal.

The OWASP #802 reference is useful context. The 5-layer governance architecture discussion aligns well with how the toolkit separates authorization (proxy), integrity verification (scanner), and runtime enforcement (gateway). Worth tracking as that standard evolves.

If the cross-org trust gap is something you'd like to see addressed, filing a feature request issue would be the right next step so the maintainers can evaluate it against the roadmap.
- Add Microsoft copyright headers to both .md files
- Add DEMO ONLY warning to example server (trust_score is client-supplied, not from a verified source)
- Pass agent_capabilities in end-to-end governed_tool_call example
- Reframe layers as composable governance concerns (authorization, identity, integrity, enforcement) rather than a fixed-count taxonomy
All three items addressed in ffd929f:
Also reframed the intro to describe "composable governance layers" covering authorization, identity, integrity, and enforcement rather than a fixed four-layer count.
🤖 AI Agent: code-reviewer
Review of Pull Request: docs: MCP + Trust Verification integration guide
🔴 CRITICAL: Security Issues
- Insufficient Details on Cryptographic Handshake Implementation
  - The guide mentions cryptographic handshakes and Ed25519 identity but does not provide sufficient details on how these are implemented. Without clear documentation, it's difficult to verify if the cryptographic operations are implemented securely. For example:
    - How are private keys stored and protected?
    - What is the exact handshake protocol? Are there protections against replay attacks, man-in-the-middle attacks, or key compromise?
  - Actionable Recommendation: Provide a detailed explanation of the cryptographic handshake process, including key generation, storage, exchange, and validation mechanisms. Ensure that best practices for cryptographic operations are followed.
- Potential Replay Attack in Handshake Flow
  - The handshake flow described in the guide does not mention any mechanism to prevent replay attacks. For example, there is no mention of time-based expiration for the challenge nonce or how it is tied to a specific session.
  - Actionable Recommendation: Ensure that the challenge nonce is unique per session and has a short expiration time. Document this in the guide to provide clarity on how replay attacks are mitigated.
- Lack of Explicit Fail-Closed Behavior in Trust Proxy
  - While the MCPGateway is explicitly described as "fail-closed," the TrustProxy does not mention fail-closed behavior. This could lead to potential security bypasses if an unexpected exception occurs during the authorization process.
  - Actionable Recommendation: Ensure that the TrustProxy is implemented with fail-closed semantics and document this behavior in the guide.
- Potential for Misuse of ApprovalStatus.PENDING
  - The ApprovalStatus.PENDING state in the MCPGateway could lead to security vulnerabilities if not handled properly. For example, if the approval callback fails to respond or is misconfigured, the system might inadvertently allow or deny access.
  - Actionable Recommendation: Clearly document the behavior of the system when the approval callback fails or returns PENDING. Consider implementing a timeout mechanism or a default action for such cases.
- Insufficient Details on Tool Poisoning Detection
  - The guide mentions various threat types (e.g., TOOL_POISONING, RUG_PULL, CONFUSED_DEPUTY) but does not provide details on how these threats are detected. For example, what specific patterns are used to detect prompt injection or schema abuse?
  - Actionable Recommendation: Provide more details on the detection mechanisms for each threat type. This will help users understand the limitations and potential false positives/negatives of the security scanner.
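The replay-protection recommendation in this list (a challenge nonce that is unique per session and expires quickly) can be sketched as follows. NonceStore and its method names are hypothetical and do not describe the toolkit's handshake; the sketch also leaves out signature verification, which in the guide's design would be the Ed25519 step.

```python
import secrets
import time


class NonceStore:
    """Hypothetical challenge store: nonces are session-bound, expiring, single-use."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._pending = {}  # nonce -> (session_id, expires_at)

    def issue(self, session_id: str) -> str:
        nonce = secrets.token_urlsafe(32)
        self._pending[nonce] = (session_id, self._clock() + self._ttl)
        return nonce

    def consume(self, session_id: str, nonce: str) -> bool:
        # pop() makes the nonce single-use: a replayed response finds nothing.
        record = self._pending.pop(nonce, None)
        if record is None:
            return False
        bound_session, expires_at = record
        if self._clock() > expires_at:
            return False
        # Constant-time comparison of the session the nonce was issued for.
        return secrets.compare_digest(bound_session.encode(), session_id.encode())
```

Note that a failed attempt also burns the nonce, so the legitimate agent must request a fresh challenge after any rejected response; that trade-off favors safety over convenience.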
🟡 WARNING: Potential Breaking Changes
- wrap_mcp_server Behavior Change
  - The guide mentions that wrap_mcp_server() always enables built-in sanitization regardless of the input configuration. This could lead to unexpected behavior for users who are upgrading from a previous version and expect their existing configurations to remain unchanged.
  - Actionable Recommendation: Clearly document this behavior as a breaking change in the release notes and provide guidance on how users can adapt their configurations if needed.
💡 Suggestions for Improvement
- Clarify the Role of TrustGatedMCPServer
  - The guide introduces TrustGatedMCPServer as "Layer 2.5," but its relationship with the other layers is not entirely clear. For example, does it replace the TrustProxy and TrustServer, or is it meant to be used in conjunction with them?
  - Actionable Recommendation: Add a section to the guide that explains when and why a user might choose TrustGatedMCPServer over the other layers, and how it integrates with the overall governance pipeline.
- Provide Examples for All Key Features
  - While the guide is comprehensive, some features are only described in text without accompanying code examples. For instance:
    - Cryptographic handshake flow
    - Delegation chain verification
  - Actionable Recommendation: Include code examples for all key features to make the guide more actionable for developers.
- Clarify the Use of blocked_patterns
  - The guide mentions blocked_patterns in the MCPGateway configuration but does not provide details on the pattern syntax or examples of common patterns (e.g., regex for SQL injection or XSS).
  - Actionable Recommendation: Add a section explaining the blocked_patterns syntax and provide examples of common patterns that users might want to block.
- Backward Compatibility Testing
  - The guide states that "no new dependencies in toolkit packages" were introduced, but it does not mention whether backward compatibility with existing MCP servers and clients was tested.
  - Actionable Recommendation: Include a note in the test plan explicitly stating that backward compatibility with existing MCP servers and clients has been verified.
- Thread Safety
  - The guide does not mention whether the components (e.g., TrustProxy, MCPGateway, TrustGatedMCPServer) are thread-safe. This is particularly important for concurrent agent execution.
  - Actionable Recommendation: Clarify the thread safety guarantees of each component in the guide. If any components are not thread-safe, provide guidance on how to use them safely in a concurrent environment.
- OWASP Agentic Top 10 Compliance
  - The guide addresses several OWASP Agentic Top 10 risks (e.g., ASI01, ASI02), but it does not explicitly mention compliance with other risks, such as ASI03 (Data Leakage) or ASI05 (Supply Chain Vulnerabilities).
  - Actionable Recommendation: Map the features of the toolkit to the OWASP Agentic Top 10 risks and include this mapping in the guide. This will help users understand how the toolkit addresses these risks.
- Type Annotations and Pydantic Validation
  - The guide does not mention whether the input schemas for tools are validated using Pydantic models. This is important for type safety and preventing schema abuse.
  - Actionable Recommendation: Ensure that all input schemas are validated using Pydantic models and document this in the guide.
Summary of Feedback
- 🔴 CRITICAL: Address security issues related to cryptographic operations, replay attacks, fail-closed behavior, and threat detection mechanisms.
- 🟡 WARNING: Document the breaking change in wrap_mcp_server behavior.
- 💡 SUGGESTION: Improve documentation clarity, provide more examples, and address thread safety and OWASP compliance explicitly.
Please address the critical issues as a priority, as they may lead to security vulnerabilities if left unresolved.
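One possible shape for the timeout and default action suggested for ApprovalStatus.PENDING is sketched below. The enum is a local stand-in (only the PENDING member is confirmed by this review), and resolve_approval is a hypothetical helper, not part of MCPGateway; it assumes the approval callback is an async callable.

```python
import asyncio
import enum


class ApprovalStatus(enum.Enum):
    """Local stand-in; the toolkit's own ApprovalStatus may have other members."""

    APPROVED = "approved"
    DENIED = "denied"
    PENDING = "pending"


async def resolve_approval(callback, request: dict, timeout_seconds: float = 5.0) -> ApprovalStatus:
    """Run an approval callback; anything but a timely APPROVED resolves to DENIED."""
    try:
        status = await asyncio.wait_for(callback(request), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # The approver did not answer in time: default to deny.
        return ApprovalStatus.DENIED
    except Exception:
        # A crashing or misconfigured callback must not grant access.
        return ApprovalStatus.DENIED
    # PENDING, DENIED, or any unexpected value all collapse to DENIED.
    return status if status is ApprovalStatus.APPROVED else ApprovalStatus.DENIED
```

Treating PENDING as a denial at the enforcement point keeps the gateway fail-closed; a deployment that needs long-running human review would have to park the request and re-submit it once a decision exists, which this sketch does not cover.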
We've built a working MCP + capability token integration and are happy to share the approach for the guide.

SINT Protocol's TAM implementation (

The Tool Authorization Manifest (TAM) defines per-tool security requirements: what token scope, what approval tier, what physical constraints are needed before a tool call executes. The MCP server registers tools with their manifest; the bridge validates inbound tool calls against it before forwarding to the handler.

// TAM example: defines requirements for each MCP tool
const ROBOT_MANIFEST: ToolAuthorizationManifest = {
toolName: "move_arm",
requiredScope: "robot:actuate",
approvalTier: "T2_act", // requires human review
constraints: {
maxVelocityMps: 0.5,
maxForceNewtons: 50,
requiresHumanPresence: false,
},
escalateOnHumanPresence: true,
};
// Bridge intercepts tool call, validates token against manifest
const result = await validateAgainstTam(token, request, ROBOT_MANIFEST);
// result: { ok: true } | { ok: false, violations: [...] }

Trust verification flow:
The tier system is what makes this different from simple OAuth scopes: a
OWASP Agentic Top 10 coverage from this layer:
Source: https://github.com/pshkv/sint-protocol/tree/master/packages/bridge-mcp
Happy to contribute a section to the guide or provide a working example that integrates with
@pshkv, the TAM tier system (T0-T3) is a clean enforcement model. The escalation from auto-allow (T0 observe) to M-of-N quorum (T3 commit) maps well to the enforceability classification emerging across the ecosystem.

One extension that makes the tier assignment dynamic rather than static: behavioral trust as an input to tier selection. Currently, the tier is fixed per tool in the manifest (

The integration point:

// Current: static tier from manifest
const staticTier = ROBOT_MANIFEST.approvalTier; // always "T2_act"
// Enhanced: tier adjusted by agent trust score
const trust = await queryAgentTrust(token.agentId);
const tier = trust.score > 80
  ? demoteTier(ROBOT_MANIFEST.approvalTier) // T2 → T1 for trusted agents
  : trust.score < 30
    ? promoteTier(ROBOT_MANIFEST.approvalTier) // T2 → T3 for untrusted
    : ROBOT_MANIFEST.approvalTier; // default for medium trust

Trusted agents get faster execution (fewer human approvals). Untrusted agents get stricter gates. The manifest defines the baseline tier; the trust score adjusts it within bounds.

This connects to the cross-org gap from my earlier comment: when Agent A from Org X calls a tool on Org Y's server, Org Y's TAM has no history with Agent A. The trust score provides that missing signal: portable reputation that informs the tier decision without Org Y needing to manage per-agent state.

The agentfolio-mcp-server provides the
Summary
Integration guide and working example server showing how to add governance and trust verification to any MCP server.
Guide (docs/integrations/mcp-trust-guide.md), a 4-layer progression:
Plus TrustGatedMCPServer (AgentMesh embedded alternative), end-to-end flow composition, and MCP client integration.
Example server (examples/mcp-trust-verified-server/): runnable FastMCP server with 3 tools at escalating trust thresholds (300/600/800), security scanner fingerprinting, fail-closed authorization, and audit logging.
All APIs verified against source across two independent review passes. Corrections applied: blocked_patterns tuple form, private import warnings, orphaned error message, fail-closed trust_score preservation, CONFUSED_DEPUTY threat type qualifier, wrap_mcp_server limitation note, vendor-neutral client language.
Closes #707
Test plan
python examples/mcp-trust-verified-server/server.py