docs: MCP + Trust Verification integration guide #747

Merged
imran-siddique merged 2 commits into microsoft:main from MythologIQ:docs/mcp-trust-guide
Apr 5, 2026

@MythologIQ
Contributor

Summary

Integration guide and working example server showing how to add governance and trust verification to any MCP server.

Guide (docs/integrations/mcp-trust-guide.md) — 4-layer progression:

  • Layer 1: Trust Proxy — per-tool authorization, capability gating, rate limiting
  • Layer 2: Trust Server — 5-dimension trust scoring, Ed25519 identity, cryptographic handshakes, delegation verification
  • Layer 3: Security Scanner — tool poisoning detection, rug-pull fingerprinting, schema abuse, cross-server attacks
  • Layer 4: MCP Gateway — 5-stage runtime interception pipeline with fail-closed semantics

Plus TrustGatedMCPServer (AgentMesh embedded alternative), end-to-end flow composition, and MCP client integration.

Example server (examples/mcp-trust-verified-server/) — runnable FastMCP server with 3 tools at escalating trust thresholds (300/600/800), security scanner fingerprinting, fail-closed authorization, and audit logging.
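The escalating-threshold idea can be sketched in a few lines of standard-library Python. This is not the example server's code: the tool names, `ThresholdGate`, and `Decision` are hypothetical, and only the 300/600/800 thresholds come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical tool names; only the thresholds are taken from the PR summary.
TOOL_THRESHOLDS = {"read_public": 300, "read_internal": 600, "write_records": 800}


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class ThresholdGate:
    thresholds: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, trust_score: int) -> Decision:
        required = self.thresholds.get(tool)
        if required is None:
            # Unknown tools are denied rather than allowed by default.
            decision = Decision(False, f"unknown tool: {tool}")
        elif trust_score >= required:
            decision = Decision(True, f"score {trust_score} >= {required}")
        else:
            decision = Decision(False, f"score {trust_score} < {required}")
        # Every decision, allowed or not, is recorded.
        self.audit_log.append((tool, trust_score, decision.allowed))
        return decision


gate = ThresholdGate(TOOL_THRESHOLDS)
print(gate.authorize("read_internal", 650).allowed)  # True
print(gate.authorize("write_records", 650).allowed)  # False
```

The same score passes the 600 gate and fails the 800 gate, which is the behaviour the three-tool example is meant to demonstrate.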

All APIs verified against source across two independent review passes. Corrections applied: blocked_patterns tuple form, private import warnings, orphaned error message, fail-closed trust_score preservation, CONFUSED_DEPUTY threat type qualifier, wrap_mcp_server limitation note, vendor-neutral client language.

Closes #707

Test plan

  • All code blocks parse as valid Python 3.10+
  • All import paths resolve to real symbols (verified against source)
  • Cross-references resolve: guide ↔ example ↔ Tutorial 07 ↔ Tutorial 27
  • Example server runs: python examples/mcp-trust-verified-server/server.py
  • No vendor-specific language — MCP client references are framework-agnostic
  • No new dependencies in toolkit packages

Add layered guide showing how to add governance and trust verification
to any MCP server, plus a working example server.

Guide covers 4 layers:
- Layer 1: Trust Proxy — authorization, per-tool policies, rate limiting
- Layer 2: Trust Server — identity, 5-dimension trust scoring, handshakes
- Layer 3: Security Scanner — tool poisoning, rug pulls, schema abuse
- Layer 4: MCP Gateway — 5-stage runtime interception pipeline

Plus TrustGatedMCPServer (AgentMesh), end-to-end flow composition,
and Claude Desktop integration.

Example server: 3 FastMCP tools with escalating trust thresholds,
security scanner fingerprinting, fail-closed authorization, and
audit logging.

Closes microsoft#707
@github-actions github-actions bot added the size/XL Extra large PR (500+ lines) label Apr 3, 2026
@github-actions

github-actions bot commented Apr 3, 2026

🤖 AI Agent: contributor-guide — 🌟 What You Did Well

Hi there, and welcome to the microsoft/agent-governance-toolkit community! 🎉 Thank you for taking the time to contribute — we’re thrilled to have you here. This is a fantastic first pull request, and I can already see the effort and thoughtfulness you’ve put into it. Let’s dive in!


🌟 What You Did Well

  1. Comprehensive Documentation: The integration guide is incredibly detailed and well-structured. Breaking it into four layers with clear explanations and examples makes it very approachable for users at different levels of expertise.
  2. Runnable Example Server: Including a working example (examples/mcp-trust-verified-server/) is a huge plus! It’s always helpful for users to see concepts in action.
  3. Security Awareness: You’ve clearly thought through security concerns, from trust scoring to fail-closed semantics and tool poisoning detection. This aligns perfectly with the goals of this toolkit.
  4. Thorough Testing Plan: Your test plan is detailed and well-considered, ensuring that the new additions are robust and reliable.
  5. Clear Cross-Referencing: Linking the guide, example, and tutorials (Tutorial 07 and Tutorial 27) shows great attention to detail and helps users navigate the project.

🛠 Suggestions for Improvement

While this is an excellent start, there are a few areas where we can refine things further:

1. Linting

  • We use ruff for linting with the E, F, and W rule sets. Please run ruff on your changes to ensure they adhere to the project’s style guidelines. You can install and run it like this:
    pip install ruff
    ruff check .
  • For example, ensure there are no unused imports, and that line lengths are within the recommended limit.

2. Testing Location

  • Tests for new functionality should go under packages/{name}/tests/. For this PR, it looks like the example server could benefit from automated tests to validate its behavior. Consider adding tests for:
    • Trust threshold enforcement.
    • Security scanner threat detection.
    • Fail-closed behavior in the gateway.
  • This will help ensure the example server remains functional as the project evolves.
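A fail-closed test of the kind suggested above might look roughly like this. The `fail_closed` wrapper and the `authorize` function are stand-ins written for illustration, not the toolkit's API; the point is only that an exception inside the check must surface as a denial.

```python
import functools


def fail_closed(func):
    """Wrap an authorization check so that any exception becomes a denial."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return bool(func(*args, **kwargs))
        except Exception:
            return False  # deny when the check itself fails
    return wrapper


@fail_closed
def authorize(agent_id: str, tool: str) -> bool:
    # Hypothetical backend that is unavailable for one agent.
    if agent_id == "did:example:broken":
        raise RuntimeError("policy backend unavailable")
    return tool == "read_public"


def test_exception_is_denied():
    assert authorize("did:example:broken", "read_public") is False


def test_normal_path_still_allows():
    assert authorize("did:example:ok", "read_public") is True
```

Written as plain `test_*` functions, these would be collected by pytest if placed under a package's tests directory.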

3. Commit Message Conventions

  • We follow the Conventional Commits standard. Your commit message should start with a prefix like docs:, feat:, or fix:. For example:
    docs: add MCP + Trust Verification integration guide
    
  • If you have multiple commits, consider squashing them into a single commit with a clear message.

4. Security-Sensitive Code

  • Since this PR introduces security-sensitive functionality (e.g., trust scoring, cryptographic handshakes, and fail-closed mechanisms), it will undergo extra scrutiny. A few things to double-check:
    • Are all cryptographic operations using secure, well-vetted libraries?
    • Are there any edge cases where the fail-closed behavior might not trigger as expected?
    • Are there sufficient safeguards against injection attacks in the example server?

5. Documentation Length

  • While the guide is thorough, it’s quite lengthy. Consider splitting it into smaller, more focused sections or linking to external resources for advanced topics. For example:
    • A separate guide for the example server.
    • A dedicated page for the security scanner’s threat types and usage.

📚 Helpful Resources

Here are some resources to help you make the requested changes:


🔄 Next Steps

  1. Run ruff and address any linting issues.
  2. Add tests for the example server under packages/{name}/tests/.
  3. Review your commit messages and ensure they follow the Conventional Commits format.
  4. Double-check the security-sensitive parts of your code for potential vulnerabilities.
  5. (Optional) Consider breaking the documentation into smaller sections for better readability.

Once you’ve made these updates, push your changes to this branch. The CI/CD pipeline will automatically re-run, and we’ll review your PR again. If you have any questions or need help, don’t hesitate to ask — we’re here to support you!

Thank you again for your contribution. We’re excited to collaborate with you and look forward to seeing your updates! 🚀

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review of PR: docs: MCP + Trust Verification integration guide

This pull request introduces a comprehensive integration guide and an example server for adding governance and trust verification to MCP servers. The guide is well-structured, covering four progressive layers of trust and governance. Below is a detailed review of the PR, focusing on the specified areas of concern.


🔴 CRITICAL: Security Issues

  1. Fail-Closed Behavior in Trust Proxy and Gateway:

    • The TrustProxy and MCPGateway components claim to be fail-closed, but there is no explicit test coverage or implementation details provided to verify this behavior. If an exception occurs during the authorize() or intercept_tool_call() methods, the system must guarantee that the call is denied.
    • Action: Add explicit tests to simulate exceptions in these methods and verify that the system denies the call in such scenarios.
  2. Replay Attack Mitigation in Trust Handshake:

    • The trust handshake mechanism described in Layer 2 does not explicitly mention how replay attacks are mitigated. While the use of a challenge nonce is a good start, it is unclear if the nonce is tied to a specific session or if it is time-bound.
    • Action: Ensure that the nonce is unique per session and has a limited validity period. Document the mechanism for replay attack prevention in the guide.
  3. Rate Limiting in Trust Proxy:

    • The TrustProxy supports rate limiting, but there is no mention of how this is implemented or whether it is resistant to bypass techniques (e.g., using multiple DIDs or IP addresses).
    • Action: Clarify the implementation details of rate limiting and consider adding IP-based rate limiting or other mechanisms to prevent abuse.
  4. Tool Poisoning Detection:

    • The MCPSecurityScanner includes a CONFUSED_DEPUTY threat type but explicitly states that it has no built-in detection. This is a significant gap, as confused deputy attacks are a critical risk in multi-agent systems.
    • Action: Implement detection for CONFUSED_DEPUTY attacks or provide detailed guidance on how users can define custom rules to mitigate this risk.
  5. Cryptographic Key Management:

    • The guide mentions Ed25519 keys for identity and cryptographic handshakes but does not provide details on key rotation, storage, or revocation.
    • Action: Include a section in the guide on best practices for key management, including secure storage, rotation, and revocation.
  6. Audit Log Integrity:

    • The audit logs in TrustProxy, MCPGateway, and MCPSecurityScanner are critical for accountability. However, there is no mention of mechanisms to ensure the integrity and immutability of these logs.
    • Action: Recommend or implement a mechanism (e.g., hash chaining or signing) to ensure that audit logs cannot be tampered with.
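The hash chaining mentioned in item 6 can be sketched as follows. This is a generic illustration, not how the toolkit's audit logs work; `append_entry`, `verify_chain`, and the entry layout are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Editing any earlier event changes its digest and breaks every later link. A chain on its own does not detect truncation of the newest entries; that needs the latest hash to be stored or signed somewhere else.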

🟡 WARNING: Potential Breaking Changes

  1. Backward Compatibility of wrap_mcp_server:

    • The note that wrap_mcp_server() always enables built-in sanitization regardless of the input configuration could lead to unexpected behavior for existing users.
    • Action: Clearly document this behavior in the release notes and consider providing a way to disable built-in sanitization if needed.
  2. Trust Score Model Changes:

    • The introduction of a 5-dimension trust model with specific scoring ranges may conflict with existing implementations that use a different scoring system.
    • Action: Provide a migration guide for users who need to adapt their existing trust scoring systems to the new model.

💡 Suggestions for Improvement

  1. Test Coverage:

    • While the PR mentions that all APIs were verified against the source, it does not provide details on the test coverage for the new example server or the guide's code snippets.
    • Action: Add automated tests for the example server and validate all code snippets in the guide as part of the CI pipeline.
  2. Thread Safety:

    • The guide does not address thread safety for components like TrustProxy and MCPGateway, which may be used in concurrent environments.
    • Action: Document thread safety considerations and provide examples of how to use these components in multi-threaded or async environments.
  3. Type Safety:

    • The guide does not mention whether the code snippets are type-checked using tools like mypy.
    • Action: Ensure that all public APIs and code snippets are type-annotated and validated using mypy. Add type-checking to the CI pipeline.
  4. Error Handling:

    • The guide does not provide details on how errors are logged or propagated in the example server.
    • Action: Include examples of error handling and logging best practices in the guide.
  5. Vendor Neutrality:

    • While the guide claims to avoid vendor-specific language, it mentions specific agent frameworks like GPT and Claude in the introduction.
    • Action: Replace these references with more generic terms like "LLM-based agents" to maintain vendor neutrality.
  6. Documentation Length:

    • The guide is very detailed but also quite lengthy, which may overwhelm users.
    • Action: Consider breaking the guide into separate files for each layer and providing a high-level overview with links to the detailed sections.

✅ Strengths

  1. Comprehensive Coverage:

    • The guide covers a wide range of topics, from basic authorization to advanced runtime interception, providing a clear path for incremental adoption.
  2. Practical Examples:

    • The inclusion of code snippets and a working example server makes it easier for users to understand and implement the concepts.
  3. Security Awareness:

    • The guide demonstrates a strong focus on addressing key security risks, including tool poisoning, rug pulls, and trust verification.
  4. Well-Structured:

    • The guide is logically organized into layers, making it easy for readers to follow and implement the concepts step-by-step.

Summary of Feedback

  • 🔴 CRITICAL: Address fail-closed behavior, replay attack mitigation, rate-limiting robustness, confused deputy detection, key management, and audit log integrity.
  • 🟡 WARNING: Document potential breaking changes in wrap_mcp_server and the trust score model.
  • 💡 SUGGESTION: Improve test coverage, document thread safety, ensure type safety, enhance error handling, and consider splitting the guide into smaller sections for better readability.

This PR is a significant addition to the repository, but the critical security issues must be addressed before merging. Once resolved, the guide and example server will provide a robust foundation for integrating governance and trust verification into MCP servers.

@github-actions

github-actions bot commented Apr 3, 2026

🤖 AI Agent: security-scanner — Security Review of Pull Request

Security Review of Pull Request

This pull request introduces a comprehensive integration guide and example server for adding governance and trust verification to MCP (Model Context Protocol) servers using the Agent Governance Toolkit. While the changes are primarily documentation and example code, they touch on critical security layers of the toolkit. Below is a detailed security review based on the specified criteria.


1. Prompt Injection Defense Bypass

  • Finding: 🔵 LOW
    • The documentation mentions that the MCPSecurityScanner detects prompt injection patterns in tool descriptions (DESCRIPTION_INJECTION threat type). However, the specific patterns or techniques used for detection are not detailed in the guide. This could lead to gaps in understanding for implementers who may rely on the scanner without fully understanding its limitations.
  • Attack Vector: If the scanner's detection patterns are incomplete or not updated to handle new prompt injection techniques, malicious actors could craft tool descriptions that bypass the scanner and exploit downstream agents.
  • Recommendation: Provide a detailed list of the prompt injection patterns detected by the MCPSecurityScanner and include guidance for extending or customizing these patterns to address emerging threats.

2. Policy Engine Circumvention

  • Finding: 🟠 HIGH
    • The Trust Proxy and MCP Gateway components rely on user-defined policies to enforce security. However, the documentation does not emphasize the importance of testing and validating these policies. Misconfigurations or overly permissive policies could lead to circumvention of security controls.
  • Attack Vector: A misconfigured policy (e.g., missing a required capability or setting an overly low trust threshold) could allow unauthorized agents to access sensitive tools or bypass rate limits.
  • Recommendation: Include explicit warnings and examples of common misconfigurations in the documentation. Provide a validation tool or script to help users test their policies for potential weaknesses.
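A policy validation script along the lines recommended here could start as a simple linter. The policy shape below (`min_trust`, `required_capabilities`, `rate_limit_per_minute`) is invented for the sketch and is not the Trust Proxy's configuration format.

```python
def lint_policies(policies: dict, min_trust_floor: int = 300) -> list:
    """Return human-readable warnings for permissive or incomplete policies."""
    warnings = []
    for tool, policy in policies.items():
        if policy.get("min_trust", 0) < min_trust_floor:
            warnings.append(f"{tool}: min_trust is below the floor of {min_trust_floor}")
        if not policy.get("required_capabilities"):
            warnings.append(f"{tool}: no required capabilities configured")
        if policy.get("rate_limit_per_minute") is None:
            warnings.append(f"{tool}: no rate limit configured")
    return warnings


# Hypothetical policies: the second one is missing most safeguards.
policies = {
    "read_internal": {
        "min_trust": 600,
        "required_capabilities": ["read"],
        "rate_limit_per_minute": 30,
    },
    "write_records": {"min_trust": 100},
}
for warning in lint_policies(policies):
    print(warning)
```

Running a check like this in CI would catch the two misconfigurations named in the attack vector (a missing capability and an overly low threshold) before deployment.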

3. Trust Chain Weaknesses

  • Finding: 🔴 CRITICAL
    • The Trust Server implements cryptographic handshakes and delegation chain verification but does not provide sufficient details about how these processes are secured. For example, the documentation does not specify how the Ed25519 keys are managed, rotated, or revoked.
  • Attack Vector: If an attacker compromises an agent's private key or if key rotation is not implemented, the entire trust chain could be undermined, allowing unauthorized agents to impersonate trusted entities.
  • Recommendation: Include detailed guidance on key management best practices, such as secure storage, periodic rotation, and revocation mechanisms. Additionally, provide examples of how to implement these practices using the toolkit.

4. Credential Exposure

  • Finding: 🔵 LOW
    • The documentation does not explicitly mention whether sensitive information (e.g., agent DIDs, trust scores, or tool parameters) is logged in plaintext in the audit logs.
  • Attack Vector: If sensitive information is logged in plaintext, it could be exposed to unauthorized users with access to the logs.
  • Recommendation: Add a note in the documentation about configuring logging to avoid storing sensitive information in plaintext. Consider implementing a default behavior in the toolkit to redact sensitive data from logs.

5. Sandbox Escape

  • Finding: 🔵 LOW
    • The MCPGateway includes parameter sanitization to prevent shell injection and PII leakage. However, the documentation does not specify the comprehensiveness of the built-in sanitization rules.
  • Attack Vector: If the sanitization rules are incomplete or do not cover all potential attack vectors, malicious input could lead to command injection or data leakage.
  • Recommendation: Provide a detailed list of the built-in sanitization rules and guidance for extending them. Include examples of common attack patterns and how they are mitigated.

6. Deserialization Attacks

  • Finding: 🟡 MEDIUM
    • The example server and tools use JSON for schema validation and data exchange. While JSON is generally safer than formats like pickle or YAML, the documentation does not mention any safeguards against malicious or malformed JSON payloads.
  • Attack Vector: Malicious JSON payloads could exploit vulnerabilities in downstream parsers or cause denial-of-service attacks by consuming excessive resources.
  • Recommendation: Add a note in the documentation about validating and sanitizing JSON payloads before processing. Consider including examples of safe JSON parsing practices.
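One way to act on this recommendation is to cap payload size and nesting depth before handing data to a tool. The limits and the `parse_untrusted_json` name are assumptions chosen for the sketch, not toolkit defaults.

```python
import json

MAX_BYTES = 64 * 1024  # assumed limit
MAX_DEPTH = 16         # assumed limit; scalars count as one level


def _max_depth(value) -> int:
    # Iterative walk, so a deep payload cannot exhaust the Python stack here.
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(node, dict):
            stack.extend((child, level + 1) for child in node.values())
        elif isinstance(node, list):
            stack.extend((child, level + 1) for child in node)
    return deepest


def parse_untrusted_json(raw: bytes):
    if len(raw) > MAX_BYTES:
        raise ValueError("payload too large")
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError, RecursionError) as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if _max_depth(data) > MAX_DEPTH:
        raise ValueError("payload nested too deeply")
    return data
```

Schema validation of the parsed result would still be needed; this only bounds the resources a malformed payload can consume.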

7. Race Conditions

  • Finding: 🟠 HIGH
    • The Trust Proxy and MCP Gateway enforce rate limits and call budgets on a per-agent basis. However, the documentation does not mention whether these checks are thread-safe or how they handle concurrent requests.
  • Attack Vector: A race condition in rate-limiting logic could allow an attacker to exceed their call budget by sending concurrent requests.
  • Recommendation: Clarify in the documentation whether the rate-limiting logic is thread-safe. If not, provide guidance on deploying the proxy or gateway in a way that ensures thread safety (e.g., using a single-threaded event loop or external rate-limiting middleware).
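The race described here comes from checking the budget and recording the call as two separate steps. A minimal sketch of a per-agent sliding-window limiter that does both under one lock is shown below; `SlidingWindowLimiter` is hypothetical and says nothing about how the Trust Proxy or MCP Gateway implement their limits.

```python
import threading
import time
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


class SlidingWindowLimiter:
    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls = defaultdict(deque)  # agent_id -> timestamps of recent calls
        self._lock = threading.Lock()

    def allow(self, agent_id: str) -> bool:
        now = self._clock()
        with self._lock:  # check and record atomically
            calls = self._calls[agent_id]
            while calls and now - calls[0] >= self.window_seconds:
                calls.popleft()  # drop calls that left the window
            if len(calls) >= self.max_calls:
                return False
            calls.append(now)
            return True


limiter = SlidingWindowLimiter(max_calls=5, window_seconds=60.0)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: limiter.allow("did:example:agent-1"), range(50)))
print(sum(results))  # 5
```

A single in-process lock only helps within one process; a deployment with several workers would need a shared store or an external rate limiter.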

8. Supply Chain

  • Finding: 🟡 MEDIUM
    • The guide introduces several new dependencies (mcp-trust-proxy, mcp-trust-server, agent-os-kernel, agentmesh-platform) without providing information about their security posture or how they are maintained.
  • Attack Vector: If any of these dependencies are compromised (e.g., through dependency confusion or typosquatting), it could introduce vulnerabilities into the system.
  • Recommendation: Include a section in the documentation about verifying the integrity of dependencies. Encourage users to pin dependency versions and use tools like pip-audit to check for known vulnerabilities.

Overall Assessment

This pull request provides a comprehensive and well-structured guide for integrating governance and trust verification into MCP servers. However, there are several areas where additional details or safeguards are needed to ensure the security of the system. The most critical issue is the lack of guidance on key management and rotation for the trust server, which could undermine the entire trust chain if not addressed.

Summary of Findings

| Finding | Severity | Recommendation |
|---|---|---|
| Prompt injection defense bypass | 🔵 LOW | Document detection patterns and provide guidance for extending them. |
| Policy engine circumvention | 🟠 HIGH | Warn about misconfigurations and provide a policy validation tool. |
| Trust chain weaknesses | 🔴 CRITICAL | Add key management, rotation, and revocation guidance. |
| Credential exposure | 🔵 LOW | Ensure sensitive data is not logged in plaintext. |
| Sandbox escape | 🔵 LOW | Document built-in sanitization rules and provide guidance for extending them. |
| Deserialization attacks | 🟡 MEDIUM | Add guidance on validating and sanitizing JSON payloads. |
| Race conditions | 🟠 HIGH | Ensure rate-limiting logic is thread-safe and document best practices. |
| Supply chain | 🟡 MEDIUM | Include information on dependency security and encourage version pinning. |

Suggested Next Steps

  1. Address the critical finding related to trust chain weaknesses by providing detailed key management guidance.
  2. Update the documentation to include more details on the security features and their limitations (e.g., prompt injection patterns, sanitization rules).
  3. Consider adding a policy validation tool or script to help users avoid common misconfigurations.
  4. Ensure that rate-limiting and audit logging mechanisms are thread-safe and document any necessary deployment considerations.
  5. Review the security of new dependencies and provide guidance for users to verify their integrity.

By addressing these issues, the integration guide and example server can provide a robust foundation for secure MCP governance.

@0xbrainkid

The 4-layer progression is well-structured. A few observations from building cross-org agent trust infrastructure:

Layer 2's 5-dimension trust scoring — the cross-org gap

The trust scoring system described here works well within a single governance boundary (one organization's agents, one policy engine). The harder problem is: what happens when Agent A from Org X needs to operate on Org Y's MCP servers?

Org Y's governance toolkit has no behavioral history for Agent A. The trust score starts at zero. This is the cold-start problem that every within-org trust system faces at org boundaries.

Two approaches to bridging this:

  1. Portable trust scores — Agent A carries a verifiable trust record from its home org that Org Y's gateway can validate. This requires a common trust evidence format and independent verification (not just "Agent A claims trust score 85").

  2. On-chain behavioral anchoring — Trust evidence is recorded on a public ledger (e.g., Solana via SATP), independently verifiable by any party. Org Y queries the agent's cross-org reputation without trusting Org X's attestation.

The agentfolio-mcp-server implements approach (2) as an MCP server — it could slot into Layer 2 as an additional trust dimension: cross_org_reputation alongside the 5 existing dimensions.

Layer 3's tool poisoning detection — connection to OWASP #802

The rug-pull fingerprinting and schema abuse detection map directly to the runtime enforcement discussion happening on OWASP #802, where the community is converging on a 5-layer governance architecture (authorization → execution evidence → mutation authority → boundary integrity → behavioral trust). The toolkit's deterministic policy enforcement aligns with the "strong enforceability" tier in that classification.
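For readers unfamiliar with rug-pull fingerprinting, the general technique is to hash each tool definition when it is first seen and flag any later change. The sketch below illustrates that idea only; `fingerprint_tool` and `FingerprintRegistry` are hypothetical names, and the scanner's real implementation may differ.

```python
import hashlib
import json


def fingerprint_tool(name: str, description: str, input_schema: dict) -> str:
    # Canonical JSON so key order does not affect the fingerprint.
    canonical = json.dumps(
        {"name": name, "description": description, "input_schema": input_schema},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FingerprintRegistry:
    def __init__(self):
        self._known = {}  # tool name -> fingerprint recorded at first sight

    def check(self, name: str, description: str, input_schema: dict) -> str:
        current = fingerprint_tool(name, description, input_schema)
        previous = self._known.get(name)
        if previous is None:
            self._known[name] = current
            return "new"
        return "unchanged" if previous == current else "changed"
```

A "changed" result means the description or schema differs from what was originally approved, which is the signal a rug-pull check looks for.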

Great work making this framework-agnostic from day one — that's the right architectural decision for ecosystem adoption.

Member

@imran-siddique imran-siddique left a comment

Excellent docs work — all APIs verified against source. Two items to address:

  1. License headers — add <!-- Copyright (c) Microsoft Corporation. Licensed under the MIT License. --> as line 1 of both .md files (server.py already has it)
  2. Security warning on demo — the example server accepts trust_score as a client-supplied tool arg, which means an LLM can fabricate any score. Add a prominent DEMO ONLY warning in server.py and the guide

Also recommend: pass agent_capabilities in the end-to-end governed_tool_call example so capability checks aren't silently skipped.
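The concern in item 2 can be made concrete with a small sketch: the score used for the decision comes from a server-side lookup keyed by an already-verified identity, and whatever the caller claims is ignored. `VERIFIED_SCORES`, `resolve_trust_score`, and `call_tool` are hypothetical, and how the identity gets verified is outside the sketch.

```python
from typing import Optional

# Hypothetical server-side store, populated by a trust service, never by the caller.
VERIFIED_SCORES = {"did:example:alice": 720}


def resolve_trust_score(verified_agent_id: str, claimed_score: Optional[int] = None) -> int:
    """Return the server-side score; the claimed value is deliberately ignored."""
    return VERIFIED_SCORES.get(verified_agent_id, 0)  # unknown agents score 0


def call_tool(verified_agent_id: str, required_score: int,
              claimed_score: Optional[int] = None) -> bool:
    return resolve_trust_score(verified_agent_id, claimed_score) >= required_score


print(call_tool("did:example:alice", 600, claimed_score=999))    # True
print(call_tool("did:example:mallory", 300, claimed_score=999))  # False
```

With this shape a fabricated `claimed_score` has no effect, which is the property the demo server lacks.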

@MythologIQ
Contributor Author

Thanks for the thoughtful analysis. The cross-org cold-start problem is real and worth calling out. This guide covers within-org governance (single trust boundary, single policy engine). Cross-org trust federation is a different architectural problem that would warrant its own design proposal.

The OWASP #802 reference is useful context. The 5-layer governance architecture discussion aligns well with how the toolkit separates authorization (proxy), integrity verification (scanner), and runtime enforcement (gateway). Worth tracking as that standard evolves.

If the cross-org trust gap is something you'd like to see addressed, filing a feature request issue would be the right next step so the maintainers can evaluate it against the roadmap.

- Add Microsoft copyright headers to both .md files
- Add DEMO ONLY warning to example server (trust_score is client-supplied,
  not from a verified source)
- Pass agent_capabilities in end-to-end governed_tool_call example
- Reframe layers as composable governance concerns (authorization,
  identity, integrity, enforcement) rather than a fixed-count taxonomy
@MythologIQ
Contributor Author

All three items addressed in ffd929f:

  1. License headers added as line 1 of both .md files
  2. DEMO ONLY warning added to server.py docstring and README.md. In production, agent identity and trust scores must come from a verified source, not from the calling agent itself.
  3. agent_capabilities added to governed_tool_call() and passed to proxy.authorize() so capability checks are not silently skipped

Also reframed the intro to describe "composable governance layers" covering authorization, identity, integrity, and enforcement rather than a fixed four-layer count.

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review of Pull Request: docs: MCP + Trust Verification integration guide


🔴 CRITICAL: Security Issues

  1. Insufficient Details on Cryptographic Handshake Implementation

    • The guide mentions cryptographic handshakes and Ed25519 identity but does not provide sufficient details on how these are implemented. Without clear documentation, it's difficult to verify if the cryptographic operations are implemented securely. For example:
      • How are private keys stored and protected?
      • What is the exact handshake protocol? Are there protections against replay attacks, man-in-the-middle attacks, or key compromise?
    • Actionable Recommendation: Provide a detailed explanation of the cryptographic handshake process, including key generation, storage, exchange, and validation mechanisms. Ensure that best practices for cryptographic operations are followed.
  2. Potential Replay Attack in Handshake Flow

    • The handshake flow described in the guide does not mention any mechanism to prevent replay attacks. For example, there is no mention of time-based expiration for the challenge nonce or how it is tied to a specific session.
    • Actionable Recommendation: Ensure that the challenge nonce is unique per session and has a short expiration time. Document this in the guide to provide clarity on how replay attacks are mitigated.
  3. Lack of Explicit Fail-Closed Behavior in Trust Proxy

    • While the MCPGateway is explicitly described as "fail-closed," the TrustProxy does not mention fail-closed behavior. This could lead to potential security bypasses if an unexpected exception occurs during the authorization process.
    • Actionable Recommendation: Ensure that the TrustProxy is implemented with fail-closed semantics and document this behavior in the guide.
  4. Potential for Misuse of ApprovalStatus.PENDING

    • The ApprovalStatus.PENDING state in the MCPGateway could lead to security vulnerabilities if not handled properly. For example, if the approval callback fails to respond or is misconfigured, the system might inadvertently allow or deny access.
    • Actionable Recommendation: Clearly document the behavior of the system when the approval callback fails or returns PENDING. Consider implementing a timeout mechanism or a default action for such cases.
  5. Insufficient Details on Tool Poisoning Detection

    • The guide mentions various threat types (e.g., TOOL_POISONING, RUG_PULL, CONFUSED_DEPUTY) but does not provide details on how these threats are detected. For example, what specific patterns are used to detect prompt injection or schema abuse?
    • Actionable Recommendation: Provide more details on the detection mechanisms for each threat type. This will help users understand the limitations and potential false positives/negatives of the security scanner.
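The replay mitigation asked for in item 2 (a nonce that is unique per session and expires quickly) can be sketched with the standard library. `ChallengeStore` is a hypothetical helper, not part of the Trust Server, and the signature check over the nonce is left out.

```python
import secrets
import time


class ChallengeStore:
    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = {}  # nonce -> (session_id, expires_at)

    def issue(self, session_id: str) -> str:
        nonce = secrets.token_hex(16)  # 128 bits from the OS CSPRNG
        self._pending[nonce] = (session_id, self._clock() + self._ttl)
        return nonce

    def consume(self, session_id: str, nonce: str) -> bool:
        # pop() makes every nonce single-use, so a replayed response fails.
        record = self._pending.pop(nonce, None)
        if record is None:
            return False
        bound_session, expires_at = record
        return bound_session == session_id and self._clock() <= expires_at
```

A production version would also purge expired entries and guard the dictionary with a lock if it is shared across threads.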

🟡 WARNING: Potential Breaking Changes

  1. wrap_mcp_server Behavior Change
    • The guide mentions that wrap_mcp_server() always enables built-in sanitization regardless of the input configuration. This could lead to unexpected behavior for users who are upgrading from a previous version and expect their existing configurations to remain unchanged.
    • Actionable Recommendation: Clearly document this behavior as a breaking change in the release notes and provide guidance on how users can adapt their configurations if needed.

💡 Suggestions for Improvement

  1. Clarify the Role of TrustGatedMCPServer

    • The guide introduces TrustGatedMCPServer as "Layer 2.5," but its relationship with the other layers is not entirely clear. For example, does it replace the TrustProxy and TrustServer, or is it meant to be used in conjunction with them?
    • Actionable Recommendation: Add a section to the guide that explains when and why a user might choose TrustGatedMCPServer over the other layers, and how it integrates with the overall governance pipeline.
  2. Provide Examples for All Key Features

    • While the guide is comprehensive, some features are only described in text without accompanying code examples. For instance:
      • Cryptographic handshake flow
      • Delegation chain verification
    • Actionable Recommendation: Include code examples for all key features to make the guide more actionable for developers.
  3. Clarify the Use of blocked_patterns

    • The guide mentions blocked_patterns in the MCPGateway configuration but does not provide details on the pattern syntax or examples of common patterns (e.g., regex for SQL injection or XSS).
    • Actionable Recommendation: Add a section explaining the blocked_patterns syntax and provide examples of common patterns that users might want to block.
  4. Backward Compatibility Testing

    • The guide states that "no new dependencies in toolkit packages" were introduced, but it does not mention whether backward compatibility with existing MCP servers and clients was tested.
    • Actionable Recommendation: Include a note in the test plan explicitly stating that backward compatibility with existing MCP servers and clients has been verified.
  5. Thread Safety

    • The guide does not mention whether the components (e.g., TrustProxy, MCPGateway, TrustGatedMCPServer) are thread-safe. This is particularly important for concurrent agent execution.
    • Actionable Recommendation: Clarify the thread safety guarantees of each component in the guide. If any components are not thread-safe, provide guidance on how to use them safely in a concurrent environment.
  6. OWASP Agentic Top 10 Compliance

    • The guide addresses several OWASP Agentic Top 10 risks (e.g., ASI01, ASI02), but it does not explicitly mention compliance with other risks, such as ASI03 (Data Leakage) or ASI05 (Supply Chain Vulnerabilities).
    • Actionable Recommendation: Map the features of the toolkit to the OWASP Agentic Top 10 risks and include this mapping in the guide. This will help users understand how the toolkit addresses these risks.
  7. Type Annotations and Pydantic Validation

    • The guide does not mention whether the input schemas for tools are validated using Pydantic models. This is important for type safety and preventing schema abuse.
    • Actionable Recommendation: Ensure that all input schemas are validated using Pydantic models and document this in the guide.

Summary of Feedback

  • 🔴 CRITICAL: Address security issues related to cryptographic operations, replay attacks, fail-closed behavior, and threat detection mechanisms.
  • 🟡 WARNING: Document the breaking change in wrap_mcp_server behavior.
  • 💡 SUGGESTION: Improve documentation clarity, provide more examples, and address thread safety and OWASP compliance explicitly.

Please address the critical issues as a priority, as they may lead to security vulnerabilities if left unresolved.

@pshkv

pshkv commented Apr 4, 2026

We've built a working MCP + capability token integration — happy to share the approach for the guide.

SINT Protocol's TAM implementation (@sint/bridge-mcp):

The Tool Authorization Manifest (TAM) defines per-tool security requirements — what token scope, what approval tier, what physical constraints are needed before a tool call executes. The MCP server registers tools with their manifest; the bridge validates inbound tool calls against it before forwarding to the handler.

// TAM example — defines requirements for each MCP tool
const ROBOT_MANIFEST: ToolAuthorizationManifest = {
  toolName: "move_arm",
  requiredScope: "robot:actuate",
  approvalTier: "T2_act",           // requires human review
  constraints: {
    maxVelocityMps: 0.5,
    maxForceNewtons: 50,
    requiresHumanPresence: false,
  },
  escalateOnHumanPresence: true,
};

// Bridge intercepts tool call, validates token against manifest
const result = await validateAgainstTam(token, request, ROBOT_MANIFEST);
// result: { ok: true } | { ok: false, violations: [...] }

Trust verification flow:

MCP client → tool call request
      ↓
TAM bridge (validateAgainstTam)
  ├── Token signature verification (Ed25519)
  ├── Token expiry + revocation check
  ├── Scope match: token.resource ⊇ manifest.requiredScope
  ├── Constraint check: token.constraints ≥ manifest.constraints
  └── Tier assignment → T0/T1 (auto) | T2/T3 (human approval required)
      ↓
MCP handler (only if all checks pass)
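
The checks in the flow above can be sketched roughly as follows. This is a minimal illustration, not the actual @sint/bridge-mcp implementation: the `Token` and `Manifest` shapes, the `validateSketch` name, the `"T1_prepare"` tier name, and the violation labels are all assumptions made for the example. Signature verification is passed in as a boolean so the sketch stays self-contained, and the constraint check reads the `≥` in the flow literally (the token's granted limits must be at least the manifest's).

```typescript
// Hypothetical, simplified shapes; the real @sint/bridge-mcp types may differ.
// "T1_prepare" is a placeholder name: the thread only names T0_observe, T2_act, T3_commit.
type ApprovalTier = "T0_observe" | "T1_prepare" | "T2_act" | "T3_commit";

interface Limits {
  maxVelocityMps: number;
  maxForceNewtons: number;
}

interface Manifest {
  toolName: string;
  requiredScope: string;
  approvalTier: ApprovalTier;
  constraints: Limits;
}

interface Token {
  scopes: string[];
  expiresAtMs: number;
  revoked: boolean;
  constraints: Limits;
}

type Verdict =
  | { ok: true; needsHumanApproval: boolean }
  | { ok: false; violations: string[] };

function validateSketch(
  token: Token,
  manifest: Manifest,
  signatureOk: boolean, // outcome of an Ed25519 check performed elsewhere
  nowMs: number,
): Verdict {
  const violations: string[] = [];

  // Collect every failed check instead of stopping at the first one.
  if (!signatureOk) violations.push("signature");
  if (token.expiresAtMs <= nowMs) violations.push("expired");
  if (token.revoked) violations.push("revoked");
  if (token.scopes.indexOf(manifest.requiredScope) === -1) violations.push("scope");

  // Assumption: the token's limits must be at least the manifest's limits.
  const t = token.constraints;
  const m = manifest.constraints;
  if (t.maxVelocityMps < m.maxVelocityMps || t.maxForceNewtons < m.maxForceNewtons) {
    violations.push("constraints");
  }

  if (violations.length > 0) return { ok: false, violations };

  // T0/T1 are auto-allowed; T2/T3 need a human in the loop.
  const needsHumanApproval =
    manifest.approvalTier === "T2_act" || manifest.approvalTier === "T3_commit";
  return { ok: true, needsHumanApproval };
}
```

Collecting all violations, rather than returning on the first failure, matches the `{ ok: false, violations: [...] }` result shape shown in the TAM example and gives the audit log a complete picture of why a call was rejected.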

The tier system is what makes this different from simple OAuth scopes: a T3_commit tool (irreversible, high-stakes) requires explicit human sign-off with M-of-N quorum before execution. A T0_observe tool (read-only sensor data) is auto-allowed with audit logging.
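
As a rough illustration of the M-of-N sign-off for T3_commit tools, the quorum check could look like the sketch below. The `Approval` shape and the `quorumReached` function are hypothetical names for this example and are not taken from the SINT codebase. It counts distinct authorized approvers only.

```typescript
// Hypothetical approval record; not the SINT Protocol's actual type.
interface Approval {
  approverId: string;
  approved: boolean;
}

// Returns true once at least `m` distinct authorized approvers have said yes.
function quorumReached(
  approvals: Approval[],
  authorizedApprovers: Set<string>,
  m: number,
): boolean {
  const yesVotes = new Set<string>();
  for (const a of approvals) {
    // Ignore rejections and votes from anyone outside the authorized set.
    if (a.approved && authorizedApprovers.has(a.approverId)) {
      yesVotes.add(a.approverId); // a Set de-duplicates repeat votes
    }
  }
  return yesVotes.size >= m;
}
```

De-duplicating by approver matters here: without it, one reviewer approving twice would satisfy a 2-of-3 quorum on their own.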

OWASP Agentic Top 10 coverage from this layer:

  • Tool Misuse (OAT-05): TAM enforces per-tool constraints at the gate, not just at the LLM prompt layer
  • Identity Abuse (OAT-03): Ed25519 token chain cryptographically binds identity to permitted actions
  • Cascading Failures (OAT-08): Rate limit enforcement + T3 circuit-breaker prevents runaway tool execution

Source: https://github.com/pshkv/sint-protocol/tree/master/packages/bridge-mcp
63 tests covering the full validation pipeline.

Happy to contribute a section to the guide or provide a working example that integrates with agent-os.

@0xbrainkid

@pshkv — the TAM tier system (T0-T3) is a clean enforcement model. The escalation from auto-allow (T0 observe) to M-of-N quorum (T3 commit) maps well to the enforceability classification emerging across the ecosystem.

One extension that makes the tier assignment dynamic rather than static: behavioral trust as an input to tier selection.

Currently, the tier is fixed per tool in the manifest (approvalTier: "T2_act"). But the same tool might warrant different tiers depending on which agent is calling it. An agent with 500 verified completions and no drift history is a different risk profile than a brand-new agent making its first call.

The integration point:

// Current: static tier from manifest
const staticTier = ROBOT_MANIFEST.approvalTier; // always "T2_act"

// Enhanced: tier adjusted by agent trust score
// (named differently from staticTier so both snippets can share a scope)
const trust = await queryAgentTrust(token.agentId);
const tier = trust.score > 80
  ? demoteTier(ROBOT_MANIFEST.approvalTier)  // T2 → T1 for trusted agents
  : trust.score < 30
    ? promoteTier(ROBOT_MANIFEST.approvalTier)  // T2 → T3 for untrusted
    : ROBOT_MANIFEST.approvalTier;              // default for medium trust

Trusted agents get faster execution (fewer human approvals). Untrusted agents get stricter gates. The manifest defines the baseline tier; the trust score adjusts it within bounds.
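
One possible way to keep the adjustment "within bounds" is to treat the tiers as an ordered list and clamp the shifted index to its ends. The sketch below assumes a single-step shift and reuses the 80/30 thresholds from the snippet above. `effectiveTier` is a hypothetical helper, and `"T1_prepare"` is a placeholder for the T1 tier, whose full name does not appear in this thread.

```typescript
// Ordered from least to most restrictive. "T1_prepare" is a placeholder name.
const TIERS = ["T0_observe", "T1_prepare", "T2_act", "T3_commit"] as const;
type Tier = (typeof TIERS)[number];

// Shift the manifest's baseline tier by at most one step based on trust score,
// clamping so the result never falls outside T0..T3.
function effectiveTier(base: Tier, trustScore: number): Tier {
  const delta = trustScore > 80 ? -1 : trustScore < 30 ? 1 : 0;
  const shifted = TIERS.indexOf(base) + delta;
  const clamped = Math.min(TIERS.length - 1, Math.max(0, shifted));
  return TIERS[clamped] ?? base;
}
```

With this shape, a T3_commit tool called by an untrusted agent stays at T3_commit, and a T0_observe tool called by a trusted agent stays at T0_observe. A deployment might also want a per-tool floor so that irreversible tools are never demoted. That policy is not part of this sketch.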

This connects to the cross-org gap from my earlier comment: when Agent A from Org X calls a tool on Org Y's server, Org Y's TAM has no history with Agent A. The trust score provides that missing signal — portable reputation that informs the tier decision without Org Y needing to manage per-agent state.

The agentfolio-mcp-server provides the queryAgentTrust() call via MCP. Could slot into the TAM bridge between token validation and tier assignment.

@imran-siddique imran-siddique merged commit 7c86578 into microsoft:main Apr 5, 2026
5 of 7 checks passed

Labels

size/XL Extra large PR (500+ lines)

Development

Successfully merging this pull request may close these issues.

🔗 Integration Guide: MCP (Model Context Protocol) + Trust Verification

4 participants