Note: This project is already fully compliant with MCP 2025-11-25. Reference this document only when making protocol-level changes.
Version: 1.0

MCP Protocol Revision: 2025-11-25

Purpose: Authoritative reference for development agents building MCP servers. Codifies protocol compliance requirements, production best practices, and common failure modes distilled from specification research, industry experience reports (Block, Docker, Philschmid/HuggingFace), and hands-on spec development.
The single most important concept in MCP server design is that your user is an LLM, not a human developer. Every design decision flows from this.
A REST API is designed for a developer who reads docs once, writes integration code, debugs it, and deploys. An MCP server is designed for an agent that discovers tools at runtime, interprets schemas within a finite context window, selects and invokes tools turn-by-turn, and must self-correct from errors without human intervention.
This has concrete consequences:
- Tool schemas compete for context window tokens. Every tool definition, every parameter description, every enum value — all of it occupies space that could otherwise hold user instructions, conversation history, or results from previous calls.
- Agents hallucinate structure. Given a `dict` parameter, an LLM will invent key names. Given a free-form string where an enum would do, it will produce plausible but invalid values.
- Multi-step orchestration is expensive and error-prone. Each chained tool call introduces latency, consumes tokens, and creates a potential failure point. Agents are improving at planning but remain unreliable beyond ~5 chained calls.
- Error messages are the agent's primary recovery mechanism. A good error message is an instruction the agent can act on. A bad one is a dead end that wastes a retry.
This section covers the non-negotiable structural requirements. Violating any of these will cause interoperability failures with conformant clients.
The server MUST implement the full initialization handshake:
- Client sends `initialize` with its capabilities and `protocolVersion`.
- Server responds with its own capabilities, `protocolVersion`, `serverInfo`, and optionally `instructions`.
- Client sends `initialized` notification.
- Only after receiving `initialized` may the server process `tools/call` or `tools/list` requests.

Implementations MUST gate on this. Handling tool calls before initialization is a protocol violation.
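The gating rule can be sketched as a small state machine. This is a minimal illustration that is not tied to any SDK: the `LifecycleGate` class and its `on_message` method are hypothetical names, and allowing `ping` before the handshake completes is an assumption of this sketch.

```python
class LifecycleGate:
    """Tracks handshake state and decides whether a message may be handled."""

    def __init__(self):
        # "new" -> "initializing" (after initialize) -> "ready" (after initialized)
        self.state = "new"

    def on_message(self, method):
        """Return True if a message with this method may be processed now."""
        if method == "initialize":
            if self.state != "new":
                return False  # a repeated initialize is rejected
            self.state = "initializing"
            return True
        if method == "notifications/initialized":
            if self.state != "initializing":
                return False
            self.state = "ready"
            return True
        if method == "ping":
            return True  # assumption: pings are answered in any state
        # Everything else (tools/list, tools/call, ...) waits for the handshake.
        return self.state == "ready"
```

A server loop would call `on_message` before dispatching and reject anything that returns `False`.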
The initialize response MUST include:
{
"protocolVersion": "2025-11-25",
"capabilities": {
"tools": {
"listChanged": false
}
},
"serverInfo": {
"name": "your-server-name",
"version": "1.0.0",
"description": "One-sentence description for humans."
},
"instructions": "LLM-targeted guidance on when/how to use this server's tools. This is injected into the system prompt by many hosts."
}

Key points:
- `serverInfo.name` is the machine identifier. Keep it short, lowercase, hyphenated.
- `serverInfo.description` is for humans (UI display).
- `instructions` is for the LLM. Write it as you would write a system prompt section: direct, imperative, specific. Many hosts (Claude Desktop, Goose, etc.) inject this directly into the model's context.
Every tool exposed via tools/list MUST include:
| Field | Required | Purpose |
|---|---|---|
| `name` | Yes | Machine identifier. 1–128 chars, `[a-zA-Z0-9_\-.]` only. Case-sensitive. |
| `description` | Yes | LLM-targeted explanation of what the tool does, when to use it, and what it returns. |
| `inputSchema` | Yes | JSON Schema (2020-12 default) defining accepted parameters. MUST be `type: "object"`. |
| `title` | Recommended | Human-readable display name. Distinct from `name`. |
| `outputSchema` | Recommended | JSON Schema defining the structure of `structuredContent` in the response. |
| `annotations` | Recommended | Behavioral hints: `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`. |
- Allowed characters: `A-Z`, `a-z`, `0-9`, `_`, `-`, `.`
- No spaces, commas, or special characters.
- SHOULD be unique within the server.
- For servers likely to run alongside others, use a service-prefixed pattern: `{service}_{action}_{resource}` (e.g., `github_create_issue`, `slack_send_message`). Some MCP clients auto-prefix with the server name, so avoid redundancy if you know your target host.
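The naming rules above are easy to check mechanically at startup. The following is a minimal sketch; `is_valid_tool_name` and `find_duplicate_names` are hypothetical helper names, not part of any MCP SDK.

```python
import re

# 1-128 characters drawn from A-Z, a-z, 0-9, underscore, hyphen, and dot.
TOOL_NAME_RE = re.compile(r"[A-Za-z0-9_.-]{1,128}")


def is_valid_tool_name(name):
    """Return True if the name satisfies the character and length rules."""
    return TOOL_NAME_RE.fullmatch(name) is not None


def find_duplicate_names(names):
    """Return names that appear more than once, in first-repeat order."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

Names are compared case-sensitively, so `Search` and `search` count as different tools in this check.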
Every tools/call response MUST return a content array (for backward compatibility) and SHOULD return structuredContent (for typed consumption):
{
"content": [
{
"type": "text",
"text": "{\"key\": \"serialized JSON of structuredContent\"}"
}
],
"structuredContent": {
"key": "typed object matching outputSchema"
},
"isError": false
}

Rules:
- `content` contains `TextContent` blocks with the serialized JSON. This is what older clients see.
- `structuredContent` is the typed object conforming to `outputSchema`. This is what modern clients validate.
- If `outputSchema` is declared, `structuredContent` MUST conform to it.
- For large responses where the serialized JSON exceeds a reasonable threshold (~20K characters), the `content` text block SHOULD contain a summary or truncation note referencing `structuredContent`, not the full serialization. This prevents context window overflow in hosts that inject `content` directly.
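A result builder that applies these rules might look like the sketch below. `build_tool_result` and `MAX_TEXT_CHARS` are hypothetical names, the summary wording is an assumption, and the function expects the structured payload to be a JSON-serializable dict.

```python
import json

MAX_TEXT_CHARS = 20_000  # the ~20K character threshold described above


def build_tool_result(structured):
    """Wrap a structured payload in a result with content and structuredContent."""
    text = json.dumps(structured)
    if len(text) > MAX_TEXT_CHARS:
        # Too large for the text block: describe the payload instead of inlining it.
        text = (
            f"Result is {len(text)} characters when serialized and is omitted here. "
            f"Top-level keys: {sorted(structured)}. "
            "See structuredContent for the full data."
        )
    return {
        "content": [{"type": "text", "text": text}],
        "structuredContent": structured,
        "isError": False,
    }
```

Small payloads are serialized verbatim into the text block, so older clients see the same data that newer clients read from `structuredContent`.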
MCP distinguishes two error types. Getting this wrong is one of the most common implementation mistakes.
Protocol Errors are standard JSON-RPC errors for structural or routing issues:
- Unknown tool name → `-32602` (Invalid params)
- Malformed request → `-32600` (Invalid request)
- Server bug → `-32603` (Internal error)
{
"jsonrpc": "2.0",
"id": 3,
"error": {
"code": -32602,
"message": "Unknown tool: nonexistent_tool"
}
}

Tool Execution Errors are returned in the normal result envelope with `isError: true`:
- Input validation failures (date in wrong format, value out of range)
- Business logic errors (file not found, API rate limit)
- Partial failures (some items processed, others failed)
{
"content": [
{
"type": "text",
"text": "File not found at /path/to/image.png. Verify the path exists and is an absolute path to a supported image format (PNG, JPEG, WebP)."
}
],
"isError": true
}

Critical: Input validation errors MUST be Tool Execution Errors, not Protocol Errors (SEP-1303). This is because Tool Execution Errors are reliably fed back to the LLM, enabling self-correction. Protocol Errors may be swallowed by the client or shown to the user instead. When a tool receives a valid JSON-RPC request with an invalid parameter value (e.g., `k: -5`), that's a Tool Execution Error.
Error responses MUST NOT include structuredContent. Only successful responses get structured output.
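The split between the two error types can be sketched as a dispatcher. This is an illustration under stated assumptions: `ToolInputError`, `handle_tools_call`, and the toy `extract_palette` handler are hypothetical, and handlers are assumed to take a single arguments dict.

```python
import json


class ToolInputError(Exception):
    """Raised by a tool handler when an argument value is invalid."""


def extract_palette(arguments):
    # Hypothetical tool: validates `k` and echoes it back.
    k = arguments.get("k", 8)
    if isinstance(k, bool) or not isinstance(k, int) or not 2 <= k <= 32:
        raise ToolInputError(
            f"Parameter 'k' must be an integer between 2 and 32. Received: {k!r}."
        )
    return {"k": k}


def handle_tools_call(request, tools):
    """Turn one tools/call request into a JSON-RPC response dict."""
    request_id = request.get("id")
    params = request.get("params", {})
    name = params.get("name")
    if name not in tools:
        # Routing problem: JSON-RPC protocol error, no result envelope.
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    try:
        structured = tools[name](params.get("arguments", {}))
    except ToolInputError as exc:
        # Bad argument value: normal result, isError true, no structuredContent.
        result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    else:
        result = {"content": [{"type": "text", "text": json.dumps(structured)}],
                  "structuredContent": structured, "isError": False}
    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```

Only the unknown-tool branch produces an `error` member. Every validation failure travels inside `result`, where the model can read it.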
MCP 2025-11-25 establishes JSON Schema 2020-12 as the default dialect (SEP-1613). When inputSchema or outputSchema omit the $schema field, clients assume 2020-12. You MAY include $schema explicitly if you need a different dialect (e.g., draft-07), but 2020-12 is recommended.
Annotations are optional behavioral hints that clients use for UI decisions (e.g., auto-approving safe operations):
{
"annotations": {
"readOnlyHint": true,
"destructiveHint": false,
"idempotentHint": true,
"openWorldHint": false
}
}

- `readOnlyHint: true` means the tool does not modify state. Hosts like Goose and Claude Code may auto-approve these.
- `destructiveHint` is only meaningful when `readOnlyHint` is `false`. It indicates potential for irreversible changes.
- `idempotentHint: true` means repeated calls with identical arguments have no additional effect beyond the first call. This enables safe retries.
- `openWorldHint: true` means the tool interacts with external entities (APIs, network). `false` means it operates on local/contained data only.
Set these accurately. Incorrect annotations can lead to auto-approval of dangerous operations or unnecessary confirmation prompts for safe ones.
These practices are drawn from production experience at Block (60+ MCP servers), Docker (100+ catalog servers), and independent analysis. They are not protocol requirements — they are engineering recommendations that directly impact how well agents can use your tools.
The mistake: Exposing one MCP tool per REST endpoint or database query.
The fix: Design each tool around a complete user goal.
If fulfilling a request requires calling GET /users, then GET /orders/{id}, then GET /shipments/{id}, expose a single track_order(email) tool that does all three internally and returns a synthesized result. The LLM should not need to orchestrate multi-step data fetching.
This doesn't mean every server has one tool. It means every tool delivers a complete, actionable result for one class of question. A well-designed server typically has 3–15 tools, each covering a distinct goal.
The mistake: Accepting nested configuration objects.
// BAD: Agent must guess the nested structure
{
"filters": {
"status": "active",
"date_range": { "start": "2025-01-01", "end": "2025-06-01" }
}
}

The fix: Top-level primitives with constrained types.
// GOOD: Every parameter is visible, typed, and constrained
{
"status": { "type": "string", "enum": ["active", "pending", "closed"] },
"date_start": { "type": "string", "format": "date" },
"date_end": { "type": "string", "format": "date" }
}

LLMs reliably produce flat key-value pairs. They hallucinate nested keys, invent dictionary structures, and miss required sub-fields. Use `enum` for every parameter that has a finite set of valid values.
Exception: Arrays of structured items (e.g., a list of bounding boxes) are sometimes unavoidable. In these cases, keep the inner object as flat as possible and document the structure exhaustively in the parameter description.
Setting `additionalProperties: false` on input schemas keeps invented parameters from slipping through unnoticed. Without it, an LLM might pass `{"image_path": "/img.png", "quality": "high"}` where `quality` doesn't exist, and the server silently ignores it, leading the agent to believe it influenced the output.
{
"type": "object",
"properties": { ... },
"required": ["image_path"],
"additionalProperties": false
}

Caveat: For tools that accept passthrough data (e.g., `generate_design_tokens` accepting palette items with extra fields from upstream), deliberately omit `additionalProperties: false` on the inner array item schema and document why.
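Because clients may not enforce the schema, the same rule is worth checking on the server. The sketch below only inspects top-level keys and is not a full JSON Schema validator; `check_arguments` is a hypothetical helper name.

```python
def check_arguments(arguments, schema):
    """Return an actionable error message, or None if the argument keys are acceptable."""
    allowed = set(schema.get("properties", {}))
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        return f"Missing required parameter(s): {', '.join(missing)}."
    # JSON Schema allows extra keys unless additionalProperties is explicitly false.
    if schema.get("additionalProperties", True) is False:
        unknown = sorted(set(arguments) - allowed)
        if unknown:
            return (
                f"Unknown parameter(s): {', '.join(unknown)}. "
                f"Accepted parameters: {', '.join(sorted(allowed))}."
            )
    return None
```

A non-`None` return value would be sent back as a Tool Execution Error so the agent learns that the invented parameter had no effect.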
Tool and parameter descriptions are consumed by the LLM as part of its context. They are not API docs for humans — they are instructions for an agent. Write them accordingly.
Bad: "description": "The algorithm to use for clustering."
Good: "description": "Clustering algorithm. Use 'k-means' (default) for predictable results with a known color count. Use 'median-cut' for deterministic output without seed dependency. Use 'dbscan' when the number of distinct colors is unknown — it auto-discovers clusters and treats k as a maximum cap, not a target."
For the tool-level description, specify three things:
- When to use it: "Use this tool when you need to extract a color palette from a UI mockup screenshot."
- What it returns: "Returns an array of dominant colors with hex values, CSS strings, percentage coverage, and semantic role hints."
- How to format arguments: "image_path must be an absolute filesystem path. The calling agent is responsible for saving remote images to disk first."
The agent sees your error message and must decide what to do next. The message is a recovery instruction.
Bad: "Access denied."
Good: "Cannot read file at /path/to/image.png. The file does not exist or is not readable. Verify the absolute path is correct and the file is a supported format (PNG, JPEG, WebP, BMP, TIFF). If the image was downloaded from a URL, ensure it was fully saved to disk before invoking this tool."
Bad: "Invalid parameter."
Good: "Parameter 'k' must be an integer between 2 and 32. Received: -1. For most UI mockups, k=8 works well. Use higher values (12-16) for complex multi-color designs."
Every error message should answer: what went wrong, what the valid state looks like, and what the agent should do differently on retry.
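One way to keep error messages consistently actionable is to build them from the three required parts. This is a minimal sketch; `actionable_error` is a hypothetical helper, and the example wording is illustrative only.

```python
def actionable_error(problem, valid, retry_hint):
    """Join the three parts of an error message: problem, valid state, next step."""
    parts = [problem, valid, retry_hint]
    if not all(part and part.strip() for part in parts):
        # Refuse to build a message that leaves the agent without guidance.
        raise ValueError("problem, valid, and retry_hint are all required")
    # Normalize each part to end with exactly one period.
    return " ".join(part.strip().rstrip(".") + "." for part in parts)
```

For example, `actionable_error("Parameter 'k' was -1", "It must be an integer between 2 and 32", "Retry with k=8 for most UI mockups")` yields a single message that covers all three points.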
Tool responses consume context window space. A response that returns 50KB of JSON will cripple the agent's ability to process subsequent turns.
Tactics:
- Cap output size. Set a `max_results` or `max_components` parameter with a sensible default. When the cap is hit, include a `truncated: true` flag and `total_available` count so the agent knows there's more.
- Truncate text content. If the serialized JSON for `content` blocks exceeds ~20K characters, provide a summary string instead of the full serialization. The typed data is still available in `structuredContent`.
- Paginate list operations. Return `has_more`, `next_cursor`, and `total_count` metadata. Never load unbounded result sets into memory.
- Return only what's needed. If your backing API returns 50 fields per record, select the 5–8 that matter for the agent's task. Don't pass through raw API responses.
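The capping and pagination tactics can be combined in one helper. The sketch below works on an in-memory list for brevity, whereas a real server would page at the data source. `page_results` is a hypothetical name, and the integer cursor is an assumption of this example.

```python
def page_results(items, max_results=20, cursor=0):
    """Return one bounded page of items plus metadata telling the agent what remains."""
    if max_results < 1 or cursor < 0:
        raise ValueError("max_results must be >= 1 and cursor must be >= 0.")
    page = items[cursor:cursor + max_results]
    next_index = cursor + len(page)
    truncated = next_index < len(items)
    return {
        "items": page,
        "truncated": truncated,          # True when more items exist past this page
        "total_available": len(items),
        "next_cursor": next_index if truncated else None,
    }
```

The agent can pass `next_cursor` back as `cursor` to fetch the following page, and it stops when `truncated` is `False`.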
- 5–15 tools per server. More than that and tool selection becomes unreliable — the agent spends tokens parsing descriptions it will never use.
- One server, one domain. A "GitHub + Jira + Slack" server is three servers pretending to be one. Split them.
- Delete unused tools. If telemetry shows a tool is never invoked, remove it. It's consuming description tokens on every request.
- Separate read and write tools. This enables granular permission management. Users can "always allow" read tools while requiring confirmation for writes.
Agents retry. They also compare outputs across calls. Non-determinism creates confusion.
- Fix random seeds for any stochastic algorithm. Expose the seed as a parameter for reproducibility, but use a stable default (e.g., `random_seed=42`).
- Document where non-determinism is unavoidable (e.g., DBSCAN cluster ordering, floating-point sensitivity across platforms).
- Use stable sort orders for output arrays.
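Both a seeded default and a stable output order are straightforward with the standard library. The sketch below is illustrative; `sample_pixels` and `order_colors` are hypothetical helpers, and the `hex` and `percentage` field names are assumptions.

```python
import random


def sample_pixels(pixels, n, random_seed=42):
    """Draw up to n pixels reproducibly using a local, seeded generator."""
    rng = random.Random(random_seed)  # local RNG, leaves global random state untouched
    return rng.sample(pixels, min(n, len(pixels)))


def order_colors(colors):
    """Sort by descending percentage, breaking ties by hex value for a stable order."""
    return sorted(colors, key=lambda color: (-color["percentage"], color["hex"]))
```

The explicit tie-breaker matters: without it, two colors with equal coverage could swap places depending on the order in which they were produced.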
When a group of parameters must all be provided together or not at all (e.g., region coordinates x, y, width, height):
Option A — JSON Schema dependentRequired (2020-12):
{
"dependentRequired": {
"region_x": ["region_y", "region_width", "region_height"],
"region_y": ["region_x", "region_width", "region_height"],
"region_width": ["region_x", "region_y", "region_height"],
"region_height": ["region_x", "region_y", "region_width"]
}
}

Option B: Runtime validation fallback. Not all clients validate `dependentRequired`. Always implement server-side validation and return a Tool Execution Error with a clear message: "If any region parameter is provided, all four (region_x, region_y, region_width, region_height) must be provided."
Use both. The schema catches it at the client level; the runtime validation catches it when the client doesn't validate.
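The runtime half of this rule can be sketched as follows. `check_region_group` is a hypothetical helper, and treating an explicit `None` the same as an absent parameter is an assumption of this example.

```python
REGION_PARAMS = ("region_x", "region_y", "region_width", "region_height")


def check_region_group(arguments):
    """Return an error message if the region group is only partially provided."""
    provided = [name for name in REGION_PARAMS if arguments.get(name) is not None]
    if provided and len(provided) < len(REGION_PARAMS):
        missing = [name for name in REGION_PARAMS if name not in provided]
        return (
            "If any region parameter is provided, all four (region_x, region_y, "
            "region_width, region_height) must be provided. "
            f"Missing: {', '.join(missing)}."
        )
    return None
```

Naming the missing parameters gives the agent exactly what it needs to fix the call on the next attempt.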
If multiple tools in your server accept the same parameter (e.g., algorithm, color_space, max_resolution), they MUST:
- Use identical names, types, defaults, and enum values.
- Appear in the same relative position in the schema (not enforced, but aids readability).
- Share identical descriptions.
Inconsistency between sibling tools is a common gap. An agent that learns extract_palette accepts color_space will assume extract_components does too — and be confused when it doesn't.
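Drift between sibling tools can be caught with a small audit over the declared schemas. This is a sketch, not an established tool: `find_inconsistent_params` is a hypothetical name, and it compares whole property definitions (type, enum, default, description) for exact equality.

```python
def find_inconsistent_params(tools):
    """Map each shared parameter whose definition differs to the tools that declare it.

    `tools` maps tool name -> inputSchema dict.
    """
    uses = {}  # parameter name -> list of (tool name, property definition)
    for tool_name, schema in tools.items():
        for param, definition in schema.get("properties", {}).items():
            uses.setdefault(param, []).append((tool_name, definition))
    conflicts = {}
    for param, entries in uses.items():
        first_definition = entries[0][1]
        if any(definition != first_definition for _, definition in entries[1:]):
            conflicts[param] = [tool_name for tool_name, _ in entries]
    return conflicts
```

Running this in a unit test fails the build as soon as one tool's copy of a shared parameter diverges from the others.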
If your server works with spatial data (images, maps, documents), state the coordinate system explicitly:
- In the `instructions` field of the `initialize` response.
- In the `description` of every coordinate parameter.
- In the `description` of every coordinate field in the `outputSchema`.
Example: "All input coordinates and output coordinates are in original image pixels (top-left origin). The server handles internal scaling for preprocessing; callers never need to account for downscaling."
Use enums (enum in JSON Schema) for every parameter with a known finite set of valid values. This includes:
- Algorithm choices
- Output formats
- Sort orders
- Status values
- Mode selectors
If the set might expand in future versions, still use an enum for the current version and update it when values are added. An agent that receives a clear enum will always pick a valid value. An agent that receives "type": "string" will frequently produce plausible but invalid values.
When writing specs that development agents will implement, these patterns prevent the most common classes of implementation bugs.
For every behavioral decision in your spec, verify it passes the "two reasonable implementations" test: could two independent implementers, reading only your spec, produce different behavior? If yes, the spec is underspecified.
Common ambiguity sources:
| Ambiguity | Example | Fix |
|---|---|---|
| Unspecified algorithm | "Find the nearest color" | "Find the nearest CSS named color using CIE2000 Delta-E in CIELAB space against the 148 CSS Level 4 named colors." |
| Unspecified threshold | "Merge similar colors" | "Merge colors within ΔE ≤ 5.0 (CIE2000) AND with bounding boxes overlapping > 50% of the smaller region's area." |
| Unspecified nullability | "Background field" | "null when exclude_background is false, OR when true but no single color exceeds the background_threshold." |
| Unspecified format | "Generate Tailwind config" | Provide a complete output example with exact syntax. |
| Contradictory statements | "DBSCAN ignores k" vs "k is a cap" | Resolve to a single statement and use identical phrasing everywhere the concept appears. |
Any time your output schema contains semantically-named fields derived from classification (e.g., role_hint, category, type), and those names feed into downstream identifiers (e.g., CSS variable names, token names), you MUST specify deduplication rules.
Example: If two palette entries both receive role_hint: "neutral", and the token naming pattern is --{prefix}-{role_hint}, you get a collision: two tokens named --color-neutral.
Specify explicitly:
- First occurrence: `--color-neutral`
- Subsequent: `--color-neutral-2`, `--color-neutral-3`
- If no `role_hint` is available, fall back to rank-based: `--color-1`, `--color-2`
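These deduplication rules translate directly into code. The sketch below is one possible reading: `token_names` is a hypothetical helper, and it assumes "rank" means the entry's 1-based position in the palette, which the rules above do not pin down.

```python
def token_names(role_hints, prefix="color"):
    """Build unique token names from role hints listed in palette order."""
    counts = {}
    names = []
    for rank, role in enumerate(role_hints, start=1):
        if not role:
            # No role hint: fall back to the entry's rank (assumed: palette position).
            names.append(f"--{prefix}-{rank}")
            continue
        counts[role] = counts.get(role, 0) + 1
        suffix = "" if counts[role] == 1 else f"-{counts[role]}"
        names.append(f"--{prefix}-{role}{suffix}")
    return names
```

A spec that uses a different meaning of rank (for example, counting only unnamed entries) should say so explicitly, since both readings pass the "two reasonable implementations" test differently.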
If your output includes percentage fields, always state the denominator:
- "Percentages are relative to non-background pixels (when background is excluded) or total image pixels (when it is not). They sum to approximately 100%."
- "Per-component percentages are relative to the component's bounding box pixel count, not the full image."
For every output format your server produces, include a complete, valid example in the spec. Not a fragment — a complete, copy-paste-able block that an implementer can use as a test fixture.
This is especially critical for formats with precise syntax (CSS custom properties, SCSS variables, Tailwind config objects, W3C Design Token JSON).
If recommending an implementation language, provide a concrete dependency manifest — not just package names but exact package identifiers and version constraints:
# pyproject.toml example: dependencies live under the [project] table
[project]
dependencies = [
    "mcp>=1.9",
    "opencv-python-headless>=4.8",  # not opencv-python (avoids GUI deps)
    "scikit-learn>=1.3",
    "Pillow>=10.0",
    "pydantic>=2.0",
    "numpy>=1.24",
]

An agent choosing between `opencv-python` and `opencv-python-headless` will pick the wrong one without guidance.
The MCP Inspector (npx @modelcontextprotocol/inspector) is the canonical validation tool. Every server MUST pass the following checklist before release:
- Server starts and completes the `initialize` → `initialized` handshake.
- `tools/list` returns all tools with valid `name`, `description`, `inputSchema`.
- Each tool is callable with valid arguments and returns `content` + `structuredContent`.
- Each tool returns `isError: true` (not a protocol error) for invalid input values.
- Unknown tool names produce a `-32602` protocol error.
- `ping` requests receive a response.
- No requests are processed before `initialized`.
For servers that process input data (images, documents, datasets):
- Synthetic fixtures: Programmatically generated inputs with exactly known expected outputs. These enable deterministic assertions. Example: a 100×100 image with four 50×50 solid-color quadrants should yield exactly 4 clusters with known hex values.
- Edge cases: Empty input, minimum-size input, maximum-size input, corrupt input. Each should produce a specific, documented error.
- Format coverage: If your server accepts PNG, JPEG, and WebP, test all three. JPEG compression introduces artifacts that affect clustering — your tests should verify this is handled.
- Unit tests cover individual modules in isolation: color conversion round-trips, clustering on synthetic data, schema validation, error formatting.
- Integration tests cover the full MCP protocol flow: initialize → tools/list → tools/call → validate response against outputSchema.
- Property tests (if applicable): verify invariants like "percentages sum to ~100%", "output coordinates are within original image bounds", "token names are unique".
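A synthetic fixture and a property check can be built without any image library. The sketch below represents an image as nested lists of RGB tuples; `make_quadrant_image` and `color_percentages` are hypothetical helpers, and the four colors are arbitrary choices.

```python
from collections import Counter


def make_quadrant_image(size=100):
    """Build a size x size grid of RGB tuples with four solid-color quadrants."""
    half = size // 2
    quadrant_colors = {
        (0, 0): (255, 0, 0),    # top-left
        (0, 1): (0, 255, 0),    # top-right
        (1, 0): (0, 0, 255),    # bottom-left
        (1, 1): (255, 255, 0),  # bottom-right
    }
    return [
        [quadrant_colors[(int(y >= half), int(x >= half))] for x in range(size)]
        for y in range(size)
    ]


def color_percentages(image):
    """Map each distinct color (as a hex string) to its share of all pixels."""
    counts = Counter(pixel for row in image for pixel in row)
    total = sum(counts.values())
    return {
        "#{:02x}{:02x}{:02x}".format(*rgb): 100.0 * count / total
        for rgb, count in counts.items()
    }
```

Because the fixture is generated, the expected answer is known exactly: four colors at 25% each, with percentages summing to 100.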
For Python servers with heavy dependencies (OpenCV, scikit-learn, NumPy), use lazy imports to minimize startup latency on stdio transport:
def handle_extract_palette(params):
    # Heavy imports are deferred to the first call so server startup stays fast.
    import cv2  # ~200ms cold import
    import numpy as np  # ~100ms cold import
    from sklearn.cluster import KMeans
    # ... processing

This keeps the `initialize` → `initialized` handshake fast while deferring heavy library loading to first use.
- stdio: Maximum client compatibility. Use for single-user, locally-run servers. This is the default for most use cases.
- Streamable HTTP: Use when you need networked access, horizontal scaling, or incremental results. Note: the older HTTP+SSE transport was replaced by Streamable HTTP in protocol revision 2025-03-26; use Streamable HTTP for new servers.
LLM providers offer significant latency and cost reductions for cached prompt prefixes. Your tool definitions and instructions are part of this prefix. To maximize cache hits:
- Avoid injecting dynamic data (timestamps, live counts) into `instructions` or tool descriptions.
- Keep tool definitions stable across sessions.
- If you must include dynamic context, append it after the stable prefix.
- Validate ALL tool inputs server-side, even if the schema constrains them. Clients may not validate schemas.
- For file paths: resolve to absolute paths, verify existence, check file type by magic bytes (not just extension), enforce size limits.
- For string inputs that feed into shell commands or SQL: sanitize or parameterize. Never interpolate user-provided strings into commands.
- Never echo secrets, API keys, or credentials in tool results or error messages.
- If your tool accesses authenticated APIs, strip authentication headers from any debug output.
- Rate-limit tool invocations to prevent abuse in multi-agent scenarios.
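The file-path checks above can be sketched with the standard library. This is an illustration under stated assumptions: `sniff_image_format`, `check_image_path`, and the 20 MB `MAX_BYTES` limit are hypothetical, and only PNG, JPEG, and WebP signatures are recognized.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # assumed size limit; choose one that fits your tool


def sniff_image_format(head):
    """Identify PNG, JPEG, or WebP from a file's first 12 bytes, else None."""
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if head.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None


def check_image_path(path):
    """Resolve and validate a path; return (absolute_path, format) or raise ValueError."""
    resolved = os.path.realpath(path)
    if not os.path.isfile(resolved):
        raise ValueError(
            f"No file at {resolved}. Pass an absolute path to an existing PNG, JPEG, or WebP file."
        )
    size = os.path.getsize(resolved)
    if size > MAX_BYTES:
        raise ValueError(
            f"File is {size} bytes, above the {MAX_BYTES}-byte limit. Downscale the image and retry."
        )
    with open(resolved, "rb") as handle:
        image_format = sniff_image_format(handle.read(12))
    if image_format is None:
        raise ValueError(
            f"File at {resolved} is not a PNG, JPEG, or WebP image (checked by file signature, not extension)."
        )
    return resolved, image_format
```

The `ValueError` messages are written so they can be returned verbatim as Tool Execution Errors.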
Design tools with a single risk level each:
- Read-only tools: Query data, retrieve status, list items. Mark with `readOnlyHint: true`.
- Write tools: Create, update, delete. Mark with `readOnlyHint: false`, and set `destructiveHint` appropriately.
Don't mix reads and writes in one tool — it prevents users from making informed permission decisions. If a workflow genuinely requires both, document it clearly and validate inputs aggressively.
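- For HTTP-based transports, follow the MCP authorization specification, which is based on OAuth 2.1. Authorization is optional in the protocol, but a server that implements it over HTTP should conform to that specification.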
- Never store tokens in plaintext files. Use platform keyrings.
- Request minimum necessary scopes.
- Handle token refresh proactively.
This section catalogs the most frequently observed failure modes, drawn from spec reviews and production experience. Use it as a pre-flight checklist.
| # | Anti-Pattern | Symptom | Fix |
|---|---|---|---|
| 1 | REST endpoint 1:1 mapping | Agent chains 3-5 calls for one goal | Consolidate into outcome-oriented tools |
| 2 | Nested input objects | Agent hallucinates keys, misses required sub-fields | Flatten to top-level primitives |
| 3 | Free-form strings where enums exist | Agent produces plausible but invalid values | Use JSON Schema enum |
| 4 | Human-facing error messages | Agent can't self-correct | Include what went wrong, what's valid, what to do differently |
| 5 | Validation errors as Protocol Errors | Agent never sees the error | Use Tool Execution Errors (isError: true) |
| 6 | Unbounded response sizes | Context window overflow | Cap, truncate, paginate |
| 7 | Missing `additionalProperties: false` | Agent invents parameters silently | Add to all input schemas |
| 8 | Inconsistent parameters across tools | Agent assumes capabilities that don't exist | Audit sibling tools for symmetry |
| 9 | Ambiguous coordinate systems | Off-by-factor-N positioning errors | Declare coordinate space in three places |
| 10 | No `outputSchema` | Agent must parse untyped JSON | Always declare output structure |
| 11 | No `instructions` in `initialize` | Agent lacks server-level context | Write LLM-targeted usage guidance |
| 12 | Processing requests before `initialized` | Protocol violation, undefined behavior | Gate on lifecycle state |
| 13 | Duplicate semantic identifiers | Token/key collisions in output | Specify deduplication rules |
| 14 | `structuredContent` in error responses | Schema validation failure on client | Error responses use `content` only |
| 15 | Dynamic data in tool descriptions | Cache invalidation, increased costs | Keep descriptions static |
Use this checklist when implementing or reviewing an MCP server. Every item maps to a section in this guide.
- Server responds to `initialize` with `protocolVersion`, `capabilities`, `serverInfo`, `instructions`
- Server gates tool processing on `initialized` notification
- Server responds to `ping`
- `tools/list` returns all tools with `name`, `description`, `inputSchema`
- `tools/call` returns both `content` and `structuredContent`
- Tool names comply with SEP-986 character/length rules
- JSON Schema defaults to 2020-12 dialect
- Input validation errors are Tool Execution Errors, not Protocol Errors
- Error responses do not include `structuredContent`
- Unknown tool names return Protocol Error -32602
- Each tool delivers a complete outcome (no mandatory chaining)
- All parameters are top-level primitives or constrained types
- All finite-set parameters use `enum`
- `additionalProperties: false` on input schemas (with documented exceptions)
- Tool descriptions specify when to use, what's returned, how to format arguments
- `outputSchema` declared for every tool
- `annotations` set accurately (`readOnlyHint`, `idempotentHint`, etc.)
- Consistent parameters across sibling tools
- Every error message is actionable (what happened, what's valid, what to do)
- File/path errors include format requirements
- Parameter errors include valid ranges and defaults
- Partial failures include what succeeded and what didn't
- Response sizes are bounded (`max_results`, `max_components`, or equivalent)
- Truncation flags (`truncated: true`, `total_available: N`) are included when applicable
- Text content blocks are summarized when full serialization exceeds ~20K characters
- List operations support pagination metadata
- MCP Inspector handshake passes
- All tools callable with valid arguments
- All tools return isError for invalid arguments
- Unit tests cover core logic modules
- Integration tests cover full protocol flow
- Synthetic test fixtures with known expected outputs
- Edge cases: empty input, minimum size, maximum size, corrupt input
- All inputs validated server-side
- No secrets in tool results or error messages
- File paths resolved and checked for existence/permissions/type
- Tools separated by risk level (read vs. write)
| Resource | URL |
|---|---|
| MCP Specification (2025-11-25) | https://modelcontextprotocol.io/specification/2025-11-25 |
| MCP Tools Spec | https://modelcontextprotocol.io/specification/2025-11-25/server/tools |
| MCP Lifecycle Spec | https://modelcontextprotocol.io/specification/2025-11-25/basic/lifecycle |
| MCP Changelog (2025-11-25) | https://modelcontextprotocol.io/specification/2025-11-25/changelog |
| MCP Schema Reference | https://modelcontextprotocol.io/specification/2025-11-25/schema |
| MCP Security Best Practices | https://modelcontextprotocol.io/specification/2025-11-25/basic/security_best_practices |
| Block's MCP Playbook | https://engineering.block.xyz/blog/blocks-playbook-for-designing-mcp-servers |
| Docker MCP Best Practices | https://www.docker.com/blog/mcp-server-best-practices/ |
| Philschmid's MCP Guide | https://www.philschmid.de/mcp-best-practices |
| The New Stack: 15 Best Practices | https://thenewstack.io/15-best-practices-for-building-mcp-servers-in-production/ |
| Term | Definition |
|---|---|
| Tool Execution Error | An error returned within the normal CallToolResult envelope with isError: true. Visible to the LLM for self-correction. |
| Protocol Error | A JSON-RPC error object at the transport level (error.code). May not reach the LLM. |
| structuredContent | Typed JSON object in tool results, validated against outputSchema. The machine-readable response. |
| content | Array of TextContent, ImageContent, or EmbeddedResource blocks. The backward-compatible response. |
| annotations | Behavioral metadata on tools (readOnlyHint, destructiveHint, etc.) used by hosts for permission decisions. |
| instructions | LLM-targeted text in the initialize response, injected into the system prompt by most hosts. |
| SEP | Specification Enhancement Proposal — the formal process for changes to the MCP specification. |
| stdio transport | Communication via stdin/stdout. Server is launched as a child process by the host. Maximum compatibility. |
| Streamable HTTP | HTTP-based transport supporting long-running connections and incremental results. Replaces deprecated SSE. |