Three lines of defense: Declarative Permissions + Runtime Sandbox + User Confirmation
Security checks are a built-in pipeline, not an optional Skill. If security were optional, users could skip it, or a malicious payload could write "ignore security checks" in its systemPrompt. The security layer MUST be non-bypassable.
Community security tools (mcp-scan, mcp-sec-audit, mcpmarket) all assume the user is a developer or security professional. Their output is technical reports, CVE numbers, and risk scores.
Our users are ordinary people scanning a QR code at a dumpling shop. They won't read an A-F risk report.
| Dimension | Community Tools | skill-shelf |
|---|---|---|
| Detection pattern library (prompt injection / dangerous permissions) | ✅ Mature | Reuse directly; don't reinvent |
| Plain-language risk explanation for end users | ❌ Does not exist | Core differentiator |
| Non-bypassable pre-install interception | ❌ All optional tools | Built into install flow |
| Semantic judgment: declared capability vs. expected scenario | ❌ Rule matching only | LLM-native advantage |
Key insight: Risk = the gap between declared capabilities and reasonable expectations. A WiFi Skill requesting SHELL access is far more dangerous than a dev tool doing the same. Audit tools don't make this judgment; LLMs can.
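
The expectation gap can be sketched as a set difference between what a Skill declares and what its scenario would reasonably need. This is only an illustration: the scenario names, capability strings, and the `EXPECTED_CAPABILITIES` table are hypothetical, and in the design above the baseline comes from LLM judgment rather than a static table.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical per-scenario baselines. In the design above the LLM makes this
# judgment; the static table is only a stand-in for illustration.
EXPECTED_CAPABILITIES: dict[str, frozenset[str]] = {
    "restaurant": frozenset({"network:own-server", "menu:read", "wifi:read"}),
    "dev-tool": frozenset({"network:own-server", "shell", "fs:read", "fs:write"}),
}


@dataclass(frozen=True)
class GapReport:
    scenario: str
    unexpected: frozenset[str]  # declared capabilities outside the baseline

    @property
    def risky(self) -> bool:
        return bool(self.unexpected)


def expectation_gap(scenario: str, declared: set[str]) -> GapReport:
    # An unknown scenario has no baseline, so every declared capability is unexpected.
    expected = EXPECTED_CAPABILITIES.get(scenario, frozenset())
    return GapReport(scenario, frozenset(declared) - expected)


wifi_skill = expectation_gap("restaurant", {"wifi:read", "shell"})
dev_tool = expectation_gap("dev-tool", {"shell", "fs:read"})
print(sorted(wifi_skill.unexpected), wifi_skill.risky)  # ['shell'] True
print(sorted(dev_tool.unexpected), dev_tool.risky)      # [] False
```

The same `shell` declaration is flagged for the restaurant Skill and passes for the dev tool, which is the distinction that rule-only matching misses.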
graph LR
P1["parse_payload"] --> P2["validate_schema"] --> P3["static_audit"] --> P4["capability_analysis"] --> P5["generate_summary"] --> P6["user_confirm"] --> P7["install"]
| Step | What It Does | How |
|---|---|---|
| `parse_payload` | Parse QR content → fetch manifest JSON | Deterministic |
| `validate_schema` | Validate manifest format ($schema, required fields) | JSON Schema |
| `static_audit` | Detect prompt injection patterns, scan dangerous permissions | Reuse mcp-scan / community engines |
| `capability_analysis` | LLM judges: do capabilities match the scenario? Cross-validate systemPrompt vs permissions | LLM reasoning |
| `generate_summary` | Generate plain-language security summary with risk highlights | LLM generation |
| `user_confirm` | Block for user confirmation (Install / View Details / Cancel) | Client UI |
| `install` | Register MCP connection, start TTL timer | Reuse existing capability |
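
One way to make the steps above non-bypassable is to hard-code their order and let any step abort the install by raising. The sketch below assumes exactly that; `InstallBlocked`, the `placeholder` steps, and the required fields `name` and `tools` are hypothetical stand-ins, not the real manifest schema or audit engines.

```python
from __future__ import annotations
from typing import Callable

Context = dict
Step = Callable[[Context], Context]


class InstallBlocked(Exception):
    """Raised by any step that refuses to let the install continue."""


def validate_schema(ctx: Context) -> Context:
    # "name" and "tools" are assumed required fields, not the real schema.
    missing = [f for f in ("name", "tools") if f not in ctx["manifest"]]
    if missing:
        raise InstallBlocked(f"manifest missing fields: {missing}")
    return ctx


def user_confirm(ctx: Context) -> Context:
    if ctx.get("user_choice") != "install":
        raise InstallBlocked("user did not confirm")
    return ctx


def placeholder(ctx: Context) -> Context:
    return ctx  # stands in for steps that need network, audit engines, or an LLM


PIPELINE: list[tuple[str, Step]] = [
    ("parse_payload", placeholder),
    ("validate_schema", validate_schema),
    ("static_audit", placeholder),
    ("capability_analysis", placeholder),
    ("generate_summary", placeholder),
    ("user_confirm", user_confirm),
    ("install", placeholder),
]


def run_pipeline(ctx: Context) -> Context:
    # The step list is fixed in code; nothing in the manifest can reorder or drop steps.
    ctx["completed"] = []
    for name, step in PIPELINE:
        ctx = step(ctx)
        ctx["completed"].append(name)
    return ctx
```

Calling `run_pipeline` with `user_choice` set to anything other than `"install"` raises `InstallBlocked` before the `install` step runs, so a payload has no way to reach installation without passing through every earlier check.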
| Dimension | sandboxed (default) | agent-assisted |
|---|---|---|
| Context | Fully isolated: cannot see user's other conversations/tools/data | Can access host conversation context (constrained by permissions) |
| Trust requirement | Low: isolation = security | High: requires platform verification + explicit user authorization |
| Call interception | Only exposes the mini-skill's own declared MCP tools; blocks all host agent capabilities | Selectively passes through per permissions declaration |
Core guarantee: enforcement is runtime interception, not declarations. skill-shelf as middleware intercepts all tool call requests from mini-skills.
Host Agent → skill-shelf.invoke(skillId, toolName, args) → skill-shelf permission check → mini-skill MCP Server
skill-shelf never exposes mini-skill MCP connections directly to the host agent.
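
The call path above can be sketched as a small gateway that holds the mini-skill connection privately and checks every call against the declared tool list. This is a minimal illustration of the sandboxed mode only; the class name, `register`, `PermissionDenied`, and the `handler` callable standing in for the MCP connection are assumptions, not an existing API.

```python
from __future__ import annotations
from typing import Any, Callable

Handler = Callable[[str, dict], Any]


class PermissionDenied(Exception):
    """Raised when a call targets a skill or tool that was never declared."""


class SkillShelf:
    def __init__(self) -> None:
        # skill_id -> (declared tool names, handler standing in for the MCP connection)
        self._skills: dict[str, tuple[frozenset[str], Handler]] = {}

    def register(self, skill_id: str, declared_tools: list[str], handler: Handler) -> None:
        self._skills[skill_id] = (frozenset(declared_tools), handler)

    def invoke(self, skill_id: str, tool_name: str, args: dict) -> Any:
        if skill_id not in self._skills:
            raise PermissionDenied(f"unknown skill: {skill_id}")
        declared, handler = self._skills[skill_id]
        # Enforcement happens here at call time, not when the manifest is read.
        if tool_name not in declared:
            raise PermissionDenied(f"{skill_id} did not declare tool {tool_name!r}")
        return handler(tool_name, args)


shelf = SkillShelf()
shelf.register("dumpling-shop", ["get_menu"], lambda tool, args: {"tool": tool, "items": ["pork", "leek"]})
print(shelf.invoke("dumpling-shop", "get_menu", {}))  # {'tool': 'get_menu', 'items': ['pork', 'leek']}
```

Because the host agent only ever holds a `SkillShelf` reference, a call such as `shelf.invoke("dumpling-shop", "shell", {})` fails with `PermissionDenied` regardless of what the Skill's prompt asks for.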
Inspired by the browser lock icon:
| Tier | Condition | UI | Install Behavior |
|---|---|---|---|
| 🟡 Unverified | `signed: false` · `publisher: "unverified"` | Yellow shield + "Unverified source. Only use in locations you trust" | Mandatory user confirmation + sandbox |
| 🟢 Verified | `signed: true` · `publisher: "verified"` | Green shield + "Verified by OpenClaw platform" | Optional auto-install (user-configurable) |
| 🔵 Audited | `audit` field present | Blue shield + audit report link | Highest trust tier |
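
The tier conditions can be read as a small classification function over manifest fields. The sketch below assumes the manifest is a plain dict and that an `audit` field outranks the other two conditions; the table does not say whether an audited Skill must also be signed, so that precedence is an assumption.

```python
from enum import Enum


class Tier(Enum):
    UNVERIFIED = "unverified"
    VERIFIED = "verified"
    AUDITED = "audited"


def trust_tier(manifest: dict) -> Tier:
    # Assumption: a present audit field wins, since Audited is the highest tier.
    if manifest.get("audit") is not None:
        return Tier.AUDITED
    if manifest.get("signed") is True and manifest.get("publisher") == "verified":
        return Tier.VERIFIED
    # Anything else, including a missing or malformed field, falls to the lowest tier.
    return Tier.UNVERIFIED


print(trust_tier({"signed": False, "publisher": "unverified"}))  # Tier.UNVERIFIED
print(trust_tier({"signed": True, "publisher": "verified"}))     # Tier.VERIFIED
print(trust_tier({"signed": True, "publisher": "unverified"}))   # Tier.UNVERIFIED
```

Defaulting to `UNVERIFIED` means a manifest that omits or misstates a field gets mandatory confirmation and the sandbox rather than a more trusted path.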
User analogy: "This is like scanning a QR code to open a stranger's mini-program on WeChat" β helps ordinary users build a mental model quickly.
Template output from generate_summary:
📥 About to install: Wang's Dumpling Shop Assistant
Source: Downstairs dumpling shop (mcp.jiaozi.local) · 🟡 Not platform-verified

What this Skill can do:
- 📋 View today's menu
- ⭐ Get the owner's recommended dishes
- 📶 Query shop WiFi password

Security check results:
- ✅ Won't access your files
- ✅ Won't read your chat history or work data
- ✅ Only communicates with the dumpling shop's own server
- ℹ️ This Skill is self-deployed by the merchant, not platform-verified

Retention: This session only (auto-removed when you close the chat)

[Install] [View Details] [Cancel]
- Emoji for quick scanning (✅ safe · ⚠️ note · ❌ risk)
- No technical jargon (no JSON, schema, MCP)
- Risk framed as "expectation gap": "This is unusual for a restaurant assistant"
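
These presentation rules can be sketched as a renderer that takes already-analyzed findings and emits the plain-language card. In the pipeline the wording itself comes from the LLM; the function name, argument shapes, and the example finding below are hypothetical and only show the layout and the severity-to-emoji mapping.

```python
from __future__ import annotations

# Severity levels mapped to the scanning emoji from the list above.
ICONS = {"safe": "✅", "note": "⚠️", "risk": "❌"}


def render_summary(
    name: str,
    source: str,
    capabilities: list[str],
    findings: list[tuple[str, str]],  # (severity, plain-language sentence)
    retention: str,
) -> str:
    lines = [f"About to install: {name}", f"Source: {source}", "", "What this Skill can do:"]
    lines += [f"- {capability}" for capability in capabilities]
    lines += ["", "Security check results:"]
    lines += [f"{ICONS[severity]} {text}" for severity, text in findings]
    lines += ["", f"Retention: {retention}", "[Install] [View Details] [Cancel]"]
    return "\n".join(lines)


card = render_summary(
    name="Wang's Dumpling Shop Assistant",
    source="Downstairs dumpling shop",
    capabilities=["View today's menu", "Query shop WiFi password"],
    findings=[
        ("safe", "Won't access your files"),
        ("risk", "Wants to run commands on your device. This is unusual for a restaurant assistant"),
    ],
    retention="This session only",
)
print(card)
```

The risk line is phrased as an expectation gap rather than a permission name, so the user sees why it matters without needing to know what a shell is.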
| Level | Audience | Content |
|---|---|---|
| L1: Security Summary | Ordinary users | Plain-language capability description + risk interpretation |
| L2: Technical Details | Curious users | Tool list + permission declarations + source URL + systemPrompt + behavior summary |
| L3: Raw Manifest | Developers | Complete JSON manifest |
Compromise approach:
- systemPrompt is included in the payload (preserves flexibility)
- Agent does NOT silently inject it. Instead:
  - Default: show an LLM-generated one-line behavior summary: "This Skill will respond as a 'dumpling shop intelligent assistant'"
  - Raw prompt only shown at the "Technical Details" level
- The `capability_analysis` step adds systemPrompt vs permissions cross-validation: detect whether the prompt contains instructions contradicting the declared permissions
- `inline` + `runtime: "prompt"` MUST display the full prompt content to the user; silent injection is not allowed
| Project | Relevance | Notes |
|---|---|---|
| mcp-scan (Invariant Labs) | Medium | Detects prompt injection / tool poisoning; reusable as static_audit engine |
| mcp-sec-audit (CSA) | Low-Medium | Static pattern matching + Docker/eBPF sandbox fuzzing; academically oriented |
| mcpmarket Security Audit | Medium | 22 prompt injection patterns, A-F scoring |
| Claude Code Skills | Medium | Permission model reference: scoped execution, trust gradient |
| OWASP MCP Top 10 | High | Scope Creep (#2 risk): session-level auth, auto-revoke |
| NVIDIA Sandbox Guide | Medium | Layered control: deny list > workspace allow > whitelist > default deny |