Skip to content

Security: PhoenixForrestLin/skill-shelf

Security

docs/security.md

Security Architecture


Design Principles

Three lines of defense: Declarative Permissions + Runtime Sandbox + User Confirmation

πŸ”

Security checks are a built-in pipeline, not an optional Skill. If security were optional, users could skip it, or a malicious payload could write "ignore security checks" in its systemPrompt. The security layer MUST be non-bypassable.


Core Positioning: Translation, Not Audit

Community security tools (mcp-scan, mcp-sec-audit, mcpmarket) all assume the user is a developer or security professional. Their output is technical reports, CVE numbers, and risk scores.

Our users are ordinary people scanning a QR code at a dumpling shop. They won't read an A-F risk report.

Dimension Community Tools skill-shelf
Detection pattern library (prompt injection / dangerous permissions) βœ… Mature Reuse directly β€” don't reinvent
Plain-language risk explanation for end users ❌ Does not exist Core differentiator
Non-bypassable pre-install interception ❌ All optional tools Built into install flow
Semantic judgment: declared capability vs. expected scenario ❌ Rule matching only LLM-native advantage

Key insight: Risk = the gap between declared capabilities and reasonable expectations. A WiFi Skill requesting SHELL access is far more dangerous than a dev tool doing the same. Audit tools don't make this judgment β€” LLMs can.


7-Step Install Pipeline

graph LR
    P1["parse_payload"] --> P2["validate_schema"] --> P3["static_audit"] --> P4["capability_analysis"] --> P5["generate_summary"] --> P6["user_confirm"] --> P7["install"]
Loading
Step What It Does How
parse_payload Parse QR content β†’ fetch manifest JSON Deterministic
validate_schema Validate manifest format ($schema, required fields) JSON Schema
static_audit Detect prompt injection patterns, scan dangerous permissions Reuse mcp-scan / community engines
capability_analysis LLM judges: do capabilities match the scenario? Cross-validate systemPrompt vs permissions LLM reasoning
generate_summary Generate plain-language security summary with risk highlights LLM generation
user_confirm Block for user confirmation (Install / View Details / Cancel) Client UI
install Register MCP connection, start TTL timer Reuse existing capability

Runtime Sandbox

Two Interaction Modes

Dimension sandboxed (default) agent-assisted
Context Fully isolated β€” cannot see user's other conversations/tools/data Can access host conversation context (constrained by permissions)
Trust requirement Low β€” isolation = security High β€” requires platform verification + explicit user authorization
Call interception Only exposes the mini-skill's own declared MCP tools; blocks all host agent capabilities Selectively passes through per permissions declaration

Core guarantee: enforcement is runtime interception, not declarations. skill-shelf as middleware intercepts all tool call requests from mini-skills.

Call Proxy Mechanism

Host Agent β†’ skill-shelf.invoke(skillId, toolName, args) β†’ skill-shelf permission check β†’ mini-skill MCP Server

skill-shelf never exposes mini-skill MCP connections directly to the host agent.


Trust Tier System

Inspired by browser lock icon:

Tier Condition UI Install Behavior
🟑 Unverified signed: false Β· publisher: "unverified" Yellow shield + "Unverified source β€” only use in locations you trust" Mandatory user confirmation + sandbox
🟒 Verified signed: true · publisher: "verified" Green shield + "Verified by OpenClaw platform" Optional auto-install (user-configurable)
πŸ”΅ Audited audit field present Blue shield + audit report link Highest trust tier

User analogy: "This is like scanning a QR code to open a stranger's mini-program on WeChat" β€” helps ordinary users build a mental model quickly.


User-Facing Security Summary

Template output from generate_summary:

πŸ₯Ÿ About to install: Wang's Dumpling Shop Assistant

Source: Downstairs dumpling shop (mcp.jiaozi.local) · 🟑 Not platform-verified

What this Skill can do: πŸ“‹ View today's menu ⭐ Get the owner's recommended dishes πŸ“Ά Query shop WiFi password

Security check results: βœ… Won't access your files βœ… Won't read your chat history or work data βœ… Only communicates with the dumpling shop's own server ℹ️ This Skill is self-deployed by the merchant, not platform-verified

Retention: This session only (auto-removed when you close the chat)

[Install]γ€€[View Details]γ€€[Cancel]

Copy Principles

  • Emoji for quick scanning (βœ… safe Β· ⚠️ note Β· ❌ risk)
  • No technical jargon (no JSON, schema, MCP)
  • Risk framed as "expectation gap": "This is unusual for a restaurant assistant"

Three-Layer Detail Architecture

Level Audience Content
L1 β€” Security Summary Ordinary users Plain-language capability description + risk interpretation
L2 β€” Technical Details Curious users Tool list + permission declarations + source URL + systemPrompt + behavior summary
L3 β€” Raw Manifest Developers Complete JSON manifest

systemPrompt Security Handling

Compromise approach:

  • systemPrompt is included in the payload (preserves flexibility)
  • Agent does NOT silently inject it. Instead:
    • Default: show LLM-generated one-line behavior summary: "This Skill will respond as a 'dumpling shop intelligent assistant'"
    • Raw prompt only shown at "Technical Details" level
  • capability_analysis step adds systemPrompt vs permissions cross-validation β€” detect if the prompt contains instructions contradicting declared permissions
  • inline + runtime: "prompt" MUST display the full prompt content to the user; silent injection is not allowed

Community Reference

Project Relevance Notes
mcp-scan (Invariant Labs) Medium Detects prompt injection / tool poisoning β€” reusable as static_audit engine
mcp-sec-audit (CSA) Low-Medium Static pattern matching + Docker/eBPF sandbox fuzzing β€” academic oriented
mcpmarket Security Audit Medium 22 prompt injection patterns, A-F scoring
Claude Code Skills Medium Permission model reference β€” scoped execution, trust gradient
OWASP MCP Top 10 High Scope Creep (#2 risk) β€” session-level auth, auto-revoke
NVIDIA Sandbox Guide Medium Layered control β€” deny list > workspace allow > whitelist > default deny

There aren’t any published security advisories