Write your visual rules. Generate art. Judge every image against those rules. Ship the results as versioned, auditable training data — then put trained models to work in real production workflows and feed the best outputs back into your corpus.
Style Dataset Lab connects the thing you wrote down about your art style to the dataset you actually train from, and then closes the loop all the way through production. You define a constitution — silhouette rules, palette constraints, material language, whatever matters for your project. The pipeline generates candidates, scores them against those rules, and packages the approved work into reproducible datasets where every record explains why it was included.
Then the production workbench takes over: compile generation briefs from project truth, run them through ComfyUI, critique the outputs, batch-produce expression sheets and environment boards, select the best results, and re-ingest them as new candidates. The loop closes: produce, select, review, strengthen.
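To make the idea concrete, a single constitution rule might look like the sketch below. The field names here are illustrative only, not the shipped `constitution.json` schema:

```json
{
  "rule_id": "silhouette-001",
  "lane": "character",
  "statement": "Primary silhouette must read clearly at gameplay scale.",
  "rationale_template": "Approved: silhouette stays legible when scaled down.",
  "forbidden": ["ambiguous limb masses", "palette bleed across factions"]
}
```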
```bash
# Write your canon. Scaffold the project.
sdlab init my-project --domain character-design

# Generate candidates via ComfyUI, then review them
sdlab generate inputs/prompts/wave1.json --project my-project
sdlab curate <id> approved "Strong silhouette, correct faction palette"

# Bind approved work to constitution rules
# (`sdlab bind` is a shorter alias for `canon-bind`)
sdlab canon-bind --project my-project

# Freeze a versioned dataset
sdlab snapshot create --project my-project
sdlab split build
sdlab export build

# Build a training package
sdlab training-manifest create --profile character-style-lora
sdlab training-package build

# Compile a production brief and run it
sdlab brief compile --workflow character-portrait-set --subject kael_maren
sdlab run generate --brief brief_2026-04-16_001

# Critique, refine, batch-produce
sdlab critique --run run_2026-04-16_001
sdlab refine --run run_2026-04-16_001 --pick 001.png
sdlab batch generate --mode expression-sheet --subject kael_maren

# Select the best outputs and bring them back
sdlab select --run run_2026-04-16_001 --approve 001.png,003.png
sdlab reingest selected --selection selection_2026-04-16_001
```

That last command is the point. Selected outputs come back through the same review process as everything else. The corpus grows and the rules hold.
Seven dataset artifacts and a full production workbench. Each artifact links to its predecessors so you can trace any training record back to the rule that approved it.
| Artifact | What it is |
|---|---|
| Snapshot | Frozen record selection with config fingerprint. Every inclusion has an explicit reason. |
| Split | Train/val/test partition where subject families never cross boundaries. |
| Export package | Self-contained dataset: manifest, metadata, images, splits, dataset card, checksums. |
| Eval pack | Canon-aware test tasks: lane coverage, forbidden drift, anchor/gold, subject continuity. |
| Training package | Trainer-ready layout via adapters (diffusers-lora, generic-image-caption). Same truth, different format. |
| Eval scorecard | Per-task pass/fail from scoring generated outputs against eval packs. |
| Implementation pack | Prompt examples, known failures, continuity tests, and re-ingest guidance. |
The production workbench adds:
| Surface | What it does |
|---|---|
| Compiled brief | Deterministic generation instruction from workflow profile + project truth. |
| Run | Frozen execution artifact: brief + seeds + ComfyUI outputs + manifest. |
| Critique | Structured multi-dimension evaluation of run outputs against canon. |
| Batch | Coordinated multi-slot production (expression sheets, environment boards, silhouette packs). |
| Selection | Creative decision artifact: which outputs were chosen, why, and where they came from. |
| Re-ingest | Selected outputs return as candidate records with full generation provenance. |
Training data is the highest-leverage artifact in any visual AI pipeline. But most training data is a folder of images with no history, no judgment trail, and no connection to the style rules it was supposed to follow.
Style Dataset Lab makes the connection explicit. Your constitution defines the rules. Your rubric defines the scoring dimensions. Your curation records the judgment. Your canon binding proves the connection. And your dataset carries all of that forward as structured, queryable, reproducible truth.
The practical result: when your LoRA drifts, you can ask why. When your next training round needs better data, you know exactly which records are near-misses and what single rule they failed. When a new team member asks what the project's visual language is, the answer isn't a Figma board — it's a searchable constitution with 1,182 graded examples.
Not placeholder templates. Each domain ships with production-grade constitution rules, lane definitions, scoring rubrics, and group vocabulary.
| Domain | Lanes | What gets judged |
|---|---|---|
| game-art | character, environment, prop, UI, ship, interior, equipment | Silhouette at gameplay scale, faction read, wear and aging |
| character-design | portrait, full_body, turnaround, expression_sheet, action_pose | Proportions, costume logic, personality, gesture clarity |
| creature-design | concept, orthographic, detail_study, action, scale_reference, habitat | Anatomy, evolutionary logic, silhouette distinction |
| architecture | exterior, interior, streetscape, structural_detail, ruin, landscape | Structure, material consistency, perspective, era coherence |
| vehicle-mech | exterior, cockpit, component, schematic, silhouette_sheet, damage_variant | Mechanical logic, design language, access points, damage narrative |
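As a rough illustration of how a lane pairs a name with detection patterns, a lane entry might look like this (hypothetical fields, not the shipped `lanes.json` schema):

```json
{
  "lane": "expression_sheet",
  "description": "Grid of facial expressions for a single subject",
  "detection_patterns": ["expression[-_ ]sheet", "emote grid", "face study"]
}
```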
Each project is self-contained. Five JSON config files define the rules; everything else is data.
```
projects/my-project/
  project.json       Identity + generation defaults
  constitution.json  Rules with rationale templates
  lanes.json         Subject lanes with detection patterns
  rubric.json        Scoring dimensions + thresholds
  terminology.json   Group vocabulary + detection order
  records/           Per-asset JSON (provenance + judgment + canon)
  snapshots/         Frozen dataset snapshots
  splits/            Train/val/test partitions
  exports/           Versioned export packages
  training/          Profiles, manifests, packages, eval runs, implementations
  workflows/         Workflow profiles + batch mode definitions
  briefs/            Compiled generation briefs
  runs/              Execution artifacts (brief + outputs + manifest)
  batches/           Coordinated multi-slot productions
  selections/        Chosen outputs with reasons and provenance
  inbox/generated/   Re-ingested images awaiting review
```
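A per-asset record under `records/` carries provenance, judgment, and canon binding together. The shape below is a sketch for orientation, not the exact record schema:

```json
{
  "id": "rec_000123",
  "image": "images/kael_maren_0042.png",
  "provenance": { "run": "run_2026-04-16_001", "seed": 814021 },
  "judgment": { "status": "approved", "reason": "Strong silhouette, correct faction palette" },
  "canon": { "rules": ["silhouette-001", "palette-faction-03"] }
}
```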
These are not aspirational. They are enforced.
- Snapshots are immutable. Config fingerprint (SHA-256) proves nothing changed.
- Splits prevent leakage. Subject families (by identity, lineage, or ID suffix) never cross partition boundaries.
- Manifests are frozen contracts. Export hash + config fingerprint. If anything changes, create a new one.
- Adapters cannot mutate truth. Different layout, same records. No additions, no removals, no reclassification.
- Generated outputs re-enter through review. No bypass. Curate and bind like everything else.
The repo includes a complete working example: 1,182 records, 5 factions, 7 lanes, 24 constitution rules, 892 approved assets, 2 training profiles. A gritty sci-fi RPG visual canon, fully curated.
```bash
git clone https://github.com/mcp-tool-shop-org/style-dataset-lab
cd style-dataset-lab

sdlab project doctor --project star-freight
sdlab snapshot create --project star-freight   # 839 eligible records
sdlab split build --project star-freight       # zero subject leakage
```

sdlab owns the dataset. Format conversion is handled by repo-dataset: TRL, LLaVA, Qwen2-VL, JSONL, Parquet, and more. repo-dataset renders; it never decides inclusion.
```bash
npm install -g @mcptoolshop/style-dataset-lab
```

Requires Node.js 20+ and ComfyUI on localhost:8188 for generation.
You can explore the full non-generation surface — inspection, curation, snapshot, split, export — using the bundled Star Freight project without installing ComfyUI or downloading any SDXL weights.
```bash
# Scaffold a fresh project (no ComfyUI needed)
sdlab init test --domain game-art

# Run the canonical health check (no ComfyUI needed)
sdlab project doctor --project test

# Dry-run a snapshot against the bundled Star Freight corpus
sdlab snapshot create --dry-run --project star-freight
```

`sdlab project doctor` validates every project config (constitution, lanes, rubric, terminology) and reports eligibility without touching the GPU. Any command that mutates generated state accepts `--dry-run` to preview the effect first.
If you forget `--project`, the CLI falls back to the first project it finds under `projects/` and prints a warning — pass `--project` explicitly to silence it.
Long generation runs can be resumed without redoing completed work:
```bash
# Skip subjects whose record + image are already on disk.
# Seeds are preserved — resumed runs are bit-identical to fresh ones.
sdlab generate inputs/prompts/wave1.json --project my-project --resume

# Re-run only failed/missing slots in an existing batch.
# Inherits mode/subject/theme from the prior manifest.
sdlab batch generate --resume batch_2026-04-22_001 --project my-project
```

Both commands work because every slot writes its manifest entry atomically as it finishes — a crash mid-run never corrupts the partial state.
Common failure modes and fixes:
`ECONNREFUSED 127.0.0.1:8188` on any `sdlab generate` / `sdlab run generate` / `sdlab batch generate`
ComfyUI isn't running. Start ComfyUI (`python main.py --listen 127.0.0.1 --port 8188`) and confirm with `curl http://127.0.0.1:8188/system_stats`. To point at a different host/port, set `COMFY_URL=http://host:port`.
`missing checkpoint` / `LoRA weight not found`
Your workflow profile names a model file that isn't in ComfyUI's `models/checkpoints/` or `models/loras/` folder. Open `projects/<project>/workflows/profiles/<profile>.json`, locate the `checkpoint` or `lora` field, and either download the referenced weight or swap it for one you already have. Re-run `sdlab project doctor --project <project>` to confirm the fix.
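For orientation, the relevant part of a workflow profile might look like the fragment below. The field names follow the error description above; treat the exact shape as illustrative:

```json
{
  "name": "character-portrait-set",
  "checkpoint": "sdxl_base_1.0.safetensors",
  "lora": "character-style.safetensors"
}
```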
`sdlab project doctor` errors
Doctor returns structured error codes. Common ones:
- `E_PROJECT_NOT_FOUND` — the project directory doesn't exist under `projects/`. Check spelling.
- `E_CONFIG_INVALID` — one of the five JSON config files failed schema validation. The `hint` field names the bad file and field.
- `E_RECORD_DRIFT` — a record's config fingerprint no longer matches its source. Re-curate or re-bind as the hint suggests.
`No --project specified, falling back to <name>`
A soft warning. Pass `--project <name>` explicitly to select the right project and silence the warning.
Painterly / VRAM out-of-memory issues
See `docs/internal/HANDOFF.md` for the painterly denoise tuning notes. In short: lower the denoise strength, reduce batch size, or switch to a smaller checkpoint in your workflow profile.
Reporting bugs
File an issue at https://github.com/mcp-tool-shop-org/style-dataset-lab/issues with your sdlab version (`sdlab --version`), Node version (`node -v`), the full command, and the structured error output. A bug-report template prefills the fields.
Local-only. No telemetry, no analytics, no external requests. Images stay on your GPU and filesystem.
MIT
Built by MCP Tool Shop
