Testable world-model workflows for physical-AI systems.
WorldForge is a Python integration layer that gives world-model providers, score models, embodied policies, and media generators explicit capability contracts. It adds planning, evaluation, benchmarks, diagnostics, local state, and CLI tools while keeping checkpoints, credentials, robot controllers, and deployment host-owned.
Quickstart · CLI · Providers · Rerun · Capability Model · Architecture · Quality · Docs · Playbooks · Support · Security
WorldForge's front-door robotics demo composes a Hugging Face LeRobot policy with a LeWorldModel checkpoint. LeRobot proposes PushT action candidates, WorldForge bridges those policy actions into LeWorldModel-native candidate tensors, LeWorldModel scores the candidates, and WorldForge selects and mock-replays the lowest-cost action chunk.
The LeWorldModel runtime path intentionally follows the official LeWM loading contract:

- `stable_worldmodel.policy.AutoCostModel("pusht/lewm")` loads the Lucas Maes LeWM object checkpoint.
- `stable-worldmodel` is the runtime/evaluation library used by the official LeWorldModel repo, not a substitute score model.
This is simulation/replay planning. It demonstrates policy inference, score-model inference, typed provider composition, candidate ranking, event capture, and visual replay. Hardware control, safety checks, robot-controller integration, and task-specific preprocessing stay host-owned.
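The selection step described above can be sketched in plain Python. This is an illustration of "pick the lowest-cost action chunk", not the actual WorldForge planner API; the function and data names are hypothetical.

```python
"""Sketch of the showcase's selection step: given score-model costs for
candidate action chunks, return the lowest-cost chunk. Illustrative only."""


def select_lowest_cost(
    candidates: list[list[tuple[float, float]]],
    costs: list[float],
) -> list[tuple[float, float]]:
    """Return the candidate action chunk with the minimum score-model cost."""
    if len(candidates) != len(costs):
        raise ValueError("each candidate needs exactly one cost")
    best_index = min(range(len(costs)), key=costs.__getitem__)
    return candidates[best_index]


# Three hypothetical PushT action chunks with mock costs from a score model.
chunks = [[(0.1, 0.2)], [(0.3, 0.1)], [(0.0, 0.4)]]
print(select_lowest_cost(chunks, [2.5, 0.9, 1.7]))  # → [(0.3, 0.1)]
```

In the real showcase the costs come from the LeWorldModel checkpoint and the chunks from the LeRobot policy; the ranking logic is the same shape.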
- Pipeline: real policy, real score checkpoint, WorldForge planner, local mock replay.
- Decision: candidate ranking, robot-arm illustration, and fixed tabletop replay.
```
scripts/robotics-showcase
```

The command launches a staged Textual report by default, writes the same run data to `/tmp/worldforge-robotics-showcase/real-run.json`, and writes a visual Rerun recording to `/tmp/worldforge-robotics-showcase/real-run.rrd`. Press `o` in the TUI to open that recording in Rerun. Use `--tui-stage-delay 0.1` for a faster reveal, `--no-tui-animation` to skip sleeps and arm motion, `--no-tui` for the plain terminal report, `--no-rerun` to skip the Rerun artifact, `--json-only` for automation, or `--health-only` for a non-mutating dependency/checkpoint preflight. Use `--lewm-revision <tag-or-commit>` to pin auto-built LeWorldModel assets.
Read the walkthrough and implementation notes: Robotics Replay Showcase and Robotics Showcase Technical Deep Dive.
TheWorldHarness TUI - checkout-safe visual harness for worlds, providers, evals, benchmarks, and packaged flows
TheWorldHarness is the optional Textual workspace for inspecting WorldForge flows without installing
robotics or model runtimes. It runs checkout-safe demos, provider diagnostics, benchmark comparison,
world editing, and saved report previews through the harness extra.
```
uv run --extra harness worldforge-harness
uv run --extra harness worldforge-harness --flow lerobot
uv run --extra harness worldforge-harness --flow diagnostics
```

More detail: TheWorldHarness docs.
Rerun observability - optional recording layer for events, world snapshots, plans, and benchmark artifacts
WorldForge can stream sanitized provider events and run artifacts into Rerun without making Rerun a provider or base dependency.
```
uv run --extra rerun worldforge-demo-rerun
uv run --extra rerun rerun .worldforge/rerun/worldforge-rerun-showcase.rrd
scripts/robotics-showcase
uvx --from "rerun-sdk>=0.24,<0.32" rerun /tmp/worldforge-robotics-showcase/real-run.rrd
```

The checkout-safe Rerun demo records provider event logs, world snapshots, a predictive plan, 3D object boxes, and benchmark metrics into a local `.rrd` file. The robotics showcase records the real PushT policy+score run with candidate target points, selected trajectory, score bars, latency bars, provider events, plan payload, and replay snapshots. Use `--spawn`, `--connect-url`, or `--serve-grpc-port` for live viewer workflows in the checkout-safe demo.
More detail: Rerun integration docs.
A score model, a robot policy server, a video simulator, and a remote media API have different
inputs, runtimes, and failure modes. WorldForge does not flatten those differences. Each provider
adapter declares which of eight capabilities it supports (predict, score, policy, generate,
transfer, reason, embed, plan). The contract is strict and fail-closed: calling an
unsupported capability raises rather than quietly returning empty results.
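The fail-closed contract can be illustrated with a few lines of plain Python. The class and error names here are stand-ins, not WorldForge's actual types; the point is the shape of the rule: unknown or unadvertised capabilities raise instead of degrading silently.

```python
"""Minimal sketch of a fail-closed capability contract. Illustrative only:
`Adapter` and `CapabilityError` are hypothetical names, not WorldForge API."""

CAPABILITIES = frozenset(
    {"predict", "score", "policy", "generate", "transfer", "reason", "embed", "plan"}
)


class CapabilityError(RuntimeError):
    pass


class Adapter:
    def __init__(self, name: str, supports: set[str]):
        # Unknown capability names are rejected at registration time.
        unknown = supports - CAPABILITIES
        if unknown:
            raise CapabilityError(f"unknown capabilities: {sorted(unknown)}")
        self.name, self.supports = name, supports

    def invoke(self, capability: str, payload: dict) -> dict:
        # Fail closed: raise instead of quietly returning an empty result.
        if capability not in self.supports:
            raise CapabilityError(f"{self.name} does not support {capability!r}")
        return {"capability": capability, "provider": self.name, "payload": payload}


score_only = Adapter("leworldmodel-like", {"score"})
score_only.invoke("score", {"candidates": []})  # ok
# score_only.invoke("generate", {})             # raises CapabilityError
```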
Planning, evaluation, benchmarks, diagnostics, and persistence are built on top of that contract, not on any specific runtime. Benchmark budget files can turn success rate, error count, retry count, latency, and throughput thresholds into non-zero CLI gates for release checks or preserved benchmark claims.
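The budget-gate idea can be sketched as a pure function that compares measured metrics against thresholds and drives a non-zero exit code. The metric and budget keys below are hypothetical, not the documented budget-file schema; only the gating pattern is from the text above.

```python
"""Hypothetical sketch of a benchmark budget gate: breached thresholds
become human-readable failures and a non-zero process exit."""

import sys


def budget_violations(metrics: dict[str, float], budget: dict[str, float]) -> list[str]:
    """Return one message per breached threshold (empty list means the gate passes)."""
    failures = []
    if metrics["success_rate"] < budget.get("min_success_rate", 0.0):
        failures.append("success rate below budget")
    if metrics["p95_latency_ms"] > budget.get("max_p95_latency_ms", float("inf")):
        failures.append("p95 latency above budget")
    if metrics["error_count"] > budget.get("max_error_count", float("inf")):
        failures.append("too many errors")
    return failures


metrics = {"success_rate": 0.98, "p95_latency_ms": 180.0, "error_count": 0}
budget = {"min_success_rate": 0.95, "max_p95_latency_ms": 250.0, "max_error_count": 1}
problems = budget_violations(metrics, budget)
if problems:
    sys.exit(1)  # non-zero CLI gate for release checks
```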
WorldForge is not a hosted service, a model API abstraction, or a training framework. Optional runtimes, robot stacks, credentials, checkpoints, and durable storage remain the host application's responsibility.
| Feature | What it gives you |
|---|---|
| Capability contracts | Eight named capabilities. Adapters advertise only what they actually implement and return typed WorldForge results. Unknown names raise instead of behaving like empty filters. |
| Composable planning | Combine predictive, score, and policy providers in a single planning loop. Rank candidates, roll out futures, execute actions, persist state. |
| Deterministic by default | Built-in mock provider, reusable contract assertions (worldforge.testing), and packaged demos that run from a clean checkout without credentials or GPUs. |
| Host-owned runtimes | No torch, CUDA, robot controllers, or checkpoints in base dependencies. LeWorldModel, GR00T, LeRobot, Cosmos, and Runway integrate through their own surfaces. |
| Diagnostics | worldforge doctor, provider events, benchmark and evaluation harnesses, and an optional Textual TUI (TheWorldHarness) for inspecting traces. |
| Rerun observability | Optional rerun-sdk bridge for event streams, world snapshots, plans, and benchmark artifacts. |
| Quality gates | py.typed, import-isolated pytest, ruff, a 90% coverage floor, strict docs, and wheel + sdist contract tests in CI on Python 3.13. |
```
# From PyPI (recommended)
uv add worldforge-ai
# or
pip install worldforge-ai
```

The Python import path stays the same:

```
import worldforge
```

If you want the optional Textual harness UI:

```
uv add "worldforge-ai[harness]"
```

If you want Rerun-backed event and artifact recording:

```
uv add "worldforge-ai[rerun]"
```

To install straight from the repository:

```
uv add "worldforge-ai @ git+https://github.com/AbdelStark/worldforge"
```

To work from a clone:

```
git clone https://github.com/AbdelStark/worldforge.git
cd worldforge
uv sync --group dev
cp .env.example .env
```

Optional extras:

```
uv sync --group dev --extra harness  # TheWorldHarness Textual TUI
uv sync --group dev --extra rerun    # Rerun event and artifact recording
```

Python 3.13 only. Base install depends only on httpx. Optional runtimes are host-owned.
The short path is the mock provider: it runs from a clean checkout and exercises the same typed world, provider, planning, persistence, and diagnostics surfaces used by richer runtimes.
Full references: Python API · CLI reference · Examples index
Python API sample
```python
from worldforge import Action, BBox, Position, SceneObject, StructuredGoal, WorldForge

forge = WorldForge()
world = forge.create_world("kitchen", provider="mock")
world.add_object(
    SceneObject(
        "red_mug",
        Position(0.0, 0.8, 0.0),
        BBox(Position(-0.05, 0.75, -0.05), Position(0.05, 0.85, 0.05)),
    )
)

prediction = world.predict(Action.move_to(0.3, 0.8, 0.0), steps=2)
print(prediction.provider, prediction.physics_score)

plan = world.plan(
    goal_spec=StructuredGoal.object_at(
        object_name="red_mug",
        position=Position(0.3, 0.8, 0.0),
    )
)
print(plan.action_count, plan.success_probability)

doctor = forge.doctor()
print(doctor.healthy_provider_count, doctor.provider_count)
```

CLI sample
```
uv run worldforge examples                             # runnable scripts index
uv run worldforge doctor --registered-only             # active provider health
uv run worldforge world create lab --provider mock     # save a local world
uv run worldforge world add-object <world-id> cube --x 0 --y 0.5 --z 0  # edit scene state
uv run worldforge world predict <world-id> --object-id <object-id> --x 0.4 --y 0.5 --z 0
uv run worldforge world list                           # persisted worlds
uv run worldforge world objects <world-id>             # scene objects
uv run worldforge world history <world-id>             # object edits + predictions
uv run worldforge world export <world-id> --output world.json  # portable state JSON
uv run worldforge world delete <world-id>              # remove local JSON state
uv run worldforge provider list                        # registered providers
uv run worldforge provider info mock                   # capability surface
uv run worldforge predict kitchen --provider mock --x 0.3 --y 0.8 --z 0.0 --steps 2
uv run worldforge eval --suite planning --provider mock --format json
uv run worldforge benchmark --provider mock --iterations 5 --format json
uv run worldforge benchmark --provider mock --operation embed --input-file examples/benchmark-inputs.json
uv run worldforge benchmark --provider mock --operation generate --budget-file examples/benchmark-budget.json
```

Scene mutations append persisted history entries with typed action payloads. Position patches keep the object's bounding box translated with the pose so saved snapshots stay coherent.
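The bounding-box translation behavior can be sketched directly. The `Position` and `BBox` classes below are illustrative stand-ins for WorldForge's types, and `apply_position_patch` is a hypothetical helper showing the invariant, not library code.

```python
"""Sketch of the position-patch invariant: when an object's position
changes, translate its bounding box by the same delta so saved snapshots
stay coherent. Types here are stand-ins, not the WorldForge models."""

from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float
    z: float


@dataclass
class BBox:
    lo: Position
    hi: Position


def apply_position_patch(pos: Position, box: BBox, new_pos: Position) -> BBox:
    """Translate the bounding box by the same delta as the position change."""
    dx, dy, dz = new_pos.x - pos.x, new_pos.y - pos.y, new_pos.z - pos.z

    def shift(p: Position) -> Position:
        return Position(p.x + dx, p.y + dy, p.z + dz)

    return BBox(shift(box.lo), shift(box.hi))


# Same mug geometry as the Python API sample: move it +0.3 on x.
box = BBox(Position(-0.05, 0.75, -0.05), Position(0.05, 0.85, 0.05))
moved = apply_position_patch(Position(0.0, 0.8, 0.0), box, Position(0.3, 0.8, 0.0))
```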
Full CLI reference: worldforge/cli.
In WorldForge, a "capability" names an operation an adapter actually supports, not the upstream model's branding.
| Capability | Signature | Example providers |
|---|---|---|
| `predict` | state + action → predicted state | `mock` |
| `score` | observations + goal + candidates → ranked candidates | `leworldmodel` |
| `policy` | observation + instruction → action chunks | `gr00t`, `lerobot` |
| `generate` | prompt + options → media artifact | `cosmos`, `runway`, `mock` |
| `transfer` | artifact + prompt/options → artifact | `runway`, `mock` |
| `reason` | structured reasoning over state | `mock` |
| `embed` | observation → embedding | `mock` |
| `plan` | facade over composed surfaces | WorldForge facade |
Adapters can register a full BaseProvider or a narrow capability protocol implementation such
as a Cost, Policy, Generator, or Predictor. The protocol path is intentionally small:
declare name, optional profile metadata, and the one method behind the advertised capability.
Registered protocol implementations are visible through diagnostics, planning, and benchmarks
without forcing unrelated provider methods into the adapter.
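The narrow-protocol path can be sketched with `typing.Protocol`. The `Cost` protocol and `TableDistanceCost` class below mirror the shape the text describes but are structural illustrations, not WorldForge's actual interfaces.

```python
"""Illustration of a narrow capability protocol: one name, one method,
no unrelated provider surface. Not the actual WorldForge Cost interface."""

from typing import Protocol, runtime_checkable


@runtime_checkable
class Cost(Protocol):
    name: str

    def score(self, observations, goal, candidates) -> list[float]: ...


class TableDistanceCost:
    """Toy cost model: Manhattan distance of each candidate's endpoint from the goal."""

    name = "table-distance"

    def score(self, observations, goal, candidates) -> list[float]:
        gx, gy = goal
        return [abs(c[-1][0] - gx) + abs(c[-1][1] - gy) for c in candidates]


cost = TableDistanceCost()
assert isinstance(cost, Cost)  # structural check: it satisfies the protocol
print(cost.score(None, (0.5, 0.5), [[(0.5, 0.5)], [(0.0, 0.0)]]))  # → [0.0, 1.0]
```

Because the check is structural, a host can register any object with a `name` and a `score` method without inheriting from a base class.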
LeWorldModel is a score provider, not a video generator. GR00T and LeRobot are policy providers, not predictive world models. Cosmos and Runway are media generators, not controllable physical planners.
The canonical loop:
observe state
→ propose candidate actions
→ score or roll out possible futures (score / predict)
→ select an action sequence (plan)
→ execute through a provider (policy / predict)
→ persist, evaluate, observe again
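The loop above can be sketched with stub functions. Everything here is illustrative scaffolding with made-up names, not WorldForge's planner; it only shows how the stages compose.

```python
"""Sketch of the canonical loop: observe → propose → score → select.
Execution and persistence stay host-owned, so they are stubbed out."""

import random


def observe() -> tuple[float, float]:
    return (0.0, 0.0)  # stub observation


def propose(state: tuple[float, float], n: int = 8) -> list[tuple[float, float]]:
    rng = random.Random(0)  # deterministic candidates for the sketch
    return [
        (state[0] + rng.uniform(-1, 1), state[1] + rng.uniform(-1, 1))
        for _ in range(n)
    ]


def score(candidates: list[tuple[float, float]], goal: tuple[float, float]) -> list[float]:
    return [abs(c[0] - goal[0]) + abs(c[1] - goal[1]) for c in candidates]


def plan_step(goal: tuple[float, float]) -> tuple[float, float]:
    state = observe()                        # observe state
    candidates = propose(state)              # propose candidate actions
    costs = score(candidates, goal)          # score possible futures
    action = min(zip(costs, candidates))[1]  # select an action
    return action                            # execute + persist are host-owned


print(plan_step(goal=(1.0, 1.0)))
```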
| Provider | Maturity | Capability surface | Registration | Runtime ownership |
|---|---|---|---|---|
| `mock` | stable | predict, generate, transfer, reason, embed | always registered | in-repo deterministic local provider |
| `cosmos` | beta | generate | `COSMOS_BASE_URL` | host supplies a reachable Cosmos deployment and optional `NVIDIA_API_KEY` |
| `runway` | beta | generate, transfer | `RUNWAYML_API_SECRET` or `RUNWAY_API_SECRET` | host supplies Runway credentials and persists returned artifacts |
| `leworldmodel` | stable | score | `LEWORLDMODEL_POLICY` or `LEWM_POLICY` | host installs the official LeWM loading path (`stable_worldmodel.policy.AutoCostModel`), torch, and compatible checkpoints |
| `gr00t` | beta | policy | `GROOT_POLICY_HOST` | host runs or reaches an Isaac GR00T policy server |
| `lerobot` | stable | policy | `LEROBOT_POLICY_PATH` or `LEROBOT_POLICY` | host installs LeRobot and compatible policy checkpoints |
| `jepa` | experimental | score | `JEPA_MODEL_NAME` | host supplies torch, facebookresearch/jepa-wms runtime dependencies, and task preprocessing |
| `genie` | scaffold | scaffold | `GENIE_API_KEY` | capability-fail-closed reservation; Project Genie has no supported automation API contract |
jepa is a score-only adapter for host-owned facebookresearch/jepa-wms torch-hub runtimes.
genie remains a capability-closed reservation. Executable scaffold candidates stay outside
package exports and auto-registration until they have a validated runtime path, typed parser
coverage, request limits, and docs.
┌──────────────────────────────────────────────┐
│ Host application / CLI │
└──────────────────────┬───────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ WorldForge facade │
│ catalog · registry · diagnostics · persist │
└──────────────────────┬───────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ World runtime │
│ state · history · planning · execution │
└──────────────────────┬───────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Provider adapter │
│ capability contract · validation · events │
└──────────────────────┬───────────────────────┘
│
▼
┌──────────────────────────────────────────────┐
│ Upstream runtime or API │
│ local model · policy server · media API │
└──────────────────────────────────────────────┘
| Path | Responsibility |
|---|---|
| `src/worldforge/models.py` | Domain models, serialization, validation errors, provider metadata, result types, request policies |
| `src/worldforge/framework.py` | `WorldForge`, `World`, persistence, planning, prediction, comparison, diagnostics |
| `src/worldforge/providers/catalog.py` | In-repo provider factories and auto-registration policy |
| `src/worldforge/providers/base.py` | Provider interfaces, `ProviderError`, remote-provider behavior, `PredictionPayload` |
| `src/worldforge/providers/` | Concrete adapters: mock, Cosmos, Runway, LeWorldModel, GR00T, LeRobot, JEPA, Genie |
| `src/worldforge/evaluation/` | Deterministic evaluation suites and report renderers |
| `src/worldforge/benchmark.py` | Capability-aware latency, retry, throughput, and event benchmark harness |
| `src/worldforge/observability.py` | `ProviderEvent` sinks for logs, recording, and metrics |
| `src/worldforge/rerun.py` | Optional Rerun SDK bridge for events, worlds, plans, and benchmark artifacts |
| `src/worldforge/testing/` | Reusable provider contract assertions |
Read architecture · world-model taxonomy · provider authoring guide before adding a new adapter.
The README keeps the primary showcase and quickstart visible. Use the docs for the full command surface and runtime-specific entrypoints:
| Need | Start here |
|---|---|
| Full CLI command map | CLI reference |
| Runnable example index | Examples and CLI commands or uv run worldforge examples |
| LeRobot + LeWorldModel replay showcase | Robotics showcase walkthrough |
| Checkout-safe visual flows | TheWorldHarness |
| Rerun event and artifact recording | Rerun integration or uv run --extra rerun worldforge-demo-rerun |
| Optional runtime operations | Operator playbooks |
| Support, security, citation | Support, Security, Citation |
- Researchers comparing world-model surfaces without rewriting the harness for each one.
- Robotics and physical-AI engineers wiring policies, scorers, simulators, and media providers around their own stacks.
- Framework builders shipping adapter packages, CLI workflows, and reproducible demos.
- Anyone who wants the repo to run from a clean checkout before installing CUDA or downloading checkpoints.
- Capabilities are contracts. Don't advertise an operation unless the adapter implements it and returns the typed WorldForge result.
- Optional runtimes remain host-owned. No torch, LeWorldModel, LeRobot, GR00T, CUDA, TensorRT, controllers, checkpoints, or datasets in base dependencies.
- Embodiment-specific action translation is host-owned. Policy providers preserve raw actions; the caller converts them into executable `Action` objects.
- Local JSON persistence is single-writer and available through both Python APIs and `worldforge world` CLI commands. Services needing locking, transactions, or migrations own that layer.
- Built-in evaluation suites are deterministic contract harnesses. They are not physical-fidelity, media-quality, or real-world safety claims.
- Scaffold adapters (`jepa`, `genie`, `jepa-wms`) are placeholders, not real integrations.
- World IDs are local storage identifiers. Path separators and traversal-shaped IDs are rejected.
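The world-ID rejection rule can be sketched as a small validator. The exact checks WorldForge performs may differ; this hypothetical function only shows the intent of rejecting path separators and traversal-shaped IDs before they touch local storage.

```python
"""Hypothetical sketch of world-ID validation: reject anything that could
escape the local storage directory. Not the actual WorldForge validator."""


def is_safe_world_id(world_id: str) -> bool:
    if not world_id or world_id in {".", ".."}:
        return False
    if "/" in world_id or "\\" in world_id:
        return False  # path separators could address other directories
    if ".." in world_id:
        return False  # traversal-shaped IDs are rejected outright
    return True


assert is_safe_world_id("kitchen")
assert not is_safe_world_id("../etc/passwd")
assert not is_safe_world_id("lab/../../secrets")
```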
Primary local gate (same as CI):
```
uv sync --group dev
uv lock --check
uv run ruff check src tests examples scripts
uv run ruff format --check src tests examples scripts
uv run python scripts/generate_provider_docs.py --check
uv run mkdocs build --strict
uv run pytest
uv run --extra harness pytest --cov=src/worldforge --cov-report=term-missing --cov-fail-under=90
bash scripts/test_package.sh
uv build --out-dir dist --clear --no-build-logs
```

Before a tag, also run the locked dependency audit. The expanded gate and triage steps live in the operator playbooks.
Scaffold a new provider:
```
uv run python scripts/scaffold_provider.py "Acme WM" \
  --taxonomy "JEPA latent predictive world model" \
  --planned-capability score
```

Contributor guide: CONTRIBUTING.md. Repository agent context: AGENTS.md.
WorldForge is pre-1.0 beta. Minor releases may still include breaking changes when the public API needs to tighten.
Useful today for
- local provider adapter development
- deterministic planning and evaluation experiments
- checkout-safe demos and optional-runtime smoke tests
- contract testing for third-party provider packages
- CLI diagnostics around provider registration, health, and capabilities
Known limits
- `jepa` requires host-owned PyTorch, JEPA-WMS dependencies, checkpoints, and task preprocessing
- `genie` is a capability-fail-closed scaffold adapter
- `jepa-wms` remains a direct-construction candidate for host experiments
- local JSON persistence is single-writer only
- evaluation scores are contract signals, not physical-fidelity or safety claims
- optional runtimes, checkpoints, trace export, dashboards, and production telemetry stay host-owned
If you use WorldForge in academic work, a BibTeX entry is:
```
@software{worldforge,
  title = {WorldForge: An integration layer for physical-AI world models},
  author = {AbdelStark and {WorldForge contributors}},
  year = {2026},
  url = {https://github.com/AbdelStark/worldforge},
  version = {0.5.0}
}
```

Issues, discussions, and pull requests are welcome. Please read CONTRIBUTING.md and open an issue for non-trivial changes before sending a patch. For provider work, start with the provider authoring guide and the playbooks.
WorldForge is released under the MIT License.
- Documentation: https://abdelstark.github.io/worldforge/
- Quickstart: https://abdelstark.github.io/worldforge/quickstart/
- Playbooks: https://abdelstark.github.io/worldforge/playbooks/
- Architecture: https://abdelstark.github.io/worldforge/architecture/
- World-model taxonomy: https://abdelstark.github.io/worldforge/world-model-taxonomy/
- Security policy: SECURITY.md
- Repository: https://github.com/AbdelStark/worldforge
- Issues: https://github.com/AbdelStark/worldforge/issues
- Abdel 💻 🤔 📆
- 0xLucqs 💻
Made with love by Abdel and the WorldForge community.