Python Environments and uv Integration

Principle

Environment management is a first-class part of CPython4J.

The library should avoid "whatever Python is on PATH" as the default. Instead, use one of the supported modes.

Distribution modes

| Mode | Network needed? | Use case |
| --- | --- | --- |
| PythonEnv.uvProject() | At build/first run | Developer workstation |
| PythonEnv.explicit() | Never | Container / air-gapped |
| PythonEnv.venv() | Never | Pre-existing venv |
| PythonEnv.bundled() | At first run | Ship a JAR, uv bootstraps |

All modes produce the same inputs to the engine: a libpython path and sys.path entries. The engine does not care how the environment was created.

uv-first API

PythonEnv env = PythonEnv.uvProject(Path.of("."))
    .sync(true)
    .build();

try (var engine = PythonEngine.create(env)) {
    ...
}

Expected behavior:

  • find uv;
  • optionally run uv sync;
  • locate project .venv;
  • probe the Python executable;
  • find libpython;
  • configure CPython paths;
  • provide diagnostics on failure.
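
The first and third steps above (finding uv and locating the interpreter inside the project .venv) can be sketched as follows. This is a minimal illustration, not CPython4J's implementation: the class UvLocator and both method names are hypothetical, and the venv layout (bin/python on Unix-like systems, Scripts/python.exe on Windows) is the conventional one rather than something the source specifies.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the CPython4J API. */
final class UvLocator {

    /** Returns the first executable file called {@code name} in a PATH-style string. */
    static Optional<Path> findOnPath(String name, String pathValue, boolean windows) {
        if (pathValue == null || pathValue.isBlank()) {
            return Optional.empty();
        }
        // On Windows the binary is normally uv.exe, so try that spelling first.
        List<String> candidates = windows ? List.of(name + ".exe", name) : List.of(name);
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String candidate : candidates) {
                Path p = Path.of(dir, candidate);
                if (Files.isRegularFile(p) && Files.isExecutable(p)) {
                    return Optional.of(p);
                }
            }
        }
        return Optional.empty();
    }

    /** Conventional interpreter location inside a venv (assumed layout). */
    static Path venvPython(Path venv, boolean windows) {
        return windows
                ? venv.resolve("Scripts").resolve("python.exe")
                : venv.resolve("bin").resolve("python");
    }
}
```

A caller would typically pass System.getenv("PATH") and treat an empty result as a diagnostic ("uv not found on PATH") rather than falling back silently to some other Python.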

Existing venv API

PythonEnv env = PythonEnv.venv(Path.of(".venv"));

Explicit API

PythonEnv env = PythonEnv.explicit()
    .library(Path.of("/opt/python/lib/libpython3.12.so"))
    .sitePackages(Path.of("/app/.venv/lib/python3.12/site-packages"))
    .build();

Probe script

Use the selected Python executable as the source of truth.

The probe script runs inside that interpreter, so discovery works the same way on every platform. It reports the Python version, the site-packages paths, and the shared library (libpython3.x.so, libpython3.x.dylib, or python3xx.dll), which it locates by searching sysconfig, sys.prefix, and sys.base_prefix. The Java side reads the resolved paths directly and needs no platform-specific logic.
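
To make the division of labor concrete, here is a simplified sketch of the Java side launching a probe and reading its output. Everything in it is an assumption made for illustration: the class ProbeSketch, the embedded script, and the "key=value" line format are not taken from CPython4J, and the real probe searches more locations for the shared library than this one does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; neither the script nor the output format is CPython4J's. */
final class ProbeSketch {

    // Simplified probe: prints one "key=value" line per fact (assumed format).
    static final String SCRIPT = """
            import site, sys, sysconfig
            print("version=%d.%d.%d" % sys.version_info[:3])
            print("prefix=" + sys.prefix)
            print("base_prefix=" + sys.base_prefix)
            print("ldlibrary=" + str(sysconfig.get_config_var("LDLIBRARY")))
            for p in site.getsitepackages():
                print("site_packages=" + p)
            """;

    /** Runs the probe with the given interpreter and returns its combined output. */
    static String run(Path python) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(python.toString(), "-c", SCRIPT)
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        if (process.waitFor() != 0) {
            throw new IOException("Python probe failed:\n" + output);
        }
        return output;
    }

    /** Parses "key=value" lines; repeated keys accumulate in order of appearance. */
    static Map<String, List<String>> parse(String stdout) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : stdout.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank or malformed lines
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }
}
```

Because the interpreter reports its own paths, the Java code only parses strings; it never has to guess a platform-specific library name.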

Bundled / battery-included API

For self-contained distribution, bundle pyproject.toml, uv.lock, and the Python sources as JAR resources. On first run, uv bootstraps CPython and all dependencies into a local cache.

PythonEnv env = PythonEnv.bundled()
    .sync(true)
    .build();

try (var engine = PythonEngine.create(env)) {
    ...
}

Expected behavior:

  • extract pyproject.toml and uv.lock from JAR resources to a temporary staging directory;
  • find uv on PATH (or use a bundled uv binary if present);
  • compute a content hash of pyproject.toml + uv.lock to determine the cache key;
  • if the cache directory does not exist or the hash has changed, run uv sync --python-preference managed to install CPython and all dependencies;
  • resolve libpython from the cached venv;
  • configure CPython paths from the cached environment;
  • on subsequent runs, skip sync and reuse the cached venv.
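
The cache-key and "sync or reuse" steps above could look roughly like this. It is a sketch under stated assumptions: the class CacheKey and its methods are hypothetical, and the exact way the two files are combined before hashing (here, each file is length-prefixed) is a choice made for this example, not something the source specifies.

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper for illustration; not part of the CPython4J API. */
final class CacheKey {

    /** SHA-256 over both files, each prefixed with its length (assumed scheme). */
    static String of(byte[] pyproject, byte[] lock) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            update(sha, pyproject);
            update(sha, lock);
            return HexFormat.of().formatHex(sha.digest());
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // The length prefix keeps ("ab", "") and ("a", "b") from hashing identically.
    private static void update(MessageDigest sha, byte[] content) {
        sha.update(ByteBuffer.allocate(Long.BYTES).putLong(content.length).array());
        sha.update(content);
    }

    /** A sync is needed only when no cache directory exists for this key. */
    static boolean needsSync(Path cacheRoot, String key) {
        return !Files.isDirectory(cacheRoot.resolve(key));
    }
}
```

Since the key is part of the directory name, "the hash has changed" and "the cache directory does not exist" collapse into the same check.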

Cache location

The cache directory follows platform conventions:

~/cpython4j/cache/<hash>/

Override it with the -Dcpython4j.cache.dir=/custom/path system property, the CPYTHON4J_CACHE_DIR environment variable, or .cacheDir(Path) on the builder.
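
One way to combine the three overrides is shown below. The source does not say which one wins when several are set, so the precedence used here (builder, then system property, then environment variable, then the default) is an assumption, and the class CacheDirResolver is hypothetical.

```java
import java.nio.file.Path;

/** Hypothetical helper for illustration; the precedence order is an assumption. */
final class CacheDirResolver {

    /** Pure resolution logic, kept separate from the process environment for testing. */
    static Path resolve(Path builderOverride, String systemProperty, String envVar, Path userHome) {
        if (builderOverride != null) {
            return builderOverride;
        }
        if (systemProperty != null && !systemProperty.isBlank()) {
            return Path.of(systemProperty);
        }
        if (envVar != null && !envVar.isBlank()) {
            return Path.of(envVar);
        }
        // Default documented above: ~/cpython4j/cache
        return userHome.resolve("cpython4j").resolve("cache");
    }

    /** Reads the real system property and environment variable named in the text. */
    static Path resolveFromProcess(Path builderOverride) {
        return resolve(
                builderOverride,
                System.getProperty("cpython4j.cache.dir"),
                System.getenv("CPYTHON4J_CACHE_DIR"),
                Path.of(System.getProperty("user.home")));
    }
}
```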

The hash is the SHA-256 of pyproject.toml + uv.lock, so changing dependencies produces a new cache directory. Concurrent processes are safe: the venv is prepared in a temporary directory and atomically renamed into place.
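
The prepare-then-rename pattern can be sketched with java.nio.file. This is an illustration only: AtomicPublish and its methods are hypothetical names, and the sketch assumes the temporary directory lives on the same filesystem as the cache (for example, as a sibling of the target), since an atomic move across filesystems is not possible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of the CPython4J API. */
final class AtomicPublish {

    /** Moves a fully prepared directory into place; the first writer wins. */
    static Path publish(Path prepared, Path target) {
        try {
            Files.move(prepared, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            if (!Files.isDirectory(target)) {
                throw new UncheckedIOException(e); // a real failure, not a lost race
            }
            // Another process published first: keep its result, discard ours.
            deleteRecursively(prepared);
        }
        return target;
    }

    static void deleteRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small end-to-end demo: two writers publish to the same target. */
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("cpython4j-demo");
            Path target = base.resolve("abc123");
            Path first = Files.createDirectory(base.resolve("tmp-1"));
            Files.writeString(first.resolve("marker"), "first");
            Path second = Files.createDirectory(base.resolve("tmp-2"));
            Files.writeString(second.resolve("marker"), "second");
            publish(first, target);
            publish(second, target);
            boolean ok = Files.readString(target.resolve("marker")).equals("first")
                    && Files.notExists(first)
                    && Files.notExists(second);
            deleteRecursively(base);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Readers never observe a half-built venv under the final name, because the directory only appears there once it is complete.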

Resource layout

JAR resources for bundled mode:

META-INF/cpython4j/
  pyproject.toml
  uv.lock
  sources/
    my_module.py
    ...

Custom resource path

PythonEnv.bundled()
    .resourcePrefix("META-INF/cpython4j")
    .sync(true)
    .build();
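
Extraction from a resource prefix such as the one above can be sketched with the standard class-loader API. The class ResourceStaging and its extract method are hypothetical; CPython4J's own staging logic may differ, for instance in how it enumerates the files under sources/.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the CPython4J API. */
final class ResourceStaging {

    /**
     * Copies the named resources under {@code prefix} into {@code stagingDir}
     * and returns the files that were actually written.
     */
    static List<Path> extract(ClassLoader loader, String prefix, List<String> names, Path stagingDir) {
        List<Path> written = new ArrayList<>();
        for (String name : names) {
            // getResourceAsStream returns null when the resource is not bundled.
            try (InputStream in = loader.getResourceAsStream(prefix + "/" + name)) {
                if (in == null) {
                    continue;
                }
                Path out = stagingDir.resolve(name);
                Files.createDirectories(out.getParent());
                Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                written.add(out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return written;
    }
}
```

A caller might pass List.of("pyproject.toml", "uv.lock") and fail with a clear diagnostic if fewer than two files come back.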

Container / production environments

Using the Prepare CLI

The cpython4j JAR includes a Prepare CLI that runs uv sync at Docker build time, so there is zero first-run latency:

FROM eclipse-temurin:25-jre
WORKDIR /app

# Python dependency spec — changing these invalidates the layer cache
COPY pyproject.toml uv.lock ./
COPY target/lib/cpython4j.jar ./

# Install Python + dependencies into .venv (cached across rebuilds)
RUN java -jar cpython4j.jar

# Application code — changes here don't re-install Python deps
COPY target/my-app.jar ./
CMD ["java", "-cp", "my-app.jar:lib/*", "com.example.Main"]

At runtime, the application points PythonEnv at the prepared .venv:

PythonEnv env = PythonEnv.venv(Path.of(".venv")).build();

For uber JARs, add one line to your main():

PythonEnv.prepareIfRequested(args);  // enables: java -jar app.jar --cpython4j-prepare

The prepare command accepts --project-dir <path> (default: current directory) and also triggers on the CPYTHON4J_PREPARE=true environment variable.
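
The trigger logic described here (a flag, an optional --project-dir, and an environment variable) could be parsed along these lines. This is a sketch, not the library's code: PrepareRequest is a hypothetical class, and details such as matching the environment variable value exactly against "true" are assumptions.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the CPython4J API. */
final class PrepareRequest {

    /**
     * Returns the project directory to prepare, or empty when no prepare
     * was requested by flag or environment variable.
     */
    static Optional<Path> parse(String[] args, Map<String, String> env) {
        boolean requested = "true".equals(env.get("CPYTHON4J_PREPARE"));
        Path projectDir = Path.of("."); // documented default: current directory
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--cpython4j-prepare")) {
                requested = true;
            } else if (args[i].equals("--project-dir") && i + 1 < args.length) {
                projectDir = Path.of(args[++i]); // consume the value that follows
            }
        }
        return requested ? Optional.of(projectDir) : Optional.empty();
    }
}
```

In a real main(), the call would pass System.getenv() as the map, run the sync when a directory comes back, and exit before normal application startup.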

Docker layer caching for heavy dependencies

For large dependency trees (e.g., Docling with PyTorch), separate pyproject.toml/uv.lock from application code for better layer caching:

FROM eclipse-temurin:25-jre AS builder
WORKDIR /app

# Python dependency spec — this layer is cached unless deps change
COPY pyproject.toml uv.lock ./
COPY target/lib/cpython4j.jar ./

# Install Python + all dependencies (e.g., PyTorch, NumPy) into .venv
RUN java -jar cpython4j.jar

# Application code and remaining libraries
COPY target/my-app.jar ./
COPY target/lib/ ./lib/

# Final image — only copies the prepared result
FROM eclipse-temurin:25-jre
COPY --from=builder /app /app
WORKDIR /app
CMD ["java", "-cp", "my-app.jar:lib/*", "com.example.Main"]

Air-gapped / explicit mode

For environments with no uv and no internet access, use PythonEnv.explicit() with a pre-built venv:

PythonEnv env = PythonEnv.explicit()
    .library(Path.of("/usr/lib/libpython3.13.so"))
    .sitePackages(Path.of("/app/.venv/lib/python3.13/site-packages"))
    .build();

For non-container air-gapped deployments, alternatives include:

  • pre-populate the uv cache and use uv sync --offline;
  • vendor wheels ahead of time (for example with pip download; uv pip has no download subcommand) and install them with uv pip install --no-index --find-links;
  • ship a pre-built venv directory alongside the JAR.

Python installation policy

In uv project mode, uv sync --python-preference managed automatically installs CPython if needed. This is the default behavior when .sync(true) is set:

PythonEnv.uvProject(Path.of("."))
    .sync(true)
    .build();

Build tool integration (optional)

Gradle and Maven plugins are optional convenience layers, not required for any mode.

Gradle example:

plugins {
    id("io.roastedroot.cpython4j") version "0.1.0"
}

python {
    uvProject.set(layout.projectDirectory)
    sync.set(true)
}

Maven can mirror this with a plugin. Both plugins are optional modules shipped separately from the core library.