roastedroot/cpython4j

CPython4J

An experimental project exploring how to call Python from Java, in-process, with full native extension support. Inspired by Java Project Detroit and the growing need for Java/Python interop in AI workloads.

CPython4J runs real CPython via the Foreign Function & Memory API (Panama), with a focus on developer experience: type-safe contract interfaces, annotation-processed proxies, automatic Python environment management via uv, and bidirectional Java/Python calls.

The north-star use case is Docling: running an AI document processing pipeline from Java, with Java classification callbacks firing per-element inside Python's loop. PyTorch, NumPy, and OpenCV work unmodified because the runtime is real CPython.

This is not a Wasm-based approach. Projects like trino-wasm-python and boomslang compile CPython to WebAssembly for sandboxed execution. The tradeoff is that every native Python library (NumPy, PyTorch, OpenCV, etc.) must also be compiled to Wasm and statically linked into the build. This makes it impractical to support the broader Python ecosystem, especially AI/ML libraries with complex native dependencies. CPython4J uses the system's native CPython instead, so any pip-installable package works out of the box.
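
The Foreign Function & Memory API mentioned above is the standard JDK mechanism for calling native code without JNI. As a rough illustration of that mechanism only (this is not CPython4J's code, and the class and method names are made up for the example), the sketch below binds the C library's strlen and calls it from Java. Binding functions from libpython would presumably follow the same pattern, though this README does not show how.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical sketch: calls the C standard library's strlen through the FFM API.
public class FfmStrlenSketch {

    static long nativeStrlen(String s) {
        Linker linker = Linker.nativeLinker();

        // Look up the native symbol in the platform's default libraries.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();

        // Describe the C signature: size_t strlen(const char*), assuming a 64-bit platform.
        MethodHandle strlen = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

        // The arena owns the off-heap copy of the string and frees it on close.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) strlen.invokeExact(cString);     // length in bytes, not chars
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("hello"));
    }
}
```

On recent JDKs, running with --enable-native-access=ALL-UNNAMED avoids the restricted-method warning for code on the class path.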

Prerequisites

  • Java 25+
  • uv (for Python environment management)
  • A shared libpython (uv installs one automatically with --python-preference managed)

Quickstart

Option A: Let CPython4J manage everything

Create a pyproject.toml with your Python dependencies:

[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = ["cowsay"]

Write a Python module:

# greeting.py
import cowsay

def greet(name):
    return cowsay.get_output_string("cow", "Hello " + name)

def wordCount(text):
    return len(text.split())

Use it from Java. CPython4J runs uv sync, which installs Python and the dependencies, and then configures the engine to use that environment:

var env = PythonEnv.uvProject(Path.of("./my-project")).sync(true).build();
try (var engine = PythonEngine.create(env)) {
    String msg = engine.invokeFunction("greeting", "greet", List.of("World"), String.class);
    int count = engine.invokeFunction("greeting", "wordCount", List.of("one two three"), int.class);
}
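
The README does not show how CPython4J invokes uv internally. As a minimal sketch of the general idea, assuming uv is on the PATH, a JVM process can run uv sync in the project directory with ProcessBuilder. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: shell out to "uv sync" from Java. Not CPython4J's implementation.
public class UvSyncSketch {

    // Builds (but does not start) the process, so the command can be inspected.
    static ProcessBuilder uvSync(Path projectDir) {
        return new ProcessBuilder(List.of("uv", "sync"))
                .directory(projectDir.toFile()) // run where pyproject.toml lives
                .inheritIO();                   // show uv's progress on the console
    }

    // Runs uv sync and returns its exit code (0 means success).
    static int run(Path projectDir) throws IOException, InterruptedException {
        return uvSync(projectDir).start().waitFor();
    }
}
```

Calling UvSyncSketch.run(Path.of("./my-project")) would create or update .venv in that directory, which is the state Option B below starts from.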

Option B: Pre-existing Python environment

If you already have a venv (from CI, a container, or manual uv sync):

cd my-project
uv sync  # one command - creates .venv, installs Python + deps

Then in Java, skip the sync:

var env = PythonEnv.venv(Path.of("./my-project/.venv")).build();
try (var engine = PythonEngine.create(env)) {
    // ready to use - sys.path configured automatically
}

Option C: Production container

Prepare the Python environment at Docker build time so that the first run pays no installation latency:

FROM eclipse-temurin:25-jre
WORKDIR /app

# Python dependency spec - changing these invalidates the layer cache
COPY pyproject.toml uv.lock ./
COPY target/lib/cpython4j.jar ./

# Install Python + dependencies into .venv (cached across rebuilds)
RUN java -jar cpython4j.jar

# Application code - changes here don't re-install Python deps
COPY target/my-app.jar ./
CMD ["java", "-cp", "my-app.jar:lib/*", "com.example.Main"]

At runtime, point the engine at the pre-built venv:

var env = PythonEnv.venv(Path.of(".venv")).build();
try (var engine = PythonEngine.create(env)) {
    // ready - venv was pre-built in Docker image
}

For uber JARs, add one line to your main() instead:

public static void main(String[] args) {
    PythonEnv.prepareIfRequested(args);  // enables: java -jar app.jar --cpython4j-prepare
    // ... rest of your app
}
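
The README does not say what prepareIfRequested does beyond enabling the --cpython4j-prepare flag. One plausible reading, sketched below with hypothetical names, is a guard at the top of main() that checks for the flag, performs the environment setup, and returns before the application starts.

```java
import java.util.Arrays;

// Hypothetical sketch of a "prepare and exit" guard. Not CPython4J's implementation.
public class PrepareFlagSketch {

    static final String PREPARE_FLAG = "--cpython4j-prepare";

    // True when the JVM was launched only to set up the Python environment.
    static boolean prepareRequested(String[] args) {
        return Arrays.asList(args).contains(PREPARE_FLAG);
    }

    public static void main(String[] args) {
        if (prepareRequested(args)) {
            // A real implementation would install Python and dependencies here.
            System.out.println("preparing Python environment, then exiting");
            return;
        }
        System.out.println("normal application startup");
    }
}
```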

For full manual control (air-gapped, no uv), use PythonEnv.explicit():

var env = PythonEnv.explicit()
    .library(Path.of("/usr/lib/libpython3.13.so"))
    .sitePackages(Path.of("/app/.venv/lib/python3.13/site-packages"))
    .build();

Contract-first API with annotation processor

Define a Java interface, annotate it, and the annotation processor generates a concrete implementation at compile time (no reflection):

@ScriptInterface(module = "analyzer")
public interface Analyzer {
    Analysis analyze(Document document);
    List<String> keywords(String text);
    int wordCount(String text);
}

public record Document(String text, String source) {}
public record Analysis(String language, int wordCount) {}

Python implements the contract:

# analyzer.py
def analyze(document):
    return {"language": "en", "wordCount": len(document["text"].split())}

def keywords(text):
    return text.lower().split()[:5]

def wordCount(text):
    return len(text.split())

Java usage with the generated proxy:

var env = PythonEnv.uvProject(Path.of(".")).sync(true).build();
try (var engine = PythonEngine.create(env)) {
    Analyzer analyzer = new Analyzer_Proxy(engine);
    Analysis result = analyzer.analyze(new Document("Hello from CPython4J", "demo"));
}
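
To make the "no reflection" point concrete, a generated proxy can be pictured as a plain class that forwards each interface method to the engine. The sketch below is hand-written and hypothetical: the Invoker interface stands in for the engine, the record-to-dict mapping is inferred from the Python example above (document["text"], and a returned dict keyed by the Analysis components), and a pure-Java fake replaces Python so the code runs on its own. The real generated code may look quite different.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what an annotation-processed proxy could look like.
public class ProxySketch {

    // Contract types, taken from the example above (keywords omitted for brevity).
    record Document(String text, String source) {}
    record Analysis(String language, int wordCount) {}

    interface Analyzer {
        Analysis analyze(Document document);
        int wordCount(String text);
    }

    // Stand-in for the engine: one untyped call into a named Python function.
    interface Invoker {
        Object invoke(String module, String function, List<Object> args);
    }

    // Plain forwarding code with explicit conversions; nothing is looked up at runtime.
    static final class AnalyzerProxy implements Analyzer {
        private final Invoker invoker;

        AnalyzerProxy(Invoker invoker) { this.invoker = invoker; }

        @Override
        public Analysis analyze(Document d) {
            // Record -> dict-like map, keyed by component name (assumed convention).
            Map<String, Object> arg = Map.of("text", d.text(), "source", d.source());
            @SuppressWarnings("unchecked")
            Map<String, Object> out =
                    (Map<String, Object>) invoker.invoke("analyzer", "analyze", List.of(arg));
            // Dict -> record, again by component name.
            return new Analysis(
                    (String) out.get("language"),
                    ((Number) out.get("wordCount")).intValue());
        }

        @Override
        public int wordCount(String text) {
            return ((Number) invoker.invoke("analyzer", "wordCount", List.of(text))).intValue();
        }
    }

    // Pure-Java fake that mimics analyzer.py, so the sketch runs without Python.
    static Object fakePython(String module, String function, List<Object> args) {
        switch (function) {
            case "analyze": {
                Map<?, ?> doc = (Map<?, ?>) args.get(0);
                int words = ((String) doc.get("text")).trim().split("\\s+").length;
                return Map.of("language", "en", "wordCount", words);
            }
            case "wordCount":
                return ((String) args.get(0)).trim().split("\\s+").length;
            default:
                throw new IllegalArgumentException("unknown function: " + function);
        }
    }

    public static void main(String[] args) {
        Analyzer analyzer = new AnalyzerProxy(ProxySketch::fakePython);
        System.out.println(analyzer.analyze(new Document("Hello from CPython4J", "demo")));
    }
}
```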

Host callbacks (Python calls Java)

Use the context attribute of @ScriptInterface to define both directions in one place. Mark Java methods with @HostFunction and they become importable from Python:

class HostApi {
    @HostFunction
    public void log(String message) { System.out.println(message); }

    @HostFunction
    public String reverse(String s) { return new StringBuilder(s).reverse().toString(); }
}

@ScriptInterface(module = "my_module", context = HostApi.class)
public interface MyModule {
    String process(String text);
}

The annotation processor generates both MyModule_Proxy (Java calls Python) and MyModule_Builtins (Python calls Java):

var env = PythonEnv.uvProject(Path.of(".")).sync(true).build();
try (var engine = PythonEngine.builder()
        .withEnv(env)
        .expose(MyModule_Builtins.toHostModule(new HostApi()))
        .build()) {

    // Python can now: from hostapi import log, reverse
    MyModule mod = new MyModule_Proxy(engine);
    mod.process("hello");
}
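
How the generated MyModule_Builtins class is structured is not shown here. One way to expose host methods without reflection, sketched below under that assumption, is a dispatch table with one lambda per @HostFunction method, keyed by the name Python would import. All names other than HostApi, log, and reverse are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of reflection-free glue for Python-to-Java calls.
public class HostDispatchSketch {

    // Same host API as in the example above, minus the annotations.
    static class HostApi {
        public void log(String message) { System.out.println(message); }
        public String reverse(String s) { return new StringBuilder(s).reverse().toString(); }
    }

    // One entry per host function: unpack the arguments, call the method, return the result.
    static Map<String, Function<List<Object>, Object>> hostFunctions(HostApi api) {
        return Map.of(
                "log", args -> { api.log((String) args.get(0)); return null; },
                "reverse", args -> api.reverse((String) args.get(0)));
    }

    public static void main(String[] args) {
        var table = hostFunctions(new HostApi());
        // Simulates Python calling reverse("hello") on the host module.
        System.out.println(table.get("reverse").apply(List.of("hello")));
    }
}
```

Because each entry is ordinary compiled code, a call from Python would resolve to a map lookup and a direct method call rather than a reflective invocation.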

Docling end-to-end demo

The Docling integration test shows the north-star use case: converting a PDF with Docling while Java classification callbacks fire per-element inside Python's processing loop. See the Python bridge and Java handler for the full example.

Run it with mvn -B install -Pdocling (installs PyTorch + Docling, ~5GB on first run).

See also the LangChain4j agent demo, which shows a Java AI agent using Python's spaCy for named entity extraction with bidirectional callbacks.

Building

mvn -B install

CI

The CI workflow has two jobs:

  • build - runs core + integration tests on Linux, macOS, and Windows with Java 25 + Python 3.13 + uv
  • docling - runs the Docling end-to-end test (Linux only, heavier)

Design documentation
