An experimental project exploring how to call Python from Java, in-process, with full native extension support. Inspired by Java Project Detroit and the growing need for Java/Python interop in AI workloads.
CPython4J runs real CPython via the Foreign Function & Memory API (Panama), with a focus on developer experience: type-safe contract interfaces, annotation-processed proxies, automatic Python environment management via uv, and bidirectional Java/Python calls.
The north-star use case is Docling: running an AI document processing pipeline from Java, with Java classification callbacks firing per-element inside Python's loop. PyTorch, NumPy, OpenCV all just work because it's real CPython.
This is not a Wasm-based approach. Projects like trino-wasm-python and boomslang compile CPython to WebAssembly for sandboxed execution. The tradeoff is that every native Python library (NumPy, PyTorch, OpenCV, etc.) must also be compiled to Wasm and statically linked into the build. This makes it impractical to support the broader Python ecosystem, especially AI/ML libraries with complex native dependencies. CPython4J uses the system's native CPython instead, so any pip-installable package works out of the box.
- Java 25+
- uv (for Python environment management)
- A shared libpython (uv installs one automatically with --python-preference managed)
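To check whether a given interpreter meets the shared-libpython requirement, one option is to ask Python's own build configuration. This is a minimal sketch using only the standard library; the helper name describe_libpython is hypothetical, and the reported values vary by platform (macOS framework builds, for example, may report differently).

```python
import sys
import sysconfig


def describe_libpython():
    """Report whether this interpreter appears to ship a shared libpython."""
    return {
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Py_ENABLE_SHARED is 1 on POSIX builds configured with --enable-shared.
        # Windows builds do not set it, but always ship the interpreter as a DLL.
        "shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
        or sys.platform == "win32",
        # Directory and file name of the library, when the build records them.
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "library": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    for key, value in describe_libpython().items():
        print(f"{key}: {value}")
```

Running this with the interpreter inside the project's .venv shows what a Java process would have to load; a "shared" value of False suggests that interpreter cannot be embedded as-is.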
Create a pyproject.toml with your Python dependencies:
[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = ["cowsay"]

Write a Python module:
# greeting.py
import cowsay

def greet(name):
    return cowsay.get_output_string("cow", "Hello " + name)

def wordCount(text):
    return len(text.split())

Use it from Java - CPython4J runs uv sync, installs Python and dependencies, and configures everything:
var env = PythonEnv.uvProject(Path.of("./my-project")).sync(true).build();
try (var engine = PythonEngine.create(env)) {
    String msg = engine.invokeFunction("greeting", "greet", List.of("World"), String.class);
    int count = engine.invokeFunction("greeting", "wordCount", List.of("one two three"), int.class);
}

If you already have a venv (from CI, a container, or manual uv sync):
cd my-project
uv sync # one command - creates .venv, installs Python + deps

Then in Java, skip the sync:
var env = PythonEnv.venv(Path.of("./my-project/.venv")).build();
try (var engine = PythonEngine.create(env)) {
    // ready to use - sys.path configured automatically
}

Prepare the Python environment at Docker build time - zero first-run latency:
FROM eclipse-temurin:25-jre
WORKDIR /app
# Python dependency spec - changing these invalidates the layer cache
COPY pyproject.toml uv.lock ./
COPY target/lib/cpython4j.jar ./
# Install Python + dependencies into .venv (cached across rebuilds)
RUN java -jar cpython4j.jar
# Application code - changes here don't re-install Python deps
COPY target/my-app.jar ./
CMD ["java", "-cp", "my-app.jar:lib/*", "com.example.Main"]

At runtime, the application points at the pre-built venv:

var env = PythonEnv.venv(Path.of(".venv")).build();
try (var engine = PythonEngine.create(env)) {
    // ready - venv was pre-built in Docker image
}

For uber JARs, add one line to your main() instead:
public static void main(String[] args) {
    PythonEnv.prepareIfRequested(args); // enables: java -jar app.jar --cpython4j-prepare
    // ... rest of your app
}

For full manual control (air-gapped, no uv), use PythonEnv.explicit():
var env = PythonEnv.explicit()
    .library(Path.of("/usr/lib/libpython3.13.so"))
    .sitePackages(Path.of("/app/.venv/lib/python3.13/site-packages"))
    .build();

Define a Java interface, annotate it, and the annotation processor generates a concrete implementation at compile time (no reflection):
@ScriptInterface(module = "analyzer")
public interface Analyzer {
    Analysis analyze(Document document);
    List<String> keywords(String text);
    int wordCount(String text);
}

public record Document(String text, String source) {}
public record Analysis(String language, int wordCount) {}

Python implements the contract:
# analyzer.py
def analyze(document):
    return {"language": "en", "wordCount": len(document["text"].split())}

def keywords(text):
    return text.lower().split()[:5]

def wordCount(text):
    return len(text.split())

Java usage with the generated proxy:
var env = PythonEnv.uvProject(Path.of(".")).sync(true).build();
try (var engine = PythonEngine.create(env)) {
    Analyzer analyzer = new Analyzer_Proxy(engine);
    Analysis result = analyzer.analyze(new Document("Hello from CPython4J", "demo"));
}

Use context on @ScriptInterface to define both directions in one place. Mark Java methods with @HostFunction and they become importable from Python:
class HostApi {
    @HostFunction
    public void log(String message) { System.out.println(message); }

    @HostFunction
    public String reverse(String s) { return new StringBuilder(s).reverse().toString(); }
}

@ScriptInterface(module = "my_module", context = HostApi.class)
public interface MyModule {
    String process(String text);
}

The annotation processor generates both MyModule_Proxy (Java calls Python) and MyModule_Builtins (Python calls Java):
var env = PythonEnv.uvProject(Path.of(".")).sync(true).build();
try (var engine = PythonEngine.builder()
        .withEnv(env)
        .expose(MyModule_Builtins.toHostModule(new HostApi()))
        .build()) {
    // Python can now: from hostapi import log, reverse
    MyModule mod = new MyModule_Proxy(engine);
    mod.process("hello");
}

The Docling integration test shows the north-star use case: converting a PDF with Docling while Java classification callbacks fire per-element inside Python's processing loop. See the Python bridge and Java handler for the full example.
Run it with mvn -B install -Pdocling (installs PyTorch + Docling, ~5GB on first run).
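The per-element callback shape can be sketched in plain Python without Docling or a JVM. This is only an illustration of the control flow: process_elements, fake_classify, and the element dictionaries are hypothetical names and shapes, not the project's Python bridge, and the callback here is an ordinary Python function standing in for the Java classifier.

```python
from typing import Callable, Iterable


def process_elements(elements: Iterable[dict], classify: Callable[[str], str]) -> list[dict]:
    """Label each element by invoking the callback once per element."""
    labelled = []
    for element in elements:
        # In the real pipeline this call would cross into Java; here it is
        # a plain Python callable standing in for that callback.
        label = classify(element["text"])
        labelled.append({**element, "label": label})
    return labelled


def fake_classify(text: str) -> str:
    """Hypothetical stand-in for a Java classifier."""
    return "heading" if text.isupper() else "paragraph"


elements = [{"text": "INTRODUCTION"}, {"text": "Docling converts documents."}]
result = process_elements(elements, fake_classify)
```

The point of the shape is that the loop stays in Python while each classification decision is delegated to the host, one call per element.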
See also the Langchain4j agent demo showing a Java AI agent using Python spaCy for named entity extraction with bidirectional callbacks.
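When developing the Python half of a bidirectional contract, it can help to run it without a JVM. One possible approach, sketched here, is to register a stand-in module under the name Python would import from Java. The module name hostapi and the functions log and reverse follow the HostApi example above; the stub bodies and the process implementation are assumptions made for illustration.

```python
import sys
import types

# Build a stand-in for the module that the Java host would expose.
# The bodies below are plain-Python assumptions, not the Java behaviour.
logged = []
stub = types.ModuleType("hostapi")
stub.log = logged.append
stub.reverse = lambda s: s[::-1]
sys.modules["hostapi"] = stub

# From here on, code written against the host API imports as usual.
from hostapi import log, reverse


def process(text):
    """Hypothetical body for my_module.process."""
    log("processing " + text)
    return reverse(text)
```

Because the stub is installed in sys.modules before the import runs, the same process function should work unchanged when the real host module is exposed by the engine.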
mvn -B install

The CI workflow has two jobs:
- build - runs core + integration tests on Linux, macOS, and Windows with Java 25 + Python 3.13 + uv
- docling - runs the Docling end-to-end test (Linux only, heavier)
- Constitution - 12 design principles
- Product thesis - positioning and north-star use case (Docling)
- Architecture - runtime shape, distribution modes, module layout
- Contract-first API design
- Conversion model - primitives + JSON structural conversion
- Callbacks - pure FFM, no native helper
- Python environments - uvProject, venv, explicit, bundled
- Implementation plan - phases with estimates
- Risks and Open questions
- ADRs: 0001 | 0002 | 0003 | 0004 | 0005 | 0006
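As a rough illustration of the conversion model entry above (primitives plus JSON structural conversion), the analyzer contract can be exercised with a JSON round trip. The exact wire shape is an assumption here; the only thing taken from the examples is that a Document record is read in Python as a mapping with a "text" key and that Analysis is returned as a mapping.

```python
import json


def analyze(document):
    """Same contract as analyzer.py: mapping in, mapping out."""
    return {"language": "en", "wordCount": len(document["text"].split())}


# What a Document("Hello from CPython4J", "demo") record might look like
# after structural conversion (assumed shape, not a documented format).
incoming = json.loads('{"text": "Hello from CPython4J", "source": "demo"}')
outgoing = json.dumps(analyze(incoming))
```

On the Java side, the keys of the returned mapping would need to line up with the components of the Analysis record (language, wordCount).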