Some applications nowadays have proper support for Ollama, but they support llama.cpp only as an OpenAI-compatible API, despite its functional overlap with Ollama. This limits their local LLM management capabilities, such as detecting or unloading currently loaded models.
This application acts as a compatibility layer: it presents a llama.cpp server as an Ollama instance and translates Ollama REST endpoints into llama.cpp server calls.
```
Ollama client -> o_llama_relay (:11434) -> llama.cpp server (:8080)
```
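For example, a client can send a standard Ollama-format request to the relay on port 11434, and the relay turns it into a call against llama.cpp's OpenAI-compatible endpoint on port 8080. A minimal sketch (the model name is only a placeholder; the relay serves whatever model the llama.cpp server has loaded):

```
# Ollama-style chat request sent to the relay, not to llama.cpp directly.
# "my-model" is a placeholder; use whatever model your llama.cpp server has loaded.
curl http://localhost:11434/api/chat -d '{
  "model": "my-model",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
```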
- `/api/*` endpoints — Translated from Ollama format to llama.cpp's OpenAI-compatible format.
- `/v1/*` endpoints — Forwarded to llama.cpp without transformation.
- Streaming — `stream: true` is supported for translated `/api/generate` and `/api/chat` requests.
- Unloading — Supported by `keep_alive: 0` (see the example after this list).
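For instance, assuming the relay is listening on the default port, a loaded model can be unloaded the same way as with a real Ollama server (the model name is again a placeholder):

```
# Unload the currently loaded model by sending keep_alive: 0 (Ollama semantics).
curl http://localhost:11434/api/generate -d '{
  "model": "my-model",
  "keep_alive": 0
}'
```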
```
go build
./o_llama_relay --port=1234
```

Or with Docker Compose:

```
LLAMA_CPP_URL=http://host.docker.internal:8080 docker compose up --build
```

| Environment variable | Description |
|---|---|
| `OLLAMA_PORT` | External port to expose |
| `LLAMA_CPP_URL` | llama.cpp server URL |
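Both variables can be combined, for example to expose a different port and point at a llama.cpp server running on another host (the port and address below are purely illustrative):

```
# Example values only; adjust the port and URL to your setup.
OLLAMA_PORT=11435 LLAMA_CPP_URL=http://192.168.1.10:8080 docker compose up --build
```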
```
./o_llama_relay [options] [port] [host]
```

Every parameter can be set via a CLI argument or an environment variable. CLI arguments take priority over environment variables.
| CLI argument | Environment variable | Description | Default |
|---|---|---|---|
| `--port <n>` or positional number | `OLLAMA_PORT` | Listen port | `11434` |
| `--host <addr>` or positional string | `OLLAMA_HOST` | Listen address | `0.0.0.0` |
| `--llama-cpp-url <url>` | `LLAMA_CPP_URL` | llama.cpp server base URL | `http://127.0.0.1:8080` |
| `-v` | — | Verbose logging (show request routing) | off |
| `-vv` | — | Very verbose logging (show bodies/payloads) | off |
Positional arguments are supported for backward compatibility: a bare number is treated as the port, a bare string as the host address.
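As an illustration, the following invocations use the options above; the values are examples only, and the last line shows a CLI argument taking priority over the matching environment variable:

```
# Flags (long form, as in the table above)
./o_llama_relay --port 11435 --llama-cpp-url http://127.0.0.1:8080 -v

# Positional arguments: port, then host
./o_llama_relay 11435 127.0.0.1

# The CLI argument wins over the environment variable, so this listens on 11435
OLLAMA_PORT=11434 ./o_llama_relay --port 11435
```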