Add your own functions and tables to DuckDB — written in Rust, shipped as one binary.
No C++ extension to compile, no linking against DuckDB, no version coupling.
Created by Query.Farm
A VGI worker is a small Rust program that DuckDB talks to over Apache Arrow IPC. It can expose scalar / table / aggregate functions and whole catalogs (schemas, tables, views) that behave like native DuckDB objects. DuckDB launches your worker for you when a query needs it — you never run a server by hand.
This repo is the Rust worker SDK (vgi). It is
byte-for-byte wire-compatible with the canonical
Python SDK, so a Rust worker
drops in behind the same ATTACH ... (TYPE vgi). Built on
vgi-rpc; stock arrow-rs 58.x, MSRV 1.86.
| Traditional DuckDB extension | VGI worker |
|---|---|
| Written in C/C++, compiled and linked against DuckDB | Written in Rust, one standalone binary |
| Must be rebuilt for each DuckDB version | Version independent |
| Complex build / signing / release cycle | cargo build, ship the binary |
| Runs in-process | Process isolation |
Reach for it when you want to: call REST APIs from SQL, run ML inference, expose an external database / API / filesystem as a queryable catalog, or ship domain-specific functions to your team as a single binary.
1. Create a project and add the dependencies:
# Cargo.toml
[dependencies]
vgi = "0.1"
vgi-rpc = "0.2"
arrow-array = "58"
arrow-schema = "58"

2. Write a function and serve it:
// src/main.rs
use std::sync::Arc;

use arrow_array::{cast::AsArray, ArrayRef, RecordBatch, StringArray};
use arrow_schema::DataType;
use vgi::{ArgSpec, FunctionMetadata, ProcessParams, ScalarFunction, Worker};
use vgi_rpc::{Result, RpcError};

/// `upper_case(s)`: uppercase a string column.
struct UpperCase;

impl ScalarFunction for UpperCase {
    fn name(&self) -> &str {
        "upper_case"
    }

    fn metadata(&self) -> FunctionMetadata {
        FunctionMetadata {
            description: "Convert string values to uppercase".into(),
            return_type: Some(DataType::Utf8),
            ..Default::default()
        }
    }

    fn argument_specs(&self) -> Vec<ArgSpec> {
        vec![ArgSpec::column("value", 0, "varchar", "String to uppercase")]
    }

    fn process(&self, params: &ProcessParams, batch: &RecordBatch) -> Result<RecordBatch> {
        let col = batch.column(0).as_string::<i32>();
        let upper: StringArray = col.iter().map(|v| v.map(str::to_uppercase)).collect();
        let out: ArrayRef = Arc::new(upper);
        RecordBatch::try_new(params.output_schema.clone(), vec![out])
            .map_err(|e| RpcError::runtime_error(e.to_string()))
    }
}

fn main() {
    let mut worker = Worker::new();
    worker.register_scalar(UpperCase);
    worker.run(); // serves stdio (default), --unix <path>, or --http
}

3. Build it (cargo build --release), then call it from a DuckDB engine
that has the vgi extension. The vgi extension currently ships with Query
Farm's Haybarn DuckDB distribution, which runs without a separate install
via uvx haybarn-cli. From your project directory:
-- Haybarn ships the `vgi` extension. DuckDB LAUNCHES the worker for you;
-- LOCATION is the command it runs, and the alias 'demo' is what you
-- qualify functions with in SQL.
ATTACH 'demo' (TYPE vgi, LOCATION './target/release/my-worker');
SELECT demo.main.upper_case(name) FROM (VALUES ('alice'), ('bob')) t(name);
-- ALICE
-- BOB
LOCATION gotcha: the path is resolved relative to DuckDB's working directory, not your project. If the worker isn't found, use an absolute path.
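One way to sidestep the working-directory issue is to print the absolute path of the built worker and paste it into LOCATION. The following is a minimal standard-library sketch, not part of the SDK; `absolute_location` is a hypothetical helper, and `target/release/my-worker` is the binary name used in the example above.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: join a worker path onto a base directory.
/// An already-absolute worker path is returned unchanged, because
/// `Path::join` replaces the base when its argument is absolute.
fn absolute_location(base: &Path, worker: &Path) -> PathBuf {
    base.join(worker)
}

fn main() -> std::io::Result<()> {
    // Run this from the project directory, where `cargo build` put the binary.
    let cwd = env::current_dir()?;
    let location = absolute_location(&cwd, Path::new("target/release/my-worker"));

    // Paste the printed statement into your DuckDB session.
    println!("ATTACH 'demo' (TYPE vgi, LOCATION '{}');", location.display());
    Ok(())
}
```

The sketch does not escape single quotes, so it assumes the project path contains none.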
Change your Rust, rebuild, and re-attach — DuckDB pools the worker per attachment,
so DETACH demo; ATTACH 'demo' (...) (or a fresh session) picks up a new build.
- ATTACH can't find the worker: LOCATION is relative to DuckDB's working directory; use an absolute path.
- "Catalog Error: ... does not exist": qualify with the attach alias (demo.main.upper_case) or run USE demo;.
- Runtime / type errors: errors returned from process (and bind-time argument_specs type checks) surface directly in DuckDB's error message.
| Type | Trait | SQL pattern | Use case |
|---|---|---|---|
| Scalar | ScalarFunction | SELECT f(col) FROM t | Per-row transforms (1:1) |
| Table | TableFunction | SELECT * FROM f(args) | Generate / scan data |
| Table-In-Out | TableInOutFunction | SELECT * FROM f((SELECT …)) | Streaming transforms |
| Table-Buffering | TableBufferingFunction | SELECT * FROM f((SELECT …)) | Aggregate-then-emit (sink → combine → source) |
| Aggregate | AggregateFunction | SELECT f(col) … GROUP BY … | Grouped / window / streaming aggregates |
Each trait is small: name, metadata, argument_specs, an on_bind to resolve
the output schema, and process (or the buffering / aggregate lifecycle methods).
Projection & filter pushdown, ORDER BY / TABLESAMPLE hints, settings, secrets,
bearer auth, and a cross-process state store are handled for you.
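To make the "sink → combine → source" lifecycle of a table-buffering function concrete, here is a plain-Rust illustration with no SDK types. It assumes the phases follow the usual pattern of folding each input batch into a partial state, merging partial states, then emitting rows. `WordCount` and its method signatures are hypothetical and are not the TableBufferingFunction trait.

```rust
/// Hypothetical partial state; NOT the SDK's TableBufferingFunction trait.
#[derive(Default, Debug, PartialEq)]
struct WordCount {
    rows: usize,
    words: usize,
}

impl WordCount {
    /// sink: fold one input batch into this partial state.
    fn sink(&mut self, batch: &[&str]) {
        self.rows += batch.len();
        self.words += batch
            .iter()
            .map(|s| s.split_whitespace().count())
            .sum::<usize>();
    }

    /// combine: merge another partial state into this one.
    fn combine(&mut self, other: WordCount) {
        self.rows += other.rows;
        self.words += other.words;
    }

    /// source: emit output rows once all input has been consumed.
    fn source(&self) -> Vec<(String, usize)> {
        vec![
            ("rows".to_string(), self.rows),
            ("words".to_string(), self.words),
        ]
    }
}

fn main() {
    // Two partial states, as if two batches were sunk independently.
    let mut a = WordCount::default();
    let mut b = WordCount::default();
    a.sink(&["hello world", "duckdb"]);
    b.sink(&["one two three"]);

    a.combine(b);
    for (name, value) in a.source() {
        println!("{name} = {value}"); // rows = 3, then words = 6
    }
}
```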
Worker::set_catalog exposes a complete catalog — schemas, function-backed
tables, views, and macros — with constraints, column statistics, time
travel (AT), and secondary catalogs attachable by name:
ATTACH 'external_db' (TYPE vgi, LOCATION './my-catalog-worker');
SELECT * FROM external_db.main.users; -- a function-backed table
SELECT * FROM external_db.analytics.daily_view; -- a view
SELECT external_db.main.transform(col) FROM t; -- a function

Worker::run picks the transport from argv: stdio (default), Unix socket
(--unix <path>, the launcher contract), or HTTP (--http, Arrow-IPC over
HTTP with AEAD-sealed stateless stream tokens and optional bearer auth).
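The argv-based choice described above can be pictured with a small standard-library sketch. This is not the SDK's actual parsing code: `Transport` and `pick_transport` are hypothetical names, only the `--unix <path>` and `--http` flags come from the text, and any options that `--http` may take are not modeled.

```rust
use std::path::PathBuf;

/// Hypothetical model of the three transports named in the text.
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Unix(PathBuf),
    Http,
}

/// Pick a transport from command-line arguments (program name excluded).
/// Falls back to stdio when neither flag is present.
fn pick_transport<I, S>(args: I) -> Result<Transport, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        match arg.as_ref() {
            "--unix" => {
                let path = args
                    .next()
                    .ok_or_else(|| "--unix requires a path".to_string())?;
                return Ok(Transport::Unix(PathBuf::from(path.as_ref())));
            }
            "--http" => return Ok(Transport::Http),
            _ => {} // flags this sketch does not model are ignored
        }
    }
    Ok(Transport::Stdio)
}

fn main() {
    // Skip argv[0], the program name.
    match pick_transport(std::env::args().skip(1)) {
        Ok(transport) => println!("transport: {transport:?}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```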
VGI uses vgi-rpc, an Apache-Arrow-IPC RPC
framework, for all DuckDB ↔ worker communication. You don't write to this
directly — the traits handle it — but here's what happens per query:
DuckDB (client) VGI worker
│──── bind(request) ─────────────▶ │ function name, args, input schema
│◀─── BindResponse ─────────────── │ output schema (your on_bind)
│──── init(request) ─────────────▶ │ start the processing stream
│◀─── stream header ────────────── │ execution_id, max_workers
│──── process(batch) ────────────▶ │
│◀─── output batch ─────────────── │ your process(batch)
│──── [stream close] ────────────▶ │
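The ordering in the diagram above (bind, init, any number of process calls, then close) can be modeled as a small state machine. This is only a reading aid built from the diagram: `Phase`, `Request`, and `step` are hypothetical names, and the real vgi-rpc protocol may permit sequences this sketch rejects.

```rust
/// Hypothetical phases of one query, as read off the diagram.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Start,
    Bound,
    Streaming,
    Closed,
}

/// Hypothetical request kinds sent by DuckDB.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Request {
    Bind,
    Init,
    Process,
    Close,
}

/// Advance the per-query phase, rejecting out-of-order requests.
fn step(phase: Phase, req: Request) -> Result<Phase, String> {
    match (phase, req) {
        (Phase::Start, Request::Bind) => Ok(Phase::Bound),
        (Phase::Bound, Request::Init) => Ok(Phase::Streaming),
        (Phase::Streaming, Request::Process) => Ok(Phase::Streaming),
        (Phase::Streaming, Request::Close) => Ok(Phase::Closed),
        (p, r) => Err(format!("unexpected {r:?} in phase {p:?}")),
    }
}

fn main() {
    let requests = [
        Request::Bind,
        Request::Init,
        Request::Process,
        Request::Process,
        Request::Close,
    ];
    let mut phase = Phase::Start;
    for req in requests {
        phase = step(phase, req).expect("valid sequence");
        println!("{req:?} -> {phase:?}");
    }
}
```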
| crate | published | summary |
|---|---|---|
| vgi/ | ✅ crates.io · docs.rs | The worker SDK: function models, declarative catalogs, wire dispatch, transports. |
| vgi-example-worker/ | no | A fixture worker registering every function kind and full catalogs; drives the integration suite. publish = false. |
Read vgi-example-worker/src/ for a working example of every trait — scalar,
table, table-in-out, buffering, aggregate, and catalog-backed tables/views.
The fastest check is to call your function from a DuckDB session (see "Your first
worker"). For automated tests, drive the worker from Rust with vgi-rpc's client,
or shell out to a DuckDB session from your test harness.
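For the shell-out route, a test harness mostly needs a helper that starts a command, feeds it SQL, and captures what it prints. The sketch below uses only the standard library. It assumes the DuckDB CLI you launch reads SQL from standard input, which you should verify for your distribution; `run_with_stdin` is a hypothetical helper, and `cat` appears only as a stand-in command so the sketch runs without DuckDB installed.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical helper: run `program`, write `input` to its stdin,
/// and return everything it wrote to stdout.
/// Intended for small inputs; large payloads would need concurrent I/O.
fn run_with_stdin(program: &str, args: &[&str], input: &str) -> std::io::Result<String> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // The stdin handle is a temporary, so it is dropped right after the
    // write and the child sees end-of-file.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(input.as_bytes())?;

    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // `cat` is a stand-in that echoes its input; substitute the command
    // that starts your DuckDB session.
    let sql = "SELECT demo.main.upper_case('alice');\n";
    let out = run_with_stdin("cat", &[], sql)?;
    print!("{out}");
    Ok(())
}
```

In a real test you would assert on the captured output, for example that it contains the expected uppercase value.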
The full behavioral suite is the canonical VGI C++ integration suite
(test/sql/integration/* in the vgi extension repo), which drives DuckDB's
unittest binary against the example worker. It passes across all three transports
(8176 assertions on subprocess, 7774 on HTTP, 0 failures):
cargo build --release
scripts/run_tests.sh # subprocess transport, full in-scope suite
LAUNCH=1 scripts/run_tests.sh # launcher (Unix socket) transport
scripts/run_http_tests.sh # HTTP transport

cargo fmt / clippy / build / doc run in CI.
vgi depends on the published vgi-rpc from crates.io. To develop against an
unreleased vgi-rpc checkout, add an uncommitted patch to the root
Cargo.toml:
[patch.crates-io]
vgi-rpc = { path = "../vgi-rpc-rust/vgi-rpc" }

cargo build --workspace
cargo clippy -p vgi --all-targets --all-features -- -D warnings
cargo test --doc -p vgi
cargo fmt --all

Query Farm Source-Available License v1.0; see LICENSE. Copyright © 2025, 2026 Query Farm LLC.
