Redpanda Connect

Redpanda Connect is a stream processor that moves data between a wide range of sources and sinks, with support for hydration, enrichment, transformation, and filtering along the way.

Its connectors include a rich set of change-data-capture (CDC) sources for Postgres, MySQL, MongoDB, Oracle, MSSQL, and more, so database changes can flow through your pipelines as first-class events.

It uses Bloblang for mapping, runs as a single static binary or container image, and is easy to operate and monitor.

Highlights

  • Declarative pipelines — a stream topology fits in a single YAML file.
  • At-least-once delivery by default — in-process transactions, no disk state required.
  • A large connector catalog — cloud services, message brokers, databases, HTTP, and more.
  • First-class CDC — change-data-capture connectors for Postgres, MySQL, MongoDB, Oracle, and MSSQL.
  • Bloblang — a mapping language designed for stream data.
  • Cloud-friendly — stateless and horizontally scalable, with metrics and tracing built in.

Example

input:
  gcp_pubsub:
    project: foo
    subscription: bar

pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()
        root.user.age = this.user.age.number()

output:
  redis_streams:
    url: tcp://TODO:6379
    stream: baz
    max_in_flight: 20
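For readers new to Bloblang, the mapping above can be approximated in plain Go. This is a hypothetical sketch, not how Redpanda Connect evaluates Bloblang: it assumes each message is a JSON object with a `links` array and a `user.age` field holding either a number or a numeric string, and the helper name `applyMapping` is invented for illustration. Note that `root.meta` in this mapping is an ordinary field named `meta` in the output document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// applyMapping is a hypothetical plain-Go approximation of the example
// Bloblang mapping. It is NOT the Bloblang engine.
func applyMapping(raw []byte) (map[string]any, error) {
	var in map[string]any
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, fmt.Errorf("input is not a JSON object: %w", err)
	}

	// this.links.length(): links must be an array.
	links, ok := in["links"].([]any)
	if !ok {
		return nil, fmt.Errorf("links is not an array")
	}

	// this.user.age.number(): accept a JSON number or a numeric string.
	user, ok := in["user"].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("user is not an object")
	}
	var age float64
	switch v := user["age"].(type) {
	case float64:
		age = v
	case string:
		parsed, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return nil, fmt.Errorf("user.age is not numeric: %w", err)
		}
		age = parsed
	default:
		return nil, fmt.Errorf("user.age has unsupported type %T", v)
	}

	return map[string]any{
		"message": in, // root.message = this
		"meta":    map[string]any{"link_count": len(links)},
		"user":    map[string]any{"age": age},
	}, nil
}

func main() {
	out, err := applyMapping([]byte(`{"links":["a","b"],"user":{"age":"42"}}`))
	if err != nil {
		panic(err)
	}
	b, err := json.Marshal(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

In Bloblang a failing mapping flags the message with an error rather than returning one; the sketch simply returns an `error` to keep things short.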

Quickstart

Install

Linux:

curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
unzip rpk-linux-amd64.zip -d ~/.local/bin/

macOS (Homebrew):

brew install redpanda-data/tap/redpanda

Docker:

docker pull docker.redpanda.com/redpandadata/connect

See the getting started guide for more options.

Run

rpk connect run ./config.yaml

With Docker:

# From a config file
docker run --rm -v /path/to/your/config.yaml:/connect.yaml docker.redpanda.com/redpandadata/connect run

# With inline overrides
docker run --rm -p 4195:4195 docker.redpanda.com/redpandadata/connect run \
  -s "input.type=http_server" \
  -s "output.type=kafka" \
  -s "output.kafka.addresses=kafka-server:9092" \
  -s "output.kafka.topic=redpanda_topic"
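Each `-s` flag sets one config field by its dot-separated path. The Go sketch below shows how such `path=value` pairs correspond to a nested config tree. It is an illustration only, not Redpanda Connect's parser: `applyOverride` is a hypothetical helper, and it stores every value as a plain string without any of the type handling the real tool may apply (for example, `addresses` is a list in the Kafka output).

```go
package main

import (
	"fmt"
	"strings"
)

// applyOverride is a hypothetical helper that writes value into tree at the
// dot-separated path, creating intermediate maps as needed.
func applyOverride(tree map[string]any, assignment string) error {
	// Split on the first '=' only, so values may themselves contain '='.
	path, value, ok := strings.Cut(assignment, "=")
	if !ok || path == "" {
		return fmt.Errorf("expected path=value, got %q", assignment)
	}
	keys := strings.Split(path, ".")
	node := tree
	for _, k := range keys[:len(keys)-1] {
		child, exists := node[k].(map[string]any)
		if !exists {
			child = map[string]any{}
			node[k] = child
		}
		node = child
	}
	node[keys[len(keys)-1]] = value
	return nil
}

func main() {
	cfg := map[string]any{}
	for _, s := range []string{
		"input.type=http_server",
		"output.type=kafka",
		"output.kafka.addresses=kafka-server:9092",
		"output.kafka.topic=redpanda_topic",
	} {
		if err := applyOverride(cfg, s); err != nil {
			panic(err)
		}
	}
	fmt.Println(cfg)
}
```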

Connectors

The catalog includes AWS (DynamoDB, Kinesis, S3, SQS, SNS), Azure (Blob, Queue, Table), GCP (Pub/Sub, Cloud Storage, BigQuery), Kafka, NATS (JetStream, Streaming), NSQ, MQTT, AMQP 0.9.1 (RabbitMQ), AMQP 1.0, Redis, Cassandra, Elasticsearch, HDFS, HTTP (server, client, websockets), MongoDB, and SQL (MySQL, PostgreSQL, ClickHouse, MSSQL); many more are listed in the components documentation.

Delivery guarantees

Delivery guarantees can be a tricky subject. Redpanda Connect processes and acknowledges messages using an in-process transaction model with no disk-persisted state: a message is acknowledged at its source only after it has been delivered to the sink. As a result, when connecting at-least-once sources and sinks it can guarantee at-least-once delivery, even through crashes, disk corruption, or other server faults.

That's the default, with no caveats, which keeps deployment and scaling straightforward.
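The key idea is ordering: acknowledge only after the write succeeds, and hand the message back to the source otherwise. The Go sketch below simulates that with an in-memory queue and a sink that fails every other write. Every type and function in it (`queue`, `flakySink`, `run`) is hypothetical and does not reflect Redpanda Connect's internals; it only shows why this ordering never drops a message. In real systems a crash between a successful write and its ack can still cause a duplicate, which is why the guarantee is at-least-once rather than exactly-once.

```go
package main

import (
	"errors"
	"fmt"
)

// queue is a hypothetical at-least-once source: a message taken with next
// must be either acked (dropped for good) or nacked (queued for redelivery).
type queue struct {
	pending []string
}

func (q *queue) next() (msg string, ack func(), nack func(), ok bool) {
	if len(q.pending) == 0 {
		return "", nil, nil, false
	}
	msg = q.pending[0]
	q.pending = q.pending[1:]
	ack = func() {} // nothing to keep: the message is done
	nack = func() { q.pending = append(q.pending, msg) }
	return msg, ack, nack, true
}

// flakySink is a hypothetical sink that fails every other write attempt.
type flakySink struct {
	attempts int
	written  []string
}

func (s *flakySink) write(msg string) error {
	s.attempts++
	if s.attempts%2 == 1 {
		return errors.New("transient sink failure")
	}
	s.written = append(s.written, msg)
	return nil
}

// run forwards messages and acks only after the sink confirms the write.
func run(q *queue, s *flakySink) {
	for {
		msg, ack, nack, ok := q.next()
		if !ok {
			return
		}
		if err := s.write(msg); err != nil {
			nack() // not delivered yet: give it back to the source
			continue
		}
		ack()
	}
}

func main() {
	q := &queue{pending: []string{"a", "b", "c"}}
	s := &flakySink{}
	run(q, s)
	fmt.Println(s.written, s.attempts)
}
```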

Observability

Health checks

Two HTTP endpoints are exposed for orchestration probes:

  • /ping — liveness probe; always returns 200.
  • /ready — readiness probe; returns 200 once both input and output are connected, otherwise 503.

Metrics

Redpanda Connect exposes metrics to Statsd, Prometheus, a JSON HTTP endpoint, and other backends.

Tracing

OpenTelemetry traces are emitted natively, so you can visualize what's happening inside a pipeline end-to-end.

Configuration

Redpanda Connect ships with tooling for configuration discovery, debugging, and organization — see the configuration guide.

Documentation

Build from source

Requires a currently supported Go version:

git clone git@github.com:redpanda-data/connect
cd connect
task build:all

Plugins with external dependencies

Components that link against external C libraries (for example zmq4) aren't included by default. To pull them in, set the x_benthos_extra build tag:

# With go
go install -tags "x_benthos_extra" github.com/redpanda-data/connect/v4/cmd/redpanda-connect@latest

# With task
TAGS=x_benthos_extra task build:all

This tag may change or be split into more granular tags in future releases. If the required system libraries aren't installed, the build will fail with an error like ld: library not found for -lzmq.

Docker image

A multi-stage Dockerfile builds a minimal scratch-based image:

task docker:all
docker run --rm \
    -v /path/to/your/config.yaml:/config.yaml \
    -v /tmp/data:/data \
    -p 4195:4195 \
    docker.redpanda.com/redpandadata/connect run /config.yaml

Custom plugins

Writing your own plugins in Go is straightforward — check out the API docs and the example plugin repository for reference implementations.

Development

Redpanda Connect uses golangci-lint for linting and gofumpt for formatting. You can configure your editor to run gofumpt automatically; the gofumpt documentation has setup instructions for common editors.

task fmt    # format the codebase
task lint   # lint the codebase
task test   # unit and template tests

Contributing

Contributions are welcome. Before opening a pull request, please make sure it has been:

  • Unit tested with task test
  • Linted with task lint
  • Formatted with task fmt

Most integration tests spin up Docker containers, so they're skipped by task test. You can run them individually with:

go test -run "^Test.*Integration.*$" ./internal/impl/<connector directory>/...
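The `-run` flag takes a regular expression that is matched against test names, so the pattern above selects only tests whose names start with `Test` and contain `Integration` somewhere after it. The quick Go check below shows which names it selects; the test names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// integrationPattern is the -run expression from the command above.
var integrationPattern = regexp.MustCompile(`^Test.*Integration.*$`)

func main() {
	// Hypothetical test names, for illustration only.
	for _, name := range []string{
		"TestKafkaIntegration",
		"TestIntegrationRedisStreams",
		"TestKafkaConfigParse",
	} {
		fmt.Printf("%-30s %v\n", name, integrationPattern.MatchString(name))
	}
}
```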