Introduction
This proposal introduces a backup and restore capability for ClickHouse using a plugin-based architecture (#1792).
The solution is divided into two main parts:
- Operator Part
  - Extends the ClickHouse Operator with new Custom Resource Definitions (CRDs) for managing backups.
  - Reconciles these resources and delegates execution to an external plugin.
  - Handles scheduling, lifecycle management, and integration with the ClickHouseInstallation (CHI) object.
- Plugin Part
  - A standalone Go-based gRPC service that implements backup and restore logic.
  - Receives serialized CRD definitions from the operator and performs actual backup operations.
  - Provides well-defined APIs for Backup and Restore actions, returning status and metadata.
Separating orchestration (operator) from execution (plugin) keeps concerns cleanly divided, makes the system easier to extend, and allows different backup implementations without modifying operator core code.
1. Operator Part
The operator will be extended to support Backup and Restore functionality via a plugin-based architecture. This is aligned with the ClickHouse Operator Plugin Interface (COP-I), which introduces modular gRPC-based extensions for auxiliary features.
New Custom Resources (CRDs)
Two new CRDs will be introduced:
- ClickhouseBackup (CHB)
  - Represents a single backup request.
  - Defines scope (`dbTable` whitelist/blacklist), destination (e.g., S3), credentials, and metadata.
  - Operator responsibility:
    - Serialize CHB into JSON and send to the backup plugin.
    - Monitor backup status and update CR status (`running`, `completed`, `failed`).
- ClickhouseScheduledBackup (CHSB)
  - Represents scheduled backups.
  - Supports cron-like schedules (`schedule` field).
  - Options: `immediate` (trigger immediately), `suspend` (pause).
  - Operator responsibility:
    - Manage recurring backup triggers.
    - Ensure backup CRs are created as per schedule.
    - Route definitions to the plugin.
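As a rough illustration of the two resources above, the following Go sketch models a CHB and a CHSB spec, serializes the CHB to JSON for the plugin, and decides whether a scheduled backup should fire. The proposal does not fix a schema, so every field name (`clusterRef`, `include`, `exclude`, `destination`, `credentialsSecret`) and the exact semantics of `immediate` and `suspend` are assumptions made for this example.

```go
package main

import "encoding/json"

// BackupSpec is a hypothetical shape for the ClickhouseBackup (CHB) spec.
// Every field name below is an illustrative assumption, not the final schema.
type BackupSpec struct {
	ClusterRef        string   `json:"clusterRef"`                  // target ClickHouseInstallation
	Include           []string `json:"include,omitempty"`           // db/table whitelist
	Exclude           []string `json:"exclude,omitempty"`           // db/table blacklist
	Destination       string   `json:"destination"`                 // e.g. an S3 URL
	CredentialsSecret string   `json:"credentialsSecret,omitempty"` // Secret with storage credentials
}

// ScheduledBackupSpec is a hypothetical shape for ClickhouseScheduledBackup (CHSB).
type ScheduledBackupSpec struct {
	Schedule  string     `json:"schedule"`            // cron-like expression
	Immediate bool       `json:"immediate,omitempty"` // trigger once right away
	Suspend   bool       `json:"suspend,omitempty"`   // pause all triggers
	Backup    BackupSpec `json:"backup"`              // template for created CHBs
}

// SerializeBackupSpec renders the spec as the JSON payload sent to the plugin.
func SerializeBackupSpec(spec BackupSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// ShouldTrigger reports whether the operator should create a new CHB now.
// Assumed semantics: suspend always wins; immediate fires once if no backup
// has run yet; otherwise the cron schedule decides (passed in as scheduleDue).
func ShouldTrigger(s ScheduledBackupSpec, hasRunBefore, scheduleDue bool) bool {
	if s.Suspend {
		return false
	}
	if s.Immediate && !hasRunBefore {
		return true
	}
	return scheduleDue
}
```

Cron parsing is deliberately left out: the caller is assumed to evaluate the `schedule` expression and pass the result in as `scheduleDue`.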
Operator Responsibilities
- CR Lifecycle Management: ensure CHB/CHSB resources are reconciled, status updated, and cleanup performed.
- Plugin Discovery: detect backup plugin services via:
  - `altinity.com/pluginName` label (e.g., `clickhouse.backup.altinity.com`)
  - `altinity.com/pluginPort` annotation
- gRPC Invocation: marshal CHI + Backup specs into JSON, invoke plugin APIs, and update CR status.
- Restore Flow: extend the ClickHouseInstallation (CHI) CR with a `bootstrap.recovery` section pointing to a `backupRef`.
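The plugin discovery step could look roughly like the sketch below, which checks the `altinity.com/pluginName` label and reads the port from the `altinity.com/pluginPort` annotation. `ServiceMeta` is a hypothetical stand-in for the metadata of a Kubernetes Service, and the returned `name.namespace.svc:port` address assumes standard in-cluster DNS; neither is specified by the proposal.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Label and annotation keys taken from the proposal.
	PluginNameLabel      = "altinity.com/pluginName"
	PluginPortAnnotation = "altinity.com/pluginPort"
	// Example label value from the proposal.
	BackupPluginName = "clickhouse.backup.altinity.com"
)

// ServiceMeta is a hypothetical stand-in for a Kubernetes Service's metadata.
type ServiceMeta struct {
	Name, Namespace     string
	Labels, Annotations map[string]string
}

// ResolveBackupPlugin returns the gRPC address of a backup plugin service,
// or an error if the service is not labeled as one or has no valid port.
func ResolveBackupPlugin(svc ServiceMeta) (string, error) {
	if svc.Labels[PluginNameLabel] != BackupPluginName {
		return "", fmt.Errorf("service %s/%s is not a backup plugin", svc.Namespace, svc.Name)
	}
	rawPort, ok := svc.Annotations[PluginPortAnnotation]
	if !ok {
		return "", fmt.Errorf("service %s/%s has no %s annotation", svc.Namespace, svc.Name, PluginPortAnnotation)
	}
	port, err := strconv.Atoi(rawPort)
	if err != nil || port < 1 || port > 65535 {
		return "", fmt.Errorf("service %s/%s has invalid plugin port %q", svc.Namespace, svc.Name, rawPort)
	}
	// Assumes in-cluster DNS of the form <service>.<namespace>.svc.
	return fmt.Sprintf("%s.%s.svc:%d", svc.Name, svc.Namespace, port), nil
}
```

In a real operator the metadata would come from the Service objects the controller lists or watches; the resolved address would then be used to dial the plugin over gRPC.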
2. Plugin Part
The backup plugin will be implemented as a gRPC service deployed independently from the operator.
The operator communicates with it using the defined protobuf contracts.
Exposed gRPC APIs
1. Backup API
- BackupRequest
  - chi_definition: JSON of the target ClickHouseInstallation
  - backup_definition: JSON of the ClickhouseBackup / ClickhouseScheduledBackup
  - parameters: Optional overrides (compression, retention policy, etc.)
- BackupResult
  - backup_id, backup_name
  - started_at, stopped_at
  - metadata: Plugin-specific info (S3 path, compression type, etc.)
2. Restore API
- RestoreRequest
  - chi_definition: JSON of the cluster to restore
  - backup_definition: JSON of the ClickhouseBackup / ClickhouseScheduledBackup
- RestoreResponse
  - restore_id, restore_name
  - started_at, stopped_at
  - metadata: Additional info (restored tables, PITR info, etc.)
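To make the Backup API messages more concrete, here is a minimal Go sketch that mirrors `BackupRequest` and `BackupResult` as plain structs and shows how the operator might build a request and map a result onto the CR status. The field types (string maps, `time.Time`), the `NewBackupRequest` and `StatusFor` helpers, and the mapping rules are assumptions for illustration; the real types would be generated from the protobuf contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BackupRequest mirrors the message fields listed in the proposal.
type BackupRequest struct {
	ChiDefinition    string            // JSON of the target ClickHouseInstallation
	BackupDefinition string            // JSON of the CHB / CHSB
	Parameters       map[string]string // optional overrides
}

// BackupResult mirrors the result fields; the Go types are assumptions.
type BackupResult struct {
	BackupID, BackupName string
	StartedAt, StoppedAt time.Time
	Metadata             map[string]string
}

// NewBackupRequest marshals the CHI and backup objects into the JSON strings
// carried by the request. Both arguments can be any JSON-serializable value.
func NewBackupRequest(chi, backup any, params map[string]string) (BackupRequest, error) {
	chiJSON, err := json.Marshal(chi)
	if err != nil {
		return BackupRequest{}, fmt.Errorf("marshal CHI: %w", err)
	}
	backupJSON, err := json.Marshal(backup)
	if err != nil {
		return BackupRequest{}, fmt.Errorf("marshal backup: %w", err)
	}
	return BackupRequest{
		ChiDefinition:    string(chiJSON),
		BackupDefinition: string(backupJSON),
		Parameters:       params,
	}, nil
}

// StatusFor maps a plugin call outcome onto the CR status values from the
// proposal. Assumed rule: an error means failed, a result without a stop
// time means running, anything else means completed.
func StatusFor(res *BackupResult, err error) string {
	switch {
	case err != nil:
		return "failed"
	case res == nil || res.StoppedAt.IsZero():
		return "running"
	default:
		return "completed"
	}
}
```

A `RestoreRequest` and `RestoreResponse` pair would follow the same pattern, without the `parameters` field.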
Features
- Backup Types
  - Cluster-wide or per-db/table (with whitelist/blacklist)
  - Default: back up everything except `system` schemas
- Storage
  - S3-compatible destinations
- Scheduling
  - Cron-based recurring backups
- Restore
  - Support bootstrap recovery from a defined `backupRef`
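The whitelist/blacklist scoping and the default exclusion of system schemas could be implemented along these lines. This is only a sketch: the entry format (either `db` or `db.table`), the precedence of blacklist over whitelist, and the set of databases treated as system schemas are all assumptions, since the proposal does not define them.

```go
package main

import "strings"

// systemDatabases lists databases assumed to count as "system schemas".
// Keys are lower-case; lookups lower-case the database name first.
var systemDatabases = map[string]bool{
	"system":             true,
	"information_schema": true,
}

// matches reports whether any entry names the whole database ("db")
// or the specific table ("db.table").
func matches(entries []string, db, table string) bool {
	for _, e := range entries {
		if e == db || e == db+"."+table {
			return true
		}
	}
	return false
}

// ShouldBackup decides whether db.table is in scope for a backup.
// Assumed precedence: the blacklist always wins; a non-empty whitelist
// restricts the scope to its entries; with no whitelist, everything
// except system schemas is backed up.
func ShouldBackup(db, table string, include, exclude []string) bool {
	if matches(exclude, db, table) {
		return false
	}
	if len(include) > 0 {
		return matches(include, db, table)
	}
	return !systemDatabases[strings.ToLower(db)]
}
```

Under these assumptions, `ShouldBackup("system", "query_log", nil, nil)` is false, while listing `system` explicitly in the whitelist would opt it back in.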
Benefits
- Separation of Concerns: Operator focuses on orchestration, plugin handles backup mechanics
- Modularity: Backup logic can evolve independently
- Extensibility: Community or vendors can build custom backup plugins without forking operator code
- Consistency: Standard gRPC-based interface ensures compatibility