Summary
Add clusterDeletionBehavior to ClusterProfile.spec so that a single key specifies what happens to deployed resources when a cluster is deleted. Three values are available, with RemovePolicies as the default.
LeavePolicies — Leave deployed resources (Helm/manifests) intact
RemovePolicies — Best-effort deletion (MUST NOT block with Runtime Hook)
EnforceRemovePolicies — Ensure deletion (Block using CAPI Runtime Hook until deletion completes)
This provides explicit control for "cluster deletion" behavior, complementing stopMatchingBehavior (behavior when "match is lost").
Proposal (API)
# ClusterProfile (CRD sketch)
spec:
  # When the Cluster resource itself is being deleted,
  # what should Sveltos do with resources deployed by this ClusterProfile?
  clusterDeletionBehavior: LeavePolicies | RemovePolicies | EnforceRemovePolicies
  # default: RemovePolicies
Semantics
- LeavePolicies
Do not delete anything (leave resources in place).
- RemovePolicies (default)
Delete deployed resources on a best-effort basis. Cluster deletion MUST NOT be blocked by the Runtime Hook.
Even if a Runtime Extension is registered, the Hook returns success immediately (or is not used at all), and cleanup proceeds without blocking.
- EnforceRemovePolicies
Hold cluster deletion until cleanup is observed to be complete. This uses the BeforeClusterDelete Hook described in CAPI's Lifecycle Hook Runtime Extensions: the handler keeps returning retryAfterSeconds, so CAPI retries and deletion stays blocked until cleanup finishes.
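The three values and the default can be modeled as a small Go enum. This is only a sketch of the semantics listed here: the type name, the deletionPlan struct, the planFor helper, and the kubebuilder marker are illustrative assumptions, not the actual Sveltos API.

```go
package main

import "fmt"

// ClusterDeletionBehavior is an illustrative Go type for the proposed field.
// The marker below is how kubebuilder would typically restrict the values;
// it is an assumption, not the actual Sveltos API definition.
// +kubebuilder:validation:Enum=LeavePolicies;RemovePolicies;EnforceRemovePolicies
type ClusterDeletionBehavior string

const (
	LeavePolicies         ClusterDeletionBehavior = "LeavePolicies"
	RemovePolicies        ClusterDeletionBehavior = "RemovePolicies"
	EnforceRemovePolicies ClusterDeletionBehavior = "EnforceRemovePolicies"
)

// deletionPlan (hypothetical) summarizes what the semantics imply for one ClusterProfile.
type deletionPlan struct {
	DeleteResources bool // false only for LeavePolicies
	MayBlockCluster bool // true only for EnforceRemovePolicies
}

// planFor maps a (possibly empty) field value to its plan.
// An empty value falls back to RemovePolicies for backward compatibility.
func planFor(b ClusterDeletionBehavior) (deletionPlan, error) {
	switch b {
	case LeavePolicies:
		return deletionPlan{}, nil
	case "", RemovePolicies:
		return deletionPlan{DeleteResources: true}, nil
	case EnforceRemovePolicies:
		return deletionPlan{DeleteResources: true, MayBlockCluster: true}, nil
	default:
		return deletionPlan{}, fmt.Errorf("unknown clusterDeletionBehavior %q", b)
	}
}

func main() {
	for _, b := range []ClusterDeletionBehavior{"", LeavePolicies, RemovePolicies, EnforceRemovePolicies} {
		plan, err := planFor(b)
		fmt.Printf("%q -> %+v (err: %v)\n", b, plan, err)
	}
}
```

Keeping the "may block" decision separate from the "delete resources" decision makes it explicit that RemovePolicies and EnforceRemovePolicies differ only in whether cluster deletion can be held back.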
Dependency-aware deletion order (Important)
- Consider ClusterProfile's dependsOn and execute deletion in reverse dependency order (dependents first, their dependencies last).
Example: If a depends on b → Delete in order a → b.
- The implementation builds a DAG among the ClusterProfiles for the target cluster and processes it in reverse topological order.
Controller behavior (high-level)
- Detect a Cluster with metadata.deletionTimestamp set and enumerate the associated ClusterProfiles.
- Analyze dependsOn and sort in reverse topological order (dependents first).
- Apply clusterDeletionBehavior in sorted order:
LeavePolicies → Leave in place
RemovePolicies → Best-effort deletion (no Hook blocking / async progress)
EnforceRemovePolicies → Wait for completion with Hook coordination (BeforeClusterDelete/retryAfterSeconds)
- Reflect progress/results in ClusterSummary / status.conditions.
- Backward compatibility: an unspecified value defaults to RemovePolicies.
Examples
Best-effort deletion (default/non-blocking)
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: cleanup-on-delete
spec:
  clusterSelector:
    matchLabels: { env: prod }
  stopMatchingBehavior: WithdrawPolicies
  clusterDeletionBehavior: RemovePolicies
Ensure deletion (block with Hook)
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: strict-cleanup
spec:
  clusterSelector:
    matchLabels: { env: prod }
  stopMatchingBehavior: WithdrawPolicies
  clusterDeletionBehavior: EnforceRemovePolicies
Reference Information
CAPI Runtime Hook (BeforeClusterDelete) Key Points
CAPI's Runtime SDK provides extensions (Runtime Extensions) that can hook into the cluster lifecycle. BeforeClusterDelete is called immediately before cluster deletion starts and can block deletion until add-on cleanup completes (by returning retryAfterSeconds so that CAPI retries). See the Cluster API Book's Lifecycle Hook Runtime Extensions for details.
A Runtime Extension is implemented as an HTTPS server that registers handlers (e.g., BeforeClusterDelete). Blocking is achieved simply by returning retryAfterSeconds, which causes CAPI to call the Hook again later. For implementation details, see the Cluster API Book's Implementing Runtime Extensions.
This design makes it possible to implement "wait until deletion completes" for clusterDeletionBehavior: EnforceRemovePolicies in this proposal. Conversely, with RemovePolicies the Hook returns success immediately (or is not registered) and cleanup runs asynchronously, so deletion is never blocked.
ExtensionConfig Registration Example (CAPI side)
Minimal example of an ExtensionConfig that registers the Runtime Extension with the management cluster (the Service and TLS Secret are prepared separately):
apiVersion: runtime.cluster.x-k8s.io/v1alpha1
kind: ExtensionConfig
metadata:
  name: sveltos-cleanup-gate
  annotations:
    runtime.cluster.x-k8s.io/inject-ca-from-secret: sveltos-cleanup/ext-svc-cert
spec:
  clientConfig:
    service:
      name: sveltos-cleanup-svc # Runtime Extension Service name
      namespace: sveltos-cleanup # Deployment namespace
      port: 443
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
          - default # Example: Apply to Clusters in default namespace
The ExtensionConfig declares which clusters the Runtime Extension applies to. In this example, the namespaceSelector enables the Hook for Clusters in the default namespace. For detailed configuration, see the Cluster API Book's Implementing Runtime Extensions.
Hook Handler Implementation Minimal Code (Go/pseudo)
Minimal skeleton that waits for add-on uninstall to complete in BeforeClusterDelete (replace the decision logic with whatever fits your operations):
package main

import (
	"context"

	runtimecatalog "sigs.k8s.io/cluster-api/exp/runtime/catalog"
	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
	"sigs.k8s.io/cluster-api/exp/runtime/server"
	ctrl "sigs.k8s.io/controller-runtime"
)

var catalog = runtimecatalog.New()

func init() {
	// Register the Runtime Hook types so the server can serve them.
	_ = runtimehooksv1.AddToCatalog(catalog)
}

func main() {
	// The Service port (443 in the ExtensionConfig) is assumed to target 9443.
	s, err := server.New(server.Options{Catalog: catalog, Port: 9443, CertDir: "/certs"})
	if err != nil {
		panic(err)
	}
	if err := s.AddExtensionHandler(server.ExtensionHandler{
		Hook:        runtimehooksv1.BeforeClusterDelete,
		Name:        "before-cluster-delete",
		HandlerFunc: DoBeforeClusterDelete,
	}); err != nil {
		panic(err)
	}
	// Serve until the process receives SIGTERM/SIGINT.
	if err := s.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

// DoBeforeClusterDelete blocks cluster deletion until add-on cleanup is done.
func DoBeforeClusterDelete(
	ctx context.Context,
	req *runtimehooksv1.BeforeClusterDeleteRequest,
	resp *runtimehooksv1.BeforeClusterDeleteResponse,
) {
	log := ctrl.LoggerFrom(ctx)
	// Example: Implement HelmChartProxy uninstall completion check here
	// (Read management cluster API, check Sveltos/CAAPH status, etc.)
	ready := addonsCleanupCompleted(ctx, req.Cluster.Namespace, req.Cluster.Name)
	if !ready {
		resp.Status = runtimehooksv1.ResponseStatusSuccess
		resp.Message = "waiting for add-on cleanup"
		resp.RetryAfterSeconds = 10 // Block until complete (CAPI will retry)
		log.Info(resp.Message)
		return
	}
	resp.Status = runtimehooksv1.ResponseStatusSuccess // Complete → Continue deletion
}

// addonsCleanupCompleted is a placeholder for the real completion check.
// As written it always reports "not done", so deletion stays blocked.
func addonsCleanupCompleted(ctx context.Context, namespace, name string) bool {
	return false
}
Key point: Simply setting resp.RetryAfterSeconds achieves "stop deletion → retry later". Returning Success rather than Failure while waiting is the more practical choice operationally (do not fail unless the error is permanent). This implementation pattern is recommended in the Cluster API Book's Runtime Extensions implementation guide.
Actual PoC Testing Notes
A local PoC confirmed "Block deletion with Hook → Continue deletion after add-on completion" with the following flow:
- Build and deploy the sample Runtime Extension (configuration as above).
- Deploy the Service + TLS Secret and apply the ExtensionConfig.
- Create a test Cluster and apply add-ons (Helm/manifests).
- Execute kubectl delete cluster ....
- The Hook keeps returning retryAfterSeconds, and deletion pauses.
- Complete the uninstall during this time (delete the Helm release, etc.) → When the check becomes ready, deletion resumes.
Test manifests/scripts are available at kubernetes-playground/capi/runtime-hooks (includes ExtensionConfig/server templates and manual test procedure notes).
Correspondence with This Proposal (clusterDeletionBehavior)
LeavePolicies … Runtime Hook not registered (or always returns success); nothing is deleted.
RemovePolicies (default) … Delete whatever can be deleted without blocking. The Runtime Hook is unused or returns success immediately; cleanup is asynchronous.
EnforceRemovePolicies … Leverage CAPI's Lifecycle Hook feature: enable BeforeClusterDelete and block with retryAfterSeconds. Deletion continues once cleanup completion is observed.
Notes
The clusterDeletionBehavior: EnforceRemovePolicies option, which uses the CAPI Runtime Hook (BeforeClusterDelete), is only supported for clusters created using a ClusterClass.
For clusters not created with a ClusterClass, the Runtime Hook will not be invoked (see reference: kubernetes-sigs/cluster-api#11491).
Implementation details:
- The controller logic for EnforceRemovePolicies starts, like RemovePolicies, when the target cluster has metadata.deletionTimestamp set.
- For clusters created with a ClusterClass, the CAPI Runtime Hook (BeforeClusterDelete) will also be delivered to the controller.
Upon receiving this Hook, the controller runs an additional blocking step to wait for add-on cleanup to complete.
- For non-ClusterClass clusters, the Hook is not triggered, so deletion proceeds asynchronously without blocking, just like RemovePolicies.
- The only difference between EnforceRemovePolicies and RemovePolicies is whether a blocking step is executed in the Runtime Hook server.
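The blocking step in the Hook server can be reduced to a small pure function. The sketch below assumes the handler is given a list of the ClusterProfiles deployed to the cluster being deleted; the profileState type and the retryAfter helper are hypothetical names, and how the proposal aggregates several profiles on one cluster is an assumption here. Only EnforceRemovePolicies profiles with unfinished cleanup cause a non-zero retryAfterSeconds.

```go
package main

import "fmt"

// profileState is a hypothetical view of one ClusterProfile deployed to the
// cluster that is being deleted.
type profileState struct {
	Name        string
	Behavior    string // value of clusterDeletionBehavior ("" means default)
	CleanupDone bool   // whether its deployed resources have been removed
}

// retryAfter returns the retryAfterSeconds value for the hook response and
// the names of the profiles still being waited on. 0 means "do not block".
func retryAfter(profiles []profileState, retrySeconds int32) (int32, []string) {
	var pending []string
	for _, p := range profiles {
		// LeavePolicies and RemovePolicies (including the default) never block.
		if p.Behavior == "EnforceRemovePolicies" && !p.CleanupDone {
			pending = append(pending, p.Name)
		}
	}
	if len(pending) == 0 {
		return 0, nil
	}
	return retrySeconds, pending
}

func main() {
	profiles := []profileState{
		{Name: "monitoring", Behavior: "RemovePolicies"},
		{Name: "storage", Behavior: "EnforceRemovePolicies"},
	}

	seconds, pending := retryAfter(profiles, 10)
	fmt.Println(seconds, pending) // still waiting on "storage"

	profiles[1].CleanupDone = true
	seconds, pending = retryAfter(profiles, 10)
	fmt.Println(seconds, pending) // nothing pending, deletion may continue
}
```

Since the Hook is never invoked for non-ClusterClass clusters, this function simply does not run for them, which matches the fallback to non-blocking behavior described in the list.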
SveltosCluster case:
Deleting a SveltosCluster resource does not necessarily mean the cluster itself is being deleted.
It simply means the cluster is no longer managed by Sveltos. Therefore, no special deletion or blocking behavior is performed.