feat(cloudflare): custom hostname SNI overrides and migration support #6211

Open
mrozentsvayg wants to merge 6 commits into kubernetes-sigs:master from conduitxyz:mrozentsvayg/custom_origin_sni
Conversation

@mrozentsvayg
Contributor

@mrozentsvayg mrozentsvayg commented Feb 21, 2026

What does it do?

Three improvements to Cloudflare custom hostname management:

1. Origin SNI overrides

Extends the cloudflare-custom-hostname annotation to support Origin SNI
overrides using the format <customHostname>=<customOriginSNI>.

  • No = suffix: SNI defaults to the origin server (existing behaviour, unchanged)
  • Trailing = with no value: SNI is set to the request Host header
  • =<value>: SNI is set to the specified hostname

The custom_origin_sni field is only sent to the Cloudflare API when
explicitly overridden (i.e. when SNI differs from the origin server),
since the field requires a Cloudflare account entitlement.
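
As an illustration of the three annotation forms above, here is a minimal Go sketch of how one entry could be parsed. The type and function names (`sniMode`, `customHostnameSpec`, `parseCustomHostname`) are hypothetical and are not taken from this PR's diff; only the `<customHostname>=<customOriginSNI>` format and its three cases come from the description.

```go
package main

import "strings"

// sniMode describes how the Origin SNI is chosen for a custom hostname.
// These names are illustrative only; they are not the PR's identifiers.
type sniMode int

const (
	sniOriginDefault sniMode = iota // no "=": SNI defaults to the origin server
	sniRequestHost                  // trailing "=": SNI follows the request Host header
	sniExplicit                     // "=<value>": SNI is the given hostname
)

// customHostnameSpec is a hypothetical parsed form of one annotation entry.
type customHostnameSpec struct {
	Hostname string
	Mode     sniMode
	SNI      string // only set when Mode == sniExplicit
}

// parseCustomHostname splits one "<customHostname>=<customOriginSNI>" entry
// and classifies it into one of the three documented cases.
func parseCustomHostname(entry string) customHostnameSpec {
	host, sni, hasEq := strings.Cut(strings.TrimSpace(entry), "=")
	switch {
	case !hasEq:
		return customHostnameSpec{Hostname: host, Mode: sniOriginDefault}
	case sni == "":
		return customHostnameSpec{Hostname: host, Mode: sniRequestHost}
	default:
		return customHostnameSpec{Hostname: host, Mode: sniExplicit, SNI: sni}
	}
}
```

Under this sketch, a caller would only populate `custom_origin_sni` in the API request when `Mode` is not `sniOriginDefault`, which matches the entitlement concern described above.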

2. Skip self-referential custom hostnames

When a custom hostname resolves to a DNS name in the same managed zone,
creating a CH would be self-referential and is skipped with a warning.
This prevents Cloudflare API errors when the custom hostname and origin
are in the same zone.
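
The same-zone check can be sketched roughly as follows. This is an assumption about the shape of the check, not the PR's code: `inZone` and `isSelfReferential` are hypothetical helpers, and the real implementation may compare against the zone differently.

```go
package main

import "strings"

// inZone reports whether name equals zone or is a subdomain of it.
// Comparison is case-insensitive and ignores a trailing dot.
func inZone(name, zone string) bool {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	zone = strings.ToLower(strings.TrimSuffix(zone, "."))
	if zone == "" {
		return false
	}
	return name == zone || strings.HasSuffix(name, "."+zone)
}

// isSelfReferential is a hypothetical helper: it returns true when the
// custom hostname lives in the managed zone, in which case creating a
// custom hostname for it would be skipped with a warning.
func isSelfReferential(customHostname, managedZone string) bool {
	return inZone(customHostname, managedZone)
}
```

Matching on `"." + zone` rather than a bare suffix avoids treating `notexample.com` as part of `example.com`.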

3. Skip CH lifecycle with annotation value "-"

Setting the annotation to - tells external-dns to skip custom hostname
management for that record entirely — no create, update, or delete.
Existing custom hostnames in Cloudflare are left untouched.

This enables migrating CH ownership to another controller (e.g. a
Kubernetes operator) without triggering SSL re-validation from
delete+create cycles.

Motivation

Cloudflare for SaaS supports routing custom hostname traffic to different
backends based on SNI. Without SNI control, all custom hostnames must share
the same TLS routing, which is too restrictive for multi-tenant setups where
different customers need traffic routed to different origins at the TLS layer.

For example, when using Envoy Gateway in Merged Gateways deployment mode,
each custom hostname needs its own SNI to correctly route TLS traffic to the
right backend — something that wasn't possible before without manual
Cloudflare configuration.

The - sentinel value solves a critical migration problem: removing the
CH annotation causes external-dns to delete existing custom hostnames,
which triggers SSL re-validation (DCV). Cloudflare rate-limits DCV
attempts, and a delete+create loop can cause SSL outages that persist
until the rate limit expires (hours).

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mloiseleur for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added provider Issues or PRs related to a provider registry Issues or PRs related to a registry needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 21, 2026
@k8s-ci-robot
Contributor

Hi @mrozentsvayg. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 21, 2026
@ivankatliarchuk
Member

ivankatliarchuk commented Feb 25, 2026

This PR is more about traffic routing than DNS management, so there is a good chance it is out of scope. It would be worth double-checking with the owners on Slack to see what they think.

@ivankatliarchuk
Member

A similar PR: #6085

@AndrewCharlesHay
Contributor

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2026
@mrozentsvayg mrozentsvayg force-pushed the mrozentsvayg/custom_origin_sni branch from d264a2b to d812810 Compare March 14, 2026 18:18
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2026
@mrozentsvayg mrozentsvayg changed the title feat(cloudflare): support custom hostname Origin SNI overrides feat(cloudflare): custom hostname SNI overrides and migration support Mar 14, 2026
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 24, 2026
Adds <customHostname>=<customOriginSNI> annotation format for controlling
the Origin SNI on Cloudflare custom hostnames. A trailing = sets SNI to
the request Host header; omitting = defaults SNI to the origin server.

Only sends custom_origin_sni to the API when explicitly overridden, since
the field requires a Cloudflare account entitlement.
When the cloudflare-custom-hostname annotation is set to "-", external-dns
skips CH management entirely for that record — no create, update, or delete.
Existing custom hostnames in Cloudflare are left untouched. This enables
migrating CH ownership to another controller without triggering SSL
re-validation from delete+create cycles.
Store custom hostnames from Records() on the provider struct. In
AdjustEndpoints(), when annotation is "-", replace it with the current
CF state so desired matches current and the plan produces no changes.

Extract chAnnotationHostname() to deduplicate SNI annotation format
logic between groupByNameAndTypeWithCustomHostnames and the new
chAnnotationForOrigin() helper.
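
The sentinel handling described in this commit message can be sketched as below. This is a simplified illustration, not the PR's code: `resolveDesired` and the `current` map (DNS name to the custom hostname annotation value last observed in Cloudflare) are hypothetical stand-ins for the state stored from Records() and the substitution done in AdjustEndpoints().

```go
package main

// skipSentinel is the annotation value that opts a record out of
// custom hostname management.
const skipSentinel = "-"

// resolveDesired returns the annotation value to use when computing the plan.
// When the annotation is the sentinel, the current Cloudflare state for that
// DNS name is substituted, so desired equals current and no change is planned.
// The current map is an assumed representation of the state kept from Records().
func resolveDesired(annotation, dnsName string, current map[string]string) string {
	if annotation != skipSentinel {
		return annotation
	}
	return current[dnsName]
}
```

Because the substituted value is whatever Cloudflare currently holds, a record with no existing custom hostname resolves to an empty value and nothing is created for it.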
…r custom hostnames

The '-' sentinel prevents custom hostname creation/updates but does not
prevent deletion during DNS record lifecycle changes (e.g., parentRef
transitions). Document this limitation and provide a step-by-step
migration guide for transferring CH management to an external controller.
@mrozentsvayg mrozentsvayg force-pushed the mrozentsvayg/custom_origin_sni branch from 08bc665 to b7cec8b Compare March 24, 2026 17:56
@mrozentsvayg
Contributor Author

@mloiseleur
This was discussed in the #external-dns Slack channel; no blockers were raised. With origin SNI support and coexistence with external controllers, this PR helps existing users move non-DNS Cloudflare features out of external-dns smoothly.

What do you think?


2. **Verify the other controller is managing all custom hostnames correctly.** Both can coexist safely at this stage -- `-` prevents external-dns from interfering with creates/updates.

3. **Disable custom hostnames in external-dns entirely** by removing the `--cloudflare-custom-hostnames` flag (or setting it to false) and redeploying. When disabled, external-dns does not perform any custom hostname operations -- no creates, no updates, and no deletes. DNS record lifecycle (A/CNAME records) continues normally without any custom hostname side effects.
Collaborator

@mloiseleur mloiseleur Mar 29, 2026

🤔 I'm not sure I follow. Why would someone who wants to migrate custom hostname management away from external-dns take the risk of allowing external-dns to delete those entries?

If I were in this case, I would probably disable this feature entirely in external-dns as my first step.

And if, for whatever reason, I need to do it progressively, I would most likely:

  1. Set external-dns replicas to 0
  2. Remove CH annotations on all relevant endpoints
  3. Restore external-dns replicas to 1

We can add a doc or guide about this, but I'm not sure I understand the value of a skip feature that does not really skip and may still delete.

Contributor Author

The use case is progressive migration in a production environment where external-dns manages both DNS records and custom hostnames for 170+ endpoints across multiple deployments.

You can't scale external-dns to 0 -- it manages A records for all endpoints, not just custom hostnames. And you can't remove CH annotations from all endpoints at once when migrating to a new CH manager (in our case a dedicated Kubernetes operator). You need to verify the new manager works per-endpoint before moving to the next.

The - sentinel enables per-endpoint opt-out:

  1. Set - on one endpoint → new manager takes over its CH → verify SSL active
  2. Repeat for remaining endpoints progressively
  3. After all endpoints migrated: --cloudflare-custom-hostnames=false (the global disable you're suggesting)

Step 3 is safe precisely because steps 1-2 already verified each endpoint. Without the sentinel, there's no way to do steps 1-2 -- you'd have to go from "external-dns manages all CHs" to "external-dns manages no CHs" in one shot, with no rollback if the new manager has issues.

You're right that the sentinel doesn't protect against deletion during A record lifecycle changes -- that's documented in the PR as a known limitation. In practice, A record changes are infrequent during migration (you avoid infrastructure changes during the coexistence window). We ran this in production across 3 clusters with zero incidents during the migration phase.

Labels

  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • docs
  • needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
  • provider (Issues or PRs related to a provider)
  • registry (Issues or PRs related to a registry)
  • size/XL (Denotes a PR that changes 500-999 lines, ignoring generated files.)

5 participants