feat(provider): add External Metrics provider #1863

Open
jlore-decathlon wants to merge 1 commit into fluxcd:main from jlore-decathlon:feat/externalmetrics

@jlore-decathlon commented Nov 20, 2025

Proposed addition

The current Datadog metric provider relies on Datadog's Metrics API.
However, this API has fairly low rate limits, and teams with even a moderately sized infrastructure tend to hit them quickly as they scale their usage of Flagger or of Datadog-based autoscaling (such as KEDA).

Datadog offers a more scalable alternative: its Cluster Agent batches metric requests in groups of 35 (see Cluster Agent Autoscaling Metrics). The agent then makes these metrics available within the cluster by exposing an endpoint that follows the Kubernetes External Metrics API.

Note

This endpoint is not documented by Datadog: they expect users to register the agent with the control plane as the cluster's external metrics provider, which makes the metrics available through the Kubernetes API server and removes the need to query the endpoint directly.
However, because the endpoint implements a Kubernetes API, its behavior is predictable and stable enough to be used directly.
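
To make "an endpoint following the Kubernetes External Metrics API" concrete, here is a minimal sketch of how a request URL for such a server could be built. It is not the code from this PR. The function name `externalMetricURL`, the host `metrics.example:8443`, and the metric name are hypothetical; the path layout under `external.metrics.k8s.io/v1beta1` and the `labelSelector` query parameter are the parts taken from the Kubernetes API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// externalMetricURL builds the request URL for a single metric on a server
// implementing the Kubernetes External Metrics API. All four arguments are
// supplied by the caller; nothing here is specific to Flagger or Datadog.
func externalMetricURL(baseURL, namespace, metric, selector string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("invalid base URL: %w", err)
	}
	// The API serves metrics at:
	//   /apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric>
	u.Path = strings.TrimSuffix(u.Path, "/") +
		"/apis/external.metrics.k8s.io/v1beta1/namespaces/" + namespace + "/" + metric
	if selector != "" {
		q := url.Values{}
		q.Set("labelSelector", selector) // optional label filter, URL-encoded
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// Hypothetical server address and metric name, for illustration only.
	got, err := externalMetricURL("https://metrics.example:8443", "default", "requests-per-second", "app=podinfo")
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The same URL shape applies whether the target is an in-cluster metrics server or the Kubernetes API server itself, since both serve the same API group.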

We used KEDA's implementation of a similar feature as a reference during design and implementation. However, Flagger is not an autoscaling solution, so we are not going to mimic the metrics proxy that KEDA operates; we simply propose to query the external metrics server directly. In doing so, we also chose to make the provider generic and compatible with any external metrics server. The downside is that we cannot abstract away the way Datadog names its metrics, which isn't trivial.

fix: #1235

Any alternatives you've considered?

We considered modifying the Datadog metric provider instead of adding an external metrics provider, but felt that a separate provider had the benefit of supporting other external metrics servers while keeping the code Datadog-agnostic.

We could theoretically make it even more generic and support any Kubernetes metrics API (standard, Custom, or External), but I think Flagger already offers this.

Disclaimer

  • This PR was pair-programmed with my colleague @mveroone.
  • We're not Go developers, but we did our best to follow the project's coding conventions and guidelines. Any feedback is welcome, and we'll be happy to rework any part of this contribution that you think needs it.
  • AI disclosure: AIL 1 (see https://danielmiessler.com/blog/ai-influence-level-ail). Minor code autocomplete, but mostly manual coding and writing.
  • We built the Docker image and tested it end to end on one of our GKE clusters against the Datadog Cluster Agent endpoint. It seems to work as expected, but the lack of feedback from Flagger leaves room for some doubt, and we were unsure whether adding some logging was a good idea.

@jlore-decathlon marked this pull request as draft on November 20, 2025 16:00
@jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 85c595d to 139a34a, on November 24, 2025 16:33
@jlore-decathlon changed the title from "feat(externalmetrics): implement ExternalMetricsProvider for querying…" to "feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics" on Dec 1, 2025
@jlore-decathlon marked this pull request as ready for review on December 1, 2025 12:39
@jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 3757b5a to 72ad54a, on December 1, 2025 12:48
@jlore-decathlon changed the title from "feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics" to "feat(provider): add External Metrics provider" on Dec 1, 2025
@mveroone force-pushed the feat/externalmetrics branch 2 times, most recently from eeeccfc to 86cc361, on December 3, 2025 08:28
Member

@aryan9600 left a comment

thank you for opening this PR!

@mveroone force-pushed the feat/externalmetrics branch from eb8b59f to 2e0a69c on January 19, 2026 10:32
@mveroone

Note: if that's okay, we'll squash the commits after a few rounds of review so we can fix the DCO.

Member

@aryan9600 left a comment

looking good - a few nits

Member

@aryan9600 left a comment

lgtm! 🎖️

please squash this into 1-2 commits and sign them off, thanks

@mveroone force-pushed the feat/externalmetrics branch from 9695948 to 6425301 on March 26, 2026 16:45
@mveroone

> lgtm! 🎖️
>
> please squash this into 1-2 commits and sign them off, thanks

@aryan9600 Done!
Thanks for the support.

@aryan9600
Member

ci is failing because of unformatted code - could you run make fmt and push again? thanks

The Datadog provider often hits API rate limits on larger
deployments. The Datadog Cluster Agent can batch metric queries
and expose them through an endpoint compatible with the Kubernetes
External Metrics API.

This implementation allows using this endpoint, or any other server
implementing the Kubernetes External Metrics API, including the k8s
API server itself.

Co-authored-by: Johan Lore <johan.lore@decathlon.com>
Co-authored-by: Maxime Véroone <maxime.veroone@decathlon.com>
Signed-off-by: Johan Lore <johan.lore@decathlon.com>
Signed-off-by: Maxime Véroone <maxime.veroone@decathlon.com>
Signed-off-by: Johan Lore <johan.lore@decathlon.com>
@jlore-decathlon
Author

> ci is failing because of unformatted code - could you run make fmt and push again? thanks

@aryan9600 Done

Development

Successfully merging this pull request may close these issues.

Using external metrics from the Kubernetes API server

3 participants