feat(prometheus): support disabling labels and metrics to reduce cardinality #13202
janiussyafiq wants to merge 8 commits into apache:master
Conversation
Pull request overview
Adds per-metric label value collapsing to the Prometheus plugin to help reduce metric cardinality (without removing labels/metrics), plus docs and tests to cover the new behavior.
Changes:
- Implement `plugin_attr.prometheus.metrics.<metric>.disable_labels` to force selected built-in label values to `""` at emission time.
- Add a new test covering `disable_labels` behavior for `http_status`.
- Document `disable_labels` in English/Chinese docs and the example config.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `apisix/plugins/prometheus/exporter.lua` | Applies per-metric disabled-label value collapsing when emitting Prometheus metrics. |
| `t/plugin/prometheus-label-filter.t` | Adds regression test ensuring disabled labels are emitted as empty-string values. |
| `docs/en/latest/plugins/prometheus.md` | Documents `disable_labels` usage in `plugin_attr` examples. |
| `docs/zh/latest/plugins/prometheus.md` | Documents `disable_labels` usage in `plugin_attr` examples (ZH). |
| `conf/config.yaml.example` | Adds example configuration snippet showing `disable_labels`. |
```lua
for metric_name, metric_conf in pairs(metrics_conf) do
    if metric_conf.disable_labels then
        disabled_label_metric_map[metric_name] = {}
        for _, label in ipairs(metric_conf.disable_labels) do
            disabled_label_metric_map[metric_name][label] = true
        end
    end
end
```
`disable_labels` is iterated with `ipairs(metric_conf.disable_labels)` without any type checks. Because `plugin_attr` does not appear to be schema-validated, a misconfiguration (e.g. `disable_labels: node` as a string) will raise a runtime error on every request. Consider validating the `metrics_conf`/`metric_conf`/`disable_labels` types before iterating (and logging + ignoring invalid config) to keep the datapath robust.
Suggested change (replacing the loop above):

```lua
if type(metrics_conf) ~= "table" then
    core.log.warn("invalid prometheus plugin_attr.metrics type: ",
                  type(metrics_conf), ", expected table; ignoring")
    return disabled_label_metric_map
end
for metric_name, metric_conf in pairs(metrics_conf) do
    if type(metric_conf) ~= "table" then
        core.log.warn("invalid prometheus metric config for metric '", metric_name,
                      "': expected table, got ", type(metric_conf), "; ignoring")
    elseif metric_conf.disable_labels ~= nil then
        if type(metric_conf.disable_labels) ~= "table" then
            core.log.warn("invalid prometheus disable_labels for metric '", metric_name,
                          "': expected table, got ", type(metric_conf.disable_labels),
                          "; ignoring")
        else
            disabled_label_metric_map[metric_name] = {}
            for _, label in ipairs(metric_conf.disable_labels) do
                disabled_label_metric_map[metric_name][label] = true
            end
        end
    end
end
```
```lua
local metric_label_map = {
    http_status = {"code", "route", "matched_uri", "matched_host", "service", "consumer", "node",
                   "request_type", "request_llm_model", "llm_model", "response_source"},
    http_latency = {"type", "route", "service", "consumer", "node",
                    "request_type", "request_llm_model", "llm_model"},
    bandwidth = {"type", "route", "service", "consumer", "node",
                 "request_type", "request_llm_model", "llm_model"},
    llm_latency = {"route_id", "service_id", "consumer", "node",
                   "request_type", "request_llm_model", "llm_model"},
    llm_prompt_tokens = {"route_id", "service_id", "consumer", "node",
                         "request_type", "request_llm_model", "llm_model"},
    llm_completion_tokens = {"route_id", "service_id", "consumer", "node",
                             "request_type", "request_llm_model", "llm_model"},
    llm_active_connections = {"route", "route_id", "matched_uri", "matched_host",
                              "service", "service_id", "consumer", "node",
                              "request_type", "request_llm_model", "llm_model"},
```
`metric_label_map` duplicates the label-name lists already defined in `http_init` for each metric. This introduces a drift risk: if labels are added or reordered in the metric registration, `disable_labels` will silently stop applying to the intended label or apply to the wrong one. Consider defining the label-name arrays once (shared by both registration and filtering), or deriving the name list from the metric definition so ordering can't diverge.
Suggested change (replacing the table above):

```lua
local HTTP_STATUS_LABELS = {"code", "route", "matched_uri", "matched_host", "service", "consumer",
                            "node", "request_type", "request_llm_model", "llm_model", "response_source"}
local HTTP_LATENCY_LABELS = {"type", "route", "service", "consumer", "node",
                             "request_type", "request_llm_model", "llm_model"}
local BANDWIDTH_LABELS = {"type", "route", "service", "consumer", "node",
                          "request_type", "request_llm_model", "llm_model"}
local LLM_LATENCY_LABELS = {"route_id", "service_id", "consumer", "node",
                            "request_type", "request_llm_model", "llm_model"}
local LLM_PROMPT_TOKENS_LABELS = {"route_id", "service_id", "consumer", "node",
                                  "request_type", "request_llm_model", "llm_model"}
local LLM_COMPLETION_TOKENS_LABELS = {"route_id", "service_id", "consumer", "node",
                                      "request_type", "request_llm_model", "llm_model"}
local LLM_ACTIVE_CONNECTIONS_LABELS = {"route", "route_id", "matched_uri", "matched_host",
                                       "service", "service_id", "consumer", "node",
                                       "request_type", "request_llm_model", "llm_model"}

local metric_label_map = {
    http_status = HTTP_STATUS_LABELS,
    http_latency = HTTP_LATENCY_LABELS,
    bandwidth = BANDWIDTH_LABELS,
    llm_latency = LLM_LATENCY_LABELS,
    llm_prompt_tokens = LLM_PROMPT_TOKENS_LABELS,
    llm_completion_tokens = LLM_COMPLETION_TOKENS_LABELS,
    llm_active_connections = LLM_ACTIVE_CONNECTIONS_LABELS,
```
```lua
function _M.http_log(conf, ctx)
    local vars = ctx.var
    local disabled_label_metric_map = get_disabled_label_metric_map()
```
`get_disabled_label_metric_map()` builds a fresh nested table on every request (`http_log` runs in the hot path). This adds avoidable allocations/GC pressure at high QPS (the exact scenario this feature targets). Consider caching the computed disabled-label map per worker and only recomputing when `plugin_attr.prometheus.metrics` changes (e.g. by tracking a config version/`modifiedIndex` or comparing the attr table reference).
Suggested change (replacing the snippet above):

```lua
local cached_prometheus_metrics_attr
local cached_disabled_label_metric_map = {}

local function get_cached_disabled_label_metric_map()
    local attr = plugin.plugin_attr("prometheus")
    local metrics_attr = attr and attr.metrics
    if metrics_attr ~= cached_prometheus_metrics_attr then
        cached_prometheus_metrics_attr = metrics_attr
        cached_disabled_label_metric_map = get_disabled_label_metric_map() or {}
    end
    return cached_disabled_label_metric_map
end

function _M.http_log(conf, ctx)
    local vars = ctx.var
    local disabled_label_metric_map = get_cached_disabled_label_metric_map()
```
```lua
local disabled_label_metric_map = get_disabled_label_metric_map()
```
Same hot-path allocation issue as in `http_log`: `inc_llm_active_connections` recomputes `disabled_label_metric_map` on each increment/decrement. If you implement caching for `get_disabled_label_metric_map`, it should be reused here as well to avoid extra per-request allocations.
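If the caching suggested for `http_log` is adopted, the reuse could look roughly like the following sketch (the function signature and body here are illustrative, with unrelated logic elided; this is not the actual exporter code):

```lua
-- Hypothetical sketch: reuse the worker-cached map instead of
-- rebuilding it on every connection-count change.
function _M.inc_llm_active_connections(ctx)
    local disabled_label_metric_map = get_cached_disabled_label_metric_map()
    -- ... existing label assembly and llm_active_connections gauge update ...
end
```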
```yaml
# disable_labels:    # List of built-in label names to drop (reduces cardinality).
#   - node           # e.g. drop the upstream node IP label
```
The new `disable_labels` comment says "drop" labels, but the implementation/docs describe collapsing the label values to an empty string while keeping the label present (non-breaking). The wording here is misleading and could cause users to assume the label will be removed from the metric schema.
Suggested change:

```yaml
# disable_labels:    # List of built-in label names whose values are set to empty strings
#                    # (reduces cardinality while keeping labels).
#   - node           # e.g. keep the upstream node IP label but set its value to an empty string
```
Description
Add a new per-metric configuration option to the Prometheus plugin's `plugin_attr`:

- `disable_labels`: a list of built-in label names whose values will be collapsed to an empty string (`""`) for a given metric, reducing cardinality without changing the metric schema.

This is a non-breaking change — all labels remain registered and present in the output, so existing Prometheus dashboards and recording rules are unaffected. Only the values of the specified labels are zeroed out.
An alternative approach of `disable: true` (removing a metric entirely from `/metrics`) was considered but excluded from this PR as it is a breaking change — it would cause dashboards to show "No data", misfire `absent()` alerts, and break recording rules. Instead, the same effect of suppressing all label cardinality on a metric can be achieved by listing all of its labels under `disable_labels`, which keeps the metric present in the output with empty-string values while remaining fully non-breaking.

The option is configured under `plugin_attr.prometheus.metrics.<metric_name>` alongside the existing `expire` and `extra_labels` fields.

Example config:
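A minimal sketch of what such a config could look like (the choice of the `http_status` metric and the `node`/`consumer` labels is illustrative, matching the behavior described in this PR):

```yaml
plugin_attr:
  prometheus:
    metrics:
      http_status:
        disable_labels:   # values for these labels are emitted as ""
          - node
          - consumer
```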
With the above config, `node` and `consumer` will appear as `node=""` and `consumer=""` in the metric output instead of carrying real values, effectively collapsing all time series that differ only by those labels.

This addresses high-cardinality issues in dynamic environments (e.g. Kubernetes autoscaling where pod IPs churn rapidly), which can cause Prometheus shared dict overflow and excessive memory consumption.
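Conceptually, the emission-time collapsing amounts to something like the following simplified sketch (`label_names`, `label_values`, `disabled_map`, and `collapse_disabled` are illustrative names, not the exporter's real internals):

```lua
-- Simplified sketch of emission-time value collapsing.
local function collapse_disabled(metric_name, label_names, label_values, disabled_map)
    local disabled = disabled_map[metric_name]
    if not disabled then
        return label_values
    end
    local out = {}
    for i, name in ipairs(label_names) do
        -- the label stays present in the output; only its value is emptied
        out[i] = disabled[name] and "" or label_values[i]
    end
    return out
end
```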
Which issue(s) this PR fixes:
Fixes #12679
Checklist