[operator] BuildAutoscalerContainer does not declare containerPort for autoscaler metrics (44217) #4671

@ashutosh1807

Description

Bug

When a RayCluster sets enableInTreeAutoscaling: true, the KubeRay operator injects an autoscaler sidecar into the head pod via BuildAutoscalerContainer(). The autoscaler process starts a Prometheus metrics server on port 44217 (AUTOSCALER_METRIC_PORT), exposing metrics such as autoscaler_active_nodes.

However, BuildAutoscalerContainer() does not declare any containerPort entries on the sidecar container. The official PodMonitor YAML (config/prometheus/podMonitor.yaml) scrapes port: as-metrics from head pods, which requires a named container port to be discoverable by Prometheus.

Since no container declares containerPort: 44217 with name: as-metrics, Prometheus silently fails to scrape autoscaler metrics.
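To illustrate the failure mode, here is a minimal sketch of a named-port lookup over a head pod's containers. It is not the Prometheus or KubeRay implementation: the ContainerPort and Container structs are reduced stand-ins for the corev1 types, and the container names and the head container's ports are assumptions made up for the example. Only the as-metrics name and port 44217 come from this issue.

```go
package main

import "fmt"

// ContainerPort and Container are simplified stand-ins for the corev1 types
// of the same name; only the fields needed for this illustration are kept.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

type Container struct {
	Name  string
	Ports []ContainerPort
}

// findNamedPort returns the port number declared under the given name by any
// container in the pod, or false if no container declares it.
func findNamedPort(containers []Container, name string) (int32, bool) {
	for _, c := range containers {
		for _, p := range c.Ports {
			if p.Name == name {
				return p.ContainerPort, true
			}
		}
	}
	return 0, false
}

// headPod builds an illustrative head pod. The container names and the head
// container's ports are assumptions for this example, not taken from the issue.
func headPod(autoscalerPorts []ContainerPort) []Container {
	return []Container{
		{Name: "ray-head", Ports: []ContainerPort{
			{Name: "dashboard", ContainerPort: 8265},
			{Name: "metrics", ContainerPort: 8080},
		}},
		{Name: "autoscaler", Ports: autoscalerPorts},
	}
}

func main() {
	// Sidecar without a port declaration: the process may listen on 44217,
	// but nothing in the pod spec carries the name "as-metrics".
	_, ok := findNamedPort(headPod(nil), "as-metrics")
	fmt.Println("without declaration, as-metrics found:", ok)

	// Sidecar with the named port declared.
	port, ok := findNamedPort(
		headPod([]ContainerPort{{Name: "as-metrics", ContainerPort: 44217}}),
		"as-metrics",
	)
	fmt.Println("with declaration, as-metrics found:", ok, "port:", port)
}
```

The point of the sketch is that a name-based lookup has nothing to match when the sidecar declares no ports, regardless of whether the process is actually listening.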

Root cause

BuildAutoscalerContainer() in controllers/ray/common/pod.go creates the sidecar with Env, Command, Args, and Resources, but sets no Ports field.

The sample YAML ray-cluster.embed-grafana.yaml works around this by manually declaring the port on the head container, but that sample does not use enableInTreeAutoscaling. Users who enable autoscaling via the operator flag don't get this port.

How to reproduce

  1. Create a RayCluster with enableInTreeAutoscaling: true
  2. Deploy the PodMonitor from config/prometheus/podMonitor.yaml
  3. Check Prometheus targets: the as-metrics endpoint is not discovered
  4. autoscaler_active_nodes metric is missing

Fix

Add the port declaration to BuildAutoscalerContainer:

Ports: []corev1.ContainerPort{
    {
        // Name must match the `port: as-metrics` reference in podMonitor.yaml.
        Name:          "as-metrics",
        ContainerPort: 44217, // AUTOSCALER_METRIC_PORT
    },
},
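A regression check for this change could look roughly like the sketch below. It is only an outline under stated assumptions: buildAutoscalerContainer is a hypothetical, heavily reduced stand-in for BuildAutoscalerContainer (the real function takes arguments and sets Env, Command, Args, and Resources), the structs are stand-ins for the corev1 types, and the constant names and the container name are invented for the example.

```go
package main

import "fmt"

// Reduced stand-ins for corev1.ContainerPort and corev1.Container.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

type Container struct {
	Name  string
	Ports []ContainerPort
}

// Hypothetical constant names; the values come from the issue text.
const (
	autoscalerMetricsPort     int32 = 44217
	autoscalerMetricsPortName       = "as-metrics"
)

// buildAutoscalerContainer is a hypothetical, reduced version of the operator
// function that includes the proposed port declaration and nothing else.
func buildAutoscalerContainer() Container {
	return Container{
		Name: "autoscaler", // assumed container name
		Ports: []ContainerPort{
			{Name: autoscalerMetricsPortName, ContainerPort: autoscalerMetricsPort},
		},
	}
}

// hasPort reports whether c declares a port with the given name and number.
func hasPort(c Container, name string, number int32) bool {
	for _, p := range c.Ports {
		if p.Name == name && p.ContainerPort == number {
			return true
		}
	}
	return false
}

func main() {
	c := buildAutoscalerContainer()
	fmt.Println("declares as-metrics on 44217:", hasPort(c, "as-metrics", 44217))
}
```

In the operator's own test suite the same assertion would presumably be made against the real corev1.Container returned by BuildAutoscalerContainer.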
