Bug
When enableInTreeAutoscaling: true, the KubeRay operator injects an autoscaler sidecar into the head pod via BuildAutoscalerContainer(). The autoscaler process starts a Prometheus metrics server on port 44217 (AUTOSCALER_METRIC_PORT), exposing metrics like autoscaler_active_nodes.
However, BuildAutoscalerContainer() does not declare any containerPort entries on the sidecar container. The official PodMonitor YAML (config/prometheus/podMonitor.yaml) scrapes port: as-metrics from head pods, which requires a named container port to be discoverable by Prometheus.
Since no container declares containerPort: 44217 with name: as-metrics, Prometheus silently fails to scrape autoscaler metrics.
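The lookup that fails here can be sketched in a few lines of Go. This is only an illustration of name-based port resolution, not Prometheus Operator code: `Container` and `ContainerPort` are simplified stand-ins for the `corev1` types, and `resolveNamedPort` and the container names `ray-head` and `autoscaler` are hypothetical.

```go
package main

import "fmt"

// ContainerPort and Container are simplified stand-ins for the corev1 types;
// only the fields relevant to named-port lookup are modeled.
type ContainerPort struct {
	Name          string
	ContainerPort int32
}

type Container struct {
	Name  string
	Ports []ContainerPort
}

// resolveNamedPort (hypothetical helper) returns the number of the first
// declared container port whose name matches portName. A scraper that targets
// "port: as-metrics" needs a lookup like this to succeed.
func resolveNamedPort(containers []Container, portName string) (int32, bool) {
	for _, c := range containers {
		for _, p := range c.Ports {
			if p.Name == portName {
				return p.ContainerPort, true
			}
		}
	}
	return 0, false
}

func main() {
	// Head pod as built today: the sidecar declares no ports.
	before := []Container{{Name: "ray-head"}, {Name: "autoscaler"}}

	// Head pod with the sidecar declaring the named metrics port.
	after := []Container{
		{Name: "ray-head"},
		{Name: "autoscaler", Ports: []ContainerPort{{Name: "as-metrics", ContainerPort: 44217}}},
	}

	_, ok := resolveNamedPort(before, "as-metrics")
	fmt.Println(ok) // false

	port, ok := resolveNamedPort(after, "as-metrics")
	fmt.Println(port, ok) // 44217 true
}
```

The metrics server listens on 44217 in both cases; the only difference is whether the pod spec gives that port a name that a name-based selector can match.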
Root cause
BuildAutoscalerContainer() in controllers/ray/common/pod.go creates the sidecar with Env, Command, Args, and Resources — but no Ports field.
The sample YAML ray-cluster.embed-grafana.yaml works around this by manually declaring the port on the head container, but that sample does not use enableInTreeAutoscaling. Users who enable autoscaling via the operator flag don't get this port.
How to reproduce
- Create a RayCluster with enableInTreeAutoscaling: true
- Deploy the PodMonitor from config/prometheus/podMonitor.yaml
- Check Prometheus targets: the as-metrics endpoint is not discovered
- autoscaler_active_nodes metric is missing
Fix
Add the port declaration to BuildAutoscalerContainer:
    Ports: []corev1.ContainerPort{
        {
            ContainerPort: 44217,
            Name:          "as-metrics",
        },
    },