A StackPack for SUSE Observability that transforms OpenTelemetry telemetry from GenAI workloads into high-level topology views, metric dashboards, and health monitors.
The extension auto-discovers GenAI components running on Kubernetes and organizes them into a layered topology:
| Layer | Components |
|---|---|
| Applications | GenAI apps, agents, UIs |
| Services | Inference engines, vector databases, search engines, model proxies, MCP servers, workflow engines, ML registries |
| Models | LLM models (vLLM, Ollama) |
| GPU Nodes | Kubernetes nodes with NVIDIA GPUs |
It provides out-of-the-box metric bindings and health monitors for vLLM, Ollama, Milvus, OpenSearch, Elasticsearch, and GPU infrastructure.
Prerequisites:

- SUSE Observability instance
- StackState CLI (`sts`) installed
- Task runner installed
- `podman` or `docker` for building the container image
Project layout:

```
.
├── stackpack/suse-ai/          # The StackPack
│   ├── stackpack.conf          # StackPack metadata and versioning
│   ├── provisioning/           # Groovy scripts, STY templates, icons
│   │   ├── SuseAiProvision.groovy
│   │   └── templates/          # Component types, metrics, monitors, views, sync
│   └── resources/              # Documentation shown in the SUSE Observability UI
├── integrations/
│   ├── otel-collector/         # OTel Collector Helm values (test environment)
│   └── oi-filter/              # OTel Collector Python filters
├── knowledge/                  # Architecture docs, guides, conventions
├── Dockerfile                  # Multi-stage build for the setup container
├── init.sh                     # Install/uninstall script (runs inside the container)
└── Taskfile.yaml               # Development task automation
```
Copy the example env file and adjust as needed:

```bash
cp .env.example .env
```

Available settings:

| Variable | Default | Description |
|---|---|---|
| `IMAGE_NAME` | `suse-ai-observability` | Container image name |
| `IMAGE_VERSION` | `latest` | Container image tag |
| `CONTAINER_RUNTIME` | `podman` | `podman` or `docker` |
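
The defaults above behave like ordinary shell parameter expansion: assuming the Taskfile loads `.env`, a value set there wins, and otherwise the documented default applies. The sketch below illustrates that resolution. It also assumes, without confirmation from `Taskfile.yaml`, that the image is tagged as `IMAGE_NAME:IMAGE_VERSION`; `image_ref` and `container_runtime` are hypothetical helper names, not tasks defined by this project.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating how the documented .env defaults resolve.
# Assumption: the image is tagged as IMAGE_NAME:IMAGE_VERSION.

# Print the image reference, falling back to the documented defaults.
image_ref() {
  printf '%s:%s\n' "${IMAGE_NAME:-suse-ai-observability}" "${IMAGE_VERSION:-latest}"
}

# Print the container runtime; reject anything other than podman or docker.
container_runtime() {
  case "${CONTAINER_RUNTIME:-podman}" in
    podman | docker)
      printf '%s\n' "${CONTAINER_RUNTIME:-podman}"
      ;;
    *)
      echo "CONTAINER_RUNTIME must be podman or docker" >&2
      return 1
      ;;
  esac
}
```

With none of the three variables set, `image_ref` prints `suse-ai-observability:latest` and `container_runtime` prints `podman`. Setting `CONTAINER_RUNTIME=buildah` makes `container_runtime` fail, because only `podman` and `docker` are listed as supported.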

Build and push the container image:

```bash
task build
task push
```

The container image packages the StackPack archive, the `sts` CLI, and the `init.sh` script.

The container expects the following environment variables:
| Variable | Required | Description |
|---|---|---|
| `STACKSTATE_API_URL` | Yes | SUSE Observability API URL |
| `STACKSTATE_TOKEN` | Yes | API or service token |
| `STACKSTATE_TOKEN_TYPE` | Yes | `api` or `service` |
| `KUBERNETES_CLUSTERS` | Install only | Comma-separated list of cluster names |
| `STS_SKIP_SSL` | No | Set to `true` to skip TLS verification |
| `STS_CA_CERT_PATH` | No | Path to a custom CA certificate |
| `UNINSTALL` | No | Set to `true` to uninstall the `suse-ai` StackPack |

Install:

```bash
podman run --rm \
  -e STACKSTATE_API_URL=https://your-instance.example.com \
  -e STACKSTATE_TOKEN=your-token \
  -e STACKSTATE_TOKEN_TYPE=api \
  -e KUBERNETES_CLUSTERS=cluster-a,cluster-b \
  suse-ai-observability:latest
```

The init script will:

- Install a `kubernetes-v2` StackPack instance for each cluster
- Install the `open-telemetry` StackPack
- Upload and install (or upgrade) the `suse-ai` StackPack
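
The order of these steps, and how a comma-separated `KUBERNETES_CLUSTERS` value fans out into one `kubernetes-v2` instance per cluster, can be sketched as a dry run. `plan_install` is a hypothetical helper that only prints the plan and never calls `sts`; trimming spaces around names and skipping empty entries are assumptions, not documented behavior of `init.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical dry run of the install sequence; it only prints, never calls sts.
# Assumptions: names are trimmed of surrounding blanks and empty entries skipped.
plan_install() {
  # Split the comma-separated list into one cluster name per line.
  printf '%s\n' "${KUBERNETES_CLUSTERS:-}" | tr ',' '\n' |
    while read -r cluster; do
      # read with the default IFS strips leading and trailing blanks.
      [ -n "$cluster" ] || continue # skip empty entries, e.g. "a,,b"
      echo "install kubernetes-v2 instance for cluster: $cluster"
    done
  echo "install open-telemetry StackPack"
  echo "upload and install (or upgrade) suse-ai StackPack"
}
```

Running `KUBERNETES_CLUSTERS=cluster-a,cluster-b plan_install` prints two `kubernetes-v2` lines, one per cluster, followed by the `open-telemetry` and `suse-ai` steps.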

Uninstall:

```bash
podman run --rm \
  -e STACKSTATE_API_URL=https://your-instance.example.com \
  -e STACKSTATE_TOKEN=your-token \
  -e STACKSTATE_TOKEN_TYPE=api \
  -e UNINSTALL=true \
  suse-ai-observability:latest
```

The fastest iteration loop during development is to upload the StackPack directly, without building a container:

```bash
# Increment the patch version
task version-up

# Zip, upload, and upgrade in one step
task stackpack-upload
```

To uninstall all instances (useful for a clean reinstall):

```bash
task stackpack-uninstall
```

To run an arbitrary Groovy script:

```bash
task sts-script FILE=path/to/script.groovy
```

Useful `sts` commands:

```bash
# List installed StackPack instances
sts stackpack list-instances --name suse-ai -o json

# Query the topology
sts script run --script "Topology.query('label = \"suse.ai.managed\"')"

# Inspect a topology sync
sts topology-sync list
sts topology-sync describe --id <id>
```

The `knowledge/` directory contains detailed documentation on the project architecture, component types, metric bindings, monitor creation, and design decisions. Start with `knowledge/ARCH.md` for an overview.