Thank you for your interest in contributing to Holodeck! This guide will help you get started.
- Fork the repository.
- Clone your fork:

      git clone https://github.com/your-username/holodeck.git
      cd holodeck

- Add the upstream repository:

      git remote add upstream https://github.com/nvidia/holodeck.git

- Linux or macOS (Windows is not supported)
- Go 1.20 or later
- Make
- Git
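
The prerequisites above can be verified before the first build. The following is a minimal sketch and not part of the Holodeck repository: the function names `version_ge` and `check_prereqs` are hypothetical, and it assumes a release Go toolchain for which `go env GOVERSION` prints a value such as `go1.22.3`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if dotted version $1 is at least $2.
# Only the major and minor components are compared.
version_ge() {
  local have_major have_minor want_major want_minor
  IFS=. read -r have_major have_minor _ <<<"$1"
  IFS=. read -r want_major want_minor _ <<<"$2"
  # Drop suffixes such as "rc2" so the value is a plain integer.
  have_minor=${have_minor%%[!0-9]*}
  want_minor=${want_minor%%[!0-9]*}
  have_minor=${have_minor:-0}
  want_minor=${want_minor:-0}
  if (( have_major > want_major )); then return 0; fi
  if (( have_major < want_major )); then return 1; fi
  (( have_minor >= want_minor ))
}

# Hypothetical helper: check that go, make, and git are on PATH
# and that the Go toolchain is 1.20 or later.
check_prereqs() {
  local tool missing=0 go_version
  for tool in go make git; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  if (( missing )); then return 1; fi
  go_version=$(go env GOVERSION)   # assumed form: go1.22.3
  go_version=${go_version#go}
  if ! version_ge "$go_version" "1.20"; then
    echo "Go $go_version is older than 1.20" >&2
    return 1
  fi
  echo "prerequisites OK"
}
```

After sourcing the file, running `check_prereqs` reports any missing tool on standard error and returns a non-zero status.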
The project uses a Makefile to manage common development tasks:

    # Build the binary
    make build
    # Run tests
    make test
    # Run linters
    make lint
    # Clean build artifacts
    make clean
    # Run all checks (lint, test, build)
    make check

After building, you can run the CLI directly:

    ./bin/holodeck --help

Or install it system-wide:

    sudo mv ./bin/holodeck /usr/local/bin/holodeck

- Create a new branch for your feature or fix:

      git checkout -b feature/your-feature-name

- Make your changes and commit them:

      git commit -s -m "feat: your feature description"

- Push to your fork:

      git push origin feature/your-feature-name

- Create a Pull Request against the main repository.

- Use Conventional Commits:
  - feat: ... for new features
  - fix: ... for bug fixes
  - docs: ... for documentation changes
  - refactor: ... for code refactoring
  - test: ... for adding or updating tests
  - chore: ... for maintenance
- Use the -s flag to sign off your commits
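
To illustrate the two commit rules above, here is a minimal sketch of a local check. The function name `check_commit_msg` is hypothetical and not part of the Holodeck repository. It accepts the six types listed above, and as an assumption taken from the Conventional Commits specification rather than from this guide, it also allows an optional scope such as `fix(aws): ...` and a `!` marker. It then looks for the `Signed-off-by:` trailer that `git commit -s` appends.

```shell
#!/usr/bin/env bash

# Hypothetical helper: validate a full commit message passed as $1.
check_commit_msg() {
  local msg=$1
  # The subject is everything up to the first newline.
  local subject=${msg%%$'\n'*}
  local pattern='^(feat|fix|docs|refactor|test|chore)(\([a-z0-9._-]+\))?!?: .+'
  if [[ ! $subject =~ $pattern ]]; then
    echo "subject is not a Conventional Commit: $subject" >&2
    return 1
  fi
  # git commit -s adds a line of the form "Signed-off-by: Name <email>".
  if ! grep -Eq '^Signed-off-by: .+ <.+>$' <<<"$msg"; then
    echo "missing Signed-off-by trailer (commit with -s)" >&2
    return 1
  fi
}
```

One possible use is to check the most recent commit before pushing: `check_commit_msg "$(git log -1 --format=%B)"`.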
- Follow the Go code style guidelines
- Run make lint before submitting PRs
- Ensure all tests pass with make test
- Write unit tests for new features
- Update existing tests when modifying features
- Run the full test suite with make test
Holodeck's end-to-end tests run on real AWS infrastructure. They are organized into two tiers that control when tests execute in CI.
Smoke tier (pre-merge) — .github/workflows/e2e-smoke.yaml
Runs on every PR push. Covers two label filters:
- default && !rpm: standard single-node environment without RPM distros
- cluster && minimal: smallest valid multinode cluster
Each job takes roughly 20 minutes, giving fast feedback before merge.
Full tier (post-merge) — .github/workflows/e2e.yaml
Runs only when a commit lands on main or a release-* branch.
Covers 13 label filters plus a separate arm64 job and an
integration-test job that exercises holodeck as a GitHub Action.

| Label filter | What it covers |
|---|---|
| legacy | Kubernetes using a legacy version |
| dra | Dynamic Resource Allocation enabled |
| kernel | Kernel features / custom kernel |
| ctk-git | Container Toolkit installed from git source |
| k8s-git | Kubernetes built from git (kubeadm) |
| k8s-kind-git | Kubernetes built from git (KIND) |
| k8s-latest | Kubernetes tracking master branch |
| cluster && gpu && !minimal && !ha && !dedicated | Standard GPU cluster |
| cluster && dedicated | Cluster with dedicated CPU control-plane |
| cluster && ha | HA cluster (3 control-plane nodes) |
| rpm-rocky | Rocky Linux 9 with multiple container runtimes |
| rpm-al2023 | Amazon Linux 2023 with multiple container runtimes |
| rpm-fedora | Fedora 42 with multiple container runtimes |

The arm64 job is a separate workflow job (not a matrix entry) that only
runs on main. It uses --label-filter='arm64' — a test must carry
Label("arm64") to be selected.
Tests are tagged with Ginkgo Label() annotations. Each test can carry
multiple labels; CI selects tests using boolean filter expressions.
Single-node labels (defined in tests/aws_test.go):

| Label | Description |
|---|---|
| default | Basic AWS environment, default configuration |
| legacy | Legacy Kubernetes version |
| dra | Dynamic Resource Allocation |
| kernel | Custom kernel features |
| ctk-git | CTK from git source |
| k8s-git | Kubernetes from git (kubeadm) |
| k8s-kind-git | Kubernetes from git (KIND) |
| k8s-latest | Kubernetes master branch |
| rpm | Any RPM-based distribution |
| rpm-rocky | Rocky Linux 9 |
| rpm-al2023 | Amazon Linux 2023 |
| rpm-fedora | Fedora 42 |
| post-merge | Excluded from smoke tier; full tier only |

Cluster labels (defined in tests/aws_cluster_test.go):

| Label | Description |
|---|---|
| cluster | Multinode cluster test |
| multinode | Two or more nodes |
| gpu | GPU worker nodes |
| minimal | Smallest valid configuration (1 CP + 1 worker) |
| dedicated | Dedicated CPU control-plane node |
| ha | High-availability control plane (3 nodes) |
| rpm | RPM-based cluster OS |
| rpm-rocky | Rocky Linux 9 cluster |
| rpm-al2023 | Amazon Linux 2023 cluster |
| post-merge | Excluded from smoke tier; full tier only |

The post-merge label is the mechanism that keeps a test out of the smoke
tier. The smoke workflow's label filter "default && !rpm" already excludes
RPM tests, but adding post-merge makes the intent explicit and ensures the
test is skipped by any future smoke filter that might otherwise match it.
- Single-node test: add an Entry(...) to the DescribeTable in tests/aws_test.go.
- Cluster test: add an Entry(...) to tests/aws_cluster_test.go.
- Create the corresponding fixture file under tests/data/.
- Assign Ginkgo labels with Label("label1", "label2", ...) as the last argument of the Entry.
- If the test is an edge case, a platform-specific variant, or expensive (> 30 min), add "post-merge" to its label list so it runs only in the full tier.
Example:

    Entry("My New Feature Test", testConfig{
        name:        "my-feature-test",
        filePath:    filepath.Join(packagePath, "data", "test_aws_my_feature.yml"),
        description: "Tests my new feature end-to-end",
    }, Label("default", "my-feature")),

| Use smoke tier (no post-merge) | Use full tier (post-merge) |
|---|---|
| Core functionality every PR should validate | Edge cases and less-common paths |
| Fast tests (< 25 min) | Slow tests (> 30 min) |
| Platform-agnostic defaults | Platform-specific variants (RPM distros, arm64) |
| Minimal cluster configurations | Full-scale, HA, or dedicated cluster topologies |

If in doubt, start with post-merge, then remove the label to promote the test
into the smoke tier once it has demonstrated stability.
Use the Ginkgo label filter to select which tests to run:

    # Run only the smoke-equivalent tests
    make -f tests/Makefile test GINKGO_ARGS="--label-filter='default && !rpm'"
    # Run a specific label
    make -f tests/Makefile test GINKGO_ARGS="--label-filter='cluster && minimal'"
    # Run all RPM tests for Rocky 9
    make -f tests/Makefile test GINKGO_ARGS="--label-filter='rpm-rocky'"

Required environment variables:

    export AWS_ACCESS_KEY_ID=<your-key-id>
    export AWS_SECRET_ACCESS_KEY=<your-secret>
    export E2E_SSH_KEY=<path-to-ssh-private-key>

Important: Always validate E2E tests locally before pushing. CI E2E runs provision real GPU instances on AWS, and unnecessary runs are expensive.

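
Because a local E2E run provisions real AWS resources, it can help to confirm the required environment variables before starting one. The sketch below is illustrative only: `check_e2e_env` is a hypothetical function name, not something the Holodeck test suite provides, and the readability check assumes that E2E_SSH_KEY holds a file path, as the placeholder above suggests.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail early if a required E2E variable is unset or empty.
check_e2e_env() {
  local name missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY E2E_SSH_KEY; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z ${!name:-} ]]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  if (( missing )); then return 1; fi
  # Assumption: E2E_SSH_KEY is a path to a private key file.
  if [[ ! -r $E2E_SSH_KEY ]]; then
    echo "E2E_SSH_KEY does not point to a readable file: $E2E_SSH_KEY" >&2
    return 1
  fi
}
```

A run can then be gated on the check, for example `check_e2e_env && make -f tests/Makefile test GINKGO_ARGS="--label-filter='cluster && minimal'"`.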
- Update relevant documentation when adding features
- Follow the existing documentation style
- Ensure your PR description clearly describes the problem and solution
- Include relevant issue numbers
- Add tests for new functionality
- Update documentation
- Ensure CI passes
- Version bump
- Update changelog
- Create release tag
- Build and publish release artifacts
- Open an issue for bugs or feature requests
- Join the community discussions
- Check existing documentation
Please read and follow our Code of Conduct.