Add GPG signing to linux binary tarball builds#5289

Open
PlatCore wants to merge 13 commits into master from
PlatCore/5160-add-signing-linux-binaries

Conversation

@PlatCore
Contributor

@PlatCore PlatCore commented Apr 16, 2026

Why this should be merged

Linux binary tarballs are uploaded to S3 unsigned. RPM packages already have GPG signing. This adds detached signatures (.tar.gz.sig) to close that gap.

Closes #5160

How this works

  • build-tgz-pkg.sh: imports GPG key into temp GNUPGHOME, signs each tarball with gpg --detach-sign, verifies inline, uploads .sig to S3. No-op when no key is provided.
  • build-linux-binaries.yml: both jobs import GPG key from RPM_GPG_PRIVATE_KEY secret, pass it to the script, and include .sig in artifacts.
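The signing step described above can be sketched as follows (the helper name and exact flags are illustrative; the real `build-tgz-pkg.sh` may differ in detail):

```shell
#!/usr/bin/env bash
# Sketch of the per-tarball signing step: no-op without a key, isolated
# keyring, detach-sign, inline verify. Assumes GPG_KEY_FILE points at an
# exported private key when signing is wanted.
set -euo pipefail

sign_tarball() {
  local tarball="$1"
  # No-op when no key is provided (unsigned local/fork builds).
  if [[ -z "${GPG_KEY_FILE:-}" ]]; then
    echo "no GPG key provided; skipping signature for ${tarball}"
    return 0
  fi
  # Import into an isolated keyring so ~/.gnupg is never touched.
  GNUPGHOME="$(mktemp -d)"
  export GNUPGHOME
  trap 'rm -rf "${GNUPGHOME}"' EXIT
  gpg --batch --import "${GPG_KEY_FILE}"
  gpg --batch --yes --detach-sign --output "${tarball}.sig" "${tarball}"
  # Verify inline before anything is uploaded.
  gpg --verify "${tarball}.sig" "${tarball}"
}
```

In CI the workflow writes the `RPM_GPG_PRIVATE_KEY` secret to a temp file and points `GPG_KEY_FILE` at it; leaving the variable unset locally yields unsigned tarballs.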

How this was tested

  • No GPG key: tarballs produced unsigned (backward compat)
  • With ephemeral GPG key: .sig files produced, gpg --verify passes
  • Empty key file: signing skipped (fork build scenario)
  • ~/.gnupg never touched (temp GNUPGHOME isolation verified)
  • shellcheck and yamllint clean

Need to be documented in RELEASES.md?

No

@PlatCore PlatCore moved this to In Progress 🏗️ in avalanchego Apr 16, 2026
@PlatCore PlatCore self-assigned this Apr 16, 2026
@PlatCore PlatCore added the ci (This focuses on changes to the CI process) and devinfra labels Apr 16, 2026
@PlatCore PlatCore force-pushed the PlatCore/5160-add-signing-linux-binaries branch from 7af6acf to 9b6a785 Compare April 16, 2026 18:31
@PlatCore PlatCore marked this pull request as ready for review April 16, 2026 19:06
@PlatCore PlatCore requested a review from a team as a code owner April 16, 2026 19:06
@PlatCore PlatCore requested a review from maru-ava April 16, 2026 19:06
@PlatCore PlatCore force-pushed the PlatCore/5160-add-signing-linux-binaries branch from 9b6a785 to ef7dc92 Compare April 27, 2026 18:54
Contributor

@maru-ava maru-ava left a comment

What’s here is entirely reasonable, and I appreciate you doing the work. My original instinct was to make the minimal change you’ve proposed here and add signing to the existing workflow.

But after having just reviewed the DEB packaging series, I’d prefer to see this follow the same general pattern: locally reproducible and locally testable. We shouldn’t be extending release/signing behavior in a way that can only really be validated in CI or through ad-hoc manual testing. That means a local task-driven validation path and CI exercising that same path so that we have regression coverage when we change the workflow or its supporting functionality. That suggests:

  • a Taskfile entrypoint
  • a separate validation step/script
  • detached-signature verification in a fresh environment
  • CI invoking that same validation path
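A minimal sketch of what such Taskfile entrypoints could look like, following the RPM/DEB pattern (task and script names here are illustrative, not the final ones):

```yaml
# Hypothetical .github/packaging/Taskfile.yml fragment
tasks:
  build-tarballs:
    desc: Build (and optionally sign) linux binary tarballs in a container
    cmds:
      - ./scripts/build-tgz.sh
  validate-tarballs:
    desc: Verify detached signatures and smoke-test binaries in a fresh container
    cmds:
      - ./scripts/validate-tgz.sh
  test-build-tarballs:
    desc: Local end-to-end check; CI invokes this same path
    cmds:
      - task: build-tarballs
      - task: validate-tarballs
```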

# Optional env vars:
# DOCKERFILE - Dockerfile name within CONTEXT_DIR (default: "Dockerfile")
# BUILDER_PLATFORM - Target platform for the image (e.g., "linux/amd64").
Contributor

Please explain why you think this is a good idea. Not everything that can be done, should be done.

Contributor Author

Addressed.

PS the comment gives "if Jurassic Park scientists should" vibes )

@PlatCore PlatCore force-pushed the PlatCore/5160-add-signing-linux-binaries branch from 5dcefe7 to 5ce8a46 Compare May 2, 2026 01:10
PlatCore added 13 commits May 1, 2026 18:37
Add detached GPG signatures (.sig) to the tarball packaging pipeline,
reusing the same key infrastructure as RPM signing.
Use a job-scoped temp directory for GNUPGHOME instead of the default
~/.gnupg to avoid mutating shared state on persistent runners. Clean
up via trap EXIT so the keyring is removed even on script failure.

Pass passphrase via stdin (--passphrase-fd 0) instead of command line
to avoid exposure in /proc/<pid>/cmdline.
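The isolation pattern can be sketched as below. The pattern runs in a child shell here purely so the trap's cleanup is observable afterwards; in the real script the trap fires when the script itself exits:

```shell
#!/usr/bin/env bash
# Sketch: job-scoped GNUPGHOME removed by a trap on EXIT, so the keyring
# vanishes even if the script fails mid-sign.
set -euo pipefail

keyring="$(bash -s <<'EOF'
set -euo pipefail
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
trap 'rm -rf "${GNUPGHOME}"' EXIT
# Signing work happens here; the passphrase travels on stdin, e.g.:
#   printf '%s' "${GPG_PASSPHRASE}" | gpg --batch --pinentry-mode loopback \
#     --passphrase-fd 0 --detach-sign --output pkg.tar.gz.sig pkg.tar.gz
echo "${GNUPGHOME}"
EOF
)"
# By the time the child shell has exited, its trap has removed the keyring.
[ -d "${keyring}" ] && echo "keyring leaked" || echo "keyring cleaned up"
```

Note that `--passphrase-fd 0` generally needs `--pinentry-mode loopback` on modern GnuPG for batch use.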
Add an optional DOCKERFILE env var (defaulting to "Dockerfile") so the
same script can build different packaging builder images. RPM's task
continues to work unchanged because it does not set DOCKERFILE.

This unblocks adding additional packaging builder images (e.g. for
linux tarballs) without duplicating the script.
Add a new packaging path for signed linux tarballs that mirrors the
existing RPM pattern: a Dockerfile, a build script that runs inside
the container, a validation script that runs in a fresh container,
and Taskfile entrypoints (build-tarballs, validate-tarballs,
test-build-tarballs).

The build container (Ubuntu 22.04 / glibc 2.35) compiles avalanchego
and subnet-evm from source, stages each, tar+gzips them, and signs
each archive with gpg --detach-sign (binary .sig). Signing is gated
on a GPG_KEY_FILE env var so unsigned local builds work without
secrets. The build script uses a temp GNUPGHOME with trap-EXIT
cleanup so it never mutates a shared keyring.

The validation script runs ubuntu:22.04 fresh, imports the public
key (when present), verifies each detached signature, extracts the
archives, runs --version on each binary, and asserts the embedded
git commit matches the build's commit.

The public key is exported into the local build dir solely for the
validation container's use; it is never uploaded as a release
artifact (the canonical public key continues to live only in S3).

Existing build-tgz-pkg.sh / build-linux-binaries.yml wiring is
unchanged in this commit; CI continues using the old path until
the next commit flips the switch.

Verified locally on macOS arm64: task test-build-tarballs produces
signed linux/arm64 tarballs whose signatures verify cleanly in the
validation container and whose binaries report the correct commit.
Replace the inline build-tgz-pkg.sh invocation with a call to
task test-build-tarballs (defined under .github/packaging/),
giving CI the same locally-reproducible build+validate path a
developer can run on macOS via task --taskfile.

Drop the host-side "Build the avalanchego binaries" and "Build
subnet-evm plugin" steps since the build container now owns
binary compilation as well as tarballing. This removes the
host-vs-container glibc skew and makes "what CI does" identical
to "what task test-build-tarballs does locally."

S3 upload is now its own step that explicitly uploads only
*.tar.gz and *.tar.gz.sig from build/tgz/ — the GPG public
key file produced for the validation container stays local
and is excluded from S3 and from GitHub artifacts.

Refactor a couple of run-step expressions to env-var indirection
(${{ github.event.inputs.tag }} and the GPG private key secret)
to address the workflow security hardening hook.

Delete the now-unused .github/workflows/build-tgz-pkg.sh.
Tarballs are overwritten by tar but .sig and GPG-KEY-avalanchego
files are not, so a signed run followed by an unsigned run with
the same tag/arch leaves stale signatures whose contents no longer
match the freshly built tarballs. Validation then fails with
"BAD signature", and signed-then-signed re-runs would also pick up
a stale public key file.

Remove *.tar.gz.sig and GPG-KEY-avalanchego from OUTPUT_DIR at the
top of build-tgz.sh so each run starts from a clean signing state.
The tarballs themselves are preserved (they are about to be
overwritten by tar anyway).
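The cleanup step this commit describes might look like the following (function and file names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the clean-signing-state step at the top of build-tgz.sh:
# stale signatures and the exported public key are removed; tarballs
# are left in place for tar to overwrite.
set -euo pipefail

clean_signing_state() {
  local out_dir="$1"
  rm -f "${out_dir}"/*.tar.gz.sig "${out_dir}/GPG-KEY-avalanchego"
}
```

Using `rm -f` keeps the step idempotent: it succeeds whether or not a previous signed run left anything behind.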
The tarball build pipeline accepts TGZ_ARCH to choose the target
architecture, but the underlying builder image was always built for
the host arch (uname -m). When TGZ_ARCH differed from the host
(e.g. TGZ_ARCH=amd64 on an arm64 workstation), the subsequent
docker run --platform linux/<TGZ_ARCH> failed because the local
image manifest had no matching platform.

Add an optional BUILDER_PLATFORM env var to build-builder-image.sh:
when set, derive the Go SHA256 checksum from that platform and pass
--platform to docker build so the resulting image's manifest lines
up with what `docker run --platform <target>` expects. When unset,
behavior is unchanged (host arch), so the RPM task is unaffected.

Plumb BUILDER_PLATFORM=linux/${TGZ_ARCH} into the Taskfile's
build-tgz-builder-docker-image task so cross-builds work via:

  TGZ_ARCH=amd64 task --taskfile .github/packaging/Taskfile.yml \
    test-build-tarballs

Verified locally on macOS arm64 by cross-building amd64 tarballs;
`file` on the extracted binaries reports ELF x86-64 and the
validation container (also amd64 via Docker emulation) ran the
binaries successfully.
Following review feedback, removing the BUILDER_PLATFORM env var and the
--platform flags introduced earlier. Each invocation now builds for the
host arch only; CI's per-arch runner matrix (ubuntu-22.04 +
custom-arm64-jammy) provides coverage natively, matching how the RPM
packaging path already works.

The TGZ_ARCH var stays as a filename-only knob that defaults to
the host arch via PACKAGING_TGZ_HOST_ARCH, mirroring RPM's
RPM_ARCH | default .PACKAGING_HOST_ARCH pattern.

Verified locally on macOS arm64: task test-build-tarballs still
produces signed and unsigned arm64 tarballs that validate
end-to-end in the fresh ubuntu:22.04 container.
Two regressions reported on the cross-arch simplification commit
(`83ce0f1b99`):

1. DOCKER_DEFAULT_PLATFORM divergence breaks the builder image build.
   The Dockerfile uses ${TARGETARCH} (resolved by Docker from the
   host platform, --platform flag, or DOCKER_DEFAULT_PLATFORM) to
   download Go, while build-builder-image.sh fetched the SHA256
   for `uname -m`. On Apple Silicon with DOCKER_DEFAULT_PLATFORM=
   linux/amd64 (a common setup) the values diverged and the
   sha256sum -c step inside the Dockerfile failed.

   Fix: build-builder-image.sh now passes --platform linux/${goarch}
   to docker build, pinning Docker's TARGETARCH to the same arch the
   script computed the checksum for. Same script is used by RPM, so
   this also closes the latent equivalent in the RPM path.

   Also pin --platform on the build-tarballs `docker run` and the
   validate-tgz.sh `docker run` so the entire pipeline stays at host
   arch even when DOCKER_DEFAULT_PLATFORM points elsewhere.

2. TGZ_ARCH override silently produced mislabeled tarballs.
   The Taskfile forwarded a user-supplied TGZ_ARCH into the build
   container as PACKAGING_TGZ_ARCH, but neither scripts/build.sh
   nor the subnet-evm build set GOARCH from it — the binaries were
   always at the container's native arch. On arm64,
   `task build-tarballs TGZ_ARCH=amd64` produced
   *-linux-amd64-*.tar.gz containing arm64 binaries, and validation
   would still pass because its container also ran at host arch.

   Fix: build-tgz.sh and validate-tgz.sh now derive arch from
   `uname -m` at runtime (deb-style mapping). The Taskfile no
   longer forwards TGZ_ARCH/PACKAGING_TGZ_ARCH to either script.
   Since each script computes arch from its own runtime env (which
   is pinned to host arch via --platform), filenames always match
   the binary contents.
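The deb-style runtime mapping could be sketched as (the real scripts may cover additional machine strings):

```shell
#!/usr/bin/env bash
# Map `uname -m` to the arch label used in tarball filenames, computed
# at runtime inside the (platform-pinned) container.
set -euo pipefail

host_arch() {
  case "$(uname -m)" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported machine: $(uname -m)" >&2; return 1 ;;
  esac
}
```

Because each script derives the arch from its own runtime environment, a filename can never disagree with the binaries inside the archive.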

   The validate-tarballs task is also moved from `env:` to
   command-line env-prefix for TAG/GIT_COMMIT, so parent shell
   env vars can no longer shadow them either (a separate Task v3
   quirk: `env:` block doesn't override the parent shell).

Verified locally on macOS arm64:
- `DOCKER_DEFAULT_PLATFORM=linux/amd64 task test-build-tarballs`
  succeeds end-to-end and produces correctly-labeled arm64
  tarballs (host arch overrides DOCKER_DEFAULT_PLATFORM).
- `TGZ_ARCH=amd64 task test-build-tarballs` ignores the override
  and still produces -arm64- tarballs that validate cleanly.
- Both unsigned and signed flows still work; signed run produces
  .sig files and a local GPG-KEY-avalanchego that validation
  imports and verifies in the fresh ubuntu:22.04 container.
The containerized build via test-build-tarballs writes
build/plugins/<vm-id> as root (graft/subnet-evm/scripts/build.sh
mkdir -p's it inside the container). On linux runners without
userns-remap, that directory is root-owned on the host and the
cleanup step's `rm -rf ./build` running as the runner user can't
recurse into it — the cleanup exits non-zero and the workflow job
is marked failed even after the artifacts uploaded successfully.

Use `sudo rm -rf ./build` (passwordless on GitHub runners) to
clean reliably across container-produced files. RPM doesn't hit
this because its cleanup only removes build/rpm (a directory the
script fully owns), not the whole build/ tree.
The combined check `[[ -n "${GPG_KEY_FILE:-}" && -s ... ]]` lumped
two distinct cases into the same "skip signing" branch:

  - GPG_KEY_FILE unset entirely (local dev — unsigned OK)
  - GPG_KEY_FILE set but file is 0 bytes (CI signing secret
    missing or blank — should fail closed, not silently ship
    unsigned release artifacts)

The release workflow writes secrets.RPM_GPG_PRIVATE_KEY to a temp
file unconditionally. If that secret is misconfigured, the file
ends up empty and the prior code silently produced unsigned
.tar.gz files. Validation skipped sig-verify (no public key
present), and the workflow happily uploaded unsigned tarballs to
S3 — exactly the failure mode signing exists to prevent.

Tri-state the check: unset → unsigned (local dev), set-but-empty
→ hard error with actionable message, set-and-non-empty → sign.
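The tri-state gate can be sketched as a small function (names and the exact error message are illustrative):

```shell
#!/usr/bin/env bash
# Tri-state gate on GPG_KEY_FILE:
#   unset          -> unsigned build (local dev)
#   set but empty  -> hard error (misconfigured CI secret; fail closed)
#   set, non-empty -> sign
set -euo pipefail

signing_mode() {
  if [[ -z "${GPG_KEY_FILE+set}" ]]; then
    echo "unsigned"
  elif [[ ! -s "${GPG_KEY_FILE}" ]]; then
    echo "Refusing to produce unsigned release artifacts: GPG_KEY_FILE is empty" >&2
    return 1
  else
    echo "sign"
  fi
}
```

The `${GPG_KEY_FILE+set}` expansion distinguishes "unset" from "set but empty" even under `set -u`, which is what makes the three cases separable.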

Verified locally:
  - No GPG_KEY_FILE env: produces 2 unsigned tarballs (validation
    skips sig-verify, smoke tests pass).
  - GPG_KEY_FILE=$(mktemp) (empty): exits non-zero with the
    "Refusing to produce unsigned release artifacts" error,
    output dir stays empty.
  - GPG_KEY_FILE pointing at a real key: full signed flow,
    .sig + GPG-KEY-avalanchego produced, validation passes.
Previously the build-tarballs task templated the secret values
directly into the docker run command line:

  {{if .GPG_KEY_FILE}}-e GPG_KEY_FILE={{.GPG_KEY_FILE}}{{end}}
  {{if .GPG_PASSPHRASE}}-e GPG_PASSPHRASE={{.GPG_PASSPHRASE}}{{end}}

Task hands the rendered cmd to `sh -c`, so any whitespace or
shell metacharacter in the templated value (a space splits the
arg in two; a $ triggers shell expansion; a quote breaks parsing)
makes docker see a truncated value or extra arguments instead of
the real secret. With a real production passphrase that contains
any of these characters, the release job would fail at signing
even though the secret itself is valid.

Switch to `-e VAR` (no `=value`), which tells docker to forward
the variable from the host process's environment. The value
never touches the shell command line, so it's insensitive to
whitespace, $, quotes, or anything else the secret might contain.
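A stand-in demonstration of the hazard, with `sh -c` playing the role of Task handing off the rendered command (no docker required):

```shell
#!/usr/bin/env bash
# A secret templated into the command text (the old `-e VAR=value` shape)
# is re-parsed by sh and mangled; a secret forwarded by NAME through the
# environment (docker's bare `-e VAR`) arrives intact.
set -euo pipefail

secret='spa ce$and"quote'
export GPG_PASSPHRASE="${secret}"

# Templated: the space splits the arg, the $ expands, the quote breaks
# parsing -- the consumer never sees the real value.
templated="$(sh -c "printf '%s' ${GPG_PASSPHRASE}" 2>/dev/null || true)"

# Forwarded by name: only the variable NAME appears in the command text;
# the value travels through the environment untouched.
forwarded="$(sh -c 'printf %s "$GPG_PASSPHRASE"')"

echo "forwarded=[${forwarded}] templated=[${templated}]"
```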

Verified locally with a passphrase containing a space and `$`:
end-to-end signed build succeeds, all 5 expected artifacts are
produced, validation passes.
The earlier cleanup at the top of build-tgz.sh removed stale .sig
files and the exported public key but left .tar.gz files intact,
on the assumption that tar would just overwrite them. That holds
for same-tag re-runs but not when the tag changes between runs:
the previous tag's tarballs persist with their original filenames.

The release workflow's S3 upload step matches *.tar.gz with a
wildcard, so on persistent runners (notably custom-arm64-jammy)
or after a failed cleanup, a re-run for vY would publish vX
tarballs alongside the vY release.

Extend the cleanup to also remove *.tar.gz from the output dir.
Same-tag re-runs are unaffected (the new run rewrites them);
tag-switch re-runs no longer leak old archives.

Verified locally: built v1.0.0-old, then re-ran with v2.0.0-new
without manual cleanup; only the v2.0.0-new files remained.
@PlatCore PlatCore force-pushed the PlatCore/5160-add-signing-linux-binaries branch from 5ce8a46 to 1d50125 Compare May 2, 2026 01:38
# into it as the unprivileged runner user on linux runners.
sudo rm -rf ./build

build-arm64-binaries-tarball:
Contributor

The arm64 and amd64 workflows appear to be substantially duplicative. Maybe refactor to use a common workflow that can be configured with arch and runner?
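One way to deduplicate, as a sketch (job names and the custom runner label are taken from this PR; the exact step list is illustrative):

```yaml
# Hypothetical shared job parameterized by arch/runner via a matrix
jobs:
  build-binaries-tarball:
    strategy:
      matrix:
        include:
          - arch: amd64
            runner: ubuntu-22.04
          - arch: arm64
            runner: custom-arm64-jammy
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: task --taskfile .github/packaging/Taskfile.yml test-build-tarballs
```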

@@ -0,0 +1,115 @@
#!/usr/bin/env bash
Contributor

Is there room for sharing validation logic across tgz/rpm/deb?

Labels

ci (This focuses on changes to the CI process), devinfra

Projects

Status: In Progress 🏗️

Development

Successfully merging this pull request may close these issues.

Update the linux binary builds to include signing

2 participants