Add GPG signing to linux binary tarball builds#5289
Conversation
Force-pushed from 7af6acf to 9b6a785
Force-pushed from 9b6a785 to ef7dc92
maru-ava
left a comment
What’s here is entirely reasonable, and I appreciate you doing the work. My original instinct was to make the minimal change you’ve proposed here and add signing to the existing workflow.
But after having just reviewed the DEB packaging series, I’d prefer to see this follow the same general pattern: locally reproducible and locally testable. We shouldn’t be extending release/signing behavior in a way that can only really be validated in CI or through ad-hoc manual testing. That means a local task-driven validation path and CI exercising that same path so that we have regression coverage when we change the workflow or its supporting functionality. That suggests:
- a Taskfile entrypoint
- a separate validation step/script
- detached-signature verification in a fresh environment
- CI invoking that same validation path
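A minimal Taskfile sketch of that shape (the task and script names here are illustrative only, sketched from the pattern the reviewer describes):

```yaml
# Hypothetical layout — names are assumptions, not the PR's final ones.
version: '3'
tasks:
  build-tarballs:
    cmds:
      - ./build-tgz.sh          # builds (and optionally signs) the tarballs
  validate-tarballs:
    cmds:
      - ./validate-tgz.sh       # detached-signature verification in a fresh container
  test-build-tarballs:
    cmds:
      - task: build-tarballs
      - task: validate-tarballs
```

CI then invokes `task test-build-tarballs`, so the path it exercises is exactly the one a developer runs locally.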
# Optional env vars:
# DOCKERFILE - Dockerfile name within CONTEXT_DIR (default: "Dockerfile")
# BUILDER_PLATFORM - Target platform for the image (e.g., "linux/amd64").
Please explain why you think this is a good idea. Not everything that can be done, should be done.
Addressed.
PS the comment gives "if Jurassic Park scientists should" vibes )
Force-pushed from 5dcefe7 to 5ce8a46
Add detached GPG signatures (.sig) to the tarball packaging pipeline, reusing the same key infrastructure as RPM signing.
Use a job-scoped temp directory for GNUPGHOME instead of the default ~/.gnupg to avoid mutating shared state on persistent runners. Clean up via trap EXIT so the keyring is removed even on script failure. Pass passphrase via stdin (--passphrase-fd 0) instead of command line to avoid exposure in /proc/<pid>/cmdline.
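A self-contained sketch of that isolation pattern (the real build script imports GPG_KEY_FILE; this demo generates a throwaway key so it runs without secrets):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Job-scoped keyring: never touch ~/.gnupg on persistent runners.
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
trap 'rm -rf "${GNUPGHOME}"' EXIT   # keyring removed even if signing fails

# Demo-only: generate an ephemeral passphrase-less key in place of the
# imported release key.
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'demo <demo@example.invalid>' default default never

artifact="${GNUPGHOME}/demo.tar.gz"
echo data > "${artifact}"

# Passphrase arrives on stdin (--passphrase-fd 0), not argv, so it never
# appears in /proc/<pid>/cmdline.
printf '%s' '' | gpg --batch --pinentry-mode loopback --passphrase-fd 0 \
    --detach-sign --output "${artifact}.sig" "${artifact}"

gpg --batch --verify "${artifact}.sig" "${artifact}" && echo "signature OK"
```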
Add an optional DOCKERFILE env var (defaulting to "Dockerfile") so the same script can build different packaging builder images. RPM's task continues to work unchanged because it does not set DOCKERFILE. This unblocks adding additional packaging builder images (e.g. for linux tarballs) without duplicating the script.
Add a new packaging path for signed linux tarballs that mirrors the existing RPM pattern: a Dockerfile, a build script that runs inside the container, a validation script that runs in a fresh container, and Taskfile entrypoints (build-tarballs, validate-tarballs, test-build-tarballs).

The build container (Ubuntu 22.04 / glibc 2.35) compiles avalanchego and subnet-evm from source, stages each, tar+gzips them, and signs each archive with gpg --detach-sign (binary .sig). Signing is gated on a GPG_KEY_FILE env var so unsigned local builds work without secrets. The build script uses a temp GNUPGHOME with trap-EXIT cleanup so it never mutates a shared keyring.

The validation script runs ubuntu:22.04 fresh, imports the public key (when present), verifies each detached signature, extracts the archives, runs --version on each binary, and asserts the embedded git commit matches the build's commit. The public key is exported into the local build dir solely for the validation container's use; it is never uploaded as a release artifact (the canonical public key continues to live only in S3).

Existing build-tgz-pkg.sh / build-linux-binaries.yml wiring is unchanged in this commit; CI continues using the old path until the next commit flips the switch.

Verified locally on macOS arm64: task test-build-tarballs produces signed linux/arm64 tarballs whose signatures verify cleanly in the validation container and whose binaries report the correct commit.
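The "embedded commit matches the build's commit" assertion can be sketched in miniature without docker or the real binaries (file names and the version-string format here are stand-ins, not the actual avalanchego output):

```shell
#!/usr/bin/env bash
set -euo pipefail
GIT_COMMIT="abc1234"
work="$(mktemp -d)"; trap 'rm -rf "${work}"' EXIT

# Fake binary standing in for avalanchego; the real one embeds the commit
# at build time and reports it via --version.
printf '#!/bin/sh\necho "avalanchego [commit=abc1234]"\n' > "${work}/avalanchego"
chmod +x "${work}/avalanchego"

# Build side: stage and tar+gzip.
tar -czf "${work}/avalanchego-linux-arm64-v0.0.0.tar.gz" -C "${work}" avalanchego

# Validation side: extract into a clean dir, run the binary, check the commit.
out="$(mktemp -d "${work}/out.XXXX")"
tar -xzf "${work}/avalanchego-linux-arm64-v0.0.0.tar.gz" -C "${out}"
"${out}/avalanchego" | grep -q "commit=${GIT_COMMIT}" && echo "commit matches"
```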
Replace the inline build-tgz-pkg.sh invocation with a call to
task test-build-tarballs (defined under .github/packaging/),
giving CI the same locally-reproducible build+validate path a
developer can run on macOS via task --taskfile.
Drop the host-side "Build the avalanchego binaries" and "Build
subnet-evm plugin" steps since the build container now owns
binary compilation as well as tarballing. This removes the
host-vs-container glibc skew and makes "what CI does" identical
to "what task test-build-tarballs does locally."
S3 upload is now its own step that explicitly uploads only
*.tar.gz and *.tar.gz.sig from build/tgz/ — the GPG public
key file produced for the validation container stays local
and is excluded from S3 and from GitHub artifacts.
Refactor a couple of run-step expressions to env-var indirection
(${{ github.event.inputs.tag }} and the GPG private key secret)
to address the workflow security hardening hook.
Delete the now-unused .github/workflows/build-tgz-pkg.sh.
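The env-var indirection pattern looks roughly like this (step name, bucket variable, and upload command are illustrative; only the TAG indirection itself is what the commit describes):

```yaml
# Expressions go through `env:` instead of being interpolated into the
# run body, so a crafted tag value can't inject shell into the script.
- name: Upload tarballs to S3
  env:
    TAG: ${{ github.event.inputs.tag }}
  run: |
    aws s3 cp build/tgz/ "s3://${BUCKET}/${TAG}/" --recursive \
      --exclude "*" --include "*.tar.gz" --include "*.tar.gz.sig"
```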
Tarballs are overwritten by tar, but .sig and GPG-KEY-avalanchego files are not, so a signed run followed by an unsigned run with the same tag/arch leaves stale signatures whose contents no longer match the freshly built tarballs. Validation then fails with "BAD signature", and signed-then-signed re-runs would also pick up a stale public key file.

Remove *.tar.gz.sig and GPG-KEY-avalanchego from OUTPUT_DIR at the top of build-tgz.sh so each run starts from a clean signing state. The tarballs themselves are preserved (they are about to be overwritten by tar anyway).
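The clean-signing-state step amounts to a targeted rm at the top of the script (a sketch; OUTPUT_DIR here is a temp stand-in for build/tgz):

```shell
#!/usr/bin/env bash
set -euo pipefail
OUTPUT_DIR="$(mktemp -d)"   # stand-in for build/tgz in this demo
touch "${OUTPUT_DIR}/a.tar.gz" "${OUTPUT_DIR}/a.tar.gz.sig" \
      "${OUTPUT_DIR}/GPG-KEY-avalanchego"

# Drop stale signatures and the exported public key; keep the tarballs,
# which tar is about to overwrite anyway.
rm -f "${OUTPUT_DIR}"/*.tar.gz.sig "${OUTPUT_DIR}/GPG-KEY-avalanchego"
ls "${OUTPUT_DIR}"
```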
The tarball build pipeline accepts TGZ_ARCH to choose the target
architecture, but the underlying builder image was always built for
the host arch (uname -m). When TGZ_ARCH differed from the host
(e.g. TGZ_ARCH=amd64 on an arm64 workstation), the subsequent
docker run --platform linux/<TGZ_ARCH> failed because the local
image manifest had no matching platform.
Add an optional BUILDER_PLATFORM env var to build-builder-image.sh:
when set, derive the Go SHA256 checksum from that platform and pass
--platform to docker build so the resulting image's manifest lines
up with what `docker run --platform <target>` expects. When unset,
behavior is unchanged (host arch), so the RPM task is unaffected.
Plumb BUILDER_PLATFORM=linux/${TGZ_ARCH} into the Taskfile's
build-tgz-builder-docker-image task so cross-builds work via:
TGZ_ARCH=amd64 task --taskfile .github/packaging/Taskfile.yml \
test-build-tarballs
Verified locally on macOS arm64 by cross-building amd64 tarballs;
`file` on the extracted binaries reports ELF x86-64 and the
validation container (also amd64 via Docker emulation) ran the
binaries successfully.
Following review feedback, removing the BUILDER_PLATFORM env var and the --platform flags introduced earlier. Each invocation now builds for the host arch only; CI's per-arch runner matrix (ubuntu-22.04 + custom-arm64-jammy) provides coverage natively, matching how the RPM packaging path already works.

The TGZ_ARCH var stays as a filename-only knob that defaults to the host arch via PACKAGING_TGZ_HOST_ARCH, mirroring RPM's RPM_ARCH | default .PACKAGING_HOST_ARCH pattern.

Verified locally on macOS arm64: task test-build-tarballs still produces signed and unsigned arm64 tarballs that validate end-to-end in the fresh ubuntu:22.04 container.
Two regressions reported on the cross-arch simplification commit
(`83ce0f1b99`):
1. DOCKER_DEFAULT_PLATFORM divergence breaks the builder image build.
The Dockerfile uses ${TARGETARCH} (resolved by Docker from the
host platform, --platform flag, or DOCKER_DEFAULT_PLATFORM) to
download Go, while build-builder-image.sh fetched the SHA256
for `uname -m`. On Apple Silicon with DOCKER_DEFAULT_PLATFORM=
linux/amd64 (a common setup) the values diverged and the
sha256sum -c step inside the Dockerfile failed.
Fix: build-builder-image.sh now passes --platform linux/${goarch}
to docker build, pinning Docker's TARGETARCH to the same arch the
script computed the checksum for. Same script is used by RPM, so
this also closes the latent equivalent in the RPM path.
Also pin --platform on the build-tarballs `docker run` and the
validate-tgz.sh `docker run` so the entire pipeline stays at host
arch even when DOCKER_DEFAULT_PLATFORM points elsewhere.
2. TGZ_ARCH override silently produced mislabeled tarballs.
The Taskfile forwarded a user-supplied TGZ_ARCH into the build
container as PACKAGING_TGZ_ARCH, but neither scripts/build.sh
nor the subnet-evm build set GOARCH from it — the binaries were
always at the container's native arch. On arm64,
`task build-tarballs TGZ_ARCH=amd64` produced
*-linux-amd64-*.tar.gz containing arm64 binaries, and validation
would still pass because its container also ran at host arch.
Fix: build-tgz.sh and validate-tgz.sh now derive arch from
`uname -m` at runtime (deb-style mapping). The Taskfile no
longer forwards TGZ_ARCH/PACKAGING_TGZ_ARCH to either script.
Since each script computes arch from its own runtime env (which
is pinned to host arch via --platform), filenames always match
the binary contents.
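The deb-style runtime mapping is a small case statement over `uname -m` (a sketch; the error message wording is ours):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Map the kernel's machine name to the Go/Debian arch label at runtime,
# so the label always matches the environment actually doing the build.
machine="$(uname -m)"
case "${machine}" in
  x86_64)        arch="amd64" ;;
  aarch64|arm64) arch="arm64" ;;
  *) echo "unsupported architecture: ${machine}" >&2; exit 1 ;;
esac
echo "${arch}"
```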
The validate-tarballs task is also moved from `env:` to
command-line env-prefix for TAG/GIT_COMMIT, so parent shell
env vars can no longer shadow them either (a separate Task v3
quirk: `env:` block doesn't override the parent shell).
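In Taskfile terms the move looks roughly like this (task and script names taken from the commit messages; exact syntax is a sketch):

```yaml
# Before: values in an `env:` block, which a parent-shell TAG could shadow
# (Task v3 quirk). After: command-line prefix, which always wins.
validate-tarballs:
  cmds:
    - TAG={{.TAG}} GIT_COMMIT={{.GIT_COMMIT}} ./validate-tgz.sh
```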
Verified locally on macOS arm64:
- `DOCKER_DEFAULT_PLATFORM=linux/amd64 task test-build-tarballs`
succeeds end-to-end and produces correctly-labeled arm64
tarballs (host arch overrides DOCKER_DEFAULT_PLATFORM).
- `TGZ_ARCH=amd64 task test-build-tarballs` ignores the override
and still produces -arm64- tarballs that validate cleanly.
- Both unsigned and signed flows still work; signed run produces
.sig files and a local GPG-KEY-avalanchego that validation
imports and verifies in the fresh ubuntu:22.04 container.
The containerized build via test-build-tarballs writes build/plugins/<vm-id> as root (graft/subnet-evm/scripts/build.sh mkdir -p's it inside the container). On linux runners without userns-remap, that directory is root-owned on the host, and the cleanup step's `rm -rf ./build` running as the runner user can't recurse into it: the cleanup exits non-zero and the workflow job is marked failed even after the artifacts uploaded successfully.

Use `sudo rm -rf ./build` (passwordless on GitHub runners) to clean reliably across container-produced files. RPM doesn't hit this because its cleanup only removes build/rpm (a directory the script fully owns), not the whole build/ tree.
The combined check `[[ -n "${GPG_KEY_FILE:-}" && -s ... ]]` lumped
two distinct cases into the same "skip signing" branch:
- GPG_KEY_FILE unset entirely (local dev — unsigned OK)
- GPG_KEY_FILE set but file is 0 bytes (CI signing secret
missing or blank — should fail closed, not silently ship
unsigned release artifacts)
The release workflow writes secrets.RPM_GPG_PRIVATE_KEY to a temp
file unconditionally. If that secret is misconfigured, the file
ends up empty and the prior code silently produced unsigned
.tar.gz files. Validation skipped sig-verify (no public key
present), and the workflow happily uploaded unsigned tarballs to
S3 — exactly the failure mode signing exists to prevent.
Tri-state the check: unset → unsigned (local dev), set-but-empty
→ hard error with actionable message, set-and-non-empty → sign.
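The tri-state gate can be sketched as a small function (the error text is ours; variable name per the commit message):

```shell
#!/usr/bin/env bash
set -euo pipefail

sign_mode() {
  if [[ -z "${GPG_KEY_FILE+set}" ]]; then
    echo "unsigned"                       # unset: local dev, unsigned OK
  elif [[ ! -s "${GPG_KEY_FILE}" ]]; then
    echo "Refusing to produce unsigned release artifacts:" \
         "GPG_KEY_FILE is set but empty" >&2
    return 1                              # set-but-empty: fail closed
  else
    echo "signed"                         # set and non-empty: sign
  fi
}

unset GPG_KEY_FILE
sign_mode                                               # unsigned
GPG_KEY_FILE="$(mktemp)"; export GPG_KEY_FILE           # empty file
sign_mode && echo "unexpected" || echo "failed closed as expected"
echo "real-key" > "${GPG_KEY_FILE}"
sign_mode                                               # signed
```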
Verified locally:
- No GPG_KEY_FILE env: produces 2 unsigned tarballs (validation
skips sig-verify, smoke tests pass).
- GPG_KEY_FILE=$(mktemp) (empty): exits non-zero with the
"Refusing to produce unsigned release artifacts" error,
output dir stays empty.
- GPG_KEY_FILE pointing at a real key: full signed flow,
.sig + GPG-KEY-avalanchego produced, validation passes.
Previously the build-tarballs task templated the secret values
directly into the docker run command line:
{{if .GPG_KEY_FILE}}-e GPG_KEY_FILE={{.GPG_KEY_FILE}}{{end}}
{{if .GPG_PASSPHRASE}}-e GPG_PASSPHRASE={{.GPG_PASSPHRASE}}{{end}}
Task hands the rendered cmd to `sh -c`, so any whitespace or
shell metacharacter in the templated value (a space splits the
arg in two; a $ triggers shell expansion; a quote breaks parsing)
makes docker see a truncated value or extra arguments instead of
the real secret. With a real production passphrase that contains
any of these characters, the release job would fail at signing
even though the secret itself is valid.
Switch to `-e VAR` (no `=value`), which tells docker to forward
the variable from the host process's environment. The value
never touches the shell command line, so it's insensitive to
whitespace, $, quotes, or anything else the secret might contain.
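The hazard is easy to reproduce without docker: render a secret into a command string and let `sh -c` parse it, versus passing only the variable's name (the secret value here is a made-up example):

```shell
#!/usr/bin/env bash
set -euo pipefail
secret='pa ss$HOME'   # space + $: both legal in a passphrase

# Templated into the command string (as the old Taskfile did): the shell
# word-splits the value and expands $HOME, mangling the secret.
rendered="set -- -e GPG_PASSPHRASE=${secret}; echo \$#"
echo "templated argc: $(sh -c "${rendered}")"   # 3 args, not the 2 intended

# `-e VAR` with no value: only the name appears on the command line;
# docker (or here, the child shell) reads the value from the environment.
export GPG_PASSPHRASE="${secret}"
echo "forwarded argc:  $(sh -c 'set -- -e GPG_PASSPHRASE; echo $#')"   # 2
echo "forwarded value: $(sh -c 'printf %s "$GPG_PASSPHRASE"')"         # intact
```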
Verified locally with a passphrase containing a space and `$`:
end-to-end signed build succeeds, all 5 expected artifacts are
produced, validation passes.
The earlier cleanup at the top of build-tgz.sh removed stale .sig files and the exported public key but left .tar.gz files intact, on the assumption that tar would just overwrite them. That holds for same-tag re-runs but not when the tag changes between runs: the previous tag's tarballs persist with their original filenames. The release workflow's S3 upload step matches *.tar.gz with a wildcard, so on persistent runners (notably custom-arm64-jammy) or after a failed cleanup, a re-run for vY would publish vX tarballs alongside the vY release.

Extend the cleanup to also remove *.tar.gz from the output dir. Same-tag re-runs are unaffected (the new run rewrites them); tag-switch re-runs no longer leak old archives.

Verified locally: built v1.0.0-old, then re-ran with v2.0.0-new without manual cleanup; only the v2.0.0-new files remained.
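The tag-switch scenario in miniature (file names per the commit message; OUTPUT_DIR is a temp stand-in):

```shell
#!/usr/bin/env bash
set -euo pipefail
OUTPUT_DIR="$(mktemp -d)"

# Leftovers from a previous vX run on a persistent runner.
touch "${OUTPUT_DIR}/avalanchego-linux-arm64-v1.0.0-old.tar.gz" \
      "${OUTPUT_DIR}/avalanchego-linux-arm64-v1.0.0-old.tar.gz.sig"

# Extended cleanup: also drop *.tar.gz so a vY re-run can't upload vX archives.
rm -f "${OUTPUT_DIR}"/*.tar.gz "${OUTPUT_DIR}"/*.tar.gz.sig \
      "${OUTPUT_DIR}/GPG-KEY-avalanchego"

# The new run writes only the new tag's tarball.
touch "${OUTPUT_DIR}/avalanchego-linux-arm64-v2.0.0-new.tar.gz"
ls "${OUTPUT_DIR}"
```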
Force-pushed from 5ce8a46 to 1d50125
# into it as the unprivileged runner user on linux runners.
sudo rm -rf ./build

build-arm64-binaries-tarball:
The arm64 and amd64 workflows would appear to be substantially duplicative. Maybe refactor to use a common workflow that can be configured with arch and runner?
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
Is there room for sharing validation logic across tgz/rpm/deb?
Why this should be merged
Linux binary tarballs are uploaded to S3 unsigned. RPM packages already have GPG signing. This adds detached signatures (.tar.gz.sig) to close that gap.
Closes #5160
How this works
build-tgz-pkg.sh: imports the GPG key into a temp GNUPGHOME, signs each tarball with gpg --detach-sign, verifies inline, uploads .sig to S3. No-op when no key is provided.
build-linux-binaries.yml: both jobs import the GPG key from the RPM_GPG_PRIVATE_KEY secret, pass it to the script, and include .sig in artifacts.
How this was tested
.sig files produced, gpg --verify passes
~/.gnupg never touched (temp GNUPGHOME isolation verified)
Need to be documented in RELEASES.md?
No