Skip to content

ci: migrate primary CI jobs to ARM runners#7232

Draft
thepastaclaw wants to merge 1 commit intodashpay:developfrom
thepastaclaw:ci/migrate-to-arm-runners
Draft

ci: migrate primary CI jobs to ARM runners#7232
thepastaclaw wants to merge 1 commit intodashpay:developfrom
thepastaclaw:ci/migrate-to-arm-runners

Conversation

@thepastaclaw
Copy link
Copy Markdown

@thepastaclaw thepastaclaw commented Mar 16, 2026

Summary

Migrate the primary CI path from x86 to ARM runners, reducing costs and providing native aarch64 build+test coverage.

Changes

Moved to ARM runners (runner-arm64)

  • depends-linux64 (shared depends for linux64, fuzz, sqlite, ubsan)
  • src-linux64 + test-linux64 (primary native build+test)
  • src-linux64_fuzz (fuzz build)
  • src-linux64_sqlite + test-linux64_sqlite (SQLite build+test)
  • src-linux64_ubsan + test-linux64_ubsan (UBSan build+test)
  • multiprocess + tsan were already on ARM — no change

Removed

  • depends-aarch64-linux + src-aarch64-linux — the cross-compile job for aarch64-linux-gnu from x86 runners. We don't ship arm-linux-gnueabihf binaries, and native aarch64 coverage is now provided by the migrated jobs.
  • ci/test/00_setup_env_aarch64.sh — the cross-compile setup script

Remaining on x86 (runner-amd64)

  • nowallet (build + test) — sole native x86_64 coverage
  • mac, win64 — cross-compile targets
  • lint — architecture-independent

HOST detection centralization

Moved architecture-based HOST detection (aarch64-linux-gnu / x86_64-pc-linux-gnu) from individual setup scripts into ci/test/00_setup_env.sh. This:

  • Uses explicit triplets matching depends (e.g. aarch64-linux-gnu instead of config.guess's aarch64-unknown-linux-gnu)
  • Eliminates duplicated case blocks in 00_setup_env_native_qt5.sh, 00_setup_env_native_multiprocess.sh, 00_setup_env_native_tsan.sh
  • Automatically provides correct HOST for all native targets (fuzz, sqlite, ubsan) without per-script changes
  • Fixes PREVIOUS_RELEASES_DIR path computation which depended on HOST being set before the target script ran

Previous releases download fix

Added exact-match patterns for aarch64-linux-gnu and x86_64-linux-gnu in test/get_previous_releases.py, since the existing fnmatch glob aarch64-*-linux* requires a vendor field (e.g. aarch64-unknown-linux-gnu) and won't match the vendorless triplet.

Validation

  • Verified all moved jobs reference runner-arm64 in build.yml
  • Verified HOST detection in 00_setup_env.sh covers aarch64/x86_64 with dpkg fallback
  • Verified fuzz, sqlite, and ubsan setup scripts inherit HOST correctly (none set it explicitly)
  • Verified cross-compile scripts (mac, win64) still override HOST as needed
  • Verified previous release binaries exist for aarch64-linux-gnu in SHA256_SUMS
  • No references to deleted depends-aarch64-linux or src-aarch64-linux remain in build.yml

@thepastaclaw
Copy link
Copy Markdown
Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 16, 2026

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 16, 2026

Walkthrough

CI workflows and test environment scripts were adjusted to remove the aarch64-specific setup and shift many linux64 jobs to run on ARM runners. The GitHub Actions workflow removed aarch64 job blocks, renamed and migrated several linux64 jobs to run on arm64, and changed the bootstrap job runner default. The ci matrix and setup scripts were updated: the aarch64 environment script was deleted, several native setup scripts had HOST detection/export removed or altered, and ci/test/00_setup_env.sh now derives HOST from uname/dpkg before falling back to config.guess. A release helper expanded exact-platform mappings for aarch64/x86_64 GNU triplets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ci: migrate primary CI jobs to ARM runners' directly and concisely describes the main objective of the changeset—migrating CI infrastructure from x86 to ARM runners.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The PR description is directly related to the changeset, detailing a migration of CI jobs from x86 to ARM runners with specific technical changes and validation steps.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your branch ci fails here: https://github.com/thepastaclaw/dash/actions/runs/23168985036/job/67317597479

please continue tracking this until push ci on your repo is successful

@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch from 4b72d41 to cba08c0 Compare March 17, 2026 18:31
@thepastaclaw
Copy link
Copy Markdown
Author

The CI failure on the old commit (4b72d41) has been fixed. The latest commit (cba08c0) passes all CI — both on the fork and on upstream:

  • Fork CI (push): Run 23210233386 — ✅ success (skipped redundant jobs since upstream CI covers them)
  • Upstream CI (PR): Run 23210234315 — ✅ all jobs pass (depends, builds, tests across all targets)

The fixes in the force-push:

  1. HOST detection centralized in 00_setup_env.sh — all native targets (fuzz, sqlite, ubsan) now inherit correct HOST automatically instead of needing per-script detection
  2. Duplicate HOST detection removed from 00_setup_env_native_multiprocess.sh and 00_setup_env_native_tsan.sh
  3. Previous releases fix — added exact-match patterns for aarch64-linux-gnu and x86_64-linux-gnu in get_previous_releases.py (vendorless triplets didn't match the existing fnmatch globs)

@thepastaclaw
Copy link
Copy Markdown
Author

thepastaclaw commented Mar 30, 2026

✅ Review complete (commit e7f1160)

@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch from d619b9f to cba08c0 Compare March 30, 2026 14:29
Copy link
Copy Markdown
Author

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The CI host-triplet cleanup is mostly correct, but it introduces one real regression in previous-release handling for ARM-native linux64 functional jobs. The only confirmed issue is blocking: the new ARM host mapping reaches a legacy release path that has no matching aarch64 artifact, so the job aborts before tests run.

Reviewed commit: cba08c0

🔴 1 blocking

1 additional finding

🔴 blocking: ARM native linux64 jobs fail when downloading the oldest previous-release binary

test/get_previous_releases.py (lines 127-134)

test-linux64 now runs on an ARM runner, 00_setup_env.sh sets HOST=aarch64-linux-gnu for that environment, and the native linux64 test target still enables --previous-releases. In check_host(), that host maps to args.platform = "aarch64-linux-gnu", and the default tag list includes v0.12.1.5. For tags earlier than v0.12.3, download_binary() only remaps arm-linux-gnueabihf to RPi2 and x86_64-linux-gnu to linux64, so v0.12.1.5 is requested as dashcore-0.12.1.5-aarch64-linux-gnu.tar.gz. That artifact is not present in SHA256_SUMS and does not exist in the release assets, so the script returns Binary tag was not found and the functional job fails before running tests.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `test/get_previous_releases.py`:
- [BLOCKING] lines 127-134: ARM native linux64 jobs fail when downloading the oldest previous-release binary
  `test-linux64` now runs on an ARM runner, `00_setup_env.sh` sets `HOST=aarch64-linux-gnu` for that environment, and the native `linux64` test target still enables `--previous-releases`. In `check_host()`, that host maps to `args.platform = "aarch64-linux-gnu"`, and the default tag list includes `v0.12.1.5`. For tags earlier than `v0.12.3`, `download_binary()` only remaps `arm-linux-gnueabihf` to `RPi2` and `x86_64-linux-gnu` to `linux64`, so `v0.12.1.5` is requested as `dashcore-0.12.1.5-aarch64-linux-gnu.tar.gz`. That artifact is not present in `SHA256_SUMS` and does not exist in the release assets, so the script returns `Binary tag was not found` and the functional job fails before running tests.

@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch from cba08c0 to a9c7f69 Compare April 7, 2026 07:06
Copy link
Copy Markdown
Author

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The arm64 CI migration is mostly coherent, but one test path still assumes legacy x86_64 release binaries and will break the native linux64 functional job. I also confirmed a cache-key mismatch that will keep the arm64 previous-releases cache cold after the workflow switch, but that is a performance issue rather than a correctness blocker.

Reviewed commit: a9c7f69

🔴 1 blocking | 🟡 1 suggestion(s)

1 additional finding

🟡 suggestion: The releases cache key no longer distinguishes x86_64 and arm64 contents

.github/workflows/test-src.yml (lines 46-52)

PREVIOUS_RELEASES_DIR now includes $HOST, so the migrated linux64 job writes downloaded releases under releases/aarch64-linux-gnu instead of the old x86_64 directory. The cache key here still depends only on file hashes, so an existing cache entry created by the former x86_64 linux64 job can be restored on the new arm64 job under the same key. Because that restore is an exact key hit, the cache action will not publish the newly downloaded arm64 subtree back under a distinct key, which leaves the arm64 job redownloading previous releases on subsequent runs.

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `test/get_previous_releases.py`:
- [BLOCKING] lines 127-134: Native arm64 `linux64` jobs still fetch the x86_64 v0.12.1.5 tarball
  The new `linux64` workflow now runs on arm64 runners and `ci/test/00_setup_env.sh` derives `HOST=aarch64-linux-gnu` on those machines, while the Qt5 functional job still enables `--previous-releases`. In `download_binary()`, tags older than `v0.12.3` still remap both `x86_64-linux-gnu` and `aarch64-linux-gnu` to the legacy `linux64` tarball, which is the x86_64 build. `feature_unsupported_utxo_db.py` explicitly starts `v0.12.1.5`, and the functional harness launches that binary directly from `PREVIOUS_RELEASES_DIR` without any emulator path. On the new native arm64 runner this turns into an exec-format failure, so the `linux64` functional job will fail until old-release handling is made architecture-aware or the test is excluded there.

In `.github/workflows/test-src.yml`:
- [SUGGESTION] lines 46-52: The releases cache key no longer distinguishes x86_64 and arm64 contents
  `PREVIOUS_RELEASES_DIR` now includes `$HOST`, so the migrated `linux64` job writes downloaded releases under `releases/aarch64-linux-gnu` instead of the old x86_64 directory. The cache key here still depends only on file hashes, so an existing cache entry created by the former x86_64 `linux64` job can be restored on the new arm64 job under the same key. Because that restore is an exact key hit, the cache action will not publish the newly downloaded arm64 subtree back under a distinct key, which leaves the arm64 job redownloading previous releases on subsequent runs.

Comment on lines 130 to 134
elif platform in ["x86_64-apple-darwin", "arm64-apple-darwin"]:
print(f"Binaries not available for {tag} on {platform}")
return 1
elif platform in ["x86_64-linux-gnu"]:
elif platform in ["x86_64-linux-gnu", "aarch64-linux-gnu"]:
platform = "linux64"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: Native arm64 linux64 jobs still fetch the x86_64 v0.12.1.5 tarball

The new linux64 workflow now runs on arm64 runners and ci/test/00_setup_env.sh derives HOST=aarch64-linux-gnu on those machines, while the Qt5 functional job still enables --previous-releases. In download_binary(), tags older than v0.12.3 still remap both x86_64-linux-gnu and aarch64-linux-gnu to the legacy linux64 tarball, which is the x86_64 build. feature_unsupported_utxo_db.py explicitly starts v0.12.1.5, and the functional harness launches that binary directly from PREVIOUS_RELEASES_DIR without any emulator path. On the new native arm64 runner this turns into an exec-format failure, so the linux64 functional job will fail until old-release handling is made architecture-aware or the test is excluded there.

source: ['codex']

🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `test/get_previous_releases.py`:
- [BLOCKING] lines 127-134: Native arm64 `linux64` jobs still fetch the x86_64 v0.12.1.5 tarball
  The new `linux64` workflow now runs on arm64 runners and `ci/test/00_setup_env.sh` derives `HOST=aarch64-linux-gnu` on those machines, while the Qt5 functional job still enables `--previous-releases`. In `download_binary()`, tags older than `v0.12.3` still remap both `x86_64-linux-gnu` and `aarch64-linux-gnu` to the legacy `linux64` tarball, which is the x86_64 build. `feature_unsupported_utxo_db.py` explicitly starts `v0.12.1.5`, and the functional harness launches that binary directly from `PREVIOUS_RELEASES_DIR` without any emulator path. On the new native arm64 runner this turns into an exec-format failure, so the `linux64` functional job will fail until old-release handling is made architecture-aware or the test is excluded there.

Copy link
Copy Markdown
Author

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The ARM migration is mostly consistent, but it introduces a real previous-release functional test failure on the native linux64 job.

Reviewed commit: 3841458

🔴 1 blocking

🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `test/get_previous_releases.py`:
- [BLOCKING] lines 133-134: ARM linux64 job now downloads an x86_64-only v0.12.1.5 binary for previous-release tests
  Validated against the checked-out CI wiring: the migrated `linux64` job now runs on ARM runners, `00_setup_env.sh` sets `HOST=aarch64-linux-gnu` there, and `00_setup_env_native_qt5.sh` still enables `--previous-releases`. The linux64 functional test list still includes `feature_unsupported_utxo_db.py`, which explicitly starts version `120105` (v0.12.1.5). For tags `< v0.12.3`, this code maps `aarch64-linux-gnu` to `linux64`, which resolves to `dashcore-0.12.1.5-linux64.tar.gz` from `SHA256_SUMS` and that artifact is the legacy x86_64 tarball. Unlike the removed cross-build path, the new native ARM job does not configure qemu to run x86_64 binaries, so this test path will fail when it tries to launch the downloaded release.

Comment on lines +133 to 134
elif platform in ["x86_64-linux-gnu", "aarch64-linux-gnu"]:
platform = "linux64"
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Blocking: ARM linux64 job now downloads an x86_64-only v0.12.1.5 binary for previous-release tests

Validated against the checked-out CI wiring: the migrated linux64 job now runs on ARM runners, 00_setup_env.sh sets HOST=aarch64-linux-gnu there, and 00_setup_env_native_qt5.sh still enables --previous-releases. The linux64 functional test list still includes feature_unsupported_utxo_db.py, which explicitly starts version 120105 (v0.12.1.5). For tags < v0.12.3, this code maps aarch64-linux-gnu to linux64, which resolves to dashcore-0.12.1.5-linux64.tar.gz from SHA256_SUMS and that artifact is the legacy x86_64 tarball. Unlike the removed cross-build path, the new native ARM job does not configure qemu to run x86_64 binaries, so this test path will fail when it tries to launch the downloaded release.

source: ['claude']

🤖 Fix this with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `test/get_previous_releases.py`:
- [BLOCKING] lines 133-134: ARM linux64 job now downloads an x86_64-only v0.12.1.5 binary for previous-release tests
  Validated against the checked-out CI wiring: the migrated `linux64` job now runs on ARM runners, `00_setup_env.sh` sets `HOST=aarch64-linux-gnu` there, and `00_setup_env_native_qt5.sh` still enables `--previous-releases`. The linux64 functional test list still includes `feature_unsupported_utxo_db.py`, which explicitly starts version `120105` (v0.12.1.5). For tags `< v0.12.3`, this code maps `aarch64-linux-gnu` to `linux64`, which resolves to `dashcore-0.12.1.5-linux64.tar.gz` from `SHA256_SUMS` and that artifact is the legacy x86_64 tarball. Unlike the removed cross-build path, the new native ARM job does not configure qemu to run x86_64 binaries, so this test path will fail when it tries to launch the downloaded release.

@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch 2 times, most recently from 18e3dce to 49bf536 Compare April 8, 2026 00:32
Copy link
Copy Markdown
Author

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The current HEAD still has one real CI regression in the native ARM functional-test path. The new ARM-specific exclusion is injected through TEST_RUNNER_EXTRA, but the workflow passes a second --exclude later on the command line, so test_runner.py drops feature_unsupported_utxo_db from the final exclusion set.

Reviewed commit: 49bf536

🔴 1 blocking

1 additional finding

🔴 blocking: ARM-specific functional-test exclusion is still overridden by workflow arguments

ci/dash/test_integrationtests.sh (line 47)

On native ARM, ci/test/00_setup_env_native_qt5.sh now exports TEST_RUNNER_EXTRA with --exclude feature_pruning,feature_dbcrash,feature_unsupported_utxo_db, because the previous-release binary required by feature_unsupported_utxo_db is unavailable on aarch64. This script still expands ${TEST_RUNNER_EXTRA} before $PASS_ARGS, and test/functional/test_runner.py defines --exclude as a single-value option, so the later workflow-provided --exclude feature_pruning,feature_dbcrash replaces the ARM-specific list instead of extending it. The linux64 ARM job will therefore continue to schedule feature_unsupported_utxo_db and fail in the exact scenario the setup change was meant to avoid.

💡 Suggested change
LD_LIBRARY_PATH="$DEPENDS_DIR/$HOST/lib" ./test/functional/test_runner.py --ci --attempts=3 --ansi --combinedlogslen=4000 --timeout-factor="${TEST_RUNNER_TIMEOUT_FACTOR}" $PASS_ARGS ${TEST_RUNNER_EXTRA} --failfast --nocleanup --tmpdir="$(pwd)/testdatadirs" $EXTRA_ARGS
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.

In `ci/dash/test_integrationtests.sh`:
- [BLOCKING] line 47: ARM-specific functional-test exclusion is still overridden by workflow arguments
  On native ARM, `ci/test/00_setup_env_native_qt5.sh` now exports `TEST_RUNNER_EXTRA` with `--exclude feature_pruning,feature_dbcrash,feature_unsupported_utxo_db`, because the previous-release binary required by `feature_unsupported_utxo_db` is unavailable on aarch64. This script still expands `${TEST_RUNNER_EXTRA}` before `$PASS_ARGS`, and `test/functional/test_runner.py` defines `--exclude` as a single-value option, so the later workflow-provided `--exclude feature_pruning,feature_dbcrash` replaces the ARM-specific list instead of extending it. The linux64 ARM job will therefore continue to schedule `feature_unsupported_utxo_db` and fail in the exact scenario the setup change was meant to avoid.

Move depends-linux64 and its consumers (linux64, fuzz, sqlite, ubsan)
to ARM runners, making aarch64 the primary CI architecture. This
includes both build and test stages.

Drop the aarch64-linux cross-compile job (depends + src) and its
setup env script since we don't ship arm-linux-gnueabihf binaries
and now have native aarch64 coverage through the migrated jobs.

Jobs remaining on x86:
- nowallet (build + test): sole x86_64 native coverage
- mac, win64: cross-compile targets (x86 host)
- lint: architecture-independent

Update 00_setup_env_native_qt5.sh to dynamically detect HOST based
on the runner architecture instead of hardcoding x86_64.
@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch 2 times, most recently from e7f1160 to 7b0e40c Compare April 8, 2026 18:36
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

This pull request has conflicts, please rebase.

@thepastaclaw
Copy link
Copy Markdown
Author

@coderabbitai review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

This pull request has conflicts, please rebase.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Author

@thepastaclaw thepastaclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

I validated the current HEAD at e7f1160 against the available rerun artifacts and the workflow/script changes in the working tree. The agent findings input was empty, there were no available CodeRabbit findings to react to in .review-e7f11606, and I did not confirm any actionable correctness issue in the current patch.

Reviewed commit: e7f1160

@thepastaclaw thepastaclaw force-pushed the ci/migrate-to-arm-runners branch from 7b0e40c to e7f1160 Compare April 8, 2026 18:43
PastaPastaPasta added a commit that referenced this pull request Apr 8, 2026
229d466 ci: use cheaper ARM bootstrap runner for check-skip (PastaClaw)

Pull request description:

  # Summary

  Move only the `check-skip` bootstrap job onto a cheaper ARM runner path.

  ## Problem

  `check-skip` currently uses the generic amd64 runner setting, so even the
  lightweight bootstrap step can consume the more expensive primary runner pool.

  ## Solution

  - add a dedicated `RUNNER_CHECK_SKIP` override for the `check-skip` job
  - default that job to GitHub-hosted ARM (`ubuntu-24.04-arm`)
  - leave downstream dynamic runner selection unchanged for the heavier CI jobs

  This keeps the tiny bootstrap step cheap by default while still allowing an
  org-level Blacksmith ARM label, such as a small 1 vCPU runner, to be
  configured later without affecting the main build/test jobs.

  ## Related

  - split out from <#7232>

  ## Validation

  - parsed `.github/workflows/build.yml` successfully with PyYAML after the
    change
  - reviewed the workflow context around `check-skip`,
    `select_dynamic_runner.py`, and `cache-depends-sources.yml`
  - ran the mandatory code-review gate on
    `upstream/develop..ci/check-skip-arm-bootstrap` → *ship*

Top commit has no ACKs.

Tree-SHA512: a0163facade75a60bc69f668225b069660a8cb3faa12da4b9d43fbbd248b7291a0db521873f46ba98f37929031d09d1571f3c020bcaade5bce6c1fadc4614b70
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants