Skip to content

ci: require secrets.PYCLOUDLIB_TOML, add lxd_vm and ec2 scheduled jobs#6715

Open
blackboxsw wants to merge 11 commits intocanonical:mainfrom
blackboxsw:ci-lxd-vm
Open

ci: require secrets.PYCLOUDLIB_TOML, add lxd_vm and ec2 scheduled jobs#6715
blackboxsw wants to merge 11 commits intocanonical:mainfrom
blackboxsw:ci-lxd-vm

Conversation

@blackboxsw
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw commented Feb 5, 2026

Improve our public scheduled integration test coverage to cover lxd_vm and ec2 platforms.

Running integration tests will require 3 repository secrets PYCLOUDLIB_TOML, SSH_PUBLIC_KEY and SSH_PRIVATE_KEY in order to configure pycloudlib to run integration tests.

This PR renames and orders Github workflow files to aid visibility when manually reviewing and running actions.

Add assertion on a non-empty Github repo-level secret named PYCLOUDLIB_TOML which will
be written to a temporary file which is set in the environment variable PYCLOUDLIB_CONFIG. This config is used sa runtime integration test configuration for pycloudlib.

Add an assertion step in 20-dispatch-common.yml to error when
secrets.PYCLOUDLIB_TOML is absent and conditional logic which skips
each integration test step to avoid wasting runner cycles on integration tests.

Proposed Commit Message

See individual commits

Additional Context

Test Steps

Failed run example empty or absent secret PYCLOUDLIB_TOML https://github.com/blackboxsw/cloud-init/actions/runs/21699149948/job/62575866822

Success RUN with PYCLOUDLIB_TOML secret set to "[ec2]"
https://github.com/blackboxsw/cloud-init/actions/runs/24084983529/job/70255277685

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@blackboxsw blackboxsw force-pushed the ci-lxd-vm branch 2 times, most recently from 4f73b0f to ed8690d Compare February 5, 2026 05:20
@blackboxsw blackboxsw marked this pull request as draft February 5, 2026 05:21
Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first pass

Comment on lines +94 to +96
awk '/cloud-init version: /{printf DEB_VERSION=$NF; exit}' pytest-${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}.log
awk '/image-serial: /{printf IMAGE_SERIAL=$NF; exit}' pytest-${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}.log
shell: bash
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is are the awk commands doing here?

And why bash?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh I was going to use these to help create a top-level report output that would announce what version of cloud-init was installed during the test. I'll drop this from this PR until I have a working approach. I didn't like how opaque our GH workflow runs are when compared to jenkins jobs which announce the version of cloud-init being tested.

Copy link
Copy Markdown
Collaborator Author

@blackboxsw blackboxsw Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally stuffing env vars them into GITHUB_ENV per these docs and the ctrf.io step can extract and use the environment variables for report headers or summary.

@holmanb holmanb self-assigned this Feb 5, 2026
@holmanb holmanb added the incomplete Action required by submitter label Feb 10, 2026
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup LXD
if: ${{ env.REQUIRED_SECRET != '' and contains(fromJSON('["lxd_vm", "lxd_container"]'), env.CLOUD_INIT_PLATFORM ) }}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid the setup-lxd step if we are not on CLOUD_INIT_OS_PLATFORM lxd_container of lxd_vm.

@blackboxsw blackboxsw requested a review from holmanb February 11, 2026 23:27
@blackboxsw blackboxsw removed the incomplete Action required by submitter label Feb 11, 2026
@blackboxsw blackboxsw marked this pull request as ready for review February 13, 2026 04:32
Comment on lines +67 to +68
test '${{ secrets.PYCLOUDLIB_TOML }}' != '' || echo "ERROR: Missing required repo secrets.PYCLOUDLIB_TOML non-empty value."
test '${{ secrets.PYCLOUDLIB_TOML }}' == '' && exit 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks for the same thing twice, I think an if / else would be cleaner

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. We only really care to exit 1 in the face of no secrets.PYCLOUDLIB_TOML or empty string. So, I just put the operation in a single if clause.

- name: Checkout
if: ${{ env.REQUIRED_SECRET != '' }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since env.REQUIRED_SECRET is assigned from secrets.PYCLOUDLIB_TOML, and the first step has an exit 1 when secrets.PYCLOUDLIB_TOML is empty, is this not redundant?

Same comment elsewhere.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped all unneessary conditionals checks. An exit 1 above will prevent running these steps anyway.

@holmanb holmanb added the incomplete Action required by submitter label Feb 26, 2026
@blackboxsw blackboxsw force-pushed the ci-lxd-vm branch 7 times, most recently from 0c75a04 to 3937aae Compare March 10, 2026 21:15
@holmanb
Copy link
Copy Markdown
Member

holmanb commented Mar 16, 2026

@blackboxsw let me know when this is ready for re-review

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Mar 31, 2026
@canonical canonical deleted a comment from github-actions bot Apr 6, 2026
Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.

on:
workflow_dispatch:
schedule:
# Run Mon & Thurs under reduced frequency for high-cost test runs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"under reduced frequency" seems out of place

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed phrasing in all ec2 workflows.

@@ -0,0 +1,13 @@
name: "Daily: Integration Resolute lxd_vm"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming scheme isn't going to work with future expansion of new series, additional platforms, etc.

I would prefer that we do it in a way that isn't going to force us to do the same thing again the next time we add more platforms - something a little bit more sustainable would be better.

Also - numbers simultaneously increasing and decreasing is chaotic.

Copy link
Copy Markdown
Collaborator Author

@blackboxsw blackboxsw Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have dropped all unrelated renames and moved all integration platform runs to the end of the ordered list. Each platform is now grouped and reserves 10 workflow test slots for each platform.
Platform group prefix will be the following for each platform:

  • 100: lxd_container
  • 110: lxd_vm
  • 120: ec2

The oldest tested series (22.04) will be the first index at prefix 100-. Each additional supported series will increment from that initial platform index:

  • XX0: 22.04
  • XX1: 24.04
  • XX2: 25.10
  • XX3: 26.04

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot removed the stale-pr Pull request is stale; will be auto-closed soon label Apr 7, 2026
@blackboxsw blackboxsw requested a review from holmanb April 7, 2026 14:11
@blackboxsw blackboxsw removed the incomplete Action required by submitter label Apr 7, 2026
@blackboxsw
Copy link
Copy Markdown
Collaborator Author

@holmanb I have updated this branch and linked a successful runs on Ec2.

@holmanb
Copy link
Copy Markdown
Member

holmanb commented Apr 7, 2026

Thanks for the additional organization, @blackboxsw. I left some more feedback.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands scheduled integration test coverage by adding new scheduled workflows for lxd_vm (daily) and ec2 (twice weekly) and consolidates scheduled workflows to call a shared reusable workflow that now expects repository secrets to configure pycloudlib at runtime.

Changes:

  • Add new scheduled integration workflows for lxd_vm (22.04/24.04/25.10/26.04) and ec2 (22.04/24.04/25.10/26.04).
  • Update existing lxd_container scheduled workflows to use 100-dispatch-common.yml and standardize workflow names.
  • Update 100-dispatch-common.yml to consume repo secrets (PYCLOUDLIB_TOML, SSH keys) and add basic secret presence assertions.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
.github/workflows/100-dispatch-common.yml Central reusable workflow updated to handle secrets and support lxd_vm and ec2 platforms.
.github/workflows/110-daily-integration-22.04-lxd_container.yml Standardize name; switch to 100-dispatch-common.yml.
.github/workflows/111-daily-integration-24.04-lxd_container.yml Standardize name; switch to 100-dispatch-common.yml.
.github/workflows/112-daily-integration-25.10-lxd_container.yml Standardize name; switch to 100-dispatch-common.yml.
.github/workflows/113-daily-integration-26.04-lxd_container.yml Standardize name; switch to 100-dispatch-common.yml.
.github/workflows/120-daily-integration-22.04-lxd_vm.yml New daily scheduled lxd_vm integration run.
.github/workflows/121-daily-integration-24.04-lxd_vm.yml New daily scheduled lxd_vm integration run.
.github/workflows/122-daily-integration-25.10-lxd_vm.yml New daily scheduled lxd_vm integration run.
.github/workflows/123-daily-integration-26.04-lxd_vm.yml New daily scheduled lxd_vm integration run.
.github/workflows/130-daily-integration-22.04-ec2.yml New twice-weekly scheduled ec2 integration run.
.github/workflows/131-daily-integration-24.04-ec2.yml New twice-weekly scheduled ec2 integration run.
.github/workflows/132-daily-integration-25.10-ec2.yml New twice-weekly scheduled ec2 integration run.
.github/workflows/133-daily-integration-26.04-ec2.yml New twice-weekly scheduled ec2 integration run.
Comments suppressed due to low confidence (7)

.github/workflows/100-dispatch-common.yml:66

  • The image_type and install_source inputs are optional, but this workflow unconditionally exports CLOUD_INIT_OS_IMAGE_TYPE and CLOUD_INIT_CLOUD_INIT_SOURCE from them. When these inputs are omitted (as in the scheduled callers added in this PR), they become empty strings and override the non-empty defaults in tests/integration_tests/integration_settings.py, likely breaking image selection / install source resolution. Consider providing defaults in workflow_call/workflow_dispatch input definitions or using expression fallbacks in env (e.g., default generic and ppa:cloud-init-dev/daily/IN_PLACE, whichever is intended).
    .github/workflows/100-dispatch-common.yml:76
  • The required secrets are documented as PYCLOUDLIB_TOML, SSH_PUBLIC_KEY, and SSH_PRIVATE_KEY, but only PYCLOUDLIB_TOML is asserted. If either SSH key secret is missing/empty, the workflow will silently write empty key files and fail later in less obvious ways. Suggest adding similar non-empty assertions for SSH_PRIVATE_KEY and SSH_PUBLIC_KEY here (or in the same assertion step).
    .github/workflows/100-dispatch-common.yml:75
  • This assertion uses workflow expression interpolation (${{ env.REQUIRED_SECRET }}) inside the shell script, which injects the secret value directly into the generated script text. It’s safer to reference the environment variable at runtime (e.g., $REQUIRED_SECRET) to avoid quoting/newline edge cases and reduce chances of accidental exposure in debug output.
    .github/workflows/100-dispatch-common.yml:99
  • The PR description and linked “success run” indicate secrets.PYCLOUDLIB_TOML can be a plain TOML snippet like [ec2], but this step base64-decodes the secret. If the secret isn’t base64-encoded, base64 -d will fail (or write garbage) and break the workflow. Either store PYCLOUDLIB_TOML as base64 and document that requirement, or write the secret to the file directly using a quoting-safe approach.
    .github/workflows/100-dispatch-common.yml:106
  • Piping the tox run into tee can mask test failures because, without pipefail, the step’s exit status will typically be tee’s exit status (often 0) rather than tox’s. This could cause the workflow to pass even when integration tests fail. Consider enabling set -o pipefail for this step (or avoid the pipe) so tox failures correctly fail the job.
    .github/workflows/100-dispatch-common.yml:112
  • The CTRF reporter title includes ${{ env.DEB_VERSION }}, but this workflow doesn’t set DEB_VERSION anywhere. As written, the title will contain an empty version (e.g., :v). Either set DEB_VERSION earlier (e.g., from dpkg-parsechangelog/packaging metadata) or drop it from the title to avoid misleading report labels.
    .github/workflows/100-dispatch-common.yml:76
  • PR description mentions conditional logic to skip integration test steps when PYCLOUDLIB_TOML is absent to avoid wasting runner cycles, but this workflow currently exits 1 immediately when the secret is empty. If skipping (with a neutral conclusion) is the intended behavior for scheduled/dispatch runs, this implementation doesn’t match that description.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1 to +7
name: "Bi-weekly 22.04: ec2"

on:
workflow_dispatch:
schedule:
# Run Mon & Thurs for high-cost test runs
- cron: '2 22 * * 1,4'
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow name says “Bi-weekly”, but the cron (1,4) schedules it twice per week (Mon & Thu). “Bi-weekly” is ambiguous (every two weeks vs twice a week). Consider renaming to “Twice weekly”/“Semiweekly” (or similar) to avoid confusion in the Actions UI.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +7
name: "Bi-weekly 24.04: ec2"

on:
workflow_dispatch:
schedule:
# Run Mon & Thurs for high-cost test runs
- cron: '2 22 * * 1,4'
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow name says “Bi-weekly”, but the cron (1,4) schedules it twice per week (Mon & Thu). “Bi-weekly” is ambiguous (every two weeks vs twice a week). Consider renaming to “Twice weekly”/“Semiweekly” (or similar) to avoid confusion in the Actions UI.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +7
name: "Bi-weekly 25.10: ec2"

on:
workflow_dispatch:
schedule:
# Run Mon & Thurs for high-cost test runs
- cron: '2 22 * * 1,4'
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow name says “Bi-weekly”, but the cron (1,4) schedules it twice per week (Mon & Thu). “Bi-weekly” is ambiguous (every two weeks vs twice a week). Consider renaming to “Twice weekly”/“Semiweekly” (or similar) to avoid confusion in the Actions UI.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +7
name: "Bi-weekly 26.04: ec2"

on:
workflow_dispatch:
schedule:
# Run Mon & Thurs for high-cost test runs
- cron: '2 22 * * 1,4'
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow name says “Bi-weekly”, but the cron (1,4) schedules it twice per week (Mon & Thu). “Bi-weekly” is ambiguous (every two weeks vs twice a week). Consider renaming to “Twice weekly”/“Semiweekly” (or similar) to avoid confusion in the Actions UI.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @blackboxsw.

echo "[lxd]" > /home/$USER/.config/pycloudlib.toml
mkdir -p ~/.ssh
sh -c 'printf "%s\n" "$SSH_PRIVATE_KEY" > ~/.ssh/cloudinit_id_rsa'
chmod 600 ~/.ssh/cloudinit_id_rsa
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This order feels odd.

sh -c ... *_rsa.pub and sh -c ... *_rsa are related because they are similar calls for a pair of keys
sh -c *_rsa and chmod ... are related due to an ordering dependency

so if we instead order

sh -c ... *_rsa.pub
sh -c ... *_rsa
chmod ...

Then we can keep code adjacent that is more closely related - and it is also easier to read.

slowest-report: true
upload-artifact: true
github-report: true
artifact-name: ctrf-report-${{ inputs.platform }}-${{inputs.release}}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While making changes nearby can we please make the formats consistent?

ctrf-report-${{ inputs.platform }}-${{ inputs.release }}

Comment on lines +94 to +95
PYCLOUDLIB_CONFIG="${{ runner.temp }}/pycloudlib.toml"
export PYCLOUDLIB_CONFIG
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set other environment variables in the dedicated section above - this is not consistent.

PYCLOUDLIB_CONFIG="${{ runner.temp }}/pycloudlib.toml"
export PYCLOUDLIB_CONFIG

# Dump secrets using a subshell to avoid leaks due to xtrace.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this comment to the first time that this pattern is used?

Also, this comment is technically wrong. It uses a subprocess not a subshell - lets correct it as well. And since x-trace isn't used by default this isn't required, though I'm in favor of keeping it and clarifying that it is to avoid accidental leaks while debugging:

# Dump secrets using a subprocess to avoid accidental leaks when using set -x.

uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: failure-${{ github.job }}
path: ${{ github.workspace }}/cloud_init_test_logs/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't the logs include secrets?

insights-report: true
flaky-rate-report: true
slowest-report: true
upload-artifact: true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly - if uploading, are we certain that there are no secrets?

with:
report-path: '${{ github.workspace }}/reports/junit-report-${{ inputs.platform}}-${{ inputs.release }}.xml'
title: '${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}:${{ inputs.install_source }}:v${{ env.DEB_VERSION }}'
integrations-config: |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be used for integrating with other tools - not currently used. Can we drop this for now?

- name: Publish Test Report with Insights
if: always()
if: ${{ always() }}
uses: ctrf-io/github-test-reporter@024bc4b64d997ca9da86833c6b9548c55c620e40 # v1.0.26
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully vetted this action yet. Were others considered?

uses: ctrf-io/github-test-reporter@024bc4b64d997ca9da86833c6b9548c55c620e40 # v1.0.26
with:
report-path: '${{ github.workspace }}/reports/junit-report-${{ inputs.platform}}-${{ inputs.release }}.xml'
title: '${{ inputs.platform }}-${{ inputs.release }}-${{ inputs.image_type }}:${{ inputs.install_source }}:v${{ env.DEB_VERSION }}'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be redundant? I would expect the job to encode this information already.

@@ -72,20 +86,30 @@
rm -rf ${{ github.workspace }}/cloud_init_test_logs/
- name: Setup pycloudlib
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This does two distinct things:

  1. Configure ssh
  2. Set pycloudlib configuration

I would probably make this two distinct steps.

Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants