cmd/status: add --timeout to cloud-init status --wait #6796

Open
koushik717 wants to merge 4 commits into canonical:main from koushik717:feature/status-wait-timeout

@koushik717 koushik717 commented Mar 17, 2026

Proposed Changes

Add an optional --timeout SECONDS argument to cloud-init status --wait.

When --timeout is given and cloud-init has not finished within that period, the command exits with return code 1 and prints:

Timed out waiting for cloud-init to complete after Xs

When --timeout is not passed, behavior is completely unchanged.
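
As a rough illustration of the behavior described above, and not the code in this PR, a deadline-based polling loop could look like the sketch below. The function name `wait_for_completion`, the `is_done` callback, and the `poll_interval` parameter are hypothetical; only the exit codes and the message text come from the description.

```python
import time
from typing import Callable, Optional


def wait_for_completion(
    is_done: Callable[[], bool],
    timeout: Optional[float] = None,
    poll_interval: float = 0.25,
) -> int:
    """Poll is_done() until it returns True or the optional timeout expires.

    Returns 0 on completion and 1 on timeout, matching the exit codes
    proposed in the description. All names here are hypothetical.
    """
    # A monotonic clock is not affected by wall-clock adjustments during boot.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_done():
        if deadline is not None and time.monotonic() >= deadline:
            print(
                "Timed out waiting for cloud-init to complete "
                f"after {timeout:g}s"
            )
            return 1
        time.sleep(poll_interval)
    return 0
```

With `timeout=None` the loop never checks a deadline, which corresponds to the unchanged default behavior.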

Closes #4059

Test Coverage

Three new unit tests added to tests/unittests/cmd/test_status.py:

  • test_status_wait_timeout_completes_before_timeout: finishes before deadline, exits 0
  • test_status_wait_timeout_expires: deadline exceeded, exits 1 with message
  • test_status_wait_no_timeout_unchanged: no --timeout passed, original behavior preserved

All 47 tests in the status test suite pass.

Notes

Help text for --timeout includes the warning requested in the issue:

"Note: using --timeout means cloud-init may not have completed configuration at exit."
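
For readers unfamiliar with how such a flag is usually wired up, here is a minimal argparse sketch. It assumes a standalone parser built by a hypothetical `build_parser` function; cloud-init's real parser is structured differently, and only the flag name and the warning text are taken from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a standalone, illustrative parser (not cloud-init's own)."""
    parser = argparse.ArgumentParser(prog="cloud-init status")
    parser.add_argument(
        "--wait",
        action="store_true",
        help="Block until cloud-init completes.",
    )
    parser.add_argument(
        "--timeout",
        type=float,
        default=None,  # None means wait indefinitely
        metavar="SECONDS",
        help=(
            "Maximum number of seconds to wait when --wait is given. "
            "Note: using --timeout means cloud-init may not have "
            "completed configuration at exit."
        ),
    )
    return parser
```

Leaving the default at `None` keeps the no-flag case distinguishable from an explicit timeout of zero.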

Add an optional --timeout <seconds> argument to cloud-init status --wait.
When --timeout is given and cloud-init has not finished within that period,
exit with return code 1 and print a clear message. When --timeout is not
passed, behavior is unchanged.

Fixes: GH-4059
@holmanb holmanb self-assigned this Mar 17, 2026
- Reformat tests/unittests/cmd/test_status.py with black (dedent style)
- Shorten long docstring line to satisfy ruff E501 (max 79 chars)
Black moved the # noqa: E501 comments from the closing triple-quote
lines to the closing paren lines, breaking ruff suppression. Move
them back to the """ lines where ruff checks them.
@koushik717 koushik717 force-pushed the feature/status-wait-timeout branch from 2f9fdce to f2b38ad Compare March 18, 2026 00:53
@holmanb
Member

holmanb commented Mar 18, 2026

What is your use case? Is there something that this accomplishes that you cannot achieve with something like the timeout utility? Please see my latest comment on the linked bug.

@koushik717
Author

What is your use case? Is there something that this accomplishes that you cannot achieve with something like the timeout utility? Please see my latest comment on the linked bug.

My main use case is provisioning scripts and CI pipelines where cloud-init needs to finish before the next step, but hanging indefinitely isn’t acceptable.
The timeout utility gets close but has a few issues. It exits with code 124, which conflicts with cloud-init’s existing exit codes and makes script error handling messy. It also isn’t available in all environments where cloud-init runs, particularly minimal containers. And since --wait polls at intervals, timeout can send SIGTERM mid-poll rather than between cycles, which is less clean.
A native flag handles all of this and is self-contained and discoverable via help. That said, I saw your comment on the linked bug and am happy to rework the approach if you

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Apr 2, 2026
@koushik717
Author

Hi @blackboxsw, I just saw the stale notice. I'm still actively interested in getting this merged. Happy to rebase, address any review feedback, or make any changes needed. Would appreciate a look when you get a chance. Thanks!

@github-actions github-actions bot removed the stale-pr Pull request is stale; will be auto-closed soon label Apr 3, 2026
koushik717 added a commit to koushik717/cloud-init-builder that referenced this pull request Apr 3, 2026
- Schema-driven forms for 8 cloud-init modules (users, packages, runcmd,
  write_files, ssh, hostname, timezone, ntp)
- Real-time YAML output with Monaco Editor starting with #cloud-config
- Client-side validation with Ajv against official cloud-init JSON schema
- Server-side validation via FastAPI backend
- 5 built-in templates: Ubuntu Server, Docker Host, Kubernetes Node,
  Web Server, Developer Workstation
- Shareable config links via lz-string URL encoding
- Full keyboard navigation and ARIA accessibility
- axe-core accessibility tests in CI
- Built with @canonical/react-components (Vanilla Framework)
- Motivated by canonical/cloud-init#6796 and canonical/react-components#1339
@canonical canonical deleted a comment from github-actions bot Apr 6, 2026
@blackboxsw
Collaborator

Hello @koushik717, I'm not fully convinced that introducing a --timeout flag in cloud-init status --wait avoids script complexity.

If we were to provide a --timeout option, I'd want cloud-init to exit with a different exit code than 1, because a timeout is not the same condition as a real error. It is a symptom of cloud-init not being able to complete all boot stages, which can happen for any number of reasons that do not necessarily involve cloud-init encountering an error:

  • sluggish rate-limited CPU on emulated platforms or 'tiny' instances
  • unavailable or intermittent IMDS due to temporary network outages or limited bandwidth
  • systemd environments with competing service dependency ordering forcing one of cloud-init's 4 boot stages to be ejected from the boot target and therefore never run.

If we moved forward with this feature on the premise of supporting minimal images that don't have access to coreutils, I would want to:

  • use a differentiated exit code 124 to distinguish from cloud-init-proper failure
  • understand which distribution images we are supporting which don't have access to coreutils timeout (Ubuntu minimal images come with coreutils)

@koushik717
Author

Thanks for the detailed feedback, @blackboxsw.
On the exit code: exit code 124 makes sense here. It is the standard convention for timeout and preserves the distinction between a real cloud-init failure and a wait that simply ran out of time. I will update the implementation to use 124.
On the coreutils question: I will research which specific minimal images actually lack access to timeout. You mentioned Ubuntu minimal images come with coreutils, so I need to narrow down the actual use case more precisely. I will check Alpine, Arch minimal, Debian slim, and a few others and follow up with specific examples.
If it turns out most real-world minimal images do have coreutils, I will reconsider whether the flag justifies the added complexity. I want to make sure this solves a real problem before pushing further.
Will update the PR with the exit code change and follow up with the distro research within a few days.

@holmanb
Member

holmanb commented Apr 6, 2026

On the exit code: exit code 124 makes sense here. It is the standard convention for timeout and preserves the distinction between a real cloud-init failure and a wait that simply ran out of time. I will update the implementation to use 124.

Before you said that error handling with different error codes was the problem you were trying to solve. Now you agree that timeout's behavior makes sense. So your complaint is about those instances of not having timeout. But you didn't even have a target distro in mind. This sounds a lot more like a pet feature than a problem that needs to be solved.

If you have a platform without timeout, this would work:

cloud-init status --wait & pid=$!; (sleep 30; kill $pid 2>/dev/null) & watcher=$!; wait $pid 2>/dev/null; kill $watcher 2>/dev/null

For how trivial this is to do with timeout or any programming language, this just doesn't seem worth it to me.
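
To illustrate the "any programming language" route, here is a minimal Python sketch that uses only the standard library. The helper name `run_with_timeout` and the `TIMEOUT_EXIT_CODE` constant are hypothetical; returning 124 on expiry is a choice made here to mirror the coreutils timeout convention discussed above, not something cloud-init does.

```python
import subprocess
from typing import Sequence

# Mirrors the exit status coreutils `timeout` uses when the limit is hit.
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd: Sequence[str], timeout: float) -> int:
    """Run cmd and return its exit code, or 124 if it exceeds timeout.

    subprocess.run() kills the child and reaps it before raising
    TimeoutExpired, so no process is left behind.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE
```

A provisioning script could then call `run_with_timeout(["cloud-init", "status", "--wait"], timeout=30)` and treat 124 separately from cloud-init's own exit codes.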

Development

Successfully merging this pull request may close these issues.

Add --timeout to cloud-init status --wait

3 participants