Skip to content

[codex] Apply generated cpflow GitHub Actions flow#732

Merged
justin808 merged 30 commits intomasterfrom
codex/cpflow-github-actions-flow
May 9, 2026
Merged

[codex] Apply generated cpflow GitHub Actions flow#732
justin808 merged 30 commits intomasterfrom
codex/cpflow-github-actions-flow

Conversation

@justin808
Copy link
Copy Markdown
Member

@justin808 justin808 commented Apr 30, 2026

Summary

  • Replace the older hand-written Control Plane GitHub Actions with the generated cpflow-* action and workflow set from cpflow generate-github-actions.
  • Keep review apps opt-in, staging deploys branch-gated, production promotion manual, and nightly review-app cleanup in the generated flow.
  • Update Control Plane docs links so they point at the new cpflow-* action/workflow names.

Verification

  • BUNDLE_GEMFILE=/tmp/cpflow-pr-278/Gemfile bin/conductor-exec bundle exec ruby /tmp/cpflow-pr-278/bin/cpflow github-flow-readiness — no blocking readiness issues detected.
  • bin/conductor-exec ruby -e 'require "yaml"; paths = Dir[".github/**/*.yml"] + Dir[".github/**/*.yaml"] + Dir[".controlplane/**/*.yml"]; paths.sort.each { |path| YAML.load_file(path, aliases: true); puts path }'
  • bin/conductor-exec git diff --cached --check
  • bin/conductor-exec bundle exec rubocop

Not Run / Blocked

  • bin/conductor-exec bundle exec rspec could not complete locally because PostgreSQL was not running at /tmp/.s.PGSQL.5432; after the first Rails load failure, Coveralls setup also reported repeated coverage initialization errors.

Note

Medium Risk
Replaces all Control Plane deployment automation (review apps, staging deploys, production promotion/rollback, cleanup) with a newly generated workflow set, so misconfiguration could block or mis-deploy environments. Changes also touch secret-handling paths (tokens/SSH key for Docker builds), requiring careful review of permissions and trust boundaries.

Overview
Switches Control Plane CI/CD from the hand-written GitHub Actions to the generated cpflow-* composite actions and workflows, covering opt-in review apps, branch-gated staging deploys, manual staging→production promotion, and nightly stale review-app cleanup.

The new flow adds shared actions for environment setup/version pinning, config validation, safer Docker builds (optional SSH key + extra args), release-phase detection, health polling, and stricter review-app deletion safeguards (prefix-based). Docs in .controlplane/readme.md and .controlplane/shakacode-team.md are updated to match the new workflows, required repo variables/secrets, and operational guidance (slash-command exact matching, staging branch filter caveats).

Reviewed by Cursor Bugbot for commit 79c6b97. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • Documentation

    • Updated review app and staging deployment guidance with new slash-command workflows and repository configuration requirements.
  • New Features

    • Enhanced deployment automation with health-check validation, release-phase detection, and improved staging-to-production promotion flow.
  • Refactor

    • Reorganized GitHub Actions workflows with standardized naming convention for improved clarity and maintainability.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: feaec977-e214-4ed3-b67b-45daf88e82f8

📥 Commits

Reviewing files that changed from the base of the PR and between 2f7d333 and 79c6b97.

📒 Files selected for processing (28)
  • .controlplane/readme.md
  • .controlplane/shakacode-team.md
  • .github/actions/build-docker-image/action.yml
  • .github/actions/cpflow-build-docker-image/action.yml
  • .github/actions/cpflow-delete-control-plane-app/action.yml
  • .github/actions/cpflow-delete-control-plane-app/delete-app.sh
  • .github/actions/cpflow-detect-release-phase/action.yml
  • .github/actions/cpflow-setup-environment/action.yml
  • .github/actions/cpflow-validate-config/action.yml
  • .github/actions/cpflow-wait-for-health/action.yml
  • .github/actions/delete-control-plane-app/action.yml
  • .github/actions/delete-control-plane-app/delete-app.sh
  • .github/actions/setup-environment/action.yml
  • .github/cpflow-help.md
  • .github/workflows/cpflow-cleanup-stale-review-apps.yml
  • .github/workflows/cpflow-delete-review-app.yml
  • .github/workflows/cpflow-deploy-review-app.yml
  • .github/workflows/cpflow-deploy-staging.yml
  • .github/workflows/cpflow-help-command.yml
  • .github/workflows/cpflow-promote-staging-to-production.yml
  • .github/workflows/cpflow-review-app-help.yml
  • .github/workflows/delete-review-app.yml
  • .github/workflows/deploy-to-control-plane-review-app.yml
  • .github/workflows/deploy-to-control-plane-staging.yml
  • .github/workflows/help-command.yml
  • .github/workflows/nightly-remove-stale-review-apps.yml
  • .github/workflows/promote-staging-to-production.yml
  • .github/workflows/review-app-help.yml

Walkthrough

This PR refactors Control Plane CI/CD automation from manual setup/build/deploy actions to a unified cpflow CLI–based system. It introduces parameterized composite actions for environment setup, Docker image building with SSH support, and application deletion; adds new workflows for PR-based review apps, staging deployment, and manual production promotion with health checks and rollback; and updates documentation covering slash commands, repository secrets/variables, and deployment processes.

Changes

cpflow GitHub Actions Refactor

Layer / File(s) Summary
Documentation
.controlplane/readme.md, .controlplane/shakacode-team.md, .github/cpflow-help.md
Updated review app instructions, deployment flows, repository configuration requirements (secrets/variables), and new help guide for slash commands (/deploy-review-app, /delete-review-app, /help).
Validation & Health Actions
.github/actions/cpflow-validate-config/action.yml, .github/actions/cpflow-detect-release-phase/action.yml, .github/actions/cpflow-wait-for-health/action.yml
Added configuration validation, release-phase feature detection, and health-polling composite actions used by deployment workflows.
Environment, Build & Delete Actions
.github/actions/cpflow-setup-environment/action.yml, .github/actions/cpflow-build-docker-image/action.yml, .github/actions/cpflow-delete-control-plane-app/...
Added unified setup action installing Control Plane CLI/cpflow gem, docker build action with SSH key/known-hosts support, and deletion action with safety prefix validation.
Legacy Actions Removed
.github/actions/setup-environment/action.yml, .github/actions/build-docker-image/action.yml, .github/actions/delete-control-plane-app/...
Removed older composite actions replaced by cpflow-based versions.
Review App Workflows
.github/workflows/cpflow-deploy-review-app.yml, .github/workflows/cpflow-delete-review-app.yml, .github/workflows/cpflow-review-app-help.yml
Added workflows for deploy/delete review apps triggered by slash commands or PR events, with GitHub comment/deployment status updates and health checks.
Staging & Production Workflows
.github/workflows/cpflow-deploy-staging.yml, .github/workflows/cpflow-promote-staging-to-production.yml
Added workflows for auto-deploying staging on master push and manual staging-to-production promotion with environment parity verification, health checks, and image rollback.
Help & Cleanup Workflows
.github/workflows/cpflow-help-command.yml, .github/workflows/cpflow-cleanup-stale-review-apps.yml
Added workflows for posting help documentation on /help commands and cleaning up stale review applications on daily cron schedule.
Legacy Workflows Removed
.github/workflows/delete-review-app.yml, .github/workflows/deploy-to-control-plane-*.yml, .github/workflows/help-command.yml, .github/workflows/nightly-remove-stale-review-apps.yml, .github/workflows/promote-staging-to-production.yml, .github/workflows/review-app-help.yml
Removed older workflows replaced by cpflow-based equivalents.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • shakacode/react-webpack-rails-tutorial#615: Related refactoring of Control Plane GitHub Actions for review/staging deployment and app deletion that also modifies build-docker-image, setup-environment, and delete-app actions/scripts.

Poem

🐰 The cpflow bunnies hop with glee,
From legacy actions, now set free!
Review apps bloom on PR commands,
Staging deploys at master's hands,
Promotion workflows standing tall—
One unified CLI rules them all! 🚀

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/cpflow-github-actions-flow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Upstream token not passed to copy-image command
    • Added the required -t upstream token argument to the cpflow copy-image-from-upstream command while keeping the token sourced from secrets via the environment.

Create PR

Or push these changes by commenting:

@cursor push 3606e16506
Preview (3606e16506)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -203,14 +203,13 @@
 
       - name: Copy image from staging
         env:
-          # Pass the upstream token via env rather than `-t` so it doesn't appear in /proc/<pid>/cmdline.
           CPLN_UPSTREAM_TOKEN: ${{ secrets.CPLN_TOKEN_STAGING }}
           PRODUCTION_APP_NAME: ${{ vars.PRODUCTION_APP_NAME }}
           CPLN_ORG_PRODUCTION: ${{ vars.CPLN_ORG_PRODUCTION }}
         shell: bash
         run: |
           set -euo pipefail
-          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}"
+          cpflow copy-image-from-upstream -a "${PRODUCTION_APP_NAME}" -t "${CPLN_UPSTREAM_TOKEN}" --org "${CPLN_ORG_PRODUCTION}"
 
       - name: Deploy image to production
         env:

You can send follow-ups to the cloud agent here.

Comment thread .github/workflows/cpflow-promote-staging-to-production.yml Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow GitHub Actions Migration

This is a solid migration from hand-written to generated cpflow-* workflows. The overall architecture is a clear improvement. Here's a summary of what stands out.


Security — meaningful fixes

  • Token exposure fixed: The old setup-environment action passed the token as --token "$TOKEN" (visible in /proc/<pid>/cmdline and ps). The new action correctly uses CPLN_TOKEN env var. Same for CPLN_UPSTREAM_TOKEN in the promote workflow.
  • Critical: /deploy-review-app now requires author_association. The old workflow had no author_association check, meaning any GitHub user who could comment on a PR could trigger a deployment. The new check ("OWNER","MEMBER","COLLABORATOR") correctly gates this.
  • Exact command match (== '/deploy-review-app') replaces contains(). Prevents accidental triggers from comments like "I ran /deploy-review-app earlier and it worked".
  • SSH key isolation: The cpflow-build-docker-image action scopes DOCKER_BUILD_SSH_KEY to a dedicated step so it never appears in the build step's environment — even when ACTIONS_STEP_DEBUG=true is set. The EXIT trap cleans up the key file and kills the agent. Well designed.
  • Fork PR handling: Fork PRs correctly skip auto-deploy (which can't access secrets) while still allowing approved collaborators to deploy via /deploy-review-app. The cpflow-review-app-help.yml is explicit about why pull_request_target is used there (no code checkout = safe for write access from forks).

Bugs fixed

  • cancel-in-progress: false on all deploy workflows. The old staging workflow used cancel-in-progress: true, which could interrupt a cpflow deploy-image mid-rollout and leave the workload partially updated. Now fixed across all three deployment workflows.
  • Delete on PR close now uses pull_request_target. The old delete-review-app.yml used pull_request: types: [closed], which runs in a restricted context without secret access when a fork PR closes. The new workflow correctly uses pull_request_target for the close event.
  • --run-release-phase is now conditional. The old staging deploy hardcoded it unconditionally; the new workflows detect release_script: presence at runtime before deciding.
  • Delete safety uses prefix allowlist, not blocklist. The old delete-app.sh used grep -iqE '(production|staging)' to block "dangerous" names — fragile against future app naming. The new delete-app.sh requires the app name to start with REVIEW_APP_PREFIX, which is a stronger guarantee.

Potential issues

  1. cpflow exists in the deploy workflow doesn't distinguish auth errors from "not found" (see inline comment). The delete-app.sh carefully pattern-matches error output to surface API/auth failures, but the deploy workflow's check-app step treats all non-zero exits as exists=false. An intermittent auth error on a push event would silently skip the deploy.

  2. staging deploy job has no needs: validate-branch: The deploy job correctly declares needs: [validate-branch, build] and has an if: guard, but if validate-branch produces no output (e.g., first run where the variable step fails), the if: could evaluate against an empty string. In practice the validation step exits non-zero, so build never runs and deploy is skipped by the needs: dependency. This is fine but the if: guard is slightly redundant with the needs: dependency on validate-branch.

  3. Help command has no contents: read permission: cpflow-help-command.yml declares only issues: write and pull-requests: write. There's no explicit contents: read but GitHub's default GITHUB_TOKEN grants read on public repos, so this works — just worth noting that it relies on the repo's default permissions.


Nits

  • cpflow-deploy-staging.yml uses GITHUB_REF_NAME (simpler), whereas the old workflow used ${GITHUB_REF#refs/heads/} (strip-prefix approach that would silently behave oddly for tag refs). Small but cleaner.
  • The shakacode-team.md link fix (-info org URL corrected to shakacode-open-source-examples-production) is a separate correctness fix bundled in cleanly.

Overall this is well-structured with meaningful security and reliability improvements. The one item I'd prioritize is the cpflow exists error handling consistency noted in the inline comment.

if cpflow exists -a "${APP_NAME}" --org "${CPLN_ORG}"; then
echo "exists=true" >> "$GITHUB_OUTPUT"
else
echo "exists=false" >> "$GITHUB_OUTPUT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if/else around cpflow exists conflates two different failure modes: "app does not exist" (expected non-zero) and "API/auth error" (unexpected non-zero). Both land in the else branch and set exists=false.

Compare delete-app.sh in this same PR, which carefully pattern-matches the output ("Double check your org", "Unknown API token format", "ERROR", etc.) to distinguish a real failure from a genuine not-found and surface it as a hard error.

If there's a transient auth issue when a PR is pushed, this step sets exists=false and the pull_request path silently writes a step-summary saying the app doesn't exist yet — masking the real problem until the deploy is manually retried.

Suggested approach: mirror delete-app.sh's pattern — capture the output, check exit code, inspect the text for known error tokens, and exit non-zero on a real failure rather than treating it as "not found".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9095c6 by capturing cpflow exists output and failing hard on auth/API error tokens instead of treating every non-zero exit as app-not-found.


if [[ -f "${HOME}/.ssh/cpflow_build_key" ]]; then
eval "$(ssh-agent -s)"
trap 'ssh-agent -k >/dev/null; rm -f "${HOME}/.ssh/cpflow_build_key"' EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXIT trap kills the SSH agent and removes cpflow_build_key, but the ~/.ssh/known_hosts file written in the previous step is not cleaned up here. On ephemeral GitHub-hosted runners this is harmless (the VM is discarded), but if this action is ever adopted on self-hosted runners with a persistent ~/.ssh, the custom known_hosts entries would accumulate across runs.

Consider adding rm -f "${HOME}/.ssh/known_hosts" to the trap, or using a temp file under $RUNNER_TEMP instead of ~/.ssh for the known_hosts.

default: "3.3.1"
cpflow_version:
description: cpflow gem version
required: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI version is pinned here as a default, which is good for reproducibility. Worth noting: since sudo npm install -g fetches from the npm registry at runtime, a compromised release of @controlplane/cli@3.3.1 would silently affect all runs. Consider adding a SHA/integrity check (e.g. via npm install --integrity) or at minimum locking to a digest-pinned download, especially since this step runs in workflows that access staging/production tokens.


current_image="${selected_image:-${first_image}}"
current_version="${selected_version:-${first_version}}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The staging env-var parity check only compares names, not values. It will catch a missing DATABASE_URL in production, but it won't catch a case where production has the right key names but wrong or stale values (e.g. a staging secret was rotated but the production counterpart wasn't updated). This is expected behaviour for a name-only check — just worth documenting explicitly so it's clear what the check does and doesn't guarantee.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented in b9095c6: the production promotion workflow checks that production has the same environment variable names as staging, but it does not compare secret values.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: Replace hand-written CI/CD Actions with generated cpflow- flow*

Well-considered overhaul that hardens security, safety, and operational reliability. No blocking issues found.

Security improvements

Token no longer passed via --token flag — old setup-environment exposed it in /proc/cmdline and ps; now uses CPLN_TOKEN env var only. SSH key scoped to a dedicated step so ACTIONS_STEP_DEBUG cannot leak it from the build step. Delete command now gated on author_association (old workflow had no permission check). Fork PR protection blocks syncs before any secret is consumed. Old contains() command matching replaced with exact match.

Safety improvements

cancel-in-progress changed from true to false everywhere — old staging deploy risked partial GVC deployments mid-rollout. Prefix-based delete guard replaces the old substring check for production/staging names. Production rollback logic added with per-workload state capture and restore — old workflow had no rollback at all. Health checks now configurable (HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, HEALTH_CHECK_ACCEPTED_STATUSES).

Issue 1: Fork-PR /deploy-review-app comment silently fails with no PR feedback

When a collaborator comments /deploy-review-app on a fork PR, the job condition passes but Validate review app deployment source exits 1 with only a runner log message. No PR comment explains the rejection. Suggest posting an issues.createComment before exit 1, similar to how the unconfigured-secrets path writes to GITHUB_STEP_SUMMARY.

Issue 2: cpflow exists error detection is string-matching fragile

delete-app.sh uses case matching over stderr to distinguish app-not-found from auth/network errors. The inline TODO acknowledges this. If cpflow changes an error message, the pattern will not match and a real failure will be silently treated as app not found, causing delete to return success vacuously. The TODO to switch to a structured exit code is the right long-term fix.

Issue 3: gem install / npm install without integrity pinning

cpflow-setup-environment/action.yml installs @controlplane/cli@3.3.1 and cpflow 4.2.0 without hash verification. A re-tagged or compromised release would install silently. For production-adjacent CI, consider verifying SHA256 after install or using lockfiles with integrity entries.

Issue 4: Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs on cron with no failure notification. Silent failures accumulate stale apps over days. Consider a step gated on if: failure() that posts a GitHub issue or annotation.

Minor observations

APP_NAME / PR_NUMBER double-assignment is intentional (workflow-level env needed early for concurrency.group, step overwrites with resolved values) but a short comment would clarify intent. cpflow-review-app-help.yml using pull_request_target is correct and well-documented. deploy job in cpflow-deploy-staging.yml re-runs cpflow-setup-environment unnecessarily (only needs cpflow deploy-image) — minor redundancy, acceptable for simplicity.

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case pattern matching over stderr is the right approach given cpflow exists doesn't expose a structured signal, and the inline TODO is the right way to track this. One additional defensive measure worth considering: log the full exists_output to the runner even when the "not found" path is taken, so there's an audit trail if unexpected output silently passed through the filter. Right now the output is printed only if non-empty, but it's easy to miss in logs.

Also worth noting: the pattern list doesn't include "timeout", "connection refused", or "ENOTFOUND" — all plausible network failure messages. If cpflow's Ruby HTTP client times out, it may surface Errno::ECONNREFUSED or Net::ReadTimeout, which would match "Net::", so that's covered. But curl-style timeouts would not. Low risk given this is a Ruby gem, but worth keeping in mind when bumping the cpflow version.

shell: bash
run: |
set -euo pipefail
cpflow cleanup-stale-apps -a "${{ vars.REVIEW_APP_PREFIX }}" --org "${{ vars.CPLN_ORG_STAGING }}" --yes
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No failure notification exists for this cron job. If cpflow cleanup-stale-apps fails (auth error, API hiccup, etc.), stale review apps accumulate silently across days.

Consider appending a step:

      - name: Notify on failure
        if: failure()
        run: |
          echo "::error::Stale review app cleanup failed. Check the workflow logs."

Or, if the team monitors GitHub issues, use gh issue create to open a tracking issue on failure so it's not lost in workflow history.

} >> "$GITHUB_STEP_SUMMARY"
exit 0
fi

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a collaborator comments /deploy-review-app on a fork PR, execution reaches this exit 1 with only a runner log message — no feedback is posted to the PR thread. The collaborator sees a red workflow run with no explanation in the PR.

Consider posting a comment before exiting. The GH_TOKEN is available from the Resolve PR ref and commit step but not inherited here; you'd need to add GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} to this step's env block. Something like:

gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/comments" \
  --method POST \
  --field body="Fork PRs cannot use /deploy-review-app because this workflow uses repository secrets. Ask a maintainer to deploy from a branch in the base repository." \
  --silent || true

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe: fork PR /deploy-review-app rejections now post a PR comment before exiting, while still keeping the deployment blocked because the workflow uses repository secrets.

run: |
set -euo pipefail

sudo npm install -g @controlplane/cli@${{ inputs.cpln_cli_version }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version-pinned but not hash-verified. A malicious actor who gains write access to the @controlplane/cli npm package (or re-tags a release) could substitute a compromised binary. Same applies to gem install cpflow below.

For production-adjacent pipelines, consider adding a post-install checksum check, or at minimum document the expected SHA256 for each pinned version in a comment so drift is detectable during future version bumps.

run: |
# Best-effort rollback: try every workload, aggregate failures, exit non-zero at the end
# if any failed. A single cpln hiccup shouldn't leave other workloads mid-promotion.
set -uo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intentional omission of -e here (using set -uo pipefail instead of set -euo pipefail) is the correct design — it lets rollback continue across all workloads even if one cpln call fails, with rollback_failures tracking the aggregate. Worth a brief comment explaining the omission so a future reader doesn't add -e thinking it was forgotten:

Suggested change
set -uo pipefail
# Intentionally no -e: we want every workload to attempt rollback even if earlier ones fail.
set -uo pipefail

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1e46bfe with a comment explaining that set -e is intentionally omitted so rollback can attempt every workload and aggregate failures.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Token not persisted for subsequent workflow steps
    • The composite action now writes CPLN_TOKEN to GITHUB_ENV so subsequent cpflow and cpln workflow steps inherit authentication without passing the token on the command line.

Create PR

Or push these changes by commenting:

@cursor push f1eee564c7
Preview (f1eee564c7)
diff --git a/.github/actions/cpflow-setup-environment/action.yml b/.github/actions/cpflow-setup-environment/action.yml
--- a/.github/actions/cpflow-setup-environment/action.yml
+++ b/.github/actions/cpflow-setup-environment/action.yml
@@ -63,6 +63,12 @@
           exit 1
         fi
 
+        {
+          echo "CPLN_TOKEN<<__CPLN_TOKEN__"
+          echo "$CPLN_TOKEN"
+          echo "__CPLN_TOKEN__"
+        } >> "$GITHUB_ENV"
+
         create_output=""
         if ! create_output="$(cpln profile create default --org "$ORG" 2>&1)"; then
           if ! echo "$create_output" | grep -qi "already exists"; then

You can send follow-ups to the cloud agent here.

Comment thread .github/actions/cpflow-setup-environment/action.yml
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review — PR #732: Generated cpflow GitHub Actions Flow

Overview

This is a well-structured replacement of hand-written CI/CD workflows with the generated cpflow-* composite actions and workflows. The new flows are meaningfully safer and more correct than the old ones across several dimensions. The PR is in draft but the code quality is high enough to call out specifics.


Significant Improvements ✅

  • Secret handling: Old code passed CPLN_TOKEN via --token (visible in /proc/<pid>/cmdline and ps output). New code passes it exclusively via environment — good security hygiene across all three cpln/cpflow call sites.
  • Prefix guard on deletion: delete-app.sh now validates that the app name starts with $REVIEW_APP_PREFIX- before deleting, which is a much stronger guard than the old "reject names containing 'production' or 'staging'" heuristic. The old approach would have blocked deletion of any app with those words in the name (too broad) while leaving other non-review apps unprotected (too narrow).
  • Fork PR protection: The deploy workflow explicitly detects fork PRs and blocks them from accessing repository secrets — the old workflow had no such guard.
  • Author association gating: Slash commands now check author_association is OWNER, MEMBER, or COLLABORATOR. The old delete workflow had no collaborator check at all.
  • Non-cancelling concurrency (cancel-in-progress: false): Prevents partial deployments and half-deleted apps — this was cancel-in-progress: true in the old staging workflow.
  • Multi-workload rollback: The promote workflow rolls back all workloads independently on failure, aggregating errors rather than stopping at the first failure.
  • Env-var parity check: Pre-promotion check that production has all staging env var names prevents silent nil variable errors after promotion.
  • Health checks with configurable accepted statuses: The default 200 301 302 is well-reasoned for auth-gated Rails apps; HEALTH_CHECK_ACCEPTED_STATUSES makes it overridable without editing the file.
  • Randomized heredoc delimiter in capture-current: EOF_$(openssl rand -hex 8) correctly prevents heredoc injection if rollback state ever contained a literal EOF line.

Issues

Bug: deploy job may run even when build fails in cpflow-deploy-staging.yml

In GitHub Actions, an explicit if: on a job replaces the implicit success() guard. The deploy job condition is:

if: needs.validate-branch.outputs.is_deployable == 'true'

Because this doesn't check needs.build.result, the deploy job will run even if the build job fails (as long as the branch is deployable). This burns runner minutes and produces a confusing deploy attempt against a missing image.

Fix: add && needs.build.result == 'success' to the condition.

Risk: String-matching on cpflow exists error output is fragile

delete-app.sh (and the inline version in cpflow-deploy-review-app.yml) detect "real" errors by checking for specific strings like "Double check your org", "ERROR", "Traceback", "Net::". The file already has a # TODO comment acknowledging this, but the consequence of missing a new error pattern is silently treating a real failure as "app not found" — which could skip deletion without any alert.

This is acceptable given the stated cpflow limitation, but the heuristic string list should be kept in sync with cpflow's actual output. An alternative interim approach: treat any non-zero exit with non-empty stderr as a failure unless it matches a known "not found" pattern, rather than enumerating known error patterns.

Minor: Workflow-level APP_NAME env var is redundant and overridden in cpflow-deploy-review-app.yml

The top-level env: block in the deploy review app workflow sets APP_NAME from the raw event payload, but the Resolve PR ref and commit step later overwrites it in $GITHUB_ENV with the normalized PR number. The step-level value wins, making the workflow-level declaration dead configuration. It's also the only workflow in this set that has this pattern — the others don't duplicate APP_NAME this way.


Nitpicks

  • cpflow-setup-environment/action.yml: gem install cpflow -v ... without --no-document is slower than needed on CI runners.
  • cpflow-cleanup-stale-review-apps.yml: The Validate required secrets and variables step runs as a plain shell: bash step but has no id: — this is fine, but if you ever want to condition a downstream step on it, you'll need to add one.
  • The old nightly-remove-stale-review-apps.yml hardcoded the app name (qa-react-webpack-rails-tutorial) rather than using ${{ vars.REVIEW_APP_PREFIX }}. The new workflow correctly uses the variable — worth calling out as a subtle bug fix.

Summary

The core logic is solid and the security posture is materially improved. The one real bug (deploy running after build failure) should be fixed before merge. The string-matching fragility is an acknowledged limitation with a TODO already in place.

fi
deploy_args+=(--org "${CPLN_ORG}" --verbose)

cpflow deploy-image "${deploy_args[@]}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: deploy runs even when build fails.

When a job has an explicit if:, GitHub Actions drops the implicit success() guard. Because this condition only checks validate-branch's output, the deploy job will start even if the build job failed, wasting runner minutes and attempting to deploy a missing image.

Suggested change
cpflow deploy-image "${deploy_args[@]}"
if: needs.validate-branch.outputs.is_deployable == 'true' && needs.build.result == 'success'

echo "🗑️ Deleting application: $APP_NAME"
cpflow delete -a "$APP_NAME" --org "$CPLN_ORG" --yes

echo "✅ Successfully deleted application: $APP_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string-matching heuristic on cpflow exists error output (lines 43–54) is the main reliability risk in this file. The consequence of an unmatched error pattern is silently exiting 0 ("app not found") when the API actually returned an error, meaning the delete is skipped without any visible failure.

An alternative that inverts the logic: treat any non-zero exit with non-empty output as a failure unless it matches a known "not found" pattern:

# instead of matching known error strings, match known "not found" patterns
case "$exists_output" in
  *"not found"*|*"does not exist"*)
    echo "⚠️ Application does not exist: $APP_NAME"
    exit 0
    ;;
  *)
    echo "❌ ERROR: failed to determine whether application exists" >&2
    printf '%s\n' "$exists_output" >&2
    exit 1
    ;;
esac

This is safer by default — unknown output is treated as an error rather than a no-op. The tradeoff is that you'd need to know the exact "not found" wording from cpflow exists, which has the same coupling problem in the other direction. The TODO comment already captures this; just flagging it for awareness.

echo "ready=false" >> "$GITHUB_OUTPUT"
{
echo "Control Plane review app automation is not configured yet."
echo
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This top-level APP_NAME is computed from the raw event payload and is later overwritten in $GITHUB_ENV by the Resolve PR ref and commit step (echo "APP_NAME=..." >> "$GITHUB_ENV"). The step-level write wins, so this declaration is dead configuration and can be confusing. Consider removing it and relying solely on the step-computed value, which is already the authoritative one (it handles all three event types correctly).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: cpflow Generated GitHub Actions Flow

This is a meaningful security and reliability upgrade over the handwritten workflows. Several issues from the old setup are fixed here; I've also flagged a few new items worth addressing.


✅ Notable Improvements

  • Token passed via env, not --token flag — prevents leakage in /proc/<pid>/cmdline and ps output (old setup-environment used cpln profile update default --org "$ORG" --token "$TOKEN")
  • Author association gating on /deploy-review-app and /delete-review-app — the old delete workflow accepted the command from any commenter
  • Fork PR protection in the deploy workflow — old workflow ran on all pull_request events without a fork check, allowing fork PRs to attempt using repo secrets
  • Prefix guard in delete-app.sh — prevents accidental deletion of non-review-app resources
  • SSH key isolated to a dedicated step — keeps it out of the main build step's env where ACTIONS_STEP_DEBUG=true would expose it
  • Randomized heredoc delimiters in the promotion workflow — prevents content injection if rollback state contains EOF
  • Old staging workflow fixed — the old deploy-to-control-plane-staging.yml triggered on branches: ['*'] (every push), burning runners on every feature branch; new workflow correctly filters to main/master
  • cancel-in-progress: false on all deployment concurrency groups — prevents partial rollout state from mid-cancelled deploys

🔴 Issues

1. cpflow-delete-review-app.yml uses pull_request_target without an explanation comment

The delete workflow triggers on pull_request_target: [closed]. This event runs with write permissions in the base repo context and can post PR comments on fork PRs. It's the correct choice here (since no PR code is checked out), but it's a well-known footgun — future maintainers may switch it to pull_request thinking it's "safer" and break fork PR cleanup. cpflow-review-app-help.yml has a good comment explaining this; the delete workflow should have one too.

2. cpflow exists error-pattern matching is duplicated

The same fragile string-matching logic ("Double check your org", "ERROR", "Traceback", "Net::", etc.) appears both in delete-app.sh and inline in the Check if review app exists step of cpflow-deploy-review-app.yml. The delete-app.sh has a TODO explaining the fragility; the workflow copy does not. If cpflow changes its error output, both need updating in sync. Consider either extracting this into a shared shell script, or at minimum adding the same TODO comment in the workflow.


🟡 Warnings

3. CPLN_TOKEN is written to GITHUB_ENV and persists to all subsequent steps

In cpflow-setup-environment/action.yml (lines 67–72), the token is written to GITHUB_ENV so later workflow steps can authenticate. This is necessary for cpflow to work, but it means the token is present in the environment of every subsequent step — including any step that might env | sort or log environment state. Consider documenting this tradeoff in a comment.

4. Production health check accepts 301/302 by default

HEALTH_CHECK_ACCEPTED_STATUSES defaults to '200 301 302'. A redirect to a login page or a broken URL returns a redirect, so the health check passes even if the app is misconfigured post-deploy. The docs note the recommended mitigation (set a dedicated /health endpoint and override to "200"). Consider making the default stricter, or adding a warning to the step summary when a redirect is accepted.

5. Nightly cleanup has no failure notification

cpflow-cleanup-stale-review-apps.yml runs at midnight UTC. If it fails (bad token, cpflow error, etc.), there is no alert — stale apps silently accumulate cost. Consider adding a Slack/email notification on failure, or at minimum ensure the repo has GitHub workflow failure notifications enabled for scheduled runs.


ℹ️ Informational

6. /help command has no author_association check

cpflow-help-command.yml responds to /help from any commenter (no association guard). This is probably intentional — help is informational — but on a high-volume repo the bot will reply to every /help comment from any user. Noting for awareness.

7. ruby/setup-ruby@v1 and actions/github-script@v7 use floating major versions

These will pick up any non-breaking updates automatically, which is generally fine. However, actions/checkout@v4 is also a floating major — all three could be pinned to a specific SHA for a more locked-down supply chain posture if that's a concern.

name: Delete Review App

on:
pull_request_target:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull_request_target deserves an explanation comment here.

pull_request_target runs in the context of the base repo (write permissions, access to secrets) and is required for fork PR cleanup — pull_request can't post comments on fork PRs because the token is read-only. This is safe here because no PR code is ever checked out (the actions/checkout@v4 below fetches base-branch code only).

cpflow-review-app-help.yml has a good comment explaining this exact pattern. Without a parallel comment here, a future maintainer may "fix" this to pull_request thinking it's safer, silently breaking fork PR cleanup.

Suggested change
pull_request_target:
# pull_request_target is intentional: it has write permission to comment on PRs from
# forks, where `pull_request` would be read-only. This is safe because no PR code is
# checked out — the job only calls cpflow with base-repo credentials.
# Do not switch this to `pull_request` or add a checkout of the PR ref without re-evaluating.
pull_request_target:

Comment on lines +66 to +72
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing CPLN_TOKEN to GITHUB_ENV propagates it to every subsequent step in the calling workflow — not just the steps that explicitly use it. This is a necessary tradeoff (cpflow picks it up from the environment automatically), but it's worth documenting so future maintainers don't inadvertently add steps that log environment state.

Suggested change
# Make the token available to later workflow steps without putting it on argv.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"
# Persist CPLN_TOKEN to GITHUB_ENV so all subsequent steps can authenticate
# with cpflow/cpln without putting the token on the command line.
# Trade-off: the token is present in every subsequent step's environment.
# Do not add steps that dump env vars (e.g., `env | sort`) after this point.
token_delimiter="CPLN_TOKEN_$(openssl rand -hex 8)"
{
echo "CPLN_TOKEN<<${token_delimiter}"
printf '%s\n' "$CPLN_TOKEN"
echo "${token_delimiter}"
} >> "$GITHUB_ENV"

Comment on lines +28 to +35
# TODO: replace this string-matching with a structured signal once `cpflow exists` exposes one
# (e.g. a distinct exit code for "not found" vs. API/auth errors, or `cpflow exists --json`).
# Until then, keep this list in sync if `cpflow exists` starts emitting new error patterns —
# any unmatched error string would otherwise be silently treated as "app not found".
exists_output=""
if ! exists_output="$(cpflow exists -a "$APP_NAME" --org "$CPLN_ORG" 2>&1)"; then
case "$exists_output" in
*"Double check your org"*|*"Unknown API token format"*|*"ERROR"*|*"Error:"*|*"Traceback"*|*"Net::"*)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same cpflow exists error-pattern matching appears verbatim in the Check if review app exists step in cpflow-deploy-review-app.yml (line ~195). The TODO comment here is accurate but only covers this file — the workflow copy is uncovered.

If cpflow adds a new error pattern (e.g. "rate limit" or "timeout"), both locations need updating in sync. Consider extracting this into a shared script (e.g. .github/scripts/cpflow-app-exists.sh) that both the action and the workflow call, so the pattern list is maintained in one place.

# repo variable to tighten this for apps that expose a dedicated health endpoint
# (e.g. "200" for a plain /health, or "200 401 403" for apps that auth-gate / without
# redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepting 301/302 as "healthy" means a redirect to a login page, maintenance page, or even a misconfigured CDN passes the health check. The comment correctly explains why (curl without -L, Rails auth-gating /), but a redirect can pass even when the destination is broken.

Consider adding a step summary warning when a non-200 status is accepted, so operators reviewing the promotion log have visibility:

if [[ "${http_status}" == "${accepted}" ]]; then
  if [[ "${http_status}" != "200" ]]; then
    echo "::warning::Health check passed with HTTP ${http_status} (not 200) — verify the redirect destination is healthy."
  fi
  echo "healthy=true" >> "$GITHUB_OUTPUT"
  exit 0
fi

Also worth documenting in .controlplane/shakacode-team.md that setting HEALTH_CHECK_ACCEPTED_STATUSES=200 and adding a /health endpoint is the recommended production posture.

on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scheduled workflows that fail silently are easy to miss — GitHub does send an email for failed scheduled runs but only to the repo owner by default, and only after 7 days of consecutive failures (as of current GitHub behavior).

Consider adding a failure notification step, or at minimum documenting that the team should verify GitHub's "Notify me of failed workflow runs" notification setting is enabled. Stale review apps that aren't cleaned up accrue Control Plane costs.

Suggested change
- cron: "0 0 * * *"
- cron: "0 3 * * *" # 3 AM UTC — avoids exact midnight when many repos run scheduled jobs simultaneously

(The cron timing is a minor suggestion — staggering off midnight reduces GitHub's scheduler contention, making the run more likely to start on time.)

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

This is a well-structured migration from hand-written workflows to generated cpflow-* GitHub Actions. The security thinking is thorough — fork gating, double-checkout pattern, persist-credentials: false, secrets via env vars, and a prefix-safety check on deletion are all done correctly.

Security

  1. Expression interpolation in github-script JavaScript: Several actions/github-script blocks interpolate ${{ steps.workload.outputs.workload_url }} directly into a JS string literal. The URL originates from an external API (cpln workload get ... | jq -r .status.endpoint) and could contain characters that break the JS string (e.g. double quotes or backticks). The safer pattern is to export the value to GITHUB_ENV first (already done for WORKFLOW_URL and CONSOLE_URL via core.exportVariable) and read it as process.env.APP_URL in JavaScript. See inline comment.

  2. Bootstrap step pins an RC version: gem install cpflow -v "5.0.0.rc.0" is only a fallback when generated action files are absent, but using a pre-release gem in a deployment pipeline adds risk (possible breaking changes, lower stability). The bootstrap step also patches the generated action with ruby -0pi inline; if the upstream output format changes, the substitution silently becomes a no-op. Recommend either committing the generated actions to the base branch so the bootstrap never runs, or pinning a stable GA release here.

Code Quality

  1. Missing persist-credentials: false in validate-branch checkout (cpflow-deploy-staging.yml line 58): Every other actions/checkout in this PR uses persist-credentials: false. The validate-branch job is the lone exception. While the job only reads action files, the inconsistency could confuse future editors and leaves the GITHUB_TOKEN in .git/config unnecessarily.

  2. Greedy strip in staging image extraction (cpflow-promote-staging-to-production.yml line 263): ${staging_image_ref##*/image/} uses the greedy ## operator. For a ref like /org/myorg/image/myapp it gives myapp (correct), but for /org/myorg/image/myapp/image/v2 it gives v2 instead of myapp/image/v2. Using the non-greedy form ${staging_image_ref#*/image/} strips only the first /image/ segment, which is the intended behavior.

  3. Health check accepts 301/302 by default for production promotions: The comments explain this tradeoff and recommend overriding with HEALTH_CHECK_ACCEPTED_STATUSES=200. Worth adding a ::notice:: annotation during the promote workflow when redirect-permissive defaults are in use, so operators are reminded at promote time.

Minor

  • cpflow-review-app-help.yml posts a comment on every newly opened PR (including bots/Dependabot) once REVIEW_APP_PREFIX is set. Consider gating on github.event.pull_request.user.type != Bot to reduce noise.
  • cpflow-help-command.yml workflow_dispatch path can post the help comment on any PR number including closed ones. Low impact but could be a UX surprise.

Overall: The PR is a significant improvement — better fork safety, structured rollback, env-var parity checks before production promotion, and prefix-guarded deletion are all good additions. The issues above are mostly low-to-medium severity, but the persist-credentials inconsistency and RC version bootstrap are easy wins to clean up before merge.

script: |
const commentId = Number("${{ steps.create-comment.outputs.comment-id }}");
const deploymentId = "${{ steps.init-deployment.outputs.result }}";
const appUrl = "${{ steps.workload.outputs.workload_url }}";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workload_url is sourced from cpln workload get ... | jq -r '.status.endpoint // empty' — an external API value. If the endpoint string contains " (or a backtick in template-literal context), it will break this JS string literal and could be exploited if the Control Plane API response is ever tampered with.

All other dynamic values here (WORKFLOW_URL, CONSOLE_URL) are already routed through core.exportVariable and read as process.env.*. Apply the same pattern:

Suggested change
const appUrl = "${{ steps.workload.outputs.workload_url }}";
const appUrl = process.env.APP_URL || "";

And add the export alongside the existing core.exportVariable calls in the "Set deployment links" step above:

core.exportVariable("APP_URL", workload_url);  // in the "Retrieve app URL" shell step, or pass via a prior github-script step

Comment on lines +71 to +78
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two concerns with this bootstrap block:

  1. RC version in production CI: 5.0.0.rc.0 is a pre-release. If the actions are already committed to the base branch (the normal case), this step is skipped — but for a fresh repo setup or if the actions are accidentally deleted, this RC gem installs and generates new action files mid-run. Using a GA release (or no version pin with gem install cpflow) would be safer.

  2. Inline patching of generated files: The ruby -0pi one-liner patches cpflow-setup-environment/action.yml after generation. If the upstream template text changes in a future cpflow release, this substitution silently becomes a no-op, and the literal ${{ vars.CPLN_CLI_VERSION }} Markdown in the description won't be fixed. Consider either:

    • Committing the generated actions to the base branch so this bootstrap never runs in practice, or
    • Adding an explicit check after the substitution (grep -q "repository variable value" ... || exit 1) to fail loudly if the patch didn't apply.

Comment on lines +56 to +58
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checkout is missing persist-credentials: false, which is present on every other actions/checkout call in this PR. This leaves the GITHUB_TOKEN credential helper stored in .git/config for the duration of the job, even though nothing in this job pushes to git.

Suggested change
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
with:
persist-credentials: false

fi

if [[ "${staging_image_ref}" == /org/*/image/* ]]; then
staging_image="${staging_image_ref##*/image/}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ## (greedy) operator strips the longest prefix matching */image/. For the expected Control Plane path format /org/myorg/image/myapp this is fine, but an image ref that contains a second /image/ segment (e.g. /org/myorg/image/myapp/image/v2) would be stripped to v2 instead of myapp/image/v2.

Using the non-greedy # strips only the first /image/ occurrence, which is more predictable:

Suggested change
staging_image="${staging_image_ref##*/image/}"
staging_image="${staging_image_ref#*/image/}"

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Health check variables ignore documented repository variable overrides
    • Updated the workflow env defaults to read the matching repository variables for health check and rollback readiness settings.

Create PR

Or push these changes by commenting:

@cursor push ed231dd54e
Preview (ed231dd54e)
diff --git a/.github/workflows/cpflow-promote-staging-to-production.yml b/.github/workflows/cpflow-promote-staging-to-production.yml
--- a/.github/workflows/cpflow-promote-staging-to-production.yml
+++ b/.github/workflows/cpflow-promote-staging-to-production.yml
@@ -16,8 +16,8 @@
   # Worst-case wall time per attempt is HEALTH_CHECK_INTERVAL plus the curl --max-time below
   # (10s), so the defaults give a ~10 minute window (24 × (15 + 10) = 600s) — enough for
   # most Rails cold boots (asset precompile + db:migrate + workload readiness).
-  HEALTH_CHECK_RETRIES: 24
-  HEALTH_CHECK_INTERVAL: 15
+  HEALTH_CHECK_RETRIES: ${{ vars.HEALTH_CHECK_RETRIES || '24' }}
+  HEALTH_CHECK_INTERVAL: ${{ vars.HEALTH_CHECK_INTERVAL || '15' }}
   # Space-separated list of HTTP statuses considered healthy. The default accepts 301/302
   # because `curl` is invoked without `-L`, so a root `/` that redirects to a login page
   # (common for Rails apps that auth-gate `/`) would otherwise be reported as unhealthy
@@ -31,8 +31,8 @@
   # expose a dedicated health endpoint (e.g. "200" for a plain /health, or "200 401 403"
   # for apps that auth-gate / without redirecting).
   HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
-  ROLLBACK_READINESS_RETRIES: 24
-  ROLLBACK_READINESS_INTERVAL: 15
+  ROLLBACK_READINESS_RETRIES: ${{ vars.ROLLBACK_READINESS_RETRIES || '24' }}
+  ROLLBACK_READINESS_INTERVAL: ${{ vars.ROLLBACK_READINESS_INTERVAL || '15' }}
   PRIMARY_WORKLOAD: ${{ vars.PRIMARY_WORKLOAD }}
 
 concurrency:

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit ff6a820. Configure here.

# for apps that auth-gate / without redirecting).
HEALTH_CHECK_ACCEPTED_STATUSES: ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}
ROLLBACK_READINESS_RETRIES: 24
ROLLBACK_READINESS_INTERVAL: 15
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Health check variables ignore documented repository variable overrides

Medium Severity

HEALTH_CHECK_RETRIES, HEALTH_CHECK_INTERVAL, ROLLBACK_READINESS_RETRIES, and ROLLBACK_READINESS_INTERVAL are hardcoded in the workflow env block, so setting them as repository variables has no effect. This contradicts the workflow's own comment ("Override these by … setting the matching repository variable") and the documentation in cpflow-help.md and shakacode-team.md, which list all four as optional repository variables. HEALTH_CHECK_ACCEPTED_STATUSES on the same block correctly uses ${{ vars.HEALTH_CHECK_ACCEPTED_STATUSES || '200 301 302' }}; these four need the same pattern.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ff6a820. Configure here.

@claude
Copy link
Copy Markdown

claude Bot commented May 8, 2026

Code Review: Replace hand-written GitHub Actions with generated cpflow-* flow

Overview

This PR replaces five hand-written Control Plane GitHub Actions workflows/actions with generated cpflow-* equivalents. The generated code is substantially better than what it replaces in security, reliability, and correctness.


Security Improvements (significant wins)

  • Token leak fix: Old setup-environment passed --token "$TOKEN" as a CLI argument, exposing it in /proc/<pid>/cmdline and ps output. New code uses the CPLN_TOKEN env var instead.
  • eval elimination: Old workflows used eval(process.env.GET_CONSOLE_LINK) in github-script steps — a serious code-injection vector if any env var was influenced by user input. Removed entirely.
  • Exact-match slash commands: Old code used contains(github.event.comment.body, '/deploy-review-app') — a comment like please /deploy-review-app now would have triggered a deployment. New code uses strict equality.
  • Prefix-guarded app deletion: Old delete-app.sh blocked names containing production or staging. New code checks against a strict REVIEW_APP_PREFIX- prefix, which is both safer and more explicit.
  • Bash nameref guard: cpflow-validate-config validates indirect variable names against ^[A-Z_][A-Z0-9_]*$ before ${!name} expansion, preventing BASH_FUNC_foo%%-style injection.
  • SSH key isolation: SSH key is written in a dedicated step so DOCKER_BUILD_SSH_KEY never appears in the build step environment (relevant when ACTIONS_STEP_DEBUG=true).
  • persist-credentials: false: Now consistently set on checkout steps to keep GITHUB_TOKEN out of .git/config.
  • Fork PR handling: Properly detects fork PRs and skips secret-dependent steps rather than silently passing secrets to untrusted code.

Issues and Concerns

1. Release candidate in production default (medium risk)

In .github/actions/cpflow-setup-environment/action.yml:

default_cpflow_version="5.0.0.rc.0"

A release candidate ships as the default for a production-critical deployment tool. If 5.0.0.rc.0 contains a bug, all repos inheriting the default are affected. Consider pinning the stable release before merge.

2. Fragile bootstrap-and-patch step (maintainability concern)

In cpflow-deploy-review-app.yml, there is a bootstrap step that reinstalls cpflow and regenerates actions if cpflow-validate-config/action.yml is missing — followed by an inline ruby -0pi -e patch that regex-substitutes specific strings in the generated output. This is brittle: if the upstream template changes the text being patched, the substitution silently no-ops and the generated file drifts. The bootstrap step is also unnecessary for a repo that has already committed the generated files. Consider removing the bootstrap and replacing it with a CI check that detects generated-file drift instead.

3. validate-branch checkout missing persist-credentials: false

In cpflow-deploy-staging.yml, the validate-branch job checks out the repo without persist-credentials: false, while the subsequent build and deploy jobs both set it. Minor inconsistency worth fixing for auditability.

4. Health check defaults accept redirects (documented footgun)

The default HEALTH_CHECK_ACCEPTED_STATUSES: '200 301 302' will pass a maintenance-mode redirect or an auth-gate redirect as healthy. The code contains a good inline comment recommending a dedicated /health endpoint. Consider surfacing this recommendation in cpflow-help.md so teams see it before an incident.

5. Rollback does not reverse DB migrations

Documented correctly in shakacode-team.md. Worth adding: a warning in the promotion workflow job summary when --run-release-phase was used and rollback was triggered, so operators know to check DB state manually.

6. Rollback readiness polling inconsistency

"Wait for deployment health" uses the cpflow-wait-for-health composite action (HTTP endpoint poll). "Wait for rollback readiness" rolls its own loop polling workload.status.ready. These check different things — a workload can report ready: true while still returning HTTP errors. Consider reusing cpflow-wait-for-health for rollback readiness too.

7. CONTRIBUTOR not in author_association allowlist

Both deploy and delete workflows gate slash commands on ["OWNER","MEMBER","COLLABORATOR"]. CONTRIBUTOR (GitHub's association for someone whose PR was previously merged) is excluded. This may be intentional, but worth confirming if past contributors should be able to trigger review apps.


Minor observations

  • Two overlapping help workflows: cpflow-help-command.yml (responds to /help comment) and cpflow-review-app-help.yml (auto-comments on PR open) both serve useful purposes but the auto-comment may be noisy on high-volume repos.
  • cpflow-review-app-help.yml missing contents: read: The workflow omits contents: read from permissions:. Explicit is better for auditability.
  • Nightly cleanup correctness: cpflow cleanup-stale-apps -a "${REVIEW_APP_PREFIX}" — verify that cpflow interprets this as a prefix match and not an exact app name; otherwise no stale apps will ever be deleted.

Removed issues worth noting

The old code had several bugs now fixed:

  • cpflow exists -a ${{ env.APP_NAME }} without quoting — word-split risk on unusual app names
  • sudo npm install -g — fails on self-hosted runners without sudo; new code uses a user-local npm prefix
  • ${{ github.event_name }} interpolated directly into a case statement inside run: — context injection vector
  • eval(process.env.GET_CONSOLE_LINK) in github-script — dangerous arbitrary code execution from an env var
  • Workflow URL construction via listJobsForWorkflowRun could return the wrong job under concurrency

Summary

This is a well-executed, security-positive replacement. The main concerns before moving out of draft are the 5.0.0.rc.0 release candidate default and the fragile bootstrap-and-patch step. The other items are improvements worth addressing but not blockers.

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pins a release candidate as the default for a production-critical deployment tool. If 5.0.0.rc.0 contains a bug, every repo that inherits this default is affected. Please update to the stable release before merge (or document a concrete plan to do so before this workflow runs on master).

Comment on lines +70 to +78
- name: Bootstrap generated cpflow actions
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bootstrap block has two concerns:

  1. It should never run for this repo: cpflow-validate-config/action.yml is committed here, so hashFiles(...) will always return a non-empty string. The block is dead code for this repo and only matters for repos that haven't yet committed the generated files.

  2. The ruby -0pi -e patch is very fragile: It regex-substitutes specific hardcoded strings in the generated cpflow-setup-environment/action.yml. If the upstream template changes that wording, the substitution silently no-ops and the file drifts undetected.

Recommend either removing the bootstrap entirely (since the generated files are committed) or replacing it with a CI drift-detection step that fails loudly if the committed generated files differ from what cpflow generate-github-actions would produce today.

Comment on lines +56 to +58
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build and deploy jobs both set persist-credentials: false on their checkout steps, but this validate-branch checkout does not. For consistency and to keep GITHUB_TOKEN out of .git/config, add persist-credentials: false here too.

Suggested change
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
- name: Checkout repository
if: steps.check-branch.outputs.is_deployable == 'true'
uses: actions/checkout@v4
with:
persist-credentials: false

(github.event_name == 'issue_comment' &&
github.event.issue.pull_request &&
github.event.comment.body == '/deploy-review-app' &&
contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allowlist covers OWNER, MEMBER, and COLLABORATOR, but not CONTRIBUTOR. GitHub assigns CONTRIBUTOR to anyone whose PR has previously been merged into the repo. Depending on the project's intent, CONTRIBUTOR may be a reasonable addition so past contributors can deploy review apps without needing explicit collaborator access. Worth confirming whether this is intentional.

@claude
Copy link
Copy Markdown

claude Bot commented May 9, 2026

Code Review

Overview

This PR replaces hand-written Control Plane GitHub Actions with a generated cpflow-* action/workflow set. The new code is a significant improvement across security, correctness, and operational capability. Worth merging once the concerns below are addressed.


Security: Notable Improvements

  • Token leakage fixed: Old setup-environment passed the Control Plane token via CLI flag --token, visible in /proc/<pid>/cmdline and ps. New code routes it through CPLN_TOKEN env var.
  • Slash-command injection closed: Old workflow used contains() to match the comment body, which triggers on "please /deploy-review-app now". New code uses exact equality (==).
  • Author association checks added: Slash commands now gate on OWNER, MEMBER, or COLLABORATOR association; the old delete workflow had no such check.
  • pull_request_target used correctly for fork PR cleanup: Delete workflow checks out only base-branch trusted code — no PR code executed. Well-documented in step comments.
  • SSH key isolation: cpflow-build-docker-image handles the key in a separate step so it never lands in the build step's environment, preventing ACTIONS_STEP_DEBUG exposure.
  • Prefix-based deletion guard: New delete-app.sh validates the REVIEW_APP_PREFIX- prefix before deleting. Old script used grep -iqE '(production|staging)' which is bypassable with creative naming.

Issues to Address

1. Release candidate pinned in production CI (blocking)

default_cpflow_version="5.0.0.rc.0" is set in both cpflow-setup-environment/action.yml and the bootstrap step of cpflow-deploy-review-app.yml. An RC should not be the default for production workflows. Pin a stable release before merging, or add a clear comment explaining why the RC is acceptable in this context.

2. Bootstrap regex is tightly coupled to upstream wording (low risk, but fragile)

The bootstrap step in cpflow-deploy-review-app.yml (lines 78-84) patches the generated cpflow-setup-environment/action.yml in-place using a hard-coded regex to strip GitHub expression syntax from input descriptions. If cpflow changes that exact wording, the regex silently no-ops; the guard check below it catches the problem — but only at runtime as a workflow failure. Since this bootstrap only fires when the action file does not exist (new repo initial setup), it does not affect this repo's ongoing CI. A comment linking to the upstream issue or explaining the stability expectation would help future maintainers.

3. CONTRIBUTOR not included in slash-command allow-list (document if intentional)

The /deploy-review-app and /delete-review-app guards check OWNER, MEMBER, COLLABORATOR but not CONTRIBUTOR (users who have merged a commit but are not formally members). External contributors who have shipped code cannot self-serve review apps. If intentional, note it in cpflow-help.md.


Minor Observations

  • Old staging trigger branches: '*' fired on every push to every branch. New code is correctly scoped to master. Good fix.
  • Rollback captures all containers correctly: capture-current stores .spec.containers | map({name, image}) in rollback_state. The separate .spec.containers[0].image extraction is only used for the promotion summary display, not for rollback logic. This is fine.
  • Randomized heredoc delimiter EOF_$(openssl rand -hex 8) in capture-current prevents a JSON payload from accidentally terminating the output block early.
  • cancel-in-progress: false across all deployment concurrency groups is correct — letting in-flight deploys finish avoids partial-rollout state.
  • persist-credentials: false consistently applied across all checkouts that do not push. Good hygiene.
  • The two-checkout pattern in cpflow-deploy-review-app.yml (base branch for trusted action code, then PR commit under ./app) is the correct approach for issue_comment-triggered deploys that need secrets without running untrusted PR code.

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: 5.0.0.rc.0 is a release candidate. This file is the default for all workflows in this repo, including production promotion. Pin a stable version before merging (e.g., 5.0.0 once released), or add an explicit comment confirming this RC is intentionally used in production and why it is acceptable.

Suggested change
default_cpflow_version="5.0.0.rc.0"
default_cpflow_version="5.0.0"

gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is tightly coupled to the exact wording that cpflow generate-github-actions emits in the input description of cpflow-setup-environment/action.yml. If cpflow changes that sentence in a future release, the gsub! call silently no-ops and the grep guard below fails the bootstrap at runtime.

Two suggestions:

  1. Add a comment here referencing the exact cpflow version/commit whose output this regex is written against, so future maintainers know when to update it.
  2. Since this bootstrap only fires when .github/actions/cpflow-validate-config/action.yml doesn't exist (i.e., initial repo setup), consider whether patching the generated file in-place is worth the coupling — alternatively, generate a small wrapper action that avoids the need for post-generation text surgery.

(github.event_name == 'issue_comment' &&
github.event.issue.pull_request &&
github.event.comment.body == '/deploy-review-app' &&
contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allow-list covers OWNER, MEMBER, and COLLABORATOR but not CONTRIBUTOR (GitHub's association for users who have had a commit merged but are not formal org members). This means external contributors who have shipped code to the repo cannot self-serve review apps via slash commands — they must ask a member to trigger it for them.

If this is intentional (reasonable from a security standpoint — CONTRIBUTOR association does not imply ongoing trust), add a comment here and a note in cpflow-help.md so contributors know to ask a maintainer instead of assuming the command is broken.

shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same RC version concern as in cpflow-setup-environment/action.yml — this bootstrap step pins 5.0.0.rc.0 independently. If the setup action's default is updated to a stable release, update this line to match so both paths use the same version.

@justin808 justin808 marked this pull request as ready for review May 9, 2026 08:25
@justin808 justin808 merged commit b75a6ac into master May 9, 2026
16 checks passed
@justin808 justin808 deleted the codex/cpflow-github-actions-flow branch May 9, 2026 08:25
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 9, 2026

Greptile Summary

This PR replaces the hand-written Control Plane GitHub Actions workflows with a generated cpflow-* action/workflow set, and updates the accompanying Control Plane docs to reference the new names. The generated set covers opt-in review apps, branch-gated staging deploys, manual staging→production promotion with automatic rollback, nightly stale-app cleanup, and slash-command help.

  • Security model: cpflow-deploy-review-app.yml separates trusted workflow sources (base branch checkout) from PR application code (separate ./app checkout), routing nearly all ${{ }} expressions through env:—with one inconsistency in the Finalize deployment status step where workload_url and deployment_id are still inlined into JS strings.
  • Promotion workflow: cpflow-promote-staging-to-production.yml derives workload names from the production app config and reuses them to query staging—if staging has fewer workloads than production, the staging image capture step will fail with a cryptic cpln error before promotion starts.
  • Staging trigger: cpflow-deploy-staging.yml's on.push.branches filter is hardcoded to [\"master\"] and must be kept manually in sync with the STAGING_APP_BRANCH repo variable; a mismatch causes silent deploy gaps.

Confidence Score: 3/5

Safe to merge for single-workload apps; multi-workload production apps where staging omits background job workers will find promotion blocked at the staging image capture step.

The production promotion workflow derives the workload name list from the production app config and iterates those same names against the staging GVC. For repos where staging intentionally runs fewer workloads, cpln workload get exits non-zero and set -euo pipefail aborts the promotion with a cryptic CLI error—no indication it is a staging/production topology mismatch. Everything else—trust boundaries on review apps, SSH key scoping, prefix-guarded deletes, rollback state capture, and env-routing for secrets—is well-constructed.

cpflow-promote-staging-to-production.yml (workload-name cross-use between staging and production) and cpflow-deploy-review-app.yml (bootstrap step version pin + direct JS interpolation).

Important Files Changed

Filename Overview
.github/workflows/cpflow-promote-staging-to-production.yml New promotion workflow with rollback; derives WORKLOAD_NAMES from the production app config and uses those same names to look up workloads in staging—fails if staging has fewer workloads than production.
.github/workflows/cpflow-deploy-review-app.yml New 476-line review-app deploy workflow; has trust-boundary controls, but directly interpolates Control Plane step outputs into inline JavaScript rather than routing through env.
.github/workflows/cpflow-deploy-staging.yml New 3-job staging deploy workflow; push trigger hardcoded to branches: ["master"] while runtime STAGING_APP_BRANCH guard reads from a repository variable—silent on mismatch.
.github/actions/cpflow-build-docker-image/action.yml New composite action for Docker builds; SSH key scoped to a dedicated step and cleanup trap removes key files on exit.
.github/actions/cpflow-validate-config/action.yml Validates required secrets/vars via indirect bash nameref with a regex guard against unsafe variable names; PR-friendly mode degrades gracefully.
.github/actions/cpflow-delete-control-plane-app/delete-app.sh Prefix-guards delete to prevent accidental deletion of non-review apps; correctly handles exit code 3 as a no-op.
.github/workflows/cpflow-delete-review-app.yml Uses pull_request_target for fork PR secrets access; checks out base branch code and documents the trust boundary decision.
.github/actions/cpflow-wait-for-health/action.yml Polls workload endpoint via cpln JSON + curl; configurable accepted statuses with sensible defaults for auth-redirect apps.
.github/workflows/cpflow-cleanup-stale-review-apps.yml Nightly cleanup workflow; cancel-in-progress: false to avoid partial deletes, validates required config before running.

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub Actions
    participant CP_S as Control Plane Staging
    participant CP_P as Control Plane Production

    Dev->>GH: Push to master
    GH->>GH: cpflow-deploy-staging validate-branch
    GH->>CP_S: build-image + deploy-image

    Dev->>GH: /deploy-review-app comment
    GH->>GH: cpflow-deploy-review-app resolve PR ref
    GH->>CP_S: setup-app + build-image + deploy-image
    GH-->>Dev: PR comment with review app URL

    Dev->>GH: PR closed or /delete-review-app
    GH->>CP_S: cpflow delete review app

    GH->>GH: nightly cleanup-stale-review-apps
    GH->>CP_S: cpflow cleanup-stale-apps

    Dev->>GH: "workflow_dispatch confirm_promotion=promote"
    GH->>CP_S: cpln workload get capture staging image
    GH->>CP_P: copy-image-from-upstream
    GH->>CP_P: deploy-image
    GH->>CP_P: wait-for-health poll
    alt health check passes
        GH-->>Dev: Promotion success + GitHub release
    else health check fails
        GH->>CP_P: cpln workload update rollback
        GH-->>Dev: Promotion failed
    end
Loading

Reviews (1): Last reviewed commit: "Address cpflow workflow review fixes" | Re-trigger Greptile

Comment on lines +222 to +271
- name: Capture deployed staging image
id: staging-image
env:
CPLN_TOKEN_STAGING: ${{ secrets.CPLN_TOKEN_STAGING }}
STAGING_APP_NAME: ${{ vars.STAGING_APP_NAME }}
CPLN_ORG_STAGING: ${{ vars.CPLN_ORG_STAGING }}
WORKLOAD_NAMES: ${{ steps.workloads.outputs.names }}
shell: bash
run: |
set -euo pipefail

selected_workload="${PRIMARY_WORKLOAD:-}"
selected_image=""
first_image=""

while IFS= read -r workload_name; do
[[ -n "${workload_name}" ]] || continue

workload_json="$(CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln workload get "${workload_name}" --gvc "${STAGING_APP_NAME}" --org "${CPLN_ORG_STAGING}" -o json)"
workload_image="$(echo "${workload_json}" | jq -r '.spec.containers[0].image // empty')"

if [[ -z "${workload_image}" ]]; then
echo "::error::Could not find an image on staging workload '${workload_name}'." >&2
exit 1
fi

if [[ -z "${first_image}" ]]; then
first_image="${workload_image}"
fi

if [[ -n "${selected_workload}" && "${workload_name}" == "${selected_workload}" ]]; then
selected_image="${workload_image}"
fi
done < <(tr ',' '\n' <<< "${WORKLOAD_NAMES}")

staging_image_ref="${selected_image:-${first_image}}"
if [[ -z "${staging_image_ref}" ]]; then
echo "::error::Could not determine the deployed staging image." >&2
exit 1
fi

if [[ "${staging_image_ref}" == /org/*/image/* ]]; then
staging_image="${staging_image_ref##*/image/}"
elif [[ "${staging_image_ref}" == *.registry.cpln.io/* ]]; then
staging_image="${staging_image_ref#*.registry.cpln.io/}"
else
staging_image="${staging_image_ref}"
fi

echo "image=${staging_image}" >> "$GITHUB_OUTPUT"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Staging image lookup uses production workload names

WORKLOAD_NAMES is resolved from the production app's app_workloads config (the Resolve production app workloads step), but those same names are then used to query workloads in the staging GVC. If production has a workload (e.g. sidekiq, worker) that was never stood up in staging, cpln workload get "${workload_name}" --gvc "${STAGING_APP_NAME}" exits non-zero, set -euo pipefail kills the script, and the promotion is blocked even though the staging rails app is healthy and ready to promote. The failure message comes from cpln itself with no indication that the problem is a staging/production workload-set mismatch. Consider resolving workload names separately for staging vs. production, or fall back to PRIMARY_WORKLOAD alone when app_workloads lists workloads that don't exist in staging.

Comment on lines +427 to +476
- name: Finalize deployment status
if: always() && steps.config.outputs.ready == 'true' && steps.source.outputs.allowed == 'true' && (steps.check-app.outputs.exists == 'true' || steps.setup-review-app.outcome == 'success')
uses: actions/github-script@v7
with:
script: |
const commentId = Number("${{ steps.create-comment.outputs.comment-id }}");
const deploymentId = "${{ steps.init-deployment.outputs.result }}";
const appUrl = "${{ steps.workload.outputs.workload_url }}";
const success = "${{ job.status }}" === "success";

if (deploymentId) {
await github.rest.repos.createDeploymentStatus({
owner: context.repo.owner,
repo: context.repo.repo,
deployment_id: Number(deploymentId),
state: success ? "success" : "failure",
environment: `review/${process.env.APP_NAME}`,
environment_url: success && appUrl ? appUrl : undefined,
log_url: process.env.WORKFLOW_URL,
description: success ? "Review app ready" : "Review app deployment failed"
});
}

const successBody = [
"## Review app ready",
"",
appUrl ? `[Open review app](${appUrl})` : "Review app deployed, but no endpoint URL was detected.",
"",
`[Open Control Plane console](${process.env.CONSOLE_URL})`,
`[View workflow logs](${process.env.WORKFLOW_URL})`
].join("\n");

const failureBody = [
`❌ Review app deployment failed for PR #${process.env.PR_NUMBER}`,
"",
`[Open Control Plane console](${process.env.CONSOLE_URL})`,
`[View workflow logs](${process.env.WORKFLOW_URL})`
].join("\n");

if (!Number.isFinite(commentId) || commentId <= 0) {
core.warning("Skipping PR comment update because no comment id was created.");
return;
}

await github.rest.issues.updateComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: commentId,
body: success ? successBody : failureBody
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Step outputs interpolated directly into JS strings

Lines 433–434 inline ${{ steps.init-deployment.outputs.result }} and ${{ steps.workload.outputs.workload_url }} directly into the JavaScript string literal rather than routing them through env: as is done consistently for all other values in this file. The workload_url comes from cpln workload get … -o json | jq -r '.status.endpoint' — not user-controlled — but the pattern is still inconsistent and would become a real injection risk if the data source ever changes.

Comment on lines +5 to +11
on:
push:
# GitHub does not allow repository vars in branch filters. Default to the common
# deploy branches unless `cpflow generate-github-actions --staging-branch BRANCH`
# was used. If STAGING_APP_BRANCH is later changed in repository variables, keep
# this list in sync so pushes to that branch actually trigger the workflow.
branches: ["master"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Push trigger branch is hardcoded; STAGING_APP_BRANCH change requires manual file edit

The on.push.branches list is ["master"] while the runtime guard reads vars.STAGING_APP_BRANCH. When a team changes the variable to a different branch, pushes to that branch will never trigger this workflow and the misconfiguration is silent. Consider adding a prominent comment at the top of this file (in addition to the inline note) that the branches: filter must stay in sync with the variable value.

Comment on lines +64 to +82
- name: Set up Ruby for cpflow bootstrap
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
uses: ruby/setup-ruby@v1
with:
ruby-version: "3.4"

- name: Bootstrap generated cpflow actions
if: ${{ hashFiles('.github/actions/cpflow-validate-config/action.yml') == '' }}
shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
ruby -S cpflow generate-github-actions --staging-branch master
# shellcheck disable=SC2016
ruby -0pi -e '$_.gsub!(/so callers can pass `\$\{\{ vars\.CPLN_CLI_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally."); $_.gsub!(/so callers can pass `\$\{\{ vars\.CPFLOW_VERSION \}\}` unconditionally\./, "so callers can pass the repository variable value unconditionally.")' .github/actions/cpflow-setup-environment/action.yml
if grep -n '\$''{{ vars\.\(CPLN_CLI_VERSION\|CPFLOW_VERSION\) }}' .github/actions/cpflow-setup-environment/action.yml; then
echo "::error::Bootstrapped cpflow setup action still contains GitHub metadata expressions in input descriptions."
exit 1
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Bootstrap step hardcodes cpflow version independently of cpflow-setup-environment

The bootstrap path hard-codes cpflow -v "5.0.0.rc.0". When cpflow graduates from RC, repos that hit the bootstrap path will regenerate actions from the older RC gem rather than the latest release—and since the bootstrap only fires when the actions directory is missing, it would silently succeed with stale generated code. This version needs the same update discipline as the default in cpflow-setup-environment/action.yml.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

✅ Review app for PR #732 is deleted

Open organization console
View workflow logs

@claude
Copy link
Copy Markdown

claude Bot commented May 9, 2026

Code Review

This PR replaces hand-written Control Plane GitHub Actions with the generated cpflow-* workflows. The new code is more structured, more secure, and better documented.

Security Improvements:
Secrets via env not CLI flags; SSH key in isolated step with trap cleanup; pull_request_target trust model applied correctly (no PR code checked out); fork deploy guard via same_repo check; indirect bash variable lookup guard in cpflow-validate-config; prefix-based delete safety more robust than old production/staging name regex.

Issue 1: Pre-release cpflow version in production deployments
Both cpflow-setup-environment and the bootstrap step hard-code 5.0.0.rc.0. A release candidate across review-app, staging, and production-promotion workflows means any RC bug could block or mis-deploy every environment. Consider a stable release, or document why the RC is intentional.

Issue 2: Health-check redirect acceptance
cpflow-wait-for-health defaults to accepting 301/302 as healthy. A maintenance-mode or auth-gate redirect would silently pass the post-promotion health check. Consider adding a /health endpoint returning 200 and setting HEALTH_CHECK_ACCEPTED_STATUSES=200.

Issue 3: Env-parity check only inspects GVC-level variable names
The 'Verify production environment variables' step only compares .spec.env[].name on the GVC. It misses workload-level env vars, secret references, and value differences. Documented in readme.md, but a misconfigured workload-level var could go undetected.

Issue 4: Bootstrap step version independent of CPFLOW_VERSION variable
cpflow-deploy-review-app.yml bootstrap hard-codes 5.0.0.rc.0 separately from the CPFLOW_VERSION repo variable. If the variable is updated, bootstrap still installs the hard-coded version.

Minor: Staging 3-job chain adds runner startup overhead. Concurrency cancel-in-progress false is correct. Rollback is image-only (DB migrations not reversed). Stale cleanup cancel-in-progress false is correct.

Summary: Excellent improvement. Actionable: (a) confirm 5.0.0.rc.0 intent or move to stable, (b) add /health endpoint and tighten HEALTH_CHECK_ACCEPTED_STATUSES.

# Override per-repo by setting `CPLN_CLI_VERSION` / `CPFLOW_VERSION` repo variables;
# an empty input falls back to the action's pinned default below.
default_cpln_cli_version="3.3.1"
default_cpflow_version="5.0.0.rc.0"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pins a pre-release (rc.0) as the default for all workflows that use this action — review apps, staging, and production promotion. A bug in the RC could block or mis-deploy every environment simultaneously. When a stable 5.0.0 is cut, bump this default; until then consider adding a comment like # intentional RC: stable not yet released so future readers know this is a deliberate choice and not an oversight.

shell: bash
run: |
set -euo pipefail
gem install cpflow -v "5.0.0.rc.0" --no-document
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bootstrap gem install hard-codes 5.0.0.rc.0 independently of the CPFLOW_VERSION repository variable used in cpflow-setup-environment. If the repo variable is later updated to a different version, this path will still install the hard-coded RC. The bootstrap only runs when the action files are absent, so the risk is low in practice — but consider deriving the version from the variable for consistency, e.g., by reading vars.CPFLOW_VERSION as a workflow input here.

`200 301 302` accepts redirects because curl is invoked without `-L`, so a
root path that auth-redirects looks like a redirect, not a failure.
required: false
default: "200 301 302"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 200 301 302 default is correctly explained in the description, but accepting redirects means a maintenance-mode page or an auth-gate that redirects to /login would pass the health check even when the underlying app is broken or unavailable. This is explicitly called out in the promote workflow's comments as a trade-off. As a follow-up, consider adding a /health endpoint to this Rails app (e.g., via a simple route returning 200 OK) and setting HEALTH_CHECK_ACCEPTED_STATUSES=200 in repository variables to make the production promotion gate unambiguous.

# deploy branches unless `cpflow generate-github-actions --staging-branch BRANCH`
# was used. If STAGING_APP_BRANCH is later changed in repository variables, keep
# this list in sync so pushes to that branch actually trigger the workflow.
branches: ["master"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub does not allow repository variables in push trigger branch filters, so this filter is correctly hard-coded to master. The STAGING_APP_BRANCH variable is then re-checked at runtime in validate-branch. This means if the staging branch is ever changed, two places need updating: this filter and the STAGING_APP_BRANCH repository variable. The existing comment in the workflow already calls this out, but it is worth keeping in mind as a maintenance footgun.

Comment on lines +137 to +138
staging_vars="$(CPLN_TOKEN="${CPLN_TOKEN_STAGING}" cpln gvc get "${STAGING_APP_NAME}" --org "${CPLN_ORG_STAGING}" -o json | jq -r '.spec.env // [] | .[].name' | sort)"
production_vars="$(CPLN_TOKEN="${CPLN_TOKEN_PRODUCTION}" cpln gvc get "${PRODUCTION_APP_NAME}" --org "${CPLN_ORG_PRODUCTION}" -o json | jq -r '.spec.env // [] | .[].name' | sort)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parity check compares .spec.env[].name on the GVC objects, which covers only GVC-level environment variable names. It will not detect:

  • Workload-level env vars (set per-container, not on the GVC)
  • Control Plane secret references (cpln://secret/...) vs. plain values
  • Variables that exist in both environments but carry different values

This is documented in readme.md, but teams should be aware that a workload configured with a missing or differently-named env var in production could pass this check and only fail at runtime post-promotion. A workload-level parity check (iterating cpln workload get per workload) would close the gap but is more complex to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant