Skip to content

Restore fork infra dropped by the #2391 main pickup (terraform, cockpit, sops, e2e workflow)#72

Open
ConstanzeTU wants to merge 3 commits into
mainfrom
restore-fork-infra-main
Open

Restore fork infra dropped by the #2391 main pickup (terraform, cockpit, sops, e2e workflow)#72
ConstanzeTU wants to merge 3 commits into
mainfrom
restore-fork-infra-main

Conversation

@ConstanzeTU

@ConstanzeTU ConstanzeTU commented Jun 28, 2026

Copy link
Copy Markdown

What

Restores fork-specific infrastructure that was deleted from main and should never have left it, at its original paths, byte-identical to the pre-deletion tree (610975b27).

Area Files Vs. original main
terraform/** (auth0 + cloud_deps + cockpit credentials + scripts) 16 byte-identical
private/cockpit/** (fork cloud kustomize bundle) 10 byte-identical
.sops.yaml 1 byte-identical
.github/workflows/e2e_log4shell_soc.yaml 1 byte-identical
.github/workflows/filename_linter.yaml 1 +1 line carve-out (see below)

Why

main tip 53d09cc8d is the fork landing of upstream pixie-io#2391 "Create vizier and cloud cert-manager compatible secrets". Upstream pixie-io#2391 only touches certificate handling, but as it landed on this fork it also deleted all of the above fork-only files — an unrelated regression. They were caught while preparing PR 68, re-added there to keep that branch deployable, then moved here (and stripped from PR 68) for a dedicated restore into main.

The one non-restore change

53d09cc8d also brought upstream's filename_linter.yaml onto main, which fails any PR touching a path containing private. To keep private/cockpit/** at its original path while staying green, a single negative glob is added:

- '!private/cockpit/**'

This preserves the "no private/* leaks" guard for everything else. It is the only deviation in this PR from a pure byte-for-byte restore.

Not included

skaffold/skaffold_cloud.yaml — an upstream file with fork edits, not a clean fork-only add; left for a separate decision.

pixie-agent added 2 commits June 28, 2026 08:54
The fork's terraform/ tree (auth0 + cloud_deps + cockpit credentials +
scripts, 16 files) was deleted on main by commit 53d09cc, the local
landing of upstream pixie-io#2391 'Create vizier and cloud cert-manager
compatible secrets'. That upstream commit only touches cert handling; the
terraform deletion was an unrelated regression bundled into the fork
pickup. Content here is byte-identical to the pre-deletion tree at
610975b (parent of main tip). This belongs on main, not in PR 68.
Same regression as the terraform tree: commit 53d09cc (the fork landing
of upstream pixie-io#2391) deleted these fork-only files from main, and they were
re-added on PR 68's branch where they don't belong. Moved here for the
main-restore PR.

- cockpit/**: restored at the post-rename path (private/cockpit/ ->
  cockpit/) so it passes the upstream filename-linter on main; the
  workflow file is unchanged from main.
- .sops.yaml, .github/workflows/e2e_log4shell_soc.yaml: byte-identical
  to PR 68's copies.
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 27f8ccc7-8f1b-4941-bae7-1c05d1b11530

📥 Commits

Reviewing files that changed from the base of the PR and between 6737704 and 1ca1624.

📒 Files selected for processing (11)
  • .github/workflows/filename_linter.yaml
  • private/cockpit/artifact_config.yaml
  • private/cockpit/auth_deployment.yaml
  • private/cockpit/cloud_ingress.yaml
  • private/cockpit/domain_config.yaml
  • private/cockpit/kustomization.yaml
  • private/cockpit/oauth_config.yaml
  • private/cockpit/plugin_db_updater_job.yaml
  • private/cockpit/proxy_envoy.yaml
  • private/cockpit/script_bundles_config.yaml
  • private/cockpit/servers_transport.yaml
💤 Files with no reviewable changes (9)
  • private/cockpit/domain_config.yaml
  • private/cockpit/oauth_config.yaml
  • private/cockpit/auth_deployment.yaml
  • private/cockpit/artifact_config.yaml
  • private/cockpit/servers_transport.yaml
  • private/cockpit/script_bundles_config.yaml
  • private/cockpit/proxy_envoy.yaml
  • private/cockpit/plugin_db_updater_job.yaml
  • private/cockpit/cloud_ingress.yaml

📝 Walkthrough

Walkthrough

Adds a Terraform-managed Auth0 tenant and cloud_deps module (certs, secrets, kustomize batches), SOPS-encrypted credential storage via Azure Key Vault, Kubernetes cockpit manifests (ConfigMaps, Ingress, Envoy, auth-server, plugin-db-updater Job, Kustomization), and a GitHub Actions workflow for end-to-end log4shell SOC testing with pprof profiling.

Changes

Cockpit Cloud Environment Setup

Layer / File(s) Summary
SOPS config and encrypted Auth0 secret
.sops.yaml, terraform/credentials/cockpit/auth0_config.yaml
Adds SOPS creation_rules mapping terraform/credentials/cockpit to an Azure Key Vault key, and adds the SOPS-encrypted Kubernetes Secret holding Auth0 client-id and client-secret.
Auth0 Terraform provider, imports, and generated resources
terraform/kubernetes/auth0/auth0_main.tf, terraform/kubernetes/auth0/variables.tf, terraform/kubernetes/auth0/terraform.tfvars, terraform/kubernetes/auth0/.terraform.lock.hcl, terraform/kubernetes/auth0/auth0_import.tf, terraform/kubernetes/auth0/generated.tf
Adds provider config (debug mode, version constraint), cloud_domain variable/tfvars, lock file, auto-generated import blocks for existing Auth0 entities, and the full generated.tf covering connections, tenant, client, resource server scopes (~1000 scopes), branding, and attack protection.
cloud_deps Terraform: secrets, certs, and kustomize batches
terraform/kubernetes/cloud_deps/provider.tf, terraform/kubernetes/cloud_deps/.terraform.lock.hcl, terraform/kubernetes/cloud_deps/variables.tf, terraform/kubernetes/scripts/create_random_bytes.sh, terraform/kubernetes/cloud_deps/core_resource_deps.tf, terraform/kubernetes/cloud_deps/cert_manager.tf, terraform/kubernetes/cloud_deps/base_cloud_deps.tf, terraform/kubernetes/cloud_deps/env_cloud_deps.tf
Introduces the cloud_deps module: kubeconfig-based provider wiring, input variables, a /dev/urandom-backed random-bytes helper script, core Kubernetes secrets (JWT/session/DB keys and Auth0 credentials from remote state), cert-manager chain (self-signed CA → internal issuer → server/client certs; Let's Encrypt → proxy cert), and ordered three-priority kustomize batch resources for Elastic operator and env dependencies.
Cockpit Kubernetes manifests and Kustomization
private/cockpit/..., .github/workflows/filename_linter.yaml
Adds ConfigMaps for domain, OAuth, artifacts, script bundles, and Envoy proxy config; auth-server Deployment sourcing Auth0 credentials from secrets; cloud Ingress with TLS and external-dns annotations; Traefik ServersTransport; plugin-db-updater Job with PostgreSQL init container and projected TLS volumes; Kustomization overlay composing all resources. Also exempts private/cockpit/** from the filename linter.

E2E log4shell SOC GitHub Actions workflow

Layer / File(s) Summary
E2E workflow: SOC stack deploy, harness execution, profiling, and artifact upload
.github/workflows/e2e_log4shell_soc.yaml
Defines the e2e-log4shell-soc manual-dispatch workflow: installs k3s on an Oracle runner, deploys the SOC stack (Pixie/kubescape/ClickHouse/log4j chain) with optional dx-daemon image override and profiling env vars, waits for pod readiness, runs canonical harness scripts with "rule in" detection gate, port-forwards pprof/metrics during a live log4shell fire to collect CPU/heap profiles, and uploads all evidence as a 14-day artifact.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • k8sstormcenter/pixie#41: Directly overlaps with the cockpit cloud deployment IaC additions, including terraform/kubernetes/cloud_deps/, create_random_bytes.sh, cert-manager resources, and kustomize-based cloud manifests.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: restoring fork infrastructure and related workflow/config files dropped by the #2391 pickup.
Description check ✅ Passed The description is directly aligned with the changeset and explains the restored files and the one intentional linter carve-out.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch restore-fork-infra-main

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e_log4shell_soc.yaml:
- Around line 36-37: The checkout in the workflow currently leaves persisted
credentials available to later steps, which can expose the read token to
external scripts. Update the existing actions/checkout usage in the
e2e_log4shell_soc job to disable persisted credentials, keeping the checkout
step for the harness scripts but ensuring the token is not stored in local git
config. Refer to the checkout step using actions/checkout in this workflow and
apply the secure checkout setting there.
- Around line 14-24: The workflow_dispatch e2e job currently has no concurrency
control, so duplicate runs can collide on shared k3s state and overwrite
/tmp/evidence. Add a concurrency block to this workflow using a stable group key
for the host-scoped job (for example based on github.workflow plus the selected
inputs such as dx_image and soc_ref), and enable cancel-in-progress or otherwise
prevent overlapping runs in the workflow that defines workflow_dispatch,
dx_image, and soc_ref.
- Around line 39-43: The Install k3s step is using the unpinned get.k3s.io
installer, which makes the privileged e2e environment non-reproducible. Update
the k3s install logic in the workflow step named Install k3s to set a validated
INSTALL_K3S_VERSION before invoking the installer, keeping the rest of the
readiness check and kubectl verification unchanged.
- Around line 102-120: The “Profile dx in real life (pprof + metrics)” step is
not failure-safe when prior setup fails, so it can’t write artifacts or clearly
indicate it was skipped. In the workflow job, update this step to create
/tmp/evidence before any writes and to explicitly detect missing prerequisites
like the dx-daemon pod or a failed port-forward. Use the existing shell block in
the profiling step to emit a clear skip message and exit successfully when those
prerequisites are absent, while keeping the profiling and metric collection
logic intact when they are present.
- Around line 45-70: The deploy step is interpolating untrusted manual inputs
directly into shell commands, which can allow command injection. In the SOC
deployment job, stop embedding inputs.soc_ref and inputs.dx_image inside the run
script and instead pass them through validated environment variables or
precomputed safe outputs before use. Add strict allowlist-style validation for
the ref/image values in the step that performs git clone and kubectl set image,
and only execute the commands after the values have been verified safe.

In `@cockpit/auth_deployment.yaml`:
- Around line 14-19: Make the PL_AUTH0_CLIENT_ID secret lookup mandatory as
well, since it comes from the same cloud-auth0-secrets object as
PL_AUTH0_CLIENT_SECRET. Update the secretKeyRef for PL_AUTH0_CLIENT_ID in the
auth deployment manifest to remove the optional lookup, matching the behavior
already used for the client secret so missing or broken secrets fail at
deployment time.

In `@cockpit/domain_config.yaml`:
- Line 7: PL_DOMAIN_NAME is hard-coded to a specific public domain, creating a
second source of truth that can drift from the configured cloud_domain. Update
the domain config so PL_DOMAIN_NAME is derived from the same domain input used
by the cert/Auth0 layer rather than a literal, and keep the value wired through
the existing configuration path that populates this ConfigMap.

In `@cockpit/plugin_db_updater_job.yaml`:
- Around line 31-32: The updater Job still references the mutable
cloud-plugin-load_db:latest image without a kustomize images transform, so fix
the image reference in the plugin_db_updater_job manifest or add an images
rewrite in kustomization to pin it to an immutable tag or digest. Use the
updater container entry in plugin_db_updater_job and the image handling in
kustomization.yaml to ensure the deployed Job always pulls a stable,
reproducible image.

In `@cockpit/proxy_envoy.yaml`:
- Around line 43-45: The CORS origin allowlist in proxy_envoy.yaml is too narrow
and blocks the cockpit domain. Update the Envoy CORS configuration in the
relevant cors/allow_origin_string_match block to allow the actual cloud domain
used by the deployment, matching the cloud_domain default from variables.tf (via
the cockpit proxy config) instead of only test.austrianopencloudcommunity.org.
Ensure the suffix matches the real browser origin expected by the cockpit
frontend.
- Around line 9-96: The ConfigMap entry for envoy.yaml is using a folded scalar,
which will collapse the YAML structure and break Envoy’s config parsing. Change
the envoy.yaml value to a literal block so the full nested structure is
preserved exactly as written. Keep the fix focused on the envoy.yaml key in
proxy_envoy.yaml and ensure the rendered config remains valid YAML for Envoy to
load.

In `@terraform/kubernetes/auth0/auth0_main.tf`:
- Around line 11-13: Disable the Auth0 provider debug setting in the provider
block so committed Terraform config does not emit request/response details;
update the auth0 provider configuration in auth0_main.tf to keep debug off by
default and only enable it temporarily for local troubleshooting. Use the
provider "auth0" block as the target for this change.

In `@terraform/kubernetes/auth0/generated.tf`:
- Around line 467-476: The Auth0 setup is over-privileged because the
interactive login client and the Management API grant share the same client
identity. Update the Terraform model around the app represented by the
regular_web client and the auth0_client_grant resource so the login application
only keeps callback/authorization_code usage, and move the Management API scopes
to a separate least-privileged M2M client. Make sure the grant resource uses the
new machine-to-machine client_id and that the login client no longer has
tenant-admin scopes attached.

In `@terraform/kubernetes/cloud_deps/cert_manager.tf`:
- Around line 1-4: The lookup in locals.cert_config is failing too early with an
invalid-index error, so the intended validation never gets a chance to run.
Change the cert_details access to use a nullable/safe lookup first in
cert_manager.tf, then have the existing precondition validate that returned
value and surface the custom message. Keep the fix centered around
locals.cert_config and the related precondition block so the missing
cert_details entry is handled by validation instead of indexing.

In `@terraform/kubernetes/cloud_deps/provider.tf`:
- Around line 14-22: The kubeconfig settings in the kubernetes and kustomization
provider blocks are hard-coded to a specific home path and context, making this
module environment-specific. Update the provider configuration to take both the
kubeconfig path and context from inputs instead of embedding
"~/.kube/cockpit-config" and "default". Use existing Terraform input variables
or add new ones, and if you keep a home-directory default, resolve it with
pathexpand() in the provider setup.

In `@terraform/kubernetes/scripts/create_random_bytes.sh`:
- Around line 19-36: The random-bytes generation in create_random_bytes.sh can
hang or emit an empty JSON payload when chars_to_keep is empty or when an
upstream command in the /dev/urandom pipeline fails. Update the script to
validate inputs before the pipeline starts, reject empty or invalid values for
chars_to_keep and num_chars in usage(), and make the pipeline fail fast by
ensuring the script propagates failures from bytes generation instead of still
reaching jq. Use the existing usage() and bytes assignment as the main
touchpoints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 26709ba6-70c5-4b82-98c9-a0300c8fc6be

📥 Commits

Reviewing files that changed from the base of the PR and between 53d09cc and 6737704.

⛔ Files ignored due to path filters (1)
  • terraform/kubernetes/cloud_deps/terraform.tfstate is excluded by !**/*.tfstate
📒 Files selected for processing (27)
  • .github/workflows/e2e_log4shell_soc.yaml
  • .sops.yaml
  • cockpit/artifact_config.yaml
  • cockpit/auth_deployment.yaml
  • cockpit/cloud_ingress.yaml
  • cockpit/domain_config.yaml
  • cockpit/kustomization.yaml
  • cockpit/oauth_config.yaml
  • cockpit/plugin_db_updater_job.yaml
  • cockpit/proxy_envoy.yaml
  • cockpit/script_bundles_config.yaml
  • cockpit/servers_transport.yaml
  • terraform/credentials/cockpit/auth0_config.yaml
  • terraform/kubernetes/auth0/.terraform.lock.hcl
  • terraform/kubernetes/auth0/auth0_import.tf
  • terraform/kubernetes/auth0/auth0_main.tf
  • terraform/kubernetes/auth0/generated.tf
  • terraform/kubernetes/auth0/terraform.tfvars
  • terraform/kubernetes/auth0/variables.tf
  • terraform/kubernetes/cloud_deps/.terraform.lock.hcl
  • terraform/kubernetes/cloud_deps/base_cloud_deps.tf
  • terraform/kubernetes/cloud_deps/cert_manager.tf
  • terraform/kubernetes/cloud_deps/core_resource_deps.tf
  • terraform/kubernetes/cloud_deps/env_cloud_deps.tf
  • terraform/kubernetes/cloud_deps/provider.tf
  • terraform/kubernetes/cloud_deps/variables.tf
  • terraform/kubernetes/scripts/create_random_bytes.sh

Comment on lines +14 to +24
on:
workflow_dispatch:
inputs:
dx_image:
description: dx-daemon image to test (default = .image-tags pin)
required: false
default: ""
soc_ref:
description: k8sstormcenter/soc branch
required: false
default: "218-clickhouse-schema"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Add concurrency for this host-scoped e2e.

This job mutates k3s state and writes fixed /tmp/evidence; duplicate dispatches can collide or overwrite evidence.

Proposed fix
       soc_ref:
         description: k8sstormcenter/soc branch
         required: false
         default: "218-clickhouse-schema"
+concurrency:
+  group: e2e-log4shell-soc
+  cancel-in-progress: false
 permissions:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
on:
workflow_dispatch:
inputs:
dx_image:
description: dx-daemon image to test (default = .image-tags pin)
required: false
default: ""
soc_ref:
description: k8sstormcenter/soc branch
required: false
default: "218-clickhouse-schema"
on:
workflow_dispatch:
inputs:
dx_image:
description: dx-daemon image to test (default = .image-tags pin)
required: false
default: ""
soc_ref:
description: k8sstormcenter/soc branch
required: false
default: "218-clickhouse-schema"
concurrency:
group: e2e-log4shell-soc
cancel-in-progress: false
🧰 Tools
🪛 zizmor (1.26.1)

[warning] 14-24: insufficient job-level concurrency limits (concurrency-limits): workflow is missing concurrency setting

(concurrency-limits)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e_log4shell_soc.yaml around lines 14 - 24, The
workflow_dispatch e2e job currently has no concurrency control, so duplicate
runs can collide on shared k3s state and overwrite /tmp/evidence. Add a
concurrency block to this workflow using a stable group key for the host-scoped
job (for example based on github.workflow plus the selected inputs such as
dx_image and soc_ref), and enable cancel-in-progress or otherwise prevent
overlapping runs in the workflow that defines workflow_dispatch, dx_image, and
soc_ref.

Source: Linters/SAST tools

Comment on lines +36 to +37
- name: Checkout pixie (harness scripts)
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Disable persisted checkout credentials.

The job later runs external repo scripts; leaving the checkout token in local git config unnecessarily exposes the read token to job code.

Proposed fix
     - name: Checkout pixie (harness scripts)
       uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4.2.2
+      with:
+        persist-credentials: false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Checkout pixie (harness scripts)
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Checkout pixie (harness scripts)
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
🧰 Tools
🪛 zizmor (1.26.1)

[warning] 36-37: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

(artipacked)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e_log4shell_soc.yaml around lines 36 - 37, The checkout
in the workflow currently leaves persisted credentials available to later steps,
which can expose the read token to external scripts. Update the existing
actions/checkout usage in the e2e_log4shell_soc job to disable persisted
credentials, keeping the checkout step for the harness scripts but ensuring the
token is not stored in local git config. Refer to the checkout step using
actions/checkout in this workflow and apply the secure checkout setting there.

Source: Linters/SAST tools

Comment on lines +39 to +43
- name: Install k3s
run: |
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
for i in $(seq 1 60); do kubectl get nodes --no-headers 2>/dev/null | grep -q ' Ready' && break; sleep 5; done
kubectl get nodes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Pin the k3s version before running the installer.

get.k3s.io installs a moving target; set a validated INSTALL_K3S_VERSION so this privileged e2e is reproducible.

Proposed fix pattern
     - name: Install k3s
       run: |
+        export INSTALL_K3S_VERSION="${INSTALL_K3S_VERSION:?pin the validated k3s version}"
         curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e_log4shell_soc.yaml around lines 39 - 43, The Install
k3s step is using the unpinned get.k3s.io installer, which makes the privileged
e2e environment non-reproducible. Update the k3s install logic in the workflow
step named Install k3s to set a validated INSTALL_K3S_VERSION before invoking
the installer, keeping the rest of the readiness check and kubectl verification
unchanged.

Comment on lines +45 to +70
- name: Deploy the SOC stack (Pixie + kubescape + ClickHouse + AE + dx + log4j chain)
env:
PX_CLOUD_ADDR: pixie.austrianopencloudcommunity.org
PX_DEPLOY_KEY: ${{ secrets.PX_DEPLOY_KEY }}
PX_API_KEY: ${{ secrets.PX_API_KEY }}
TS_AUTHKEY: ${{ secrets.TAILSCALE_AUTH_KEY }}
DX_ENTLEIN_PAT: ${{ secrets.DX_ENTLEIN_PAT }} # private entlein/dx image pull
CLICKHOUSE_ANALYST_PASSWORD: ${{ secrets.CLICKHOUSE_ANALYST_PASSWORD }}
CLICKHOUSE_INGEST_PASSWORD: ${{ secrets.CLICKHOUSE_INGEST_PASSWORD }}
CLICKHOUSE_PIXIE_PASSWORD: ${{ secrets.CLICKHOUSE_PIXIE_PASSWORD }}
run: |
set -euo pipefail
sudo apt-get update -qq && sudo apt-get install -y python3-yaml
git clone --depth 1 -b "${{ inputs.soc_ref }}" https://github.com/k8sstormcenter/soc soc
cd soc
make pixie # vizier + AE
make kubescape || true # node-agent (netStreaming)
bash tree/clickhouse-lab/install.sh # forensic_db
make log4j # vulnerable backend + attacker + dx + SBoBs (managed-by=User)
if [ -n "${{ inputs.dx_image }}" ]; then
kubectl -n honey set image ds/dx-daemon dx-daemon="${{ inputs.dx_image }}" || true
fi
# optimal config + enable pprof for the real-life profile (DX_TELEMETRY_CACHE/DX_BENCH
# are defaults in main, set here too in case the kit's manifest predates them)
kubectl -n honey set env ds/dx-daemon DX_PPROF_ADDR=0.0.0.0:6060 DX_TELEMETRY_CACHE=1 DX_BENCH=pemdirect
kubectl -n honey rollout status ds/dx-daemon --timeout=120s

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Move dispatch inputs out of shell templates and validate them.

${{ inputs.soc_ref }} and ${{ inputs.dx_image }} are expanded before shell parsing. A crafted manual input can break out of quotes and run commands with the secrets in this step.

Proposed fix
       env:
+        SOC_REF: ${{ inputs.soc_ref }}
+        DX_IMAGE: ${{ inputs.dx_image }}
         PX_CLOUD_ADDR: pixie.austrianopencloudcommunity.org
         PX_DEPLOY_KEY: ${{ secrets.PX_DEPLOY_KEY }}
@@
         set -euo pipefail
         sudo apt-get update -qq && sudo apt-get install -y python3-yaml
-        git clone --depth 1 -b "${{ inputs.soc_ref }}" https://github.com/k8sstormcenter/soc soc
+        if ! printf '%s' "$SOC_REF" | grep -Eq '^[A-Za-z0-9._/-]+$'; then
+          echo "Invalid soc_ref"; exit 1
+        fi
+        git clone --depth 1 --branch "$SOC_REF" --single-branch https://github.com/k8sstormcenter/soc soc
@@
-        if [ -n "${{ inputs.dx_image }}" ]; then
-          kubectl -n honey set image ds/dx-daemon dx-daemon="${{ inputs.dx_image }}" || true
+        if [ -n "$DX_IMAGE" ]; then
+          if ! printf '%s' "$DX_IMAGE" | grep -Eq '^[A-Za-z0-9._/:@+-]+$'; then
+            echo "Invalid dx_image"; exit 1
+          fi
+          kubectl -n honey set image ds/dx-daemon "dx-daemon=$DX_IMAGE"
         fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Deploy the SOC stack (Pixie + kubescape + ClickHouse + AE + dx + log4j chain)
env:
PX_CLOUD_ADDR: pixie.austrianopencloudcommunity.org
PX_DEPLOY_KEY: ${{ secrets.PX_DEPLOY_KEY }}
PX_API_KEY: ${{ secrets.PX_API_KEY }}
TS_AUTHKEY: ${{ secrets.TAILSCALE_AUTH_KEY }}
DX_ENTLEIN_PAT: ${{ secrets.DX_ENTLEIN_PAT }} # private entlein/dx image pull
CLICKHOUSE_ANALYST_PASSWORD: ${{ secrets.CLICKHOUSE_ANALYST_PASSWORD }}
CLICKHOUSE_INGEST_PASSWORD: ${{ secrets.CLICKHOUSE_INGEST_PASSWORD }}
CLICKHOUSE_PIXIE_PASSWORD: ${{ secrets.CLICKHOUSE_PIXIE_PASSWORD }}
run: |
set -euo pipefail
sudo apt-get update -qq && sudo apt-get install -y python3-yaml
git clone --depth 1 -b "${{ inputs.soc_ref }}" https://github.com/k8sstormcenter/soc soc
cd soc
make pixie # vizier + AE
make kubescape || true # node-agent (netStreaming)
bash tree/clickhouse-lab/install.sh # forensic_db
make log4j # vulnerable backend + attacker + dx + SBoBs (managed-by=User)
if [ -n "${{ inputs.dx_image }}" ]; then
kubectl -n honey set image ds/dx-daemon dx-daemon="${{ inputs.dx_image }}" || true
fi
# optimal config + enable pprof for the real-life profile (DX_TELEMETRY_CACHE/DX_BENCH
# are defaults in main, set here too in case the kit's manifest predates them)
kubectl -n honey set env ds/dx-daemon DX_PPROF_ADDR=0.0.0.0:6060 DX_TELEMETRY_CACHE=1 DX_BENCH=pemdirect
kubectl -n honey rollout status ds/dx-daemon --timeout=120s
- name: Deploy the SOC stack (Pixie + kubescape + ClickHouse + AE + dx + log4j chain)
env:
SOC_REF: ${{ inputs.soc_ref }}
DX_IMAGE: ${{ inputs.dx_image }}
PX_CLOUD_ADDR: pixie.austrianopencloudcommunity.org
PX_DEPLOY_KEY: ${{ secrets.PX_DEPLOY_KEY }}
PX_API_KEY: ${{ secrets.PX_API_KEY }}
TS_AUTHKEY: ${{ secrets.TAILSCALE_AUTH_KEY }}
DX_ENTLEIN_PAT: ${{ secrets.DX_ENTLEIN_PAT }} # private entlein/dx image pull
CLICKHOUSE_ANALYST_PASSWORD: ${{ secrets.CLICKHOUSE_ANALYST_PASSWORD }}
CLICKHOUSE_INGEST_PASSWORD: ${{ secrets.CLICKHOUSE_INGEST_PASSWORD }}
CLICKHOUSE_PIXIE_PASSWORD: ${{ secrets.CLICKHOUSE_PIXIE_PASSWORD }}
run: |
set -euo pipefail
sudo apt-get update -qq && sudo apt-get install -y python3-yaml
if ! printf '%s' "$SOC_REF" | grep -Eq '^[A-Za-z0-9._/-]+$'; then
echo "Invalid soc_ref"; exit 1
fi
git clone --depth 1 --branch "$SOC_REF" --single-branch https://github.com/k8sstormcenter/soc soc
cd soc
make pixie # vizier + AE
make kubescape || true # node-agent (netStreaming)
bash tree/clickhouse-lab/install.sh # forensic_db
make log4j # vulnerable backend + attacker + dx + SBoBs (managed-by=User)
if [ -n "$DX_IMAGE" ]; then
if ! printf '%s' "$DX_IMAGE" | grep -Eq '^[A-Za-z0-9._/:@+-]+$'; then
echo "Invalid dx_image"; exit 1
fi
kubectl -n honey set image ds/dx-daemon "dx-daemon=$DX_IMAGE"
fi
# optimal config + enable pprof for the real-life profile (DX_TELEMETRY_CACHE/DX_BENCH
# are defaults in main, set here too in case the kit's manifest predates them)
kubectl -n honey set env ds/dx-daemon DX_PPROF_ADDR=0.0.0.0:6060 DX_TELEMETRY_CACHE=1 DX_BENCH=pemdirect
kubectl -n honey rollout status ds/dx-daemon --timeout=120s
🧰 Tools
🪛 zizmor (1.26.1)

[error] 58-58: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)


[error] 64-64: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)


[error] 65-65: code injection via template expansion (template-injection): may expand into attacker-controllable code

(template-injection)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e_log4shell_soc.yaml around lines 45 - 70, The deploy
step is interpolating untrusted manual inputs directly into shell commands,
which can allow command injection. In the SOC deployment job, stop embedding
inputs.soc_ref and inputs.dx_image inside the run script and instead pass them
through validated environment variables or precomputed safe outputs before use.
Add strict allowlist-style validation for the ref/image values in the step that
performs git clone and kubectl set image, and only execute the commands after
the values have been verified safe.

Source: Linters/SAST tools

Comment on lines +102 to +120
- name: Profile dx in real life (pprof + metrics)
if: always()
run: |
set -uo pipefail
POD=$(kubectl -n honey get pod -l app=dx-daemon -o jsonpath='{.items[0].metadata.name}')
kubectl -n honey port-forward "$POD" 6060:6060 9095:9095 & PF=$!; sleep 5
# 30s CPU profile under a fresh fire + heap, served by DX_PPROF_ADDR=:6060
( bash "$HARNESS/log4shell_fire.sh" >/dev/null 2>&1 || true ) &
curl -s --max-time 40 -o /tmp/evidence/dx_cpu.pprof \
"http://127.0.0.1:6060/debug/pprof/profile?seconds=30" || true
curl -s "http://127.0.0.1:6060/debug/pprof/heap" -o /tmp/evidence/dx_heap.pprof || true
curl -s "http://127.0.0.1:9095/metrics" -o /tmp/evidence/dx_metrics.txt || true
go tool pprof -top -nodecount=25 /tmp/evidence/dx_cpu.pprof > /tmp/evidence/dx_cpu_top.txt 2>&1 || true
kill $PF 2>/dev/null || true
echo "=== dx CPU top ==="; head -30 /tmp/evidence/dx_cpu_top.txt
echo "=== verdict latency ==="
grep -E \
"dx_(time_to_verdict|bench_query_duration)_seconds_(sum|count)" \
/tmp/evidence/dx_metrics.txt || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Make the always-on profiling step failure-safe.

If earlier steps fail before /tmp/evidence or the dx pod exists, this step silently loses profile artifacts. Create the directory and record an explicit skip.

Proposed fix
       run: |
         set -uo pipefail
-        POD=$(kubectl -n honey get pod -l app=dx-daemon -o jsonpath='{.items[0].metadata.name}')
+        mkdir -p /tmp/evidence
+        POD=$(kubectl -n honey get pod -l app=dx-daemon -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
+        if [ -z "$POD" ]; then
+          echo "dx-daemon pod not found; profiling skipped" | tee /tmp/evidence/profile_skipped.txt
+          exit 0
+        fi
         kubectl -n honey port-forward "$POD" 6060:6060 9095:9095 & PF=$!; sleep 5
+        trap 'kill "$PF" 2>/dev/null || true' EXIT
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Profile dx in real life (pprof + metrics)
if: always()
run: |
set -uo pipefail
POD=$(kubectl -n honey get pod -l app=dx-daemon -o jsonpath='{.items[0].metadata.name}')
kubectl -n honey port-forward "$POD" 6060:6060 9095:9095 & PF=$!; sleep 5
# 30s CPU profile under a fresh fire + heap, served by DX_PPROF_ADDR=:6060
( bash "$HARNESS/log4shell_fire.sh" >/dev/null 2>&1 || true ) &
curl -s --max-time 40 -o /tmp/evidence/dx_cpu.pprof \
"http://127.0.0.1:6060/debug/pprof/profile?seconds=30" || true
curl -s "http://127.0.0.1:6060/debug/pprof/heap" -o /tmp/evidence/dx_heap.pprof || true
curl -s "http://127.0.0.1:9095/metrics" -o /tmp/evidence/dx_metrics.txt || true
go tool pprof -top -nodecount=25 /tmp/evidence/dx_cpu.pprof > /tmp/evidence/dx_cpu_top.txt 2>&1 || true
kill $PF 2>/dev/null || true
echo "=== dx CPU top ==="; head -30 /tmp/evidence/dx_cpu_top.txt
echo "=== verdict latency ==="
grep -E \
"dx_(time_to_verdict|bench_query_duration)_seconds_(sum|count)" \
/tmp/evidence/dx_metrics.txt || true
- name: Profile dx in real life (pprof + metrics)
if: always()
run: |
set -uo pipefail
mkdir -p /tmp/evidence
POD=$(kubectl -n honey get pod -l app=dx-daemon -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [ -z "$POD" ]; then
echo "dx-daemon pod not found; profiling skipped" | tee /tmp/evidence/profile_skipped.txt
exit 0
fi
kubectl -n honey port-forward "$POD" 6060:6060 9095:9095 & PF=$!; sleep 5
trap 'kill "$PF" 2>/dev/null || true' EXIT
# 30s CPU profile under a fresh fire + heap, served by DX_PPROF_ADDR=:6060
( bash "$HARNESS/log4shell_fire.sh" >/dev/null 2>&1 || true ) &
curl -s --max-time 40 -o /tmp/evidence/dx_cpu.pprof \
"http://127.0.0.1:6060/debug/pprof/profile?seconds=30" || true
curl -s "http://127.0.0.1:6060/debug/pprof/heap" -o /tmp/evidence/dx_heap.pprof || true
curl -s "http://127.0.0.1:9095/metrics" -o /tmp/evidence/dx_metrics.txt || true
go tool pprof -top -nodecount=25 /tmp/evidence/dx_cpu.pprof > /tmp/evidence/dx_cpu_top.txt 2>&1 || true
kill $PF 2>/dev/null || true
echo "=== dx CPU top ==="; head -30 /tmp/evidence/dx_cpu_top.txt
echo "=== verdict latency ==="
grep -E \
"dx_(time_to_verdict|bench_query_duration)_seconds_(sum|count)" \
/tmp/evidence/dx_metrics.txt || true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e_log4shell_soc.yaml around lines 102 - 120, The
“Profile dx in real life (pprof + metrics)” step is not failure-safe when prior
setup fails, so it can’t write artifacts or clearly indicate it was skipped. In
the workflow job, update this step to create /tmp/evidence before any writes and
to explicitly detect missing prerequisites like the dx-daemon pod or a failed
port-forward. Use the existing shell block in the profiling step to emit a clear
skip message and exit successfully when those prerequisites are absent, while
keeping the profiling and metric collection logic intact when they are present.

Comment on lines +11 to +13
provider "auth0" {
debug = true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n## Files mentioning auth0 provider or debug\n'
git ls-files | rg '(^|/)auth0.*\.tf$|terraform/kubernetes/auth0/|provider "auth0"|debug\s*=' -n || true

printf '\n## auth0_main.tf with line numbers\n'
cat -n terraform/kubernetes/auth0/auth0_main.tf

printf '\n## nearby terraform files in auth0 module\n'
find terraform/kubernetes/auth0 -maxdepth 1 -type f | sort | xargs -I{} sh -c 'echo "--- {}"; cat -n "{}" | sed -n "1,220p"'

Repository: k8sstormcenter/pixie

Length of output: 18992


🌐 Web query:

Auth0 Terraform provider debug = true logs sensitive data provider documentation

💡 Result:

In the Auth0 Terraform provider, setting debug = true in the provider configuration enables HTTP request and response logging when the TF_LOG environment variable is also set to DEBUG [1][2][3]. Security implications of enabling this debug mode: 1. Sensitive Data Exposure: Enabling debug logging can cause sensitive information, such as authentication tokens, client secrets, or other PII, to be written to standard output or log files in plain text [1][2][4]. Because these logs capture the full HTTP traffic between the provider and the Auth0 Management API, they may bypass standard Terraform CLI redaction mechanisms [4][5]. 2. Best Practices: You should avoid enabling debug mode in non-development environments [6]. If you must use debug logging, ensure that the output is directed to a secure, restricted-access location and that logs are purged promptly [5]. 3. Alternative Data Protection: For managing sensitive data within Auth0 resources, the provider offers specific mechanisms like write-only attributes (e.g., for passwords or API keys) and PII configuration options for resources like log streams to help handle sensitive data more securely than raw logging [7][8][9]. To debug issues without globally exposing sensitive data via the provider's debug flag, consider using more granular Terraform logging options (e.g., TF_LOG_PROVIDER_AUTH0=DEBUG) or platform-specific monitoring tools that do not persist sensitive API request bodies in cleartext [5]. Always treat Terraform log output as potentially containing sensitive information when debug verbosity is enabled [4].

Citations:


Disable Auth0 provider debug logging. debug = true can emit Auth0 request/response details, including secrets, into Terraform logs when debug logging is enabled. Keep it off in committed config and enable it only in local troubleshooting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@terraform/kubernetes/auth0/auth0_main.tf` around lines 11 - 13, Disable the
Auth0 provider debug setting in the provider block so committed Terraform config
does not emit request/response details; update the auth0 provider configuration
in auth0_main.tf to keep debug off by default and only enable it temporarily for
local troubleshooting. Use the provider "auth0" block as the target for this
change.

Comment on lines +467 to +476
# __generated__ by Terraform from "cgr_lgvjTwTNm53mojKq"
resource "auth0_client_grant" "akm0sumjg95tcrlp390pxzjozkkyndt_https_k8sstormcenter_eu_auth0_com_api_v2" {
allow_all_scopes = false
allow_any_organization = false
audience = "https://k8sstormcenter.eu.auth0.com/api/v2/"
authorization_details_types = []
client_id = "5akM0SumJG95tcRlP390pXzJoZkkYNDT"
organization_usage = null
scopes = ["read:users", "update:users", "delete:users", "create:users", "read:users_app_metadata", "update:users_app_metadata", "delete:users_app_metadata", "create:users_app_metadata", "read:user_custom_blocks", "create:user_custom_blocks", "delete:user_custom_blocks", "create:user_tickets", "read:logs_users", "read:user_idp_tokens", "create:user_attribute_profiles", "read:user_attribute_profiles", "update:user_attribute_profiles", "delete:user_attribute_profiles", "read:user_effective_permissions", "read:user_effective_roles", "read:user_role_source_groups", "read:user_permission_source_roles"]
subject_type = "client"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file='terraform/kubernetes/auth0/generated.tf'

echo '=== Around lines 450-490 ==='
sed -n '450,490p' "$file" | cat -n

echo
echo '=== Around lines 1688-1765 ==='
sed -n '1688,1765p' "$file" | cat -n

echo
echo '=== Search for the client id 5akM0SumJG95tcRlP390pXzJoZkkYNDT ==='
rg -n '5akM0SumJG95tcRlP390pXzJoZkkYNDT|app_type =|refresh_token|authorization_code|client_grant' "$file"

Repository: k8sstormcenter/pixie

Length of output: 7746


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '580,620p' terraform/kubernetes/auth0/generated.tf | cat -n

Repository: k8sstormcenter/pixie

Length of output: 1761


Split the interactive login client from the Management API grant. The same client_id is used for callbacks/authorization_code on the app at 1698-1758 and is also granted Auth0 Management API scopes here, so a leaked login secret would expose tenant-admin access too. Move the Management API scopes to a separate least-privileged M2M client and keep the login app modeled as regular_web.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@terraform/kubernetes/auth0/generated.tf` around lines 467 - 476, The Auth0
setup is over-privileged because the interactive login client and the Management
API grant share the same client identity. Update the Terraform model around the
app represented by the regular_web client and the auth0_client_grant resource so
the login application only keeps callback/authorization_code usage, and move the
Management API scopes to a separate least-privileged M2M client. Make sure the
grant resource uses the new machine-to-machine client_id and that the login
client no longer has tenant-admin scopes attached.

Comment on lines +1 to +4
locals {
cert_subdomain = var.cloud_domain
cert_config = var.cert_details[var.cloud_domain]
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

The custom validation never catches a missing cert_details entry.

local.cert_config = var.cert_details[var.cloud_domain] can throw Terraform's invalid-index error before the precondition runs, so operators won't get the intended message. Make the lookup nullable first, then validate that result.

Suggested fix
 locals {
   cert_subdomain = var.cloud_domain
-  cert_config    = var.cert_details[var.cloud_domain]
+  cert_config    = lookup(var.cert_details, var.cloud_domain, null)
 }
@@
   lifecycle {
     precondition {
-      condition     = contains(keys(var.cert_details), var.cloud_domain)
+      condition     = local.cert_config != null
       error_message = "cloud_domain '${var.cloud_domain}' must have a corresponding entry in cert_details."
     }
   }

Also applies to: 48-53

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@terraform/kubernetes/cloud_deps/cert_manager.tf` around lines 1 - 4, The
lookup in locals.cert_config is failing too early with an invalid-index error,
so the intended validation never gets a chance to run. Change the cert_details
access to use a nullable/safe lookup first in cert_manager.tf, then have the
existing precondition validate that returned value and surface the custom
message. Keep the fix centered around locals.cert_config and the related
precondition block so the missing cert_details entry is handled by validation
instead of indexing.

Comment on lines +14 to +22
provider "kubernetes" {
config_path = "~/.kube/cockpit-config"
config_context = "default"
}

provider "kustomization" {
context = "default"
kubeconfig_path = "~/.kube/cockpit-config"
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '--- file list ---\n'
git ls-files terraform/kubernetes/cloud_deps

printf '\n--- provider.tf ---\n'
cat -n terraform/kubernetes/cloud_deps/provider.tf

printf '\n--- related terraform files ---\n'
fd -a -t f 'variables.tf|*.tf' terraform/kubernetes/cloud_deps 2>/dev/null | sed 's#^\./##' | sort

printf '\n--- search for kubeconfig/context inputs/usages ---\n'
rg -n --hidden --glob '!**/.terraform/**' --glob '!**/.git/**' \
  'kubeconfig_path|kubeconfig_context|config_context|config_path|context\s*=' \
  terraform/kubernetes/cloud_deps

Repository: k8sstormcenter/pixie

Length of output: 249


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf -- '--- file list ---\n'
git ls-files terraform/kubernetes/cloud_deps

printf -- '\n--- provider.tf ---\n'
cat -n terraform/kubernetes/cloud_deps/provider.tf

printf -- '\n--- related terraform files ---\n'
fd -a -t f 'variables.tf|*.tf' terraform/kubernetes/cloud_deps 2>/dev/null | sed 's#^\./##' | sort

printf -- '\n--- search for kubeconfig/context inputs/usages ---\n'
rg -n --hidden --glob '!**/.terraform/**' --glob '!**/.git/**' \
  'kubeconfig_path|kubeconfig_context|config_context|config_path|context\s*=' \
  terraform/kubernetes/cloud_deps

Repository: k8sstormcenter/pixie

Length of output: 1194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf -- '--- variables.tf ---\n'
cat -n terraform/kubernetes/cloud_deps/variables.tf

printf -- '\n--- other terraform files in module ---\n'
for f in terraform/kubernetes/cloud_deps/*.tf; do
  [ "$(basename "$f")" = "provider.tf" ] && continue
  printf -- '\n### %s ###\n' "$f"
  cat -n "$f"
done

printf -- '\n--- search for cloud_deps references ---\n'
rg -n --hidden --glob '!**/.terraform/**' --glob '!**/.git/**' 'cloud_deps|cockpit-config|kubeconfig|kustomization' .

Repository: k8sstormcenter/pixie

Length of output: 48401


🌐 Web query:

Terraform Kubernetes provider config_path tilde expansion kubeconfig path docs

💡 Result:

The Terraform Kubernetes provider documentation explicitly uses tilde () expansion for the config_path argument in its examples [1][2][3]. While this syntax is commonly used in Terraform provider configuration blocks, it is important to note that the provider expects an absolute path or a path relative to the working directory [4]. If you encounter issues where tilde expansion is not working as expected—particularly in CI/CD environments or non-standard file systems—you can use the built-in Terraform function pathexpand() to resolve the tilde to an absolute path before passing it to the provider [4]: provider "kubernetes" { config_path = pathexpand("/.kube/config") } Key considerations for config_path and config_paths: * The provider allows you to specify the kubeconfig location via the config_path attribute or the KUBE_CONFIG_PATH environment variable [1][3]. * The provider supports multiple paths using the config_paths attribute or the KUBE_CONFIG_PATHS environment variable [1][3]. * Unlike kubectl, the Terraform Kubernetes provider does not automatically read the standard KUBECONFIG environment variable by default [5][6]. You must explicitly configure the path in your Terraform block or provide it via the provider-specific environment variables [1][3].

Citations:


🌐 Web query:

kbst kustomization provider kubeconfig_path context docs tilde expansion

💡 Result:

The kbst/kustomization Terraform provider does not automatically expand the tilde () character in the kubeconfig_path attribute [1][2]. While some Terraform providers or core features may accept tilde-prefixed paths, it is a known behavior in the Terraform ecosystem that many providers do not perform shell-style tilde expansion [3]. To use a path in your home directory with the kubeconfig_path attribute, you should use the Terraform pathexpand function [4][3]. Recommended approach: Use the pathexpand function within a local variable or directly in the provider configuration to ensure the path is correctly resolved to an absolute path before it is passed to the provider [3]. Example usage: provider "kustomization" { kubeconfig_path = pathexpand("/.kube/config") } Alternatively, if you are setting this via an environment variable (e.g., KUBECONFIG_PATH), the shell environment may handle the expansion before Terraform receives the string, depending on your shell configuration and how the variable is exported [5][1]. However, explicitly using pathexpand in your HCL code is the most reliable way to ensure portability across different systems [3].

Citations:


Parameterize the kubeconfig settings
terraform/kubernetes/cloud_deps/provider.tf should not hard-code ~/.kube/cockpit-config and default. ~ is not a reliable path expansion here, and the fixed context makes this module environment-specific. Expose both as inputs, using pathexpand() if you want a home-directory default.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@terraform/kubernetes/cloud_deps/provider.tf` around lines 14 - 22, The
kubeconfig settings in the kubernetes and kustomization provider blocks are
hard-coded to a specific home path and context, making this module
environment-specific. Update the provider configuration to take both the
kubeconfig path and context from inputs instead of embedding
"~/.kube/cockpit-config" and "default". Use existing Terraform input variables
or add new ones, and if you keep a home-directory default, resolve it with
pathexpand() in the provider setup.

Comment on lines +19 to +36
set -e

usage() {
if [ "$#" -ne 2 ]; then
echo "Illegal number of parameters"
echo "Usage: $0 <characters_to_keep> <number_of_characters>"
exit 1
fi
}

usage "$@"

chars_to_keep=$1
num_chars=$2

bytes=$(< /dev/urandom tr -dc "${chars_to_keep}" | fold -w "$num_chars" | head -n 1)

jq -n --arg output "$bytes" '{"output":$output}'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Fail fast before entering the /dev/urandom pipeline.

An empty chars_to_keep makes tr -dc reject every byte, so head -n 1 never returns; and without pipefail, an upstream failure can still emit {"output":""}. That can hang terraform apply or generate empty secrets.

💡 Proposed fix
- set -e
+ set -euo pipefail
...
 usage "$@"
 
 chars_to_keep=$1
 num_chars=$2
+
+if [ -z "$chars_to_keep" ] || ! [[ "$num_chars" =~ ^[1-9][0-9]*$ ]]; then
+    echo "Usage: $0 <characters_to_keep> <number_of_characters>" >&2
+    exit 1
+fi
 
-bytes=$(< /dev/urandom tr -dc "${chars_to_keep}" | fold -w "$num_chars" | head -n 1)
+bytes=$(LC_ALL=C tr -dc "${chars_to_keep}" < /dev/urandom | fold -w "$num_chars" | head -n 1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
set -e
usage() {
if [ "$#" -ne 2 ]; then
echo "Illegal number of parameters"
echo "Usage: $0 <characters_to_keep> <number_of_characters>"
exit 1
fi
}
usage "$@"
chars_to_keep=$1
num_chars=$2
bytes=$(< /dev/urandom tr -dc "${chars_to_keep}" | fold -w "$num_chars" | head -n 1)
jq -n --arg output "$bytes" '{"output":$output}'
set -euo pipefail
usage() {
if [ "$#" -ne 2 ]; then
echo "Illegal number of parameters"
echo "Usage: $0 <characters_to_keep> <number_of_characters>"
exit 1
fi
}
usage "$@"
chars_to_keep=$1
num_chars=$2
if [ -z "$chars_to_keep" ] || ! [[ "$num_chars" =~ ^[1-9][0-9]*$ ]]; then
echo "Usage: $0 <characters_to_keep> <number_of_characters>" >&2
exit 1
fi
bytes=$(LC_ALL=C tr -dc "${chars_to_keep}" < /dev/urandom | fold -w "$num_chars" | head -n 1)
jq -n --arg output "$bytes" '{"output":$output}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@terraform/kubernetes/scripts/create_random_bytes.sh` around lines 19 - 36,
The random-bytes generation in create_random_bytes.sh can hang or emit an empty
JSON payload when chars_to_keep is empty or when an upstream command in the
/dev/urandom pipeline fails. Update the script to validate inputs before the
pipeline starts, reject empty or invalid values for chars_to_keep and num_chars
in usage(), and make the pipeline fail fast by ensuring the script propagates
failures from bytes generation instead of still reaching jq. Use the existing
usage() and bytes assignment as the main touchpoints.

…name-linter

Reverts the private/cockpit/ -> cockpit/ rename so this restore matches
the original main tree exactly: all 10 cockpit files (incl.
kustomization.yaml with its original ../../k8s/cloud relative paths) are
now byte-identical to the pre-deletion state at 610975b.

Upstream main carries filename_linter.yaml (added by pixie-io#2391), which fails
any PR touching a 'private' path. Adding a single '!private/cockpit/**'
negative glob keeps that guard for everything else while letting this
known fork-cloud config through. This is the only deviation in the PR
from a pure byte-for-byte restore.

@entlein entlein left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restoring original commits that were thrown out of main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants