Skip to content

fix(agent): harden PSU Event Hub worker lifecycle#1830

Merged
Benoît Cortier (CBenoit) merged 5 commits into
masterfrom
mamoreau-devolutions-psu-agent-review-plan
Jun 23, 2026
Merged

fix(agent): harden PSU Event Hub worker lifecycle#1830
Benoît Cortier (CBenoit) merged 5 commits into
masterfrom
mamoreau-devolutions-psu-agent-review-plan

Conversation

@mamoreau-devolutions

Copy link
Copy Markdown
Contributor

Hardens the experimental PowerShell Universal Event Hub compatibility path so agent deployments are less likely to exhaust local PowerShell workers, leak completed results, or keep stuck scripts running indefinitely.

The agent now uses safer defaults for worker execution, reconnects more gradually after hub failures, and rejects unsupported default-credential configuration instead of attempting unauthenticated connections.

Issue: #1803

Share the PowerShell worker pool, bound stored results, enforce worker timeouts, and make SignalR reconnects and execution tasks shut down predictably.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mamoreau-devolutions

Copy link
Copy Markdown
Contributor Author

Implementation notes:

  • Shares a single PSU PowerShell worker across Event Hub connections and AppToken secret resolution, so MaxWorkerPoolSize applies at the task level instead of per connection.
  • Keeps execution tasks alive across SignalR reconnects and shuts them down only when the connection task is stopping, preserving returned execution IDs through transient disconnects.
  • Adds a 30-minute PowerShell execution timeout, kill-on-drop child handling, tempfile-backed request files, and a stable worker script file per worker lifetime.
  • Bounds stored results with a 15-minute TTL and 1024-entry per-connection cap to avoid unbounded memory growth when hubs abandon result IDs.
  • Adds capped exponential reconnect backoff, sanitizes PSU description headers, uses Windows SAM-compatible identity lookup for service-context headers, and skips unsupported UseDefaultCredentials-only connections.

Testing:

  • cargo +nightly fmt --all
  • cargo test -p devolutions-agent -- --test-threads=1
  • cargo clippy -p devolutions-agent --tests -- -D warnings
  • cargo +nightly clippy -p devolutions-agent --tests --no-deps -- -D warnings

Full workspace validation notes:

  • cargo +nightly clippy --workspace --tests -- -D warnings is currently blocked by unrelated pre-existing clippy failures in other crates.
  • cargo test --workspace reaches and passes devolutions-agent tests, then is blocked by devolutions-agent-updater requiring elevation on this machine.

@github-actions

Copy link
Copy Markdown

Let maintainers know that an action is required on their side

  • Add the label release-required Please cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module) when you request a maintainer to cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module)

  • Add the label release-blocker Follow-up is required before cutting a new release if a follow-up is required before cutting a new release

  • Add the label publish-required Please publish libraries (`Devolutions.Gateway.Utils`, OpenAPI clients, etc) when you request a maintainer to publish libraries (Devolutions.Gateway.Utils, OpenAPI clients, etc.)

  • Add the label publish-blocker Follow-up is required before publishing libraries if a follow-up is required before publishing libraries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Devolutions Agent’s experimental PowerShell Universal (PSU) Event Hub compatibility path to reduce leaked results, avoid long-running/stuck PowerShell worker processes, and make reconnect behavior more resilient after hub failures.

Changes:

  • Adds exponential backoff reconnect logic and tracks invocation execution tasks so they can be drained and shut down cleanly.
  • Introduces PowerShell worker process timeouts and safer temp-file handling (shared worker script + auto-cleaned request files).
  • Adds result-store TTL + max-size eviction and rejects unsupported UseDefaultCredentials-without-AppToken configurations.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
devolutions-agent/src/psu_event_hub/signalr.rs Adds reconnect backoff, tracks spawned invocation tasks, and improves header construction/sanitization.
devolutions-agent/src/psu_event_hub/result_store.rs Adds TTL + bounded result storage with eviction and tests.
devolutions-agent/src/psu_event_hub/powershell_worker.rs Adds worker execution timeout, kill-on-drop, and safer temp file lifecycle for requests/scripts.
devolutions-agent/src/psu_event_hub/models.rs Adds a timeout response constructor to represent worker timeouts.
devolutions-agent/src/psu_event_hub/mod.rs Initializes a shared worker, validates connection config, and wires executor/connection tasks together.
devolutions-agent/src/psu_event_hub/executor.rs Routes invocation execution through a caller-provided JoinSet for lifecycle management.
devolutions-agent/src/config.rs Derives Default for PSU Event Hub config and documents defaults.
devolutions-agent/Cargo.toml Promotes tempfile to a regular dependency (used by runtime code).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread devolutions-agent/src/psu_event_hub/signalr.rs
Comment thread devolutions-agent/src/psu_event_hub/mod.rs
Distinguish task cancellation from panics when joining execution tasks,
and align the UseDefaultCredentials bail message with project error
conventions (lowercase, no trailing punctuation).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@CBenoit Benoît Cortier (CBenoit) enabled auto-merge (squash) June 23, 2026 14:47

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is much better! 👍

Comment on lines +121 to +131
fn default_credentials_without_app_token_are_rejected() {
let connection = dto::PsuEventHubConnectionConf {
hub: "Hub".to_owned(),
url: Url::parse("http://localhost:5000").expect("parse URL"),
app_token: None,
use_default_credentials: true,
script_path: None,
description: None,
};

assert!(validate_connection(&connection).is_err());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: This should probably be encoded at the type level instead of through a validate function; making the runtime test unrequired.

Comment on lines +333 to +345
let request_json = serde_json::to_vec(request).context("failed to serialize PSU worker request")?;
let temp_path = tempfile::Builder::new()
.prefix("devolutions-agent-psu-")
.suffix(".json")
.tempfile_in(temp_dir()?.as_std_path())
.context("failed to create temporary PSU worker request")?
.into_temp_path();
let path = Utf8PathBuf::from_path_buf(temp_path.to_path_buf())
.map_err(|path| anyhow::anyhow!("non-UTF-8 PSU worker request path: {path:?}"))?;

tokio::fs::write(&path, request_json)
.await
.with_context(|| format!("failed to write PSU worker request at {path}"))?;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that TempRequestFile::write serializes the WorkerRequest (the command name and the event data arguments) and writes it to std::env::temp_dir().

tempfile::Builder creates the file with default permissions. On Linux, std::env::temp_dir() resolves to a world-readable /tmp, so the request payload is briefly readable by any other local user while the worker runs. On Windows I’m not sure.

Since the data field carries arguments coming straight off the Event Hub, so its sensitivity depends entirely on what callers send. I’m not sure how sensible this can be, but I’m a bit worried about that.

suggestion: pass the request over stdin instead of a temp file. invoke_worker already sets stdin(Stdio::null()) (line 257), so the worker script could read the JSON from stdin and we'd eliminate the on-disk exposure (and the extra file I/O) entirely.

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think it’s fine to not address my last two comments right now since the feature is experimental. We can consider than in the future.

@CBenoit Benoît Cortier (CBenoit) merged commit 1827007 into master Jun 23, 2026
42 checks passed
@CBenoit Benoît Cortier (CBenoit) deleted the mamoreau-devolutions-psu-agent-review-plan branch June 23, 2026 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants