fix(agent): harden PSU Event Hub worker lifecycle#1830
Conversation
Share the PowerShell worker pool, bound stored results, enforce worker timeouts, and make SignalR reconnects and execution tasks shut down predictably. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Implementation notes:
Testing:
Full workspace validation notes:
|
Let maintainers know that an action is required on their side
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens the Devolutions Agent’s experimental PowerShell Universal (PSU) Event Hub compatibility path to reduce leaked results, avoid long-running/stuck PowerShell worker processes, and make reconnect behavior more resilient after hub failures.
Changes:
- Adds exponential backoff reconnect logic and tracks invocation execution tasks so they can be drained and shut down cleanly.
- Introduces PowerShell worker process timeouts and safer temp-file handling (shared worker script + auto-cleaned request files).
- Adds result-store TTL + max-size eviction and rejects unsupported
UseDefaultCredentials-without-AppTokenconfigurations.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| devolutions-agent/src/psu_event_hub/signalr.rs | Adds reconnect backoff, tracks spawned invocation tasks, and improves header construction/sanitization. |
| devolutions-agent/src/psu_event_hub/result_store.rs | Adds TTL + bounded result storage with eviction and tests. |
| devolutions-agent/src/psu_event_hub/powershell_worker.rs | Adds worker execution timeout, kill-on-drop, and safer temp file lifecycle for requests/scripts. |
| devolutions-agent/src/psu_event_hub/models.rs | Adds a timeout response constructor to represent worker timeouts. |
| devolutions-agent/src/psu_event_hub/mod.rs | Initializes a shared worker, validates connection config, and wires executor/connection tasks together. |
| devolutions-agent/src/psu_event_hub/executor.rs | Routes invocation execution through a caller-provided JoinSet for lifecycle management. |
| devolutions-agent/src/config.rs | Derives Default for PSU Event Hub config and documents defaults. |
| devolutions-agent/Cargo.toml | Promotes tempfile to a regular dependency (used by runtime code). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Distinguish task cancellation from panics when joining execution tasks, and align the UseDefaultCredentials bail message with project error conventions (lowercase, no trailing punctuation). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Benoît Cortier (CBenoit)
left a comment
There was a problem hiding this comment.
Thanks, this is much better! 👍
| fn default_credentials_without_app_token_are_rejected() { | ||
| let connection = dto::PsuEventHubConnectionConf { | ||
| hub: "Hub".to_owned(), | ||
| url: Url::parse("http://localhost:5000").expect("parse URL"), | ||
| app_token: None, | ||
| use_default_credentials: true, | ||
| script_path: None, | ||
| description: None, | ||
| }; | ||
|
|
||
| assert!(validate_connection(&connection).is_err()); |
There was a problem hiding this comment.
thought: This should probably be encoded at the type level instead of through a validate function; making the runtime test unrequired.
| let request_json = serde_json::to_vec(request).context("failed to serialize PSU worker request")?; | ||
| let temp_path = tempfile::Builder::new() | ||
| .prefix("devolutions-agent-psu-") | ||
| .suffix(".json") | ||
| .tempfile_in(temp_dir()?.as_std_path()) | ||
| .context("failed to create temporary PSU worker request")? | ||
| .into_temp_path(); | ||
| let path = Utf8PathBuf::from_path_buf(temp_path.to_path_buf()) | ||
| .map_err(|path| anyhow::anyhow!("non-UTF-8 PSU worker request path: {path:?}"))?; | ||
|
|
||
| tokio::fs::write(&path, request_json) | ||
| .await | ||
| .with_context(|| format!("failed to write PSU worker request at {path}"))?; |
There was a problem hiding this comment.
I see that TempRequestFile::write serializes the WorkerRequest (the command name and the event data arguments) and writes it to std::env::temp_dir().
tempfile::Builder creates the file with default permissions. On Linux, std::env::temp_dir() resolves to a world-readable /tmp, so the request payload is briefly readable by any other local user while the worker runs. On Windows I’m not sure.
Since the data field carries arguments coming straight off the Event Hub, so its sensitivity depends entirely on what callers send. I’m not sure how sensible this can be, but I’m a bit worried about that.
suggestion: pass the request over stdin instead of a temp file. invoke_worker already sets stdin(Stdio::null()) (line 257), so the worker script could read the JSON from stdin and we'd eliminate the on-disk exposure (and the extra file I/O) entirely.
Benoît Cortier (CBenoit)
left a comment
There was a problem hiding this comment.
LGTM! I think it’s fine to not address my last two comments right now since the feature is experimental. We can consider than in the future.
Hardens the experimental PowerShell Universal Event Hub compatibility path so agent deployments are less likely to exhaust local PowerShell workers, leak completed results, or keep stuck scripts running indefinitely.
The agent now uses safer defaults for worker execution, reconnects more gradually after hub failures, and rejects unsupported default-credential configuration instead of attempting unauthenticated connections.
Issue: #1803