Clean up setup experience cancellation behavior#43437
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #43437 +/- ##
===========================================
- Coverage 66.91% 29.22% -37.70%
===========================================
Files 2596 2307 -289
Lines 208103 177907 -30196
Branches 9322 9214 -108
===========================================
- Hits 139250 51988 -87262
- Misses 56197 121174 +64977
+ Partials 12656 4745 -7911
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ctivity emission Zed + Opus 4.6; prompt: Are there tests covering the new activity creation in `server/worker/apple_mdm.go` L594-608? If not, add test coverage to assert that the activity is as expected.
Zed + Opis 4.6; prompt: Update tests added on this branch to reflect the cleanup done in the most recent commit.
Zed + Opus 4.6; prompt: Resolve test failures in `TestIntegrationsEnterprise/TestSetupExperienceLinuxWithSoftware` and `TestIntegrationsMDM/TestCancelUpcomingActivity`. You can enable MySQL tests to validate that tests run correctly; don't guess based on code analysis.
(even though I personally use "cancelled")
| { | ||
| HostUUID: hostUUID, | ||
| Status: fleet.SetupExperienceStatusFailure, | ||
| SetupExperienceScriptID: ptr.Uint(4), |
There was a problem hiding this comment.
@marko-lisica @melpike Right now we don't block setup experience on a failed script, so we don't emit a cancelled setup experience activity when a script is cancelled. Looks like we emit the ran_script activity with a failure status, but the FromSetupExperience flag doesn't actually get persisted so setup experience script activities aren't retrieves as such.
Given that setup experience scripts for macOS are a bit of a legacy feature (vs. script packages), do we need to do anything here?
ee/server/service/orbit_test.go
Outdated
| assert.Equal(t, "FailedApp", cseAct.SoftwareTitle) | ||
| assert.Equal(t, failedTitleID, cseAct.SoftwareTitleID) | ||
|
|
||
| // Second: canceled install software |
There was a problem hiding this comment.
@marko-lisica @melpike Do we want this order on activities (failed install first, then cxl'd setup experience, then cxl'd software/app installs) or do we want the setup experience cancellation activity after all of the individual install cancellations? The install failure itself happens a fair amount earlier in a different request so that would need to stay first, absent a larger refactor.
| assert.True(t, vppAct.FromSetupExperience) | ||
| }) | ||
|
|
||
| t.Run("script cancellation does not trigger activity", func(t *testing.T) { |
There was a problem hiding this comment.
@melpike @marko-lisica This test explicitly describes what I mentioned above.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
@claude review once |
There was a problem hiding this comment.
Pull request overview
Adds audit-log visibility and UI rendering for setup experience cancellation due to software install failures, and improves cancellation attribution for related “canceled install” activities.
Changes:
- Adds a new
canceled_setup_experienceactivity type and UI templates for host + global activity feeds. - Extends “canceled install” activities (software + App Store app) with
from_setup_experiencemetadata and surfaces that in UI copy. - Records additional activities on VPP enqueue failures during setup experience and adds/updates tests accordingly.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
server/worker/apple_mdm.go |
Emits an installed_app_store_app activity when VPP enqueue fails during iOS/iPadOS setup experience. |
server/worker/apple_mdm_test.go |
Adds worker test coverage asserting activity emission on VPP enqueue failure. |
server/service/integration_mdm_test.go |
Updates integration expectations to include from_setup_experience: false on canceled install activities. |
server/service/integration_enterprise_test.go |
Updates setup-experience Linux integration test to expect canceled install activity semantics. |
server/fleet/setup_experience.go |
Adds helper IsForVPPApp() to classify setup experience status results. |
server/fleet/activities.go |
Adds canceled_setup_experience activity type; extends canceled install activity details with from_setup_experience. |
frontend/pages/hosts/details/cards/Activity/ActivityItems/CanceledSetupExperienceActivityItem/index.ts |
Exports the new host activity item component. |
frontend/pages/hosts/details/cards/Activity/ActivityItems/CanceledSetupExperienceActivityItem/CanceledSetupExperienceActivityItem.tsx |
Renders the new host-scoped canceled setup experience activity item. |
frontend/pages/hosts/details/cards/Activity/ActivityItems/CanceledInstallSoftwareActivityItem/CanceledInstallSoftwareActivityItem.tsx |
Updates host activity copy to mention “during setup experience” when applicable. |
frontend/pages/hosts/details/cards/Activity/ActivityConfig.tsx |
Wires the new activity type to the host activity component map. |
frontend/pages/DashboardPage/cards/ActivityFeed/GlobalActivityItem/GlobalActivityItem.tsx |
Adds global-feed detail template + switch case for canceled setup experience; updates copy for canceled installs. |
frontend/interfaces/activity.ts |
Introduces new activity enum value and filter label; adds from_setup_experience to details typing. |
ee/server/service/setup_experience.go |
Emits an installed_app_store_app activity on VPP enqueue failure during setup experience. |
ee/server/service/orbit.go |
Replaces “fail cancelled installs” logic with “record canceled activities + mark cancelled as failure” logic. |
ee/server/service/orbit_test.go |
Adds unit tests for the new cancellation-recording helper and WasFromAutomation behavior. |
ee/server/service/devices.go |
Uses the new cancellation-recording helper when serving device setup experience status. |
cmd/fleet/serve.go |
Passes svc.NewActivity into the Apple MDM worker schedule. |
cmd/fleet/cron.go |
Threads NewActivityFn into the Apple MDM worker instance. |
changes/34288-setup-experience-cancel-activity |
Adds release notes for the new cancellation/failure activities behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…install fails Zed + Opus 4.6; prompt: Move L238-252 to L~293 of maybeCancelPendingSetupExperienceSteps, wiring up dependencies as needed. This looks to be the only way to resolve #43437 (comment) since otherwise we would emit an activity on each poll. It may make sense to pass through the relevant failed install informaiton rather than pulling the failure out of the database.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ee/server/service/orbit.go`:
- Around line 242-266: The code currently sets r.Status =
fleet.SetupExperienceStatusFailure and calls
svc.ds.UpdateSetupExperienceStatusResult, then calls svc.NewActivity and returns
an error if NewActivity fails; this can permanently drop cancellations because
the DB row is already mutated. Fix by making the activity write best-effort:
persist the status transition via svc.ds.UpdateSetupExperienceStatusResult first
(or, if supported, create an atomic store method that persists both the status
and the activity), then call svc.NewActivity for the relevant branches
(r.IsForSoftwarePackage / r.IsForVPPApp) but do not return the error to Orbit if
NewActivity fails — instead log the failure with svc.logger.ErrorContext
(include hostUUID/hostID, r.Name, and the error) and continue; alternatively
implement a transactional method on the datastore to update the status and
insert the activity together (e.g., AddSetupExperienceStatusAndActivity) and
call that instead of separate UpdateSetupExperienceStatusResult + NewActivity.
In `@ee/server/service/setup_experience.go`:
- Around line 285-296: The current code constructs a
fleet.ActivityInstalledAppStoreApp (failActivity) before installSoftwareFromVPP
returns a CommandUUID, which results in a zero CommandUUID being stored; update
the code to emit a separate pre-enqueue failure activity (e.g., new
struct/variant like fleet.ActivityEnqueueVPPFailure or
fleet.ActivityPreEnqueueVPPFailure) instead of using
ActivityInstalledAppStoreApp, populate it with HostID, HostDisplayName,
SoftwareTitle, AppStoreID, Status (SoftwareInstallFailed) and
FromSetupExperience, and call svc.NewActivity(ctx, nil, <newActivity>) in place
of the existing failActivity creation so downstream consumers can distinguish
pre-command enqueue failures from command-backed install activities (or
alternatively extend ActivityInstalledAppStoreApp to clearly mark pre-enqueue
failures and avoid a zero CommandUUID).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b3571bf9-7ebd-48c9-ac0d-3d2dce85ab2d
📒 Files selected for processing (9)
ee/server/service/orbit.goee/server/service/orbit_test.goee/server/service/setup_experience.goserver/fleet/activities.goserver/service/apple_mdm.goserver/service/apple_mdm_cmd_results.goserver/service/orbit.goserver/service/setup_experience.goserver/service/setup_experience_test.go
✅ Files skipped from review due to trivial changes (1)
- ee/server/service/orbit_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- server/fleet/activities.go
| r.Status = fleet.SetupExperienceStatusFailure | ||
| svc.logger.InfoContext(ctx, "marking setup experience software as failed due to cancellation", "host_uuid", hostUUID, "software_name", r.Name) | ||
| svc.logger.InfoContext(ctx, "marking setup experience software as canceled", "host_uuid", hostUUID, "software_name", r.Name) | ||
| err := svc.ds.UpdateSetupExperienceStatusResult(ctx, r) | ||
| if err != nil { | ||
| return ctxerr.Wrap(ctx, err, "failing cancelled setup experience software install") | ||
| } | ||
| // TODO -- support recording activity for failed VPP apps as well. | ||
| // https://github.com/fleetdm/fleet/issues/34288 | ||
| if r.IsForSoftwarePackage() { | ||
| softwarePackage := "" | ||
| var source *string | ||
| installerMeta, err := svc.ds.GetSoftwareInstallerMetadataByID(ctx, *r.SoftwareInstallerID) | ||
| if err != nil && !fleet.IsNotFound(err) { | ||
| return ctxerr.Wrap(ctx, err, "getting software installer metadata for cancelled setup experience software install") | ||
| } | ||
| if installerMeta != nil { | ||
| softwarePackage = installerMeta.Name | ||
| // Get the software title to retrieve the source | ||
| if installerMeta.TitleID != nil { | ||
| title, err := svc.ds.SoftwareTitleByID(ctx, *installerMeta.TitleID, nil, fleet.TeamFilter{}) | ||
| if err != nil && !fleet.IsNotFound(err) { | ||
| return ctxerr.Wrap(ctx, err, "getting software title for cancelled setup experience software install") | ||
| } | ||
| if title != nil { | ||
| source = &title.Source | ||
| } | ||
| } | ||
| } | ||
| activity := fleet.ActivityTypeInstalledSoftware{ | ||
| if err := svc.NewActivity(ctx, nil, fleet.ActivityTypeCanceledInstallSoftware{ | ||
| HostID: hostID, | ||
| HostDisplayName: hostDisplayName, | ||
| SoftwareTitle: r.Name, | ||
| SoftwarePackage: softwarePackage, | ||
| InstallUUID: ptr.ValOrZero(r.HostSoftwareInstallsExecutionID), | ||
| Status: "failed", | ||
| SelfService: false, | ||
| Source: source, | ||
| SoftwareTitleID: ptr.ValOrZero(r.SoftwareTitleID), | ||
| FromSetupExperience: true, | ||
| }); err != nil { | ||
| return ctxerr.Wrap(ctx, err, "creating activity for canceled setup experience software install") | ||
| } | ||
| err = svc.NewActivity(ctx, nil, activity) | ||
| if err != nil { | ||
| return ctxerr.Wrap(ctx, err, "creating activity for cancelled setup experience software install") | ||
| } else if r.IsForVPPApp() { | ||
| if err := svc.NewActivity(ctx, nil, fleet.ActivityTypeCanceledInstallAppStoreApp{ | ||
| HostID: hostID, | ||
| HostDisplayName: hostDisplayName, | ||
| SoftwareTitle: r.Name, | ||
| SoftwareTitleID: ptr.ValOrZero(r.SoftwareTitleID), | ||
| FromSetupExperience: true, | ||
| }); err != nil { | ||
| return ctxerr.Wrap(ctx, err, "creating activity for canceled setup experience VPP app install") |
There was a problem hiding this comment.
Don't fail the poll after the cancellation has already been consumed.
Line 242 flips the row to Failure before the activity write. If NewActivity fails at Line 249 or Line 259, this returns an error to Orbit even though the DB state is already mutated, and the next poll won't retry because the row is no longer Cancelled. That permanently drops the cancellation activity. Please make the activity write best-effort here, or persist the status transition and activity atomically.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ee/server/service/orbit.go` around lines 242 - 266, The code currently sets
r.Status = fleet.SetupExperienceStatusFailure and calls
svc.ds.UpdateSetupExperienceStatusResult, then calls svc.NewActivity and returns
an error if NewActivity fails; this can permanently drop cancellations because
the DB row is already mutated. Fix by making the activity write best-effort:
persist the status transition via svc.ds.UpdateSetupExperienceStatusResult first
(or, if supported, create an atomic store method that persists both the status
and the activity), then call svc.NewActivity for the relevant branches
(r.IsForSoftwarePackage / r.IsForVPPApp) but do not return the error to Orbit if
NewActivity fails — instead log the failure with svc.logger.ErrorContext
(include hostUUID/hostID, r.Name, and the error) and continue; alternatively
implement a transactional method on the datastore to update the status and
insert the activity together (e.g., AddSetupExperienceStatusAndActivity) and
call that instead of separate UpdateSetupExperienceStatusResult + NewActivity.
There was a problem hiding this comment.
Will have to accept the risk here, since we can't run service layer work in the same transaction. Hard-failing here is probably a little better than letting this through. Going to keep this open in case a human reviewer has different opinions.
| failActivity := fleet.ActivityInstalledAppStoreApp{ | ||
| HostID: host.ID, | ||
| HostDisplayName: host.DisplayName(), | ||
| SoftwareTitle: sw.Name, | ||
| AppStoreID: ptr.ValOrZero(sw.VPPAppAdamID), | ||
| Status: string(fleet.SoftwareInstallFailed), | ||
| HostPlatform: host.Platform, | ||
| FromSetupExperience: true, | ||
| } | ||
| if actErr := svc.NewActivity(ctx, nil, failActivity); actErr != nil { | ||
| svc.logger.WarnContext(ctx, "failed to create activity for VPP app install failure during setup experience", "err", actErr) | ||
| } |
There was a problem hiding this comment.
Pre-enqueue VPP failures need a different activity type.
This branch runs before installSoftwareFromVPP returns a command UUID, but fleet.ActivityInstalledAppStoreApp is the command-backed payload. Emitting it here stores the zero value for CommandUUID, so downstream consumers can't correlate this failure with a real App Store install command. Please use a dedicated enqueue-failure activity or extend the schema for this pre-command failure path instead of reusing the command-scoped activity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ee/server/service/setup_experience.go` around lines 285 - 296, The current
code constructs a fleet.ActivityInstalledAppStoreApp (failActivity) before
installSoftwareFromVPP returns a CommandUUID, which results in a zero
CommandUUID being stored; update the code to emit a separate pre-enqueue failure
activity (e.g., new struct/variant like fleet.ActivityEnqueueVPPFailure or
fleet.ActivityPreEnqueueVPPFailure) instead of using
ActivityInstalledAppStoreApp, populate it with HostID, HostDisplayName,
SoftwareTitle, AppStoreID, Status (SoftwareInstallFailed) and
FromSetupExperience, and call svc.NewActivity(ctx, nil, <newActivity>) in place
of the existing failActivity creation so downstream consumers can distinguish
pre-command enqueue failures from command-backed install activities (or
alternatively extend ActivityInstalledAppStoreApp to clearly mark pre-enqueue
failures and avoid a zero CommandUUID).
There was a problem hiding this comment.
@marko-lisica @melpike It seems like the blank command ID has a reasonable path in the UI right now so this is less of a big deal. Thoughts? Can retrace my steps with this prompt:
What happens on the frontend in the activity feed when a failed
installed_app_store_appactivity has no command UUID?
Zed + Opus 4.6; prompt: In `server/service/setup_experience_test.go`, for calls to `maybeUpdateSetupExperienceStatus`, replace nil 5th params with activity functions when it makes sense to spy on the function calls. If we don't have a test where the new activity function would be invoked, add a test that exercises that path.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
server/service/setup_experience.go (1)
322-339: Emitcanceled_setup_experienceonly when cancellation actually occurred.Right now, the activity can be emitted whenever a failed software row exists, even if this call had no pending/running steps to cancel. That can produce duplicate/no-op cancellation events.
Possible guard to avoid no-op/duplicate emits
func maybeCancelPendingSetupExperienceSteps(ctx context.Context, ds fleet.Datastore, host *fleet.Host, newActivityFn fleet.NewActivityFunc) error { @@ + hadPendingOrRunning := false for _, status := range statuses { if err := status.IsValid(); err != nil { return ctxerr.Wrap(ctx, err, "invalid row") } if status.Status != fleet.SetupExperienceStatusPending && status.Status != fleet.SetupExperienceStatusRunning { continue } + hadPendingOrRunning = true // Cancel any upcoming software installs, vpp installs or script runs. @@ - if newActivityFn != nil { + if hadPendingOrRunning && newActivityFn != nil { for _, s := range statuses { if s.Status == fleet.SetupExperienceStatusFailure && s.IsForSoftware() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/setup_experience.go` around lines 322 - 339, Only emit ActivityTypeCanceledSetupExperience when this call actually performed a cancellation: track whether any SetupExperience row changed from a pending/running state to a canceled state during this request and use that boolean guard before calling newActivityFn. Concretely, set a local flag (e.g., canceledOccurred) when you transition a status (statuses slice handling code) from pending/running to fleet.SetupExperienceStatusCanceled, then replace the current loop that emits based only on SetupExperienceStatusFailure/IsForSoftware with one that checks canceledOccurred && that the failed software is present; call newActivityFn(...) only if canceledOccurred is true to avoid duplicate/no-op emits.server/service/setup_experience_test.go (1)
560-567: Strengthen mock input assertions in the new failure-path test.
HostByIdentifierFuncandListSetupExperienceResultsByHostUUIDFunccurrently accept any identifier values, so incorrect argument plumbing could still pass this test.Suggested test hardening
ds.HostByIdentifierFunc = func(ctx context.Context, identifier string) (*fleet.Host, error) { + require.Equal(t, hostUUID, identifier) return &fleet.Host{ ID: 1, UUID: hostUUID, Platform: "darwin", TeamID: &teamID, }, nil } @@ ds.ListSetupExperienceResultsByHostUUIDFunc = func(ctx context.Context, hUUID string, tID uint) ([]*fleet.SetupExperienceStatusResult, error) { + require.Equal(t, hostUUID, hUUID) + require.Equal(t, teamID, tID) return []*fleet.SetupExperienceStatusResult{Also applies to: 583-603
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/service/setup_experience_test.go` around lines 560 - 567, The mocks accept any identifier/UUID which lets wrong inputs pass; tighten ds.HostByIdentifierFunc and ds.ListSetupExperienceResultsByHostUUIDFunc to validate their input arguments (e.g., assert the incoming identifier equals the expected host identifier and the incoming UUID equals hostUUID) and return an error or call t.Fatalf/t.Errorf when they differ so the test fails on incorrect plumbing; update both mock implementations (HostByIdentifierFunc and ListSetupExperienceResultsByHostUUIDFunc) in the failure-path test (and the similar block around lines 583-603) to perform these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/service/setup_experience.go`:
- Around line 400-403: The error message wrapped around
maybeCancelPendingSetupExperienceSteps is too platform-specific ("macos software
install failure"); update the fmt.Errorf call that wraps cancelErr (the return
in the block after calling maybeCancelPendingSetupExperienceSteps) to use a
generic, accurate context (e.g., "cancel setup experience after
install/cancellation failure" or "cancel setup experience after software install
or VPP-triggered cancellation") so it reflects Windows/VPP paths as well while
still wrapping cancelErr.
---
Nitpick comments:
In `@server/service/setup_experience_test.go`:
- Around line 560-567: The mocks accept any identifier/UUID which lets wrong
inputs pass; tighten ds.HostByIdentifierFunc and
ds.ListSetupExperienceResultsByHostUUIDFunc to validate their input arguments
(e.g., assert the incoming identifier equals the expected host identifier and
the incoming UUID equals hostUUID) and return an error or call t.Fatalf/t.Errorf
when they differ so the test fails on incorrect plumbing; update both mock
implementations (HostByIdentifierFunc and
ListSetupExperienceResultsByHostUUIDFunc) in the failure-path test (and the
similar block around lines 583-603) to perform these checks.
In `@server/service/setup_experience.go`:
- Around line 322-339: Only emit ActivityTypeCanceledSetupExperience when this
call actually performed a cancellation: track whether any SetupExperience row
changed from a pending/running state to a canceled state during this request and
use that boolean guard before calling newActivityFn. Concretely, set a local
flag (e.g., canceledOccurred) when you transition a status (statuses slice
handling code) from pending/running to fleet.SetupExperienceStatusCanceled, then
replace the current loop that emits based only on
SetupExperienceStatusFailure/IsForSoftware with one that checks canceledOccurred
&& that the failed software is present; call newActivityFn(...) only if
canceledOccurred is true to avoid duplicate/no-op emits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 76a6defe-0533-4710-8f0b-e770e6766b3c
📒 Files selected for processing (2)
server/service/setup_experience.goserver/service/setup_experience_test.go
Resolves #43437 (comment)
…dupes if that step doesn't succeed Resolves #43437 (review)
|
@claude review once |
For #34288.
Related issue: Resolves #
Checklist for submitter
If some of the following don't apply, delete the relevant line.
changes/,orbit/changes/oree/fleetd-chrome/changes.See Changes files for more information.
Testing
Added/updated automated tests
QA'd all new/changed functionality manually
Summary by CodeRabbit