Skip to content

stp, sig-virt: Introduce the AllowWorkloadDisruption migration STP#82

Open
SamAlber wants to merge 1 commit intoRedHatQE:mainfrom
SamAlber:stp/sig-virt/workload-disruption-migration
Open

stp, sig-virt: Introduce the AllowWorkloadDisruption migration STP#82
SamAlber wants to merge 1 commit intoRedHatQE:mainfrom
SamAlber:stp/sig-virt/workload-disruption-migration

Conversation

@SamAlber
Copy link
Copy Markdown

@SamAlber SamAlber commented Apr 16, 2026

Add Software Test Plan for the AllowWorkloadDisruption (AWD) migration feature, covering PostCopy and Paused migration modes across RHEL and Windows guests with migration, node drain, and CPU/memory hotplug triggers.

Special notes for your reviewer

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive test plan for the "Allow Workload Disruption Migration" enhancement, including QE approach, mapped test scenarios, pass/fail criteria, prioritized test goals (P0/P1), environment and entry criteria, supported triggers and guest OS coverage, explicit exclusions and limitations, risk mitigations, and designated reviewers/approvers.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@SamAlber has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 12 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 12 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b7e84171-949b-4063-b7a6-cd6039245042

📥 Commits

Reviewing files that changed from the base of the PR and between df64602 and 8af640b.

📒 Files selected for processing (1)
  • stps/sig-virt/workload-disruption-migration.md
📝 Walkthrough

Walkthrough

Added a new Software Test Plan (STP) at stps/sig-virt/workload-disruption-migration.md specifying terminology, end-to-end QE plans tied to MigrationPolicy fields, test scope (triggers and guest OSes), environment baseline, entry criteria, limitations, risks, prioritized test goals, and traceability mappings.

Changes

Cohort / File(s) Summary
Software Test Plan (new)
stps/sig-virt/workload-disruption-migration.md
New STP introducing AWD/PostCopy/Paused terminology; defines QE test plan mapped to MigrationPolicy fields (allowWorkloadDisruption, allowPostCopy, bandwidthPerMigration, completionTimeoutPerGiB) and VMI observables (migrationState.mode); includes test scope (migration triggers, RHEL/Windows guests), environment baseline (OCP-V 4.17+, multi-worker bare metal, RWX storage, OVN), entry criteria, exclusions/limitations (e.g., s390x memory hotplug, Windows exclusions), risks/mitigations, P0/P1 goals, and a traceability matrix.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: introducing a Software Test Plan (STP) for the AllowWorkloadDisruption migration feature in the sig-virt domain.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-virtualization-qe-bot-2
Copy link
Copy Markdown

Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest all - Run all available tests

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)

📊 Review Process

Approvers and Reviewers

Approvers:

  • dshchedr
  • vsibirsk

Reviewers:

  • SamAlber
  • SiboWang1997
  • akri3i
  • dshchedr
  • jerry7z
  • kbidarkar
  • vsibirsk
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6[1m])

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@openshift-virtualization-qe-bot-2
Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-2
Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Line 175: Update the PM approver line that currently only lists an email
("Product Manager/Owner: [Ronen Sde-Or](ronen@redhat.com)") to include the
GitHub handle per STP Section IV requirements; locate the "Product
Manager/Owner" entry in workload-disruption-migration.md and replace or augment
the email-only link with the approver's GitHub handle (e.g., include the
`@github-username` or a link to their GitHub profile) so the entry contains both
the name and GitHub handle.
- Around line 46-49: The Known Limitations section under the heading "#### **2.
Known Limitations**" currently lists s390x memory hotplug but lacks the
mandatory sign-off; add an explicit sign-off line for that limitation in the
form "*Sign-off:* [Name/Date]" or replace it with the required phrase "None —
reviewed and confirmed by [Name/Date]" if appropriate, and do the same for the
Test Limitations block referenced around lines 86-88 (ensure each listed
limitation has its own sign-off or the explicit "None — reviewed and confirmed…"
statement) so the STP sections I.2 and II.1 comply with the sign-off guideline.
- Around line 24-30: Summary: The paragraph uses internal API/field names
(allowWorkloadDisruption, MigrationPolicy, allowPostCopy) and implementation
mechanics (precopy, PostCopy, Paused) instead of user-facing behavior; rewrite
to describe only observable user experience and acceptance criteria.
Instructions: Remove all internal identifiers and implementation terms (e.g.,
allowWorkloadDisruption, MigrationPolicy, allowPostCopy, precopy, PostCopy,
Paused) and replace with plain-language descriptions of what users will see
(e.g., "migration may use more disruptive methods to ensure completion", "VM may
be briefly stunned or live state resumed on the destination", "migration will
complete even if memory is rapidly changing"); update acceptance criteria to be
product-observable (e.g., successful node drain completes, VM state visible as
paused or resumed to users, migration completes rather than failing) and apply
the same rewrite style at the other referenced locations (lines ~39, ~57, ~137).
- Around line 41-45: Convert the two items under the "Customer Use Cases"
section into user-story format (e.g., "As an admin, I want to drain a node for
maintenance so that VMs with AWD policy migrate using PostCopy or Paused mode
and resume without process loss" and "As an admin, I want to hotplug CPU/memory
to a running VM so AWD migration reflects new resources without process loss"),
and update the "Non-Functional Requirements (NFRs)" section to explicitly list
and address Monitoring, Observability, UI, Documentation, Performance, Security,
and Scalability (each with a short measurable acceptance criterion or owner),
instead of deferring them to section II.3, so reviewers can clearly verify NFR
coverage.
- Around line 141-149: The Risks table in workload-disruption-migration.md is
missing the required "Other" risk category and lacks explicit sign-off details
for each risk; update the table to include a seventh row labeled "Other" with a
clear risk description and mitigation strategy, and add a "Sign-off" column (or
append sign-off cells) for every risk row containing the approver's name/role
and date (or a placeholder like "TBD by [ROLE] - YYYY-MM-DD") so each risk entry
includes Risk description, Mitigation strategy, and Sign-off; ensure you edit
the existing table header and rows (the risk table block around the Risk
Category/Test Coverage/Test Environment entries) to add these fields
consistently.
- Around line 92-107: The Test Strategy table is missing a "Scale Testing" row
and the "Upgrade Testing" row lacks an explicit statement about upgrade path
evaluation; add a new table row labeled "Scale Testing" with Description
covering scale/volume considerations and a clear Applicable (Y/N or N/A) and
Comments noting planned approach or deferred rationale, and update the existing
"Upgrade Testing" row (the cell under Description or Comments for "Upgrade
Testing") to explicitly state whether the upgrade path was evaluated or deferred
and include any mitigation/plan; locate and edit the rows named "Scale Testing"
and "Upgrade Testing" in the existing markdown table (the entries currently
titled "Upgrade Testing" and the nearby "Performance Testing") to make these
additions/clarifications.
- Around line 79-85: Replace the four TODO PM/Lead Agreement entries in the
Out-of-Scope table for the rows containing "Windows node drain", "Windows memory
hotplug", "s390x memory hotplug", and "Performance / scale testing" with a
proper "PM/Lead Agreement" value including the approver's name and date (e.g.,
"Name / YYYY-MM-DD"), and convert the table into the required itemized format by
ensuring each item has explicit Rationale and PM/Lead Agreement fields (move the
rationale text into the Rationale field and replace each TODO with the name/date
agreement), so the Out-of-Scope section meets the STP Section II.1 requirements.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 090197de-e007-45a6-80c5-dbbc6fba13a7

📥 Commits

Reviewing files that changed from the base of the PR and between 424a4f9 and 7cb614b.

📒 Files selected for processing (1)
  • stps/sig-virt/workload-disruption-migration.md

Comment on lines +24 to +30
When a VM is live-migrated, precopy may fail to converge if the guest dirties pages faster than the migration
bandwidth can transfer them. With `allowWorkloadDisruption` enabled on a MigrationPolicy, KubeVirt is permitted
to use disruptive migration modes (PostCopy or Paused) to guarantee the migration completes. This feature is
critical for operations that require migration — such as node drain and CPU/memory hotplug — where a failed
migration would block the operation entirely. The mode selection is controlled by the `allowPostCopy` field:
when enabled, the migration switches to PostCopy; when disabled, the VM is stunned and migrated in Paused mode.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Rewrite feature description and requirements in user-facing terms only.

This STP repeatedly uses API/field names and internal mechanics. That breaks the repo rule for user-perspective-only STPs and makes acceptance criteria less product-observable.

As per coding guidelines, "STPs must describe what users experience from a user perspective only — no API field names, CRD names, internal component references, or implementation mechanisms".

Also applies to: 39-39, 57-57, 137-137

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 24 - 30,
Summary: The paragraph uses internal API/field names (allowWorkloadDisruption,
MigrationPolicy, allowPostCopy) and implementation mechanics (precopy, PostCopy,
Paused) instead of user-facing behavior; rewrite to describe only observable
user experience and acceptance criteria. Instructions: Remove all internal
identifiers and implementation terms (e.g., allowWorkloadDisruption,
MigrationPolicy, allowPostCopy, precopy, PostCopy, Paused) and replace with
plain-language descriptions of what users will see (e.g., "migration may use
more disruptive methods to ensure completion", "VM may be briefly stunned or
live state resumed on the destination", "migration will complete even if memory
is rapidly changing"); update acceptance criteria to be product-observable
(e.g., successful node drain completes, VM state visible as paused or resumed to
users, migration completes rather than failing) and apply the same rewrite style
at the other referenced locations (lines ~39, ~57, ~137).

Comment on lines +41 to +45
| **Customer Use Cases** | [x] | 1. Admin drains a node for maintenance; VMs with AWD policy migrate using PostCopy or Paused mode and resume without process loss.<br>2. Admin hotplugs CPU/memory to a running VM; the resulting migration uses AWD mode and the guest reflects the new resources without process loss. | |
| **Testability** | [x] | Testable by configuring a MigrationPolicy with tight `completionTimeoutPerGiB` and capped bandwidth, then verifying migration mode and guest process PID after migration. | |
| **Acceptance Criteria** | [x] | 1. Migration completes in the expected mode (PostCopy or Paused) under AWD policy.<br>2. Guest background process PID is preserved after migration.<br>3. Hotplugged CPU/memory is reflected in the guest OS after AWD migration.<br>4. Both RHEL and Windows guests are supported. | |
| **Non-Functional Requirements (NFRs)** | [x] | See section **II.3 - Test Environment** | |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

HIGH: Section I.1 does not meet required requirement/NFR structure.

  • Line 41 customer use cases are not in user story format (As a [role], I want...).
  • Line 44 NFRs are not explicitly covered (Monitoring, Observability, UI, Documentation, Performance, Security, Scalability) and are deferred to another section.

As per coding guidelines, "Customer use cases must be in user story format" and "NFRs must explicitly address Monitoring, Observability, UI, Documentation, Performance, Security, and Scalability".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 41 - 45, Convert
the two items under the "Customer Use Cases" section into user-story format
(e.g., "As an admin, I want to drain a node for maintenance so that VMs with AWD
policy migrate using PostCopy or Paused mode and resume without process loss"
and "As an admin, I want to hotplug CPU/memory to a running VM so AWD migration
reflects new resources without process loss"), and update the "Non-Functional
Requirements (NFRs)" section to explicitly list and address Monitoring,
Observability, UI, Documentation, Performance, Security, and Scalability (each
with a short measurable acceptance criterion or owner), instead of deferring
them to section II.3, so reviewers can clearly verify NFR coverage.

Comment on lines +46 to +49
#### **2. Known Limitations**

- s390x does not support maxGuest memory, so memory hotplug is not available on this architecture.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

HIGH: Limitations sections are missing mandatory sign-offs.

Known Limitations and Test Limitations must include explicit sign-off lines (*Sign-off:* [Name/Date]) or the required “None — reviewed and confirmed…” statement.

As per coding guidelines, "STP Section I.2 Known Limitations: Each limitation must have a sign-off..." and "STP Section II.1 Test Limitations: Each limitation must have sign-off...".

Also applies to: 86-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 46 - 49, The
Known Limitations section under the heading "#### **2. Known Limitations**"
currently lists s390x memory hotplug but lacks the mandatory sign-off; add an
explicit sign-off line for that limitation in the form "*Sign-off:* [Name/Date]"
or replace it with the required phrase "None — reviewed and confirmed by
[Name/Date]" if appropriate, and do the same for the Test Limitations block
referenced around lines 86-88 (ensure each listed limitation has its own
sign-off or the explicit "None — reviewed and confirmed…" statement) so the STP
sections I.2 and II.1 comply with the sign-off guideline.

Comment on lines +79 to +85
| Out-of-Scope Item | Rationale | PM/ Lead Agreement |
|:------------------------------|:----------------------------------------------------------------------------------------------------------------|:------------------|
| Windows node drain | Windows AWD node drain is not currently covered; only migration mode and CPU hotplug are tested for Windows. | TODO |
| Windows memory hotplug | Windows memory hotplug with AWD is not currently covered. | TODO |
| s390x memory hotplug | s390x does not support maxGuest memory, so memory hotplug cannot be tested on this architecture. | TODO |
| Performance / scale testing | Not in scope for functional AWD validation. | TODO |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Out-of-Scope entries are incomplete (TODO) and not in the required agreement format.

This blocks approval because PM/Lead agreement (name/date) is missing for every item.

Suggested format fix
-| Windows node drain            | ... | TODO |
+| Windows node drain | ... | Ronen Sde-Or / 2026-04-16 |

Also align the section to the required itemized format with explicit Rationale and PM/Lead Agreement fields per item.

As per coding guidelines, "STP Section II.1 Out of Scope: Each item must have Rationale and PM/Lead Agreement with name and date...".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 79 - 85, Replace
the four TODO PM/Lead Agreement entries in the Out-of-Scope table for the rows
containing "Windows node drain", "Windows memory hotplug", "s390x memory
hotplug", and "Performance / scale testing" with a proper "PM/Lead Agreement"
value including the approver's name and date (e.g., "Name / YYYY-MM-DD"), and
convert the table into the required itemized format by ensuring each item has
explicit Rationale and PM/Lead Agreement fields (move the rationale text into
the Rationale field and replace each TODO with the name/date agreement), so the
Out-of-Scope section meets the STP Section II.1 requirements.

Comment on lines +92 to +107
| Item | Description | Applicable (Y/N or N/A) | Comments |
|:-------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------|:--------------------------------------------------------------------------------------------------------------------------|
| Functional Testing | Validates that the feature works according to specified requirements and user stories | Y | |
| Automation Testing | Ensures test cases are automated for continuous integration and regression coverage | Y | |
| Performance Testing | Validates feature performance meets requirements (latency, throughput, resource usage) | N | Not in scope for functional AWD validation |
| Security Testing | Verifies security requirements, RBAC, authentication, authorization, and vulnerability scanning | N/A | MigrationPolicy RBAC is covered by core KubeVirt tests |
| Usability Testing | Validates user experience, UI/UX consistency, and accessibility requirements. Does the feature require UI? If so, ensure the UI aligns with the requirements | N/A | No UI component; migration mode is reported via standard VMI status fields |
| Compatibility Testing | Ensures feature works across supported platforms, versions, and configurations | Y | Parametrized across RHEL and Windows guest OSes |
| Regression Testing | Verifies that new changes do not break existing functionality | Y | Tests validate both migration mode and hotplug functionality |
| Upgrade Testing | Validates upgrade paths from previous versions, data migration, and configuration preservation | N | Not currently in scope |
| Backward Compatibility Testing | Ensures feature maintains compatibility with previous API versions and configurations | N/A | No known API changes affecting backward compatibility |
| Dependencies | Dependent on deliverables from other components/products? Identify what is tested by which team. | Y | Core AWD functionality is implemented in KubeVirt |
| Cross Integrations | Does the feature affect other features/require testing by other components? Identify what is tested by which team. | Y | AWD interacts with hotplug and node drain (sig-virt); tests cover migration triggered by both |
| Monitoring | Does the feature require metrics and/or alerts? | N/A | No specific metrics required for AWD |
| Cloud Testing | Does the feature require multi-cloud platform testing? Consider cloud-specific features. | N/A | Bare metal with RWX storage is required |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

HIGH: Test Strategy is missing required coverage dimensions.

A dedicated Scale Testing entry is missing, and Upgrade testing does not explicitly state that the upgrade path was evaluated (even if deferred).

As per coding guidelines, "STP Section II.2 Test Strategy: All 13 testing types must be present and addressed..." and "Upgrade Testing must confirm upgrade path was evaluated; Performance/Scale must document consideration and plans even if deferred".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 92 - 107, The
Test Strategy table is missing a "Scale Testing" row and the "Upgrade Testing"
row lacks an explicit statement about upgrade path evaluation; add a new table
row labeled "Scale Testing" with Description covering scale/volume
considerations and a clear Applicable (Y/N or N/A) and Comments noting planned
approach or deferred rationale, and update the existing "Upgrade Testing" row
(the cell under Description or Comments for "Upgrade Testing") to explicitly
state whether the upgrade path was evaluated or deferred and include any
mitigation/plan; locate and edit the rows named "Scale Testing" and "Upgrade
Testing" in the existing markdown table (the entries currently titled "Upgrade
Testing" and the nearby "Performance Testing") to make these
additions/clarifications.

Comment on lines +141 to +149
| Risk Category | Specific Risk for This Feature | Mitigation Strategy | Status |
|:---------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------|:-------|
| Timeline/Schedule | N/A | - | [x] |
| Test Coverage | Paused mode may not reliably trigger if precopy converges too quickly on fast storage | Added 1GiB guest disk allocation for RHEL VMs and tuned completionTimeoutPerGiB to ensure precopy does not converge in phase 1 | [x] |
| Test Environment | Windows tests require special_infra and high_resource_vm nodes which may not be available in all CI environments | Tests are marked with `special_infra` and `high_resource_vm` markers to skip when infrastructure is unavailable | [x] |
| Untestable Aspects | Exact timing of PostCopy/Paused mode transition depends on cluster load, storage speed, and network conditions | Tests verify the final migration mode reported in VMI status rather than transition timing | [x] |
| Resource Constraints | N/A | - | [x] |
| Dependencies | AWD behavior depends on KubeVirt migration engine implementation; changes in convergence logic could affect mode selection | Monitor KubeVirt upstream changes to migration convergence and completionTimeout behavior | [x] |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

HIGH: Risks section is incomplete against required structure.

  • Missing the Other risk category (required 7th category).
  • Entries do not include explicit sign-off details.

As per coding guidelines, "STP Section II.5 Risks: ALL 7 risk categories must be addressed..." and "Each risk must have Risk description, Mitigation strategy, and Sign-off...".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 141 - 149, The
Risks table in workload-disruption-migration.md is missing the required "Other"
risk category and lacks explicit sign-off details for each risk; update the
table to include a seventh row labeled "Other" with a clear risk description and
mitigation strategy, and add a "Sign-off" column (or append sign-off cells) for
every risk row containing the approver's name/role and date (or a placeholder
like "TBD by [ROLE] - YYYY-MM-DD") so each risk entry includes Risk description,
Mitigation strategy, and Sign-off; ensure you edit the existing table header and
rows (the risk table block around the Risk Category/Test Coverage/Test
Environment entries) to add these fields consistently.

Comment thread stps/sig-virt/workload-disruption-migration.md Outdated
@openshift-virtualization-qe-bot
Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot
Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (7)
stps/sig-virt/workload-disruption-migration.md (7)

175-175: ⚠️ Potential issue | 🟡 Minor

MEDIUM: PM approver entry needs GitHub handle.

Line 175 includes email only; Section IV requires reviewer/approver identity with GitHub handle.

As per coding guidelines, "STP Sign-off and Approval (Section IV) must list Reviewers with names and GitHub handles and Approvers (QE Lead, PM, Dev Lead at minimum) with no placeholder text remaining".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` at line 175, The PM approver
entry currently shows only an email ("Product Manager/Owner: [Ronen
Sde-Or](rsdeor@redhat.com)"); update the Section IV approvers/reviewers list to
include the PM's GitHub handle alongside their name (for example replace the
email-only link with a GitHub-style reference such as "Product Manager/Owner:
Ronen Sde‑Or (`@rsdeor`)" or a Markdown link to their GitHub profile), ensuring
the Approvers block (QE Lead, PM, Dev Lead) contains real names and GitHub
handles with no placeholder text remaining.

18-20: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Rewrite internal/API-centric wording into user-observable behavior only.

The STP still relies on API fields/CRD terms and internal mechanics (e.g., MigrationPolicy, field names, VMI status internals) across core requirement, strategy, and scenario text. This blocks approval for this repo’s STP standard.

As per coding guidelines, "STPs must describe what users experience from the user perspective only — no API field names, CRD names, internal component references, or implementation mechanisms".

Also applies to: 24-30, 39-39, 42-43, 57-57, 66-66, 70-76, 88-88, 98-98, 137-137, 145-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 18 - 20, Rewrite
the affected sections to describe only user-observable behavior and outcomes,
removing all API/CRD/internal names and implementation mechanics: replace
references like "AWD", "MigrationPolicy", "PostCopy", "Paused", and any mentions
of VMI status internals with plain-language descriptions of what the user will
see (e.g., "the VM may be briefly paused to finish transfer", "remaining memory
is fetched on-demand after start on target", "a migration mode that resumes
immediately once final data arrives"), and update the core requirement,
strategy, and scenario paragraphs accordingly so they describe behaviors,
triggers, and user-facing states rather than field names or internal components.

48-48: ⚠️ Potential issue | 🟠 Major

HIGH: Known/Test limitations are missing mandatory sign-off lines.

Each limitation item must include explicit sign-off metadata (*Sign-off:* [Name/Date]) or the required “None — reviewed and confirmed…” statement when applicable.

As per coding guidelines, "STP Known Limitations (Section I.2) must include sign-off with *Sign-off:* [Name/Date] for each limitation" and "STP Test Limitations (Section II.1) must each have sign-off: *Sign-off:* [Name/Date]."

Also applies to: 88-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` at line 48, The Known
Limitations and Test Limitations entries are missing the mandatory sign-off
metadata; update each limitation item in "Known Limitations (Section I.2)" and
"Test Limitations (Section II.1)" — including the s390x memory hotplug note (the
line stating "s390x does not support maxGuest memory...") — to append a sign-off
line in the exact format "*Sign-off:* [Name/Date]" or, if there truly are none,
replace the absence with the required statement "None — reviewed and confirmed
[Name/Date]"; ensure every listed limitation has one of these sign-offs so the
document meets the STP sign-off requirements.

92-107: ⚠️ Potential issue | 🟠 Major

HIGH: Test Strategy is still incomplete (missing Scale Testing).

The table does not include a dedicated Scale Testing entry, which is mandatory even if deferred/N/A with justification.

As per coding guidelines, "STP Test Strategy (Section II.2) must address all testing types (Functional, Automation, Regression, Performance, Scale, Security, Usability, Monitoring, Compatibility, Upgrade, Dependencies, Cross Integrations, Cloud Testing)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 92 - 107, Add a
missing "Scale Testing" row to the test matrix table: insert a row labeled
"Scale Testing" under the other test types with an Applicable (Y/N or N/A) value
and a concise comment that either describes the planned scale tests (e.g.,
node/pod/migration concurrency, VMI counts, storage IO at scale) or justifies
why scale testing is deferred/N/A for this STP; ensure the new row follows the
existing table formatting and references "Scale Testing" so the STP Test
Strategy (Section II.2) explicitly addresses this required testing type.

79-85: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Out-of-Scope section still contains TODO approvals and non-compliant structure.

PM/ Lead Agreement cannot remain TODO, and each out-of-scope item must be in the required itemized format with rationale and named/date agreement.

As per coding guidelines, "STP Out of Scope items (Section II.1) must have Rationale and PM/Lead Agreement with name and date (format: - **Item** / *Rationale:* / *PM/Lead Agreement:*)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 79 - 85, Replace
the table under "Out-of-Scope Item" in workload-disruption-migration.md (Section
II.1) so that no PM/ Lead Agreement cells contain "TODO": convert each row into
the required itemized format "- **Item** / *Rationale:* / *PM/Lead Agreement:*"
and fill the PM/Lead Agreement with an actual name and date (e.g., "Jane Doe,
2026-04-19") for each of the four items (Windows node drain, Windows memory
hotplug, s390x memory hotplug, Performance / scale testing); ensure the
rationale text from the table is preserved and that the header/table is removed
or replaced so the section conforms exactly to the prescribed itemized
structure.

141-149: ⚠️ Potential issue | 🟠 Major

HIGH: Risks section missing required category and sign-off data.

The table omits the Other risk category and does not include explicit per-risk sign-off, which are both required.

As per coding guidelines, "STP Risks (Section II.5) must address ALL 7 risk categories (Timeline/Schedule, Test Coverage, Test Environment, Untestable Aspects, Resource Constraints, Dependencies, Other), even if N/A with justification" and "STP Risks must each have: Risk description, Mitigation strategy, and Sign-off."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 141 - 149, The
Risks table in STP Risks (Section II.5) is missing the required "Other" risk
category and per-risk sign-off entries; update the markdown table to include a
row for "Other" (even if N/A with justification) and add a Sign-off column entry
for every existing row (Timeline/Schedule, Test Coverage, Test Environment,
Untestable Aspects, Resource Constraints, Dependencies, and Other) so each risk
has Risk description, Mitigation strategy, and Sign-off; ensure the new column
values clearly indicate who signs off (or "N/A — justification") and use the
same table format/pipe alignment as the existing rows.

41-44: ⚠️ Potential issue | 🟠 Major

HIGH: Section I.1 still misses required use-case and NFR structure.

Line 41 is not in user-story format (As a [role], I want...), and Line 44 does not explicitly and verifiably address all required NFR categories (Monitoring, Observability, UI, Documentation, Performance, Security, Scalability).

As per coding guidelines, "STP Customer use cases must be in user story format ("As a [role], I want to [action]")" and "STP NFRs must explicitly address: Monitoring, Observability, UI, Documentation, Performance, Security, Scalability. NFRs not covered must have justification".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stps/sig-virt/workload-disruption-migration.md` around lines 41 - 44, Convert
the "Customer Use Cases" entry into one or more user stories using the required
format (e.g., "As an admin, I want to drain a node so that VMs with AWD policy
migrate using PostCopy or Paused mode and resume without process loss"; "As an
admin, I want to hotplug CPU/memory so that AWD migration preserves processes
and reflects new resources in the guest"), and update the "Non-Functional
Requirements (NFRs)" row to explicitly list each NFR category (Monitoring,
Observability, UI, Documentation, Performance, Security, Scalability) with
either a short requirement or a justification why it is not applicable (for
example: Monitoring: none required / justification; Observability: covered by X
tests; UI: none; Documentation: covered in KubeVirt user-guide; Performance:
completionTimeoutPerGiB and bandwidth caps defined for testability; Security:
RBAC covered by core KubeVirt tests; Scalability: state impact limited to
node-level), referencing the table headings "Customer Use Cases" and
"Non-Functional Requirements (NFRs)" and terms like MigrationPolicy and
completionTimeoutPerGiB to keep context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Around line 110-122: The test-environment table contains empty cells; update
the table rows (notably the "Required Operators", "Platform", and "Special
Configurations" columns and any other blank example cells) so every cell
contains either an explicit value or "N/A", and make the "OCP & OpenShift
Virtualization Version(s)" entry explicitly state both the supported OCP version
and the OpenShift Virtualization operator version (e.g., "OCP 4.17+, OVS X.Y.Z"
or "OCP 4.17+, OpenShift Virtualization X.Y.Z"); ensure the "Specification
Examples" column entries are completed for the affected rows (Cluster Topology,
Required Operators, Platform, Special Configurations) so there are no empty
cells remaining.

---

Duplicate comments:
In `@stps/sig-virt/workload-disruption-migration.md`:
- Line 175: The PM approver entry currently shows only an email ("Product
Manager/Owner: [Ronen Sde-Or](rsdeor@redhat.com)"); update the Section IV
approvers/reviewers list to include the PM's GitHub handle alongside their name
(for example replace the email-only link with a GitHub-style reference such as
"Product Manager/Owner: Ronen Sde‑Or (`@rsdeor`)" or a Markdown link to their
GitHub profile), ensuring the Approvers block (QE Lead, PM, Dev Lead) contains
real names and GitHub handles with no placeholder text remaining.
- Around line 18-20: Rewrite the affected sections to describe only
user-observable behavior and outcomes, removing all API/CRD/internal names and
implementation mechanics: replace references like "AWD", "MigrationPolicy",
"PostCopy", "Paused", and any mentions of VMI status internals with
plain-language descriptions of what the user will see (e.g., "the VM may be
briefly paused to finish transfer", "remaining memory is fetched on-demand after
start on target", "a migration mode that resumes immediately once final data
arrives"), and update the core requirement, strategy, and scenario paragraphs
accordingly so they describe behaviors, triggers, and user-facing states rather
than field names or internal components.
- Line 48: The Known Limitations and Test Limitations entries are missing the
mandatory sign-off metadata; update each limitation item in "Known Limitations
(Section I.2)" and "Test Limitations (Section II.1)" — including the s390x
memory hotplug note (the line stating "s390x does not support maxGuest
memory...") — to append a sign-off line in the exact format "*Sign-off:*
[Name/Date]" or, if there truly are none, replace the absence with the required
statement "None — reviewed and confirmed [Name/Date]"; ensure every listed
limitation has one of these sign-offs so the document meets the STP sign-off
requirements.
- Around line 92-107: Add a missing "Scale Testing" row to the test matrix
table: insert a row labeled "Scale Testing" under the other test types with an
Applicable (Y/N or N/A) value and a concise comment that either describes the
planned scale tests (e.g., node/pod/migration concurrency, VMI counts, storage
IO at scale) or justifies why scale testing is deferred/N/A for this STP; ensure
the new row follows the existing table formatting and references "Scale Testing"
so the STP Test Strategy (Section II.2) explicitly addresses this required
testing type.
- Around line 79-85: Replace the table under "Out-of-Scope Item" in
workload-disruption-migration.md (Section II.1) so that no PM/ Lead Agreement
cells contain "TODO": convert each row into the required itemized format "-
**Item** / *Rationale:* / *PM/Lead Agreement:*" and fill the PM/Lead Agreement
with an actual name and date (e.g., "Jane Doe, 2026-04-19") for each of the four
items (Windows node drain, Windows memory hotplug, s390x memory hotplug,
Performance / scale testing); ensure the rationale text from the table is
preserved and that the header/table is removed or replaced so the section
conforms exactly to the prescribed itemized structure.
- Around line 141-149: The Risks table in STP Risks (Section II.5) is missing
the required "Other" risk category and per-risk sign-off entries; update the
markdown table to include a row for "Other" (even if N/A with justification) and
add a Sign-off column entry for every existing row (Timeline/Schedule, Test
Coverage, Test Environment, Untestable Aspects, Resource Constraints,
Dependencies, and Other) so each risk has Risk description, Mitigation strategy,
and Sign-off; ensure the new column values clearly indicate who signs off (or
"N/A — justification") and use the same table format/pipe alignment as the
existing rows.
- Around line 41-44: Convert the "Customer Use Cases" entry into one or more
user stories using the required format (e.g., "As an admin, I want to drain a
node so that VMs with AWD policy migrate using PostCopy or Paused mode and
resume without process loss"; "As an admin, I want to hotplug CPU/memory so that
AWD migration preserves processes and reflects new resources in the guest"), and
update the "Non-Functional Requirements (NFRs)" row to explicitly list each NFR
category (Monitoring, Observability, UI, Documentation, Performance, Security,
Scalability) with either a short requirement or a justification why it is not
applicable (for example: Monitoring: none required / justification;
Observability: covered by X tests; UI: none; Documentation: covered in KubeVirt
user-guide; Performance: completionTimeoutPerGiB and bandwidth caps defined for
testability; Security: RBAC covered by core KubeVirt tests; Scalability: state
impact limited to node-level), referencing the table headings "Customer Use
Cases" and "Non-Functional Requirements (NFRs)" and terms like MigrationPolicy
and completionTimeoutPerGiB to keep context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef2e93af-35f0-4f4f-9f48-ea1140c1d9d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb614b and df64602.

📒 Files selected for processing (1)
  • stps/sig-virt/workload-disruption-migration.md

Comment thread stps/sig-virt/workload-disruption-migration.md
@openshift-virtualization-qe-bot-5
Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-5
Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

Add Software Test Plan for the AllowWorkloadDisruption (AWD) migration
feature, covering PostCopy and Paused migration modes across RHEL and
Windows guests with migration, node drain, and CPU/memory hotplug
triggers.
@SamAlber SamAlber force-pushed the stp/sig-virt/workload-disruption-migration branch from 7f3a3f4 to 8af640b Compare April 19, 2026 14:58
@openshift-virtualization-qe-bot-2
Copy link
Copy Markdown

jerry7z can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

@openshift-virtualization-qe-bot-2
Copy link
Copy Markdown

SiboWang1997 can not be added as reviewer. Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.: 422 {"message": "Reviews may only be requested from collaborators. One or more of the users or teams you specified is not a collaborator of the RedHatQE/openshift-virtualization-tests-design-docs repository.", "documentation_url": "https://docs.github.com/rest/pulls/review-requests#request-reviewers-for-a-pull-request", "status": "422"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants