
Fix a flaky issue with task percentage not stored in DB #421

Merged
petrutlucian94 merged 2 commits into cloudbase:master from fabi200123:fix-update-task-percentage
May 13, 2026

Conversation

@fabi200123
Contributor

There appears to be an intermittent issue with the progress percentage reported for Replicate Disks.
The final progress updates for replication are sometimes not stored in the DB, so the UI never displays 100% for disk replication. The worker reaches 100%, but that value is never persisted.

The issue seems to come from the self._cast used in update_task_progress_update.

NOTE: Changing it to _call seems to fix the issue, but the issue itself is hard to reproduce.
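
To illustrate the suspected failure mode, here is a minimal sketch that does not use Coriolis or any RPC library. It assumes that _cast is fire-and-forget (the sender does not wait for the update to be processed) while _call blocks until the update has been applied. `FakeConductor`, `cast`, `call`, and `drain` are hypothetical names made up for this example, and the processing order is chosen by hand to mimic concurrent workers.

```python
class FakeConductor:
    """Hypothetical stand-in for the conductor; stores a single progress value."""

    def __init__(self):
        self.stored_step = 0
        self._pending = []

    def _apply(self, step):
        # Unconditional write: whichever update is processed last wins.
        self.stored_step = step

    def cast(self, step):
        # Fire-and-forget: queue the update and return immediately.
        self._pending.append(step)

    def call(self, step):
        # Blocking: the update is applied before the caller continues.
        self._apply(step)

    def drain(self, order):
        # Process queued casts in the given order (assumed, to mimic workers
        # picking up messages concurrently).
        for index in order:
            self._apply(self._pending[index])
        self._pending.clear()


def run_with_cast():
    conductor = FakeConductor()
    for step in (98, 99, 100):
        conductor.cast(step)
    # Assume the update carrying 99 is handled after the one carrying 100.
    conductor.drain(order=[0, 2, 1])
    return conductor.stored_step


def run_with_call():
    conductor = FakeConductor()
    for step in (98, 99, 100):
        conductor.call(step)
    return conductor.stored_step
```

In this sketch the cast variant ends with a stale value whenever an older update is processed last, while the call variant applies updates in the order they were sent. Whether this is the exact mechanism behind the flaky behaviour is an assumption, since the issue is hard to reproduce.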

Comment thread coriolis/conductor/rpc/client.py Outdated
@fabi200123 fabi200123 force-pushed the fix-update-task-percentage branch from dfdbe84 to 665c0a9 Compare May 13, 2026 10:00
Comment thread coriolis/db/api.py
# The _call on the last task update might get overwritten by an
# out-of-order update with an older current_step value, so we need to
# ensure that the progress is not overwritten by an older step.
if new_current_step < task_progress_update.current_step:
Member


This is better than what we have now, but it's still prone to race conditions. We rely on the task_progress_update.current_step value that was read earlier, but a concurrent DB operation may increment it in the meantime. We'd then overwrite the newer value, potentially decreasing current_step.

The safest approach would be to use a "compare-and-update" SQL operation.

I'm OK with merging this PR as it is and adding a TODO to use a DB-level check.
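
As a rough sketch of what a DB-level compare-and-update could look like, the example below puts the step comparison into the WHERE clause of a single UPDATE, so the check and the write happen in one statement instead of a read followed by a write. It uses an in-memory SQLite database purely for illustration. The table layout and the helper names `create_db`, `update_progress`, and `get_step` are assumptions for this example, not the Coriolis schema or API.

```python
import sqlite3


def create_db():
    # Hypothetical single-table schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_progress_update ("
        "id INTEGER PRIMARY KEY, current_step INTEGER NOT NULL)")
    conn.execute(
        "INSERT INTO task_progress_update (id, current_step) VALUES (1, 0)")
    conn.commit()
    return conn


def update_progress(conn, update_id, new_current_step):
    """Write new_current_step only if the stored step is not newer.

    Returns True if the row was updated, False if the update was stale.
    """
    cursor = conn.execute(
        "UPDATE task_progress_update SET current_step = ? "
        "WHERE id = ? AND current_step <= ?",
        (new_current_step, update_id, new_current_step))
    conn.commit()
    # rowcount is 0 when the WHERE clause rejected the stale update.
    return cursor.rowcount == 1


def get_step(conn, update_id):
    row = conn.execute(
        "SELECT current_step FROM task_progress_update WHERE id = ?",
        (update_id,)).fetchone()
    return row[0]
```

With this shape, a late update carrying an older step matches no rows and leaves the stored value untouched, regardless of what another writer did between the caller's read and its write.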

Member

@petrutlucian94 petrutlucian94 May 13, 2026


Suggested comment:

# Quick progress updates may be processed out of order. We're trying to mitigate
# this by checking the current step.
#
# TODO: the current approach is still prone to race conditions since
# `task_progress_update.current_step` may be out of date. The safest approach
# would be to use a db-level `compare-and-update` sql operation.

@fabi200123 fabi200123 force-pushed the fix-update-task-percentage branch from e15f2af to 346f83f Compare May 13, 2026 12:50
@fabi200123 fabi200123 force-pushed the fix-update-task-percentage branch from 346f83f to 0150826 Compare May 13, 2026 12:57
@petrutlucian94 petrutlucian94 merged commit 01ad969 into cloudbase:master May 13, 2026
5 checks passed

2 participants