fix(chain): prefer block producer-matching parent in skip approval resolution#15597
fix(chain): prefer block producer-matching parent in skip approval resolution#15597ssavenko-near wants to merge 6 commits intomasterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #15597 +/- ##
==========================================
+ Coverage 69.53% 69.54% +0.01%
==========================================
Files 940 940
Lines 214389 214412 +23
Branches 214389 214412 +23
==========================================
+ Hits 149065 149121 +56
+ Misses 59410 59382 -28
+ Partials 5914 5909 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cdb27f0 to
c85403f
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a subtle failure mode in Client::collect_block_approval when resolving the parent hash for Skip(parent_height) approvals: instead of picking an arbitrary block at parent_height, it searches the known candidates and prefers the one whose epoch makes the current node the block producer for target_height, avoiding silently dropping valid approvals around epoch-boundary forks. It also adds an integration test covering handling of unresolvable epoch data among candidates.
Changes:
- Update skip-approval parent resolution to prefer the candidate whose derived epoch assigns this node as block producer at
target_height. - Add an integration test that simulates multiple
BlockPerHeightcandidates including an unresolvable hash and verifies the approval still reaches Doomslug when a valid candidate exists.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
chain/client/src/client.rs |
Adjusts Skip(parent_height) parent hash selection to try all candidates and prefer one matching the node’s producer assignment for target_height. |
integration-tests/src/tests/client/doomslug.rs |
Adds a regression test for skip approvals when some parent-height candidates have unresolvable epochs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c85403f to
4175ec7
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4175ec7 to
2b856ec
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6d7c0a4 to
8455fb9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5350e40 to
c081627
Compare
| tested += 1; | ||
| // We repeat across 4 epoch boundaries. Each trial independently | ||
| // has ~50% chance of the right parent being first in HashMap iteration | ||
| // order, so 4 trials give ~6% false-pass probability. | ||
| if tested >= 4 { | ||
| return; | ||
| } | ||
| } |
There was a problem hiding this comment.
Optional: I am not against to simplify the test here and not do retries. In that case, a regression could cause a flaky test rather than failing, but we could have a simpler test.
It also seems fine to keep it.
There was a problem hiding this comment.
I prefer to have it less flaky. False negatives are especially annoying cause some code breaking the test may sneak in and pushing after it would become difficult.
…solution When resolving the parent block for a Skip(parent_height) approval, collect_block_approval previously picked an arbitrary block hash from the set of known blocks at that height. Near epoch boundaries, multiple blocks at the same height can belong to different epochs, each mapping to a different block producer for the target height. Picking the wrong one causes the receiver to incorrectly conclude it is not the intended producer and silently drop a valid approval. Instead, iterate all known blocks at parent_height and prefer one where the current node is the block producer for target_height. Fall back to the first hash if none match, preserving existing behavior for non-producers.
c081627 to
670cb1e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Collects block approvals. | ||
| /// | ||
| /// We send the approval to doomslug given the epoch of the current tip iff: | ||
| /// 1. We are the block producer for the target height in the tip's epoch; | ||
| /// 2. The signature matches that of the account; | ||
| /// If we are not the block producer, but we also don't know the previous block, we add the | ||
| /// approval to `pending_approvals`, since it could be that the approval is from the next epoch. | ||
| /// | ||
| /// # Arguments | ||
| /// * `approval` - the approval to be collected | ||
| /// * `approval_type` - whether the approval was just produced by us (in which case skip validation, | ||
| /// only check whether we are the next block producer and store in Doomslug) | ||
| /// Resolve the parent block hash for a skip approval. | ||
| /// | ||
| /// Multiple blocks can exist at `parent_height` when there are forks | ||
| /// (e.g. around an epoch boundary). Each fork block may belong to a | ||
| /// different epoch with different block-producer assignments for | ||
| /// `target_height`. We prefer the parent whose epoch makes us the block | ||
| /// producer so that the approval is not silently dropped later. | ||
| /// | ||
| /// Returns `None` when no blocks exist at `parent_height`. | ||
| fn resolve_skip_parent( |
There was a problem hiding this comment.
The doc-comment block starting with /// Collects block approvals. is now immediately followed by the new resolve_skip_parent helper, so Rustdoc will attach all of those /// lines to resolve_skip_parent and collect_block_approval ends up undocumented. Suggest splitting the docs: keep the collect_block_approval docs directly above pub fn collect_block_approval(...), and leave only the skip-parent docs above resolve_skip_parent (or convert one of the blocks to // comments).
There was a problem hiding this comment.
@ssavenko-near I guess the copilot is right about the doc comments being merged together.
| let hashes = self.chain.chain_store().get_all_block_hashes_by_height(parent_height); | ||
| let mut first = None; | ||
| let chosen = hashes.values().flatten().find(|hash| { | ||
| if first.is_none() { | ||
| first = Some(**hash); | ||
| } | ||
| my_account_id.is_some_and(|id| { | ||
| self.epoch_manager | ||
| .get_epoch_id_from_prev_block(hash) | ||
| .and_then(|epoch_id| { | ||
| self.epoch_manager.get_block_producer(&epoch_id, target_height) | ||
| }) | ||
| .ok() | ||
| .as_ref() | ||
| == Some(id) | ||
| }) | ||
| }); | ||
| chosen.copied().or(first) | ||
| } |
There was a problem hiding this comment.
resolve_skip_parent always iterates over all hashes when my_account_id is None because the .find(..) predicate can never succeed. This is a small but unnecessary performance regression vs the previous next() behavior. Consider an early return when my_account_id.is_none() (or restructure to compute first via iter.next() and only do the producer-matching scan when Some).
| #[test] | ||
| fn test_skip_approval_prefers_producer_matching_parent() { | ||
| init_test_logger(); |
There was a problem hiding this comment.
PR description says test_skip_approval_prefers_producer_matching_parent was disabled on SPICE, but the test currently has no #[cfg_attr(feature = "protocol_feature_spice", ignore)] (or equivalent guard). This will cause the test to still run under --features protocol_feature_spice unlike other SPICE-incompatible tests in this crate. Please add the ignore/guard or update the PR description if it’s no longer needed.
| /// Collects block approvals. | ||
| /// | ||
| /// We send the approval to doomslug given the epoch of the current tip iff: | ||
| /// 1. We are the block producer for the target height in the tip's epoch; | ||
| /// 2. The signature matches that of the account; | ||
| /// If we are not the block producer, but we also don't know the previous block, we add the | ||
| /// approval to `pending_approvals`, since it could be that the approval is from the next epoch. | ||
| /// | ||
| /// # Arguments | ||
| /// * `approval` - the approval to be collected | ||
| /// * `approval_type` - whether the approval was just produced by us (in which case skip validation, | ||
| /// only check whether we are the next block producer and store in Doomslug) | ||
| /// Resolve the parent block hash for a skip approval. | ||
| /// | ||
| /// Multiple blocks can exist at `parent_height` when there are forks | ||
| /// (e.g. around an epoch boundary). Each fork block may belong to a | ||
| /// different epoch with different block-producer assignments for | ||
| /// `target_height`. We prefer the parent whose epoch makes us the block | ||
| /// producer so that the approval is not silently dropped later. | ||
| /// | ||
| /// Returns `None` when no blocks exist at `parent_height`. | ||
| fn resolve_skip_parent( |
There was a problem hiding this comment.
@ssavenko-near I guess the copilot is right about the doc comments being merged together.
When resolving the parent block for a Skip(parent_height) approval, collect_block_approval previously picked an arbitrary block hash from the set of known blocks at that height. Near epoch boundaries, multiple blocks at the same height can belong to different epochs, each mapping to a different block producer for the target height. Picking the wrong one causes the receiver to incorrectly conclude it is not the intended producer and silently drop a valid approval.
Instead, iterate all known blocks at parent_height and prefer one where the current node is the block producer for target_height. Fall back to the first hash if none match, preserving existing behavior for non-producers.
Had to disable the
test_skip_approval_prefers_producer_matching_parenttest on SPICE.