Summary
feature_llmq_simplepose.py intermittently times out in DKG/quorum waits under contention. It showed up in dashpay/dash#7232, but it also reproduces on plain develop, so that PR did not cause it.
CI evidence
From dashpay/dash#7232 (ci: migrate primary CI jobs to ARM runners):
Failure in CI (linux64_tsan-test / Test source):
- test: feature_llmq_simplepose.py
- path: run_test() -> test_banning(self.force_old_mn_proto, 3)
- timeout: wait_for_quorum_list()
- symptom: after mining the final commitment, the expected quorum hash never appeared in quorum list within 60 seconds
Relevant excerpt:
AssertionError: Predicate ''''
def wait_func():
    return quorum_hash in self.nodes[0].quorum('list')[llmq_type_name]
''' not true after 60.0 seconds
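Roughly, a wait like the one in the excerpt polls its predicate until it becomes true or a deadline passes, and raises the AssertionError at the deadline. The sketch below is a simplified, hypothetical stand-in for the framework's wait helper, not its actual code; the name `wait_until`, the `timeout_factor` scaling and the poll interval are assumptions. It illustrates why contention alone can produce this failure: if the nodes are slow to mine, relay and process the commitment, the predicate is still false when the fixed deadline expires.

```python
import time


def wait_until(predicate, timeout=60.0, timeout_factor=1.0, poll_interval=0.05):
    """Poll `predicate` until it returns True or the scaled timeout expires.

    Hypothetical, simplified model of a functional-test wait helper;
    names and defaults are illustrative assumptions.
    """
    effective_timeout = timeout * timeout_factor
    deadline = time.monotonic() + effective_timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        # Nothing changes while we sleep; a slow node simply has less
        # of the remaining window left to catch up.
        time.sleep(poll_interval)
    # Check once more at the deadline so a predicate that turned true
    # during the last sleep is not reported as a failure.
    if predicate():
        return
    raise AssertionError(f"Predicate not true after {effective_timeout} seconds")
```

Under this model a larger timeout only widens the window; it does not distinguish slow propagation from a commitment that never arrives.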
Local reproduction on develop
Checked out upstream/develop and ran:
python3 test/functional/feature_llmq_simplepose.py
This also failed locally on develop (outside the PR branch), with a different timeout in the same test:
- path: run_test() -> test_banning(self.close_mn_port) -> mine_quorum_less_checks()
- timeout: wait_for_quorum_commitment()
- symptom: minableCommitments never became visible on all nodes within 15 seconds
Relevant excerpt:
AssertionError: Predicate ''''
def check_dkg_comitments():
...
''' not true after 15 seconds
Additional context
There is already related work in progress in PR #7254 (test: fix flaky PoSe ban assertion in feature_llmq_simplepose), but the failures reported here look broader: a timing-sensitive flake in which different wait points in the same test can fail depending on contention.
We have also previously seen feature_llmq_simplepose fail under broad parallel contention while passing consistently when run repeatedly in isolation. That points to resource-sensitive or inter-test timing behavior rather than a regression in the PR above.
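One way to compare the isolated and contended behavior is to run the test many times at different concurrency levels and count failures. The helper below is hypothetical and not part of the repository. It assumes each concurrent instance of the test picks its own ports and temporary directory; if that does not hold, `cmd_for_run` can be used to pass per-run options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd, timeout=None):
    """Run the test command once; return True if it exited with status 0."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode == 0


def count_failures(cmd_for_run, runs, parallel=1):
    """Run `runs` iterations, at most `parallel` at a time; return the failure count.

    `cmd_for_run(i)` builds the command line for iteration i, so callers can
    vary per-run options if concurrent instances would otherwise collide.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(lambda i: run_once(cmd_for_run(i)), range(runs)))
    return results.count(False)
```

For example, `count_failures(lambda i: ["python3", "test/functional/feature_llmq_simplepose.py"], runs=20, parallel=1)` approximates the isolated runs, and the same call with a higher `parallel` value adds contention. The run and worker counts here are arbitrary.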
Conclusion
This is a pre-existing flaky test on develop, not a regression introduced by dashpay/dash#7232.