Fix QTCP multi-flow correctness bugs#378
Conversation
- Add drain loops to EndNodeController, NetworkNodeController, and LinkController so all pending messages are processed before sleeping - Tighten LinkLevelReplyAtSource/LinkLevelReplyAtHop lookups to match by flow_uuid and seq_num, preventing cross-flow reply corruption - Fix QTCPPairEnd to report actual flow_src from the QDatagram instead of a hardcoded 0 - Fix LinkController slot mapping so each node receives the entangler slot allocated on its own side of the link Add regression tests for concurrent multi-flow scenarios on both grid and shared-chain topologies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes several multi-flow correctness issues in the QTCP protocol implementation that could corrupt reply matching or strand pending messages under concurrent/bursty traffic, and adds regression tests to validate concurrent-flow behavior.
Changes:
- Drain pending messages in
EndNodeController,NetworkNodeController, andLinkControllerbefore sleeping to prevent stalls/stranded work. - Match link-level replies by
(flow_uuid, seq_num)to prevent cross-flow reply corruption and propagate correctflow_srcmetadata intoQTCPPairEnd. - Fix
LinkControllerentangler slot mapping so each node receives the slot allocated on its own side of the link. - Add regression tests for concurrent flows on a grid and on a shared repeater chain.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/ProtocolZoo/qtcp.jl |
Implements drain loops, correct reply matching keys, correct QTCPPairEnd.flow_src, and correct per-side slot mapping in LinkController. |
test/general/protocolzoo_qtcp_tests.jl |
Adds helper utilities and two new concurrent-flow regression testsets. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if !isnothing(llrequest) | ||
| workwasdone = true | ||
| @assert originator_node != destination_node "LinkController $(nodeA) $(nodeB) has a link request with originator node $(originator_node) equal to the destination node $(destination_node)" | ||
| _, flow_uuid, seq_num, remote_node = llrequest.tag |
There was a problem hiding this comment.
remote_node is unpacked from llrequest.tag but never used. Consider replacing it with _ (or using it for an assertion/log) to avoid implying it affects entanglement creation or slot mapping.
| _, flow_uuid, seq_num, remote_node = llrequest.tag | |
| _, flow_uuid, seq_num, _ = llrequest.tag |
|
|
||
| # Check if there are any LinkLevelReplyAtHop messages and turn them into QTCPPairEnd messages | ||
| link_reply = querydelete!(mb, LinkLevelReplyAtHop, flow_uuid, seq_num, ❓) | ||
| @assert !isnothing(link_reply) "No LinkLevelReplyAtHop message found after the success of flow $(flow_uuid), sequence $(seq_num) at node $(node)" |
There was a problem hiding this comment.
The assertion message says "after the success of flow ..." but this branch is handling a received QDatagram at the destination (before the success/ack is sent). Updating the wording would make debugging easier if this assert ever triggers.
| @assert !isnothing(link_reply) "No LinkLevelReplyAtHop message found after the success of flow $(flow_uuid), sequence $(seq_num) at node $(node)" | |
| @assert !isnothing(link_reply) "No LinkLevelReplyAtHop message found for received QDatagram of flow $(flow_uuid), sequence $(seq_num) at node $(node)" |
Extract per-request entanglement logic into _link_handle_request, a standalone @Resumable helper spawned with @process. The main LinkController loop no longer blocks on @yield while waiting for EntanglerProt, so it can immediately process additional LinkLevelRequests arriving on the same link. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
29edb4b to
be069b7
Compare
Benchmark Results (Julia v1)Time benchmarks
Memory benchmarks
|
Summary
Fixes several correctness bugs in the QTCP protocol that surface when running concurrent flows:
EndNodeController,NetworkNodeController, andLinkControllernow process all pending messages before sleeping, preventing stranded messages under bursty trafficLinkLevelReplyAtSource/LinkLevelReplyAtHoplookups now match byflow_uuidandseq_num, fixing cross-flow reply corruption with concurrent flowsflow_srcfrom the QDatagram instead of hardcoded0ArgumentError: indices not uniquewith concurrent cross-topology flowsLinkLevelRequests while waiting for entanglement generationCHANGELOG notes
EndNodeController,NetworkNodeController,LinkController) now drain all pending messages before sleeping, preventing protocol stalls under bursty traffic.LinkLevelReplyAtSource,LinkLevelReplyAtHop) now match byflow_uuidandseq_num, fixing cross-flow corruption with concurrent flows.QTCPPairEndnow reports the correctflow_srcfrom the receivedQDatagram.LinkControllernow maps entangler slots to the correct originator/destination nodes, fixingArgumentError: indices not uniquewith concurrent cross-topology flows.LinkControllerno longer blocks while waiting for entanglement generation, allowing concurrent link-level requests on the same link.Checklist
Test plan
🤖 Generated with Claude Code