
fixes for engineAPI high latency #4166

Draft
advaita-saha wants to merge 3 commits into master from engine-api-latency

Conversation

@advaita-saha
Contributor

@advaita-saha advaita-saha commented Apr 22, 2026

An attempt to improve engineAPI performance. The following changes were made:

Changes

  1. Skip a redundant RLP decode in validateVersionedHashed.
  2. Conditional yield in processQueue.
  3. Reuse tx hashes in txRecords instead of re-RLP-hashing.
    After a block is processed, forked_chain loops over all transactions and calls computeRlpHash(tx) on every one, i.e. it re-RLP-encodes and hashes each tx. These hashes were already computed during the RLP decode in ethBlock(payload, .... For a 150-tx block that is roughly 150 × 20µs ≈ 3ms of wasted work per payload, on the critical path.
  4. Non-blocking engine_api enqueue.
    When the FC action queue is at MaxQueueSize=128, queueImportBlock and queueForkChoice call await c.queue.addLast(item), which suspends the caller until space frees up. Under burst load the CL sees unbounded tail latency rather than a clean SYNCING response, and will eventually time out.
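The fourth change can be sketched roughly as follows, assuming Chronos's `AsyncQueue` API (`full`, `addLastNoWait`); the proc signature and the return value here are illustrative, not the actual diff:

```nim
# Sketch only: reject new work instead of suspending when the queue is full.
# Assumes chronos AsyncQueue[T]; error handling and types are simplified.
proc queueImportBlock(c: ForkedChainRef, item: QueueItem): PayloadStatusV1 =
  if c.queue.full:
    # Saturated: answer SYNCING immediately rather than awaiting addLast,
    # which would suspend the caller and give the CL unbounded tail latency.
    return PayloadStatusV1(status: PayloadExecutionStatus.syncing)
  # Space was just checked, so this cannot block in the single-threaded
  # async dispatcher.
  c.queue.addLastNoWait(item)
```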

@advaita-saha
Contributor Author

Unsure about the 4th change; I'm not sure it's the best idea, or whether we should just let the engineAPI time out in this case.
For example, when the CL approaches the tip of the chain it sends 30-40 blocks almost together, and that is when this happens.

@advaita-saha advaita-saha requested a review from tersec April 22, 2026 21:22
blockAccessList: Opt[BlockAccessListRef],
finalized: bool
finalized: bool,
txHashes: Opt[seq[Hash32]] = Opt.none(seq[Hash32])
Contributor


Don't need to list the type here explicitly since it can infer it from the default value.
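A minimal illustration of the suggestion, using `Opt` from nim-results (the proc names here are made up for the example):

```nim
import results

# As written in the diff: the parameter type is stated explicitly.
proc withAnnotation(txHashes: Opt[seq[byte]] = Opt.none(seq[byte])) =
  discard

# The annotation can be dropped; the compiler infers Opt[seq[byte]]
# from the type of the default value.
proc inferred(txHashes = Opt.none(seq[byte])) =
  discard
```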

await c.queue.addLast(item)
if c.queue.full:
debug "Engine action queue saturated, rejecting importBlock",
queueLen = c.queue.len, maxSize = MaxQueueSize
Contributor


Why log a constant? Is there an intention to change this to be variable in the future?

Contributor Author


No, these are only for testing for now, mostly for me to detect scenarios where this condition occurs.

They would be removed in the final version.

await c.queue.addLast(item)
if c.queue.full:
debug "Engine action queue saturated, rejecting forkChoice",
queueLen = c.queue.len, maxSize = MaxQueueSize
Contributor


Likewise logging a constant.

@tersec
Contributor

tersec commented Apr 24, 2026

Unsure about the 4th change; I'm not sure it's the best idea, or whether we should just let the engineAPI time out in this case. For example, when the CL approaches the tip of the chain it sends 30-40 blocks almost together, and that is when this happens.

SYNCING is basically always better than a timeout.
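For context, the Engine API spec defines SYNCING as one of the allowed `PayloadStatusV1.status` values, which the CL handles gracefully (e.g. via optimistic sync), whereas a timeout surfaces as a transport error. A sketch of the shape of such a response (these Nim type definitions are illustrative, not the ones in the codebase):

```nim
type
  PayloadExecutionStatus = enum
    syncing = "SYNCING"
    valid = "VALID"
    invalid = "INVALID"

  PayloadStatusV1 = object
    status: PayloadExecutionStatus
    # latestValidHash and validationError stay null for SYNCING

# Respond immediately when the action queue is saturated.
func syncingResponse(): PayloadStatusV1 =
  PayloadStatusV1(status: PayloadExecutionStatus.syncing)
```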
