paymentsdb: restore sql payment store parity with kv by yyforyongyu · Pull Request #10721 · lightningnetwork/lnd

yyforyongyu · 2026-04-07T12:35:18Z

Issues Found

SQL stored the payment identifier instead of the per-attempt hash for AMP
HTLC attempts, so AMP attempts could reload with the wrong hash after a
restart.
SQL startup recovery only considered unresolved attempts, which missed some
non-terminal payments that KV already exposed through
FetchInFlightPayments.
The non-terminal payment selector also needed a more efficient query shape so
the correctness fix would not devolve into a full history scan.

What This PR Does

persist the correct AMP attempt hash in the SQL store
fetch non-terminal payments directly for SQL startup recovery, matching the
current KV behavior
use a UNION-based selector to improve the non-terminal payment query
remove the old inflight-only SQL query path and its dead helper code

Testing

make lint
GOWORK=off go test ./payments/db ./routing
GOWORK=off go test -tags=test_db_sqlite ./payments/db
GOWORK=off go test -tags=test_db_postgres ./payments/db
- blocked in the default environment without Docker socket access

SQL writes store the payment identifier in payment_hash for each attempt. That is wrong for AMP payments, because the payment identifier is the SetID while each shard carries its own HTLC hash. Add TestRegisterAttemptPreservesAttemptHash as evidence. It passes on KV and fails on SQL before the fix because SQL reads the attempt hash back as the payment identifier.

Live SQL writes stored the payment identifier in payment_hash for each attempt. That works for legacy payments, but it breaks AMP because the payment identifier is the SetID while each shard carries its own HTLC hash. Use TestRegisterAttemptPreservesAttemptHash as evidence. The test now passes on both KV and SQL. Fix this by persisting attempt.Hash when it is present and only falling back to the payment identifier when the attempt hash is nil. That restores KV parity for AMP attempt reloads.

SQL FetchInFlightPayments only returns payments with an unresolved attempt row. KV returns every non-terminal payment, including retryable payments with only failed attempts and payments that have been initialized but have not registered any HTLCs yet. Add TestFetchInFlightPaymentsIncludesRetryablePayments and TestFetchInFlightPaymentsIncludesInitiatedPayments as evidence. Both tests pass on KV and fail on SQL before the fix.

FetchInFlightPayments needs a dedicated SQL query that can return non-terminal payments without relying on the unresolved-attempt scan. The first version of that query fixed correctness, but the follow-up selector measurements showed a UNION-based shape was materially faster while returning the same payment set. Use the inflight regression tests as evidence. The tests still fail on SQL before the Go payment store is wired up, but this commit adds the final SQL surface the later wiring commit depends on. Add FetchNonTerminalPayments to the SQL query set, regenerate the sqlc bindings, add the PaymentAndIntent adapters for the new row type, and use the UNION-based candidate selection so the final query shape lands in one commit.

The new FetchNonTerminalPayments query is available, but FetchInFlightPayments still uses the old unresolved-attempt scan until this commit. Use the inflight recovery regression tests as evidence. They now pass on both KV and SQL once the payment store is wired up to use the new query. Fix this by switching FetchInFlightPayments to the non-terminal payment query and batch loading only the related attempt and route data for those payment IDs.

FetchInFlightPayments no longer relies on the old FetchAllInflightAttempts query surface once the non-terminal payment query is in place. Use make lint and the inflight recovery tests as evidence. The code still passes once the old query and its generated bindings are removed. Remove FetchAllInflightAttempts from payments.sql, regenerate the sqlc bindings, and drop the matching SQLQueries interface method.

After removing the old FetchAllInflightAttempts query API, the helper types that only existed to batch load that path are no longer used. Use make lint as evidence. It reports the old inflight helper types as unused once the query API is gone. Remove the obsolete helper structs and batch-loading function from the payment SQL store so the remaining code matches the new inflight recovery path.

gemini-code-assist · 2026-04-07T12:38:17Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the SQL payment store's parity with the KV store by addressing issues related to AMP payment hash persistence and startup recovery. It introduces a more efficient query mechanism for identifying non-terminal payments and cleans up legacy code paths that were previously used for inflight attempt tracking.

Highlights

AMP Payment Hash Persistence: Updated the SQL store to correctly persist the individual HTLC attempt hash instead of the payment identifier, ensuring AMP payments reload with the correct hash after a restart.
Improved Startup Recovery: Replaced the inflight-only query path with a more robust FetchNonTerminalPayments query, ensuring all non-terminal payments are correctly exposed during startup recovery.
Query Optimization: Implemented a UNION-based selector for non-terminal payments to improve query efficiency and prevent full history scans.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request refactors the payment fetching logic in the SQL store to use a more efficient pagination approach based on non-terminal payments rather than individual HTLC attempts. It introduces a new FetchNonTerminalPayments query that correctly identifies payments that are still in-flight, initiated, or have retryable failures. Additionally, it ensures that HTLC attempt hashes are preserved independently of the payment identifier, which is critical for AMP payments. New unit tests have been added to verify these behaviors. I have no feedback to provide as the identified issue in the review was actually incorrect; rows.Close() is not redundant in this context as it is good practice to close rows explicitly to handle potential errors during the closing process, especially when not using a defer block.

ziggie1984 · 2026-04-07T18:20:43Z

payments/db/sql_store.go

 		sessionKey := attempt.SessionKey()
 		sessionKeyBytes := sessionKey.Serialize()
+		attemptHash := paymentHash[:]
+		if attempt.Hash != nil {


should we log an error in case the hash is nil normally this should never happen or ?

Good idea. I added 412db8ae7 to log the nil-hash fallback path so we get a diagnostic if a newly registered attempt ever reaches it unexpectedly. The legacy fallback behavior itself stays the same.

ziggie1984

LGTM

Thank you for the fix. The AMP fix but also the Inflight fix are not super severe since for the hash it is never used for critical operations in the paymentsDB. Moreover the fetching of payments which do not have an unresolved HTLC are also very rare and it would them only leave in an INFLIGHT state but without any potential loss of funds.

ziggie1984 · 2026-04-07T18:56:02Z

payments/db/sql_store.go

 			SessionKey:   sessionKeyBytes,
 			AttemptTime:  attempt.AttemptTime,
-			PaymentHash:  paymentHash[:],
+			PaymentHash:  attemptHash,


Worth noting that while the stored hash was wrong for AMP payments, this did not cause observable payment failures. Settlement is driven by the preimage arriving on the wire, and
failure decryption relies solely on the session key and payment path — neither path consults the stored attempt hash. The bug was a data correctness issue rather than a live payment
breakage, which also explains why it went undetected.

Agreed. I left the runtime behavior unchanged and treated this as a data-correctness fix rather than a live payment breakage. I only followed up with 412db8ae7 to add a diagnostic on the nil-hash fallback path.

ziggie1984 · 2026-04-07T19:25:15Z

Note: This review was generated by a Claude Code AI review session (/code-review), based on deep static analysis of the diff, schema, index definitions, and Go call paths. No automated tooling was substituted for manual reasoning — all conclusions were derived by tracing the code.

Review: paymentsdb: restore sql payment store parity with kv

Verdict: Approve with minor suggestions.

What was verified

All new regression tests pass on both the KV and SQLite backends
The three UNION branches are mutually exclusive across payments and correctly map to all three non-terminal states in decidePaymentStatus
ORDER BY p.id ASC correctly replaces the old sort.Slice by SequenceNum since p.id == SequenceNum (sql_store.go:636)
The AMP hash fix is correct; attempt.Hash is never nil in live router code — only in KV→SQL migration of legacy attempts, which is handled separately in migration1/sql_migration.go:434

Impact of the AMP hash bug

Worth noting that while the stored hash was wrong for AMP payments, this did not cause observable payment failures. Settlement is driven by the preimage arriving on the wire, and failure decryption relies solely on the session key and payment path — neither path consults the stored attempt hash. The bug was a data correctness issue rather than a live payment breakage, which also explains why it went undetected.

Suggestions (non-blocking)

1. Silent nil-hash fallback should log an error

In sql_store.go:1477–1480, the nil branch falls back silently with no diagnostic output. Since this path should never be reached in live router code, hitting it indicates a programming error. A log.Errorf would surface it immediately rather than silently persisting wrong data:

attemptHash := paymentHash[:]
if attempt.Hash != nil {
    attemptHash = attempt.Hash[:]
} else {
    log.Errorf("RegisterAttempt: attempt %d has nil hash, falling "+
        "back to payment identifier %v", attempt.AttemptID, paymentHash)
}

2. Branch 1 lacks DISTINCT while Branch 3 has it

Minor clarity nit: Branch 1 returns one row per unresolved attempt, so a payment with multiple in-flight shards produces duplicate payment_id values. The outer UNION deduplicates them
correctly, but Branch 3 already uses SELECT DISTINCT for the same reason. Making Branch 1 consistent and switching to UNION ALL makes the invariant self-documenting — each branch is
responsible for its own deduplication, and UNION ALL signals that the caller knows the branches can never overlap:

   SELECT DISTINCT ha.payment_id AS id   -- add DISTINCT
   FROM payment_htlc_attempts ha
   WHERE NOT EXISTS (...)

  UNION ALL                              -- change from UNION

   SELECT p.id
   FROM payments p
   WHERE ...

  UNION ALL                              -- change from UNION

   SELECT DISTINCT ha.payment_id AS id   -- already present
   FROM payment_htlc_attempts ha
   ...

No behaviour change, no performance impact — purely a readability improvement.

3. Docstring is inaccurate after the fix

FetchInFlightPayments and the PaymentReader interface both still describe the function as returning "payments with status InFlight". The function now also returns StatusInitiated payments (Branch 2) and retryable payments. The description should say "non-terminal payments" and enumerate all three cases.

4. Branch 2 full-table scan (minor, startup-only)

payments WHERE fail_reason IS NULL has no index. A partial index would help on large nodes:

CREATE INDEX IF NOT EXISTS idx_payments_no_fail_reason
ON payments(id) WHERE fail_reason IS NULL;

Both SQLite and PostgreSQL support partial indexes. Acceptable as a follow-up since this is startup-only.

RegisterAttempt falls back to the payment identifier when an attempt hash is nil so legacy data can still round-trip safely. In live router code, however, a nil attempt hash should never happen for newly registered attempts. Add an error log on the fallback path so an unexpected nil attempt hash is surfaced immediately instead of silently persisting the fallback value.

Roasbeef

Great PR! Excellent commit series as well.

LGTM 🐳

Dropped some non-blocking comments re query readability and potentail perf improvements.

Roasbeef · 2026-04-08T18:25:20Z

sqldb/sqlc/queries/payments.sql

+
+    SELECT DISTINCT ha.payment_id AS id
+    FROM payment_htlc_attempts ha
+    JOIN payment_htlc_attempt_resolutions hr


Given that this clause and the 1st both join on attempt resolutions, I think we can collapse the query to instead be:

SELECT p.id FROM payments p WHERE p.fail_reason IS NULL AND NOT EXISTS ( SELECT 1 FROM payment_htlc_attempt_resolutions hr JOIN payment_htlc_attempts ha ON ha.attempt_index = hr.attempt_index WHERE ha.payment_id = p.id AND hr.resolution_type = 1 ) UNION SELECT ha.payment_id FROM payment_htlc_attempts ha WHERE NOT EXISTS ( SELECT 1 FROM payment_htlc_attempt_resolutions hr WHERE hr.attempt_index = ha.attempt_index )

(not tested)

So it queries that: not failed yet and not settled attempted + has an unresolved attempt

cool applied

Roasbeef · 2026-04-08T18:31:04Z

payments/db/sql_store.go


-		// Convert map to slice and sort by sequence number to
-		// produce a deterministic ordering.
-		mpPayments = make([]*MPPayment, 0, len(processedPayments))


We no longer need this sorting step?

yea we no longer need the old Go-side sorting step. The current path appends one payment per selector row, and the selector query already returns rows ordered by p.id ASC

Roasbeef · 2026-04-08T18:33:38Z

sqldb/sqlc/queries/payments.sql

+    AND NOT EXISTS (
+        SELECT 1 FROM payment_htlc_attempts ha
+        WHERE ha.payment_id = p.id
+    )


Can consider adding an index like this to speed up this portion of the query:

CREATE INDEX IF NOT EXISTS idx_payments_no_fail_reason ON payments(id) WHERE fail_reason IS NULL;

Otherwise will scan, but iiuc this only is done on start up

Roasbeef · 2026-04-08T18:33:58Z

payments/db/sql_store.go

+		if attempt.Hash != nil {
+			attemptHash = attempt.Hash[:]
+		} else {
+			log.Errorf("RegisterAttempt: attempt %d has nil hash, "+


Should this return an error instead?

Think we need it to keep the current behavior - KV already tolerates attempt.Hash == nil, and changing this to an error would be a behavior change rather than just a bug fix. We could revisit it tho, if we did see an error log.

Simplify FetchNonTerminalPayments by collapsing the selector down to two branches: payments that are not failed and have no settled attempt, and payments that still have unresolved attempts. This keeps the same non-terminal semantics while making the query easier to reason about. Also add a partial index on payments(id) where fail_reason IS NULL to speed up the startup selector branch that scans payments without a recorded failure reason.

github-actions · 2026-04-08T18:50:13Z

🔴 PR Severity: CRITICAL

Automated classification | 6 files (excl. tests/generated) | 279 lines changed

🔴 Critical (2 files)

sqldb/sqlc/migrations/000014_payments_no_fail_reason_index.down.sql - SQL database migration (sqldb/* migrations are always CRITICAL)
sqldb/sqlc/migrations/000014_payments_no_fail_reason_index.up.sql - SQL database migration (sqldb/* migrations are always CRITICAL)

🟠 High (3 files)

sqldb/sqlc/db_custom.go - sqldb/* package (HIGH)
sqldb/sqlc/querier.go - sqldb/* package (HIGH)
sqldb/sqlc/queries/payments.sql - sqldb/* package (HIGH)

🟡 Medium (1 file)

payments/db/sql_store.go - payments/* package (MEDIUM)

Analysis

This PR adds a new SQL database migration (000014_payments_no_fail_reason_index) that modifies the payments table schema — specifically adding an index for payments without a fail reason. Database migrations in sqldb/* are always classified as CRITICAL regardless of other factors, because schema changes are irreversible and can affect data integrity and query performance in production.

The remaining changes update the SQL query layer (payments.sql.go, querier.go, db_custom.go) to support the new index, and payments/db/sql_store.go is updated to use the new query interface. These are HIGH/MEDIUM on their own, but the migration drives the overall classification to CRITICAL.

This PR warrants expert review focused on:

Correctness and safety of the migration (up and down paths)
Index definition and its impact on query performance
Compatibility with existing payment store logic

_{To override, add a severity-override-{critical,high,medium,low} label.}

yyforyongyu added 7 commits April 7, 2026 16:09

yyforyongyu added bug fix no-changelog labels Apr 7, 2026

gemini-code-assist bot reviewed Apr 7, 2026

View reviewed changes

saubyk assigned yyforyongyu Apr 7, 2026

yyforyongyu marked this pull request as ready for review April 7, 2026 16:16

saubyk requested a review from ziggie1984 April 7, 2026 16:41

saubyk added this to v0.21 Apr 7, 2026

saubyk moved this to In progress in v0.21 Apr 7, 2026

ziggie1984 reviewed Apr 7, 2026

View reviewed changes

ziggie1984 approved these changes Apr 7, 2026

View reviewed changes

ziggie1984 moved this from In progress to In review in v0.21 Apr 8, 2026

Roasbeef approved these changes Apr 8, 2026

View reviewed changes

github-actions bot added the severity-critical Requires expert review - security/consensus critical label Apr 8, 2026

yyforyongyu merged commit 9bf4f50 into lightningnetwork:master Apr 9, 2026
52 checks passed

github-project-automation bot moved this from In review to Done in v0.21 Apr 9, 2026

Conversation

yyforyongyu commented Apr 7, 2026

Issues Found

What This PR Does

Testing

Uh oh!

gemini-code-assist bot commented Apr 7, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ziggie1984 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ziggie1984 commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: paymentsdb: restore sql payment store parity with kv

What was verified

Impact of the AMP hash bug

Suggestions (non-blocking)

Uh oh!

Roasbeef left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 8, 2026

🔴 PR Severity: CRITICAL

Analysis

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ziggie1984 commented Apr 7, 2026 •

edited

Loading