Skip to content

adapter: use occ for read-then write, with incremental retries#35192

Draft
aljoscha wants to merge 8 commits intoMaterializeInc:mainfrom
aljoscha:adapter-read-then-write-occ-incremental
Draft

adapter: use occ for read-then write, with incremental retries#35192
aljoscha wants to merge 8 commits intoMaterializeInc:mainfrom
aljoscha:adapter-read-then-write-occ-incremental

Conversation

@aljoscha
Copy link
Copy Markdown
Contributor

We now use SUBSCRIBE instead of PEEK to maintain the desired set of
updates that need to be written. We also don't acquire locks on tables
but instead optimistically try and write our updates at the timestamp
right at our current subscribe frontier.

Additionally, we take the opportunity this provides and move the
sequencing code from the coordinator main loop to the frontend, similar
to how we have done that for peeks in frontend_peek.rs.

Work towards https://github.com/MaterializeInc/database-issues/issues/6686

Implementation of https://github.com/MaterializeInc/materialize/blob/63645b72e83ee26d2cfa99d25d773a06e6accb74/doc/developer/design/20260210_incremental_occ_read_then_write.md

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 4 times, most recently from 1675464 to 2cb8e6d Compare February 27, 2026 13:59
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch from 865091f to 22a54a2 Compare March 16, 2026 15:30
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 11 times, most recently from 179deb6 to 8b8d81c Compare April 23, 2026 10:33
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch 9 times, most recently from 01bcd4d to aff3f2e Compare April 28, 2026 13:36
aljoscha and others added 3 commits April 28, 2026 16:38
The cancel scenario periodically fires `pg_cancel_backend` against a
random worker. If that signal arrives during the worker's ROLLBACK,
psycopg's `connection.rollback()` raises a "canceling statement due
to user request" error. The rollback-error handler didn't recognize
that string, silently cleared `rollback_next`, and moved on — but
psycopg was still in `InFailedSqlTransaction`, so the next
psycopg-path query (COPY TO, SET, PREPARE, ...) died client-side
with "current transaction is aborted", which no action tolerates.

Surfaced by nightly 16160 (Parallel Workload (cancel)) on the
frontend-RTW-OCC branch: the RTW path's wider per-statement window
widens the race, but the bug is in the harness. Fix it by forcing a
reconnect on cancelled or aborted-transaction rollback failures —
reconnection is a clean reset, and psycopg doesn't guarantee the
connection is usable after a failed rollback anyway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
We will later use this for the frontend read-then-write implementation.
aljoscha and others added 2 commits April 28, 2026 16:39
We now use SUBSCRIBE instead of PEEK to maintain the desired set of
updates that need to be written. We also don't acquire locks on tables
but instead optimistically try and write our updates at the timestamp
right at our current subscribe frontier.

Additionally, we take the opportunity this provides and move the
sequencing code from the coordinator main loop to the frontend, similar
to how we have done that for peeks in frontend_peek.rs.

Work towards MaterializeInc/database-issues#6686
The original `if active_subscribe.internal { ... as BuiltinTableAppendNotify }`
branch needed an explicit cast (with a `clippy::as_conversions` allow) for
the if/else types to unify. Splitting into a guarded match arm lets the
match's coercion site unsize both `Box::pin`s to the trait-object type.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch from aff3f2e to f444d6d Compare April 28, 2026 14:39
aljoscha and others added 3 commits April 28, 2026 16:10
Commit 21bbdde unified the error returned when a peek or subscribe
sees an underlying relation dropped mid-flight to "query could not
complete because relation X was dropped" (via
DroppedDependency::query_terminated_error). The COPY path keeps a
distinct "copy has been terminated because underlying ..." message via
DroppedDependency::copy_terminated_error.

Two assertions were missed in that change:
  - test/sqllogictest/ct_various.slt:190 still expected the old subscribe
    wording, so the FETCH-after-DROP-CT case fails.
  - src/environmentd/tests/sql.rs::test_dont_drop_sinks_twice was
    updated to expect "query could not complete", but the test runs
    COPY (SUBSCRIBE ...), which goes through copy_terminated_error and
    therefore returns "copy has been terminated".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Coordinator::sequence_read_then_write` is the legacy path for
DELETE/UPDATE/INSERT. When `frontend_read_then_write_enabled` is on,
the OCC frontend path is the sole intended executor for these
statements: the two paths take different locks (OCC takes none; this
path takes per-table write locks) and therefore do not synchronize
against each other, so running them concurrently double-retracts rows
that both observed.

Reaching this function while the flag is on indicates a routing bug
where something bypassed the frontend gate
(`client.rs::try_frontend_read_then_write`). Concretely, SQL
`EXECUTE <prepared-dml>` is re-dispatched via `Command::Execute`
directly from `coord/sequencer.rs`'s `Plan::Execute` handler and
never crosses the session-client boundary where the gate lives. That
routing fix is in a follow-up commit; this one ensures we don't
silently corrupt a persist shard when a future change adds a similar
bypass.

Returns `AdapterError::Internal` with a self-descriptive message,
which surfaces as a server error to the user. The check happens
before any work (lock acquisition, planning, peek dispatch), so it's
cheap and side-effect free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The session-client side now unrolls `Statement::Execute` before
running `try_frontend_peek` / `try_frontend_read_then_write`. The
inner statement (the prepared one being executed) goes through both
gates exactly like a directly-issued statement of the same kind,
which means a prepared DML executed via SQL `EXECUTE` now correctly
takes the OCC RTW path instead of the legacy
`Coordinator::sequence_read_then_write`.

Mechanics: `unroll_sql_execute` looks at the named portal; if its
statement is `Execute`, it plans it (yielding `ExecutePlan { name,
params }`), looks up the prepared statement, and installs a fresh
portal for the inner statement using the EXECUTE's parameter values
— mirroring `Coordinator::sequence_execute`. It loops so a chain of
`EXECUTE`s resolves to the underlying statement, with a depth bound
to defend against pathological nesting.

The coordinator's `Plan::Execute` handler in
`coord/sequencer.rs:620-637` stays as defensive fallback. With this
change in place, `EXECUTE` statements no longer reach it via the
normal session-client → coordinator path; combined with the failsafe
in `sequence_read_then_write`, any future code path that bypasses
the frontend gate for a DML will surface as an explicit internal
error rather than silently corrupting persist.

Confirmed by the parallel-workload retraction repro (see
`debug-notes/occ-rtw-handoff.md`'s root-cause section): the bug
required `DeleteAction` to use `PREPARE … AS DELETE …` followed by
`EXECUTE delete<N>`, which previously skipped OCC. With this routing
in place, every DELETE goes through OCC RTW regardless of how the
client invoked it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aljoscha aljoscha force-pushed the adapter-read-then-write-occ-incremental branch from aec8a1a to 9ad320f Compare May 1, 2026 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant