
Various performance improvements inspired by asyncpg#172

Open
Dev-iL wants to merge 1 commit into psqlpy-python:main from Dev-iL:2605/performance_improvements
Dev-iL commented May 24, 2026

Summary

Closes most of psqlpy 0.12.0's performance gap vs asyncpg by fixing four root causes:

  1. Global STMTS_CACHE was incorrect and a serialization point — cross-connection reuse of tokio_postgres::Statement (which carries a Weak to the originating connection) plus a process-global RwLock write-lock on every prepared execute.
  2. execute_many did StatementBuilder::build() per row — 999 prepare lookups + 999 GIL re-entries for a 1 000-row call.
  3. COPY records flushed at 4 KiB: BinaryCopyInWriter flushes every 4 KiB; asyncpg flushes at 512 KiB with a single-buffer encoder. Same algorithm, 128× smaller chunk size.
  4. result() returned PyDict per row — asyncpg's Record is one allocation per row with a shared column-name map; PyDict is ~3× the memory with redundant key storage.

D1 — Delete global STMTS_CACHE; add per-connection caches

  • Deleted src/statement/cache.rs — the process-global RwLock<HashMap<u64, StatementCacheInfo>>.
  • PoolConnection: unchanged — deadpool's prepare_cached already provides correct per-connection caching.
  • SingleConnection: gains DashMap<String, Statement> field; prepare() reads/inserts it.
  • StatementBuilder::build no longer takes any global lock.

Breaking change note: Per-SingleConnection cache is unbounded (mirrors deadpool's current behavior). LRU + DEALLOCATE deferred per trade-off T-1.
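
To make the D1 ownership rule concrete, here is a minimal Python model of a per-connection statement cache. It is a sketch only: `FakeConnection` and `FakeStatement` are hypothetical stand-ins, not psqlpy or tokio_postgres types, and a plain dict plays the role of the Rust DashMap.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStatement:
    """Hypothetical stand-in for a prepared statement tied to one connection."""

    sql: str
    owner_id: int


@dataclass
class FakeConnection:
    """Hypothetical connection that owns its own unbounded statement cache."""

    conn_id: int
    prepare_calls: int = 0
    _cache: dict = field(default_factory=dict)

    def prepare(self, sql: str) -> FakeStatement:
        # Consult only the cache owned by this connection; no shared lock.
        stmt = self._cache.get(sql)
        if stmt is None:
            self.prepare_calls += 1  # models one PREPARE round-trip
            stmt = FakeStatement(sql=sql, owner_id=self.conn_id)
            self._cache[sql] = stmt
        return stmt


a, b = FakeConnection(1), FakeConnection(2)
sql = "SELECT $1::int"
first_a = a.prepare(sql)
second_a = a.prepare(sql)  # served from a's cache
first_b = b.prepare(sql)   # b prepares its own copy
```

In this model a statement is never handed to a connection other than the one that prepared it, which is the correctness property the global cache could not guarantee.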

D2 — execute_many: build statement once, single GIL pass

  • StatementBuilder::build() called once per execute_many invocation, not once per row.
  • Remaining rows reuse the extracted Vec<Type> in a single GIL pass (from_python_typed only).
  • run_pipelined_batch accepts the pre-built Statement — no redundant second prepare.
  • Extracted drain_ordered() free function to deduplicate the FuturesOrdered drain pattern.
  • // TODO(bind-execute-many): marker at the dispatch site citing asyncpg coreproto.pyx:1022-1092 (_EXECUTE_MANY_BUF_NUM=4, _EXECUTE_MANY_BUF_SIZE=32768).

Breaking change (0.12.0): Transaction::execute_many wraps the batch in SAVEPOINT psqlpy_execute_many. On batch failure the savepoint is rolled back and the outer transaction remains live. Callers that previously called transaction.rollback() after a batch error should remove that call.
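
The migration can be illustrated with a small asyncio model. This is a sketch under stated assumptions: `FakeTransaction` is a hypothetical stand-in that only records the SQL it would send, and issuing RELEASE SAVEPOINT on success is an assumption, since the description above only specifies the rollback path.

```python
import asyncio


class FakeTransaction:
    """Hypothetical stand-in that logs the SQL a real transaction would send."""

    def __init__(self) -> None:
        self.log: list = []

    async def _send(self, sql: str) -> None:
        self.log.append(sql)

    async def execute_many(self, sql: str, rows: list) -> None:
        await self._send("SAVEPOINT psqlpy_execute_many")
        try:
            for row in rows:
                if None in row:  # simulated constraint violation
                    raise ValueError(f"bad row: {row!r}")
                await self._send(sql)
        except Exception:
            # Undo only the batch; the outer transaction stays usable.
            await self._send("ROLLBACK TO SAVEPOINT psqlpy_execute_many")
            raise
        # Assumption: the savepoint is released once the batch succeeds.
        await self._send("RELEASE SAVEPOINT psqlpy_execute_many")


async def main() -> FakeTransaction:
    tx = FakeTransaction()
    try:
        await tx.execute_many("INSERT INTO t VALUES ($1)", [[1], [None]])
    except ValueError:
        pass  # no tx.rollback() here: the savepoint was already rolled back
    # The outer transaction keeps working after the failed batch.
    await tx.execute_many("INSERT INTO t VALUES ($1)", [[2]])
    return tx


tx = asyncio.run(main())
```

The caller's `except` branch no longer needs a rollback call; keeping one would discard work done in the outer transaction before the batch.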

D3 — COPY records path: 512 KiB BytesMut streaming encoder

  • Replaced BinaryCopyInWriter::write() per-row loop with a hand-rolled BytesMut encoder.
  • Flush threshold: COPY_BUFFER_SIZE = 524288 bytes (matches asyncpg's _COPY_BUFFER_SIZE).
  • Single streaming pass: open copy_in first, then encode one row at a time → write to BytesMut → flush when ≥ 512 KiB. No intermediate Vec<Vec<Py<PyAny>>> materialization.
  • On GIL conversion error, sink.close().await is called before returning to put the connection back in ReadyForQuery.

Algorithm reference: asyncpg asyncpg/protocol/coreproto.pyx COPY binary protocol implementation.
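
The buffering strategy in D3 can be sketched in Python as follows. This is an illustrative model, not the Rust encoder from this PR: `encode_row` and `stream_copy` are hypothetical names, a bytearray stands in for BytesMut, and `send` stands in for the COPY sink. The wire layout follows PostgreSQL's documented binary COPY format.

```python
import struct

COPY_BUFFER_SIZE = 524_288  # 512 KiB, the threshold named above

# Binary COPY header: 11-byte signature, int32 flags, int32 extension length.
COPY_HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
COPY_TRAILER = struct.pack("!h", -1)  # int16 -1 terminates the stream


def encode_row(fields, out: bytearray) -> None:
    """Append one tuple: int16 field count, then int32 length + bytes per field."""
    out += struct.pack("!h", len(fields))
    for value in fields:
        if value is None:
            out += struct.pack("!i", -1)  # NULL: length -1, no payload
        else:
            out += struct.pack("!i", len(value))
            out += value


def stream_copy(rows, send, threshold: int = COPY_BUFFER_SIZE) -> int:
    """Encode rows one at a time and call send() whenever the buffer fills."""
    buf = bytearray(COPY_HEADER)
    sends = 0
    for row in rows:  # rows may be a generator; nothing is materialized
        encode_row(row, buf)
        if len(buf) >= threshold:
            send(bytes(buf))
            buf.clear()  # reuse the same buffer for the next chunk
            sends += 1
    buf += COPY_TRAILER
    send(bytes(buf))
    return sends + 1


chunks: list = []
rows = ([struct.pack("!i", i), None] for i in range(1000))
# A small threshold is used here only so the demo flushes more than once.
sends = stream_copy(rows, chunks.append, threshold=4096)
```

Raising the threshold reduces the number of `send` calls for the same byte stream, which is the whole effect of moving from 4 KiB to 512 KiB.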

D4 — Cache COPY column-type introspection per (schema, table, columns)

Each copy_records_to_table call previously ran PREPARE + DEALLOCATE for the column-type introspection query (2 extra round-trips). Both PoolConnection and SingleConnection now carry a CopyTypeCache:

pub type CopyTypeCache = DashMap<(Option<String>, String, Vec<String>), Vec<Type>>;

Column order is part of the key — ["a","b"] and ["b","a"] are different COPY targets. Cache is per-connection-checkout.
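
A short Python model shows why column order belongs in the key. It is a sketch: this `CopyTypeCache` class and the `fake_introspect` callback are hypothetical, with a dict standing in for the DashMap and type names standing in for `Type` values.

```python
class CopyTypeCache:
    """Hypothetical per-connection cache keyed by (schema, table, columns)."""

    def __init__(self, introspect) -> None:
        self._introspect = introspect
        self._types: dict = {}
        self.misses = 0

    def column_types(self, schema, table, columns):
        # A tuple keeps column order, so ["a", "b"] and ["b", "a"] differ.
        key = (schema, table, tuple(columns))
        cached = self._types.get(key)
        if cached is None:
            self.misses += 1  # models the introspection round-trips
            cached = self._introspect(schema, table, columns)
            self._types[key] = cached
        return cached


def fake_introspect(schema, table, columns):
    """Stand-in for the column-type query; returns types in request order."""
    known = {"a": "int4", "b": "text"}
    return [known[name] for name in columns]


cache = CopyTypeCache(fake_introspect)
ab_first = cache.column_types(None, "t", ["a", "b"])
ab_again = cache.column_types(None, "t", ["a", "b"])  # cache hit
ba = cache.column_types(None, "t", ["b", "a"])        # different key
```

If the key ignored order, the second lookup pattern would return types in the wrong positions and the encoder would write fields with mismatched types.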

D5 — New Record pyclass + additive records() method

New #[pyclass] Record in src/query_result.rs:

  • Storage: Vec<Py<PyAny>> (eagerly decoded columns) + Arc<RecordDesc> (shared HashMap<String, usize> + name list, one allocation per result set).
  • API matches asyncpg's Record surface: __getitem__ (int / str / slice), __len__, __iter__, __repr__, .get(), .keys(), .values(), .items().
  • Error semantics: __getitem__ raises IndexError for out-of-range int, KeyError for missing str, TypeError for wrong key type.
  • Duplicate column names in records() raise ConnectionExecuteError instead of silently overwriting the index.

QueryResult.records() returns Vec<Record>. result() is unchanged — this is additive, non-breaking.
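
The Record layout and error semantics described above can be modelled in pure Python. This is a sketch, not the pyclass from this PR: the class bodies are illustrative, and ValueError stands in for ConnectionExecuteError on duplicate column names.

```python
class RecordDesc:
    """Shared per-result-set descriptor: column names plus a name-to-index map."""

    def __init__(self, names) -> None:
        index: dict = {}
        for position, name in enumerate(names):
            if name in index:
                # Stand-in for ConnectionExecuteError in the real driver.
                raise ValueError(f"duplicate column name: {name!r}")
            index[name] = position
        self.names = tuple(names)
        self.index = index


class Record:
    """One row: its own values plus a reference to the shared descriptor."""

    __slots__ = ("_values", "_desc")

    def __init__(self, values, desc: RecordDesc) -> None:
        self._values = tuple(values)
        self._desc = desc

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._values[key]  # IndexError when an int is out of range
        if isinstance(key, str):
            return self._values[self._desc.index[key]]  # KeyError when missing
        raise TypeError(f"invalid key type: {type(key).__name__}")

    def __len__(self) -> int:
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def __repr__(self) -> str:
        pairs = " ".join(f"{n}={v!r}" for n, v in self.items())
        return f"<Record {pairs}>"

    def get(self, key, default=None):
        position = self._desc.index.get(key)
        return default if position is None else self._values[position]

    def keys(self):
        return iter(self._desc.names)

    def values(self):
        return iter(self._values)

    def items(self):
        return zip(self._desc.names, self._values)


desc = RecordDesc(["id", "name"])  # built once per result set
r1 = Record([1, "a"], desc)
r2 = Record([2, "b"], desc)        # shares the same descriptor object
```

Each row stores only its values; the name map is paid for once per result set rather than once per row, which is where the saving over a dict per row comes from.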

Type stubs in python/psqlpy/_internal/__init__.pyi updated with class Record and records() signatures.

D6 — Micro-wins

  • T3#7 (from_python.rs): Replace get_type().name() string comparisons with is_exact_instance against GILOnceCell-cached PyTypeObject pointers for uuid.UUID and decimal.Decimal. Uses get_or_try_init so import failures surface as PSQLPyResult errors instead of panics.
  • T3#8 (parameters.rs): ParametersBuilder::prepare returns PreparedParameters::default() before Python::with_gil when params is None. Empty sequences return early inside with_gil before any conversion work.
  • T3#10 (common.rs): Per-row scratch allocation in the COPY encoder reuses one Vec (.clear() between rows instead of re-allocating).

Tests added

  • python/tests/test_record.py — 10 integration tests covering positional/named/slice access, iteration, dict-like methods, shared descriptor, error paths, and coexistence with result().
  • python/tests/test_copy_records.py — 2 new tests: heterogeneous column types + pg_stat_statements-based introspection-cache verification.
  • src/driver/common.rs — 3 Rust unit tests for encode_copy_field (int, null, text).

Algorithm references

  • asyncpg COPY path: asyncpg/protocol/coreproto.pyx (binary COPY encoding, _COPY_BUFFER_SIZE = 524_288)
  • asyncpg execute_many: asyncpg/protocol/coreproto.pyx:1022-1092 (_EXECUTE_MANY_BUF_NUM=4, _EXECUTE_MANY_BUF_SIZE=32768)
  • asyncpg Record: asyncpg/protocol/record.pyx (PyVarObject + inline column pointer array + shared desc dict)

Breaking changes in 0.12.0

| Area | Change | Migration |
| --- | --- | --- |
| Transaction::execute_many | Wraps batch in SAVEPOINT; outer transaction survives a batch failure | Remove any transaction.rollback() call that immediately follows a caught execute_many error |
| QueryResult.result() | Unchanged; still returns list[dict] | None |
| QueryResult.records() | New additive method returning list[Record] | Opt-in |

D1 — Delete global STMTS_CACHE; add per-connection caches:
  - Delete src/statement/cache.rs (process-global RwLock<HashMap> was
    incorrect for cross-connection Statement reuse and a serialization point)
  - PoolConnection: unchanged — deadpool's prepare_cached already correct
  - SingleConnection: gains DashMap<String, Statement> per-connection cache;
    prepare() consults/inserts it on every prepared query
  - dashmap = "6" added to Cargo.toml

D2 — execute_many: build statement once, single GIL pass:
  - StatementBuilder::build() called once per execute_many call, not per row
  - Remaining rows reuse extracted Vec<Type> in a single GIL pass
  - run_pipelined_batch: accepts pre-built Statement, no redundant prepare
  - TODO(bind-execute-many) marker left citing asyncpg coreproto.pyx:1022-1092

D3 — COPY records path: 512 KiB BytesMut streaming encoder:
  - Replace BinaryCopyInWriter per-row flush (4 KiB) with hand-rolled encoder
    flushing at 512 KiB (COPY_BUFFER_SIZE = 524288, matches asyncpg's value)
  - Single streaming pass: open copy_in before GIL, encode+flush per row
  - Eliminates intermediate Vec<Vec<Py<PyAny>>> materialization

D4 — Cache COPY column-type introspection per (schema, table, columns):
  - Both PoolConnection and SingleConnection gain CopyTypeCache (DashMap)
  - copy_records_to_table checks cache before issuing PREPARE+DEALLOCATE

D5 — Record pyclass + additive records() method:
  - New #[pyclass] Record: Vec<Py<PyAny>> + Arc<RecordDesc> (shared col map)
  - Implements __getitem__ (int/str/slice), __len__, __iter__, __repr__,
    get(), keys(), values(), items() — matches asyncpg Record surface
  - QueryResult::records() returns Vec<Record>; result() unchanged (additive)
  - Type stubs in python/psqlpy/_internal/__init__.pyi updated

D6 — Micro-wins:
  - T3#7: is_exact_instance dispatch in from_python.rs (GILOnceCell-cached
    PyTypeObject pointers for UUID + Decimal replace string name comparison)
  - T3#8: ParametersBuilder::prepare early-returns before Python::with_gil
    when params are None or empty
  - T3#10: per-row scratch Vec cleared between rows in COPY encoder

Tests: 17 new pytest tests (test_record.py + test_copy_records.py extensions)
Lint: ruff D205/PLR2004 suppressed for test files in pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dev-iL force-pushed the 2605/performance_improvements branch from f91d130 to 814e4da on May 24, 2026 11:00