graph LR
Database_Abstraction_Layer["Database Abstraction Layer"]
Splink_Data_Abstraction["Splink Data Abstraction"]
Backend_Specific_Database_APIs["Backend-Specific Database APIs"]
SQL_Query_Pipeline_Orchestration["SQL Query Pipeline Orchestration"]
SQL_Dialect_Management["SQL Dialect Management"]
SQL_Generation_Modules["SQL Generation Modules"]
Database_Abstraction_Layer -- "delegates operations to" --> Backend_Specific_Database_APIs
Backend_Specific_Database_APIs -- "provides backend-specific logic to" --> Database_Abstraction_Layer
Database_Abstraction_Layer -- "converts query results into" --> Splink_Data_Abstraction
Splink_Data_Abstraction -- "triggers dropping of temporary tables in" --> Database_Abstraction_Layer
SQL_Generation_Modules -- "enqueues SQL snippets into" --> SQL_Query_Pipeline_Orchestration
SQL_Query_Pipeline_Orchestration -- "consults" --> SQL_Dialect_Management
SQL_Query_Pipeline_Orchestration -- "passes final generated query to" --> Database_Abstraction_Layer
The Data I/O & Backend Abstraction subsystem in Splink provides a unified interface for data loading, manages data representations, and abstracts interactions with the supported database backends (DuckDB, Spark, Postgres, etc.) through SQL generation and execution. It is the pluggable data access layer that decouples core Splink logic from the underlying data storage and processing engines.
The Database Abstraction Layer provides a unified, high-level interface for all database operations. It orchestrates SQL execution, manages table registration and deletion, and caches query results, acting as the primary entry point for data interaction from the rest of the Splink library.
Related Classes/Methods:
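A minimal sketch of such a layer is shown below. The class and method names (`DatabaseAPI`, `sql_to_table`, `_execute`) are illustrative, not Splink's actual API; the point is the shape: a shared entry point that handles caching, with execution deferred to a backend-specific subclass.

```python
from abc import ABC, abstractmethod


class DatabaseAPI(ABC):
    """Unified entry point for SQL execution, table management, and caching."""

    def __init__(self):
        self._cache = {}  # query text -> previously materialised result

    def sql_to_table(self, sql: str, table_name: str):
        """Execute SQL once; repeated identical queries hit the cache."""
        if sql not in self._cache:
            self._cache[sql] = self._execute(sql, table_name)
        return self._cache[sql]

    @abstractmethod
    def _execute(self, sql: str, table_name: str):
        """Backend-specific execution, supplied by a concrete subclass."""
```

Callers interact only with `sql_to_table`; swapping the backend never changes calling code.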
The Splink Data Abstraction offers a consistent, backend-agnostic representation of data within Splink, decoupling the library's logic from the specific underlying storage (e.g., Pandas DataFrames, Spark DataFrames, database tables). It manages the lifecycle of the underlying physical table, including its eventual dropping.
Related Classes/Methods:
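The idea can be sketched as a thin handle that separates the logical name used in SQL from the physical table in the backend, and delegates row access and cleanup back to the abstraction layer (matching the "triggers dropping of temporary tables" edge in the graph). Names here are illustrative, not Splink's actual classes.

```python
class SplinkDataFrame:
    """Backend-agnostic handle on a physical table."""

    def __init__(self, templated_name: str, physical_name: str, db_api):
        self.templated_name = templated_name   # logical name referenced in SQL
        self.physical_name = physical_name     # concrete table in the backend
        self.db_api = db_api

    def as_record_dict(self):
        """Fetch the underlying rows through the abstraction layer."""
        return self.db_api.fetch_records(f"SELECT * FROM {self.physical_name}")

    def drop_table_from_database(self):
        """End of life: ask the abstraction layer to drop the physical table."""
        self.db_api.delete_table(self.physical_name)
```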
The Backend-Specific Database APIs implement concrete database operations for individual backends. They handle backend-specific SQL execution, table creation and deletion, and initialization tasks such as registering User-Defined Functions (UDFs) for their respective database systems. This embodies the Strategy Pattern for pluggable backends.
Related Classes/Methods:
splink.internals.athena.database_api
splink.internals.duckdb.database_api
splink.internals.postgres.database_api
splink.internals.spark.database_api
splink.internals.sqlite.database_api
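As an illustration of one such strategy, the sketch below implements the common interface against Python's built-in SQLite, including a backend-specific initialization step (registering a UDF). The class names and `reverse_str` UDF are invented for the example and do not correspond to Splink's real modules.

```python
import sqlite3


class BackendAPI:
    """Common interface each backend strategy implements."""
    def execute_sql(self, sql):
        raise NotImplementedError
    def delete_table(self, name):
        raise NotImplementedError


class SQLiteAPI(BackendAPI):
    """Concrete strategy backed by an in-memory SQLite connection."""
    def __init__(self):
        self.con = sqlite3.connect(":memory:")
        # Backend-specific initialization: register a UDF only this engine needs
        self.con.create_function("reverse_str", 1, lambda s: s[::-1])

    def execute_sql(self, sql):
        return self.con.execute(sql).fetchall()

    def delete_table(self, name):
        self.con.execute(f"DROP TABLE IF EXISTS {name}")
```

A Spark or Postgres strategy would implement the same two methods against its own engine; the caller never branches on backend type.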
SQL Query Pipeline Orchestration manages the construction and flow of complex SQL queries, typically composed of multiple Common Table Expressions (CTEs). It lets other modules enqueue SQL statements and then generates the final, executable query.
Related Classes/Methods:
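The enqueue-then-generate flow can be sketched as below. This is a simplified stand-in, assuming each enqueued step names its output table and the last step becomes the final SELECT; Splink's real pipeline class handles more (caching, name templating).

```python
class CTEPipeline:
    """Collects enqueued SQL steps and renders them as a single WITH query."""

    def __init__(self):
        self._steps = []  # (output_table_name, sql) in pipeline order

    def enqueue_sql(self, sql: str, output_table_name: str):
        self._steps.append((output_table_name, sql))

    def generate_cte_query(self) -> str:
        *ctes, (_, final_sql) = self._steps
        if not ctes:
            return final_sql
        with_clause = ",\n".join(f"{name} AS ({sql})" for name, sql in ctes)
        return f"WITH {with_clause}\n{final_sql}"
```

Each intermediate step becomes a named CTE that later steps can reference, so a multi-stage transformation executes as one round trip to the database.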
SQL Dialect Management provides functions and expressions specific to each SQL dialect, ensuring that the generated SQL is compatible with the target database backend by handling variations in function names, syntax, and available features.
Related Classes/Methods:
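One concrete example of such a variation: Spark SQL spells its random-number function `rand()` while DuckDB and Postgres use `random()`. A dialect object lets SQL generation code stay engine-agnostic. The class and property names below are illustrative, not Splink's actual dialect API.

```python
class SqlDialect:
    """Base dialect; subclasses override whatever varies between engines."""
    name = "ansi"

    @property
    def random_function(self) -> str:
        return "random()"


class SparkDialect(SqlDialect):
    name = "spark"

    @property
    def random_function(self) -> str:
        return "rand()"  # Spark SQL's name for the same function


def random_sample_sql(dialect: SqlDialect, fraction: float) -> str:
    """Dialect-aware SQL to sample roughly `fraction` of the rows."""
    return f"SELECT * FROM input_df WHERE {dialect.random_function} < {fraction}"
```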
The SQL Generation Modules produce the SQL snippets required for specific data transformations within the Splink pipeline, such as computing term frequencies for comparisons or vertically concatenating input datasets.
Related Classes/Methods:
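Both transformations mentioned above reduce to small SQL-building functions like the following sketch. The function names and the `__splink__df_concat` table name are assumptions for illustration; the real modules enqueue their output into the CTE pipeline rather than returning bare strings.

```python
def term_frequency_sql(column: str, table: str = "__splink__df_concat") -> str:
    """SQL computing each value's relative frequency in `column`."""
    return (
        f"SELECT {column}, "
        f"CAST(COUNT(*) AS FLOAT) / SUM(COUNT(*)) OVER () AS tf_{column} "
        f"FROM {table} WHERE {column} IS NOT NULL GROUP BY {column}"
    )


def vertically_concatenate_sql(table_names: list) -> str:
    """UNION ALL the input tables into one combined relation."""
    return " UNION ALL ".join(f"SELECT * FROM {t}" for t in table_names)
```

A value appearing in 2 of 3 non-null rows gets a term frequency of 2/3, which downstream comparison levels can use to down-weight matches on common values.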