graph LR
CLI_Handler["CLI Handler"]
Conductor["Conductor"]
DAG_Manager["DAG Manager"]
Database_Client["Database Client"]
SQL_Dialect_Handler["SQL Dialect Handler"]
Script_Handler["Script Handler"]
Session_Manager["Session Manager"]
CLI_Handler -- "initiates" --> Conductor
Conductor -- "manages" --> DAG_Manager
Conductor -- "uses" --> Database_Client
Database_Client -- "uses" --> SQL_Dialect_Handler
Conductor -- "manages" --> Session_Manager
Session_Manager -- "processes" --> Script_Handler
click CLI_Handler href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/CLI Handler.md" "Details"
click Conductor href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/Conductor.md" "Details"
click DAG_Manager href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/DAG Manager.md" "Details"
click Database_Client href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/Database Client.md" "Details"
click SQL_Dialect_Handler href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/SQL Dialect Handler.md" "Details"
click Script_Handler href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/Script Handler.md" "Details"
click Session_Manager href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/lea/Session Manager.md" "Details"
Lea is a data transformation tool that orchestrates the execution of SQL scripts across different database environments. It parses command-line arguments, builds a directed acyclic graph (DAG) of SQL scripts from their dependencies, and executes the scripts in dependency order using database-specific clients. It also manages database sessions, handles SQL dialect variations, tests data quality, and manages audit tables.
The CLI Handler is the entry point for the Lea application. It parses command-line arguments using lea.cli:run, sets up the environment, and starts the data transformation pipeline by calling the Conductor with the user-provided arguments.
Related Classes/Methods:
The Conductor orchestrates the entire data transformation process. It initializes database clients, prepares the database session, manages audit tables, and controls the overall execution flow. It relies on the DAG Manager for script dependencies and on the Session Manager to execute the scripts in dependency order within a database session.
Related Classes/Methods:
- lea.conductor.Conductor (22:351)
- lea.conductor.Conductor:__init__ (23:88)
- lea.conductor.Conductor:run (90:118)
- lea.conductor.Conductor:prepare_session (120:221)
- lea.conductor.Conductor:run_session (223:254)
- lea.conductor.Conductor:make_client (256:325)
- lea.conductor:materialize_scripts (354:396)
- lea.conductor:delete_audit_tables (424:441)
- lea.conductor:delete_orphan_tables (444:457)
The DAG Manager constructs and manipulates the Directed Acyclic Graph (DAG) of SQL scripts. It determines the execution order of scripts based on their dependencies. It allows selecting subgraphs based on changed table references and iterating through ancestors and descendants of nodes in the graph. The DAG Manager provides the Conductor with the necessary information to execute scripts in the correct order, ensuring that dependencies are met before a script is executed.
Related Classes/Methods:
- lea.dag.DAGOfScripts (15:182)
- lea.dag.DAGOfScripts:from_directory (32:66)
- lea.dag.DAGOfScripts:select (68:148)
- lea.dag.DAGOfScripts:iter_ancestors (173:176)
- lea.dag.DAGOfScripts:iter_descendants (178:182)
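
To make the DAG Manager's role concrete, here is a minimal sketch of dependency-ordered execution and descendant traversal. It uses Python's standard `graphlib` module rather than Lea's own `DAGOfScripts` class, and the table names and the `execution_order` and `iter_descendants` helpers are hypothetical stand-ins, not Lea's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each table maps to the set of tables it reads from.
dependencies = {
    "staging.orders": set(),
    "staging.customers": set(),
    "core.orders": {"staging.orders", "staging.customers"},
    "analytics.revenue": {"core.orders"},
}


def execution_order(deps):
    """Return one valid order in which every table comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def iter_descendants(deps, node):
    """Yield every table that depends, directly or transitively, on `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child, parents in deps.items():
            if current in parents and child not in seen:
                seen.add(child)
                stack.append(child)
                yield child


order = execution_order(dependencies)
# Tables that would need refreshing if staging.orders changed.
affected = set(iter_descendants(dependencies, "staging.orders"))
```

Selecting a subgraph for changed tables can then be approximated as the changed tables plus their descendants, executed in the order given by `execution_order`.
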
The Database Client provides an abstraction layer over the supported database systems (BigQuery, DuckDB, MotherDuck, DuckLake). It materializes scripts, queries data, clones and deletes tables, and lists table statistics and fields. The Conductor uses it to execute scripts and manage data in a specific database without depending on system-specific details.
Related Classes/Methods:
- lea.databases.BigQueryClient (266:532)
- lea.databases.DuckDBClient (584:758)
- lea.databases.MotherDuckClient (761:772)
- lea.databases.DuckLakeClient (775:786)
- lea.databases.BigQueryClient:__init__ (267:311)
- lea.databases.BigQueryClient:materialize_script (334:337)
- lea.databases.BigQueryClient:materialize_sql_script (339:380)
- lea.databases.BigQueryClient:query_script (382:385)
- lea.databases.BigQueryClient:query_sql_script (387:404)
- lea.databases.BigQueryClient:clone_table (406:428)
- lea.databases.BigQueryClient:delete_and_insert (430:467)
- lea.databases.BigQueryClient:delete_table (469:487)
- lea.databases.BigQueryClient:list_table_stats (489:506)
- lea.databases.BigQueryClient:list_table_fields (508:522)
- lea.databases.DuckDBClient:materialize_script (606:609)
- lea.databases.DuckDBClient:materialize_sql_script (611:634)
- lea.databases.DuckDBClient:query_script (636:640)
- lea.databases.DuckDBClient:clone_table (642:658)
- lea.databases.DuckDBClient:delete_and_insert (660:684)
- lea.databases.DuckDBClient:delete_table (686:697)
- lea.databases.DuckDBClient:list_table_stats (706:736)
- lea.databases.DuckDBClient:list_table_fields (738:750)
- lea.databases.DuckDBClient:make_job_config (752:758)
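
The abstraction-layer idea can be illustrated with a small sketch. The `DatabaseClient` interface and the `SQLiteClient` backend below are assumptions made for this example only: Lea's actual clients target BigQuery, DuckDB, MotherDuck, and DuckLake, and their method signatures are not reproduced here. SQLite is used because it ships with Python.

```python
import sqlite3
from abc import ABC, abstractmethod


class DatabaseClient(ABC):
    """Hypothetical minimal interface a caller could program against."""

    @abstractmethod
    def materialize(self, table_name: str, select_sql: str) -> None: ...

    @abstractmethod
    def delete_table(self, table_name: str) -> None: ...

    @abstractmethod
    def list_tables(self) -> list: ...


class SQLiteClient(DatabaseClient):
    """Illustrative backend; not one of the systems Lea supports."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)

    def materialize(self, table_name, select_sql):
        # Replace the table with the result of the SELECT statement.
        # Identifiers are interpolated directly, so only pass trusted names.
        self.conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        self.conn.execute(f'CREATE TABLE "{table_name}" AS {select_sql}')

    def delete_table(self, table_name):
        self.conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]


client = SQLiteClient()
client.materialize("orders", "SELECT 1 AS id, 10.0 AS amount")
tables_after_materialize = client.list_tables()
client.delete_table("orders")
tables_after_delete = client.list_tables()
```

Code written against `DatabaseClient` does not need to know which backend is in use, which is the property the Conductor relies on.
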
The SQL Dialect Handler provides dialect-specific functionality: parsing and formatting table references, converting table references to database-specific formats, and generating SQL code for data quality tests. The Database Client uses it to adapt SQL scripts to the target database system.
Related Classes/Methods:
- lea.dialects.BigQueryDialect (121:172)
- lea.dialects.DuckDBDialect (175:229)
- lea.dialects.SQLDialect (15:108)
- lea.dialects.SQLDialect:make_column_test_unique (26:30)
- lea.dialects.SQLDialect:make_column_test_unique_by (32:38)
- lea.dialects.SQLDialect:make_column_test_no_nulls (40:44)
- lea.dialects.SQLDialect:make_column_test_set (46:52)
- lea.dialects.SQLDialect:add_dependency_filters (55:76)
- lea.dialects.SQLDialect:handle_incremental_dependencies (79:108)
- lea.dialects.BigQueryDialect:parse_table_ref (125:153)
- lea.dialects.DuckDBDialect:parse_table_ref (179:202)
- lea.dialects.DuckDBDialect:convert_table_ref_to_duckdb_table_reference (228:229)
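
As an illustration of generating SQL for data quality tests, the sketch below builds a uniqueness check and a no-nulls check as plain query strings. The helper names `unique_test_sql` and `no_nulls_test_sql` are hypothetical, and the convention that a test passes when its query returns no rows is an assumption of this example, not a statement about Lea's generated SQL. SQLite is used only to run the queries.

```python
import sqlite3


def unique_test_sql(table: str, column: str) -> str:
    """Build a query that returns one row per duplicated value of `column`."""
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM {table} GROUP BY {column} HAVING COUNT(*) > 1"
    )


def no_nulls_test_sql(table: str, column: str) -> str:
    """Build a query that returns every row where `column` is NULL."""
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (1, "b@example.com"), (2, None)],
)

# Assumed convention: an empty result means the test passes.
duplicate_ids = conn.execute(unique_test_sql("users", "id")).fetchall()
null_emails = conn.execute(no_nulls_test_sql("users", "email")).fetchall()
```

Here both tests fail on purpose: `id` 1 appears twice and one `email` is NULL, so each query returns a row describing the violation.
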
The Script Handler reads, parses, and processes SQL scripts. It extracts dependencies, identifies assertion tests, and adds context to the scripts. The Conductor uses it to determine the dependencies between scripts and to prepare them for execution.
Related Classes/Methods:
- lea.scripts.SQLScript (23:212)
- lea.scripts.SQLScript:from_path (74:100)
- lea.scripts.SQLScript:dependencies (115:140)
- lea.scripts.SQLScript:assertion_tests (143:204)
- lea.scripts:read_scripts (218:243)
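
Dependency extraction can be sketched with a simple regular expression over `FROM` and `JOIN` clauses. This is a deliberately naive stand-in: the `extract_dependencies` helper is hypothetical, it is not how `SQLScript.dependencies` is implemented, and a production tool would more likely rely on a real SQL parser.

```python
import re

# Matches the identifier that follows FROM or JOIN, including dotted names.
# Limitations of this sketch: CTE names are reported as dependencies, and
# references inside comments or string literals are not filtered out.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_dependencies(sql: str) -> set:
    """Return the set of table references found in a SQL script."""
    return set(TABLE_REF.findall(sql))


script = """
SELECT o.id, c.name
FROM staging.orders AS o
LEFT JOIN staging.customers AS c ON o.customer_id = c.id
"""

deps = extract_dependencies(script)
```

The resulting set is the kind of information a DAG builder needs: one edge from each referenced table to the table the script produces.
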
The Session Manager executes SQL scripts within a database session. It adds context to scripts, runs them, monitors job execution, and promotes audit tables. The Conductor delegates script execution to it for a specific database session.
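
One way to picture running scripts and then promoting audit tables is a write-audit-publish flow: each script is first materialized into a temporary audit table, and the audit tables replace the real ones only if every script succeeds. The sketch below assumes that flow; the `_audit` suffix, the `run_session` function, and the use of SQLite are all hypothetical choices for this example, and rewriting downstream scripts to read from upstream audit tables is omitted.

```python
import sqlite3

AUDIT_SUFFIX = "_audit"  # hypothetical naming convention


def run_session(conn, scripts):
    """Materialize (table, select_sql) pairs into audit tables, then promote them."""
    audit_tables = []
    try:
        for table, select_sql in scripts:
            audit = table + AUDIT_SUFFIX
            conn.execute(f'DROP TABLE IF EXISTS "{audit}"')
            conn.execute(f'CREATE TABLE "{audit}" AS {select_sql}')
            audit_tables.append((table, audit))
    except sqlite3.Error:
        # A script failed: discard the audit tables and leave real tables untouched.
        for _, audit in audit_tables:
            conn.execute(f'DROP TABLE IF EXISTS "{audit}"')
        raise
    # Every script succeeded: promote each audit table to its final name.
    for table, audit in audit_tables:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'ALTER TABLE "{audit}" RENAME TO "{table}"')


def list_tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
run_session(conn, [("orders", "SELECT 1 AS id")])
tables_after_success = list_tables(conn)

try:
    run_session(
        conn,
        [("customers", "SELECT 2 AS id"), ("broken", "SELECT * FROM missing_table")],
    )
    failed = False
except sqlite3.Error:
    failed = True
tables_after_failure = list_tables(conn)
```

The second call fails on the `broken` script, so neither `customers` nor any audit table is left behind and the previously promoted `orders` table is unchanged.
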
Related Classes/Methods: