graph LR
Application_Orchestrator["Application Orchestrator"]
Kafka_Client["Kafka Client"]
Stream_Processing_Engine["Stream Processing Engine"]
Data_Sources["Data Sources"]
Data_Sinks["Data Sinks"]
State_Store["State Store"]
Serialization_Deserialization_SerDe_["Serialization/Deserialization (SerDe)"]
Topic_Manager["Topic Manager"]
Data_Enrichment_Joins["Data Enrichment & Joins"]
Application_Orchestrator -- "initiates and manages" --> Kafka_Client
Application_Orchestrator -- "initiates and manages" --> Topic_Manager
Application_Orchestrator -- "initiates and manages" --> Stream_Processing_Engine
Application_Orchestrator -- "initiates and manages" --> Data_Sources
Application_Orchestrator -- "initiates and manages" --> Data_Sinks
Application_Orchestrator -- "initiates and manages" --> State_Store
Data_Sources -- "produces data to" --> Kafka_Client
Kafka_Client -- "consumes data for" --> Stream_Processing_Engine
Kafka_Client -- "utilizes" --> Serialization_Deserialization_SerDe_
Stream_Processing_Engine -- "interacts with" --> Kafka_Client
Stream_Processing_Engine -- "utilizes" --> State_Store
Stream_Processing_Engine -- "utilizes" --> Data_Enrichment_Joins
Stream_Processing_Engine -- "routes processed data to" --> Data_Sinks
Topic_Manager -- "interacts with Kafka via" --> Kafka_Client
click Application_Orchestrator href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Application_Orchestrator.md" "Details"
click Kafka_Client href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Kafka_Client.md" "Details"
click Stream_Processing_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Stream_Processing_Engine.md" "Details"
click Data_Sources href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Data_Sources.md" "Details"
click Data_Sinks href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Data_Sinks.md" "Details"
click State_Store href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/State_Store.md" "Details"
click Topic_Manager href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Topic_Manager.md" "Details"
click Data_Enrichment_Joins href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/quix-streams/Data_Enrichment_Joins.md" "Details"
The quixstreams project is designed as a robust streaming data processing framework, centered around a reactive data flow paradigm. The Application Orchestrator acts as the central control plane, responsible for the setup and lifecycle management of the entire streaming application. It configures and initializes key components such as the Kafka Client, Topic Manager, Stream Processing Engine, Data Sources, Data Sinks, and State Store.
Data ingress begins with Data Sources which produce raw data into Kafka topics, facilitated by the Kafka Client. The Kafka Client is the primary interface for all Kafka interactions, handling both data consumption and production, and relies on the Serialization/Deserialization (SerDe) component for data format conversions.
The core of the data processing lies within the Stream Processing Engine. This engine consumes messages from Kafka via the Kafka Client, applies defined transformations, aggregations, and routing logic using its declarative API. During processing, it can leverage the State Store for persisting stateful operations and the Data Enrichment & Joins component for combining data streams or enriching them with external information. Once processed, the data is routed to Data Sinks for external consumption, again utilizing the Kafka Client for publishing to Kafka topics before being consumed by external systems. The Topic Manager works in conjunction with the Kafka Client to manage Kafka topics, ensuring proper configuration and co-partitioning for efficient stream processing operations.
- Application Initialization & Management: The
Application Orchestratorinitiates and manages theKafka Client,Topic Manager,Stream Processing Engine,Data Sources,Data Sinks, andState Store. This represents the setup and control flow of the entire application. - Data Ingestion Flow:
Data Sourcesproduce data to theKafka Client, which then makes it available for theStream Processing Engine. This is the primary path for raw data entering the system. - Data Processing Flow: The
Stream Processing Engineconsumes data from theKafka Client, processes it (potentially interacting withState StoreandData Enrichment & Joins), and then routes the processed data toData Sinksvia theKafka Client. This highlights the core data transformation pipeline. - Kafka Interaction: The
Kafka Clientis central to all Kafka-related operations, consuming data for theStream Processing Engine, being utilized bySerialization/Deserialization (SerDe), and interacting with theTopic Managerfor administrative tasks.
Application Orchestrator [Expand]
The central control unit responsible for initializing, configuring, and managing the overall lifecycle of the streaming application. It orchestrates the data flow by setting up sources, sinks, and the processing engine.
Related Classes/Methods:
Kafka Client [Expand]
Manages all low-level interactions with Apache Kafka, including producing and consuming messages. It acts as the primary communication channel for data ingress and egress within the application.
Related Classes/Methods:
Stream Processing Engine [Expand]
Provides a high-level, declarative API (StreamingDataFrame) for defining data transformations, aggregations, and routing logic on streaming data. It consumes messages from Kafka, processes them, and produces results.
Related Classes/Methods:
Data Sources [Expand]
Modules responsible for ingesting data from various external systems into Kafka topics, serving as the entry points for raw data into the streaming application.
Related Classes/Methods:
Data Sinks [Expand]
Modules responsible for publishing processed data from Kafka topics to various external systems, serving as the exit points for processed data from the streaming application.
Related Classes/Methods:
State Store [Expand]
Provides an interface for managing and persisting state associated with stream processing operations (e.g., aggregations, joins), supporting different storage backends.
Related Classes/Methods:
Handles the conversion of data between its raw byte representation (for Kafka transport) and structured Python objects used within the application, supporting various data formats.
Related Classes/Methods:
Topic Manager [Expand]
Interacts with Kafka's Admin API to manage topics, including creation, configuration, and ensuring co-partitioning for operations like joins.
Related Classes/Methods:
Data Enrichment & Joins [Expand]
Facilitates the enrichment of streaming data with external information and performs join operations between different data streams.
Related Classes/Methods: