graph LR
Address_Parsing_API["Address Parsing API"]
Address_Pre_processing_Feature_Extraction["Address Pre-processing & Feature Extraction"]
Probabilistic_Tagging_Engine_CRF_["Probabilistic Tagging Engine (CRF)"]
Training_Data_Processors["Training Data Processors"]
Address_Parsing_API -- "Passes raw address string for initial processing" --> Address_Pre_processing_Feature_Extraction
Address_Pre_processing_Feature_Extraction -- "Provides tokenized and featured data for tagging" --> Probabilistic_Tagging_Engine_CRF_
Probabilistic_Tagging_Engine_CRF_ -- "Returns the classified and tagged address components" --> Address_Parsing_API
Training_Data_Processors -- "Generates and supplies training data for model updates" --> Probabilistic_Tagging_Engine_CRF_
click Address_Parsing_API href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/usaddress/Address_Parsing_API.md" "Details"
click Address_Pre_processing_Feature_Extraction href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/usaddress/Address_Pre_processing_Feature_Extraction.md" "Details"
click Probabilistic_Tagging_Engine_CRF_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/usaddress/Probabilistic_Tagging_Engine_CRF_.md" "Details"
click Training_Data_Processors href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/usaddress/Training_Data_Processors.md" "Details"
The usaddress library is organized as an address parsing pipeline with a clear data flow and an extensible probabilistic model at its core. The Address Parsing API is the entry point: it receives a raw address string and orchestrates its transformation into structured output. The string first passes through the Address Pre-processing & Feature Extraction stage, which tokenizes it and computes features for each token. The Probabilistic Tagging Engine (CRF) then applies a trained model to assign an address component tag to each token. A separate Training Data Processors component generates and supplies the training data used to update that model. Because the parsing logic and the probabilistic model are kept apart, each can evolve independently, which makes usaddress suitable as an "Address Parsing Service" layer within larger systems.
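The data flow in the diagram can be sketched as three pluggable stages composed by a thin entry point. This is a minimal illustration of the separation of concerns, not the library's code: `make_parser` and the three stand-in stages (`split_tokens`, `simple_features`, `rule_tagger`) are hypothetical names, and the rule-based tagger only takes the place of the trained model.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]


def make_parser(
    tokenize: Callable[[str], List[str]],
    featurize: Callable[[List[str]], List[Features]],
    tag_sequence: Callable[[List[Features]], List[str]],
) -> Callable[[str], List[Tuple[str, str]]]:
    """Compose the three stages into one parse function (hypothetical helper)."""

    def parse(address: str) -> List[Tuple[str, str]]:
        tokens = tokenize(address)
        if not tokens:
            return []
        labels = tag_sequence(featurize(tokens))
        # Pair every token with the label predicted for it.
        return list(zip(tokens, labels))

    return parse


# Stand-in stages, for illustration only.
def split_tokens(address: str) -> List[str]:
    return address.split()


def simple_features(tokens: List[str]) -> List[Features]:
    return [{"is_numeric": token.isdigit()} for token in tokens]


def rule_tagger(features: List[Features]) -> List[str]:
    # A trained sequence model would sit here; this rule is a placeholder.
    return ["NUMBER" if f["is_numeric"] else "WORD" for f in features]


parse = make_parser(split_tokens, simple_features, rule_tagger)
print(parse("123 Main"))  # [('123', 'NUMBER'), ('Main', 'WORD')]
```

Because each stage is passed in as a function, the tagger can be swapped or retrained without touching tokenization or the entry point.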
Address Parsing API
The public interface of the usaddress library, responsible for receiving raw address strings, orchestrating the parsing workflow, and returning structured address data. It serves as the primary integration point.
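As a rough sketch of what "returning structured address data" can mean, the snippet below merges per-token labels into one field per label. It assumes the tagging step yields a list of `(token, label)` pairs, which is the shape the library's `parse` function is understood to return. `merge_labeled_tokens` and `parse_with_usaddress` are hypothetical helpers written for this example, and the label names in the sample input are hand-written for illustration.

```python
from typing import Dict, List, Tuple


def merge_labeled_tokens(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Join consecutive tokens that share a label into a single field."""
    merged: Dict[str, str] = {}
    previous_label = None
    for token, label in pairs:
        cleaned = token.strip(" ,;")  # drop separators attached to the token
        if label == previous_label:
            merged[label] += " " + cleaned
        elif label in merged:
            # The same label appearing in two separate places is ambiguous.
            raise ValueError(f"label {label!r} repeats non-consecutively")
        else:
            merged[label] = cleaned
        previous_label = label
    return merged


def parse_with_usaddress(address: str) -> List[Tuple[str, str]]:
    """Call the real library; requires the third-party usaddress package."""
    import usaddress  # assumption: installed separately, e.g. via pip

    return usaddress.parse(address)


sample = [
    ("123", "AddressNumber"),
    ("Lake", "StreetName"),
    ("Shore", "StreetName"),
    ("Dr,", "StreetNamePostType"),
    ("Chicago,", "PlaceName"),
    ("IL", "StateName"),
]
print(merge_labeled_tokens(sample))
```

With the package installed, `merge_labeled_tokens(parse_with_usaddress("123 Lake Shore Dr, Chicago, IL"))` would apply the same merging to real model output.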
Related Classes/Methods:
Address Pre-processing & Feature Extraction
Transforms raw address strings into a format suitable for the probabilistic model by tokenizing input and extracting relevant features.
Related Classes/Methods:
usaddress.__init__.tokenize:731-749
usaddress.__init__.tokens2features:785-807
usaddress.__init__.tokenFeatures:755-782
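To make the three roles concrete (splitting the string, describing one token, and adding context from neighbouring tokens), here is a deliberately simplified sketch. The `sketch_*` functions are hypothetical stand-ins that mirror those roles. They are not the library's implementations, which use their own tokenization rules and a richer feature set.

```python
import re
import string
from typing import Dict, List


def sketch_tokenize(address: str) -> List[str]:
    """Split on whitespace, keeping '#' and '&' as separate tokens."""
    return re.findall(r"[#&]|[^\s#&]+", address)


def sketch_token_features(token: str) -> Dict[str, object]:
    """Describe a single token in isolation."""
    core = token.strip(string.punctuation)
    return {
        "lower": core.lower(),
        "is_numeric": core.isdigit(),
        "length": len(core),
        "ends_with_comma": token.endswith(","),
    }


def sketch_tokens_to_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Add neighbour features so a sequence model can use local context."""
    per_token = [sketch_token_features(token) for token in tokens]
    sequence = []
    for i, features in enumerate(per_token):
        item = dict(features)
        if i > 0:
            item["previous"] = per_token[i - 1]
        else:
            item["is_first"] = True
        if i < len(per_token) - 1:
            item["next"] = per_token[i + 1]
        else:
            item["is_last"] = True
        sequence.append(item)
    return sequence


tokens = sketch_tokenize("123 Main St. #4, Chicago, IL")
print(tokens)  # ['123', 'Main', 'St.', '#', '4,', 'Chicago,', 'IL']
features = sketch_tokens_to_features(tokens)
print(features[0]["next"]["lower"])  # main
```

Neighbour features matter because the same token can play different roles: a number at the start of an address and a number after a unit marker usually deserve different labels.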
Probabilistic Tagging Engine (CRF)
The core of the library, encapsulating the trained Conditional Random Field (CRF) model. It applies learned patterns to the token features to predict an address component label for each token.
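A linear-chain CRF scores a whole label sequence by combining per-token feature weights with label-to-label transition weights, then picks the highest-scoring sequence. The sketch below shows that decoding step (Viterbi) with a handful of hand-picked toy weights. `viterbi_decode`, the feature names, and the weights are assumptions for illustration only and bear no relation to the trained model shipped with the library.

```python
from typing import Dict, List, Set, Tuple


def viterbi_decode(
    features: List[Set[str]],
    labels: List[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a list of feature sets."""
    if not features:
        return []

    def emit(token_features: Set[str], label: str) -> float:
        return sum(emission.get((f, label), 0.0) for f in token_features)

    # Each column maps label -> (best score ending in label, previous label).
    columns = [{label: (emit(features[0], label), None) for label in labels}]
    for token_features in features[1:]:
        column = {}
        for label in labels:
            prev_label, score = max(
                (
                    (p, columns[-1][p][0] + transition.get((p, label), 0.0))
                    for p in labels
                ),
                key=lambda candidate: candidate[1],
            )
            column[label] = (score + emit(token_features, label), prev_label)
        columns.append(column)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda label: columns[-1][label][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy weights, chosen by hand for this example.
labels = ["AddressNumber", "StreetName", "StreetNamePostType"]
emission = {
    ("is_numeric", "AddressNumber"): 2.0,
    ("lower=main", "StreetName"): 1.0,
    ("lower=st", "StreetNamePostType"): 2.0,
}
transition = {
    ("AddressNumber", "StreetName"): 1.0,
    ("StreetName", "StreetNamePostType"): 1.0,
}
token_features = [{"is_numeric"}, {"lower=main"}, {"lower=st"}]
print(viterbi_decode(token_features, labels, emission, transition))
# ['AddressNumber', 'StreetName', 'StreetNamePostType']
```

In a real CRF the weights are learned from labeled addresses rather than written by hand, and decoding is handled by the CRF implementation itself.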
Related Classes/Methods:
Training Data Processors
Manages the ingestion and conversion of raw data sources (e.g., OpenAddresses, OpenStreetMap) into the specific XML format required for training or retraining the Probabilistic Tagging Engine (CRF). This is an offline process that is crucial for model accuracy.
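As an illustration of this conversion step, the sketch below serializes already-labeled tokens into XML. It assumes a layout in which each address is one element whose children are named after the token labels. The element names `AddressCollection` and `AddressString` and the helper `labeled_addresses_to_xml` are assumptions made for this example, so check them against the training files in the repository before relying on them.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def labeled_addresses_to_xml(addresses: List[List[Tuple[str, str]]]) -> str:
    """Serialize labeled addresses, one child element per token."""
    root = ET.Element("AddressCollection")  # assumed root element name
    for pairs in addresses:
        address_el = ET.SubElement(root, "AddressString")  # assumed name
        for index, (token, label) in enumerate(pairs):
            # The label becomes the tag, so it must be a valid XML name.
            token_el = ET.SubElement(address_el, label)
            token_el.text = token
            if index < len(pairs) - 1:
                token_el.tail = " "  # keep tokens separated in the text
    return ET.tostring(root, encoding="unicode")


labeled = [
    [("123", "AddressNumber"), ("Main", "StreetName"), ("St", "StreetNamePostType")],
]
xml_text = labeled_addresses_to_xml(labeled)
print(xml_text)
```

Parsing the result back with `ET.fromstring` and comparing the `(text, tag)` pairs against the input is a cheap round-trip check before generated data is used for training.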
Related Classes/Methods: