graph LR
    Core_Data_Models["Core Data Models"]
    NLP_Pipeline_Management["NLP Pipeline Management"]
    Named_Entity_Recognition_NER_["Named Entity Recognition (NER)"]
    Entity_Linking_Disambiguation["Entity Linking & Disambiguation"]
    Ontology_Management["Ontology Management"]
    Model_Training_Evaluation["Model Training & Evaluation"]
    Resource_Management_Tools_KRT_["Resource Management Tools (KRT)"]
    Web_API_Interface["Web API Interface"]
    Shared_Utilities["Shared Utilities"]
    Annotation_Quality_Assurance["Annotation & Quality Assurance"]
    NLP_Pipeline_Management -- "uses" --> Core_Data_Models
    NLP_Pipeline_Management -- "orchestrates" --> Named_Entity_Recognition_NER_
    NLP_Pipeline_Management -- "orchestrates" --> Entity_Linking_Disambiguation
    Named_Entity_Recognition_NER_ -- "produces" --> Core_Data_Models
    Entity_Linking_Disambiguation -- "consumes" --> Core_Data_Models
    Entity_Linking_Disambiguation -- "uses" --> Ontology_Management
    Ontology_Management -- "produces" --> Core_Data_Models
    Ontology_Management -- "uses" --> Shared_Utilities
    Model_Training_Evaluation -- "uses" --> Core_Data_Models
    Model_Training_Evaluation -- "uses" --> Named_Entity_Recognition_NER_
    Model_Training_Evaluation -- "evaluates" --> NLP_Pipeline_Management
    Model_Training_Evaluation -- "integrates with" --> Annotation_Quality_Assurance
    Resource_Management_Tools_KRT_ -- "manages" --> Ontology_Management
    Resource_Management_Tools_KRT_ -- "uses" --> Core_Data_Models
    Web_API_Interface -- "exposes" --> NLP_Pipeline_Management
    Web_API_Interface -- "consumes" --> Core_Data_Models
    Web_API_Interface -- "integrates with" --> Annotation_Quality_Assurance
    Shared_Utilities -- "supports" --> NLP_Pipeline_Management
    Shared_Utilities -- "supports" --> Named_Entity_Recognition_NER_
    Shared_Utilities -- "supports" --> Entity_Linking_Disambiguation
    Shared_Utilities -- "supports" --> Ontology_Management
    Shared_Utilities -- "supports" --> Model_Training_Evaluation
    Shared_Utilities -- "supports" --> Resource_Management_Tools_KRT_
    Shared_Utilities -- "supports" --> Web_API_Interface
    Shared_Utilities -- "supports" --> Annotation_Quality_Assurance
    Annotation_Quality_Assurance -- "uses" --> Core_Data_Models
    Annotation_Quality_Assurance -- "evaluates" --> NLP_Pipeline_Management
    click Core_Data_Models href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Core Data Models.md" "Details"
    click NLP_Pipeline_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/NLP Pipeline Management.md" "Details"
    click Named_Entity_Recognition_NER_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Named Entity Recognition (NER).md" "Details"
    click Entity_Linking_Disambiguation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Entity Linking & Disambiguation.md" "Details"
    click Ontology_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Ontology Management.md" "Details"
    click Model_Training_Evaluation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Model Training & Evaluation.md" "Details"
    click Resource_Management_Tools_KRT_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Resource Management Tools (KRT).md" "Details"
    click Web_API_Interface href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Web API Interface.md" "Details"
    click Shared_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Shared Utilities.md" "Details"
    click Annotation_Quality_Assurance href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Annotation & Quality Assurance.md" "Details"

Component Details

KAZU is a Natural Language Processing (NLP) framework for biomedical text analysis. Documents pass through a configurable NLP pipeline that performs Named Entity Recognition (NER) followed by Entity Linking & Disambiguation. The core data models are shared by every component, and ontology management supplies the knowledge base that entities are linked against. The system also includes tooling for resource curation, model training and evaluation, and a web API for external integration, all supported by shared utilities and quality assurance mechanisms.

Core Data Models

Defines fundamental data structures (documents, entities, sections, mappings, ontology resources) used across the KAZU system.

Related Classes/Methods:

  • KAZU.kazu.data.Entity (full file reference)
  • KAZU.kazu.data.Document (full file reference)
  • KAZU.kazu.data.Section (full file reference)
  • KAZU.kazu.data.Mapping (full file reference)
  • KAZU.kazu.data.OntologyStringResource (full file reference)
  • kazu.data.EquivalentIdSet (full file reference)
  • kazu.data.CharSpan (full file reference)
  • kazu.data.Synonym (full file reference)
  • kazu.data.LinkingCandidate (full file reference)
  • kazu.data.MentionConfidence (full file reference)
  • kazu.data.LinkingMetrics (full file reference)
  • kazu.data.TokWordSpan (full file reference)
  • KAZU.kazu.data.GliNERBatchItem (full file reference)
  • KAZU.kazu.data.SavedModel (full file reference)
  • KAZU.kazu.data.GlobalParserActions (full file reference)
  • KAZU.kazu.data.PipelineValueError (full file reference)
  • KAZU.kazu.data.KazuConfigurationError (full file reference)
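
To make the relationship between documents, sections, entities, and mappings concrete, here is a minimal sketch of how such structures could nest. The `Toy*` classes and all of their fields are hypothetical stand-ins written for this illustration; they are not the definitions in `kazu.data`, whose actual fields should be checked in the source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyMapping:
    """Hypothetical link from an entity to one ontology record."""
    source: str         # name of the ontology the record comes from
    idx: str            # identifier of the record within that ontology
    default_label: str  # human-readable label of the record


@dataclass
class ToyEntity:
    """Hypothetical text span recognised as a mention of some class."""
    match: str          # the matched surface string
    entity_class: str   # e.g. "gene" or "disease"
    start: int          # character offset (inclusive) in the section text
    end: int            # character offset (exclusive) in the section text
    mappings: list[ToyMapping] = field(default_factory=list)


@dataclass
class ToySection:
    """Hypothetical named block of text that owns its entities."""
    name: str
    text: str
    entities: list[ToyEntity] = field(default_factory=list)


@dataclass
class ToyDocument:
    """Hypothetical container of sections."""
    sections: list[ToySection] = field(default_factory=list)

    def get_entities(self) -> list[ToyEntity]:
        # Flatten the entities of every section into one list.
        return [e for s in self.sections for e in s.entities]


section = ToySection(name="abstract", text="EGFR is mutated in lung cancer.")
section.entities.append(ToyEntity("EGFR", "gene", 0, 4))
doc = ToyDocument(sections=[section])
```

In this sketch, entities live on sections and carry their own mappings, so a linking step can attach ontology records without touching the text itself.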

NLP Pipeline Management

Orchestrates the execution of NLP processing steps on documents and manages spaCy models within the pipeline.
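
The orchestration idea can be sketched as a list of steps applied in order, each receiving the output of the previous one. `ToyPipeline`, the dict-based document, and both step functions below are assumptions made for this example; they do not reproduce KAZU's pipeline class or its spaCy handling.

```python
from typing import Callable

Doc = dict  # hypothetical stand-in for a document object
Step = Callable[[list[Doc]], list[Doc]]


class ToyPipeline:
    """Runs each step in order over the same batch of documents."""

    def __init__(self, steps: list[Step]) -> None:
        self.steps = steps

    def __call__(self, docs: list[Doc]) -> list[Doc]:
        for step in self.steps:
            docs = step(docs)
        return docs


def find_mentions(docs: list[Doc]) -> list[Doc]:
    # Toy recogniser: treat every all-uppercase token as a mention.
    for doc in docs:
        tokens = [w.strip(".,") for w in doc["text"].split()]
        doc["entities"] = [t for t in tokens if t.isupper()]
    return docs


def link_mentions(docs: list[Doc]) -> list[Doc]:
    # Toy linker: depends on the "entities" key written by the previous step.
    for doc in docs:
        doc["links"] = {m: f"TOY:{m}" for m in doc["entities"]}
    return docs


pipeline = ToyPipeline([find_mentions, link_mentions])
result = pipeline([{"text": "EGFR is mutated in lung cancer."}])
```

The ordering matters here: `link_mentions` would raise a `KeyError` if it ran before `find_mentions`, which is the kind of dependency an orchestrating component has to encode.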

Related Classes/Methods:

Named Entity Recognition (NER)

Identifies and extracts named entities from text using transformer models, rule-based approaches, and post-processing.
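
The rule-based and post-processing parts can be illustrated with a short sketch; a transformer model is out of scope for a self-contained example. The lexicon, the function names, and the "keep the longest overlapping span" rule are all assumptions chosen for illustration, not KAZU's actual rules.

```python
import re


def rule_based_ner(text: str, lexicon: dict[str, str]) -> list[tuple[int, int, str]]:
    """Return (start, end, label) for every case-insensitive lexicon hit."""
    spans = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return spans


def drop_overlapping_spans(spans):
    """Post-processing: when spans overlap, keep only the longest one."""
    kept = []
    # Visit longer spans first so they win over anything they overlap.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)


text = "Non-small cell lung cancer often carries EGFR mutations."
lexicon = {
    "lung cancer": "disease",
    "non-small cell lung cancer": "disease",
    "EGFR": "gene",
}
entities = drop_overlapping_spans(rule_based_ner(text, lexicon))
```

Without the post-processing pass, both "lung cancer" and "Non-small cell lung cancer" would be reported for the same stretch of text.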

Related Classes/Methods:

Entity Linking & Disambiguation

Links identified entities to external knowledge bases and disambiguates between potential links using dictionary, rule-based, and context-scoring strategies.
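
A dictionary lookup followed by context scoring can be sketched as below. The synonym index, the keyword sets, the `TOY:` identifiers, and the word-overlap score are hypothetical and deliberately simplistic; KAZU's own strategies are not reproduced here.

```python
import re


def lookup_candidates(mention: str, synonym_index: dict[str, list[str]]) -> list[str]:
    """Dictionary step: every concept ID whose synonyms include the mention."""
    return synonym_index.get(mention.lower(), [])


def disambiguate(candidate_ids, context: str, concept_keywords: dict[str, set[str]]):
    """Context-scoring step: pick the candidate sharing most words with the context."""
    if not candidate_ids:
        return None
    context_words = set(re.findall(r"[a-z]+", context.lower()))

    def overlap(concept_id: str) -> int:
        return len(concept_keywords.get(concept_id, set()) & context_words)

    # On a tie, max() returns the first candidate in the list.
    return max(candidate_ids, key=overlap)


synonym_index = {"ms": ["TOY:multiple_sclerosis", "TOY:mass_spectrometry"]}
concept_keywords = {
    "TOY:multiple_sclerosis": {"patients", "lesions", "demyelinating"},
    "TOY:mass_spectrometry": {"ions", "spectra", "peptide"},
}
context = "Patients with MS showed demyelinating lesions."
best = disambiguate(lookup_candidates("MS", synonym_index), context, concept_keywords)
```

The two-stage shape is the point of the sketch: the dictionary narrows the search to a few candidates, and the context only has to rank those.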

Related Classes/Methods:

Ontology Management

Manages the parsing, curation, and generation of synonyms for various ontologies, supporting external knowledge integration.
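
As a rough illustration of parsing an ontology and generating synonyms from its labels, the sketch below reads a two-column TSV and expands each label with hyphen and case variants. The file layout, the `TOY:` identifiers, and the variant rules are assumptions for this example only.

```python
import csv
import io
from collections import defaultdict


def parse_ontology(tsv_text: str) -> dict[str, str]:
    """Parse a tab-separated table with 'id' and 'label' columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["id"]: row["label"] for row in reader}


def generate_synonyms(label: str) -> set[str]:
    """Hypothetical variant rules: hyphen handling, then lowercasing."""
    variants = {label, label.replace("-", " "), label.replace("-", "")}
    return variants | {v.lower() for v in variants}


def build_synonym_index(ontology: dict[str, str]) -> dict[str, set[str]]:
    """Map every generated synonym to the set of IDs it could refer to."""
    index = defaultdict(set)
    for concept_id, label in ontology.items():
        for synonym in generate_synonyms(label):
            index[synonym].add(concept_id)
    return dict(index)


tsv_text = "id\tlabel\nTOY:1\tIL-2\nTOY:2\tTNF-alpha\n"
index = build_synonym_index(parse_ontology(tsv_text))
```

Storing a set of IDs per synonym keeps ambiguous strings visible, which is what makes later curation and disambiguation possible.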

Related Classes/Methods:

Model Training & Evaluation

Supports training machine learning models, running predictions with them, and evaluating the results, particularly for multi-label NER. This includes data handling and metric calculation.
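
For the metric calculation, a common choice in NER evaluation is exact-match precision, recall, and F1 over (start, end, label) spans. The sketch below shows that standard calculation; whether KAZU computes its metrics this way is an assumption, and the span tuples are invented for the example.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 4, "gene"), (20, 31, "disease")}
predicted = {(0, 4, "gene"), (10, 15, "drug")}
precision, recall, f1 = span_prf(gold, predicted)
```

Because the label is part of each tuple, a span with the right boundaries but the wrong class counts as both a false positive and a false negative.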

Related Classes/Methods:

Resource Management Tools (KRT)

Offers interactive tools for managing and curating KAZU resources, including resource editing, conflict resolution, and ontology updates.

Related Classes/Methods:

Web API Interface

Provides RESTful API endpoints for external applications to interact with the KAZU system, enabling NER and entity linking operations.
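
The request/response contract of such an endpoint can be sketched independently of any web framework. The handler below, its JSON field names, and its status codes are hypothetical; they are not KAZU's actual routes or schema, and `run_pipeline` stands in for whatever callable performs NER and linking.

```python
import json


def handle_ner_request(body: str, run_pipeline) -> tuple[int, str]:
    """Validate a JSON request body and return (status_code, response_json)."""
    try:
        text = json.loads(body)["text"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' field"})
    if not isinstance(text, str):
        return 400, json.dumps({"error": "'text' must be a string"})
    return 200, json.dumps({"text": text, "entities": run_pipeline(text)})


def fake_pipeline(text: str) -> list[dict]:
    # Stand-in for the real pipeline: one entity per uppercase token.
    return [{"match": t} for t in text.split() if t.isupper()]


status, response = handle_ner_request('{"text": "EGFR is mutated"}', fake_pipeline)
```

Keeping validation in a plain function like this makes the contract testable without starting a server.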

Related Classes/Methods:

Shared Utilities

A collection of reusable utility functions and helper classes supporting various KAZU functionalities, including string normalization, caching, and abbreviation detection.
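
The three utilities named above can each be illustrated in a few lines. The normalisation rules and the initial-letter abbreviation heuristic below are simplified assumptions for this sketch, not KAZU's implementations; caching uses the standard library's `functools.lru_cache`.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=10_000)
def normalise(string: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", string.lower()).strip()


def find_abbreviations(text: str) -> dict[str, str]:
    """Naive heuristic: 'Long Form (LF)' where the initials spell LF."""
    found = {}
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbreviation = m.group(1)
        preceding = text[: m.start()].split()
        candidate = preceding[-len(abbreviation):]
        initials_match = len(candidate) == len(abbreviation) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbreviation)
        )
        if initials_match:
            found[abbreviation] = " ".join(candidate)
    return found


abbreviations = find_abbreviations(
    "Epidermal growth factor receptor (EGFR) is a drug target."
)
normalised = normalise("TNF-alpha ")
```

The heuristic misses long forms with hyphens or skipped words (for example "non-small cell lung cancer (NSCLC)"), so it should be read as a demonstration of the idea only.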

Related Classes/Methods:

Annotation & Quality Assurance

Provides tools for converting KAZU data for annotation and performing acceptance tests to ensure the quality and consistency of annotations and pipeline results.
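
An acceptance test of this kind can be sketched as a per-class check of pipeline output against gold annotations. The recall criterion, the thresholds, and the span tuples below are assumptions for illustration; the criteria KAZU actually applies are not stated here.

```python
from collections import Counter


def acceptance_failures(gold, predicted, min_recall: dict[str, float]) -> dict[str, float]:
    """Return {label: recall} for every class whose recall is below its threshold."""
    gold, predicted = set(gold), set(predicted)
    gold_counts = Counter(label for _, _, label in gold)
    hit_counts = Counter(label for _, _, label in gold & predicted)
    failures = {}
    for label, threshold in min_recall.items():
        total = gold_counts[label]
        if total == 0:
            continue  # no gold examples, so nothing to check for this class
        recall = hit_counts[label] / total
        if recall < threshold:
            failures[label] = recall
    return failures


gold = {(0, 4, "gene"), (10, 14, "gene"), (20, 31, "disease")}
predicted = {(0, 4, "gene"), (20, 31, "disease")}
failures = acceptance_failures(gold, predicted, {"gene": 0.8, "disease": 0.8})
```

Returning the failing classes rather than a single pass/fail flag makes it easier to see which entity class regressed.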

Related Classes/Methods: