graph LR
Core_Data_Models["Core Data Models"]
NLP_Pipeline_Management["NLP Pipeline Management"]
Named_Entity_Recognition_NER_["Named Entity Recognition (NER)"]
Entity_Linking_Disambiguation["Entity Linking & Disambiguation"]
Ontology_Management["Ontology Management"]
Model_Training_Evaluation["Model Training & Evaluation"]
Resource_Management_Tools_KRT_["Resource Management Tools (KRT)"]
Web_API_Interface["Web API Interface"]
Shared_Utilities["Shared Utilities"]
Annotation_Quality_Assurance["Annotation & Quality Assurance"]
NLP_Pipeline_Management -- "uses" --> Core_Data_Models
NLP_Pipeline_Management -- "orchestrates" --> Named_Entity_Recognition_NER_
NLP_Pipeline_Management -- "orchestrates" --> Entity_Linking_Disambiguation
Named_Entity_Recognition_NER_ -- "produces" --> Core_Data_Models
Entity_Linking_Disambiguation -- "consumes" --> Core_Data_Models
Entity_Linking_Disambiguation -- "uses" --> Ontology_Management
Ontology_Management -- "produces" --> Core_Data_Models
Ontology_Management -- "uses" --> Shared_Utilities
Model_Training_Evaluation -- "uses" --> Core_Data_Models
Model_Training_Evaluation -- "uses" --> Named_Entity_Recognition_NER_
Model_Training_Evaluation -- "evaluates" --> NLP_Pipeline_Management
Model_Training_Evaluation -- "integrates with" --> Annotation_Quality_Assurance
Resource_Management_Tools_KRT_ -- "manages" --> Ontology_Management
Resource_Management_Tools_KRT_ -- "uses" --> Core_Data_Models
Web_API_Interface -- "exposes" --> NLP_Pipeline_Management
Web_API_Interface -- "consumes" --> Core_Data_Models
Web_API_Interface -- "integrates with" --> Annotation_Quality_Assurance
Shared_Utilities -- "supports" --> NLP_Pipeline_Management
Shared_Utilities -- "supports" --> Named_Entity_Recognition_NER_
Shared_Utilities -- "supports" --> Entity_Linking_Disambiguation
Shared_Utilities -- "supports" --> Ontology_Management
Shared_Utilities -- "supports" --> Model_Training_Evaluation
Shared_Utilities -- "supports" --> Resource_Management_Tools_KRT_
Shared_Utilities -- "supports" --> Web_API_Interface
Shared_Utilities -- "supports" --> Annotation_Quality_Assurance
Annotation_Quality_Assurance -- "uses" --> Core_Data_Models
Annotation_Quality_Assurance -- "evaluates" --> NLP_Pipeline_Management
click Core_Data_Models href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Core Data Models.md" "Details"
click NLP_Pipeline_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/NLP Pipeline Management.md" "Details"
click Named_Entity_Recognition_NER_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Named Entity Recognition (NER).md" "Details"
click Entity_Linking_Disambiguation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Entity Linking & Disambiguation.md" "Details"
click Ontology_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Ontology Management.md" "Details"
click Model_Training_Evaluation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Model Training & Evaluation.md" "Details"
click Resource_Management_Tools_KRT_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Resource Management Tools (KRT).md" "Details"
click Web_API_Interface href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Web API Interface.md" "Details"
click Shared_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Shared Utilities.md" "Details"
click Annotation_Quality_Assurance href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//KAZU/Annotation & Quality Assurance.md" "Details"
KAZU is a Natural Language Processing (NLP) framework for biomedical text analysis. Documents flow through a configurable pipeline that performs Named Entity Recognition (NER) followed by Entity Linking & Disambiguation. Core data models are shared by every component, and ontology management supplies the knowledge base that linking resolves against. The system also includes tooling for resource curation, model training and evaluation, and a web API for external integration, all supported by shared utilities and quality assurance checks.

Core Data Models
Defines the fundamental data structures (documents, entities, sections, mappings, ontology resources) used across the KAZU system.
Related Classes/Methods:
KAZU.kazu.data.Entity (full file reference)
KAZU.kazu.data.Document (full file reference)
KAZU.kazu.data.Section (full file reference)
KAZU.kazu.data.Mapping (full file reference)
KAZU.kazu.data.OntologyStringResource (full file reference)
kazu.data.EquivalentIdSet (full file reference)
kazu.data.CharSpan (full file reference)
kazu.data.Synonym (full file reference)
kazu.data.LinkingCandidate (full file reference)
kazu.data.MentionConfidence (full file reference)
kazu.data.LinkingMetrics (full file reference)
kazu.data.TokWordSpan (full file reference)
KAZU.kazu.data.GliNERBatchItem (full file reference)
KAZU.kazu.data.SavedModel (full file reference)
KAZU.kazu.data.GlobalParserActions (full file reference)
KAZU.kazu.data.PipelineValueError (full file reference)
KAZU.kazu.data.KazuConfigurationError (full file reference)

NLP Pipeline Management
Orchestrates the execution of NLP processing steps on documents and manages the spaCy models used within the pipeline.
Related Classes/Methods:
KAZU.kazu.pipeline.Pipeline (full file reference)
KAZU.kazu.pipeline.batch_metrics (full file reference)
KAZU.kazu.pipeline.FailedDocsFileHandler (full file reference)
KAZU.kazu.utils.spacy_pipeline.SpacyPipelines (87:254)
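
The orchestration idea described for this component can be sketched in a few lines. This is a minimal illustration only: `Doc`, `Step`, `run_pipeline`, and `toy_ner_step` are hypothetical names invented for this example and do not reflect the actual `kazu.pipeline.Pipeline` API. The assumption is that each step receives a batch of documents and reports which ones it could not process.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not kazu.data.Document."""
    text: str
    entities: List[Tuple[int, int, str]] = field(default_factory=list)


# Assumed step contract: take a batch, return (processed, failed).
Step = Callable[[List[Doc]], Tuple[List[Doc], List[Doc]]]


def run_pipeline(steps: List[Step], docs: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Run each step in order; documents that fail a step skip later steps."""
    failed: List[Doc] = []
    remaining = docs
    for step in steps:
        remaining, step_failed = step(remaining)
        failed.extend(step_failed)
    return remaining, failed


def toy_ner_step(docs: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Toy NER: tag every occurrence of 'EGFR' as a gene; empty documents fail."""
    ok: List[Doc] = []
    bad: List[Doc] = []
    for doc in docs:
        if not doc.text:
            bad.append(doc)
            continue
        start = doc.text.find("EGFR")
        while start != -1:
            doc.entities.append((start, start + 4, "gene"))
            start = doc.text.find("EGFR", start + 1)
        ok.append(doc)
    return ok, bad
```

Collecting failed documents separately, rather than raising on the first error, is one plausible way to keep a batch running; whether KAZU does exactly this is not stated in the source.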

Named Entity Recognition (NER)
Identifies and extracts named entities from text using transformer models, rule-based approaches, and post-processing.
Related Classes/Methods:
KAZU.kazu.steps.ner.hf_token_classification.TransformersModelForTokenClassificationNerStep (63:351)
KAZU.kazu.steps.ner.tokenized_word_processor.TokenizedWordProcessor (319:432)
KAZU.kazu.steps.ner.tokenized_word_processor.SimpleSpanFinder (68:211)
KAZU.kazu.steps.ner.tokenized_word_processor.MultilabelSpanFinder (214:316)
KAZU.kazu.steps.ner.llm_ner.LLMNERStep (193:290)
KAZU.kazu.steps.ner.llm_ner.VertexLLMModel (133:179)
KAZU.kazu.steps.ner.spacy_ner.SpacyNerStep (5:40)
KAZU.kazu.steps.ner.opsin.OpsinStep (24:380)
KAZU.kazu.steps.ner.seth.SethStep (19:133)
KAZU.kazu.steps.ner.gliner.GLiNERStep (125:318)
KAZU.kazu.steps.ner.gliner.ConflictScorer (37:91)
KAZU.kazu.steps.ner.gliner.MajorityVoteScorer (94:106)
KAZU.kazu.steps.ner.gliner.MaxScoreScorer (109:122)
KAZU.kazu.steps.ner.entity_post_processing.SplitOnConjunctionPattern (28:92)
KAZU.kazu.steps.ner.entity_post_processing.SplitOnNumericalListPatternWithPrefix (95:166)
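
Token-classification models emit one label per token, so some step has to merge those labels into entity spans. The sketch below shows one common way to do that, assuming BIO-style labels. The source does not say which labelling scheme KAZU's span finders use, and `bio_to_spans` is a hypothetical helper, not part of the KAZU codebase.

```python
from typing import List, Optional, Tuple


def bio_to_spans(
    tokens: List[Tuple[str, int, int]], labels: List[str]
) -> List[Tuple[int, int, str]]:
    """Merge per-token BIO labels into (char_start, char_end, entity_class) spans.

    tokens: (text, char_start, char_end) triples; labels: e.g. "B-gene", "I-gene", "O".
    """
    spans: List[Tuple[int, int, str]] = []
    current: Optional[list] = None  # [start, end, entity_class] of the open span
    for (_, start, end), label in zip(tokens, labels):
        entity_class = label[2:]
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and (current is None or current[2] != entity_class)
        )
        if starts_new:
            # Close any open span, then open a new one at this token.
            if current is not None:
                spans.append(tuple(current))
            current = [start, end, entity_class]
        elif label.startswith("I-"):
            # Continuation of the open span: extend its end offset.
            current[1] = end
        else:
            # "O" label: close the open span, if any.
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans
```

An `I-` label that does not continue a span of the same class is treated as the start of a new span here, which is a lenient design choice rather than anything the source prescribes.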

Entity Linking & Disambiguation
Links identified entities to external knowledge bases and disambiguates between candidate links using dictionary, rule-based, and context-scoring strategies.
Related Classes/Methods:
KAZU.kazu.steps.linking.dictionary.DictionaryEntityLinkingStep (13:83)
KAZU.kazu.steps.linking.entity_class_disambiguation.EntityClassDisambiguationStep (101:192)
KAZU.kazu.steps.linking.entity_class_disambiguation.EntityClassTfIdfScorer (33:98)
KAZU.kazu.steps.linking.rules_based_disambiguation.RulesBasedEntityClassDisambiguationFilterStep (36:287)
KAZU.kazu.steps.linking.post_processing.mapping_step.MappingStep (12:25)
KAZU.kazu.steps.linking.post_processing.strategy_runner.StrategyRunner (161:341)
KAZU.kazu.steps.linking.post_processing.strategy_runner.ConfidenceLevelStrategyExecution (39:158)
KAZU.kazu.steps.linking.post_processing.xref_manager.CrossReferenceManager (40:104)
KAZU.kazu.steps.linking.post_processing.xref_manager.OxoCrossReferenceManager (107:215)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.MappingFactory (20:104)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.MappingStrategy (107:278)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.SymbolMatchMappingStrategy (295:334)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.SynNormIsSubStringMappingStrategy (337:398)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.StrongMatchMappingStrategy (401:476)
KAZU.kazu.steps.linking.post_processing.mapping_strategies.strategies.StrongMatchWithEmbeddingConfirmationStringMatchingStrategy (479:552)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.DisambiguationStrategy (33:96)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.DefinedElsewhereInDocumentDisambiguationStrategy (99:148)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.TfIdfDisambiguationStrategy (151:268)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.GildaTfIdfDisambiguationStrategy (271:357)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.AnnotationLevelDisambiguationStrategy (360:396)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.PreferDefaultLabelMatchDisambiguationStrategy (399:445)
KAZU.kazu.steps.linking.post_processing.disambiguation.strategies.PreferNearestEmbeddingToDefaultLabelDisambiguationStrategy (448:506)
KAZU.kazu.steps.linking.post_processing.disambiguation.context_scoring.TfIdfScorer (48:88)
KAZU.kazu.steps.linking.post_processing.disambiguation.context_scoring.GildaTfIdfScorer (91:231)
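
To make the context-scoring idea concrete, the sketch below picks the candidate whose description is most similar to the text surrounding a mention. It is a simplified stand-in: it uses raw term counts and cosine similarity rather than TF-IDF weights, and the function names, candidate identifiers, and descriptions are all hypothetical. It does not reproduce `TfIdfScorer` or any KAZU disambiguation strategy.

```python
import math
from collections import Counter
from typing import Dict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context: str, candidates: Dict[str, str]) -> str:
    """Return the candidate id whose description best matches the context.

    candidates maps a (hypothetical) ontology id to a short text description.
    """
    context_vec = Counter(context.lower().split())
    return max(
        candidates,
        key=lambda cid: cosine(context_vec, Counter(candidates[cid].lower().split())),
    )
```

For example, given the context "patient treated for lung tumour growth", a candidate described by "tumour growth cancer lung" shares three terms with the context, while one described by "river bank water" shares none and scores zero, so the first is selected.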

Ontology Management
Manages ontology parsing and curation and generates synonyms for ontology terms, supporting the integration of external knowledge sources.
Related Classes/Methods:
KAZU.kazu.ontology_preprocessing.base.OntologyParser (52:742)
KAZU.kazu.ontology_preprocessing.curation_utils.OntologyResourceSetConflictReport (132:163)
KAZU.kazu.ontology_preprocessing.curation_utils.OntologyResourceSetMergeReport (167:208)
KAZU.kazu.ontology_preprocessing.curation_utils.OntologyResourceSetCompleteReport (212:259)
KAZU.kazu.ontology_preprocessing.curation_utils.OntologyStringConflictAnalyser (268:699)
KAZU.kazu.ontology_preprocessing.curation_utils.OntologyResourceProcessor (709:1164)
KAZU.kazu.ontology_preprocessing.downloads.SimpleOntologyDownloader (92:102)
KAZU.kazu.ontology_preprocessing.downloads.OBOOntologyDownloader (105:114)
KAZU.kazu.ontology_preprocessing.downloads.OwlOntologyDownloader (117:154)
KAZU.kazu.ontology_preprocessing.downloads.ChemblParquetOntologyDownloader (157:215)
KAZU.kazu.ontology_preprocessing.downloads.OpenTargetsOntologyDownloader (218:260)
KAZU.kazu.ontology_preprocessing.autocuration.SymbolicToCaseSensitiveAction (12:33)
KAZU.kazu.ontology_preprocessing.parsers.JsonLinesOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.OpenTargetsDiseaseOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.OpenTargetsTargetOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.RDFGraphParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.SKOSXLGraphParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.GeneOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.BiologicalProcessGeneOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.MolecularFunctionGeneOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.CellularComponentGeneOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.UberonOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.MondoOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.HGNCGeneOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.CLOOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.CellosaurusOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.MeddraOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.CLOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.TabularOntologyParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.ATCDrugClassificationParser (full file reference)
KAZU.kazu.ontology_preprocessing.parsers.StatoParser (full file reference)
KAZU.kazu.ontology_preprocessing.synonym_generation.SynonymGenerator (21:39)
KAZU.kazu.ontology_preprocessing.synonym_generation.CombinatorialSynonymGenerator (42:105)
KAZU.kazu.ontology_preprocessing.synonym_generation.SeparatorExpansion (110:153)
KAZU.kazu.ontology_preprocessing.synonym_generation.StringReplacement (200:268)
KAZU.kazu.ontology_preprocessing.synonym_generation.TokenListReplacementGenerator (338:382)
KAZU.kazu.ontology_preprocessing.synonym_generation.VerbPhraseVariantGenerator (385:460)
KAZU.kazu.ontology_preprocessing.ontology_upgrade_report.OntologyUpgradeReport (15:81)
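
Synonym generation, as mentioned in this component's description, generally means deriving extra surface forms from a known term so that dictionary lookups match more spellings. The sketch below shows one simple flavour of this, swapping and dropping separators. `separator_variants` is a hypothetical function written for illustration; the source does not describe how `SeparatorExpansion` or the other generators actually behave.

```python
from typing import Set


def separator_variants(term: str) -> Set[str]:
    """Generate spelling variants of a term by swapping or dropping separators.

    For each separator present in the term, emit the hyphenated, spaced,
    and joined forms. The original term is always included.
    """
    variants = {term}
    for sep in ("-", " "):
        if sep in term:
            variants.add(term.replace(sep, " "))
            variants.add(term.replace(sep, "-"))
            variants.add(term.replace(sep, ""))
    return variants
```

Generated variants like these can introduce ambiguity (a joined form may collide with an unrelated term), which is presumably part of why the component pairs synonym generation with curation and conflict reports.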

Model Training & Evaluation
Provides functionality for training, running predictions with, and evaluating machine learning models, particularly for multi-label NER, including data handling and metric calculation.
Related Classes/Methods:
KAZU.kazu.training.predict_script.main (full file reference)
KAZU.kazu.training.train_multilabel_ner.ModelSaver (55:113)
KAZU.kazu.training.train_multilabel_ner.KazuMultiHotNerMultiLabelTrainingDataset (116:225)
KAZU.kazu.training.train_multilabel_ner.calculate_metrics (242:302)
KAZU.kazu.training.train_multilabel_ner.Trainer (305:514)
KAZU.kazu.training.modelling_utils.doc_yielder (26:32)
KAZU.kazu.training.modelling_utils.test_doc_yielder (35:53)
KAZU.kazu.training.modelling_utils.get_label_list (62:69)
KAZU.kazu.training.modelling_utils.LSManagerViewWrapper (80:120)
KAZU.kazu.training.modelling_utils.create_wrapper (123:132)
KAZU.kazu.training.evaluate_script.main (full file reference)
KAZU.kazu.training.train_script.create_view_for_labels (28:36)
KAZU.kazu.training.train_script.run (40:100)
KAZU.kazu.training.modelling.DebertaForMultiLabelTokenClassification (52:101)
KAZU.kazu.training.modelling.DistilBertForMultiLabelTokenClassification (104:155)
KAZU.kazu.training.modelling.BertForMultiLabelTokenClassification (158:214)

Resource Management Tools (KRT)
Offers interactive tools for managing and curating Kazu resources, including resource editing, conflict resolution, and ontology updates.
Related Classes/Methods:
KAZU.kazu.krt.resource_manager.ResourceManager (19:177)
KAZU.kazu.krt.components.save (23:40)
KAZU.kazu.krt.components.PlaceholderResource (51:87)
KAZU.kazu.krt.components.ResourceEditor (90:494)
KAZU.kazu.krt.components.ParserSelector (497:531)
KAZU.kazu.krt.utils.load_parsers (12:17)
KAZU.kazu.krt.utils.get_resource_manager (21:22)
KAZU.kazu.krt.ontology_update_editor.components.get_upgrade_manager (8:14)
KAZU.kazu.krt.ontology_update_editor.components.OntologyUpdateForm (17:155)
KAZU.kazu.krt.ontology_update_editor.utils.OntologyUpdateManager (18:142)
KAZU.kazu.krt.resource_discrepancy_editor.components.get_resource_merge_manager (13:18)
KAZU.kazu.krt.resource_discrepancy_editor.components.get_resource_merge_manager_for_parser (27:28)
KAZU.kazu.krt.resource_discrepancy_editor.components.ResourceDiscrepancyResolutionForm (31:192)
KAZU.kazu.krt.resource_discrepancy_editor.utils.SynonymDiscrepancy (11:73)
KAZU.kazu.krt.resource_discrepancy_editor.utils.ResourceDiscrepancyManger (76:172)
KAZU.kazu.krt.string_editor.components.get_manager (18:19)
KAZU.kazu.krt.string_editor.components.StringConflictForm (28:219)
KAZU.kazu.krt.string_editor.utils.ResourceConflict (21:90)
KAZU.kazu.krt.string_editor.utils.ResourceConflictManager (93:243)
KAZU.kazu.krt.pages.4_pipeline_test.load_pipeline_after_change (full file reference)
KAZU.kazu.krt.pages.4_pipeline_test._process_text (full file reference)

Web API Interface
Provides RESTful API endpoints through which external applications can run NER and entity linking with the KAZU system.
Related Classes/Methods:
KAZU.kazu.web.req_id_header.AddRequestIdMiddleware (full file reference)
KAZU.kazu.web.jwtauth.JWTAuthenticationBackend (full file reference)
KAZU.kazu.web.server.get_id_log_prefix_if_available (150:160)
KAZU.kazu.web.server.log_request_to_path_with_prefix (163:175)
KAZU.kazu.web.server.SectionedWebDocument (205:212)
KAZU.kazu.web.server.SimpleWebDocument (215:222)
KAZU.kazu.web.server.DocumentCollection (228:244)
KAZU.kazu.web.server.SingleEntityDocumentConverter (319:347)
KAZU.kazu.web.server.KazuWebAPI (352:616)
KAZU.kazu.web.ls_web_utils.LSWebUtils (12:28)

Shared Utilities
A collection of reusable utility functions and helper classes used across KAZU, including string normalization, caching, and abbreviation detection.
Related Classes/Methods:
KAZU.kazu.utils.abbreviation_detector.filter_matches (full file reference)
KAZU.kazu.utils.abbreviation_detector.KazuAbbreviationDetector (full file reference)
KAZU.kazu.utils.caching.EntityLinkingLookupCache (95:128)
KAZU.kazu.utils.spacy_object_mapper.KazuToSpacyObjectMapper (7:96)
KAZU.kazu.utils.sapbert.SapBertHelper (83:253)
KAZU.kazu.utils.download_gilda_contexts.retry_wiki_with_maxlag (full file reference)
KAZU.kazu.utils.download_gilda_contexts.get_wikipedia_url_from_wikidata_id (full file reference)
KAZU.kazu.utils.download_gilda_contexts.get_wikipedia_contents_from_urls (full file reference)
KAZU.kazu.utils.download_gilda_contexts.create_wiki_mappings (full file reference)
KAZU.kazu.utils.download_gilda_contexts.extract_open_targets (full file reference)
KAZU.kazu.utils.link_index.DictionaryIndex (25:138)
KAZU.kazu.utils.string_normalizer.DefaultStringNormalizer (42:235)
KAZU.kazu.utils.string_normalizer.DiseaseStringNormalizer (238:264)
KAZU.kazu.utils.string_normalizer.AnatomyStringNormalizer (267:291)
KAZU.kazu.utils.string_normalizer.GeneStringNormalizer (294:387)
KAZU.kazu.utils.string_normalizer.CompanyStringNormalizer (390:412)
KAZU.kazu.utils.utils.linking_candidates_to_ontology_string_resources (23:49)
KAZU.kazu.utils.utils.documents_to_document_section_batch_encodings_map (80:106)
KAZU.kazu.utils.utils.create_char_ngrams (171:173)
KAZU.kazu.utils.utils.create_word_ngrams (176:180)
KAZU.kazu.utils.build_and_test_model_packs.ModelPackBuilder (65:309)
KAZU.kazu.utils.build_and_test_model_packs.build_all_model_packs (317:385)
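
String normalization and n-gram helpers of the kind listed above usually serve one purpose: making differently written mentions comparable. The sketch below illustrates the general technique. `normalize` and `char_ngrams` are hypothetical functions with assumed rules (uppercase, strip punctuation); they are not the logic of `DefaultStringNormalizer` or the signature of `create_char_ngrams`, neither of which the source describes.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Uppercase the text, replace non-alphanumeric runs with a space,
    and collapse surrounding and repeated whitespace."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", " ", text)
    return " ".join(cleaned.upper().split())


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return all overlapping character n-grams; empty if text is shorter than n."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]
```

Under these assumed rules, `"  IL-2  receptor "` and `"il 2 receptor"` both normalize to `"IL 2 RECEPTOR"`, so they would hit the same dictionary key.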

Annotation & Quality Assurance
Provides tools for converting KAZU data into a form suitable for annotation, and for running acceptance tests that check the quality and consistency of annotations and pipeline results.
Related Classes/Methods:
KAZU.kazu.annotation.label_studio.KazuToLabelStudioConverter (30:191)
KAZU.kazu.annotation.label_studio.LSToKazuConversion (194:335)
KAZU.kazu.annotation.label_studio.LabelStudioAnnotationView (338:478)
KAZU.kazu.annotation.label_studio.LabelStudioManager (481:655)
KAZU.kazu.annotation.acceptance_test.execute_full_pipeline_acceptance_test (33:36)
KAZU.kazu.annotation.acceptance_test.SectionScorer (39:104)
KAZU.kazu.annotation.acceptance_test.score_sections (107:133)
KAZU.kazu.annotation.acceptance_test.aggregate_ner_results (197:210)
KAZU.kazu.annotation.acceptance_test.check_results_meet_threshold (233:262)
KAZU.kazu.annotation.acceptance_test.analyse_full_pipeline (265:280)
KAZU.kazu.annotation.acceptance_test.analyse_annotation_consistency (283:304)
KAZU.kazu.annotation.acceptance_test.check_annotation_consistency (308:312)
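
An acceptance test of the kind described here typically compares pipeline output against gold annotations and fails if a metric drops below an agreed threshold. The sketch below assumes exact-match span scoring with precision and recall. The functions `score` and `meets_thresholds` and the threshold values are hypothetical and are not taken from `kazu.annotation.acceptance_test`.

```python
from typing import Dict, Set, Tuple

# (char_start, char_end, entity_class); an assumed representation for this sketch.
Span = Tuple[int, int, str]


def score(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision and recall of predicted spans against gold spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


def meets_thresholds(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True only if every thresholded metric is at or above its minimum."""
    return all(scores[metric] >= minimum for metric, minimum in thresholds.items())
```

In practice such thresholds would likely be set per entity class, since acceptable recall for genes and for diseases may differ; the sketch uses a single flat dictionary to stay short.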