graph LR
Genomic_Data_Models["Genomic Data Models"]
Sequence_Processing["Sequence Processing"]
Genomic_Data_Loaders["Genomic Data Loaders"]
Genomic_Feature_Extractors["Genomic Feature Extractors"]
Variant_Data_Management["Variant Data Management"]
Genomic_Data_Models -- "provides data models to" --> Sequence_Processing
Genomic_Data_Models -- "provides data models to" --> Genomic_Data_Loaders
Genomic_Data_Models -- "provides data models to" --> Genomic_Feature_Extractors
Genomic_Data_Models -- "provides data models to" --> Variant_Data_Management
Sequence_Processing -- "applies transformations to data for" --> Genomic_Data_Loaders
Sequence_Processing -- "applies transformations to extracted data" --> Genomic_Feature_Extractors
Genomic_Data_Loaders -- "loads data using" --> Genomic_Feature_Extractors
Genomic_Data_Loaders -- "integrates variant information from" --> Variant_Data_Management
Genomic_Feature_Extractors -- "provides sequences and features to" --> Genomic_Data_Loaders
Genomic_Feature_Extractors -- "extracts sequences for variant analysis" --> Variant_Data_Management
Variant_Data_Management -- "provides variant data to" --> Genomic_Data_Loaders
Variant_Data_Management -- "queries and filters variants for extraction" --> Genomic_Feature_Extractors
click Genomic_Data_Models href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/kipoiseq/Genomic Data Models.md" "Details"
click Sequence_Processing href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/kipoiseq/Sequence Processing.md" "Details"
click Genomic_Data_Loaders href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/kipoiseq/Genomic Data Loaders.md" "Details"
click Genomic_Feature_Extractors href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/kipoiseq/Genomic Feature Extractors.md" "Details"
click Variant_Data_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/kipoiseq/Variant Data Management.md" "Details"
The kipoiseq library handles and processes genomic sequence and variant data. Its main flow has four steps: loading biological data from FASTA, GTF, and VCF sources; transforming sequences for downstream analysis (e.g., one-hot encoding); extracting specific genomic features or the sequences around variants; and managing genetic variant information. The library's purpose is to prepare and manipulate genomic data for machine learning and other bioinformatics applications.
Genomic Data Models: provides the fundamental data structures, Variant for genetic variations and Interval for genomic regions, which are used throughout the kipoiseq library to represent and manipulate biological data.
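As a rough illustration of these two data structures, the sketch below defines simplified stand-ins for an interval and a variant. The class names `ToyInterval` and `ToyVariant`, their fields, and the coordinate conventions (0-based half-open intervals, 1-based variant positions) are assumptions made for this example; they are not the library's actual class definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInterval:
    """Hypothetical stand-in for a genomic region (0-based, half-open)."""
    chrom: str
    start: int          # 0-based, inclusive (assumed convention)
    end: int            # exclusive
    strand: str = "."   # "+", "-" or "." for unstranded

    def width(self) -> int:
        return self.end - self.start


@dataclass(frozen=True)
class ToyVariant:
    """Hypothetical stand-in for a genetic variant (1-based position)."""
    chrom: str
    pos: int   # 1-based position of the first reference base (assumed)
    ref: str
    alt: str

    def overlaps(self, interval: ToyInterval) -> bool:
        # Convert the 1-based position to a 0-based half-open span
        # covering the reference allele, then test for overlap.
        start0 = self.pos - 1
        end0 = start0 + len(self.ref)
        return (
            self.chrom == interval.chrom
            and start0 < interval.end
            and end0 > interval.start
        )


region = ToyInterval("chr1", 100, 110, "+")
snv = ToyVariant("chr1", 105, "A", "T")
print(region.width(), snv.overlaps(region))
```

Freezing both dataclasses makes the instances hashable, so they can serve as dictionary keys or set members when deduplicating regions or variants.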
Related Classes/Methods:
Sequence Processing: offers functions and classes for transforming biological sequences, including one-hot encoding, resizing intervals, and handling sequence axes, all of which prepare data for machine learning models. It also includes general-purpose helpers for batch iteration, alphabet and dtype parsing, and scalar conversion.
Related Classes/Methods:
kipoiseq.kipoiseq.transforms.transforms.OneHot(91:124)
kipoiseq.kipoiseq.transforms.functional.one_hot_dna(118:133)
kipoiseq.kipoiseq.transforms.transforms.ResizeInterval(77:86)
kipoiseq.utils.batch_iter(48:57)
kipoiseq.utils.parse_alphabet(27:31)
kipoiseq.utils.parse_dtype(34:42)
kipoiseq.utils.to_scalar(18:24)
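The two transformations named above, one-hot encoding and interval resizing, can be sketched in plain Python. The helper names `toy_one_hot_dna` and `toy_resize_interval`, their parameters, and the choice to encode unknown bases as a uniform 0.25 row are assumptions for illustration; the library's own functions may behave differently.

```python
def toy_one_hot_dna(seq, alphabet="ACGT", neutral_value=0.25):
    """Encode a DNA string as a list of rows, one row per base.

    Bases outside the alphabet (e.g. "N") get a row filled with
    `neutral_value` (an assumed convention for this sketch).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper():
        if base in index:
            row = [0.0] * len(alphabet)
            row[index[base]] = 1.0
        else:
            row = [neutral_value] * len(alphabet)
        rows.append(row)
    return rows


def toy_resize_interval(start, end, width, anchor="center"):
    """Return a (start, end) pair of the requested width.

    `anchor` selects which point of the original interval stays fixed.
    """
    if anchor == "start":
        return start, start + width
    if anchor == "end":
        return end - width, end
    if anchor == "center":
        center = (start + end) // 2
        new_start = center - width // 2
        return new_start, new_start + width
    raise ValueError(f"unknown anchor: {anchor!r}")


print(toy_one_hot_dna("ACN"))
print(toy_resize_interval(100, 110, 4))
```

Resizing every interval to a common width before encoding gives each example the same shape, which is what most batch-based models expect.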
Genomic Data Loaders: provides data loaders that produce genomic sequences from intervals, BED files, and GTF annotations. They support both string and one-hot encoded sequence outputs and include specialized loaders for splicing and protein data.
Related Classes/Methods:
kipoiseq.kipoiseq.dataloaders.sequence.SeqIntervalDl(268:384)
kipoiseq.kipoiseq.dataloaders.sequence.StringSeqIntervalDl(135:261)
kipoiseq.kipoiseq.dataloaders.sequence.BedDataset(26:131)
kipoiseq.kipoiseq.dataloaders.splicing.MMSpliceDl(182:276)
kipoiseq.kipoiseq.dataloaders.splicing.ExonInterval(32:129)
kipoiseq.kipoiseq.dataloaders.protein.SingleVariantProteinDataLoader(30:164)
kipoiseq.kipoiseq.dataloaders.protein.SingleVariantUTRDataLoader(168:356)
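The general idea of an interval-based loader, reading regions from a BED file and returning the matching sequence for each one, might look roughly like the sketch below. `ToyStringSeqDataset`, the in-memory `genome` dictionary standing in for a FASTA file, and the returned dictionary layout are all hypothetical; they only mimic the pattern and are not the library's loaders.

```python
import io


def iter_bed(handle):
    """Yield (chrom, start, end, extra_columns) from tab-separated BED lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[0], int(fields[1]), int(fields[2]), fields[3:]


class ToyStringSeqDataset:
    """Hypothetical dataset: one string sequence per BED interval."""

    def __init__(self, bed_text, genome):
        # `genome` maps chromosome name -> sequence string (assumed stand-in
        # for a FASTA file so the example stays self-contained).
        self.records = list(iter_bed(io.StringIO(bed_text)))
        self.genome = genome

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        chrom, start, end, extra = self.records[idx]
        return {
            "inputs": self.genome[chrom][start:end],
            "targets": extra,  # extra BED columns treated as labels (assumption)
            "metadata": {"chrom": chrom, "start": start, "end": end},
        }


genome = {"chr1": "ACGTACGTAC"}
dataset = ToyStringSeqDataset("chr1\t2\t6\t1\nchr1\t0\t4\t0\n", genome)
for i in range(len(dataset)):
    print(dataset[i])
```

Keeping the interval coordinates in a metadata entry lets downstream code trace each prediction back to its genomic region.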
Genomic Feature Extractors: extracts genomic sequences from FASTA files and features such as CDS and UTRs from GTF files. It also extracts sequences that span multiple genomic intervals, as well as protein sequences.
Related Classes/Methods:
kipoiseq.extractors.fasta.FastaStringExtractor(7:63)
kipoiseq.kipoiseq.extractors.gtf.CDSFetcher(154:265)
kipoiseq.kipoiseq.extractors.gtf.UTRFetcher(268:377)
kipoiseq.kipoiseq.extractors.gtf.GTFMultiIntervalFetcher(88:151)
kipoiseq.kipoiseq.extractors.protein.ProteinSeqExtractor(169:180)
kipoiseq.kipoiseq.extractors.protein.TranscriptSeqExtractor(112:166)
kipoiseq.kipoiseq.extractors.protein.UTRSeqExtractor(35:69)
kipoiseq.kipoiseq.extractors.multi_interval.GenericMultiIntervalSeqExtractor(129:195)
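Extraction across multiple intervals typically means fetching each piece (for example, the exons of a transcript), joining them in genomic order, and reverse-complementing the result for features on the minus strand. The sketch below shows that idea under stated assumptions: `extract_multi_interval` and `reverse_complement` are hypothetical helpers, and a plain dictionary replaces the FASTA file.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_multi_interval(genome, chrom, intervals, strand="+"):
    """Concatenate the sequences of several 0-based half-open intervals.

    `genome` maps chromosome name -> sequence string (assumed stand-in
    for a FASTA file). Intervals are joined in ascending genomic order;
    the joined sequence is reverse-complemented for the minus strand.
    """
    pieces = [genome[chrom][start:end] for start, end in sorted(intervals)]
    seq = "".join(pieces)
    return reverse_complement(seq) if strand == "-" else seq


genome = {"chr1": "AACCGGTTAA"}
exons = [(4, 6), (0, 2)]
print(extract_multi_interval(genome, "chr1", exons, "+"))
print(extract_multi_interval(genome, "chr1", exons, "-"))
```

Joining first and reverse-complementing once at the end yields the same result as reverse-complementing each piece and joining them in reverse order, but is simpler to reason about.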
Variant Data Management: manages core operations on Variant Call Format (VCF) files, including fetching, querying, filtering, and matching genetic variants. It also extracts genomic sequences around variants and generates variant combinations.
Related Classes/Methods:
kipoiseq.kipoiseq.extractors.vcf.MultiSampleVCF(21:211)
kipoiseq.kipoiseq.extractors.vcf_query.VariantQuery(33:45)
kipoiseq.kipoiseq.extractors.vcf_query.VariantIntervalQueryable(113:281)
kipoiseq.kipoiseq.extractors.vcf_matching.SingleVariantMatcher(199:258)
kipoiseq.kipoiseq.extractors.vcf_matching.variants_to_pyranges(22:39)
kipoiseq.kipoiseq.extractors.vcf_matching.intervals_to_pyranges(63:83)
kipoiseq.kipoiseq.extractors.vcf_seq.VariantSeqExtractor(60:303)
kipoiseq.kipoiseq.extractors.vcf_seq.SingleVariantVCFSeqExtractor(328:341)
kipoiseq.kipoiseq.extractors.vcf_seq.SingleSeqVCFSeqExtractor(344:355)
kipoiseq.kipoiseq.variant_source.VariantFetcher(8:39)
kipoiseq.kipoiseq.extractors.variant_combinations.VariantCombinator(9:98)
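Two of the operations described above, filtering variants by region and producing the alternative sequence around them, can be sketched as follows. The tuple layout `(chrom, pos, ref, alt)` with 1-based positions and the helpers `variants_in_interval` and `apply_variants` are assumptions for this example. Unlike a fixed-length extractor, this sketch lets insertions and deletions change the output length.

```python
def variants_in_interval(variants, chrom, start, end):
    """Keep variants whose reference allele lies fully inside [start, end).

    `start` and `end` are 0-based half-open; variant positions are 1-based.
    """
    hits = []
    for v_chrom, pos, ref, alt in variants:
        start0 = pos - 1
        if v_chrom == chrom and start <= start0 and start0 + len(ref) <= end:
            hits.append((v_chrom, pos, ref, alt))
    return hits


def apply_variants(ref_seq, interval_start, variants):
    """Substitute each variant's alt allele into the reference sequence.

    Variants are applied from right to left so that an insertion or
    deletion does not shift the offsets of variants still to be applied.
    Overlapping variants are not handled in this sketch.
    """
    seq = ref_seq
    for _, pos, ref, alt in sorted(variants, key=lambda v: v[1], reverse=True):
        offset = pos - 1 - interval_start
        if offset < 0 or seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq = seq[:offset] + alt + seq[offset + len(ref):]
    return seq


variants = [
    ("chr1", 102, "C", "T"),    # SNV
    ("chr1", 104, "TA", "T"),   # one-base deletion
    ("chr2", 102, "C", "G"),    # other chromosome, filtered out
]
hits = variants_in_interval(variants, "chr1", 100, 108)
print(apply_variants("ACGTACGT", 100, hits))
```

Checking the reference allele before substituting catches coordinate or genome-build mismatches early instead of silently producing a wrong sequence.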