Skip to content

Latest commit

 

History

History
130 lines (84 loc) · 6.16 KB

File metadata and controls

130 lines (84 loc) · 6.16 KB
graph LR
    Bioinformatics_Workflows_Tools["Bioinformatics Workflows & Tools"]
    Main_Workflows["Main Workflows"]
    Subworkflows["Subworkflows"]
    Atomic_Modules["Atomic Modules"]
    Containerization_and_Environment_Management["Containerization and Environment Management"]
    External_Tool_Integration_Layer["External Tool Integration Layer"]
    Data_Access_Layer["Data Access Layer"]
    Configuration_Management["Configuration Management"]
    Error_Handling_and_Logging["Error Handling and Logging"]
    Data_Schema_and_Validation["Data Schema and Validation"]
    Bioinformatics_Workflows_Tools -- "orchestrates" --> Main_Workflows
    Bioinformatics_Workflows_Tools -- "integrates with" --> External_Tool_Integration_Layer
    Bioinformatics_Workflows_Tools -- "interacts with" --> Data_Access_Layer
    Bioinformatics_Workflows_Tools -- "consumes" --> Configuration_Management
    Bioinformatics_Workflows_Tools -- "reports to" --> Error_Handling_and_Logging
    Bioinformatics_Workflows_Tools -- "adheres to" --> Data_Schema_and_Validation
    Main_Workflows -- "composes" --> Subworkflows
    Main_Workflows -- "depends on" --> Containerization_and_Environment_Management
    Subworkflows -- "composes" --> Atomic_Modules
    Subworkflows -- "depends on" --> Containerization_and_Environment_Management
    Atomic_Modules -- "utilizes" --> Containerization_and_Environment_Management
    Atomic_Modules -- "wraps" --> External_Tool_Integration_Layer
    click Bioinformatics_Workflows_Tools href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/genome-manager/Bioinformatics_Workflows_Tools.md" "Details"
Loading

CodeBoardingDemoContact

Details

Overview of the Bioinformatics Workflows & Tools component, detailing its structure, flow, purpose, and interactions with other parts of the system.

Bioinformatics Workflows & Tools [Expand]

The top-level component responsible for orchestrating and executing complex bioinformatics pipelines using Nextflow. It acts as the central hub for all genomic data processing, ensuring reproducibility and scalability.

Related Classes/Methods:

  • main.nf (1:1)
  • workflows/ (1:1)
  • subworkflows/ (1:1)
  • modules/ (1:1)
  • genome_manager/Dockerfile (1:1)
  • genome_manager/env.yaml (1:1)

Main Workflows

High-level Nextflow scripts that define complete, end-to-end bioinformatics pipelines for specific use cases (e.g., adding a new genome). They serve as the primary entry points for users to initiate complex operations.

Related Classes/Methods:

  • main.nf (1:1)
  • workflows/add_genome.nf (1:1)
  • workflows/register_user_defined_gene.nf (1:1)
  • workflows/update_user_defined_gene.nf (1:1)

Subworkflows

Intermediate Nextflow scripts that encapsulate common sequences of atomic modules. They promote code reusability and modularity by breaking down complex main workflows into manageable, logical steps.

Related Classes/Methods:

  • subworkflows/prep_genome.nf (1:1)
  • subworkflows/make_picard_files.nf (1:1)
  • load_genome_extra_files/subworkflows/load_genome.nf (1:1)

Atomic Modules

The smallest, self-contained units of work within the Nextflow ecosystem. Each module typically wraps a single external bioinformatics tool or performs a very specific data manipulation step, ensuring high reusability and testability.

Related Classes/Methods:

  • modules/genome_manager/download_genome/download_genome.nf (1:1)
  • modules/genome_manager/register_genome/register_genome.nf (1:1)
  • modules/genome_manager/register_gene/register_gene.nf (1:1)
  • modules/genome_manager/update_gene/update_gene.nf (1:1)
  • modules/star/genomegenerate/main.nf (1:1)
  • load_genome_extra_files/modules/custom/gtf2gff3/gtf2gff3.nf (1:1)
  • modules/unpigz/unpigz.nf (1:1)
  • modules/gffread/to_transcriptome/main.nf (1:1)
  • modules/nf-core/custom/dumpsoftwareversions/main.nf (1:1)

Containerization and Environment Management

This component defines and manages the isolated execution environments for all bioinformatics tasks. It leverages Docker for containerization and Conda/Mamba (via env.yaml) for precise dependency management, ensuring workflow reproducibility and portability across different systems.

Related Classes/Methods:

  • genome_manager/Dockerfile (1:1)
  • genome_manager/env.yaml (1:1)

External Tool Integration Layer

Represents the interface and mechanisms through which the Nextflow modules interact with and execute external bioinformatics command-line tools (e.g., STAR, Picard, GFFread).

Related Classes/Methods: None

Data Access Layer

Handles the reading and writing of genomic data (e.g., FASTA, GTF, GFF files) and metadata to and from the underlying data storage. Workflows interact with this layer to retrieve input data and persist results.

Related Classes/Methods: None

Configuration Management

Manages all configurable parameters for the workflows, including paths to reference data, tool-specific settings, and execution profiles. This ensures flexibility and adaptability of the workflows without modifying source code.

Related Classes/Methods: None

Error Handling and Logging

Provides mechanisms for capturing, reporting, and logging errors and execution events from the Nextflow workflows and underlying tools. This is critical for debugging, monitoring, and ensuring the reliability of long-running bioinformatics tasks.

Related Classes/Methods: None

Data Schema and Validation

Defines the expected structure and format of input and output data for workflows and modules. It includes mechanisms to validate data against these schemas, ensuring data quality and compatibility throughout the pipeline.

Related Classes/Methods: None