- Uses gff2fasta to convert a .gtf formatted geneset into FASTA format, building a transcriptome.
- Builds a Salmon index from the generated FASTA-formatted transcriptome.
- Quantifies .fastq formatted RNA-seq reads.
- Generates gene expression estimates (TPM and counts) at the transcript and gene level, using Salmon for alignment-free expression estimation.
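The index and quantification steps above correspond to two Salmon invocations. The following is a minimal sketch of how such command lines could be assembled, not the pipeline's actual implementation: the helper names `salmon_index_cmd` and `salmon_quant_cmd`, the file names in the usage note, and the choice of `-l A` (automatic library type detection) and `-g` (gene-level aggregation from a GTF) are assumptions made for illustration. The real options come from pipeline.yml.

```python
def salmon_index_cmd(transcriptome_fasta, index_dir):
    """Build the command that indexes a transcriptome FASTA with Salmon."""
    return ["salmon", "index", "-t", str(transcriptome_fasta), "-i", str(index_dir)]


def salmon_quant_cmd(index_dir, out_dir, reads, gtf=None, threads=1):
    """Build the command that quantifies one sample.

    reads: sequence of one (single-end) or two (paired-end) FASTQ paths.
    gtf:   optional annotation; passed with -g so Salmon also writes
           gene-level estimates (assumption: the pipeline may aggregate
           to gene level differently).
    """
    cmd = ["salmon", "quant", "-i", str(index_dir), "-l", "A", "-p", str(threads)]
    if len(reads) == 1:
        cmd += ["-r", str(reads[0])]
    elif len(reads) == 2:
        cmd += ["-1", str(reads[0]), "-2", str(reads[1])]
    else:
        raise ValueError("reads must hold one or two FASTQ paths")
    if gtf is not None:
        cmd += ["-g", str(gtf)]
    cmd += ["-o", str(out_dir)]
    return cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, for example `salmon_quant_cmd("salmon_index", "quant/sample1", ("sample1_1.fastq.gz", "sample1_2.fastq.gz"), gtf="geneset.gtf")` with hypothetical file names.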
- A geneset in .gtf format. Unzip .gtf.gz files before running pipeline_salmonquant.
- RNA-seq reads in .fastq.gz format.
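Since the geneset must be uncompressed before the pipeline runs, a compressed annotation can be unpacked beforehand. This is a small sketch using only the Python standard library; the function name `gunzip_gtf` is hypothetical, and the original .gtf.gz file is left in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_gtf(gz_path):
    """Decompress <name>.gtf.gz next to itself and return the .gtf path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    out_path = gz_path.with_suffix("")  # drops only the trailing .gz
    # Stream the data so large annotations are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The .fastq.gz read files stay compressed; only the geneset needs this step.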
Clone/download the repository
Copy pipeline_salmonquant.py and the pipeline_salmonquant folder to your cgat/cgat-flow/CGATPipelines/ directory.
The pipeline requires a configured pipeline.yml file.
Make a directory with your project name, for Salmon quantification.
Configure the pipeline with cgatflow salmonquant config.
A pipeline.log and a pipeline.yml file will be added to your new directory.
Modify pipeline.yml according to your project: specify the genome, genome directory, annotation database and directory, and the database for uploading the outputs, and set the options for Salmon quantification.
Run the pipeline with cgatflow salmonquant make full.
To run the pipeline on a large set of samples, submit it to the cluster (sharc) using the custom submit_pipeline_cgtaflow script.
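The configure and run steps above can also be driven from a script. The sketch below wraps the two cgatflow commands quoted in this README and refuses to start a run when pipeline.yml is missing. The function names and the `dry_run` flag are hypothetical conveniences, and editing pipeline.yml between the two steps remains a manual task.

```python
import subprocess
from pathlib import Path

CONFIG_CMD = ["cgatflow", "salmonquant", "config"]
RUN_CMD = ["cgatflow", "salmonquant", "make", "full"]


def configure_project(project_dir, dry_run=True):
    """Create the project directory and, unless dry_run, run the config step in it."""
    project_dir = Path(project_dir)
    project_dir.mkdir(parents=True, exist_ok=True)
    if not dry_run:
        subprocess.run(CONFIG_CMD, cwd=project_dir, check=True)
    return CONFIG_CMD


def run_full(project_dir, dry_run=True):
    """Run the full pipeline, refusing to start without a pipeline.yml."""
    project_dir = Path(project_dir)
    if not (project_dir / "pipeline.yml").is_file():
        raise FileNotFoundError(
            f"no pipeline.yml in {project_dir}; run the config step first"
        )
    if not dry_run:
        subprocess.run(RUN_CMD, cwd=project_dir, check=True)
    return RUN_CMD
```

With `dry_run=True` (the default) the functions only return the command they would run, which makes it easy to check the setup before launching a long job.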