Salmon Tutorial

Overview of Salmon

Salmon is a bioinformatics tool designed to quantify transcript abundance from RNA-seq data. It is known for its speed and accuracy. It can also correct for fragment-level GC content bias (enabled with the --gcBias option), which can substantially improve the accuracy of abundance estimates and the reliability of downstream differential expression analysis.

Salmon combines a novel dual-phase parallel inference algorithm, feature-rich bias models, and an ultra-fast read mapping procedure. Together these let it model factors that commonly affect RNA-seq data: positional biases in coverage, sequence-specific biases at the ends of sequenced fragments, fragment-level GC bias, strand-specific library protocols, and the fragment length distribution.
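
The general idea behind fragment-level GC bias correction can be sketched in a few lines of Python. The approach is to bin fragments by GC fraction, compare the observed distribution with the distribution expected in the absence of bias, and use the ratio as a per-bin weight. This is a simplified illustration under stated assumptions, not Salmon's actual bias model. The function names, the four-bin layout, and the example sequences are all hypothetical.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a fragment sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_bin(seq: str, n_bins: int = 4) -> int:
    """Map a fragment's GC fraction to one of n_bins equal-width bins."""
    # min() keeps a GC fraction of exactly 1.0 in the last bin.
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def gc_weights(observed, expected, n_bins: int = 4):
    """Per-bin ratio of observed to expected fragment GC frequencies.

    observed: fragments actually sequenced (hypothetical input).
    expected: fragments that would be sampled if there were no GC bias
              (hypothetical input).
    A weight above 1 means fragments in that GC bin are over-represented.
    """
    obs = Counter(gc_bin(s, n_bins) for s in observed)
    exp = Counter(gc_bin(s, n_bins) for s in expected)
    weights = []
    for b in range(n_bins):
        p_obs = obs[b] / len(observed)
        p_exp = exp[b] / len(expected)
        # Bins absent from the expected set get a neutral weight of 1.0.
        weights.append(p_obs / p_exp if p_exp > 0 else 1.0)
    return weights


if __name__ == "__main__":
    expected = ["AAAA", "AAGA", "ATGC", "GGCC"]  # one fragment per GC bin
    observed = ["GGCC", "GGCC", "GGCC", "ATGC"]  # GC-rich fragments over-sampled
    print(gc_weights(observed, expected))
```

In a real model, the expected distribution would be derived from the transcript sequences and current abundance estimates rather than from a hand-picked list. As far as we understand, Salmon folds its learned biases into bias-corrected effective transcript lengths instead of reweighting reads directly.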

The tool operates in two phases: an online phase that estimates initial expression levels and model parameters, and an offline phase that refines expression estimates. This two-phase inference procedure enables Salmon to build a probabilistic model of the sequencing experiment, incorporating information beyond simple fragment-transcript compatibility.

Salmon's rich model accounts for sample-specific parameters and biases, which it learns automatically during the online phase. It also estimates the conditional probability that a fragment was generated from each transcript to which it multimaps. This probability comes from a general fragment-transcript agreement model rather than a simple yes-or-no compatibility test.
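
The offline refinement step can be illustrated with a plain expectation-maximization (EM) loop over equivalence classes, which are groups of fragments compatible with the same set of transcripts. This minimal Python sketch makes three assumptions. Each class carries a fragment count. Each class carries per-transcript conditional probabilities that stand in for the fragment-transcript agreement model. Effective lengths are already known. Salmon's default offline optimizer is a variational Bayesian variant of EM with a richer model, so treat this only as an illustration of the idea; the function name and inputs are hypothetical.

```python
def em_abundances(eq_classes, eff_lengths, n_iter=1000, tol=1e-10):
    """Estimate per-transcript read counts with a basic EM.

    eq_classes: list of (count, {transcript: conditional_prob}) pairs, where
        conditional_prob approximates P(fragment | transcript) for the class.
    eff_lengths: {transcript: effective_length}.
    Returns the estimated number of reads assigned to each transcript.
    """
    transcripts = list(eff_lengths)
    total = sum(count for count, _ in eq_classes)
    # Start from a uniform split of all fragments across transcripts.
    alpha = {t: total / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        new_alpha = {t: 0.0 for t in transcripts}
        for count, probs in eq_classes:
            # E-step: weight each compatible transcript by its abundance per
            # unit effective length times the conditional probability.
            weights = {t: alpha[t] / eff_lengths[t] * p for t, p in probs.items()}
            denom = sum(weights.values())
            if denom == 0:
                continue
            # M-step contribution: split the class's fragments proportionally.
            for t, w in weights.items():
                new_alpha[t] += count * w / denom
        converged = max(abs(new_alpha[t] - alpha[t]) for t in transcripts) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


if __name__ == "__main__":
    classes = [(100, {"A": 1.0}), (50, {"B": 1.0}), (30, {"A": 1.0, "B": 1.0})]
    print(em_abundances(classes, {"A": 1000.0, "B": 1000.0}))
```

In this example, 100 fragments are unique to transcript A, 50 are unique to B, and 30 are shared. With equal effective lengths, the 30 shared fragments end up split 2:1 in line with the unique evidence, giving estimates of about 120 and 60.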

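Salmon can be run in mapping-based mode. In this mode it takes as input an index of the transcriptome and a set of raw sequencing reads (i.e., unaligned reads in FASTA/FASTQ format) and performs quantification directly, without writing any intermediate alignment files. Earlier releases used quasi-mapping in this mode. Since Salmon 1.0 the default is selective alignment, which adds alignment scoring to improve mapping accuracy. Because this lightweight mapping is much faster than traditional alignment and produces no large alignment files, it saves considerable time and disk space.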
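
The core idea of matching reads against a transcriptome index, rather than aligning them base by base, can be sketched with a toy k-mer index in Python. The index records every k-mer of every transcript. A read's candidate transcripts are then the transcripts that contain all of its k-mers. Canonical k-mers (the smaller of a k-mer and its reverse complement) let reads from either strand match. This is a deliberately simplified stand-in: Salmon's real index and its mapping algorithms are far more sophisticated than a strict intersection. The function names, the tiny k, and the sequences are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq: str, k: int):
    """Yield every canonical k-mer of seq (none if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


def build_index(transcripts: dict, k: int) -> dict:
    """Map each canonical k-mer to the set of transcript names containing it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def compatible_transcripts(read: str, index: dict, k: int) -> set:
    """Transcripts that contain every k-mer of the read (strict intersection)."""
    hits = None
    for km in kmers(read, k):
        found = index.get(km, set())
        hits = set(found) if hits is None else hits & found
        if not hits:
            return set()
    # hits is None when the read is shorter than k and yields no k-mers.
    return hits or set()


if __name__ == "__main__":
    idx = build_index({"tx1": "ACGTACGGTTCA", "tx2": "ACGTACGGAAAT"}, k=5)
    print(compatible_transcripts("ACGTACGG", idx, 5))  # shared prefix
    print(compatible_transcripts("GAACCG", idx, 5))    # reverse strand of tx1
```

Reads whose k-mers hit exactly the same set of transcripts fall into the same equivalence class, which is the compact summary that the offline inference phase works on. The example also shows why k matters: a read shorter than k produces no k-mers and cannot be matched at all.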

Salmon is developed and supported openly on GitHub, with additional support available through a Google Users group and a Gitter channel. This gives users several ways to get their questions answered quickly.

Salmon's dual-phase inference algorithm and sample-specific bias models yield better inter-replicate concordance than tools such as kallisto and eXpress. Salmon also produces fewer false-positive differential expression calls in comparisons that are expected to contain few or no true differences in transcript expression. Likewise, when Salmon's estimates are used for gene-level differential expression analysis, fewer genes are called as differentially expressed in such comparisons, consistent with fewer false positives.

In summary, Salmon combines read mapping and quantification in a single package, providing a fast and bias-aware approach to transcript quantification.

Installation

To install Salmon, users can download prebuilt binaries from the releases page of the official GitHub repository. Alternatively, they can use a package manager such as Bioconda (for example, conda install -c conda-forge -c bioconda salmon), which simplifies installation and upgrades. The homebrew-science tap mentioned in older guides has since been archived. Detailed installation instructions for different operating systems are provided in the repository's README and in the official documentation.

Quick Start

Once installed, users can quickly start using Salmon by creating an index of the transcriptome and then quantifying transcript abundance from RNA-seq reads. The basic steps involve building the index with the salmon index command and then running the quantification with the salmon quant command, specifying the index and the location of the sequencing reads.
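
The two quick-start steps can also be scripted. The Python sketch below assembles the salmon index and salmon quant command lines using only the options that appear in this tutorial's examples. It runs them with subprocess if a salmon executable is found on the PATH, and otherwise just prints them. The helper names and file paths are hypothetical; check any additional options against salmon --help before adding them.

```python
import shutil
import subprocess


def index_cmd(transcripts: str, index_dir: str, k: int = 31) -> list:
    """Command line for building a Salmon index from a transcript FASTA."""
    return ["salmon", "index", "-t", transcripts, "-i", index_dir, "-k", str(k)]


def quant_cmd(index_dir, out_dir, reads=None, mates=None, lib_type="A", gc_bias=False):
    """Command line for salmon quant with single-end or paired-end reads.

    Pass either reads (one single-end file) or mates (a (read1, read2) tuple).
    """
    if (reads is None) == (mates is None):
        raise ValueError("provide exactly one of reads or mates")
    cmd = ["salmon", "quant", "-i", index_dir, "-l", lib_type]
    if reads is not None:
        cmd += ["-r", reads]
    else:
        cmd += ["-1", mates[0], "-2", mates[1]]
    if gc_bias:
        cmd.append("--gcBias")
    cmd += ["-o", out_dir]
    return cmd


def run_pipeline(transcripts, index_dir, out_dir, **quant_kwargs):
    """Build the index, then quantify; only print the commands if salmon is absent."""
    steps = [index_cmd(transcripts, index_dir), quant_cmd(index_dir, out_dir, **quant_kwargs)]
    if shutil.which("salmon") is None:
        print("salmon not found on PATH; commands that would run:")
        for cmd in steps:
            print(" ".join(cmd))
        return steps
    for cmd in steps:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return steps
```

For example, run_pipeline("transcripts.fa", "transcripts_index", "quant", mates=("reads_1.fq", "reads_2.fq")) builds the index and then quantifies a paired-end sample. Building the argument lists separately from running them makes the commands easy to inspect or log before anything executes.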

Code Examples of Popular Commands

Here are five common ways of invoking Salmon:

  1. Building the Transcriptome Index:

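    salmon index -t transcripts.fa -i transcripts_index -k 31


    This command creates an index of the transcriptome, which is required for quantification. The -t option specifies the FASTA file of transcript sequences, -i specifies the output directory for the index, and -k sets the k-mer size used during mapping. A k of 31 (the default) works well for reads of about 75 bp or longer; a smaller k may be preferable for shorter reads. Salmon releases before 1.0 also accepted a --type option (for example, --type quasi) to choose the index type; omit it with current releases.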

  2. Quantifying Transcripts:

    salmon quant -i transcripts_index -l A -r reads.fq -o quant
    

    This command runs quantification using the index created earlier. The -l option specifies the library type (A asks Salmon to infer it automatically from the reads), -r specifies single-end (unpaired) reads, and -o specifies the output directory. Salmon writes its main results file, quant.sf, to that directory.

  3. Quantifying with Paired-End Reads:

    salmon quant -i transcripts_index -l A -1 reads_1.fq -2 reads_2.fq -o quant
    

    This is similar to the previous command but for paired-end reads: -1 and -2 specify the files containing the first and second mates of each pair, respectively. The two files must list the mates in the same order.

  4. Using Bias Correction:

    salmon quant -i transcripts_index -l A -r reads.fq --gcBias -o quant
    

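    This command adds the --gcBias flag, which lets Salmon learn and correct for fragment-level GC bias during quantification and can improve the accuracy of abundance estimates. Salmon's help text marks this option as beta for single-end reads, so it is most often used with paired-end data (-1/-2). The related --seqBias flag corrects sequence-specific bias at the ends of fragments.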

  5. Generating Gene-Level Estimates:

    salmon quant -i transcripts_index -l A -r reads.fq -g transcripts_to_genes.txt -o quant
    

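    This command uses the -g option to supply a transcript-to-gene mapping. The mapping can be a GTF file or a tab-delimited file with one transcript name and its gene per line. Salmon then writes gene-level abundance estimates to quant.genes.sf, in addition to the transcript-level estimates in quant.sf.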
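
The quant.sf file written by salmon quant is tab-separated text with a header line and the columns Name, Length, EffectiveLength, TPM, and NumReads, so it is easy to post-process. The Python sketch below does three things. It parses quant.sf content. It recomputes TPM from NumReads and EffectiveLength using the usual definition (reads per unit effective length, rescaled to sum to one million). It then sums TPM and NumReads per gene using a two-column transcript-to-gene map. The function names and the example values are hypothetical, and plain summing is an assumption made for illustration. It does not reproduce every column of Salmon's own quant.genes.sf.

```python
import csv
import io
from collections import defaultdict

# Hypothetical example data in the quant.sf layout and a two-column gene map.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1200\t1000.0\t250000.0\t100.0\n"
    "tx2\t700\t500.0\t250000.0\t50.0\n"
    "tx3\t2200\t2000.0\t500000.0\t400.0\n"
)
TX2GENE = "tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n"


def read_quant(text: str) -> list:
    """Parse quant.sf content (tab-separated, with a header) into dicts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "Name": rec["Name"],
            "Length": int(rec["Length"]),
            "EffectiveLength": float(rec["EffectiveLength"]),
            "TPM": float(rec["TPM"]),
            "NumReads": float(rec["NumReads"]),
        })
    return rows


def recompute_tpm(rows: list) -> dict:
    """TPM_i = (N_i / L_i) / sum_j (N_j / L_j) * 1e6, with effective lengths L."""
    rates = {r["Name"]: r["NumReads"] / r["EffectiveLength"]
             for r in rows if r["EffectiveLength"] > 0}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def read_tx2gene(text: str) -> dict:
    """Parse a two-column, tab-delimited transcript-to-gene map."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            tx, gene = line.split("\t")[:2]
            mapping[tx] = gene
    return mapping


def aggregate_by_gene(rows: list, tx2gene: dict) -> dict:
    """Sum TPM and NumReads per gene; unmapped transcripts keep their own name."""
    genes = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for r in rows:
        gene = tx2gene.get(r["Name"], r["Name"])
        genes[gene]["TPM"] += r["TPM"]
        genes[gene]["NumReads"] += r["NumReads"]
    return dict(genes)


if __name__ == "__main__":
    rows = read_quant(QUANT_SF)
    print(recompute_tpm(rows))
    print(aggregate_by_gene(rows, read_tx2gene(TX2GENE)))
```

On a real quant.sf, the recomputed values should agree closely with the TPM column, up to rounding of the reported NumReads. To run the sketch on actual output, pass it the text of the file in the -o directory, for example open("quant/quant.sf").read().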

These commands represent a starting point for using Salmon. Users should consult the official documentation for more advanced options and best practices tailored to their specific datasets and research questions.