STARsolo Tutorial

Overview of STARsolo

STARsolo is a module for quantifying gene expression in single-cell and single-nucleus RNA-seq data. It is a comprehensive solution built into the widely used RNA-seq aligner STAR (Spliced Transcripts Alignment to a Reference). STARsolo combines high accuracy with high speed, and it is significantly more accurate than pseudoalignment-to-transcriptome tools.

The module handles read mapping, read-to-gene assignment, cell barcode demultiplexing, and UMI (Unique Molecular Identifier) collapsing in a tightly integrated manner. This integration avoids input/output bottlenecks and increases processing speed. Unlike tools that align reads only to the transcriptome, STARsolo aligns reads to the full genome, which results in higher accuracy.

STARsolo is faster than CellRanger, the most widely used tool for pre-processing scRNA-seq data, and it also accounts for multi-gene reads, i.e., reads that map to more than one gene. This is crucial for detecting certain classes of biologically important genes, such as paralogs. Moreover, STARsolo supports a flexible cell barcode processing scheme, making it compatible with many established scRNA-seq protocols and extendable to emerging technologies.

Beyond gene expression, STARsolo can quantify other transcriptomic features, such as splice junctions and spliced/unspliced transcripts, the latter being required for RNA velocity calculations. It can also output a standard BAM file containing read alignments and error-corrected cell barcodes and UMIs for a variety of downstream analyses, including differential splicing, alternative polyadenylation, allele-specific expression, and fusion detection.

STARsolo is truly open-source software, distributed under the MIT license on GitHub, which encourages community contributions and feature requests. Its computational efficiency and technological versatility make it a valuable tool for single-cell genomic studies.

Installation

To install STARsolo, you will need to download and compile the STAR aligner, which includes the STARsolo module. The installation process typically involves the following steps:

  1. Ensure that you have a compatible operating system (Linux or macOS) and the necessary build tools (e.g., make and a C++ compiler such as g++).
  2. Download the latest STAR source code from the official GitHub repository.
  3. Unpack the downloaded source code and change into its source subdirectory.
  4. Compile the STAR aligner by running make STAR in that directory.
  5. Add the STAR executable to your system's PATH to make it accessible from any location.

Detailed installation instructions and system requirements can be found in the STAR aligner's README file on GitHub.
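
The steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: git, make, and a C++ compiler are already installed, the machine has network access, and the install location (here defaulting to $HOME/src) is a hypothetical choice. The function is defined but not called, so nothing is downloaded until you invoke it yourself.

```shell
# Sketch: build STAR (which contains STARsolo) from source.
# Assumption: the install prefix passed as $1 is a location of your choosing.
install_star() {
    prefix="${1:-$HOME/src}"
    mkdir -p "$prefix" || return 1
    # Fetch the source code from the official GitHub repository
    git clone https://github.com/alexdobin/STAR.git "$prefix/STAR" || return 1
    # Compile the STAR target inside the "source" subdirectory
    make -C "$prefix/STAR/source" STAR || return 1
    # Make the freshly built binary visible in the current shell session
    PATH="$prefix/STAR/source:$PATH"
    export PATH
    STAR --version
}

# Call it explicitly when ready, for example:
# install_star "$HOME/src"
```

To keep STAR on the PATH in future sessions, the same PATH line would have to go into your shell start-up file.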

Quick Start

To quickly start using STARsolo for single-cell RNA-seq data analysis, you will need to prepare your input files, including the reference genome, gene annotation files, and the single-cell RNA-seq reads. Once you have these files ready, you can run STARsolo with a command that specifies the necessary parameters for your single-cell RNA-seq protocol.

Here is a basic example of a STARsolo command:
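
The reference genome and annotation are not passed to STARsolo directly; STAR first builds a genome index from them, and the resulting directory is what --genomeDir points to. A minimal sketch of that one-time step follows. The function name build_star_index and all paths in the usage line are hypothetical placeholders, and the options shown are only the basic ones; the STAR manual describes further tuning such as --sjdbOverhang.

```shell
# Sketch: generate a STAR genome index (run once per genome/annotation pair).
# Arguments: genome directory, genome FASTA, annotation GTF, thread count.
build_star_index() {
    genome_dir="$1"
    fasta="$2"
    gtf="$3"
    threads="${4:-8}"
    STAR --runMode genomeGenerate \
         --runThreadN "$threads" \
         --genomeDir "$genome_dir" \
         --genomeFastaFiles "$fasta" \
         --sjdbGTFfile "$gtf"
}

# Example call with placeholder paths:
# build_star_index /path/to/genomeDir /path/to/genome.fa /path/to/annotation.gtf 8
```

Index generation for a mammalian genome needs substantial memory and time, so it is usually done once and the directory reused for every sample.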

STAR --runThreadN NumberOfThreads \
     --genomeDir /path/to/genomeDir \
     --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
     --soloType Droplet \
     --soloCBwhitelist /path/to/whitelist.txt \
     --soloUMIlen UmiLength \
     --soloCBlen CellBarcodeLength \
     --outFileNamePrefix /path/to/output

This command runs STARsolo with the specified number of threads, genome index directory, and input read files. In --readFilesIn, the file with the cDNA reads must come first, followed by the file with the cell barcode and UMI reads. The remaining options set the single-cell parameters: the type of protocol (e.g., Droplet), the cell barcode whitelist, the UMI length, and the cell barcode length. STAR prepends the --outFileNamePrefix value verbatim to every output file name, so a prefix of /path/to/output yields files such as /path/to/outputLog.out; add a trailing slash to write into a directory instead.
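
After a run finishes, a quick sanity check is to read the dimensions of the count matrix. The sketch below assumes STARsolo's default output layout, in which the filtered gene counts are written in MatrixMarket format to Solo.out/Gene/filtered/matrix.mtx under the output prefix; the helper name solo_matrix_dims is hypothetical.

```shell
# Sketch: report the dimensions of a MatrixMarket count matrix.
# The first line that does not start with '%' holds: rows, columns, non-zero entries.
# In STARsolo output, rows are features (genes) and columns are cell barcodes.
solo_matrix_dims() {
    awk '!/^%/ { printf "features=%s cells=%s nonzero=%s\n", $1, $2, $3; exit }' "$1"
}

# Example call; the prefix is concatenated exactly as STAR does it:
# prefix=/path/to/output
# solo_matrix_dims "${prefix}Solo.out/Gene/filtered/matrix.mtx"
```

A cell count that is far from the expected number of captured cells is often the first hint that the whitelist, barcode lengths, or read order were set incorrectly.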

Code Examples Of Popular Commands

Here are five popular STARsolo command examples that you can use as a reference for your analyses:

  1. Basic Gene Quantification

    STAR --runThreadN 8 \
         --genomeDir /path/to/genomeDir \
         --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
         --soloType Droplet \
         --soloCBwhitelist /path/to/whitelist.txt \
         --soloUMIlen 10 \
         --soloCBlen 16 \
         --outFileNamePrefix /path/to/output
    
  2. Quantification with Multi-Gene Reads

    STAR --runThreadN 8 \
         --genomeDir /path/to/genomeDir \
         --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
         --soloType Droplet \
         --soloFeatures Gene \
         --soloMultiMappers EM \
         --soloCBwhitelist /path/to/whitelist.txt \
         --soloUMIlen 10 \
         --soloCBlen 16 \
         --outFileNamePrefix /path/to/output
    
  3. Quantification for RNA Velocity

    STAR --runThreadN 8 \
         --genomeDir /path/to/genomeDir \
         --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
         --soloType Droplet \
         --soloFeatures Gene Velocyto \
         --soloCBwhitelist /path/to/whitelist.txt \
         --soloUMIlen 10 \
         --soloCBlen 16 \
         --outFileNamePrefix /path/to/output
    
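  4. Using a Custom Cell Barcode and UMI Layout

    The positions of the UMI and cell barcode in the barcode read are set with start and length options. Here a 10-base UMI is followed by a 10-base cell barcode, and no whitelist is used.

    STAR --runThreadN 8 \
         --genomeDir /path/to/genomeDir \
         --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
         --soloType Droplet \
         --soloCBwhitelist None \
         --soloUMIstart 1 \
         --soloUMIlen 10 \
         --soloCBstart 11 \
         --soloCBlen 10 \
         --outFileNamePrefix /path/to/output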
    
  5. Outputting BAM for Downstream Analysis

    STAR --runThreadN 8 \
         --genomeDir /path/to/genomeDir \
         --readFilesIn /path/to/cDNA_read.fastq /path/to/barcode_read.fastq \
         --soloType Droplet \
         --soloCBwhitelist /path/to/whitelist.txt \
         --soloUMIlen 10 \
         --soloCBlen 16 \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMattributes NH HI nM AS CR UR CB UB GX GN \
         --outFileNamePrefix /path/to/output
    

These examples cover a range of common scenarios for single-cell RNA-seq data analysis with STARsolo. Users can adjust the parameters to fit their specific experimental setup and analysis needs.
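
One common adjustment is a sample sequenced across several lanes. STAR accepts comma-separated file lists in --readFilesIn, with the lanes listed in the same order for the cDNA reads and the barcode reads, and --readFilesCommand zcat lets it read gzipped files. The sketch below builds such lists; the helper name join_by_comma and the file names are hypothetical.

```shell
# Sketch: join all arguments with commas, e.g. "a b c" -> "a,b,c".
# The subshell keeps the IFS change from leaking into the calling shell.
join_by_comma() {
    ( IFS=','; printf '%s\n' "$*" )
}

# Hypothetical two-lane sample: the lane order must match between the two lists.
cdna_files=$(join_by_comma sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz)
barcode_files=$(join_by_comma sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz)

# These lists would then be passed as:
#   --readFilesIn "$cdna_files" "$barcode_files" --readFilesCommand zcat
echo "cDNA reads:    $cdna_files"
echo "barcode reads: $barcode_files"
```

The rest of the command stays the same as in the examples above.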