featureCounts Tutorial

Overview of featureCounts

What is featureCounts?

featureCounts is a bioinformatics tool that summarizes mapped reads over genomic features such as genes, exons, promoters, gene bodies, genomic bins, and chromosomal locations. It works with reads from both RNA-seq and genomic DNA-seq experiments. It is distributed as part of the Subread package (SourceForge) and the Rsubread package (Bioconductor).

How does featureCounts work?

The program takes SAM/BAM files as input along with an annotation file that includes chromosomal coordinates of features. The annotation file can be in GTF format or a simplified annotation format (SAF). featureCounts then outputs the number of reads assigned to each feature or meta-feature, as well as statistical information about the summarization results, such as the number of successfully assigned reads and the number of reads that failed to be assigned for various reasons.
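To make the SAF format concrete, here is a minimal shell sketch that writes a small SAF file by hand. SAF is tab-delimited with the columns GeneID, Chr, Start, End, and Strand, and rows that share a GeneID are grouped into one meta-feature. The function name `write_example_saf`, the file name `example.saf`, and all gene names and coordinates are hypothetical.

```shell
# Write a minimal SAF annotation: tab-delimited GeneID, Chr, Start, End, Strand.
# GeneA has two rows (e.g. two exons); featureCounts treats rows that share a
# GeneID as one meta-feature. All names and coordinates here are made up.
write_example_saf() {
  out="$1"
  printf 'GeneID\tChr\tStart\tEnd\tStrand\n' > "$out"
  printf 'GeneA\tchr1\t1000\t1500\t+\n' >> "$out"
  printf 'GeneA\tchr1\t2000\t2600\t+\n' >> "$out"
  printf 'GeneB\tchr1\t5000\t5800\t-\n' >> "$out"
}

write_example_saf example.saf

# Sanity check: every data row should have exactly five tab-separated fields.
awk -F '\t' 'NR > 1 && NF != 5 { bad++ } END { print (bad ? "malformed" : "ok") }' example.saf
```

A SAF file is passed to featureCounts with -F SAF, for example:
featureCounts -F SAF -a example.saf -o counts.txt alignments.bam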

Why use featureCounts?

One of the main advantages of featureCounts is its efficiency. It combines an ultrafast feature-search algorithm with an efficient implementation in C, so it is both fast and light on memory, which makes it practical for large-scale sequencing studies. It also supports multithreading, which speeds up processing further on large datasets.

Key features of featureCounts

  • Efficient: featureCounts is optimized for performance, with a focus on speed and low memory usage.
  • Accurate: Its read counts agree closely with those produced by alternative read-summarization methods.
  • Multithreading support: The tool can process data in parallel, making it ideal for large datasets.
  • Flexible input formats: It accepts both SAM and BAM files, as well as GTF and SAF annotation formats.
  • Comprehensive output: Along with read counts, it provides detailed statistics on the summarization process.
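The summary statistics are written to a separate file next to the counts table (for -o counts.txt, that file is counts.txt.summary). It is tab-delimited: a Status column followed by one column per input file, with an Assigned row and several Unassigned_* rows. The sketch below computes the share of records assigned for each input file. The function name `assignment_rate` and the demo numbers are invented, and it uses the sum of all Status rows as the denominator, which may count alignments rather than distinct reads depending on the options used.

```shell
# Print, for each input column of a featureCounts .summary file:
# column name, Assigned count, total over all Status rows, and percent assigned.
assignment_rate() {
  awk -F '\t' '
    NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
      for (i = 2; i <= ncol; i++) {
        total[i] += $i
        if ($1 == "Assigned") assigned[i] = $i
      }
    }
    END {
      for (i = 2; i <= ncol; i++) {
        pct = (total[i] > 0) ? 100 * assigned[i] / total[i] : 0
        printf "%s\t%.0f\t%.0f\t%.1f%%\n", name[i], assigned[i], total[i], pct
      }
    }' "$1"
}

# Demo with made-up numbers laid out like a featureCounts summary file.
printf 'Status\talignments.bam\nAssigned\t800\nUnassigned_Unmapped\t50\nUnassigned_MultiMapping\t100\nUnassigned_NoFeatures\t40\nUnassigned_Ambiguity\t10\n' > demo.txt.summary

assignment_rate demo.txt.summary
```

A low assigned percentage is a prompt to look at which Unassigned_* row dominates, for example Unassigned_NoFeatures or Unassigned_MultiMapping.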

In the following sections, we cover how to install featureCounts, run it for the first time, and use some popular options through code examples.

Installation

Before we can start using featureCounts, we need to install it. featureCounts is available as part of the Subread package, which can be downloaded from SourceForge, or as part of the Rsubread package for R users, available through Bioconductor.

Installing Subread package

To install the Subread package, which includes featureCounts, follow these steps:

  1. Visit the Subread SourceForge page.
  2. Download the appropriate version for your operating system (Linux, macOS, or Windows).
  3. Extract the downloaded file to a directory of your choice.
  4. Add the bin directory from the extracted files to your PATH environment variable to access the featureCounts command from anywhere.
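Step 4 can be done for the current shell session with export. The sketch below wraps it in a small function that first checks that a featureCounts executable is actually present. The function name `add_subread_to_path` and the example directory `$HOME/tools/subread` are hypothetical; use whatever directory you extracted the archive to.

```shell
# Prepend <subread_dir>/bin to PATH, but only if featureCounts is there.
# Returns non-zero (and prints a message) if the executable is missing.
add_subread_to_path() {
  subread_dir="$1"
  if [ -x "$subread_dir/bin/featureCounts" ]; then
    PATH="$subread_dir/bin:$PATH"
    export PATH
    return 0
  fi
  echo "featureCounts not found under $subread_dir/bin" >&2
  return 1
}

# Example call (hypothetical location):
# add_subread_to_path "$HOME/tools/subread"
```

To make the change permanent, add the equivalent export line to your shell startup file (for example ~/.bashrc or ~/.zshrc), then confirm with featureCounts -v.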

Installing Rsubread package

For R users, the Rsubread package can be installed using Bioconductor with the following commands in R:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Rsubread")

After installation, you can load the package using library(Rsubread).

Quick Start

Once featureCounts is installed, you can start using it to count reads. Here's a quick guide to get you started:

  1. Prepare your input files: a SAM or BAM file containing your aligned reads and a GTF or SAF file containing the annotations.
  2. Run featureCounts with a command like:
featureCounts -a annotation.gtf -o counts.txt alignments.bam

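This command counts the reads in alignments.bam that map to the features described in annotation.gtf and writes the counts to counts.txt. Summary statistics (assigned reads and the reasons other reads were not assigned) go to a second file, counts.txt.summary.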
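The counts table starts with a "#" comment line recording the program and command, followed by a header row (Geneid, Chr, Start, End, Strand, Length, then one count column per input file). The sketch below reduces it to gene ID and count for the first input file. The function name `gene_counts` and the demo values are made up.

```shell
# Reduce a featureCounts counts table to "gene ID <TAB> count".
# Skips the leading "#" comment line and the column header.
gene_counts() {
  awk -F '\t' '
    /^#/ { next }              # program/command comment line
    $1 == "Geneid" { next }    # column header
    { print $1 "\t" $7 }       # column 7 = counts for the first input file
  ' "$1"
}

# Demo table with made-up values; a multi-exon gene lists per-exon
# coordinates separated by semicolons.
{
  printf '# Program:featureCounts (demo file, values are made up)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\talignments.bam\n'
  printf 'GeneA\tchr1;chr1\t1000;2000\t1500;2600\t+;+\t1102\t42\n'
  printf 'GeneB\tchr1\t5000\t5800\t-\t801\t7\n'
} > demo_counts.txt

gene_counts demo_counts.txt
```

For a single-sample file, tail -n +3 counts.txt | cut -f1,7 gives the same two columns.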

Code Examples Of Popular Commands

Let's look at five popular commands that you can use with featureCounts to perform various tasks:

1. Basic read counting

featureCounts -t exon -g gene_id -a annotation.gtf -o gene_counts.txt alignments.bam

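This command counts reads that overlap exons (-t exon) and summarizes them at the gene level using the gene_id attribute from the GTF file (-g gene_id). Both values are also featureCounts' defaults for GTF annotations, so they are spelled out here mainly for clarity.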
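To see what -t exon and -g gene_id pick out of an annotation, the sketch below lists how many exon rows each gene_id has in a GTF file. It only inspects the annotation and does not count reads. The function name `exons_per_gene` and the demo GTF records are hypothetical.

```shell
# Preview what "-t exon -g gene_id" selects in a GTF file: rows whose feature
# type (column 3) is "exon", grouped by the gene_id value in column 9.
exons_per_gene() {
  awk -F '\t' '
    /^#/ || $3 != "exon" { next }
    match($9, /gene_id "[^"]+"/) {
      # skip the 9-char prefix (gene_id, space, quote) and the closing quote
      n[substr($9, RSTART + 9, RLENGTH - 10)]++
    }
    END { for (id in n) print id "\t" n[id] }
  ' "$1" | sort
}

# Demo GTF with made-up records; the "gene" row is ignored because of -t exon.
{
  printf 'chr1\tdemo\tgene\t1000\t2600\t.\t+\t.\tgene_id "GeneA";\n'
  printf 'chr1\tdemo\texon\t1000\t1500\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA1";\n'
  printf 'chr1\tdemo\texon\t2000\t2600\t.\t+\t.\tgene_id "GeneA"; transcript_id "TxA1";\n'
  printf 'chr1\tdemo\texon\t5000\t5800\t.\t-\t.\tgene_id "GeneB"; transcript_id "TxB1";\n'
} > demo.gtf

exons_per_gene demo.gtf
```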

2. Counting with multiple threads

featureCounts -T 4 -a annotation.gtf -o counts.txt alignments.bam

The -T option sets the number of threads; here featureCounts uses 4 CPU threads to speed up counting.
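Rather than hard-coding 4, you could derive the thread count from the machine. The sketch below uses getconf _NPROCESSORS_ONLN and falls back to 1 thread; the function name `pick_threads` and the cap of 8 are arbitrary illustrative choices, not featureCounts limits.

```shell
# Choose a -T value from the number of online CPUs, falling back to 1.
# getconf _NPROCESSORS_ONLN is available on Linux and macOS; the cap of 8
# is an arbitrary example.
pick_threads() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case "$n" in
    ''|*[!0-9]*) n=1 ;;   # empty or non-numeric output: use one thread
  esac
  if [ "$n" -lt 1 ]; then n=1; fi
  if [ "$n" -gt 8 ]; then n=8; fi
  echo "$n"
}

echo "featureCounts would run with -T $(pick_threads)"
```

Usage: featureCounts -T "$(pick_threads)" -a annotation.gtf -o counts.txt alignments.bam. The speed-up from extra threads is not necessarily linear, since reading the input can also become the bottleneck.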

3. Counting paired-end reads

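featureCounts -p --countReadPairs -a annotation.gtf -o paired_counts.txt alignments.bam

The -p flag tells featureCounts that the input contains paired-end reads. In Subread 2.0.2 and later, --countReadPairs must also be given to count fragments (read pairs) instead of individual reads; in earlier releases, -p on its own switched to fragment counting. Counting fragments is usually what you want for paired-end data.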
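If you are unsure whether a BAM file holds paired-end data, one quick check is bit 0x1 of the SAM FLAG field, which marks reads that belong to a pair. The sketch below counts such records in SAM text read from standard input. The function name `count_paired_records` and the demo records are hypothetical.

```shell
# Count SAM records with FLAG bit 0x1 set ("template has multiple segments"),
# i.e. reads sequenced as part of a pair. Header lines starting with @ are skipped.
# Prints: <paired records> <total records>
count_paired_records() {
  awk -F '\t' '
    /^@/ { next }
    { total++; if ($2 % 2 == 1) paired++ }
    END { printf "%d %d\n", paired, total }
  '
}

# Demo SAM text with made-up records: one pair (FLAGs 99/147) and one single read.
{
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf 'r1\t99\tchr1\t1000\t60\t50M\t=\t1100\t150\t*\t*\n'
  printf 'r1\t147\tchr1\t1100\t60\t50M\t=\t1000\t-150\t*\t*\n'
  printf 'r2\t0\tchr1\t5000\t60\t50M\t*\t0\t0\t*\t*\n'
} > demo.sam

count_paired_records < demo.sam
```

With real data, a sample is enough: samtools view alignments.bam | head -n 100000 | count_paired_records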

4. Counting multi-mapping reads

featureCounts -M -a annotation.gtf -o multimapped_counts.txt alignments.bam

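By default, featureCounts does not count multi-mapping reads, which it identifies through the NH tag in the SAM/BAM records. The -M option includes them: every reported alignment of such a read is counted, so one read can contribute to several features. Adding --fraction gives each of those alignments a fractional count (1/n for a read with n reported alignments) instead of a full count. Including multi-mappers can be useful in certain analyses, such as those involving repetitive regions or gene families.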
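To gauge how much -M would change your counts, you can tally the NH tag values in your alignments. The sketch below does this on SAM text from standard input; it counts alignment records, not distinct reads. The function name `nh_summary` and the demo records are made up, and records without an NH tag (some aligners omit it) are reported separately.

```shell
# Tally SAM records by their NH:i tag (number of reported alignments for the read).
# NH > 1 marks a multi-mapping read. Optional tags start at column 12.
nh_summary() {
  awk -F '\t' '
    /^@/ { next }
    {
      nh = 0
      for (i = 12; i <= NF; i++)
        if ($i ~ /^NH:i:/) { nh = substr($i, 6) + 0; break }
      if (nh == 0) missing++
      else if (nh > 1) multi++
      else unique++
    }
    END { printf "unique=%d multi=%d no_NH=%d\n", unique, multi, missing }
  '
}

# Demo with made-up records: r2 has two reported alignments (NH:i:2 on both).
{
  printf 'r1\t0\tchr1\t1000\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1\n'
  printf 'r2\t0\tchr1\t2000\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2\n'
  printf 'r2\t256\tchr2\t3000\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2\n'
  printf 'r3\t0\tchr1\t4000\t60\t50M\t*\t0\t0\t*\t*\n'
} > demo_nh.sam

nh_summary < demo_nh.sam
```

With real data: samtools view alignments.bam | nh_summary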

5. Strand-specific counting

featureCounts -s 2 -a annotation.gtf -o strand_specific_counts.txt alignments.bam

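The -s option specifies the strand-specificity of the library: 0 for unstranded (the default), 1 for stranded, and 2 for reversely stranded. A wrong setting can leave most reads unassigned, so check which protocol was used to prepare the library.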
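When the protocol is unknown, a common heuristic is to run featureCounts once per -s value and compare the Assigned counts. The sketch below reads summary files named <prefix>_s0.txt.summary, <prefix>_s1.txt.summary, and <prefix>_s2.txt.summary; that naming convention, the function name `strand_report`, and the demo numbers are all assumptions of this sketch.

```shell
# Print "<-s value> <TAB> <Assigned>" for each summary file named
# <prefix>_s<N>.txt.summary (a hypothetical naming convention).
strand_report() {
  prefix="$1"
  for s in 0 1 2; do
    f="${prefix}_s${s}.txt.summary"
    [ -f "$f" ] || continue
    awk -F '\t' -v s="$s" '$1 == "Assigned" { print s "\t" $2 }' "$f"
  done
}

# Demo with made-up numbers resembling a reversely stranded library:
# -s 1 assigns few reads, while -s 2 assigns nearly as many as -s 0.
for pair in 0:900 1:60 2:880; do
  s=${pair%%:*}; assigned=${pair#*:}
  printf 'Status\talignments.bam\nAssigned\t%s\nUnassigned_NoFeatures\t%s\n' \
    "$assigned" "$((1000 - assigned))" > "demo_s${s}.txt.summary"
done

strand_report demo
```

With real data, the three summaries could come from
for s in 0 1 2; do featureCounts -s $s -a annotation.gtf -o counts_s$s.txt alignments.bam; done
followed by strand_report counts. A much higher Assigned count for -s 1 or -s 2 than for the other suggests a stranded or reversely stranded library; similar counts for 1 and 2 suggest unstranded data.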

These examples provide a glimpse into the versatility of featureCounts. By combining different options, you can tailor the read counting process to fit the specific needs of your analysis.
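As one illustration of combining options, the sketch below wraps several flags from the examples above in a shell function. It assumes a paired-end, reversely stranded library and Subread 2.0.2 or later (for --countReadPairs); the function name `count_genes`, the FEATURECOUNTS override variable, and the chosen defaults are illustrative, not part of featureCounts. featureCounts accepts several BAM files in one run and writes one count column per file.

```shell
# Wrapper combining options from the examples: 4 threads, paired-end fragment
# counting, reverse strandedness, exon features summarized by gene_id.
# Set FEATURECOUNTS=echo to print the command instead of running it.
count_genes() {
  annotation="$1"; output="$2"; shift 2
  "${FEATURECOUNTS:-featureCounts}" \
    -T 4 \
    -p --countReadPairs \
    -s 2 \
    -t exon -g gene_id \
    -a "$annotation" -o "$output" "$@"
}

# Dry run: print the command line instead of executing featureCounts.
FEATURECOUNTS=echo count_genes annotation.gtf counts.txt sample1.bam sample2.bam
```

Drop FEATURECOUNTS=echo to run the real command once the flags match your library.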

In conclusion, featureCounts is a valuable tool in the bioinformatics toolkit, offering speed, accuracy, and flexibility for read summarization tasks. Whether you're working with RNA-seq or DNA-seq data, featureCounts can help you efficiently quantify genomic features and advance your research.