featureCounts is a read summarization program used in bioinformatics to count mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins, and chromosomal locations. It works with both RNA-seq and genomic DNA-seq reads. The program is distributed as part of the Subread package (hosted on SourceForge) and of the Bioconductor Rsubread package.
The program takes SAM/BAM files as input along with an annotation file that includes chromosomal coordinates of features. The annotation file can be in GTF format or a simplified annotation format (SAF). featureCounts then outputs the number of reads assigned to each feature or meta-feature, as well as statistical information about the summarization results, such as the number of successfully assigned reads and the number of reads that failed to be assigned for various reasons.
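To make the SAF option concrete, here is a minimal sketch of what such a file looks like. SAF is tab-delimited with five required columns (GeneID, Chr, Start, End, Strand). The file name, gene names, and coordinates below are made up for illustration.

```shell
# Write a toy SAF annotation (tab-delimited). Gene names and coordinates
# below are made up for illustration only.
saf_file="demo.saf"
printf 'GeneID\tChr\tStart\tEnd\tStrand\n'  >  "$saf_file"
printf 'geneA\tchr1\t1000\t1500\t+\n'       >> "$saf_file"
printf 'geneA\tchr1\t2000\t2400\t+\n'       >> "$saf_file"
printf 'geneB\tchr2\t500\t900\t-\n'         >> "$saf_file"

# Sanity check: every row should have exactly five tab-separated fields.
awk -F'\t' 'NF != 5 { bad++ } END { exit (bad > 0) }' "$saf_file" \
  && echo "SAF looks well-formed"

# With Subread installed, the file would be passed with -F SAF, e.g.:
#   featureCounts -F SAF -a demo.saf -o counts.txt alignments.bam
```

Rows that share a GeneID (geneA here) are treated as features of the same meta-feature, which is how exon-level entries are rolled up to a gene.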
One of the main advantages of using featureCounts is its efficiency. It is known for its ultrafast feature search algorithm and its highly efficient implementation in the C programming language. This efficiency is not only in terms of speed but also in terms of memory usage, making it suitable for large-scale sequencing studies. Additionally, featureCounts supports multithreading, which allows for even faster processing when dealing with large datasets.
- Efficient: featureCounts is optimized for performance, with a focus on speed and low memory usage.
- Accurate: Its read counts show high concordance with those produced by alternative methods.
- Multithreading support: The tool can process data in parallel, making it ideal for large datasets.
- Flexible input formats: It accepts both SAM and BAM files, as well as GTF and SAF annotation formats.
- Comprehensive output: Along with read counts, it provides detailed statistics on the summarization process.
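The summary statistics mentioned above are written to a small tab-delimited file next to the counts output (the output name plus .summary). The sketch below shows one way to turn that file into an assignment rate. The helper name assigned_rate, the file name, and all numbers in the toy file are assumptions for illustration, not real results.

```shell
# Toy .summary file in the layout featureCounts writes: a header row,
# then one row per status. All numbers here are invented.
summary_file="demo.summary"
{
  printf 'Status\talignments.bam\n'
  printf 'Assigned\t800\n'
  printf 'Unassigned_Unmapped\t50\n'
  printf 'Unassigned_NoFeatures\t100\n'
  printf 'Unassigned_Ambiguity\t50\n'
} > "$summary_file"

# assigned_rate FILE: print the percentage of reads in column 2 of a
# .summary file that were assigned to a feature.
assigned_rate() {
  awk -F'\t' '
    NR == 1          { next }             # skip the "Status" header row
    $1 == "Assigned" { assigned = $2 }
                     { total += $2 }      # sum every status, assigned or not
    END { if (total > 0) printf "%.1f\n", 100 * assigned / total }
  ' "$1"
}

assigned_rate "$summary_file"
```

A low rate is usually a prompt to check the Unassigned_* rows, since they indicate why reads were not counted.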
In the following sections, we will delve into how to install featureCounts, get started with basic usage, and explore some popular commands through code examples.
Before we can start using featureCounts, we need to install it. featureCounts is available as part of the Subread package, which can be downloaded from SourceForge, or as part of the Rsubread package for R users, available through Bioconductor.
To install the Subread package, which includes featureCounts, follow these steps:
- Visit the Subread SourceForge page.
- Download the appropriate version for your operating system (Linux, macOS, or Windows).
- Extract the downloaded file to a directory of your choice.
- Add the bin directory from the extracted files to your PATH environment variable to access the featureCounts command from anywhere.
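On Linux or macOS, the last step might look like the sketch below. The directory $HOME/tools/subread is a hypothetical extraction location; substitute the directory you actually unpacked. To make the change permanent, the export line would typically go in your shell startup file.

```shell
# Hypothetical extraction directory; replace with wherever you unpacked Subread.
subread_dir="$HOME/tools/subread"
export PATH="$PATH:$subread_dir/bin"

# Check whether the shell can now find the program.
if command -v featureCounts >/dev/null 2>&1; then
  featureCounts -v    # prints the installed version
else
  echo "featureCounts not found; check that $subread_dir/bin exists" >&2
fi
```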
For R users, the Rsubread package can be installed using Bioconductor with the following commands in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rsubread")
After installation, you can load the package using library(Rsubread).
Once featureCounts is installed, you can start using it to count reads. Here's a quick guide to get you started:
- Prepare your input files: a SAM or BAM file containing your aligned reads and a GTF or SAF file containing the annotations.
- Run featureCounts with a command like:
featureCounts -a annotation.gtf -o counts.txt alignments.bam
This command counts the reads in alignments.bam that map to the features described in annotation.gtf and writes the results to counts.txt. A summary of the run is written alongside it to counts.txt.summary.
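The counts file is plain tab-delimited text: a comment line beginning with #, a header row (Geneid, Chr, Start, End, Strand, Length, then one column per input file), and one row per feature or meta-feature. As a rough sketch, the gene identifiers and counts can be pulled out with awk. The helper name gene_counts and the toy values are assumptions for illustration.

```shell
# Build a toy counts file mimicking the layout described above
# (comment line, header row, then one row per gene). Values are made up.
counts_file="demo_counts.txt"
{
  printf '# Program:featureCounts; Command:(toy example)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\talignments.bam\n'
  printf 'geneA\tchr1\t1000\t1500\t+\t501\t42\n'
  printf 'geneB\tchr2\t500\t900\t-\t401\t7\n'
} > "$counts_file"

# gene_counts FILE: print "gene<TAB>count" for the first sample column.
gene_counts() {
  awk -F'\t' -v OFS='\t' '
    /^#/    { next }      # skip the leading comment line
    !seen++ { next }      # skip the column-header row
            { print $1, $7 }
  ' "$1"
}

gene_counts "$counts_file"
```

When several BAM files are counted in one run, the additional samples appear as columns 8, 9, and so on, so the same approach extends by printing more fields.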
Let's look at five popular commands that you can use with featureCounts to perform various tasks:
featureCounts -t exon -g gene_id -a annotation.gtf -o gene_counts.txt alignments.bam
This command counts reads aligned to exons and summarizes them at the gene level using the gene_id attribute from the GTF file.
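Before choosing values for -t and -g, it can help to check which feature types an annotation actually contains. In a GTF file the feature type is column 3 and the attributes (such as gene_id) are in column 9. The sketch below tallies feature types in a toy GTF; the helper name feature_types and the records are made up for illustration.

```shell
# Toy GTF with one gene record and two exon records (invented values).
gtf_file="demo.gtf"
{
  printf 'chr1\tdemo\tgene\t1000\t2400\t.\t+\t.\tgene_id "geneA";\n'
  printf 'chr1\tdemo\texon\t1000\t1500\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n'
  printf 'chr1\tdemo\texon\t2000\t2400\t.\t+\t.\tgene_id "geneA"; transcript_id "txA1";\n'
} > "$gtf_file"

# feature_types FILE: count how often each feature type (column 3) occurs,
# printed as "type<TAB>count" in alphabetical order.
feature_types() {
  grep -v '^#' "$1" | cut -f3 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

feature_types "$gtf_file"
```

Any type listed here can be given to -t, and any attribute key present in column 9 can be given to -g.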
featureCounts -T 4 -a annotation.gtf -o counts.txt alignments.bam
With the -T option, this command uses 4 CPU threads to speed up the counting process.
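featureCounts -p --countReadPairs -a annotation.gtf -o paired_counts.txt alignments.bam
The -p flag tells featureCounts that the input contains paired-end reads. In Subread 2.0.2 and later, --countReadPairs must be added to count fragments (pairs of reads) instead of individual reads; in earlier releases, -p alone did both.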
featureCounts -M -a annotation.gtf -o multimapped_counts.txt alignments.bam
The -M option makes featureCounts count reads that map to multiple locations, which are excluded by default. This can be useful in certain analyses.
featureCounts -s 2 -a annotation.gtf -o strand_specific_counts.txt alignments.bam
The -s option specifies the strand-specificity of the library: 0 for unstranded (the default), 1 for stranded, and 2 for reversely stranded.
These examples provide a glimpse into the versatility of featureCounts. By combining different options, you can tailor the read counting process to fit the specific needs of your analysis.
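As one way to combine options, the sketch below assembles a command line from a few settings and prints it without running it, so it can be inspected first. The function name build_fc_cmd and the sample file names are hypothetical. The --countReadPairs flag is assumed to be available, which requires Subread 2.0.2 or later; on older releases, -p alone counts fragments.

```shell
# build_fc_cmd THREADS STRAND PAIRED ANNOTATION OUTPUT BAM...
# Prints (does not run) a featureCounts command line.
build_fc_cmd() {
  threads=$1; strand=$2; paired=$3; annotation=$4; output=$5
  shift 5                          # remaining arguments are the BAM files
  cmd="featureCounts -T $threads -s $strand"
  if [ "$paired" = "yes" ]; then
    cmd="$cmd -p --countReadPairs" # paired-end input, count fragments
  fi
  echo "$cmd -a $annotation -o $output $*"
}

# Example: 4 threads, reversely stranded, paired-end, two samples.
build_fc_cmd 4 2 yes annotation.gtf counts.txt sample1.bam sample2.bam
```

Passing several BAM files in one call produces a single counts table with one column per sample.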
In conclusion, featureCounts is a valuable tool in the bioinformatics toolkit, offering speed, accuracy, and flexibility for read summarization tasks. Whether you're working with RNA-seq or DNA-seq data, featureCounts can help you efficiently quantify genomic features and advance your research.