Trimmomatic Tutorial

Trimmomatic: A Comprehensive Guide for Bioinformatics

Overview

Trimmomatic is a flexible and efficient preprocessing tool for trimming and filtering Illumina next-generation sequencing (NGS) data. Developed by Anthony M. Bolger, Marc Lohse, and Bjoern Usadel, it was designed to combine flexibility, correct handling of paired-end data, and high performance in a single tool.

The main algorithmic innovations of Trimmomatic relate to the identification of adapter sequences and to quality filtering. It uses two approaches for detecting technical sequences within reads. 'Simple mode' aligns each read against the supplied technical sequences and requires a substantial minimum overlap to prevent false positives. 'Palindrome mode' is aimed at paired-end data and detects adapter read-through, where the sequenced fragment is shorter than the read length, even when only a few adapter bases are present. Removing adapters and other technical sequences matters because they can interfere with downstream analyses.

Key Features

  • Adapter Removal: Trimmomatic can detect and remove adapter sequences from reads, which is crucial for accurate downstream analysis.
  • Quality Filtering: It offers two main quality filtering alternatives, exploiting the Illumina quality score to determine where the read should be cut.
  • Sliding Window Quality Filtering: This method scans from the 5′ end of the read and trims the 3′ end when the average quality drops below a threshold.
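  • Maximum Information Quality Filtering: An adaptive alternative that weighs the benefit of retaining additional bases against the cost of retaining low-quality ones, becoming progressively stricter as it moves along the read.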

Trimmomatic is licensed under GPL V3 and is cross-platform, requiring Java 1.5 or higher. It is available for download from the Usadel Lab website.

Installation

To install Trimmomatic, you will need to have Java installed on your system as it is a Java-based program. The installation process is straightforward:

  1. Download the latest version of Trimmomatic from the official website.
  2. Unzip the downloaded file to a directory of your choice.
  3. Ensure that the Java Runtime Environment (JRE) is installed and that you can run Java from the command line.

Once installed, Trimmomatic can be run from the command line using the java -jar command followed by the path to the Trimmomatic jar file and the appropriate options for your data.

Quick Start

To get started with Trimmomatic, you'll need to prepare your input files, which are typically in FASTQ format. Here's a quick example of how to run Trimmomatic on paired-end data:

java -jar trimmomatic-0.39.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

This command specifies the input and output files for both the forward and reverse reads, declares that quality scores use the Phred+33 encoding (-phred33), and applies several trimming steps: adapter removal, removal of low-quality leading and trailing bases, sliding window trimming, and minimum length filtering. Steps are applied in the order in which they appear on the command line.
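To make the -phred33 option concrete, here is a minimal Python sketch of how a Phred+33 quality string maps to numeric scores. The function names are illustrative and not part of Trimmomatic; the only facts relied on are the Phred+33 convention (ASCII code minus 33) and the standard relation between a Phred score Q and the error probability 10^(-Q/10).

```python
def phred33_to_scores(quality_string):
    """Convert a FASTQ quality string in Phred+33 encoding to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(q):
    """Estimated probability that a base with Phred score q is wrong."""
    return 10 ** (-q / 10)


scores = phred33_to_scores("II5#")
print(scores)                 # [40, 40, 20, 2]
print(error_probability(20))  # 0.01
```

A threshold such as LEADING:3 or SLIDINGWINDOW:4:15 is compared against these decoded scores, not against the raw characters.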

Code Examples Of Popular Commands

Here are five popular commands used in Trimmomatic to preprocess NGS data:

1. Adapter Trimming

java -jar trimmomatic-0.39.jar PE -phred33 input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:adapter_sequences.fa:2:30:10

This command removes adapter sequences using the ILLUMINACLIP option. Its arguments are, in order: adapter_sequences.fa, a FASTA file containing the adapter sequences; 2, the maximum number of seed mismatches; 30, the palindrome clip threshold; and 10, the simple clip threshold.
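The two clip thresholds are alignment scores, not base counts. As a rough sketch, assuming the scoring scheme described in the Trimmomatic manual (each matching base adds log10(4), about 0.6, and each mismatch subtracts Q/10, where Q is the quality of the mismatching base), the following Python snippet estimates how many perfectly matching bases a given threshold implies. The function names are hypothetical and this is not Trimmomatic's own code.

```python
import math

# Assumed per-base match score, following the manual's description.
MATCH_SCORE = math.log10(4)  # about 0.602


def alignment_score(matches, mismatch_qualities):
    """Score an alignment with `matches` matching bases and a list of
    Phred qualities for the mismatching bases."""
    return matches * MATCH_SCORE - sum(q / 10 for q in mismatch_qualities)


def min_perfect_matches(threshold):
    """Smallest number of mismatch-free bases whose score reaches `threshold`."""
    return math.ceil(threshold / MATCH_SCORE)


print(min_perfect_matches(10))  # 17 (simple clip threshold)
print(min_perfect_matches(30))  # 50 (palindrome clip threshold)
```

Under these assumptions, a simple clip threshold of 10 corresponds to roughly 17 matching bases, and a palindrome clip threshold of 30 to roughly 50.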

2. Quality Trimming

java -jar trimmomatic-0.39.jar SE -phred33 input.fq.gz output_trimmed.fq.gz LEADING:20 TRAILING:20

This command removes bases with a quality score below 20 from the start (LEADING) and end (TRAILING) of each read. Low-quality bases in the interior of the read are not affected by these two steps.
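The behaviour of LEADING and TRAILING can be sketched in a few lines of Python. This is an illustrative reimplementation operating on a list of decoded quality scores, not Trimmomatic's source; the function name and the example qualities are made up.

```python
def trim_leading_trailing(qualities, leading=20, trailing=20):
    """Return (start, end) indices of the region kept after removing
    bases below the thresholds from both ends of the read."""
    start = 0
    end = len(qualities)
    # Drop low-quality bases from the 5' end.
    while start < end and qualities[start] < leading:
        start += 1
    # Drop low-quality bases from the 3' end.
    while end > start and qualities[end - 1] < trailing:
        end -= 1
    return start, end


quals = [2, 10, 30, 35, 12, 38, 18, 5]
start, end = trim_leading_trailing(quals)
print(quals[start:end])  # [30, 35, 12, 38]
```

Note that the base with quality 12 survives because it lies between two high-quality bases; removing such interior dips is the job of a window-based step.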

3. Sliding Window Trimming

java -jar trimmomatic-0.39.jar SE -phred33 input.fq.gz output_trimmed.fq.gz SLIDINGWINDOW:4:15

The SLIDINGWINDOW option scans the read from the 5′ end with a 4-base window and cuts the read once the average quality within the window falls below 15.
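The idea can be illustrated with a simplified Python sketch. It cuts the read at the start of the first window whose average quality is below the threshold; Trimmomatic's actual implementation may differ in exactly where it places the cut, so treat this as an approximation with hypothetical names and example data.

```python
def sliding_window_cut(qualities, window=4, required=15):
    """Return the number of bases to keep from the 5' end.

    Simplified rule: stop at the first window whose average quality is
    below `required`. Reads shorter than the window are kept unchanged.
    """
    for i in range(len(qualities) - window + 1):
        # Comparing sums avoids a division: sum < required * window.
        if sum(qualities[i:i + window]) < required * window:
            return i
    return len(qualities)


quals = [30, 30, 30, 30, 20, 10, 8, 5, 2, 2]
keep = sliding_window_cut(quals)
print(quals[:keep])  # [30, 30, 30, 30]
```

Unlike TRAILING, a single poor base does not trigger the cut here; the average over the whole window has to drop.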

4. Minimum Length Filtering

java -jar trimmomatic-0.39.jar SE -phred33 input.fq.gz output_trimmed.fq.gz MINLEN:36

MINLEN discards reads that fall below the specified length after trimming, in this case, 36 bases.

5. Cropping

java -jar trimmomatic-0.39.jar SE -phred33 input.fq.gz output_trimmed.fq.gz CROP:75

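The CROP option cuts each read to a specified maximum length, here 75 bases, by removing bases from the 3′ end regardless of their quality. Reads that are already 75 bases or shorter are left unchanged.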
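Because Trimmomatic applies steps in the order given on the command line, length-based steps such as CROP and MINLEN interact. The Python sketch below mimics running CROP:75 followed by MINLEN:36 on a few toy reads; the helper names and reads are invented for illustration.

```python
def crop(read, length):
    """Keep at most `length` bases from the 5' end of the read."""
    return read[:length]


def passes_minlen(read, min_length):
    """True if the read is at least `min_length` bases long."""
    return len(read) >= min_length


reads = ["A" * 100, "C" * 80, "G" * 30]

# Equivalent in spirit to the step sequence: CROP:75 MINLEN:36
cropped = (crop(read, 75) for read in reads)
kept = [read for read in cropped if passes_minlen(read, 36)]
print([len(read) for read in kept])  # [75, 75]
```

The 30-base read is discarded by the length filter, while the two longer reads are shortened to 75 bases. Placing MINLEN last is a common choice so that it sees the final length of each read.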

Trimmomatic is a powerful tool for NGS data preprocessing, offering a range of options to ensure high-quality data for downstream analysis. Its flexibility and efficiency make it a preferred choice for bioinformaticians working with Illumina sequencing data.