Minimap2 Tutorial

Overview of Minimap2

Minimap2 is a versatile and efficient bioinformatics tool for aligning DNA or mRNA sequences against a large reference database. It handles accurate short reads of 100 base pairs (bp) or longer, genomic long reads of 1 kb or longer with error rates around 15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes that can be hundreds of megabases long.

A key strength of Minimap2 is split-read alignment, in which different parts of one read align to different locations. This is crucial for detecting structural variation in the genome. Minimap2 also uses a concave gap cost for long insertions and deletions, so each additional base of a long gap costs less than it would under a simple affine model, which helps it align sequences with large gaps. It also introduces new heuristics to reduce spurious alignments.
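To make the concave gap cost concrete, here is a minimal Python sketch of the two-piece affine cost that the minimap2 manual documents for its -O and -E options: a gap of length k costs min(O1 + k·E1, O2 + k·E2). The values used below (O = 4,24 and E = 2,1) are assumed to be the general defaults listed in the manual. Presets can override them, so treat the numbers as illustrative.

```python
def gap_cost(k: int, o1: int = 4, e1: int = 2, o2: int = 24, e2: int = 1) -> int:
    """Two-piece affine gap cost: min(o1 + k*e1, o2 + k*e2).

    Short gaps are charged by the first line (cheap to open, steep to extend).
    Long gaps switch to the second line (expensive to open, shallow to extend),
    so the cost of a long indel grows more slowly with its length.
    """
    if k <= 0:
        return 0  # no gap, no penalty
    return min(o1 + k * e1, o2 + k * e2)
```

With these values the two lines cross at k = 20. For example, gap_cost(10) is 24, gap_cost(20) is 44, and gap_cost(1000) is 1024. A single affine cost of 4 + 2k would charge 2004 for the same 1000 bp gap.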

Minimap2 is fast as well as accurate. Its authors report that it is three to four times as fast as mainstream short-read mappers at comparable accuracy. Compared with long-read genomic or cDNA mappers, it is at least 30 times faster and more accurate, surpassing most aligners that specialize in one type of alignment.

This performance comes from two components: a fast base-level alignment algorithm and an accurate chaining algorithm. For base-level alignment, Minimap2 implements the Suzuki–Kasahara formulation of the alignment dynamic program using SIMD instructions. This greatly reduces what is often the bottleneck when aligning long sequences, and it makes splice alignment across very long introns practical where it was previously too slow. The chaining algorithm is fast and, on its own, often more accurate than other long-read mappers.
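The chaining step can be sketched as a dynamic program over anchors, which are exact seed matches between the reference and the query. The Python below is a simplified version of the chaining score described in the minimap2 paper. Each anchor i gets a score f(i) = max(w_i, max over j of f(j) + α(j,i) − β(j,i)), where:

- α rewards the new matching bases that anchor i adds.
- β penalizes the difference between the reference gap and the query gap.

Several details are assumptions for illustration: the (x, y, w) tuple layout, the max_gap of 5000, and the window of 50 predecessors. The real C implementation adds further heuristics and handles multiple reference sequences and both strands.

```python
import math
from typing import List, Tuple

# Hypothetical anchor layout: (x = reference end, y = query end, w = seed length)
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor], max_gap: int = 5000,
                  max_pred: int = 50) -> Tuple[float, List[int]]:
    """Return (best chain score, input indices of the anchors in that chain).

    Simplified minimap2-style chaining; all anchors are assumed to lie on
    the same reference sequence and strand.
    """
    if not anchors:
        return 0.0, []
    # Process anchors in order of reference position, then query position.
    order = sorted(range(len(anchors)), key=lambda k: (anchors[k][0], anchors[k][1]))
    a = [anchors[k] for k in order]
    avg_w = sum(w for _, _, w in a) / len(a)
    f = [0.0] * len(a)   # best chain score ending at each anchor
    pred = [-1] * len(a)  # predecessor in that best chain
    for i, (xi, yi, wi) in enumerate(a):
        f[i] = float(wi)  # option 1: start a new chain at anchor i
        for j in range(max(0, i - max_pred), i):
            xj, yj, _ = a[j]
            dx, dy = xi - xj, yi - yj
            if dx <= 0 or dy <= 0 or max(dx, dy) > max_gap:
                continue  # j cannot precede i (beta would be infinite)
            alpha = min(dx, dy, wi)   # new matching bases contributed by i
            gap_diff = abs(dy - dx)   # difference between query and reference gaps
            beta = 0.0 if gap_diff == 0 else 0.01 * avg_w * gap_diff + 0.5 * math.log2(gap_diff)
            score = f[j] + alpha - beta
            if score > f[i]:
                f[i], pred[i] = score, j
    # Backtrack from the highest-scoring anchor.
    best = max(range(len(a)), key=lambda k: f[k])
    chain = []
    k = best
    while k != -1:
        chain.append(order[k])
        k = pred[k]
    return f[best], chain[::-1]
```

For example, three anchors on one diagonal, (100, 10, 15), (120, 30, 15), and (140, 50, 15), chain with a score of 45. An anchor at (5000, 5, 15) cannot join them because its query position lies before theirs.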

Minimap2 is thus a general-purpose mapper and pairwise aligner for nucleotide sequences: it can serve as a read mapper, a long-read overlapper, or a full-genome aligner. Because its framework is general, it also extends to less typical data such as spliced reads and multiple reads per fragment (for example, paired-end reads).

Minimap2 is available on GitHub, and supplementary data for the Minimap2 paper are available at Bioinformatics online.

Installation

Minimap2's source code and build instructions are hosted in its GitHub repository at https://github.com/lh3/minimap2. The tool targets Unix-like operating systems; after cloning the repository, you compile it with make. Precompiled binaries are also published on the repository's releases page, which is convenient if you prefer not to compile from source.

Quick Start

Once Minimap2 is installed, you can start aligning sequences from the command line. A basic invocation gives the reference first, then the query sequences. Minimap2 reads FASTA and FASTQ input, optionally gzip-compressed. It writes PAF by default, or SAM when the -a option is given. To obtain BAM, pipe the SAM output through a tool such as samtools.

A simple command to align reads to a reference might look like this:

minimap2 -ax map-ont reference.fa reads.fq > alignment.sam

In this example, -a requests SAM output and -x map-ont selects the preset for Oxford Nanopore reads. reference.fa is the reference in FASTA format, and reads.fq contains the reads in FASTQ format. Minimap2 writes the alignments to standard output, which the shell redirects to the SAM file alignment.sam.
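The resulting alignment.sam is plain text, so a quick summary is easy to compute. The Python below is a minimal sketch that assumes the standard SAM layout: header lines start with @, and FLAG is the second tab-separated column. It classifies records as primary, secondary (flag 0x100), supplementary (flag 0x800), or unmapped (flag 0x4). Minimap2 reports the extra parts of split alignments as supplementary records. The function name is a hypothetical choice.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_sam_records(lines: Iterable[str]) -> Counter:
    """Classify SAM alignment records by their FLAG field (column 2)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
    return counts
```

Calling it on an open file handle for alignment.sam tallies the records from the command above. For routine analyses, samtools flagstat reports similar counts.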

Code Examples of Popular Commands

Here are five popular commands that you can use with Minimap2:

  1. Aligning long reads from Oxford Nanopore:
minimap2 -ax map-ont reference.fa ont_reads.fq > ont_alignment.sam
  2. Aligning PacBio continuous long reads (CLR); for PacBio HiFi reads, use the map-hifi preset instead:
minimap2 -ax map-pb reference.fa pacbio_reads.fq > pb_alignment.sam
  3. Aligning Illumina short reads:
minimap2 -ax sr reference.fa illumina_reads.fq > illumina_alignment.sam
  4. Aligning spliced long reads (such as full-length cDNA or direct RNA reads):
minimap2 -ax splice reference.fa rnaseq_reads.fq > rnaseq_alignment.sam
  5. Finding all-versus-all overlaps between Nanopore reads (a typical first step in de novo assembly):
minimap2 -x ava-ont reads.fq reads.fq > overlaps.paf

In these commands, -x selects a preset tuned for the data type (map-ont, map-pb, sr, splice, or ava-ont), and -a switches the output to SAM. The option -ax is simply the two combined. The overlap command omits -a, so minimap2 writes its default PAF (Pairwise mApping Format) output instead.
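PAF is a simple tab-separated format, so the overlaps.paf file from the all-versus-all command can be filtered with a few lines of code. The Python below is a minimal sketch that parses the 12 mandatory PAF columns:

- query name, length, start, and end;
- strand;
- target name, length, start, and end;
- number of matching bases, alignment block length, and mapping quality.

It ignores any optional tags after column 12 and keeps overlaps between distinct reads whose alignment block reaches a threshold. The 2000 bp threshold and the class and function names are hypothetical choices for illustration.

```python
from typing import Iterable, List, NamedTuple


class PafRecord(NamedTuple):
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str      # "+" or "-"
    tname: str
    tlen: int
    tstart: int
    tend: int
    matches: int     # column 10: number of matching bases
    block_len: int   # column 11: alignment block length
    mapq: int


def parse_paf(lines: Iterable[str]) -> List[PafRecord]:
    """Parse the 12 mandatory PAF columns; optional tags are ignored."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        records.append(PafRecord(
            f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), int(f[8]),
            int(f[9]), int(f[10]), int(f[11]),
        ))
    return records


def long_overlaps(records: Iterable[PafRecord], min_block: int = 2000) -> List[PafRecord]:
    """Keep overlaps between different reads with a long enough alignment block."""
    return [r for r in records if r.qname != r.tname and r.block_len >= min_block]
```

The self-comparison check is defensive; all-versus-all presets are intended to skip self mappings already.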

Minimap2 has become a widely used tool for sequence alignment in bioinformatics. Its speed, accuracy, and versatility make it valuable for researchers working with many kinds of sequencing data. Whether you are aligning short reads, long reads, spliced transcripts, or whole assemblies, Minimap2 offers a single, reliable alignment solution.