HMMER Tutorial
Overview of HMMER
HMMER is a powerful bioinformatics tool used for searching sequence databases for homologs and for making sequence alignments. It is based on probabilistic models known as profile hidden Markov models (profile HMMs). These models are particularly effective in detecting remote homologs, which are sequences that have diverged significantly from their common ancestors but still retain functional or structural similarities.
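To make "probabilistic" concrete, here is a simplified summary of how HMMER scores a hit. Let x be a target sequence, M the profile HMM, R a background (null) model of unrelated sequence, Z the number of target sequences searched, and S' the score of an unrelated random sequence. The bit score S compares how well the profile explains x against the null model. The E-value is the number of hits scoring at least S that you would expect by chance in a search of that size. This is a sketch only: HMMER3 computes P(x | M) by summing over all possible alignments (the Forward algorithm) and also applies a composition-bias correction, which is omitted here.

```latex
S(x) = \log_2 \frac{P(x \mid \mathcal{M})}{P(x \mid \mathcal{R})},
\qquad
E(x) = Z \cdot \Pr\left(S' \ge S(x)\right)
```

For example, a score of 10 bits means x is 2^10 (about 1000) times more likely under the profile than under the null model.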
The strength of HMMER lies in its underlying probability models, which allow it to detect homologs with high sensitivity. Historically, this sensitivity came at a significant cost in processing time. However, with the release of HMMER3, the tool became roughly as fast as BLAST, a widely used program for sequence comparison.
HMMER can be used in conjunction with profile databases such as Pfam or those that participate in InterPro. It can also work with individual query sequences, similar to BLAST, using commands like phmmer for searching a protein query sequence against a database, or jackhmmer for iterative searches.
The tool is available for download and can be installed as a command-line tool on your own hardware. It is also accessible to the scientific community through search servers at the European Bioinformatics Institute (EMBL-EBI), where users can search against the latest UniProt databases.
For more detailed information and guidance, the HMMER User's Guide is available in PDF format, and ongoing discussions about the tool can be found on the blog Cryptogenomicon.
Installation
To install HMMER, download the source code from the official HMMER website (hmmer.org). The latest version at the time of writing is v3.4, and archived older versions are also available. Installation typically involves compiling the source code on your system, which requires a C compiler and the make utility, and possibly other development tools depending on your operating system.
The installation instructions are detailed in the HMMER User's Guide, which provides step-by-step guidance for different platforms. It is important to follow these instructions carefully to ensure that the tool is installed correctly and all necessary dependencies are met.
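As a minimal sketch of the usual Unix build sequence (configure, make, make check, make install), the helper below downloads and compiles a given release. The function name build_hmmer is hypothetical, and the download URL pattern is an assumption; confirm the current link on hmmer.org before relying on it. It also assumes wget, tar, make and a C compiler are installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build and install a HMMER release from source.
# Assumes a Unix-like system with wget, tar, make and a C compiler.
# The tarball URL pattern below is an assumption; verify it on hmmer.org.
build_hmmer() {
    local version=${1:-3.4}                 # release to build
    local prefix=${2:-"$HOME/.local"}       # install location (binaries go in $prefix/bin)
    local tarball="hmmer-${version}.tar.gz"

    wget "http://eddylab.org/software/hmmer/${tarball}" || return 1
    tar zxf "$tarball" || return 1
    (
        # Build in a subshell so the caller's working directory is unchanged.
        cd "hmmer-${version}" &&
        ./configure --prefix "$prefix" &&
        make &&
        make check &&    # optional but recommended: runs HMMER's test suite
        make install
    )
}

# Usage:
#   build_hmmer 3.4 "$HOME/.local"
#   export PATH="$HOME/.local/bin:$PATH"
```

If you prefer not to compile, prebuilt packages are also available, for example through Bioconda (conda install -c bioconda hmmer) or Homebrew (brew install hmmer).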
Quick Start
Once HMMER is installed, you can begin using it to search sequence databases for homologs or to create sequence alignments. HMMER is a suite of commands, each designed for a specific task. For a quick start, you can use the phmmer command to search a protein sequence against a database, or jackhmmer for iterative searches that can identify more distant homologs.
The basic syntax for phmmer is as follows, with the query file first and the target database second:
phmmer query.fasta database.fasta
This command takes the protein query sequence in query.fasta and searches it against the sequences in database.fasta. The results list the potential homologs found, ranked by statistical significance (E-value), with a bit score reported for each hit.
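For scripting, it is usually easier to ask phmmer for a machine-readable per-target table with --tblout and filter that, rather than parsing the human-readable report. The sketch below assumes the standard --tblout column layout (target name in column 1, full-sequence E-value and bit score in columns 5 and 6, comment lines starting with #), which hmmsearch also uses. The helper name tblout_hits and the 1e-5 cutoff in the usage line are illustrative choices, not HMMER defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print target, E-value and bit score for every hit in a
# HMMER --tblout file whose full-sequence E-value is at or below a cutoff.
# Assumes the standard per-target layout: $1 = target name,
# $5 = full-sequence E-value, $6 = full-sequence bit score.
tblout_hits() {
    local table=$1 max_evalue=$2
    awk -v max="$max_evalue" '
        /^#/ { next }                                   # skip header/comment lines
        ($5 + 0) <= (max + 0) { printf "%s\t%s\t%s\n", $1, $5, $6 }
    ' "$table"
}

# Usage (requires HMMER on PATH):
#   phmmer --tblout hits.tbl -o /dev/null query.fasta database.fasta
#   tblout_hits hits.tbl 1e-5
```

Because awk converts fields such as 2.3e-40 to numbers before comparing, scientific-notation E-values are filtered correctly.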
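For jackhmmer, the syntax is similar:
jackhmmer query.fasta database.fasta
jackhmmer performs multiple rounds of searching. After each round, the hits that pass the inclusion threshold are aligned and used to build a new profile HMM for the next round, which lets it find progressively more distant homologs. By default it runs at most 5 rounds and stops earlier if no new sequences are added (convergence).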
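When an iterative search drifts further than expected, it helps to keep each round's model for inspection. The sketch below is a hypothetical wrapper (the name run_jackhmmer and its default prefix are illustrative) that sets the iteration limit with -N, writes a per-target table with --tblout, and saves one profile HMM per round with --chkhmm, which names the files <prefix>-<iteration>.hmm.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around jackhmmer that keeps per-round checkpoints.
# run_jackhmmer QUERY DB [MAX_ROUNDS] [PREFIX]
run_jackhmmer() {
    local query=$1 db=$2
    local rounds=${3:-5}               # jackhmmer's own default is also 5
    local prefix=${4:-jackhmmer_run}
    local -a args
    args=(
        --cpu 4                        # worker threads
        -N "$rounds"                   # maximum number of iterations
        -o "${prefix}.out"             # human-readable report
        --tblout "${prefix}.tbl"       # per-target tabular output
        --chkhmm "$prefix"             # writes ${prefix}-<iteration>.hmm each round
    )
    jackhmmer "${args[@]}" "$query" "$db"
}

# Usage (requires HMMER on PATH):
#   run_jackhmmer query.fasta database.fasta 3 myrun
```

The checkpoint HMM from the last round can be passed to hmmsearch to rerun the final profile against another database.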
Code Examples Of Popular Commands
Here are five popular commands used in HMMER and examples of how to use them:
- phmmer: Search a protein sequence against a protein sequence database.
  phmmer --cpu 4 -o results.out query.fasta database.fasta
  This command uses 4 CPU cores to search the query sequence in query.fasta against the database database.fasta, writing the results to results.out.
- jackhmmer: Perform iterative searches to find distant homologs.
  jackhmmer --cpu 4 -N 5 -o results.out query.fasta database.fasta
  This command runs up to 5 iterations (-N 5, which is also the default) of searching using 4 CPU cores; jackhmmer stops earlier if the search converges.
- hmmbuild: Build a profile HMM from a multiple sequence alignment.
  hmmbuild -n mymodel profile.hmm alignment.sto
  This command builds a profile HMM named mymodel from the alignment in alignment.sto (a Stockholm-format file) and saves it to profile.hmm.
- hmmsearch: Search a sequence database with a profile HMM.
  hmmsearch --tblout hits.table profile.hmm database.fasta
  This command searches the database database.fasta with the profile HMM profile.hmm and, in addition to the standard report on the screen, writes a per-sequence table of hits to hits.table.
- hmmscan: Scan a sequence against a database of profile HMMs.
  hmmpress pfam_db.hmm
  hmmscan --domtblout domains.table pfam_db.hmm query.fasta
  The profile database must be indexed once with hmmpress (first line) before hmmscan can use it. The hmmscan command then scans the query sequence in query.fasta against the profile HMMs in pfam_db.hmm and writes per-domain hits to domains.table.
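The profile-based commands are often chained: build a model and search with it, or press a profile database and scan queries against it. The sketch below wraps both chains in hypothetical helpers (build_and_search and press_and_scan are illustrative names). It skips hmmpress when the index already exists, because hmmpress refuses to overwrite existing index files unless forced.

```bash
#!/usr/bin/env bash
# Hypothetical helpers chaining the commands from the list above.

# build_and_search MSA DB NAME: build a profile HMM named NAME from an
# alignment, then search a sequence database with it.
build_and_search() {
    local msa=$1 db=$2 name=$3
    hmmbuild -n "$name" "${name}.hmm" "$msa" || return 1
    hmmsearch --tblout "${name}.hits.tbl" "${name}.hmm" "$db"
}

# press_and_scan HMMDB QUERY OUT: index a profile database once,
# then scan query sequences against it, writing per-domain hits to OUT.
press_and_scan() {
    local hmmdb=$1 query=$2 out=$3
    # hmmpress writes HMMDB.h3m, .h3i, .h3f and .h3p; skip if already pressed.
    if [ ! -e "${hmmdb}.h3m" ]; then
        hmmpress "$hmmdb" || return 1
    fi
    hmmscan --domtblout "$out" "$hmmdb" "$query"
}

# Usage (requires HMMER on PATH):
#   build_and_search alignment.sto database.fasta mymodel > mymodel.search.out
#   press_and_scan pfam_db.hmm query.fasta domains.table > scan.out
```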
These commands represent just a fraction of HMMER's capabilities, but they are among the most commonly used for sequence analysis tasks. Each command accepts a variety of options and flags for customizing the analysis. Running any HMMER program with -h prints a brief summary of its options, and the HMMER User's Guide provides comprehensive documentation on each command.
In conclusion, HMMER is a versatile and powerful tool for bioinformatics research, providing researchers with the ability to detect and analyze sequence homologs with high sensitivity and speed. Whether you are working with individual sequences or large databases, HMMER offers a range of commands to suit your research needs.