Picard Tutorial

Overview of Picard

Picard is a set of command-line tools, developed by the Broad Institute, for manipulating and analyzing high-throughput sequencing (HTS) data. It works with the file formats at the core of most genomic analyses: SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map, the compressed binary form of SAM), CRAM (a reference-based compressed alignment format), and VCF (Variant Call Format).

The toolkit is commonly used for quality control (for example, collecting alignment and coverage metrics), read manipulation (such as sorting and duplicate marking), and preparing reference files, and it is widely used in the bioinformatics community. Picard is open-source and released under the MIT license, so it is free for both personal and commercial use.

Installation

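To install Picard, download the executable JAR file (picard.jar) from the Latest Release page of the Picard GitHub repository and place it in a directory of your choice. Picard runs on the Java Virtual Machine, so you also need a Java runtime installed (check the release notes for the required version). Because the JAR is launched with java -jar rather than executed directly, it does not need to be on your PATH the way a compiled binary does. Setting up an environment variable or a shell alias that holds its location, however, saves you from typing the full path in every command.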
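One way to set this up is a small wrapper function in your shell startup file (for example ~/.bashrc). This is a minimal sketch for a POSIX shell, assuming the JAR was saved to $HOME/tools/picard.jar, which is a hypothetical location. The variable names PICARD_JAR and PICARD_JAVA_OPTS are chosen for this example; they are not settings that Picard itself reads.

```shell
# Hypothetical install location: change it to wherever you saved the JAR.
PICARD_JAR="${PICARD_JAR:-$HOME/tools/picard.jar}"
export PICARD_JAR

# Optional JVM flags (for example a larger heap with -Xmx4g); empty by default.
# PICARD_JAVA_OPTS is a name chosen for this example, not a Picard setting.
PICARD_JAVA_OPTS="${PICARD_JAVA_OPTS:-}"

# Wrapper: "picard SortSam ..." runs "java [JVM flags] -jar $PICARD_JAR SortSam ...".
picard() {
    # PICARD_JAVA_OPTS is deliberately unquoted so that several
    # space-separated JVM flags become separate arguments.
    java $PICARD_JAVA_OPTS -jar "$PICARD_JAR" "$@"
}
```

After reloading your shell, picard SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate runs the same command as the full java -jar form. A function is used instead of an alias so that the JVM flags and the JAR path are resolved each time the command runs.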

Quick Start

After downloading Picard, you can run it using the Java Virtual Machine (JVM). A typical command to execute Picard looks like this:

java -jar picard.jar <PicardCommand> OPTION1=value1 OPTION2=value2...

Replace <PicardCommand> with the name of the tool you want to run (for example, SortSam), and supply that tool's options as KEY=value pairs.

Code Examples of Popular Commands

Here are five popular Picard commands with examples to illustrate their usage:

1. SortSam

SortSam sorts a SAM or BAM file by coordinate or query name (QNAME).

java -jar picard.jar SortSam \
      I=input.bam \
      O=sorted.bam \
      SORT_ORDER=coordinate
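Coordinate-sorted BAMs are usually indexed as well, so that downstream tools can jump to a genomic region. The sketch below wraps SortSam in a shell function that does both in one call. sort_bam is a hypothetical helper name, and it assumes picard.jar is in the current directory. CREATE_INDEX=true is a standard Picard option that writes a BAM index alongside coordinate-sorted output.

```shell
# sort_bam INPUT OUTPUT: coordinate-sort INPUT into OUTPUT and index it.
# Hypothetical helper; assumes picard.jar is in the current directory.
sort_bam() {
    if [ "$#" -ne 2 ]; then
        echo "usage: sort_bam INPUT.bam OUTPUT.bam" >&2
        return 2
    fi
    java -jar picard.jar SortSam \
        I="$1" \
        O="$2" \
        SORT_ORDER=coordinate \
        CREATE_INDEX=true
}
```

For example, sort_bam input.bam sorted.bam produces the sorted.bam file used as input in the next example, plus its index.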

2. MarkDuplicates

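MarkDuplicates identifies reads that likely come from the same original DNA fragment (PCR or optical duplicates) and flags them in the output BAM or SAM file. Downstream tools can then exclude them, so duplicates do not inflate coverage or variant evidence. The input is typically coordinate-sorted, as produced by SortSam above, and the M option names the required file where duplication metrics are written.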

java -jar picard.jar MarkDuplicates \
      I=sorted.bam \
      O=marked_duplicates.bam \
      M=marked_dup_metrics.txt
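The metrics file written via M is a tab-separated Picard metrics file. It starts with "#"-prefixed header lines, followed by a "## METRICS CLASS" line, a row of column names, and one row per library. The sketch below uses awk to pull a single column, such as PERCENT_DUPLICATION, out of that table. picard_metric is a hypothetical helper. It assumes the layout just described and reads only the first metrics table in the file, not any histogram that follows it.

```shell
# picard_metric FILE COLUMN [ROW]: print COLUMN from the first metrics table in FILE.
# With ROW, use the row whose first field equals ROW; otherwise the first data row.
# Hypothetical helper; assumes the tab-separated Picard metrics layout.
picard_metric() {
    awk -F '\t' -v col="$2" -v row="${3:-}" '
        /^## METRICS CLASS/ { in_table = 1; next }
        in_table && !seen_header {
            # The line after "## METRICS CLASS" holds the column names.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            seen_header = 1
            if (!idx) exit 1    # requested column not found
            next
        }
        in_table && $0 == "" { exit }    # a blank line ends the table
        in_table && (row == "" || $1 == row) { print $idx; exit }
    ' "$1"
}
```

For example, picard_metric marked_dup_metrics.txt PERCENT_DUPLICATION prints the value for the first library, and adding a library name as a third argument selects that library's row instead.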

3. CollectAlignmentSummaryMetrics

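CollectAlignmentSummaryMetrics reports summary statistics on how well the reads aligned, such as the total number of reads, the fraction aligned, and the mismatch rate. For paired-end data, these statistics are broken down by read category (first of pair, second of pair, and pair).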

java -jar picard.jar CollectAlignmentSummaryMetrics \
      R=reference.fasta \
      I=input.bam \
      O=alignment_metrics.txt
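To collect these metrics for several BAM files at once, a loop can derive each output name from its input. This is a minimal sketch assuming the BAMs, the reference, and picard.jar are in the current directory. collect_alignment_metrics and the .alignment_metrics.txt naming scheme are hypothetical choices for this example.

```shell
# collect_alignment_metrics REFERENCE BAM...: run CollectAlignmentSummaryMetrics
# on each BAM, writing sample.bam -> sample.alignment_metrics.txt.
# Hypothetical helper. POSIX sh has no "local", so ref/bam/out are global.
collect_alignment_metrics() {
    ref=$1
    shift
    for bam in "$@"; do
        out="${bam%.bam}.alignment_metrics.txt"
        java -jar picard.jar CollectAlignmentSummaryMetrics \
            R="$ref" \
            I="$bam" \
            O="$out" || return 1   # stop at the first sample that fails
    done
}
```

For example, collect_alignment_metrics reference.fasta *.bam processes every BAM in the directory. The loop stops at the first failure so that a broken input is not hidden among the successful runs.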

4. CollectHsMetrics

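CollectHsMetrics calculates metrics specific to hybrid selection experiments, such as exome sequencing, including how many bases fall on target and how deeply the targets are covered. BAIT_INTERVALS and TARGET_INTERVALS must be Picard interval lists (a SAM-style header followed by one interval per line), not plain BED files.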

java -jar picard.jar CollectHsMetrics \
      R=reference.fasta \
      I=input.bam \
      O=hs_metrics.txt \
      BAIT_INTERVALS=bait_intervals.interval_list \
      TARGET_INTERVALS=target_intervals.interval_list
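If the bait and target regions are supplied as BED files, Picard's BedToIntervalList tool can convert them, using a sequence dictionary to build the interval-list header. The sketch below wraps that conversion. bed_to_interval_list is a hypothetical helper name, and it assumes picard.jar and the input files are in the current directory.

```shell
# bed_to_interval_list BED DICT: convert BED into a Picard interval list
# named after it (targets.bed -> targets.interval_list).
# Hypothetical helper; DICT can be the reference's .dict file.
bed_to_interval_list() {
    bed=$1
    dict=$2
    java -jar picard.jar BedToIntervalList \
        I="$bed" \
        O="${bed%.bed}.interval_list" \
        SD="$dict"
}
```

Run it once for the baits and once for the targets, for example bed_to_interval_list bait_intervals.bed reference.dict and bed_to_interval_list target_intervals.bed reference.dict.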

5. CreateSequenceDictionary

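CreateSequenceDictionary creates a sequence dictionary (a .dict file) that lists the name and length of each sequence in a reference FASTA. GATK and other downstream tools expect this file to sit alongside the reference.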

java -jar picard.jar CreateSequenceDictionary \
      R=reference.fasta \
      O=reference.dict
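In scripts that run repeatedly, it helps to create the dictionary only when it is missing. This is a minimal sketch that derives the .dict name by replacing the reference's last extension, matching the reference.fasta to reference.dict naming used above. ensure_dict is a hypothetical helper name, and it assumes picard.jar is in the current directory.

```shell
# ensure_dict REFERENCE: create REFERENCE's sequence dictionary
# (reference.fasta -> reference.dict) only if it does not exist yet.
# Hypothetical helper; strips only the last extension, so .fa.gz is not handled.
ensure_dict() {
    ref=$1
    dict="${ref%.*}.dict"
    if [ -e "$dict" ]; then
        echo "$dict already exists; skipping" >&2
        return 0
    fi
    java -jar picard.jar CreateSequenceDictionary R="$ref" O="$dict"
}
```

For example, ensure_dict reference.fasta runs CreateSequenceDictionary on the first call and prints a skip message on later calls.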

These commands are only a starting point for exploring Picard. Each tool accepts many more options that let you tailor it to your data. Running java -jar picard.jar <PicardCommand> -h prints the full option list for a tool, and the Picard documentation describes every tool and its options in detail.

With its wide range of tools, Picard covers much of the routine processing of aligned sequencing data. Whether you are sorting files, marking duplicates, or collecting metrics, Picard provides a reliable and efficient way to prepare your data for further analysis.