Picard Tutorial
Overview of Picard
Picard is a robust set of command line tools designed for high-throughput sequencing (HTS) data manipulation and analysis. Developed by the Broad Institute, Picard handles various file formats such as SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map), CRAM (a reference-based compressed alignment format), and VCF (Variant Call Format), which are essential in genomic data analysis.
The toolkit is particularly useful for tasks such as quality control, read manipulation, and data analysis. It's widely used in the bioinformatics community for its efficiency and comprehensive set of features. Picard is open-source and available under the MIT license, making it free for both personal and commercial use.
Installation
To install Picard, download the executable JAR file (picard.jar) from the Latest Release page of the broadinstitute/picard repository on GitHub and place it in a directory of your choice. Picard is a Java program, so a Java runtime must be installed, and you run it by passing the JAR's path to java -jar rather than by adding an executable to your PATH as you would for a compiled program. Setting up an environment variable or a shell alias for that path can simplify the execution of Picard commands.
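The environment variable and alias mentioned above can be sketched as follows. The install path ~/tools/picard.jar and the names PICARD_JAR and picard are assumptions chosen for illustration. A shell function is used in place of an alias because functions also work inside scripts.

```shell
# Assumption: picard.jar was saved to ~/tools/picard.jar (hypothetical location).
export PICARD_JAR="$HOME/tools/picard.jar"

# Wrapper so that `picard <Tool> OPTION=value` expands to `java -jar <jar> <Tool> OPTION=value`.
picard() {
    java -jar "$PICARD_JAR" "$@"
}
```

With these lines in a shell startup file such as ~/.bashrc, a command like picard SortSam I=input.bam O=sorted.bam SORT_ORDER=coordinate runs the same tool without typing the JAR path each time.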
Quick Start
After downloading Picard, you can run it using the Java Virtual Machine (JVM). A typical command to execute Picard looks like this:
java -jar picard.jar <PicardCommand> OPTION1=value1 OPTION2=value2...
Replace <PicardCommand> with the specific Picard tool you wish to use, and provide the necessary options for your task.
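Before running a tool, it can help to confirm that both prerequisites of the command above are in place: a java executable and the JAR file itself. The following is a minimal sketch; the function name check_picard_setup is hypothetical and not part of Picard.

```shell
# Pre-flight check: returns 0 only if java is on PATH and the JAR file exists.
# Usage: check_picard_setup [path/to/picard.jar]   (defaults to ./picard.jar)
check_picard_setup() {
    picard_jar_path="${1:-picard.jar}"
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    if [ ! -f "$picard_jar_path" ]; then
        echo "JAR not found: $picard_jar_path" >&2
        return 1
    fi
    return 0
}
```

For example, check_picard_setup picard.jar && java -jar picard.jar -h lists the available tools only when the setup looks sane. JVM options such as -Xmx4g (maximum heap size) go before -jar. Recent Picard releases also accept options in the form -OPTION value alongside the OPTION=value form shown here.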
Code Examples Of Popular Commands
Here are five popular Picard commands with examples to illustrate their usage:
1. SortSam
SortSam sorts a SAM or BAM file by coordinate or query name (QNAME).
java -jar picard.jar SortSam \
I=input.bam \
O=sorted.bam \
SORT_ORDER=coordinate
2. MarkDuplicates
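MarkDuplicates identifies and marks duplicate reads in a BAM or SAM file, that is, reads that originate from the same fragment of DNA (for example, through PCR amplification during library preparation). Marking them is crucial for many downstream analyses, which would otherwise count the same fragment more than once.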
java -jar picard.jar MarkDuplicates \
I=sorted.bam \
O=marked_duplicates.bam \
M=marked_dup_metrics.txt
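The metrics file written by M= is tab-separated text: comment lines beginning with #, a line beginning with ## METRICS CLASS, a row of column names, and then the data rows. As a rough sketch, a single value can be pulled out with awk. The helper name picard_metric is hypothetical, and it reads only the first data row, which is an assumption that suits single-library files.

```shell
# Print one named column from the first data row of a Picard metrics file.
# Usage: picard_metric <metrics_file> <COLUMN_NAME>
picard_metric() {
    awk -F '\t' -v col="$2" '
        /^## METRICS CLASS/ { want_header = 1; next }   # next line holds column names
        want_header {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            want_header = 0; in_data = 1; next
        }
        in_data && NF == 0 { exit }                      # blank line ends the table
        in_data && idx { print $idx; exit }              # first data row only
    ' "$1"
}
```

For instance, picard_metric marked_dup_metrics.txt PERCENT_DUPLICATION prints the duplication rate. Despite its name, that column holds a fraction between 0 and 1 and not a percentage.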
3. CollectAlignmentSummaryMetrics
CollectAlignmentSummaryMetrics generates metrics for the quality of alignment.
java -jar picard.jar CollectAlignmentSummaryMetrics \
R=reference.fasta \
I=input.bam \
O=alignment_metrics.txt
4. CollectHsMetrics
CollectHsMetrics calculates metrics specific to hybrid selection experiments, such as exome sequencing.
java -jar picard.jar CollectHsMetrics \
R=reference.fasta \
I=input.bam \
O=hs_metrics.txt \
BAIT_INTERVALS=bait_intervals.list \
TARGET_INTERVALS=target_intervals.list
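BAIT_INTERVALS and TARGET_INTERVALS expect files in Picard's interval_list format (a SAM-style header followed by the intervals), not BED. If your capture kit ships BED files, Picard's BedToIntervalList tool can convert them using a sequence dictionary for the same reference. The sketch below wraps that call; the function name and the PICARD_JAR variable are assumptions for illustration.

```shell
# Convert a BED file into the interval_list format that CollectHsMetrics expects.
# Usage: bed_to_interval_list <in.bed> <reference.dict> <out.interval_list>
# PICARD_JAR is an assumed environment variable; it falls back to ./picard.jar.
bed_to_interval_list() {
    java -jar "${PICARD_JAR:-picard.jar}" BedToIntervalList \
        I="$1" \
        SD="$2" \
        O="$3"
}
```

A call such as bed_to_interval_list targets.bed reference.dict target_intervals.list would produce a file suitable for TARGET_INTERVALS (targets.bed is a placeholder name).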
5. CreateSequenceDictionary
CreateSequenceDictionary creates a .dict file from a reference sequence, which is often required for analysis tools.
java -jar picard.jar CreateSequenceDictionary \
R=reference.fasta \
O=reference.dict
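In scripted pipelines it is common to create the dictionary only when it is missing, since the tool may refuse to overwrite an existing output file. The following is a minimal sketch under that assumption; the function name ensure_sequence_dict and the PICARD_JAR variable are hypothetical, and the .dict name is derived by replacing the FASTA file's last extension.

```shell
# Create <reference>.dict next to the FASTA only if it does not exist yet.
# Usage: ensure_sequence_dict <reference.fasta>
ensure_sequence_dict() {
    fasta="$1"
    dict="${fasta%.*}.dict"          # reference.fasta -> reference.dict
    if [ -f "$dict" ]; then
        echo "Dictionary already present: $dict"
        return 0
    fi
    java -jar "${PICARD_JAR:-picard.jar}" CreateSequenceDictionary \
        R="$fasta" \
        O="$dict"
}
```

Calling ensure_sequence_dict reference.fasta is then safe to repeat at the start of every run.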
These commands are just a starting point to explore the extensive functionality of Picard. Each tool comes with a variety of options that allow you to tailor the command to your specific needs. The Picard documentation provides detailed information on each tool and its options.
Picard is a powerful asset in the bioinformatics toolkit, and its wide range of functions makes it indispensable for genomic data processing and analysis. Whether you're sorting files, marking duplicates, or collecting metrics, Picard provides a reliable and efficient way to prepare your data for further analysis.