Germline Genomics
Our OP² germline genomics pipeline is a bioinformatics analysis workflow for whole-genome, whole-exome or targeted DNA sequencing data.
It lets you analyze your genome sequencing data with a gold-standard analysis pipeline.
You gain insight into the quality of your data, identify variation ranging from single-nucleotide changes to large structural variants, and annotate it with biological knowledge.
The workflow processes raw data from FASTQ inputs, aligns the reads, calls variants and annotates them.
The results are made available to you via two interactive reports and a data package containing all essential intermediate files for more in-depth analysis.
The pre-processing workflow takes your raw sequence data through to QC-approved aligned data.
Next, the post-processing workflow enables you to review the biological meaning of your data via data annotation.
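To make the raw-data end of the workflow concrete, here is a minimal sketch of a read-level quality check on paired-end FASTQ data. It is an illustration only, not the pipeline's actual implementation: the Phred+33 encoding, the mean-quality threshold of 20, the rule that a pair is kept only when both mates pass, and all function names are assumptions made for this example.

```python
import gzip
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, bases, quality string)

PHRED_OFFSET = 33        # assumption: Phred+33 quality encoding
MIN_MEAN_QUALITY = 20.0  # hypothetical threshold, chosen for illustration


def read_fastq(path: str) -> Iterator[Record]:
    """Yield records from a plain or gzip-compressed FASTQ file.

    Assumes the common four-lines-per-record layout.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                break  # end of file
            bases = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().strip()
            yield header[1:], bases, quality  # drop the leading '@'


def mean_quality(quality: str) -> float:
    """Mean Phred score of one read, decoded from its quality string."""
    if not quality:
        return 0.0
    return sum(ord(ch) - PHRED_OFFSET for ch in quality) / len(quality)


def filter_pairs(
    mates1: Iterable[Record],
    mates2: Iterable[Record],
    min_mean_quality: float = MIN_MEAN_QUALITY,
) -> Iterator[Tuple[Record, Record]]:
    """Keep a read pair only if both mates reach the mean-quality threshold.

    Dropping both mates together keeps the two files of a paired-end
    run in the same order, which aligners expect.
    """
    for rec1, rec2 in zip(mates1, mates2):
        if (mean_quality(rec1[2]) >= min_mean_quality
                and mean_quality(rec2[2]) >= min_mean_quality):
            yield rec1, rec2
```

With hypothetical file names, the filter could be applied as `filter_pairs(read_fastq("sample_R1.fastq.gz"), read_fastq("sample_R2.fastq.gz"))`.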
-
1
Input
Whole-genome, whole-exome and targeted sequencing data
Paired-end compressed raw FASTQ files
Reference genome (GRCh37, GRCh38, GRCm38)
-
2
Sequence QC
Low-quality reads are discarded
-
3
Trimming
Adaptor and quality trimming of reads
-
4
Alignment
BWA aligns reads to the reference genome
-
5
Alignment QC
Alignment statistics: per-base read depth, GC content, …
-
6
Mark Duplicates
GATK MarkDuplicates flags duplicate reads that are potential PCR artefacts
-
7
Base Quality Score Recalibration
The BQSR model is built
The BQSR model is applied
-
8
Merge to final alignment file
The results of all steps are consolidated into one alignment file per sample
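The alignment, duplicate-marking and recalibration steps above can be sketched as a chain of command lines. This is a hedged illustration under stated assumptions, not the pipeline's actual configuration: the output file names, the read-group fields (including `PL:ILLUMINA`), the use of `samtools sort` for coordinate sorting, and the `known_sites` VCF are all hypothetical choices for this example. The function only builds the commands and does not run them.

```python
from typing import List


def preprocessing_commands(
    sample: str,
    reference: str,
    fastq_r1: str,
    fastq_r2: str,
    known_sites: str,
    threads: int = 4,
) -> List[List[str]]:
    """Build (but do not run) the commands for alignment through BQSR.

    Assumes trimmed paired-end FASTQ input and a reference FASTA that is
    already indexed for BWA and GATK. All file names are hypothetical.
    """
    # Read group passed to bwa mem; "\\t" is a literal backslash-t,
    # which bwa converts into tab characters.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    final_bam = f"{sample}.recal.bam"

    return [
        # Alignment of both mates to the reference genome
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", sam, reference, fastq_r1, fastq_r2],
        # Coordinate sorting (assumption: done with samtools)
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam],
        # Duplicate marking, with a metrics file for QC
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        # BQSR part 1: build the recalibration model
        ["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", reference,
         "--known-sites", known_sites, "-O", recal_table],
        # BQSR part 2: apply the model
        ["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", reference,
         "--bqsr-recal-file", recal_table, "-O", final_bam],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order. Keeping command construction separate from execution makes the chain easy to inspect or log before anything is run.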
-
1
Input
Trimmed, recalibrated alignment file
-
2
Variant Calling
SNVs, small indels and structural variants are called
GATK HaplotypeCaller, Strelka2, FreeBayes, Manta, …
-
3
Merge multi-variant files
All variant calling results are consolidated in one variant calling file per sample
-
4
Variant Annotation
Variants are annotated with biological knowledge
SnpEff and VEP
-
5
Variant QC and reporting
Quality scores of variants are summarized
Summary statistics on variant categories, etc.
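As an illustration of the kind of summary described in the reporting step, the following minimal sketch counts variant categories and averages the QUAL column from VCF text lines. It is not the pipeline's reporting code: the category names, the classification rules based on REF and ALT length, and the function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_allele(ref: str, alt: str) -> str:
    """Assign one ALT allele to a coarse category (hypothetical scheme)."""
    if alt in (".", "*"):
        return "other"        # missing allele or spanning deletion
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural"   # symbolic allele or breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"          # multi-nucleotide substitution
    return "insertion" if len(alt) > len(ref) else "deletion"


def summarize_variants(vcf_lines: Iterable[str]) -> Dict[str, object]:
    """Count records and allele categories, and average QUAL where present."""
    categories: Counter = Counter()
    quals = []
    records = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        ref, alts, qual = fields[3], fields[4], fields[5]
        records += 1
        # A record can list several comma-separated ALT alleles
        for alt in alts.split(","):
            categories[classify_allele(ref, alt)] += 1
        if qual != ".":
            quals.append(float(qual))
    mean_qual = sum(quals) / len(quals) if quals else None
    return {"records": records, "categories": categories, "mean_qual": mean_qual}
```

Counting per ALT allele rather than per record means a multi-allelic site contributes to more than one category, which is one possible convention among several.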