My personal genome


Oragene OG-500 human DNA collection device

I bought whole genome sequencing for myself, with raw sequencing data, from Dante Labs in late 2017. They advertised this level of sequencing quality:

Dante Labs Whole Genome Sequencing - Technical Sheet
Paired End Reads Length: 100 base pairs
Mapping Rate: 99.22%
Unique Rate: 96.47%
Average Sequencing Depth: 37.44
Average Coverage: 30X
Coverage: 99.12%
Coverage at least 4X: 98.12%
Coverage at least 10X: 97.81%
Coverage at least 20X: 96.06%

After they successfully extracted DNA from my second saliva sample – the first sample did not qualify – it took much longer than the 50 business days advertised to get results:

  • 15001702301675A.gvcf file (28,139,979,844 bytes) (88 business days after DNA extraction)
  • A hard disk with raw sequencing reads and more (~150 business days after DNA extraction)

The gVCF file

The genomic Variant Call Format (gVCF) file represents the genotype as constructed by the sequencing pipeline. Let's go through the gVCF file line by line:

A line that begins with the string ## carries file meta-information in key=value format. Here the file format is declared as Variant Call Format (VCF) version 4.1. The VCF specification is maintained by the file formats team of the Global Alliance for Genomics and Health (GA4GH) Data Working Group, under the Samtools organization. A gVCF file is a valid VCF 4.1 file that follows a set of gVCF conventions maintained in the gvcftools repository by Chris Saunders (Illumina).
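To make the ## convention concrete, here is a minimal Python sketch that reads the meta-information lines of a VCF stream and splits each one into a key and a value. The function name iter_meta_lines and the three-line sample header are my own illustration, not taken from the real file. The sketch also assumes an uncompressed, plain-text file.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_meta_lines(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from the leading '##' lines of a VCF stream."""
    for line in handle:
        if not line.startswith("##"):
            # Meta-information ends at the '#CHROM' column header line.
            break
        # Split on the first '=' only, so structured values such as
        # '<ID=END,...>' stay intact in the value part.
        key, _, value = line[2:].rstrip("\r\n").partition("=")
        yield key, value


# Made-up header for illustration; not copied from the real gVCF file.
sample = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "##source=hypothetical-pipeline\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
meta = list(iter_meta_lines(sample))
print(meta)
```

Because the generator stops at the first line that does not start with ##, passing it an open handle to a multi-gigabyte gVCF file reads only the header rather than the whole file.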

This is a work in progress, published to force myself to improve it. To be continued...