Biology
Dr Mahul Chakraborty
University of California Irvine
Accurate characterization of genetic variation is essential for understanding phenotypic evolution. Typically, high-throughput short reads are aligned to a reference genome to identify single nucleotide polymorphisms (SNPs) and small indels. However, large-scale structural mutations (e.g., duplications, deletions, and insertions) are often overlooked by short-read-based methods. Thus, although such structural variants (SVs) play pivotal roles in genome evolution and in the genetic basis of diseases and adaptations, our view of structural genetic variation is severely limited by current methods. To overcome these limitations, we resequenced the founder strains of the Drosophila Synthetic Population Resource (www.flyrils.org) using Pacific Biosciences long reads. To shed light on the missing SVs, we constructed de novo assemblies for each of these strains. Notably, the completeness and contiguity of the assemblies are comparable to or better than those of the current release of the reference strain, with the majority of the genome represented by contiguous sequences (contigs) measuring 20 Mb or longer. Comparisons of these assemblies uncovered ubiquitous duplicates, transposon insertions, and inversions, revealing the dynamic nature of genome structure. A large number of these SVs, which were previously unknown, contribute to extensive gene structure polymorphism, expression level variation, and phenotypic adaptations, several of which I describe in detail.