Posted by Dan Koboldt on Friday, March 16, 2007, 12:06 PM
Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome. In particular, we have focused on developing an algorithm to detect "arrangement polymorphisms", or ARPs, in resequencing data.
The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST. In essence, such "breakpoint reads" will have two high-scoring alignments against the reference genome rather than one. Two alignments for a single read suggest an ARP, and the relationship between them can often be used to infer the type and size of the underlying polymorphism.
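The two-alignment idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm as implemented in the lab: the `Hit` record, the `classify_breakpoint` function, the `min_size` threshold, and the category labels other than deletion are all hypothetical, and the BLAST hits are assumed to be already parsed into 1-based read and reference coordinates with `s_start <= s_end`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One high-scoring alignment of a read (hypothetical record layout)."""
    contig: str
    strand: str   # "+" or "-"
    q_start: int  # read coordinates, 1-based inclusive
    q_end: int
    s_start: int  # reference coordinates, 1-based inclusive, s_start <= s_end
    s_end: int


def classify_breakpoint(a: Hit, b: Hit, min_size: int = 50) -> Optional[Tuple[str, int]]:
    """Guess the event implied by two hits from the same read.

    Returns (label, size), or None when the hits look like one ordinary
    alignment that was merely split in two.
    """
    if a.contig != b.contig:
        return ("INTERCONTIG", 0)   # cannot be sized from this read alone
    if a.strand != b.strand:
        return ("INVERSION", 0)     # candidate only; size not estimated here

    left, right = sorted((a, b), key=lambda h: h.s_start)
    q_first, q_second = sorted((a, b), key=lambda h: h.q_start)

    # On the minus strand the start of the read maps to the right-hand hit.
    expected_left = q_first if a.strand == "+" else q_second
    if left is not expected_left:
        return ("COMPLEX", 0)       # hits are not colinear with the reference

    ref_gap = right.s_start - left.s_end - 1         # reference bases between hits
    read_gap = q_second.q_start - q_first.q_end - 1  # read bases between hits
    delta = ref_gap - read_gap

    if delta >= min_size:
        return ("DELETION", delta)    # reference has bases the read lacks
    if delta <= -min_size:
        return ("INSERTION", -delta)  # read has bases the reference lacks
    return None
```

For example, a 600 bp read whose first half aligns to positions 1000-1299 and whose second half aligns to positions 1800-2099 of the same contig skips 500 reference bases, so `classify_breakpoint(Hit("c1", "+", 1, 300, 1000, 1299), Hit("c1", "+", 301, 600, 1800, 2099))` returns `("DELETION", 500)`.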
We used BLAST to align 13,632 sequencing traces from the HK104 strain of C. briggsae against the "cb25" (ultracontig) genome assembly, which is based on the canonical strain AF16. This model system is advantageous for testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit. Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.
| ultra_contig | start | stop | type | size |
| cb25.fpc0090 | 603965 | 604648 | DELETION | 267 bp |
| cb25.fpc4118 | 273978 | 272428 | DELETION | 1,252 bp |
| cb25.fpc2260 | 650151 | 657894 | DELETION | 7,516 bp |
PCR validation of our ARP predictions is under way.




Posted by Dan Koboldt on Tuesday, March 13, 2007, 01:11 PM
Most of our SNP discovery efforts in C. briggsae make use of the ssaha-SNP program developed by Jim Mullikin (with A. Spargo and Z. Ning). Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads. For the reference sequence, I used the "cb3" assembly now available on Wormbase (L. Hillier and R. Waterston). When I combined results from all reads, 7,537 unique indel loci were detected. Of these, 306 were observed in multiple reads, and 7 of those were discordant (and tossed). That left me with 7,530 candidate insertion-deletion variants. About 62% of these (4,686) were deletions; the remaining 38% (2,844) were insertions. This skew most likely reflects the higher probability of detecting deletions than insertions in shotgun sequence data, rather than biology.
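The combine-and-filter step described above might look something like the following sketch. It does not reproduce parse_indel or its output format, which are not shown here: the input tuple layout `(read_id, contig, pos, kind, size)`, the function name `merge_indel_calls`, and the rule that a locus is discordant when reads disagree on type or size are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_indel_calls(calls):
    """Collapse per-read indel calls into unique loci.

    calls: iterable of (read_id, contig, pos, kind, size) tuples
           (hypothetical layout; kind is "DEL" or "INS").
    Returns (kept, discordant): kept maps (contig, pos) to
    (kind, size, n_reads); discordant lists the loci that were dropped.
    """
    alleles = defaultdict(set)  # locus -> distinct (kind, size) seen
    reads = defaultdict(set)    # locus -> reads supporting it
    for read_id, contig, pos, kind, size in calls:
        locus = (contig, pos)
        alleles[locus].add((kind, size))
        reads[locus].add(read_id)

    kept, discordant = {}, []
    for locus, seen in alleles.items():
        if len(seen) == 1:
            kind, size = next(iter(seen))
            kept[locus] = (kind, size, len(reads[locus]))
        else:
            # Reads disagree about what is at this position: toss it.
            discordant.append(locus)
    return kept, discordant
```

Keying loci on contig and position alone is the simplest choice; a real pipeline might also need to reconcile calls whose positions differ slightly because of alignment ambiguity in repeats.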
Looking at the distribution of indel sizes:
5,906 indels were 1-2 bp
1,338 indels were 3-10 bp
214 indels were 11-20 bp
72 indels were >20 bp
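A size breakdown like the one above can be tallied with a small helper. This is a minimal sketch: the function names `size_bin` and `size_distribution` are made up for the example, and only the bin boundaries are taken from the list.

```python
from collections import Counter


def size_bin(size):
    """Map an indel length in bp to one of the four bins used above."""
    if size < 1:
        raise ValueError("indel size must be at least 1 bp")
    if size <= 2:
        return "1-2 bp"
    if size <= 10:
        return "3-10 bp"
    if size <= 20:
        return "11-20 bp"
    return ">20 bp"


def size_distribution(sizes):
    """Count how many indels fall into each size bin."""
    return Counter(size_bin(s) for s in sizes)
```

As a sanity check on the numbers reported here, the four bins sum to 5,906 + 1,338 + 214 + 72 = 7,530, matching the number of candidate variants.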
I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the C. briggsae data downloads page.
Posted by Dan Koboldt on Monday, March 12, 2007, 03:14 PM
The SNP Research Facility, headed by Dr. Raymond D. Miller (Department of Genetics) at Washington University School of Medicine, studies patterns of genetic variation in humans and model organisms. We perform high-throughput SNP genotyping using the FP-TDI platform from PerkinElmer. Our laboratory, along with Pui Kwok's group at UCSF, served as a major genotyping center for Phase I of the International HapMap Project. Currently, we are funded to develop a high-density genetic map and ancillary resources to support C. briggsae as a model organism.
This blog was launched in March of 2007 to share news, data, and perspectives related to our research.