SNPs, CNVs, and Gene Expression 
Posted by Dan Koboldt on Tuesday, March 27, 2007, 11:12 AM
Another useful application of data from the International HapMap Project was published in Science last month: evaluating the relative contributions of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) to inter-individual differences in gene expression. Barbara Stranger, Manolis Dermitzakis, and colleagues performed association analyses between expression of 14,925 genes and SNPs/CNVs in HapMap cell lines. There were at least two key findings. First, much more of the heritable variation in gene expression is due to SNPs (83.6%) than to CNVs (17.7%). Second, there was little overlap in signals between the two types of variants; fewer than 20% of CNV associations had a corresponding SNP association.

Taken together, these findings reinforce the importance of SNPs, despite the excitement over recent discoveries related to copy-number variation. However, they also highlight the fact that both SNPs and CNVs make significant, largely independent contributions to the genetic variability that underlies complex phenotypes like disease susceptibility and drug response.

Stranger BE et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.



Large deletions in C. briggsae strain HK104 
Posted by Dan Koboldt on Friday, March 16, 2007, 12:06 PM
Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome. In particular, we have focused on developing an algorithm to detect "arrangement polymorphisms", or ARPs, in resequencing data.

The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST. In essence, such "breakpoint reads" will have two high-scoring alignments against the reference genome rather than one. The presence of two alignments for a single read suggests the presence of an ARP, and their relative relationship can often serve to infer the type and size of the underlying polymorphism.
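The two-alignment idea can be illustrated with a minimal sketch. This is not our actual implementation: the `Hsp` record, the `infer_deletion` function, and the `max_read_gap` tolerance are all hypothetical names chosen for illustration, and the sketch assumes both alignments are on the forward strand of the same contig. It covers only the simplest case, in which two alignments that are adjacent in the read are separated in the reference, suggesting a deletion in the sequenced strain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hsp:
    """One high-scoring alignment of a read against the reference.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    contig: str
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int


def infer_deletion(a: Hsp, b: Hsp, max_read_gap: int = 10) -> Optional[dict]:
    """Return a candidate deletion if the two alignments of one read are
    adjacent in the read but separated in the reference; otherwise None."""
    if a.contig != b.contig:
        return None
    # Order the two alignments by their position in the read.
    first, second = sorted((a, b), key=lambda h: h.read_start)
    # This sketch handles forward-strand alignments only.
    if first.ref_start > first.ref_end or second.ref_start > second.ref_end:
        return None
    read_gap = second.read_start - first.read_end - 1
    ref_gap = second.ref_start - first.ref_end - 1
    # The two pieces should be (nearly) contiguous in the read itself.
    if abs(read_gap) > max_read_gap:
        return None
    # Reference bases that have no counterpart in the read.
    size = ref_gap - read_gap
    if size <= 0:
        return None
    return {
        "contig": first.contig,
        "start": first.ref_end + 1,
        "stop": second.ref_start - 1,
        "type": "DELETION",
        "size": size,
    }


# Hypothetical read: bases 1-300 and 301-600 align 500 bp apart in the reference.
left = Hsp("contig_1", 1, 300, 1000, 1299)
right = Hsp("contig_1", 301, 600, 1800, 2099)
candidate = infer_deletion(left, right)
```

Insertions, inversions, and reverse-strand alignments would need their own rules based on the order and orientation of the two alignments; they are left out here to keep the example short.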

We used BLAST to align 13,632 sequencing traces from the HK104 strain of C. briggsae against the "cb25" (ultracontig) genome assembly, which is based on the canonical strain AF16. This model system is advantageous for testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit. Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.

ultra_contig    start     stop      type        size
cb25.fpc0090    603965    604648    DELETION    267 bp
cb25.fpc4118    273978    272428    DELETION    1,252 bp
cb25.fpc2260    650151    657894    DELETION    7,516 bp


PCR validation of our ARP predictions is under way.

Small Indels in C. briggsae strain HK104 
Posted by Dan Koboldt on Tuesday, March 13, 2007, 01:11 PM
Most of our SNP discovery efforts in C. briggsae make use of the ssaha-SNP program developed by Jim Mullikin (with A. Spargo and Z. Ning). Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads. For the reference sequence, I used the "cb3" assembly now available on Wormbase (L. Hillier and R. Waterston).

When I combined results from all reads, some 7,537 unique indel loci were detected. Of these, 306 were observed in multiple reads, and 7 of those were discordant (and tossed). That left me with 7,530 candidate insertion-deletion variants. Around two-thirds of these (4,686) were deletions; the remaining third (2,844) were insertions. This skew most likely does not reflect biology; rather, deletions are more likely than insertions to be detected in shotgun sequence data.
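For readers who want to do something similar, here is a rough sketch of the consolidation step. It does not reproduce parse_indel or its output format; the `consolidate` function and the `(contig, position, kind, length)` tuples are hypothetical, and the sketch assumes that "discordant" means reads at the same locus disagree on the indel type or length.

```python
from collections import defaultdict


def consolidate(calls):
    """Collapse per-read indel calls into unique loci.

    calls: iterable of (contig, position, kind, length) tuples, one per
    read observation (a hypothetical format for this sketch).
    Returns (kept, discordant): kept is a list of unique concordant calls,
    discordant is a list of (contig, position) loci that were dropped.
    """
    by_locus = defaultdict(set)
    for contig, position, kind, length in calls:
        by_locus[(contig, position)].add((kind, length))

    kept, discordant = [], []
    for locus, alleles in sorted(by_locus.items()):
        if len(alleles) == 1:
            # Every read at this locus reports the same indel.
            kind, length = next(iter(alleles))
            kept.append((locus[0], locus[1], kind, length))
        else:
            # Reads disagree on type or length, so the locus is dropped.
            discordant.append(locus)
    return kept, discordant


# Made-up observations: one locus seen twice in agreement, one seen twice
# in disagreement, and one seen only once.
example_calls = [
    ("contig_1", 100, "DEL", 2),
    ("contig_1", 100, "DEL", 2),
    ("contig_1", 250, "INS", 1),
    ("contig_1", 250, "DEL", 1),
    ("contig_2", 40, "INS", 5),
]
kept, discordant = consolidate(example_calls)
```

Applied to the real data, the same bookkeeping is what takes 7,537 unique loci down to 7,530 candidates after the 7 discordant loci are removed.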

Looking at the distribution of indel sizes:
5,906 indels were 1-2 bp
1,338 indels were 3-10 bp
214 indels were 11-20 bp
72 indels were >20 bp
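
A size breakdown like the one above takes only a few lines to produce. The sketch below is illustrative only: `size_bin` and `size_distribution` are hypothetical helper names, and the input is assumed to be a plain list of indel lengths in base pairs.

```python
from collections import Counter

# Size classes used in the breakdown above: (low, high, label), inclusive.
BINS = [(1, 2, "1-2 bp"), (3, 10, "3-10 bp"), (11, 20, "11-20 bp")]


def size_bin(length):
    """Return the size-class label for one indel length in bp."""
    if length < 1:
        raise ValueError("indel length must be at least 1 bp")
    for low, high, label in BINS:
        if low <= length <= high:
            return label
    return ">20 bp"


def size_distribution(lengths):
    """Count how many indels fall into each size class."""
    return Counter(size_bin(n) for n in lengths)


# Made-up lengths, not the HK104 data.
counts = size_distribution([1, 2, 3, 10, 11, 25])
```

As a sanity check on the real numbers, the four classes (5,906 + 1,338 + 214 + 72) add up to the 7,530 candidates.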

I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the C. briggsae data downloads page.



Blog Launched for WUSM SNP Research 
Posted by Dan Koboldt on Monday, March 12, 2007, 03:14 PM
The SNP Research Facility, headed by Dr. Raymond D. Miller (Department of Genetics) at Washington University School of Medicine, studies patterns of genetic variation in humans and model organisms. We perform high-throughput SNP genotyping using the FP-TDI platform from PerkinElmer. Our laboratory, along with Pui Kwok's group at UCSF, served as a major genotyping center for Phase I of the International HapMap Project. Currently, we are funded to develop a high-density genetic map and ancillary resources to support C. briggsae as a model organism.

This blog was launched in March of 2007 to share news, data, and perspectives related to our research.

