Breakpoint read work presented at MC-GARD meeting 
Posted by Dan Koboldt on Wednesday, May 9, 2007, 01:03 PM

Last week I attended the first MC-GARD (Marie Curie - Genome Architecture in Relation to Disease) meeting, titled "Molecular Profiling of the Genome". The conference, chaired by Bauke Ylstra, was hosted at the VU University Medical Center in Amsterdam, the Netherlands. Lars Feuk (Hospital for Sick Children in Toronto) set the tone for the meeting with his keynote lecture, "Discovery of Structural Variation: New insights into disease". He described the Database of Genomic Variants hosted at SickKids as well as some work on CNVs and autism.

Many of the talks discussed copy number variants (CNVs) and their relationship to human cancers. Copy number analysis with array CGH (aCGH) was a popular topic. On Friday (4 May), presentations opened with a very nice talk by Erwin Schurr (McGill Univ.) on the host genetics of tuberculosis and leprosy. He won points for discussing the common disease/common variant (CDCV) hypothesis and the HapMap Project. His group's work, however, focuses on finding "major gene effects" modulating the immunogenetics of infection. Counterintuitively, the major gene they found exhibited a dominant (not recessive) effect in tuberculosis and leprosy.

The invited speaker for my section was Matthew Hurles (Sanger), who described a very nice body of work to build a comprehensive map of copy number variation in the human genome. Their consortium has used high-resolution aCGH analysis of the HapMap samples to construct a dense map of CNVs >500 bp in size, a valuable addition to the haplotype map. In addition, they collaborated with E. Dermitzakis on correlating CNVs with heritable changes in gene expression.

The data Mat showed were impressive and made an excellent lead-in to my talk concerning high-throughput identification of structural variations from sequence trace data, which will be covered in another posting.

Average DAF of dbSNP Functional Classes 
Posted by Dan Koboldt on Thursday, April 19, 2007, 11:13 AM
I have calculated derived allele frequency (DAF) values for 2,539,864 SNPs characterized by the International HapMap Project in four populations of different ancestry. Although dbSNP offers only limited annotations of functional relevance for SNPs, I thought it might be interesting to plot the average DAF values for each of their functional classes (nonsynonymous coding, splice site, synonymous coding, mRNA-UTR, intron, locus, and unknown).

Differences in average derived allele frequency are apparent both between HapMap panels and between dbSNP functional classes. The higher relative DAF values for Europeans and Asians are consistent with a population bottleneck in recent human history. Looking at the functional classes, splice-site SNPs were the scarcest and also exhibited the lowest DAFs on average. Unsurprisingly, DAF values were notably lower for nonsynonymous coding SNPs. Perhaps more surprising are synonymous SNPs, which supposedly have no functional relevance but nevertheless have lower derived allele frequencies than UTR, intronic, and intergenic SNPs. It would be useful if dbSNP offered some further classifications (e.g. "promoter") of putative functional relevance.
fxn_class      snps        YRI      CEU      CHB      JPT
unknown        1,464,372   0.2521   0.2825   0.2817   0.2816
locus          71,912      0.2439   0.2738   0.2733   0.2731
intron         835,761     0.2386   0.2690   0.2691   0.2690
mrna-utr       139,046     0.2360   0.2657   0.2652   0.2654
coding-syn     13,447      0.2124   0.2365   0.2408   0.2419
coding-non     15,061      0.1665   0.1931   0.1966   0.1970
splice-site    265         0.0371   0.0395   0.0492   0.0500
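
For readers unfamiliar with the calculation, here is a minimal Python sketch of how a derived allele frequency and a per-class average might be computed. It is not the pipeline used to produce the table: the function names, the input layout, and the toy counts are all illustrative assumptions, and the ancestral allele is assumed to be known already (for example, from an outgroup comparison).

```python
from collections import defaultdict


def derived_allele_frequency(allele_counts, ancestral_allele):
    """Return the fraction of chromosomes carrying a non-ancestral allele.

    allele_counts: dict mapping allele -> observed chromosome count.
    ancestral_allele: the allele assumed to be ancestral (hypothetical input).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no genotyped chromosomes")
    # An ancestral allele that was never observed counts as zero copies.
    derived = total - allele_counts.get(ancestral_allele, 0)
    return derived / total


def mean_daf_by_class(snps):
    """Average DAF per functional class.

    snps: iterable of (functional_class, allele_counts, ancestral_allele).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fxn_class, allele_counts, ancestral in snps:
        sums[fxn_class] += derived_allele_frequency(allele_counts, ancestral)
        counts[fxn_class] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy data, made up purely for illustration (120 chromosomes per SNP).
toy_snps = [
    ("intron", {"A": 90, "G": 30}, "A"),       # DAF = 30/120 = 0.25
    ("intron", {"C": 60, "T": 60}, "T"),       # DAF = 60/120 = 0.50
    ("coding-non", {"G": 108, "T": 12}, "G"),  # DAF = 12/120 = 0.10
]
averages = mean_daf_by_class(toy_snps)
```

With the toy input, the intron class averages 0.375 and the coding-non class 0.10. Running the same function once per HapMap panel would give one column of a table like the one above.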


SNPs in human cytochrome P450 genes 
Posted by Dan Koboldt on Friday, April 6, 2007, 04:00 PM
I recently used our SNPseek web tool to perform an analysis of genetic (SNP) variation across cytochrome P450 genes in humans. A simple search by product name retrieved 53 CYP genes from the UCSC Known Genes (hg18). I plugged these gene symbols into SNPseek's analysis tool to retrieve a comprehensive report of the SNPs in CYP 450 genes according to dbSNP (b126).

SNPseek reported some 6,658 SNPs across the 53 CYP loci; of the 2,120 variants that were characterized by the HapMap Project, just over half (1,093) were polymorphic.

Because nonsynonymous SNPs are of particular interest, I retrieved the 346 'coding-nonsynon' variants in CYP genes from our modified database of amino acid variants (coming soon). SNPseek had data for 328 of these; more than half (189) were also classified as either human-rodent or human-vertebrate conserved. Looking at the 105 nsSNPs for which HapMap data were available, it appears that amino acid variants in CYP genes have relatively high rates of monomorphism (42.35%) and population-specificity (24.71%) compared to nsSNPs overall. Even more striking was the low incidence of "common link" SNPs (5.88%).

To me, these patterns fit nicely with the expectation of purifying selection acting on the coding sequences of genes for CYP 450 enzymes.



Modeling the genetics of complex traits in mice 
Posted by Dan Koboldt on Friday, March 30, 2007, 02:07 PM
A fascinating talk was given by Joseph Nadeau (Case Western) at this week's Genetics departmental seminar. He described a long-term collaborative project with Eric Lander in which they studied metabolic traits in mice, particularly resistance to diet-induced obesity. They put two established mouse strains (A/J and BL/6) on a high-fat, high-sugar diet (the rodent equivalent of a Big Mac and large Coke every day). Despite the fact that A/J mice ate more and were less active, they stayed thin while BL/6 mice developed obesity, hypertension, insulin resistance - your complete cardiovascular disease package. Over the course of 7 years (starting in '96) they genotyped 17,000 mice and developed a panel of 22 Chromosome Substitution Strains (CSSs) that you can get from Jackson Labs. They made all kinds of interesting observations; here are five of the most striking ones:
1. Complexity. Whereas previous mouse obesity mapping studies came up with 2-4 loci per trait, Nadeau and Lander's system makes it possible to find more QTLs (8 CSSs per trait on average) with small effects.
2. Effect Size. They expected to find many QTLs with small effects. Instead, they found many QTLs with large (51% on average) effects. For example, 8 CSSs account for 99.8% of variation in cholesterol levels among mice on the "diet".
3. Fractal Genetics. They observed a similarly large number of large effects on resistance to diet-induced obesity at the chromosome, congenic, and sub-congenic levels.
4. Epistasis. Some 20 genes conferred resistance to diet-induced obesity, but all of their effects were non-additive.
5. Alternating Stable States. In fact, the effect of having more genes reversed the phenotype completely. One gene = obese, 2 genes = lean, 3 genes = obese, 4 genes = lean.

I think that everyone left the room with a new appreciation for "systems biology" in mice to study the genetic architecture of complex traits.

SNPs, CNVs, and Gene Expression 
Posted by Dan Koboldt on Tuesday, March 27, 2007, 11:12 AM
Another useful application of data from the International HapMap Project was published in Science last month: an evaluation of the relative contributions of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) to inter-individual differences in gene expression. Barbara Stranger, Manolis Dermitzakis, and colleagues performed association analyses between expression of 14,925 genes and SNPs/CNVs in HapMap cell lines. There were at least two key findings. First, much more of the heritable variation in gene expression is due to SNPs (83.6%) than to CNVs (17.7%). Second, there was little overlap in signals between the two types of variants; less than 20% of CNV associations had a corresponding SNP association.

Taken together, these findings reinforce the importance of SNPs, despite the excitement over recent discoveries related to copy-number variation. However, they also highlight the fact that both SNPs and CNVs make significant, largely-independent contributions to the genetic variability that underlies complex phenotypes like disease susceptibility and drug response.

Stranger BE et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.



Large deletions in C. briggsae strain HK104 
Posted by Dan Koboldt on Friday, March 16, 2007, 12:06 PM
Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome. In particular, we have focused on developing an algorithm to detect "arrangement polymorphisms", or ARPs, in resequencing data.

The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST. In essence, such "breakpoint reads" will have two high-scoring alignments against the reference genome rather than one. The presence of two alignments for a single read suggests the presence of an ARP, and their relative relationship can often serve to infer the type and size of the underlying polymorphism.
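
To make the idea concrete, here is a minimal Python sketch of how two alignments of a single read could be turned into a variant call. It is a simplified heuristic written for illustration, not our actual algorithm: the `Hsp` record, the `min_size` threshold, and the restriction to forward-strand alignments on the same contig are all assumptions. The logic compares the gap between the two alignments on the reference with the gap between them on the read; a much larger reference gap suggests a deletion in the sequenced strain, and a much larger read gap suggests an insertion.

```python
from typing import NamedTuple, Optional, Tuple


class Hsp(NamedTuple):
    """One local alignment of a read to the reference (1-based, inclusive)."""
    contig: str
    strand: str   # "+" or "-"
    q_start: int  # start on the read (query)
    q_end: int
    s_start: int  # start on the reference (subject)
    s_end: int


def classify_breakpoint_read(
    a: Hsp, b: Hsp, min_size: int = 50
) -> Optional[Tuple[str, int]]:
    """Guess the variant implied by two alignments of the same read.

    Returns (type, size_bp), or None when the pair cannot be interpreted
    under this simplified model (different contigs, reverse or mixed
    strands, inconsistent order, or implied size below min_size).
    """
    if a.contig != b.contig or a.strand != "+" or b.strand != "+":
        return None
    first, second = sorted((a, b), key=lambda h: h.q_start)
    if second.s_start <= first.s_start:
        return None  # order on the reference disagrees with order on the read
    query_gap = second.q_start - first.q_end - 1
    ref_gap = second.s_start - first.s_end - 1
    diff = ref_gap - query_gap
    if diff >= min_size:
        return ("DELETION", diff)    # reference bases missing from the read
    if -diff >= min_size:
        return ("INSERTION", -diff)  # read bases missing from the reference
    return None


# Hypothetical read whose two halves align 1,252 bp apart on the reference.
left = Hsp("contigA", "+", 1, 300, 1000, 1299)
right = Hsp("contigA", "+", 301, 600, 2552, 2851)
call = classify_breakpoint_read(left, right)  # ("DELETION", 1252)
```

Inversions and translocations would need the strand and contig checks relaxed rather than rejected, which is why the sketch returns None in those cases instead of guessing.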

We used BLAST to align 13,632 sequencing traces from the HK104 strain of C. briggsae against the "cb25" (ultracontig) genome assembly, which is based on the canonical strain AF16. This model system is advantageous for testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit. Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.

ultra_contig   start    stop     type       size
cb25.fpc0090   603965   604648   DELETION   267 bp
cb25.fpc4118   273978   272428   DELETION   1,252 bp
cb25.fpc2260   650151   657894   DELETION   7,516 bp


PCR validation of our ARP predictions is under way.

Small Indels in C. briggsae strain HK104 
Posted by Dan Koboldt on Tuesday, March 13, 2007, 01:11 PM
Most of our SNP discovery efforts in C. briggsae make use of the ssaha-SNP program developed by Jim Mullikin (with A. Spargo and Z. Ning). Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads. For the reference sequence, I used the "cb3" assembly now available on Wormbase (L. Hillier and R. Waterston).

When I combined results from all reads, some 7,537 unique indel loci were detected. Of these, 306 were observed in multiple reads, and 7 of those were discordant (and tossed). That left me with 7,530 candidate insertion-deletion variants. About 62% of these (4,686) were deletions; the remaining 38% (2,844) were insertions. This imbalance most likely reflects not biology but the higher probability of detecting deletions than insertions in shotgun sequence data.
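
The merging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout `(contig, position, indel_type, length, read_id)` is hypothetical and is not the actual output format of parse_indel, and "discordant" is taken here to mean that reads covering the same locus disagree on indel type or length.

```python
from collections import defaultdict


def merge_indel_calls(calls):
    """Collapse per-read indel calls into unique loci.

    calls: iterable of (contig, position, indel_type, length, read_id).
    Returns (kept, discordant). kept maps (contig, position) to
    (indel_type, length, n_reads); discordant lists the loci where
    reads disagreed on type or length.
    """
    by_locus = defaultdict(list)
    for contig, pos, indel_type, length, read_id in calls:
        by_locus[(contig, pos)].append((indel_type, length, read_id))

    kept, discordant = {}, []
    for locus, observations in by_locus.items():
        alleles = {(t, n) for t, n, _ in observations}
        if len(alleles) > 1:
            discordant.append(locus)  # conflicting evidence: drop the locus
            continue
        indel_type, length = next(iter(alleles))
        n_reads = len({read_id for _, _, read_id in observations})
        kept[locus] = (indel_type, length, n_reads)
    return kept, discordant


# Toy calls, made up for illustration.
toy_calls = [
    ("chrI", 100, "DEL", 2, "read1"),
    ("chrI", 100, "DEL", 2, "read2"),   # concordant second observation
    ("chrI", 500, "INS", 1, "read3"),
    ("chrII", 40, "DEL", 3, "read4"),
    ("chrII", 40, "INS", 3, "read5"),   # disagrees with read4
]
kept, discordant = merge_indel_calls(toy_calls)
```

In the toy input, three unique loci go in, one is discarded as discordant, and two survive, mirroring the 7,537 to 7,530 reduction described above.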

Looking at the distribution of indel sizes:
5,906 indels were 1-2 bp
1,338 indels were 3-10 bp
214 indels were 11-20 bp
72 indels were >20 bp
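
The four bins above sum to the 7,530 candidates (5,906 + 1,338 + 214 + 72). A tally like this could be produced with a short Python sketch such as the following; the bin table mirrors the list above, while the function names and the toy lengths are illustrative assumptions.

```python
from collections import Counter

# (low, high, label) for each closed size range, in bp.
SIZE_BINS = [(1, 2, "1-2 bp"), (3, 10, "3-10 bp"), (11, 20, "11-20 bp")]


def size_bin(length):
    """Return the label of the size bin that an indel length falls into."""
    if length < 1:
        raise ValueError("indel length must be at least 1 bp")
    for low, high, label in SIZE_BINS:
        if low <= length <= high:
            return label
    return ">20 bp"


def size_distribution(lengths):
    """Count indels per size bin."""
    return Counter(size_bin(n) for n in lengths)


# Toy lengths, made up for illustration.
toy_lengths = [1, 2, 2, 5, 10, 11, 20, 21, 150]
dist = size_distribution(toy_lengths)
```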

I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the C. briggsae data downloads page.



Blog Launched for WUSM SNP Research 
Posted by Dan Koboldt on Monday, March 12, 2007, 03:14 PM
The SNP Research Facility, headed by Dr. Raymond D. Miller (Department of Genetics) at Washington University School of Medicine, studies patterns of genetic variation in humans and model organisms. We perform high-throughput SNP genotyping using the FP-TDI platform from PerkinElmer. Our laboratory, along with Pui Kwok's group at UCSF, served as a major genotyping center for Phase I of the International HapMap Project. Currently, we are funded to develop a high-density genetic map and ancillary resources to support C. briggsae as a model organism.

This blog was launched in March of 2007 to share news, data, and perspectives related to our research.