Posted by Dan Koboldt on Thursday, April 19, 2007, 11:13 AM
I have calculated derived allele frequency (DAF) values for 2,539,864 SNPs characterized by the International HapMap Project in four populations of different ancestry. Although dbSNP offers only limited annotations of functional relevance for SNPs, I thought it might be interesting to plot the average DAF values for each of their functional classes (nonsynonymous coding, splice site, synonymous coding, mRNA-UTR, intron, locus, and unknown).
Differences in average derived allele frequency are apparent both between HapMap panels and between dbSNP functional classes. The higher DAF values for Europeans and Asians (CEU, CHB, JPT) relative to YRI are consistent with a population bottleneck in recent human history. Looking at the functional classes, splice-site SNPs were the scarcest and also exhibited the lowest DAFs on average. Unsurprisingly, DAF values were notably lower for nonsynonymous coding SNPs. Perhaps more surprising are synonymous SNPs, which supposedly have no functional relevance but nevertheless have lower derived allele frequencies than UTR, intronic, and intergenic SNPs. It would be useful if dbSNP offered some further classifications (e.g. "promoter") of putative functional relevance.
| fxn_class | snps | YRI | CEU | CHB | JPT |
| unknown | 1,464,372 | 0.2521 | 0.2825 | 0.2817 | 0.2816 |
| locus | 71,912 | 0.2439 | 0.2738 | 0.2733 | 0.2731 |
| intron | 835,761 | 0.2386 | 0.2690 | 0.2691 | 0.2690 |
| mrna-utr | 139,046 | 0.2360 | 0.2657 | 0.2652 | 0.2654 |
| coding-syn | 13,447 | 0.2124 | 0.2365 | 0.2408 | 0.2419 |
| coding-non | 15,061 | 0.1665 | 0.1931 | 0.1966 | 0.1970 |
| splice-site | 265 | 0.0371 | 0.0395 | 0.0492 | 0.0500 |
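As a rough illustration of the calculation behind this table, the sketch below computes a derived allele frequency for each SNP and averages it by functional class. The record layout (`fxn_class`, `ancestral`, `counts`) and the toy values are assumptions made for illustration, not the actual HapMap or dbSNP file formats, and the step of determining the ancestral allele is not shown.

```python
from collections import defaultdict


def derived_allele_frequency(allele_counts, ancestral):
    """Return the frequency of the non-ancestral allele at a biallelic SNP.

    allele_counts: dict mapping allele -> observed chromosome count
    ancestral: the allele assumed to be ancestral (hypothetical input)
    Returns None when the SNP cannot be polarized.
    """
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) != 2 or ancestral not in allele_counts:
        return None
    derived_count = total - allele_counts[ancestral]
    return derived_count / total


def mean_daf_by_class(snps):
    """Average DAF per functional class, skipping SNPs without a usable DAF."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for snp in snps:
        daf = derived_allele_frequency(snp["counts"], snp["ancestral"])
        if daf is None:
            continue
        sums[snp["fxn_class"]] += daf
        counts[snp["fxn_class"]] += 1
    return {fxn_class: sums[fxn_class] / counts[fxn_class] for fxn_class in sums}


# Toy records for a single population panel (made-up numbers).
toy_snps = [
    {"fxn_class": "intron", "ancestral": "A", "counts": {"A": 90, "G": 30}},
    {"fxn_class": "intron", "ancestral": "C", "counts": {"C": 60, "T": 60}},
    {"fxn_class": "coding-non", "ancestral": "G", "counts": {"G": 108, "A": 12}},
]
means = mean_daf_by_class(toy_snps)
```

Running the same function once per panel (YRI, CEU, CHB, JPT) would give one column of a table like the one above.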




Posted by Dan Koboldt on Friday, April 6, 2007, 04:00 PM
I recently used our SNPseek web tool to analyze genetic (SNP) variation across cytochrome P450 genes in humans. A simple search by product name retrieved 53 CYP genes from the UCSC Known Genes (hg18). I plugged these gene symbols into SNPseek's analysis tool to retrieve a comprehensive report of the SNPs in CYP 450 genes according to dbSNP (b126). SNPseek reported some 6,658 SNPs across the 53 CYP loci; of the 2,120 variants that were characterized by the HapMap Project, just over half (1,093) were polymorphic.
Because nonsynonymous SNPs are of particular interest, I retrieved the 346 'coding-nonsynon' variants in CYP genes from our modified database of amino acid variants (coming soon). SNPseek had data for 328 of these; more than half (189) were also classified as either human-rodent or human-vertebrate conserved. Looking at the 105 nsSNPs for which HapMap data was available, it appears that amino acid variants in CYP genes have relatively high rates of monomorphism (42.35%) and population-specificity (24.71%) compared to nsSNPs overall. Even more striking was the low incidence of "common link" SNPs (5.88%).
To me, these patterns fit nicely with the expectation of purifying selection acting on the coding sequences of genes for CYP 450 enzymes.
Posted by Dan Koboldt on Friday, March 30, 2007, 02:07 PM
A fascinating talk was given by Joseph Nadeau (Case Western) at this week's Genetics departmental seminar. He described a long-term collaborative project with Eric Lander in which they studied metabolic traits in mice, particularly resistance to diet-induced obesity. They put two established mouse strains (A/J and BL/6) on a high-fat, high-sugar diet (the rodent equivalent of a Big Mac and large Coke every day). Despite the fact that A/J mice ate more and were less active, they stayed thin while BL/6 mice developed obesity, hypertension, insulin resistance - your complete cardiovascular disease package. Over the course of 7 years (starting in '96) they genotyped 17,000 mice and developed a panel of 22 Chromosome Substitution Strains (CSSs) that you can get from Jackson Labs. They made all kinds of interesting observations; here are five of the most striking ones:
1. Complexity. Whereas previous mouse obesity mapping studies came up with 2-4 loci per trait, Nadeau and Lander's system makes it possible to find more QTLs (8 CSSs per trait on average) with small effects.
2. Effect Size. They expected to find many QTLs with small effects. Instead, they found many QTLs with large (51% on average) effects. For example, 8 CSSs account for 99.8% of variation in cholesterol levels among mice on the "diet".
3. Fractal Genetics. They observed a similarly large number of large effects on resistance to diet-induced obesity at the chromosome, congenic, and sub-congenic levels.
4. Epistasis. Some 20 genes conferred resistance to diet-induced obesity, but all of their effects were non-additive.
5. Alternating Stable States. In fact, the effect of having more genes reversed the phenotype completely. One gene = obese, 2 genes = lean, 3 genes = obese, 4 genes = lean.
I think that everyone left the room with a new appreciation for "systems biology" in mice to study the genetic architecture of complex traits.
Posted by Dan Koboldt on Tuesday, March 27, 2007, 11:12 AM
Another useful application of data from the International HapMap Project was published in Science last month: evaluating the relative contributions of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) to inter-individual differences in gene expression. Barbara Stranger, Manolis Dermitzakis, and colleagues performed association analyses between expression of 14,925 genes and SNPs/CNVs in HapMap cell lines. There were at least two key findings. First, much more of the heritable variation in gene expression is due to SNPs (83.6%) than to CNVs (17.7%). Second, there was little overlap in signals between the two types of variants; less than 20% of CNV associations had a corresponding SNP association. Taken together, these findings reinforce the importance of SNPs, despite the excitement over recent discoveries related to copy-number variation. However, they also highlight the fact that both SNPs and CNVs make significant, largely independent contributions to the genetic variability that underlies complex phenotypes like disease susceptibility and drug response.
Stranger BE et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.
Posted by Dan Koboldt on Friday, March 16, 2007, 12:06 PM
Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome. In particular, we have focused on developing an algorithm to detect "arrangement polymorphisms", or ARPs, in resequencing data. The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST. In essence, such "breakpoint reads" will have two high-scoring alignments against the reference genome rather than one. The presence of two alignments for a single read suggests the presence of an ARP, and their relative relationship can often serve to infer the type and size of the underlying polymorphism.
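To make the breakpoint-read idea concrete, here is a minimal sketch of how two alignments for one read might be turned into a type and size call. It is not our actual algorithm: the `Hsp` tuple, the `classify_breakpoint_read` function, and the `min_size` threshold are hypothetical, and the sketch assumes both alignments hit the same contig on the plus strand with 1-based inclusive coordinates.

```python
from collections import namedtuple

# One high-scoring alignment: read (query) and reference (subject) coordinates.
Hsp = namedtuple("Hsp", "q_start q_end s_start s_end")


def classify_breakpoint_read(hsp_a, hsp_b, min_size=50):
    """Infer a polymorphism from two alignments of the same read.

    Compares the unaligned gap between the two alignments in the read
    with the corresponding gap in the reference. Returns (type, size)
    or None when the gaps differ by less than min_size (an assumed cutoff).
    """
    first, second = sorted((hsp_a, hsp_b), key=lambda h: h.q_start)
    if second.s_start <= first.s_start:
        return None  # read order and reference order disagree; not handled here
    query_gap = second.q_start - first.q_end - 1
    subject_gap = second.s_start - first.s_end - 1
    size = subject_gap - query_gap
    if size >= min_size:
        return ("DELETION", size)    # reference sequence missing from the read
    if -size >= min_size:
        return ("INSERTION", -size)  # read carries sequence absent from reference
    return None


# The two halves of a 600 bp read land 500 bp apart in the reference.
call = classify_breakpoint_read(
    Hsp(1, 300, 1000, 1299),
    Hsp(301, 600, 1800, 2099),
)
print(call)  # ('DELETION', 500)
```

Inversions, translocations, and alignments on opposite strands would need extra cases that this sketch leaves out.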
We used BLAST to align 13,632 sequencing traces from the HK104 strain of C. briggsae against the "cb25" (ultracontig) genome assembly, which is based on the canonical strain AF16. This model system is advantageous for testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit. Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.
| ultra_contig | start | stop | type | size |
| cb25.fpc0090 | 603965 | 604648 | DELETION | 267 bp |
| cb25.fpc4118 | 273978 | 272428 | DELETION | 1,252 bp |
| cb25.fpc2260 | 650151 | 657894 | DELETION | 7,516 bp |
PCR validation of our ARP predictions is under way.
Posted by Dan Koboldt on Tuesday, March 13, 2007, 01:11 PM
Most of our SNP discovery efforts in C. briggsae make use of the ssaha-SNP program developed by Jim Mullikin (with A. Spargo and Z. Ning). Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads. For the reference sequence, I used the "cb3" assembly now available on Wormbase (L. Hillier and R. Waterston). When I combined results from all reads, some 7,537 unique indel loci were detected. Of these, 306 were observed in multiple reads, and 7 of those were discordant (and tossed). That left me with 7,530 candidate insertion-deletion variants. Around two-thirds of these (4,686) were deletions; the remaining third (2,844) were insertions. This skew most likely reflects not biology but the higher probability of detecting deletions than insertions in shotgun sequence data.
Looking at the distribution of indel sizes:
5,906 indels were 1-2 bp
1,338 indels were 3-10 bp
214 indels were 11-20 bp
72 indels were >20 bp
I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the C. briggsae data downloads page.
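The merging and binning steps described in this post could look something like the sketch below. The input tuples are a made-up layout, not the real parse_indel output format, and the function names are hypothetical; the logic simply collapses calls by locus, drops loci where reads disagree, and tallies the survivors by type and size.

```python
from collections import Counter, defaultdict


def merge_indel_calls(calls):
    """Collapse per-read indel calls into one call per locus.

    calls: iterable of (contig, position, kind, size) tuples, one per
    read observation (assumed layout). Loci where reads report different
    indels are treated as discordant and dropped.
    """
    by_locus = defaultdict(set)
    for contig, position, kind, size in calls:
        by_locus[(contig, position)].add((kind, size))
    return {
        locus: next(iter(variants))
        for locus, variants in by_locus.items()
        if len(variants) == 1
    }


def size_bin(size):
    """Assign an indel size in bp to the bins used in the list above."""
    if size <= 2:
        return "1-2 bp"
    if size <= 10:
        return "3-10 bp"
    if size <= 20:
        return "11-20 bp"
    return ">20 bp"


def summarize(merged):
    """Count merged indels by type and by size bin."""
    kinds = Counter(kind for kind, _ in merged.values())
    bins = Counter(size_bin(size) for _, size in merged.values())
    return kinds, bins


# Toy calls with invented contig names and positions.
toy_calls = [
    ("contigA", 100, "DEL", 1),
    ("contigA", 100, "DEL", 1),   # same locus in a second read, concordant
    ("contigA", 500, "INS", 4),
    ("contigB", 42, "DEL", 15),
    ("contigB", 900, "DEL", 2),
    ("contigB", 900, "INS", 2),   # reads disagree, so this locus is dropped
    ("contigC", 7, "INS", 35),
]
merged = merge_indel_calls(toy_calls)
kinds, bins = summarize(merged)
```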
Posted by Dan Koboldt on Monday, March 12, 2007, 03:14 PM
The SNP Research Facility, headed by Dr. Raymond D. Miller (Department of Genetics) at Washington University School of Medicine, studies patterns of genetic variation in humans and model organisms. We perform high-throughput SNP genotyping using the FP-TDI platform from PerkinElmer. Our laboratory, along with Pui Kwok's group at UCSF, served as a major genotyping center for Phase I of the International HapMap Project. Currently, we are funded to develop a high-density genetic map and ancillary resources to support C. briggsae as a model organism. This blog was launched in March of 2007 to share news, data, and perspectives related to our research.





