Highlights of the HUGO Meeting 
Posted by Dan Koboldt on Wednesday, May 30, 2007, 04:51 PM

Last week I attended the HGM2007 meeting hosted by HUGO in Montreal, Canada. Here are some of the presentation highlights.

David Altshuler gave a talk entitled "Genomic variation and Inheritance of Common Disease" in which he mainly discussed a genome-wide association study of Type 2 Diabetes (T2D). They tested for genetic association with T2D and 18 additional complex traits (cardiovascular phenotypes, etc.) in a sizeable sample (1,464 cases and 1,467 controls). Using the Affy 500K chip and applying strict QC cutoffs, they tested 386,731 markers covering around 78% of common SNPs (CEU MAF >5%) in the genome. Interestingly, at least two of the associated SNPs he mentioned were in noncoding regions: a common SNP 125 kb upstream of CDKN2B, and another in an intron of TCF7L2. Bottom line: Altshuler et al. identified 8 common genetic risk factors for T2D, all of which had frequencies of 26-85% in European populations. Each factor, however, had only a modest effect on risk (an 11-34% increase), a good example of the genetics of common disease.
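For context on how single-SNP effects like these are usually measured, here is a minimal sketch of an allelic case-control test: an odds ratio and a 1-df Pearson chi-square computed from risk-allele counts. The counts are hypothetical and this is not the study's actual pipeline, which would involve additional QC and modeling. Expressed as odds ratios, an 11-34% increase in risk would correspond to values of roughly 1.11-1.34.

```python
from math import erfc, sqrt

def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio, Pearson chi-square (1 df), and p-value for a 2x2 allele-count table."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_risk + ctrl_risk, case_other + ctrl_other]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Hypothetical risk-allele counts for 1,464 cases (2,928 alleles)
# and 1,467 controls (2,934 alleles); not real T2D data.
or_, chi2, p = allelic_test(1100, 1828, 1000, 1934)
print(f"OR = {or_:.2f}, chi2 = {chi2:.1f}, p = {p:.2g}")
```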

Another esteemed speaker was Sarah Tishkoff, who presented work on genetic structure and adaptation in African populations. Her group published work late last year on a mutation common in East African populations that confers the ability to digest milk. The mutation appears to have arisen independently of the lactose-tolerance variant in Europeans, but affects the same gene (LCT). More interestingly, the African variant lies more than 14 kb upstream of LCT, in an intron of a different gene altogether. Dr. Tishkoff showed results from a luciferase-expression assay her group developed to test the effect of ancestral and derived haplotypes on LCT expression. Within Africa alone there appear to be at least four genetic subpopulations, suggesting that genetic variation among ethnic groups across the continent is quite extensive.

The Autism Genome Project was well represented by Stephen Scherer, whose talk about chromosomal rearrangements in autism spectrum disorder (ASD) fell under the "Structural Variation" symposium. Evidently, three main categories of clinical symptoms in ASD (behavior, social interaction, and communication/play) make up the so-called "triad of symptoms" used for autism diagnosis. It has long been known that chromosomal aberrations are implicated in ASD: 7.4% of patients have cytogenetically visible chromosome rearrangements. Scherer et al. identified some 3,443 copy number variants (CNVs) across 111 or so genomic regions, and found that 16% of their autism cases had chromosomal or CNV aberrations. Unsurprisingly, ASD is proving to be a complex disorder, with "many genetic ways to get there."

Next up was "454 Does Jim Watson," the Roche company seminar in which Bruce Taillon described 3.5X-coverage sequencing of Jim Watson's genome. The project is interesting because it offers a broad picture of sequence variation in a single individual. Around 1.3 million reads did not match the current human genome sequence (NCBI build 36). Of these, 20% matched the Celera assembly and 65% were repetitive sequence; clearly the human genome sequence remains a "draft assembly". Jim had 1,942,500 substitutions (SNPs), 67.8% of which matched known variants in dbSNP. That left 625,238 novel variants, at least 400,000 of which are supported by 2+ reads and are thus likely to be real. Among known SNPs, some 50 variants he carries are listed in databases of known phenotypes (e.g. OMIM), but Roche didn't say which ones.
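The variant counts from the seminar can be cross-checked with a few lines of arithmetic. Subtracting the 625,238 novel variants from the 1,942,500 total gives the number matching dbSNP, which reproduces the 67.8% figure; the "at least 400,000" multi-read variants are then at least 64% of the novel set.

```python
# Figures as reported in the 454/Watson seminar
total_snps = 1_942_500       # substitutions (SNPs) called in Watson's genome
novel = 625_238              # variants not found in dbSNP
multi_read_novel = 400_000   # lower bound: novel variants supported by 2+ reads

known = total_snps - novel
pct_known = 100 * known / total_snps
pct_multi = 100 * multi_read_novel / novel

print(known)                 # 1317262
print(round(pct_known, 1))   # 67.8
print(round(pct_multi))      # 64
```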

Breakpoint read work presented at MC-GARD meeting 
Posted by Dan Koboldt on Wednesday, May 9, 2007, 01:03 PM

Last week I attended the first MC-GARD (Marie Curie - Genome Architecture in Relation to Disease) meeting, titled "Molecular Profiling of the Genome". The conference, chaired by Bauke Ylstra, was hosted at the VU University Medical Center in Amsterdam, the Netherlands. Lars Feuk (Hospital for Sick Children in Toronto) set the tone for the meeting with his keynote lecture, "Discovery of Structural Variation: New insights into disease". He described the Database of Genomic Variants hosted at SickKids as well as some work on CNVs and autism.

Many of the talks discussed copy number variants (CNVs) and their relationship to human cancers, and copy number analysis with array CGH (aCGH) was a popular topic. On Friday (4 May), presentations opened with a very nice talk by Erwin Schurr (McGill Univ.) on the host genetics of tuberculosis and leprosy. He won points for discussing the common disease/common variant (CDCV) hypothesis and the HapMap Project. Their work, however, focuses on finding "major gene effects" that modulate the immunogenetics of infection. Counterintuitively, the major gene they found exhibited a dominant (not recessive) effect in tuberculosis and leprosy.

The invited speaker for my session was Matthew Hurles (Sanger), who described a very nice body of work toward a comprehensive map of copy number variation in the human genome. Their consortium has used high-resolution aCGH analysis of the HapMap samples to construct a dense map of CNVs >500 bp in size, a valuable addition to the haplotype map. In addition, they collaborated with E. Dermitzakis on correlating CNVs with heritable changes in gene expression.

The data Matt showed were impressive and made an excellent lead-in to my talk on high-throughput identification of structural variations from sequence trace data, which will be covered in another posting.

Average DAF of dbSNP Functional Classes 
Posted by Dan Koboldt on Thursday, April 19, 2007, 11:13 AM
I have calculated derived allele frequency (DAF) values for 2,539,864 SNPs characterized by the International HapMap Project in four populations of different ancestry. Although dbSNP offers only limited annotations of functional relevance for SNPs, I thought it might be interesting to plot the average DAF values for each of their functional classes (nonsynonymous coding, splice site, synonymous coding, mRNA-UTR, intron, locus, and unknown).

Differences in average derived allele frequency are apparent both between HapMap panels and between dbSNP functional classes. The higher average DAF values for Europeans and Asians are consistent with a population bottleneck in recent human history. Among the functional classes, splice-site SNPs were the scarcest and also had the lowest average DAF. Unsurprisingly, DAF values were notably lower for nonsynonymous coding SNPs. Perhaps more surprising are synonymous SNPs, which are often assumed to have no functional relevance but nevertheless have lower derived allele frequencies than UTR, intronic, and intergenic SNPs. It would be useful if dbSNP offered further classifications (e.g. "promoter") of putative functional relevance.
Average DAF by dbSNP functional class and HapMap panel:

fxn_class      snps        YRI     CEU     CHB     JPT
unknown        1,464,372   0.2521  0.2825  0.2817  0.2816
locus          71,912      0.2439  0.2738  0.2733  0.2731
intron         835,761     0.2386  0.2690  0.2691  0.2690
mrna-utr       139,046     0.2360  0.2657  0.2652  0.2654
coding-syn     13,447      0.2124  0.2365  0.2408  0.2419
coding-non     15,061      0.1665  0.1931  0.1966  0.1970
splice-site    265         0.0371  0.0395  0.0492  0.0500
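The post does not spell out how DAF was computed. A common approach is to orient each SNP by its ancestral allele (often inferred from the chimpanzee sequence) and report the frequency of the other, derived allele. Below is a minimal sketch under that assumption, for biallelic SNPs in a single HapMap panel, averaging by functional class as in the table; the record layout and allele counts are hypothetical.

```python
from collections import defaultdict

def derived_allele_frequency(allele_counts, ancestral):
    """Frequency of the non-ancestral allele at a biallelic SNP.

    allele_counts: dict mapping allele -> count in one population panel.
    ancestral: the ancestral allele (assumed known, e.g. from chimpanzee).
    """
    total = sum(allele_counts.values())
    derived = total - allele_counts.get(ancestral, 0)
    return derived / total

def mean_daf_by_class(snps):
    """Average DAF per functional class; snps yields (fxn_class, counts, ancestral)."""
    sums, n = defaultdict(float), defaultdict(int)
    for fxn_class, counts, ancestral in snps:
        sums[fxn_class] += derived_allele_frequency(counts, ancestral)
        n[fxn_class] += 1
    return {c: sums[c] / n[c] for c in sums}

# Hypothetical records for one panel (120 chromosomes per SNP)
snps = [
    ("intron", {"A": 90, "G": 30}, "A"),
    ("intron", {"C": 60, "T": 60}, "C"),
    ("coding-non", {"G": 114, "A": 6}, "G"),
]
print(mean_daf_by_class(snps))  # {'intron': 0.375, 'coding-non': 0.05}
```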


SNPs in human cytochrome P450 genes 
Posted by Dan Koboldt on Friday, April 6, 2007, 04:00 PM
I recently used our SNPseek web tool to analyze genetic (SNP) variation across human cytochrome P450 genes. A simple search by product name retrieved 53 CYP genes from the UCSC Known Genes track (hg18). I plugged these gene symbols into SNPseek's analysis tool to retrieve a comprehensive report of the SNPs in CYP450 genes according to dbSNP (build 126).

SNPseek reported some 6,658 SNPs across the 53 CYP loci; of the 2,120 variants characterized by the HapMap Project, just over half (1,093) were polymorphic.

Because nonsynonymous SNPs are of particular interest, I retrieved the 346 'coding-nonsynon' variants in CYP genes from our modified database of amino acid variants (coming soon). SNPseek had data for 328 of these; more than half (189) were also classified as either human-rodent or human-vertebrate conserved. Looking at the 105 nsSNPs for which HapMap data was available, it appears that amino acid variants in CYP genes have relatively high rates of monomorphism (42.35%) and population-specificity (24.71%) compared to nsSNPs overall. Even more striking was the low incidence of "common link" SNPs (5.88%).
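The monomorphism and population-specificity rates above depend on how each SNP is classified across the HapMap panels. SNPseek's exact definitions are not given here, so the sketch below assumes simple ones: "monomorphic" means a minor allele frequency of zero in every panel, and "population-specific" means polymorphic in exactly one panel. The "common link" category is not modeled because its definition is not stated. The function names and frequencies are illustrative.

```python
from collections import Counter

PANELS = ("YRI", "CEU", "CHB", "JPT")

def classify(maf_by_panel):
    """Classify one SNP from its minor allele frequency in each HapMap panel (assumed definitions)."""
    polymorphic = [p for p in PANELS if maf_by_panel.get(p, 0.0) > 0.0]
    if not polymorphic:
        return "monomorphic"
    if len(polymorphic) == 1:
        return "population-specific"
    return "shared"

def category_rates(snps):
    """Percentage of SNPs in each category, rounded to two decimals."""
    counts = Counter(classify(s) for s in snps)
    total = len(snps)
    return {cat: round(100 * k / total, 2) for cat, k in counts.items()}

# Hypothetical per-panel MAFs for four nsSNPs
example = [
    {"YRI": 0.0, "CEU": 0.0, "CHB": 0.0, "JPT": 0.0},
    {"YRI": 0.08, "CEU": 0.0, "CHB": 0.0, "JPT": 0.0},
    {"YRI": 0.25, "CEU": 0.31, "CHB": 0.12, "JPT": 0.10},
    {"YRI": 0.0, "CEU": 0.0, "CHB": 0.0, "JPT": 0.0},
]
print(category_rates(example))  # {'monomorphic': 50.0, 'population-specific': 25.0, 'shared': 25.0}
```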

To me, these patterns fit nicely with the expectation of purifying selection acting on the coding sequences of genes for CYP450 enzymes.



Modeling the genetics of complex traits in mice 
Posted by Dan Koboldt on Friday, March 30, 2007, 02:07 PM
A fascinating talk was given by Joseph Nadeau (Case Western) at this week's Genetics departmental seminar. He described a long-term collaborative project with Eric Lander in which they studied metabolic traits in mice, particularly resistance to diet-induced obesity. They put two established mouse strains (A/J and BL/6) on a high-fat, high-sugar diet (the rodent equivalent of a Big Mac and large Coke every day). Despite the fact that A/J mice ate more and were less active, they stayed thin while BL/6 mice developed obesity, hypertension, insulin resistance - your complete cardiovascular disease package. Over the course of 7 years (starting in '96) they genotyped 17,000 mice and developed a panel of 22 Chromosome Substitution Strains (CSSs) that you can get from Jackson Labs. They made all kinds of interesting observations; here are five of the most striking ones:
1. Complexity. Whereas previous mouse obesity mapping studies came up with 2-4 loci per trait, Nadeau and Lander's system makes it possible to detect more QTLs, including ones with small effects (8 CSSs per trait on average).
2. Effect Size. They expected to find many QTLs with small effects. Instead, they found many QTLs with large (51% on average) effects. For example, 8 CSSs account for 99.8% of variation in cholesterol levels among mice on the "diet".
3. Fractal Genetics. They observed a similarly large number of large effects on resistance to diet-induced obesity at the chromosome, congenic, and sub-congenic levels.
4. Epistasis. Some 20 genes conferred resistance to diet-induced obesity, but all of their effects were non-additive.
5. Alternating Stable States. In fact, the effect of having more genes reversed the phenotype completely. One gene = obese, 2 genes = lean, 3 genes = obese, 4 genes = lean.
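Effect sizes in chromosome substitution strain studies are often expressed as the shift a CSS shows away from the host strain, as a percentage of the difference between the two parental strains. Assuming that definition (the talk summary does not spell it out), a purely additive model predicts that the per-chromosome effects sum to about 100%. With ~8 CSSs per trait averaging ~51% each, the sum is roughly 400%, which is one way to see the non-additivity described in points 2 and 4. The function name and trait means below are hypothetical.

```python
def percent_effect(css_mean, host_mean, donor_mean):
    """Shift of a CSS from the host strain, as a percentage of the host-donor difference.

    host_mean: BL/6 background strain; donor_mean: A/J, donor of the substituted chromosome.
    This effect-size definition is an assumption for illustration.
    """
    return 100.0 * (css_mean - host_mean) / (donor_mean - host_mean)

# Hypothetical trait means (e.g. body weight in grams on the high-fat diet); not data from the talk.
host, donor = 40.0, 20.0
css_means = [30.0, 29.0, 31.0, 30.0]

effects = [percent_effect(m, host, donor) for m in css_means]
total = sum(effects)
print(effects)  # [50.0, 55.0, 45.0, 50.0]
print(total)    # 200.0, well above the ~100% an additive model predicts
```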

I think that everyone left the room with a new appreciation for "systems biology" in mice to study the genetic architecture of complex traits.

SNPs, CNVs, and Gene Expression 
Posted by Dan Koboldt on Tuesday, March 27, 2007, 11:12 AM
Another useful application of data from the International HapMap Project was published in Science last month: evaluating the relative contributions of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) to inter-individual differences in gene expression. Barbara Stranger, Manolis Dermitzakis, and colleagues performed association analyses between expression of 14,925 genes and SNPs/CNVs in HapMap cell lines. There were at least two key findings. First, SNPs captured much more of the detected genetic variation in gene expression (83.6%) than CNVs did (17.7%). Second, the signals from the two types of variants overlapped little; less than 20% of CNV associations had a corresponding SNP association.

Taken together, these findings reinforce the importance of SNPs, despite the excitement over recent discoveries related to copy-number variation. However, they also highlight the fact that both SNPs and CNVs make significant, largely independent contributions to the genetic variability that underlies complex phenotypes like disease susceptibility and drug response.

Stranger BE et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007 Feb 9;315(5813):848-53.
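As a rough illustration of the kind of association test involved, the sketch below computes a Pearson correlation and a least-squares slope between one gene's expression values and a numeric genotype, where the genotype can be SNP allele dosage (0, 1, 2) or CNV copy number. This is a simplified stand-in, not the method used by Stranger et al.; an analysis across thousands of genes also needs multiple-testing correction, which is omitted here. The function name and sample values are hypothetical.

```python
from math import sqrt

def association(genotype, expression):
    """Pearson r and OLS slope of expression on a numeric genotype.

    genotype: SNP allele dosage (0/1/2) or CNV copy number per sample.
    expression: normalized expression of one gene in the same samples.
    """
    n = len(genotype)
    if n != len(expression) or n < 3:
        raise ValueError("need matched samples (n >= 3)")
    mx = sum(genotype) / n
    my = sum(expression) / n
    sxy = sum((g - mx) * (e - my) for g, e in zip(genotype, expression))
    sxx = sum((g - mx) ** 2 for g in genotype)
    syy = sum((e - my) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx  # change in expression per allele or per extra copy
    return r, slope

# Hypothetical samples: the same expression values tested against a SNP and a CNV
snp_dosage = [0, 0, 1, 1, 2, 2]
cnv_copies = [2, 2, 2, 3, 3, 4]
expr = [1.0, 1.2, 2.0, 2.1, 3.0, 2.9]
print(association(snp_dosage, expr))
print(association(cnv_copies, expr))
```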



Large deletions in C. briggsae strain HK104 
Posted by Dan Koboldt on Friday, March 16, 2007, 12:06 PM
Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome. In particular, we have focused on developing an algorithm to detect "arrangement polymorphisms", or ARPs, in resequencing data.

The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST. In essence, such "breakpoint reads" will have two high-scoring alignments against the reference genome rather than one. The presence of two alignments for a single read suggests the presence of an ARP, and their relative relationship can often serve to infer the type and size of the underlying polymorphism.
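A simplified sketch of that inference follows. It is not the detection code described in this post, only an illustration under stated assumptions: the two alignments come from BLAST-style hits on the same reference contig and the same strand, coordinates are 1-based and inclusive, and the `Hit` class and `infer_event` function are hypothetical names. If the reference gap between the two hits is larger than the read gap, the read is missing reference sequence (a deletion); if it is smaller, the read carries extra sequence (an insertion). Inversions and other rearrangements, which need strand information, are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hit:
    """One alignment of a read to the reference (1-based, inclusive coordinates)."""
    q_start: int  # first aligned base in the read
    q_end: int    # last aligned base in the read
    s_start: int  # first aligned base in the reference
    s_end: int    # last aligned base in the reference

def infer_event(a: Hit, b: Hit) -> Optional[Tuple[str, int]]:
    """Infer (event type, size in bp) from two same-strand hits of one read.

    Returns None when the hits do not fit a simple insertion or deletion.
    """
    # Order the hits by their position along the read.
    first, second = sorted((a, b), key=lambda h: h.q_start)
    if second.s_start <= first.s_start:
        return None  # reference order disagrees with read order: not a simple indel
    read_gap = second.q_start - first.q_end - 1  # read bases between hits (negative if they overlap)
    ref_gap = second.s_start - first.s_end - 1   # reference bases between hits
    size = ref_gap - read_gap
    if size > 0:
        return ("DELETION", size)    # reference sequence absent from the read
    if size < 0:
        return ("INSERTION", -size)  # extra sequence present in the read
    return None

# Hypothetical 600-bp read whose two halves align 1,000 bp apart on the reference
print(infer_event(Hit(1, 300, 10001, 10300), Hit(301, 600, 11301, 11600)))  # ('DELETION', 1000)
```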

We used BLAST to align 13,632 sequencing traces from the HK104 strain of C. briggsae against the "cb25" (ultracontig) genome assembly, which is based on the canonical strain AF16. This model system is well suited to testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit. Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.

ultra_contig    start    stop     type      size
cb25.fpc0090    603965   604648   DELETION  267 bp
cb25.fpc4118    273978   272428   DELETION  1,252 bp
cb25.fpc2260    650151   657894   DELETION  7,516 bp


PCR validation of our ARP predictions is under way.

Small Indels in C. briggsae strain HK104 
Posted by Dan Koboldt on Tuesday, March 13, 2007, 01:11 PM
Most of our SNP discovery efforts in C. briggsae make use of the ssaha-SNP program developed by Jim Mullikin (with A. Spargo and Z. Ning). Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads. For the reference sequence, I used the "cb3" assembly now available on Wormbase (L. Hillier and R. Waterston).

When I combined results from all reads, some 7,537 unique indel loci were detected. Of these, 306 were observed in multiple reads, and 7 of those were discordant (and tossed). That left me with 7,530 candidate insertion-deletion variants. Roughly 62% of these (4,686) were deletions; the remaining 38% (2,844) were insertions. This imbalance is most likely not biological; it probably reflects a higher probability of detecting deletions than insertions in shotgun sequence data.
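The merging step described above, collapsing per-read calls into unique loci and tossing loci where reads disagree, might look like the sketch below. The tuple layout (contig, position, type, indel sequence) and the contig names are assumptions for illustration; this does not reproduce parse_indel's actual output format.

```python
from collections import Counter, defaultdict

def merge_indel_calls(calls):
    """Collapse per-read indel calls into unique loci, dropping discordant loci.

    calls: iterable of (contig, position, indel_type, indel_seq) tuples, one per read observation.
    Returns (kept, n_discordant); kept maps (contig, position) -> (indel_type, indel_seq).
    """
    observed = defaultdict(set)
    for contig, pos, indel_type, seq in calls:
        observed[(contig, pos)].add((indel_type, seq))
    kept, n_discordant = {}, 0
    for locus, alleles in observed.items():
        if len(alleles) == 1:
            kept[locus] = next(iter(alleles))  # all reads agree
        else:
            n_discordant += 1                  # reads disagree: toss the locus
    return kept, n_discordant

# Hypothetical calls; contig names and positions are illustrative only
calls = [
    ("contig1", 1200, "DEL", "A"),
    ("contig1", 1200, "DEL", "A"),   # same indel seen in a second read
    ("contig2", 560, "INS", "TT"),
    ("contig2", 560, "DEL", "G"),    # conflicting call at the same locus
    ("contig3", 99, "INS", "C"),
]
kept, n_discordant = merge_indel_calls(calls)
print(len(kept), n_discordant)                 # 2 1
print(Counter(t for t, _ in kept.values()))    # deletions vs insertions among kept loci
```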

Looking at the distribution of indel sizes:
5,906 indels were 1-2 bp
1,338 indels were 3-10 bp
214 indels were 11-20 bp
72 indels were >20 bp
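Binning indel lengths into these size classes takes only a few lines; the sketch also confirms that the four reported bin counts add up to the 7,530 candidate indels. The function names and the example lengths are hypothetical.

```python
from collections import Counter

def size_bin(length):
    """Assign an indel length in bp to one of the size classes listed above."""
    if length < 1:
        raise ValueError("indel length must be >= 1 bp")
    if length <= 2:
        return "1-2 bp"
    if length <= 10:
        return "3-10 bp"
    if length <= 20:
        return "11-20 bp"
    return ">20 bp"

def size_distribution(lengths):
    """Count indels per size class."""
    return Counter(size_bin(n) for n in lengths)

# Reported bin counts should sum to the 7,530 candidate indels.
reported = {"1-2 bp": 5906, "3-10 bp": 1338, "11-20 bp": 214, ">20 bp": 72}
assert sum(reported.values()) == 7530

# Hypothetical indel lengths
print(size_distribution([1, 1, 2, 4, 15, 37]))
```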

I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the C. briggsae data downloads page.



