<?xml version="1.0" encoding="ISO-8859-1"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xml:lang="en-US">
	<title>SNPs In Action</title>
	<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php" />
	<modified>2009-11-07T16:08:03Z</modified>
	<author>
		<name>Lab of Raymond D. Miller, Department of Genetics, Washington University School of Medicine</name>
	</author>
	<copyright>Copyright 2009, Lab of Raymond D. Miller, Department of Genetics, Washington University School of Medicine</copyright>
	<generator url="http://www.sourceforge.net/projects/sphpblog" version="0.4.8">SPHPBLOG</generator>
	<entry>
		<title>NCI&#039;s Stephen Chanock on Genetics of Common Cancers</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry080110-103714" />
		<content type="text/html" mode="escaped"><![CDATA[<TABLE CELLSPACING=0 CELLPADDING=3 BORDER=0><TR VALIGN="top"><TD><TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 BGCOLOR="#000000" ALIGN="center"><TR><TD><IMG SRC="images/chanock.jpg"></TD></TR></TABLE><TD>
Stephen Chanock of the NCI gave an excellent talk on the genetics of susceptibility to common cancers, focusing on the work of the <a href="http://cgems.cancer.gov" target="_blank" >Cancer Genetic Markers of Susceptibility (CGEMS)</a> initiative.  He began by quipping that <b>"cancer is 100% genetics and 100% environmental"</b>; in other words, it is the result of variable hosts responding to variable environments.  
<BR><BR>
In CGEMS they are looking at two of the most common cancers: prostate cancer in men, and breast cancer in women.  They began with a genome-wide association study on the Illumina platform, followed by validation and targeted resequencing.  While not admitting any endorsement of Illumina, he said that his group at the NCI used it rather than Affymetrix because they were "happy with the results".  
<BR><BR>
Dr. Chanock pointed out that <b>population structure matters</b> in large-scale studies such as these, especially when combining data sets (e.g. Affymetrix and Illumina SNP-chips).  He showed a nice triangular STRUCTURE plot of their self-identified "Caucasian" samples using European, African, and Asian populations of origin.  Most were clustered around European, but there were definitely some outliers towards the other two groups.
<BR><BR>
Some other interesting points from his talk were:
</TD></TR></TABLE><br /><b>Resequencing by 454 Long-Range Targeted PCR works!</b>  They sequenced 135 kb in 60 cases and 6 (CEU) controls and saw an overall concordance of 99.4% with known data.<br /><br /><b>There are at least 7 loci contributing to prostate cancer susceptibility.</b>  CGEMS identified two independent loci in 8q24, as well as 10q11, 17q21, 11q13, 10q26, and 7p15 as important loci for prostate cancer.  Interestingly, for most loci homozygotes for the risk allele had much higher odds ratios (&gt;1.4) than heterozygotes (&lt;1.2).  <br /><br /><b>8q24 May Be A &quot;Master Cancer Region&quot;.</b>  Several studies including CGEMS have linked the region with susceptibility to breast, prostate, and colorectal cancer.  <br /><br /><b>Functional determination of causal variants should come before most clinical validation.</b>  Once you have a &quot;hit&quot; from association studies, Dr. Chanock recommends that you examine the region experimentally and bioinformatically before you &quot;take genetics to the max&quot; with lots of gene resequencing. <br /><br /><b>Regulatory variation may outweigh coding variation in cancer.</b>  Dr. Chanock expressed his expectation that we will find very few nonsynonymous SNPs associated with genetic susceptibility; instead, there will be &quot;lots of regulatory variation&quot; that likely maps outside the exons of protein-coding genes.]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry080110-103714</id>
		<issued>2008-01-10T00:00:00Z</issued>
		<modified>2008-01-10T00:00:00Z</modified>
	</entry>
	<entry>
		<title>2007: the Year of Human Genetic Variation</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry071231-155351" />
		<content type="text/html" mode="escaped"><![CDATA[<TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 BGCOLOR="#000000" ALIGN="center"><TR><TD><IMG SRC="images/BreakThruOfTheYear.jpg"></TD></TR><TR><TD align="right" bgcolor="#FFFFFF"><font style="size: 8px; color: #AAAAAA">Image credit: Science</font></TD></TR></TABLE><br /><br />2007 was a shining year for the field of genomics.  The completion of the phase II human haplotype map (HapMap), advancement of high-throughput genotyping and DNA sequencing technologies, and several large-scale studies of sequence variation yielded a wealth of knowledge about the genetics underlying human diversity.  Indeed, the December 21 issue of <i>Science</i> named human genetic variation the <a href="http://www.sciencemag.org/sciext/btoy2007/" target="_blank" > breakthrough of the year</a>.  The knowledge and technology developed around the human genome yielded many fruits in 2007.  Here are some of the highlights:<br /><br /><b>Understanding Genetic Variation</b><br /><br />The <a href="http://www.ncbi.nlm.nih.gov/pubmed/17943122" target="_blank" > completion of Phase II</a> of the <a href="http://www.hapmap.org" target="_blank" >International HapMap Project</a> offered an unrivaled view of the underlying structure of human genetic variation.  More than three million SNPs have been characterized in populations of European, African, Chinese, and Japanese origin.  Groups such as the Sanger Institute have coupled this new knowledge with advances in high-throughput genotyping to produce stunning genome-wide association studies of common disorders.<br /><br /><b>New Players in Sequence Variation</b><br /><br />The excitement around structural variation continued to grow.  New array-based and sequence-based techniques to identify insertions, deletions, inversions, and CNVs yielded surprising news about the prevalence and extent of such variation in humans.  
Also, the number of associations between SVs and disorders such as mental retardation grew substantially in 2007, reinforcing the early notion that these were important to human health.<br /><br /><b>Exploring Genome Function</b><br /><br />With simultaneous publications of over a dozen papers (in Nature and Genome Research), <a href="http://www.nature.com/nature/focus/encode/index.html" target="_blank" > the ENCODE consortium</a> announced completion of its Phase I pilot project.  In 44 regions totaling around 1% of the human genome, some 35 groups generated more than 200 computational and experimental data sets whose analysis yielded a vast wealth of information about the structure, content, and function of the human genome.  See the earlier entry about the ENCODE pilot project.<br /><br /><b>The Personal Genome</b><br /><br />Finally, the advent of array-based genotyping and sequencing technologies began to offer unprecedented data throughput, bringing the analysis of a single person&#039;s genome within reach.  Next-gen sequencing player <a href="http://www.454.com/" target="_blank" >454 Life Sciences</a> was acquired by Roche, and also announced the complete sequencing of Jim Watson&#039;s genome (though we&#039;re still waiting on the publication).  Perhaps sensing this delay, J. Craig Venter published the sequencing of his entire genome (through standard ABI 3730 methods).  For the ordinary consumer, at least three companies (<a href="https://www.23andme.com/" target="_blank" >23andMe</a>, <a href="http://www.navigenics.com" target="_blank" >Navigenics</a>, and <a href="http://www.decodeme.com/" target="_blank" >deCODEme</a>) announced their intentions to offer &quot;personalized&quot; genetic analysis on SNP-chip arrays.  The fee-for-service starts at a couple thousand dollars.  Finally, an announcement by the FDA marked a major milestone for the field of pharmacogenomics - genetic information was being added to a drug label for the first time.  
Indeed, the FDA now includes information about the dose effects of two genes on the drug <a href="http://www.warfarindosing.org" target="_blank" >warfarin</a> (a coumarin-type anticoagulant), and other drugs will no doubt follow.  Taken together, the events of 2007 suggest that the ultimate goal of personalized medicine may be on the horizon.<br />]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry071231-155351</id>
		<issued>2007-12-31T00:00:00Z</issued>
		<modified>2007-12-31T00:00:00Z</modified>
	</entry>
	<entry>
		<title>HapMap Part II: 3.1 million SNPs and natural selection</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry071029-135223" />
		<content type="text/html" mode="escaped"><![CDATA[Two publications in <i>Nature</i> earlier this month marked the completion of Phase II of the International HapMap Project, an epic quest to understand the pattern of genetic variation in humans. <br /><br /><TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 BGCOLOR="#000000" ALIGN="center"><TR><TD><IMG SRC="images/HapMap-Logo.jpg"></TD></TR></TABLE><br /><br />The first publication,  <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=17943122" target="_blank" >A second generation human haplotype map</a>, describes the addition of 2.1 million SNPs to an already-dense Phase I map of human SNPs.  Now, the HapMap has one polymorphic SNP every 875 bp on average, and 98.6% of the assembled genome lies within 5 kb of a polymorphic SNP.  Interestingly, because SNP selection in Phase II did not consider SNP spacing or known MAF, the results offer a better view of rare variation in the human genome.  <br /><br /><b>So How Many Tag SNPs Are There?</b> <br />One key finding from Phase II is that up to 1% of common variants are untaggable because they lie in recombination hotspots.  For the &quot;taggable&quot; majority, however, it takes 552,853 tag SNPs to capture common (MAF &gt;= 0.05) variation at r^2 of at least 0.8 in the European-derived population.  As expected, this number is slightly lower for Asian-derived populations (520,111 tag SNPs) and substantially higher (1.09 million tag SNPs) for African-derived populations. It should be noted that tagging for African populations was dramatically improved by the Phase II data.<br /><br /><b>What Did We Learn About Selection?</b><br />Of the 56,789 nonsynonymous SNPs in dbSNP release 125, the HapMap attempted genotyping for 36,777 (64.76%) and got QC-passed, polymorphic results for 17,427 (47.39% of genotyped).  That&#039;s a fairly dismal validation rate compared to the rest of the genome.  
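<br /><br />The two percentages quoted above follow directly from the three counts.  As a quick sanity check, here is a minimal Python sketch; the variable names are illustrative assumptions, and only the counts come from the text.<br />
<pre>
```python
# Counts quoted in the text (nonsynonymous SNPs, dbSNP release 125).
NS_SNPS_IN_DBSNP = 56789      # all nonsynonymous SNPs
GENOTYPING_ATTEMPTED = 36777  # SNPs the HapMap tried to genotype
QC_PASSED_POLYMORPHIC = 17427 # QC-passed and polymorphic


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of dbSNP nonsynonymous SNPs that were attempted.
attempted_pct = percent(GENOTYPING_ATTEMPTED, NS_SNPS_IN_DBSNP)

# Share of attempted SNPs that validated as polymorphic.
validated_pct = percent(QC_PASSED_POLYMORPHIC, GENOTYPING_ATTEMPTED)

print(attempted_pct, validated_pct)
```
</pre>
Both values agree with the figures given above (64.76% attempted, 47.39% of those validated).<br /><br />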
Relative to synonymous SNPs, nonsynonymous SNPs in the HapMap exhibited an excess of rare variation and a paucity of common variation consistent with <b>widespread purifying selection</b> against protein mutations.  Additionally, the patterns of selection appeared stronger in the YRI panel, suggesting a reduced efficacy/strength of selection among non-African populations.  <br /><br />The second HapMap paper focused on  <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=17943131" target="_blank" >positive selection in human populations</a>.  Using a modified extended-haplotype homozygosity test, <i>Sabeti et al</i> identified 26 nsSNPs with regional evidence of positive selection.  The candidate loci contain genes involved in Lassa virus infection (in Africa), skin pigmentation (in Europe), and hair follicle development (in Asia).    ]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry071029-135223</id>
		<issued>2007-10-29T00:00:00Z</issued>
		<modified>2007-10-29T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Functional Elements of the Genome: The ENCODE Pilot Project</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070914-121258" />
		<content type="text/html" mode="escaped"><![CDATA[
<TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 BGCOLOR="#000000" ALIGN="center"><TR><TD><IMG SRC="images/ENCODE-banner.jpg"></TD></TR></TABLE><br /><br />In June, a <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17571346" target="_blank" >landmark publication</a> in <i>Nature</i> and dozens of companion articles in other journals heralded the completion of the ENCODE pilot project.  Funded by the NHGRI in 2003, the <a href="http://www.genome.gov/10005107" target="_blank" >Encyclopedia Of DNA Elements</a> is a major undertaking by researchers at some 50 institutions to characterize functional elements in the human genome.  In the pilot phase, 44 regions representing about 1% of the genome were examined.  With an elegant combination of computational and experimental techniques, this effort represents a major step forward in our understanding of the function and architecture of our genetic code.  <br /><br />The main paper and several companion/commentary articles are listed below, but here is a quick summary of some of the key findings:
<UL>
<LI><b>The human genome is pervasively transcribed.</b>  Some 74% of the bases were represented in transcripts identified by at least two different technologies, and a substantial number of the transcripts are from noncoding regions.
<LI><b>Transcription start sites (TSSs) are far more numerous than previously believed.</b>  Some 4,591 TSS clusters (many of them novel) were detected in ENCODE regions; that&#039;s almost 10 times the number of established protein-coding genes.
<LI><b>Chromatin architecture and histone modifications predict transcriptional activity.</b>  Support vector machine (SVM) modeling of histone modification data proved capable of predicting gene expression status (transcribed or not transcribed) with >90% accuracy.  Transcriptionally-active regions also correlated highly with the presence of TSSs.
<LI><b>DNA replication timing is correlated with chromatin structure.</b>  "Active domains", associated with early replication, were enriched for TSSs, CpG islands, and Alu elements.
Conversely, "repressed domains", associated with late replication, were enriched for LINE1 and LTR transposons.
<LI><b>Some 5% of bases in the genome are selectively constrained in mammals.</b>  Around 40% of constrained bases overlap coding exons and their UTRs, while 20% cover noncoding functional elements supported by experimental data.  That leaves 40% of constrained bases whose function, if any, remains unknown.  Perhaps even more surprising, many of the experimentally-identified functional elements do not appear to be constrained across mammalian evolution.
</UL><br />The findings of the ENCODE consortium are an achievement of collaboration and technological innovation, and offer an unparalleled view into the complexity underlying the human genome.  To read more:<br /><br /><b>The ENCODE Paper</b><br />Birney, E., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17571346" target="_blank" >Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project</a>. Nature, 2007. 447(7146): p. 799-816.<br /><br /><b>Related Commentaries</b><br />Weinstock, G.M., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17567987" target="_blank" >ENCODE: more genomic empowerment</a>. Genome Res, 2007. 17(6): p. 667-8.<br />Gerstein, M.B., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17567988" target="_blank" >What is a gene, post-ENCODE? History and updated definition</a>. Genome Res, 2007. 17(6): p. 669-81.<br />Henikoff, S., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17597770" target="_blank" >ENCODE and our very busy genome</a>. Nat Genet, 2007. 39(7): p. 817-8.<br /><br /><b>Select Companion Articles</b><br />Margulies, E.H., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17567995" target="_blank" >Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome</a>. Genome Res, 2007. 17(6): p. 760-74.<br />Tress, M.L., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17372197" target="_blank" >The implications of alternative splicing in the ENCODE protein complement</a>. Proc Natl Acad Sci U S A, 2007. 104(13): p. 
5495-500.<br />Zheng, D., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17568002" target="_blank" >Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution</a>. Genome Res, 2007. 17(6): p. 839-51.<br />King, D.C., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17567996" target="_blank" >Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data</a>. Genome Res, 2007. 17(6): p. 775-86.<br />Washietl, S., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17568003" target="_blank" >Structured RNAs in the ENCODE selected regions of the human genome</a>. Genome Res, 2007. 17(6): p. 852-64.<br />Thurman, R.E., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17568007" target="_blank" >Identification of higher-order functional domains in the human ENCODE regions</a>. Genome Res, 2007. 17(6): p. 917-27.<br /><br /><b>Applications</b><br />Elnitski, L.L., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17568011" target="_blank" >The ENCODEdb portal: simplified access to ENCODE Consortium data</a>. Genome Res, 2007. 17(6): p. 954-9.<br />Thomas, D.J., et al., <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17166863" target="_blank" >The ENCODE Project at UC Santa Cruz</a>. Nucleic Acids Res, 2007. 35(Database issue): p. D663-7.]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070914-121258</id>
		<issued>2007-09-14T00:00:00Z</issued>
		<modified>2007-09-14T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Medros, Inc. - Ross Cagan&#039;s Legacy at WashU</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070605-110855" />
		<content type="text/html" mode="escaped"><![CDATA[&quot;We need more focus on taking what we study to the clinics.&quot;  This was the opener given at last week&#039;s Genetics seminar by Ross Cagan, a 14-year WashU veteran perhaps best known for  <a href="http://www.medrospharma.com/" target="_blank" >Medros, Inc.</a>, the drug discovery company he founded with fellow professor Thomas Baranski.  The talk began with an overview of eye development in <i>Drosophila</i>, particularly the role of adhesion proteins &quot;Hibris&quot; and &quot;Roughest&quot; (homologs of Nephrin/Neph in humans) whose complementary expression directs correct geometric arrangement of ommatidia in the fly eye.  Mutant phenotypes in eye development are easy to observe in flies, because they create a &quot;ripple effect&quot; among ommatidia that is easy to observe.  Cagan and his colleagues also hoped that our extensive knowledge of epithelia in flies might offer a different approach to studying complex diseases like cancer and diabetes.<br /><br />The breakthrough in using flies to model human disease arose from work on Multiple Endocrine Neoplasia Type 2 (MEN2), a cancer syndrome caused by Ret mutations whose spontaneous form was untreatable and accounts for 75% of cases.  The introduction of oncogenic Ret into fly embryos causes a distinct eye-overgrowth phenotype, offering a nice model system to study the underlying cause of the human disease.  Cagan and colleagues developed a high-throughput, fly-based drug screening technology which, long story short, isolated an AstraZeneca compound that rescued the overgrowth phenotype in flies and eventually proved effective against MEN2 in humans.  They also identified 140 genetic modifiers in flies (enhancers and suppressors) whose homologs are likely resistance and susceptibility genes (respectively) in humans.  Double knockdowns of one modifier, Csk, have enabled a fly model for oncogenesis and metastasis in Ret/MEN2 cancers.  
<br /><br />Building fly models of human diseases allows Medros to screen drug candidates for toxicity, efficacy, bio-availability, etc. in a high-throughput and cost-effective manner.  Models for cancers other than MEN2, oncogenic &quot;cooperation&quot;, and diabetes are all in the works.  Sadly, Ross Cagan is leaving St. Louis for a post at Mount Sinai School of Medicine. ]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070605-110855</id>
		<issued>2007-06-05T00:00:00Z</issued>
		<modified>2007-06-05T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Highlights of the HUGO Meeting</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070530-165143" />
		<content type="text/html" mode="escaped"><![CDATA[
<TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 ALIGN=&#039;center&#039;>
<TR><TD BGCOLOR="#000000"><img src="images/HUGO.jpg" width="484" height="62" border="0" alt="" /></TD></TR>
</TABLE>
<br />Last week I attended the  <a href="http://hgm2007.hugo-international.org" target="_blank" >HGM2007</a> meeting hosted by HUGO in Montreal, Canada.  Here are some of the presentation highlights.<br /><br />David Altschuler gave a talk entitled &quot;Genomic variation and Inheritance of Common Disease&quot; in which he mainly discussed a  <b>genome-wide association study of Type 2 Diabetes</b>  (T2D).  They looked for genetic association of T2D and 18 additional complex traits (cardiovascular phenotypes, etc.) in a sizeable sample population (1464 cases and 1467 controls).  Using the Affy 500K chip and applying strict QC cutoffs, they tested 386,731 markers covering around 78% of common SNPs (CEU MAF &gt;5%) in the genome.  Interestingly, at least two of the associated SNPs he mentioned were in noncoding regions: a common SNP 125kb upstream of CDKN2B, and another in the intron of TCF7L2.  Bottom line: Altschuler et al identified 8 common genetic risk factors for T2D, all of which had frequencies of 26-85% in European populations.  Each factor, however, had a very modest effect on risk (11-34%) - a good example of the genetics of common disease.<br /><br />Another esteemed speaker was Sarah Tishkoff, who presented work on  <b>genetic structure and adaptation in African populations</b> .  Her group published some work late last year on a mutation common to East African populations that confers the ability to digest milk.  The mutation appears to have arisen independently of the lactose-tolerance variant in Europeans, but affects the same gene (LCT).  More interestingly, the African variant is over 14kb upstream of the LCT gene, in an intron of a different gene altogether.  Dr. Tishkoff showed some results from a luciferase-expression assay that they developed to test the effect of ancestral and derived haplotypes on LCT expression.  
Even within Africa there appear to be at least four genetic sub-populations, suggesting that variation across ethnic groups is quite extensive across the continent.<br /><br />The  Autism Genome Project was well represented by Stephen Scherer, whose talk about  <b>chromosomal rearrangements in autism spectrum disorder</b>  (ASD) fell under the &quot;Structural Variation&quot; symposium.  Evidently three main categories of clinical symptoms (behavior, social interaction, and communication/play) make up the so-called &quot;triad of symptoms&quot; for autism diagnosis.  It has long been known that chromosomal aberrations were implicated in ASD, as 7.4% of patients have cytogenetically-visible chromosome rearrangements.  Scherer et al identified some 3,443 copy number variations (CNVs) across 111 or so genomic regions, and found that 16% of their autism cases had chromosomal or CNV aberrations.  Unsurprisingly, ASD is proving to be a complex disorder, with &quot;many genetic ways to get there.&quot;<br /><br />Next up, &quot; <b>454 Does Jim Watson</b> &quot; - the Roche company seminar by Bruce Taillon described a 3.5X coverage sequencing of Jim Watson&#039;s genome.  Their project is interesting because it offers a broad picture of sequence variation in a single individual.  Around 1.3 million reads did not match the current human genome sequence (NCBI b36).  Of these, 20% matched the Celera assembly and 65% were repetitive sequence - clearly the human genome sequence remains a &quot;draft assembly&quot;.  Jim had 1,942,500 substitutions (SNPs), 67.8% of which matched known variants from dbSNP.  That left 625,238 novel variants - at least 400,000 of these have 2+ reads and are thus likely to be real.  Among known SNPs, some 50 variants he carries are listed in databases of known phenotypes (e.g. OMIM), but Roche didn&#039;t say which ones.]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070530-165143</id>
		<issued>2007-05-30T00:00:00Z</issued>
		<modified>2007-05-30T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Breakpoint read work presented at MC-GARD meeting</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070509-130325" />
		<content type="text/html" mode="escaped"><![CDATA[
<TABLE CELLSPACING=0 CELLPADDING=1 BORDER=0 ALIGN=&#039;center&#039;>
<TR><TD BGCOLOR="#000000"><img src="images/MCGARD-VUMC-Logos.jpg" width="484" height="87" border="0" alt="" /></TD></TR>
</TABLE>
<br />Last week I attended the first  <a href="http://www.mc-gard.eu/events.php?action=conference_1" target="_blank" >MC-GARD</a>  (Marie Curie - Genome Architecture in Relation to Disease) meeting, titled &quot;Molecular Profiling of the Genome&quot;.  The conference, chaired by Bauke Ylstra, was hosted at the VU University Medical Center in Amsterdam, the Netherlands.  Lars Feuk (Hospital for Sick Children in Toronto) set the tone for the meeting with his keynote lecture titled &quot;Discovery of Structural Variation: New insights into disease&quot;.  He described the  <a href="http://projects.tcag.ca/variation/" target="_blank" > Database of Genomic Variants</a> hosted at Sickkids as well as some work on CNVs and autism.   <br /><br />Many of the talks discussed  <b>copy number variants </b>  (CNVs) and their relationship to human cancers.  Copy number analysis with  <b>array CGH </b>  (aCGH) was a popular topic.  On Friday (4 May), presentations opened with a very nice talk by Erwin Schurr (McGill Univ.) on the host genetics of tuberculosis and leprosy.  He won points for discussing the  <b>common disease common variant</b>  (CDCV) hypothesis and the  <b>HapMap Project</b> .  Their work, however, focuses on finding &quot;major gene effects&quot; modulating the immunogenetics of  infection.  Counterintuitively, the major gene they found exhibited a dominant (not recessive) effect in Tb/Leprosy.<br /><br />The invited speaker for my section was Mathew Hurles (Sanger), who described a very nice body of work to build a comprehensive map of copy number variation in the human genome.  Their consortium has used high-resolution aCGH analysis of the HapMap samples to construct a  <b>dense map of CNVs &gt;500 bp</b>  in size, a valuable addition to the haplotype map.  In addition, they collaborated with E. Dermitzakis on correlating CNVs with heritable changes in gene expression.  
<br /><br />The data Mat showed were impressive and made an excellent lead-in to my talk concerning  <b>high-throughput identification of structural variations</b>  from sequence trace data, which will be covered in another posting.  ]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070509-130325</id>
		<issued>2007-05-09T00:00:00Z</issued>
		<modified>2007-05-09T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Average DAF of dbSNP Functional Classes</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070419-111323" />
		<content type="text/html" mode="escaped"><![CDATA[I have calculated derived allele frequency (DAF) values for 2,539,864 SNPs characterized by the <b>International HapMap Project</b> in four populations of different ancestry.  Although dbSNP offers only limited annotations of functional relevance for SNPs, I thought it might be interesting to plot the average DAF values for each of their functional classes (nonsynonymous coding, splice site, synonymous coding, mRNA-UTR, intron, locus, and unknown).  <br /> <img src="images/HapMap-DAF-by-fxn_class.jpg" width="467" height="330" border="0" alt="" /> <br />Differences in average derived allele frequency are apparent both between HapMap panels and between dbSNP functional classes.  The increased relative DAF values for Europeans and Asians are consistent with a population bottleneck in recent human history.  Looking at the functional classes, <b>splice-site SNPs</b> were the most scarce and also exhibited the lowest DAFs on average.  Unsurprisingly, DAF values were notably decreased for <b>nonsynonymous coding SNPs</b>.  Perhaps more surprising are synonymous SNPs, which <i>allegedly</i> have no functional relevance, but nevertheless have lower allele frequencies than UTR, intronic, and intergenic SNPs.  It would be useful if dbSNP offered some further classifications (e.g. &quot;promoter&quot;) of putative functional relevance.<br />
<TABLE CELLSPACING=0 CELLPADDING=1 BORDER=1 BORDERCOLOR="#CCCCCC" ALIGN="center">
<TR><TD><b>fxn_class</b><TD><b>snps</b><TD><b>YRI</b><TD><b>CEU</b><TD><b>CHB</b><TD><b>JPT</b></TR>
<TR><TD>unknown<TD>1,464,372<TD>0.2521<TD>0.2825<TD>0.2817<TD>0.2816</TD></TR>
<TR><TD>locus<TD>71,912<TD>0.2439<TD>0.2738<TD>0.2733<TD>0.2731</TD></TR>
<TR><TD>intron<TD>835,761<TD>0.2386<TD>0.2690<TD>0.2691<TD>0.2690</TD></TR>
<TR><TD>mrna-utr<TD>139,046<TD>0.2360<TD>0.2657<TD>0.2652<TD>0.2654</TD></TR>
<TR><TD>coding-syn<TD>13,447<TD>0.2124<TD>0.2365<TD>0.2408<TD>0.2419</TD></TR>
<TR><TD>coding-non<TD>15,061<TD>0.1665<TD>0.1931<TD>0.1966<TD>0.1970</TD></TR>
<TR><TD>splice-site<TD>265<TD>0.0371<TD>0.0395<TD>0.0492<TD>0.0500</TD></TR>
</TABLE>
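<br />For readers who want to try something similar, here is a minimal Python sketch of how per-class average DAF values like those in the table could be computed.  The function names, the record layout, and the toy allele counts are all assumptions made for illustration; this is not the script that produced the numbers above.<br />
<pre>
```python
from collections import defaultdict


def derived_allele_frequency(ancestral, allele_counts):
    """Fraction of sampled chromosomes carrying a non-ancestral allele.

    allele_counts maps each observed allele to its chromosome count.
    Returns None when the SNP cannot be polarized (no data, or the
    ancestral allele is not one of the SNP's alleles).
    """
    total = sum(allele_counts.values())
    if total == 0 or ancestral not in allele_counts:
        return None
    return (total - allele_counts[ancestral]) / total


def mean_daf_by_class(snps):
    """Average DAF per functional class.

    snps is an iterable of (fxn_class, ancestral_allele, allele_counts).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fxn_class, ancestral, allele_counts in snps:
        daf = derived_allele_frequency(ancestral, allele_counts)
        if daf is None:
            continue  # skip SNPs without a usable ancestral state
        sums[fxn_class] += daf
        counts[fxn_class] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}


# Toy records (hypothetical counts for one population panel).
toy_snps = [
    ("intron", "A", {"A": 90, "G": 30}),
    ("intron", "C", {"C": 60, "T": 60}),
    ("coding-non", "G", {"G": 108, "A": 12}),
    ("intron", "T", {"C": 50, "G": 70}),  # ancestral allele unknown here
]

averages = mean_daf_by_class(toy_snps)
print(averages)
```
</pre>
In practice the ancestral allele would have to come from an outgroup comparison (for example, the chimpanzee sequence), and the averaging would be repeated separately for each HapMap panel.<br />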
]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070419-111323</id>
		<issued>2007-04-19T00:00:00Z</issued>
		<modified>2007-04-19T00:00:00Z</modified>
	</entry>
	<entry>
		<title>SNPs in human cytochrome P450 genes</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070406-160054" />
		<content type="text/html" mode="escaped"><![CDATA[I recently used our  <a href="http://snp.wustl.edu/SNPseek" target="_blank" >SNPseek</a> web tool to perform an analysis of genetic (SNP) variation across cytochrome P450 genes in humans.  A simple search by product name retrieved 53 CYP genes from the UCSC Known Genes (hg18).  I plugged these gene symbols into SNPseek&#039;s analysis tool to retrieve a comprehensive report of the  <a href="http://snp.wustl.edu/SNPseek/Cytochrome-P450-SNPs.pdf" target="_blank" >SNPs in CYP 450 genes</a> according to dbSNP (b126).<br /><br />SNPseek reported some 6,658 SNPs across the 53 CYP loci; of the 2,120 variants that were characterized by the  <a href="http://snp.wustl.edu/snp-research/hapmap-project" target="_blank" >HapMap Project</a> just over half (1,093) were polymorphic.  <br /><br />Because nonsynonymous SNPs are of particular interest, I retrieved the 346 &#039;coding-nonsynon&#039; variants in CYP genes from our modified database of amino acid variants (coming soon).  SNPseek had data for 328 of these; more than half (189) were also classified as either human-rodent or human-vertebrate conserved.  Looking at the 105 nsSNPs for which HapMap data was available, it appears that amino acid variants in CYP genes have relatively high rates of  <b>monomorphism</b>  (42.35%) and  <b>population-specificity</b>  (24.71%) compared to nsSNPs overall.  Even more striking was the low incidence of &quot;common link&quot; SNPs (5.88%).  <br /><br />To me, these patterns fit nicely with the expectation of  <b>purifying selection</b>  acting on the coding sequences of genes for CYP 450 enzymes.    <br /><br />]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070406-160054</id>
		<issued>2007-04-06T00:00:00Z</issued>
		<modified>2007-04-06T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Modeling the genetics of complex traits in mice</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070330-130719" />
		<content type="text/html" mode="escaped"><![CDATA[A fascinating talk was given by Joseph Nadeau (Case Western) at this week&#039;s Genetics departmental seminar.  He described a long-term collaborative project with Eric Lander in which they studied metabolic traits in mice, particularly resistance to diet-induced obesity.  They put two established mouse strains (A/J and BL/6) on a high-fat, high-sugar diet (the rodent equivalent of a Big Mac and large Coke every day).  Despite the fact that A/J mice ate more and were less active, they stayed thin while BL/6 mice developed obesity, hypertension, insulin resistance - your complete cardiovascular disease package.  Over the course of 7 years (starting in &#039;96) they genotyped 17,000 mice and developed a panel of 22 Chromosome Substitution Strains (CSSs) that you can get from Jackson Labs.  They made all kinds of interesting observations; here are five of the most striking ones: <br /><b>1. Complexity.</b>  Whereas previous mouse obesity mapping studies came up with 2-4 loci per trait, Nadeau and Lander&#039;s system makes it possible to find more QTLs (8 CSSs per trait on average) with small effects.<br /><b>2. Effect Size.</b>  They expected to find many QTLs with small effects.  Instead, they found many QTLs with large (51% on average) effects.  For example, 8 CSSs account for 99.8% of variation in cholesterol levels among mice on the &quot;diet&quot;.<br /><b>3. Fractal Genetics.</b>  They observed a similarly large number of large effects on resistance to diet-induced obesity at the chromosome, congenic, and sub-congenic levels.<br /><b>4. Epistasis.</b>  Some 20 genes conferred resistance to diet-induced obesity, but all of their effects were non-additive.<br /><b>5. Alternating Stable States.</b>  In fact, the effect of having more genes reversed the phenotype completely.  One gene = obese, 2 genes = lean, 3 genes = obese, 4 genes = lean.  
<br /><br />I think that everyone left the room with a new appreciation for &quot;systems biology&quot; in mice to study the genetic architecture of complex traits.]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070330-130719</id>
		<issued>2007-03-30T00:00:00Z</issued>
		<modified>2007-03-30T00:00:00Z</modified>
	</entry>
	<entry>
		<title>SNPs, CNVs, and Gene Expression</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070327-101227" />
		<content type="text/html" mode="escaped"><![CDATA[Another useful application of data from the  <a href="http://snp.wustl.edu/articles/hapmap-personalized-medicine.html" target="_blank" >International HapMap Project</a> was published in <i>Science</i> last month: an evaluation of the relative contributions of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) to inter-individual differences in gene expression.  Barbara Stranger, Manolis Dermitzakis, and colleagues performed association analyses between expression of 14,925 genes and SNPs/CNVs in HapMap cell lines.  There were at least two key findings.  First, much more of the heritable variation in gene expression is due to SNPs (83.6%) than to CNVs (17.7%).  Second, there was little overlap in signals between the two types of variants; less than 20% of CNV associations had a corresponding SNP association.  <br /><br />Taken together, these findings <b>reinforce the importance of SNPs</b>, despite the excitement over recent discoveries related to copy-number variation.  However, they also highlight the fact that <b>both SNPs and CNVs</b> make significant, largely-independent contributions to the genetic variability that underlies complex phenotypes like disease susceptibility and drug response.<br /><br />Stranger BE et al. <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&amp;cmd=Retrieve&amp;dopt=AbstractPlus&amp;list_uids=17289997" target="_blank" >Relative impact of nucleotide and copy number variation on gene expression phenotypes.</a> <i>Science</i>. 2007 Feb 9;315(5813):848-53.<br /><br />]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070327-101227</id>
		<issued>2007-03-27T00:00:00Z</issued>
		<modified>2007-03-27T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Large deletions in &lt;i&gt;C. briggsae&lt;/i&gt; strain HK104</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070316-110652" />
		<content type="text/html" mode="escaped"><![CDATA[Our lab became interested in large insertions, deletions, and copy number polymorphisms following a landmark issue of Nature Genetics (2006) describing the prevalence of such structural variants in the human genome.  In particular, we have focused on developing an algorithm to detect &quot;arrangement polymorphisms&quot;, or ARPs, in resequencing data.<br /><br />The principle behind our approach is that sequencing reads spanning the boundaries of arrangement polymorphisms will exhibit certain unusual patterns when aligned to the reference sequence using BLAST.  In essence, such &quot;breakpoint reads&quot; will have two high-scoring alignments against the reference genome rather than one.  The presence of two alignments for a single read suggests the presence of an ARP, and their relative relationship can often serve to infer the type and size of the underlying polymorphism.<br /><br />We used BLAST to align 13,632 sequencing traces from the HK104 strain of <i>C. briggsae</i> against the &quot;cb25&quot; (ultracontig) genome assembly that is based on canonical strain AF16.  This model system is advantageous for testing our algorithm because AF16-HK104 divergence is high (one difference every ~400 bp) and the read coverage (0.6X) is decent, both of which increase the probability of a hit.  Though we are still refining the technique, it has detected at least three large deletions in HK104 that are likely to be real polymorphisms.  <br /><br />
<TABLE CELLSPACING=0 CELLPADDING=2 BORDER=1 BORDERCOLOR="#CCCCCC" ALIGN="center">
<TR><TD><B>ultra_contig</B></TD><TD><B>start</B></TD><TD><B>stop</B></TD><TD><B>type</B></TD><TD><B>size</B></TD></TR>
<TR><TD>cb25.fpc0090</TD><TD>603965</TD><TD>604648</TD><TD>DELETION</TD><TD>267 bp</TD></TR>
<TR><TD>cb25.fpc4118</TD><TD>273978</TD><TD>272428</TD><TD>DELETION</TD><TD>1,252 bp</TD></TR>
<TR><TD>cb25.fpc2260</TD><TD>650151</TD><TD>657894</TD><TD>DELETION</TD><TD>7,516 bp</TD></TR>
</TABLE>
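<br /><br />The two-alignment principle can be sketched in Python as follows.  This is only an illustration under stated assumptions, not the actual ARP detection code: the Hit record, the infer_deletion function, the min_size cutoff, and every coordinate in the example are hypothetical, and the sketch assumes 1-based inclusive coordinates with both alignments on the forward strand of the same contig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One alignment of a read to the reference (hypothetical record;
    coordinates are assumed to be 1-based and inclusive)."""
    contig: str
    q_start: int  # first aligned base in the read
    q_end: int    # last aligned base in the read
    r_start: int  # first aligned base in the reference
    r_end: int    # last aligned base in the reference


def infer_deletion(a: Hit, b: Hit, min_size: int = 50) -> Optional[int]:
    """Return an estimated deletion size in bp for a read with two
    alignments, or None if the pair does not look like a deletion."""
    if a.contig != b.contig:
        return None
    # Order the two alignments by where they fall along the read.
    first, second = sorted((a, b), key=lambda h: h.q_start)
    # Assumption: only forward-strand alignments are handled here.
    if first.r_start > first.r_end or second.r_start > second.r_end:
        return None
    # Bases between the two alignments in the read and in the reference.
    # A negative query gap means the alignments overlap in the read.
    query_gap = second.q_start - first.q_end - 1
    ref_gap = second.r_start - first.r_end - 1
    # Reference sequence that is missing from the read.
    size = ref_gap - query_gap
    return size if size >= min_size else None


# Made-up read: bases 1-300 align at 1000-1299 and bases 301-600 at 1800-2099.
left = Hit("contig_x", 1, 300, 1000, 1299)
right = Hit("contig_x", 301, 600, 1800, 2099)
print(infer_deletion(left, right))  # 500
```

Subtracting the query gap from the reference gap keeps the estimate stable when the two alignments overlap by a few bases at the breakpoint.  Insertions, inversions, and reverse-strand hits would need additional cases that this sketch leaves out.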
<br /><br />PCR validation of our ARP predictions is under way.  ]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070316-110652</id>
		<issued>2007-03-16T00:00:00Z</issued>
		<modified>2007-03-16T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Small Indels in &lt;i&gt;C. briggsae&lt;/i&gt; strain HK104</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070313-121136" />
		<content type="text/html" mode="escaped"><![CDATA[Most of our SNP discovery efforts in <i>C. briggsae</i> make use of the  <a href="http://www.sanger.ac.uk/Software/analysis/ssahaSNP" target="_blank" >ssaha-SNP</a> program developed by Jim Mullikin (with A. Spargo and Z. Ning).  Today I used a companion script, parse_indel, to extract insertion-deletion polymorphisms detected in 13,632 HK104 shotgun reads.  For the reference sequence, I used the &quot;cb3&quot; assembly now available on Wormbase (L. Hillier and R. Waterston).  <br /><br />When I combined results from all reads, some <b>7,537 unique indel loci</b> were detected.  There were 306 observed in multiple reads, of which 7 were discordant (and tossed).  That left me with 7,530 candidate insertion-deletion variants.  Around two-thirds of these (4,686) were deletions; the remaining third (2,844) were insertions.  This is most likely not due to biology, but due to the higher probability of detecting deletions than insertions in shotgun sequence data.  <br /><br />Looking at the distribution of indel sizes:<br />5,906 indels were 1-2 bp<br />1,338 indels were 3-10 bp<br />214 indels were 11-20 bp<br />72 indels were &gt;20 bp<br /><br />I hope to do some more processing (providing flanking sequence, filtering by size, etc.) before making these available on the <i>C. briggsae</i>  <a href="http://snp.wustl.edu/snp-research/c-briggsae/data-downloads.html" >data downloads</a> page.<br /><br />]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070313-121136</id>
		<issued>2007-03-13T00:00:00Z</issued>
		<modified>2007-03-13T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Blog Launched for WUSM SNP Research</title>
		<link rel="alternate" type="text/html" href="http://snp.wustl.edu/blog/index.php?entry=entry070312-141434" />
		<content type="text/html" mode="escaped"><![CDATA[The SNP Research Facility, headed by Dr. Raymond D. Miller (Department of Genetics) at Washington University School of Medicine, studies patterns of genetic variation in humans and model organisms.  We perform  <b>high-throughput SNP genotyping</b> using the FP-TDI platform from PerkinElmer.  Our laboratory, along with  <a href="http://www.ucsf.edu/dbps/faculty/pages/kwok.html" target="_blank" >Pui Kwok&#039;s</a> group at UCSF, served as a major genotyping center for Phase I of the International HapMap Project.  Currently, we are funded to develop a high-density genetic map and ancillary resources to support  <i>C. briggsae</i> as a model organism.<br /><br />This blog was launched in March of 2007 to share news, data, and perspectives related to our research.   ]]></content>
		<id>http://snp.wustl.edu/blog/index.php?entry=entry070312-141434</id>
		<issued>2007-03-12T00:00:00Z</issued>
		<modified>2007-03-12T00:00:00Z</modified>
	</entry>
</feed>

