Washington University School of Medicine SNP Research Facility


Learning: SNPs

Contents:
Overview
SNPs in humans
Human SNP Variation
SNPs and diseases
Learn More About SNPs
References

Overview

Chromosomes provide the genetic instructions for humans, and can be thought of as linear strings of four chemical bases (or nucleotides): A, C, G, and T. The order of all of these bases (also called the sequence) in humans has now been determined and is publicly available at the National Center for Biotechnology Information (NCBI). The human genome contains about 3 × 10^9 bases, and almost every cell in the body contains a copy.

These instructions are passed with very high fidelity from one generation to the next. Occasionally, however, a mutation occurs and effectively changes one base to another. When several chromosomes from a population are compared, a site where a mutation occurred in the past may be found: some chromosomes will have the original base and others will have the new base, i.e. the population will be polymorphic. The site is called a single nucleotide polymorphism (SNP) and the two alternate forms are called alleles.
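As a minimal sketch of the comparison described above, the Python below scans a set of aligned sequences and reports every site where more than one base appears. The sequences and the function name polymorphic_sites are made up for illustration; real chromosome data would first have to be aligned.

```python
from collections import Counter


def polymorphic_sites(chromosomes):
    """Return {position: Counter of bases} for each site with more than one base.

    Assumes the sequences are already aligned and of equal length.
    """
    if len({len(seq) for seq in chromosomes}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    # zip(*chromosomes) walks the alignment one column (site) at a time.
    for position, column in enumerate(zip(*chromosomes)):
        counts = Counter(column)
        if len(counts) > 1:
            sites[position] = counts
    return sites


# Hypothetical sequences from four chromosomes (invented data).
sample = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACATTAGC",
]

snps = polymorphic_sites(sample)
for position, counts in snps.items():
    print(position, dict(counts))
```

In this invented sample only the third base differs between chromosomes, so that site is the one SNP, with alleles G and A.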

SNPs in humans

A major database of human SNPs is maintained at NCBI as dbSNP, and it contained data for 5.8 × 10^6 unique human SNPs (called RefSNPs and identified by an "rs" number) as of Build 118.

The most common type of SNP in humans has alleles A and G. Since DNA is a double helix, the opposite strand has alleles T and C, so an A/G SNP can also be described as a T/C SNP, depending upon orientation. We estimated the distribution of the types of SNPs in humans as follows: 63% A/G (and T/C), 17% A/C (and T/G), 8% C/G, 4% A/T, and 8% insertions/deletions (Miller et al. 2001).
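The strand equivalence above can be sketched in a few lines of Python. The helper names opposite_strand and canonical are our own for this illustration. The sketch also shows why C/G and A/T SNPs have no second name in the list: each reads the same from either strand.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def opposite_strand(snp):
    """Describe a SNP such as 'A/G' as read from the opposite strand ('T/C')."""
    return "/".join(COMPLEMENT[base] for base in snp.split("/"))


def canonical(snp):
    """Pick one label for a SNP regardless of strand or allele order."""
    same = "/".join(sorted(snp.split("/")))
    other = "/".join(sorted(opposite_strand(snp).split("/")))
    return min(same, other)


# Percentages quoted in the text (Miller et al. 2001).
distribution = {"A/G": 63, "A/C": 17, "C/G": 8, "A/T": 4, "indel": 8}

print(opposite_strand("A/G"))                # T/C
print(canonical("T/C") == canonical("A/G"))  # True
print(sum(distribution.values()))            # 100
```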

While a SNP could conceivably have three or four alleles, nearly all SNPs have only two alleles. Within a population, a particular SNP can be characterized by its allele frequency. We present allele frequencies for 28,000 human SNPs in each of three populations: African Americans, Utah (European ancestry), and Asian (Japanese and Chinese) (data).
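To make "allele frequency" concrete, here is a small Python sketch that counts alleles from genotype counts at one autosomal SNP, where each person carries two copies. The genotype counts are invented for illustration and are not taken from our data.

```python
def allele_frequencies(genotype_counts):
    """Compute allele frequencies from counts of two-letter genotypes.

    genotype_counts maps a genotype such as 'AG' to a number of individuals.
    Each individual contributes one copy per letter of the genotype.
    """
    allele_counts = {}
    for genotype, individuals in genotype_counts.items():
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + individuals
    total_copies = sum(allele_counts.values())
    return {allele: n / total_copies for allele, n in allele_counts.items()}


# Hypothetical sample of 100 people typed at an A/G SNP.
counts = {"AA": 30, "AG": 50, "GG": 20}
frequencies = allele_frequencies(counts)
print(frequencies)
```

With these made-up counts there are 200 chromosome copies, 110 carrying A and 90 carrying G, so the A allele frequency is 0.55.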

Human SNP Variation

Compared with many organisms, humans have little SNP variation. When part of a chromosome from one person is compared with the same chromosome from another person (even from a different continent), the difference between them (called heterozygosity) has been estimated at 7.51 × 10^-4. This means that on average, out of 1,331 bases, 1,330 bases will be the same and one base (the SNP site) will be different. In contrast, between some strains of C. briggsae, a nematode and model organism, there is a SNP every 40 bases.

Humans have 24 kinds of chromosomes: 22 autosomes (numbered 1-22, with two copies of each in everyone), the X chromosome (two copies in females and one copy in males), and the Y chromosome (no copies in females and one copy in males). When two copies of an autosome are compared, there is one SNP every 1,307 bases. When two X chromosomes or two Y chromosomes are compared, the differences are even fewer, with one SNP every 2,132 bases or 6,625 bases respectively.

The rate of SNPs also varies somewhat along human chromosomes. For example, the region known as the Major Histocompatibility Complex on chromosome 6 has a high rate of SNPs (The International SNP Map Working Group 2001). In contrast, we have found regions with very few SNPs on portions of chromosome X in a population from Utah of European descent (Miller et al. 2001).
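The figures above are two ways of stating the same quantity: heterozygosity is the chance that a given base differs between two copies, and its reciprocal is the average spacing between SNPs. A short Python sketch of the conversion follows. The function names and the 1,000,000-base example region are our own choices for illustration.

```python
def bases_per_snp(heterozygosity):
    """Average number of bases per difference between two chromosome copies."""
    return 1.0 / heterozygosity


def heterozygosity_from_spacing(bases):
    """Inverse conversion: one SNP every `bases` bases."""
    return 1.0 / bases


overall = 7.51e-4
print(int(bases_per_snp(overall)))  # whole bases per SNP, truncated

# Spacings quoted in the text, converted back to heterozygosity.
spacing = {"autosomes": 1307, "X": 2132, "Y": 6625}
for chromosome, bases in spacing.items():
    print(chromosome, f"{heterozygosity_from_spacing(bases):.2e}")

# Expected number of differences in a hypothetical 1,000,000-base region.
expected_differences = overall * 1_000_000
print(round(expected_differences))
```

Read this way, two copies of a 1,000,000-base stretch would be expected to differ at roughly 750 sites at the overall rate.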

Why do humans have less SNP variation on the X and Y chromosomes than on autosomes? One explanation relates to the number of copies of each of these chromosomes in a population and the sampling that happens every generation. For example, consider a village that has had about 10 families for many generations. Suppose there is an A/G SNP on chromosome 1 with an A allele frequency of ½ at the time measured. Among the 20 parents there will be 40 copies of chromosome 1 and 20 copies of the A allele. By chance alone (called genetic drift), in the next generation there may well be 19 or 21 copies of the A allele, a change in frequency of 0.025. Suppose in the same population there is a C/G SNP on the Y chromosome with a C allele frequency of ½. Among the 20 parents there will be 10 copies of the Y chromosome and 5 copies of the C allele. In the next generation there may well be 4 or 6 copies of the C allele, a change in frequency of 0.10. Over time genetic drift will lead to the end of the SNP, where only one allele is left, and this will on average happen more quickly for the SNP on the Y chromosome than for the SNP on chromosome 1. The Y chromosome has less genetic variation than autosomes because the number of chromosomes (called effective population size) is smaller than for autosomes. Similar arguments apply for the X chromosome, of which the 20 parents carry 30 copies.
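The village example can be explored with a small simulation. The sketch below uses a simple Wright-Fisher style resampling model, which is our assumption and not something the argument above depends on: each generation, every chromosome copy is drawn at random from the previous generation's copies. It ignores mutation, selection, and the sex-specific inheritance of the X and Y chromosomes, and the function names, replicate count, and random seed are arbitrary choices for illustration.

```python
import random


def generations_to_fixation(copies, start_count, rng):
    """Resample an allele each generation until it is lost or fixed.

    copies: chromosome copies in the population (constant over time)
    start_count: copies carrying the allele in the first generation
    """
    count = start_count
    generations = 0
    while 0 < count < copies:
        frequency = count / copies
        # Each copy in the next generation inherits the allele with
        # probability equal to its current frequency.
        count = sum(1 for _ in range(copies) if rng.random() < frequency)
        generations += 1
    return generations


def mean_fixation_time(copies, replicates=1000, seed=1):
    """Average generations until only one allele is left, starting at 1/2."""
    rng = random.Random(seed)
    times = [
        generations_to_fixation(copies, copies // 2, rng)
        for _ in range(replicates)
    ]
    return sum(times) / len(times)


# Copies among 20 parents: chromosome 1, X (10 x 2 + 10 x 1), and Y.
autosome = mean_fixation_time(40)
x_chromosome = mean_fixation_time(30)
y_chromosome = mean_fixation_time(10)

print("smallest frequency step:", 1 / 40, "vs", 1 / 10)
print("mean generations to loss of the SNP:")
print("  chromosome 1:", autosome)
print("  X chromosome:", x_chromosome)
print("  Y chromosome:", y_chromosome)
```

Under these assumptions the SNP on the Y chromosome is lost soonest on average, the X chromosome is intermediate, and chromosome 1 keeps its SNP longest, in line with the argument above.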

SNPs and diseases

Since most DNA does not code for proteins, the great majority of SNPs have no effect on the function of these workhorses of the body and are carried as "silent variation." However, for some SNPs one allele causes a major health problem or "single gene disease." Many of these diseases have been characterized in humans, and a useful resource is the OMIM database at NCBI. These diseases are usually rare but serious. An example is cystic fibrosis, caused by mutations in the CFTR gene on chromosome 7 (OMIM number *602421). Many common diseases such as heart disease and cancer are genetically complex, with alleles from several genes contributing to the disease. The alleles do not always cause the disease, but they can interact with a person's environment and other alleles to cause disease. It is a goal of the International HapMap Project to develop research tools to aid searches for these alleles.

The variation of most SNPs is "silent": no allele confers an observable phenotype. Some SNPs, however, are not silent, and they make us unique genetically. Differences in physical appearance, susceptibility to diseases, response to prescription drugs, and many other human attributes are often the result of SNP variation.

Learn More About SNPs

Good educational materials and links can be found at the National Center for Biotechnology Information, the Dolan DNA Learning Center at Cold Spring Harbor, and the Marshfield Clinic. We have also compiled a list of selected software and databases that are useful for SNP research.

References

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860-921.

Miller, R.D., P. Taillon-Miller, and P.Y. Kwok. 2001. Regions of low single-nucleotide polymorphism incidence in human and orangutan Xq: deserts and recent coalescences. Genomics 71: 78-88.

The International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.


Copyright 2007, Washington University School of Medicine SNP Research Facility. All rights reserved.