Washington University School of Medicine SNP Research Facility


PROOF OF PRINCIPLE: In silico Validation of Forensic AIMs using HapMap Genotypes

We selected 16 AIMs from those identified by Yang et al. that were (1) assayable by ABI TaqMan, (2) highly informative for ancestry, and (3) also genotyped by the HapMap consortium. The Celera and dbSNP identifiers of our 16 AIMs, along with their allele frequencies in 9 human populations, are listed in Table 1. We mined the HapMap Project database to obtain genotypes for the selected AIMs in 90 individuals from 30 CEPH trios of European descent (CEU), 90 individuals from 30 Yoruba trios from Nigeria (YRI), 45 unrelated Han Chinese from Beijing (CHB), and 45 unrelated Japanese from Tokyo (JPT).

Methods

For each sample in the HapMap panels, we computed conditional probabilities of shared ancestry with each of the 9 Yang populations. Our algorithm used the log10 values of the expected genotype frequencies in each population (assuming Hardy-Weinberg equilibrium). These values were summed across all 16 SNP loci for each sample in each of the 9 populations. Because genotype frequencies are at most 1, their log10 values are zero or negative; the log sums reported here are positive magnitudes (the negated sums), so the higher the log sum, the less likely it is that a sample came from that population.
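The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the facility's actual implementation: it uses three of the 16 loci and two of the 9 populations from Table 1, encodes each genotype as the number of copies of the first listed allele, and negates the sum so that lower scores mean a better fit. The function names, the genotype encoding, the frequency floor `EPS` (needed because Table 1 contains frequencies of 0.00 and 1.00, whose logarithms are undefined), and the choice to skip failed genotypes are all assumptions made for this example.

```python
import math

# First-allele frequencies for three loci in two populations (from Table 1).
FREQS = {
    "rs35395":   {"EUA": 0.07, "AFR": 0.83},
    "rs2715883": {"EUA": 0.74, "AFR": 0.06},
    "rs1426654": {"EUA": 1.00, "AFR": 0.02},
}

# Assumed floor so that frequencies of 0.00 or 1.00 never give log10(0).
EPS = 0.01


def genotype_freq(p, n_first):
    """Hardy-Weinberg genotype frequency given first-allele frequency p
    and the number of copies (0, 1, or 2) of the first allele."""
    p = min(max(p, EPS), 1 - EPS)
    q = 1 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}[n_first]


def log_sum(genotypes, population):
    """Negated sum of log10 genotype frequencies; lower means more probable.
    Loci with a failed genotype (None) are skipped (an assumption)."""
    total = 0.0
    for snp, n_first in genotypes.items():
        if n_first is None:
            continue
        total -= math.log10(genotype_freq(FREQS[snp][population], n_first))
    return total


# Hypothetical sample: C/C at rs35395, A/A at rs2715883, A/A at rs1426654.
sample = {"rs35395": 0, "rs2715883": 2, "rs1426654": 2}
scores = {pop: log_sum(sample, pop) for pop in ("EUA", "AFR")}
best = min(scores, key=scores.get)
print(scores, best)
```

Skipping failed genotypes keeps a sample scorable, but every skipped locus removes one term from each population's sum, so the scores move closer together and discrimination weakens.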

Results

As expected, the highest-probability populations of origin for YRI samples were African and African-American (Figure 1), with average log sums of 4.6 and 5.2, respectively. Log sums for YRI samples were at least two-fold higher in the other seven populations, the lowest of which was Puerto Rican with an average log sum of 10.7. Figure 2 shows population log sums for the CEU samples; in these individuals, European-American was by far the most probable population of origin (log sum 4.8), followed by Puerto Rican (7.0), South Asian (7.3), and Mexican-American (8.5). The results for CHB and JPT samples were almost identical (Figure 3), with East Asian scoring as the highest-probability population of origin (log sum 4.48). Log sums were roughly two-fold higher in the Mexican (8.4), Mexican-American (8.9), and American Indian (9.5) populations. The last (rightmost) sample in Figure 3 (individual "NA19012" of the JPT panel) was missing data (genotype failure) for 12 of 16 SNPs, substantially reducing the power to discriminate between populations.
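To give a feel for what a gap between log sums means, the sketch below converts a set of log sums into relative probabilities. It assumes that each log sum is the negated sum of log10 genotype frequencies (so lower is more probable, as in the results above) and that all candidate populations are equally likely a priori; neither the function name nor the equal-prior assumption comes from the original analysis. The input uses the three average YRI log sums quoted above, so the output is only an illustration for a hypothetical "average" sample restricted to those three populations, not a reported result.

```python
def relative_probabilities(log_sums):
    """Convert negated log10 likelihoods to probabilities that sum to 1,
    assuming equal prior weight for every population (an assumption)."""
    best = min(log_sums.values())
    # Subtract the best score first so the largest weight is exactly 1.
    weights = {pop: 10 ** -(s - best) for pop, s in log_sums.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}


# Average YRI log sums quoted in the text: African, African-American, Puerto Rican.
yri_avg = {"AFR": 4.6, "AFA": 5.2, "PRN": 10.7}
probs = relative_probabilities(yri_avg)
for pop, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{pop}: {p:.3g}")
```

Because the scores are on a log10 scale, a difference of 1 unit corresponds to a ten-fold difference in likelihood, which is why a gap of about 6 units leaves the third population with a negligible share.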

Tables and Figures

Table 1. Allele frequencies of the 16 TaqMan-assayable, ancestry-informative SNPs selected from Yang et al. for in silico validation using HapMap genotypes. Frequencies are for the first allele listed.
hCV number   rs number  Alleles  EUA   AFR   AMI   EAS   SAS   AFA   PRN   MAM   MXN
hCV2390566   rs35395    T/C      0.07  0.83  0.97  0.84  0.66  0.62  0.45  0.56  0.67
hCV12085816  rs2715883  A/G      0.74  0.06  0.04  0.02  0.23  0.23  0.43  0.36  0.27
hCV1250137   rs1978240  G/T      0.72  0.03  0.51  0.27  0.25  0.18  0.49  0.55  0.56
hCV13880     rs762656   A/G      0.82  0.27  0.10  0.17  0.39  0.35  0.59  0.36  0.30
hCV2908190   rs1426654  A/G      1.00  0.02  0.05  0.03  0.82  0.19  0.59  0.50  0.38
hCV1645496   rs260714   C/T      0.88  0.14  0.02  0.06  0.82  0.31  0.57  0.39  0.33
hCV1648531   rs2065160  A/G      0.89  0.60  0.08  0.23  0.90  0.67  0.66  0.49  0.39
hCV2972093   rs7453     C/T      0.42  0.71  0.20  0.96  0.51  0.62  0.49  0.37  0.28
hCV15829219  rs2833250  C/T      0.97  0.57  0.99  0.31  0.86  0.67  0.86  0.90  0.93
hCV2240547   rs218867   C/T      0.13  0.91  0.91  0.23  0.23  0.80  0.46  0.53  0.72
hCV2670954   rs3768641  G/C      0.09  0.99  0.00  0.05  0.03  0.84  0.27  0.08  0.06
hCV7625251   rs992864   A/G      0.06  0.93  0.00  0.01  0.07  0.82  0.23  0.08  0.04
hCV11446716  rs1871534  C/G      0.01  0.95  0.03  0.01  0.00  0.84  0.20  0.07  0.02
hCV3239774   rs2165139  A/T      0.89  0.98  0.04  0.13  0.72  0.96  0.81  0.52  0.32
hCV1858838   rs1951936  A/T      0.85  0.29  0.06  0.06  0.61  0.41  0.65  0.48  0.37
hCV11713156  rs2065982  C/T      0.06  0.09  0.81  0.70  0.24  0.10  0.17  0.47  0.57
Population abbreviations: EUA=European-American; AFR=African; AMI=American Indian; EAS=East Asian; SAS=South Asian; AFA=African-American; PRN=Puerto Rican; MAM=Mexican-American; MXN=Mexican

Figure 1. Population log scores for YRI samples based on their genotypes at the 16 loci.

Figure 2. Population log scores for CEU samples based on their genotypes at the 16 loci.

Figure 3. Population log scores for CHB+JPT samples based on their genotypes at the 16 loci.

References

Yang, N., et al., Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine. Hum Genet, 2005. 118(3-4): p. 382-92.


Copyright 2007, Washington University School of Medicine SNP Research Facility. All rights reserved.