PROOF OF PRINCIPLE: In silico Validation of Forensic AIMs using HapMap Genotypes
We selected 16 ancestry-informative markers (AIMs) from those identified by Yang et al. that were (1) assayable by ABI TaqMan, (2) highly informative for ancestry, and
(3) also genotyped by the HapMap consortium. The Celera and dbSNP identifiers of our 16 AIMs, along with their allele frequencies in 9 human populations,
are listed in Table 1. We mined the HapMap Project database to obtain genotypes for the selected AIMs in 90 individuals from CEPH trios of European descent (CEU), 90
individuals from Yoruba trios from Nigeria (YRI), 45 unrelated Han Chinese from Beijing (CHB), and 45 unrelated Japanese from Tokyo (JPT).
Methods
For each sample in the HapMap panels, we computed conditional probabilities of shared ancestry with each of the 9 Yang populations. Our algorithm used
the negative log10 values of the genotype frequencies in each population (assuming Hardy-Weinberg equilibrium). For each sample, these values were summed
across all 16 SNP loci in each of the 9 populations. The higher the log sum, the less likely it is that the sample came from that population.
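The scoring described above can be illustrated with a minimal Python sketch. It is not the code used for this analysis. The frequencies are copied from Table 1 for three loci and three populations only. The function names, the example genotype, the clamping of frequencies of 0.00 or 1.00 (so that the logarithm stays finite), and the skipping of failed genotypes are all assumptions made for illustration.

```python
import math

# First-allele frequencies copied from Table 1 (three loci, three populations only).
FREQS = {
    "rs35395":   {"EUA": 0.07, "AFR": 0.83, "EAS": 0.84},
    "rs1426654": {"EUA": 1.00, "AFR": 0.02, "EAS": 0.03},
    "rs3768641": {"EUA": 0.09, "AFR": 0.99, "EAS": 0.05},
}

# Assumption (not stated in the text): clamp frequencies away from 0 and 1
# so that a genotype frequency of exactly zero never reaches log10.
FLOOR = 0.01


def genotype_frequency(p, n_first):
    """Hardy-Weinberg frequency of a genotype with n_first (0, 1 or 2) copies of the first allele."""
    p = min(max(p, FLOOR), 1.0 - FLOOR)
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[n_first]


def log_sum(genotypes, population):
    """Sum of -log10 genotype frequencies over all loci with a called genotype."""
    total = 0.0
    for locus, n_first in genotypes.items():
        if n_first is None:  # assumption: a failed genotype contributes nothing
            continue
        total += -math.log10(genotype_frequency(FREQS[locus][population], n_first))
    return total


def score_all(genotypes):
    """Log sum of one sample in every population present in FREQS."""
    populations = next(iter(FREQS.values())).keys()
    return {pop: log_sum(genotypes, pop) for pop in populations}


# Hypothetical sample: copies of the first allele carried at each locus.
sample = {"rs35395": 2, "rs1426654": 0, "rs3768641": 2}
scores = score_all(sample)
best = min(scores, key=scores.get)  # lowest log sum = most probable population
print(best, scores)
```

For this hypothetical genotype the African population receives the lowest log sum. A sample with failed genotypes accumulates its sum over fewer loci, which narrows the differences between populations.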
Results
Unsurprisingly, the highest-probability populations of origin for YRI samples were African and African-American (Figure 1)
with average log sums of 4.6 and 5.2, respectively. Log sums for YRI samples were at least two-fold higher in the other
seven populations, the lowest of which was Puerto Rican, with an average log sum of 10.7.
Figure 2 shows population log sums for the CEU samples; in these individuals, European-American was by far the most probable population
of origin (log sum 4.8), followed by Puerto Rican (7.0), South Asian (7.3), and Mexican-American (8.5).
The results for CHB and JPT samples were almost identical (Figure 3), with East Asian scoring as the highest-probability population of origin (log sum 4.48).
Log sums were roughly two-fold higher in the Mexican (8.4), Mexican-American (8.9), and American Indian (9.5) populations. The last (rightmost) sample in Figure 3
(individual "NA19012" of the JPT panel) was missing data (genotype failure) for 12 of 16 SNPs, substantially reducing power to discriminate between populations.
Tables and Figures
Table 1. Allele frequencies of the 16 TaqMan-assayable, ancestry-informative SNPs selected from Yang et al. for in silico validation using HapMap genotypes. Frequencies refer to the first allele listed.
| hcv_number | rs_number | alleles | EUA | AFR | AMI | EAS | SAS | AFA | PRN | MAM | MXN |
| hCV2390566 | rs35395 | T/C | 0.07 | 0.83 | 0.97 | 0.84 | 0.66 | 0.62 | 0.45 | 0.56 | 0.67 |
| hCV12085816 | rs2715883 | A/G | 0.74 | 0.06 | 0.04 | 0.02 | 0.23 | 0.23 | 0.43 | 0.36 | 0.27 |
| hCV1250137 | rs1978240 | G/T | 0.72 | 0.03 | 0.51 | 0.27 | 0.25 | 0.18 | 0.49 | 0.55 | 0.56 |
| hCV13880 | rs762656 | A/G | 0.82 | 0.27 | 0.10 | 0.17 | 0.39 | 0.35 | 0.59 | 0.36 | 0.30 |
| hCV2908190 | rs1426654 | A/G | 1.00 | 0.02 | 0.05 | 0.03 | 0.82 | 0.19 | 0.59 | 0.50 | 0.38 |
| hCV1645496 | rs260714 | C/T | 0.88 | 0.14 | 0.02 | 0.06 | 0.82 | 0.31 | 0.57 | 0.39 | 0.33 |
| hCV1648531 | rs2065160 | A/G | 0.89 | 0.60 | 0.08 | 0.23 | 0.90 | 0.67 | 0.66 | 0.49 | 0.39 |
| hCV2972093 | rs7453 | C/T | 0.42 | 0.71 | 0.20 | 0.96 | 0.51 | 0.62 | 0.49 | 0.37 | 0.28 |
| hCV15829219 | rs2833250 | C/T | 0.97 | 0.57 | 0.99 | 0.31 | 0.86 | 0.67 | 0.86 | 0.90 | 0.93 |
| hCV2240547 | rs218867 | C/T | 0.13 | 0.91 | 0.91 | 0.23 | 0.23 | 0.80 | 0.46 | 0.53 | 0.72 |
| hCV2670954 | rs3768641 | G/C | 0.09 | 0.99 | 0.00 | 0.05 | 0.03 | 0.84 | 0.27 | 0.08 | 0.06 |
| hCV7625251 | rs992864 | A/G | 0.06 | 0.93 | 0.00 | 0.01 | 0.07 | 0.82 | 0.23 | 0.08 | 0.04 |
| hCV11446716 | rs1871534 | C/G | 0.01 | 0.95 | 0.03 | 0.01 | 0.00 | 0.84 | 0.20 | 0.07 | 0.02 |
| hCV3239774 | rs2165139 | A/T | 0.89 | 0.98 | 0.04 | 0.13 | 0.72 | 0.96 | 0.81 | 0.52 | 0.32 |
| hCV1858838 | rs1951936 | A/T | 0.85 | 0.29 | 0.06 | 0.06 | 0.61 | 0.41 | 0.65 | 0.48 | 0.37 |
| hCV11713156 | rs2065982 | C/T | 0.06 | 0.09 | 0.81 | 0.70 | 0.24 | 0.10 | 0.17 | 0.47 | 0.57 |
Population abbreviations: EUA=European-American; AFR=African; AMI=American Indian; EAS=East Asian; SAS=South Asian; AFA=African-American; PRN=Puerto Rican; MAM=Mexican-American; MXN=Mexican
Figure 1. Population log scores for YRI samples based on their genotypes at the 16 loci.
Figure 2. Population log scores for CEU samples based on their genotypes at the 16 loci.
Figure 3. Population log scores for CHB+JPT samples based on their genotypes at the 16 loci.
References
Yang, N., et al., Examination of ancestry and ethnic affiliation using highly informative diallelic DNA markers: application to diverse and admixed populations and implications for clinical epidemiology and forensic medicine. Hum Genet, 2005. 118(3-4): p. 382-92.