ann441j, Ba hahhahahah, that's good.
Chris, it's the one where Elmer Fudd and Daffy Duck are both private eyes; they go to one of the local bars to investigate the whereabouts of the SlopsterSlasher. It's a good one.
It's the one where Daffy Duck is playing the private eye.
Hey Chris, where do you think I got the name SlopsterSlasher? From the Bugs Bunny show, didn't you know?
The specific objectives of this proposed collection are to establish a freely-shared case-control collection of DNA samples as a resource for studying genetic predictors of adverse drug reactions. Identifying genetic variants that influence susceptibility to adverse reactions will advance understanding of the molecular basis of adverse drug reactions and may also lead to the development of tests that can predict individual susceptibility to adverse reactions, with obvious benefits to human health. This study has received infrastructure funding for 3 years (starting Jan 03) from the EC 5th Framework Quality of Life Program.
jcryan19, ha, not interested in beating stocky; already have, long, long ago. LOL. But you can see in there that they won't start trials till 2003 and it's slated for the next 3 years, which puts completion in 2006. Didn't Tony say something about 2006 and 2007 in the Wall Street interview? Could there be a tie-in? Maybe.
jcryan19, yeah, you're right, sorry. I thought the info on London was a good one, but as you say, we're all falling asleep catching up. I wanted to set the backdrop for the last 2 posts, because by going back over the other posts you can see who's playing patty-fingers with whom.
That's not the point. To me the point is: what is Dnap doing now, do they have something that works, will the public buy their product, will they make us money, who's involved, and what's involved?
Well, if no one cares, I guess it's old news. Sorry, I won't post any more of this old news. Got to go.
Here is the other site {these men are tied in with Dr. Mark Shriver} - http://www.lshtm.ac.uk/eu/genetics/admix.html
Here is the site - http://www.lshtm.ac.uk/eu/genetics/index.html#admix
My last 2 posts are from the University of London.
ADMIXMAP - a program to model admixture using marker genotype data
Description of ADMIXMAP
Applications of ADMIXMAP
Running ADMIXMAP
Options specified by the user
Input files
Output files
Future enhancements of ADMIXMAP
Contact
--------------------------------------------------------------------------------
Description of program
ADMIXMAP is designed to analyse datasets that consist of trait measurements and genotype data on a sample of individuals from an admixed or stratified population. Although the name of the program reflects its origins as a program designed for admixture mapping, it has wider uses, especially in genetic association studies. The study design can be a cross-sectional survey of a quantitative trait or binary outcome, a case-control study or a cohort study. For admixture to be modelled efficiently, at least some of the loci typed should be "ancestry-informative markers": markers chosen to have large allele frequency differentials between the ancestral subpopulations that underwent admixture. The program can deal with any number of ancestral subpopulations and any number of linked marker loci. In its present version, the program handles only data from samples of unrelated individuals.
The program is based on a hybrid of Bayesian and classical approaches. A Bayesian full probability model is specified, assigning vague prior distributions to parameters for the distribution of admixture in the population and the stochastic variation of ancestry along hybrid chromosomes. The posterior distribution of all unobserved variables, given the observed genotype and trait data, is generated by Markov chain Monte Carlo simulation. These unobserved variables include the ancestry at each locus and the ancestry-specific allele frequencies at each locus. For a description of the theory underlying this approach, see the following papers:-
Hoggart, C.J., Parra, E.J., Shriver, M.D., Bonilla, C., Kittles, R.A., Clayton, D.G. and McKeigue, P.M. Control of confounding of genetic associations in stratified populations. Am J Hum Genet. 2003; 72:1492-1504.
McKeigue, P.M., Carpenter, J., Parra, E.J. and Shriver, M.D. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet. 2000; 64:171-186.
Applications of the program
1. Modelling the dependence of a disease or quantitative trait upon individual admixture
For a binary trait, such as presence of disease, the program fits a logistic regression model of the trait upon individual admixture (the mean of the admixture proportions of both parents). For a continuous trait, such as skin pigmentation, the program fits a linear regression model of the trait value on individual admixture. Covariates such as age, sex and socioeconomic status can be included in the regression model. The program output includes posterior means and 95% credible intervals for the regression coefficients. Alternatively, the program can be used to test the null hypothesis of no association of disease risk or trait level with individual admixture, as described below.
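The regression described above can be illustrated with a minimal sketch. It is not ADMIXMAP's own code. It computes individual admixture as the mean of the two parents' admixture proportions, then evaluates a logistic model for the probability of disease. The function names, coefficients and covariate values are hypothetical. Admixture proportions sum to 1, so the sketch assumes that one subpopulation is left out as the reference category; otherwise its proportion would be collinear with the intercept.

```python
import math
from typing import List, Sequence


def individual_admixture(parent1: Sequence[float], parent2: Sequence[float]) -> List[float]:
    """Individual admixture: mean of the two parents' admixture proportions."""
    if len(parent1) != len(parent2):
        raise ValueError("both parents need one proportion per subpopulation")
    return [(a + b) / 2.0 for a, b in zip(parent1, parent2)]


def logistic_probability(intercept: float, coefficients: Sequence[float],
                         predictors: Sequence[float]) -> float:
    """P(trait = 1) under logistic regression: 1 / (1 + exp(-linear predictor))."""
    if len(coefficients) != len(predictors):
        raise ValueError("one coefficient per predictor is required")
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-eta))


# Two-way admixture example with hypothetical values.
admix = individual_admixture([0.8, 0.2], [0.6, 0.4])  # approximately [0.7, 0.3]
# Only the first proportion enters the model (second subpopulation is the reference),
# plus one hypothetical covariate, age = 50.
p = logistic_probability(-3.0, [1.5, 0.02], [admix[0], 50.0])
print(round(p, 3))
```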
2. Controlling for confounding of genetic associations in stratified populations
For more details of this application, see Hoggart et al (2003). The program calculates a score test for association of the disease or trait with alleles or haplotypes at each locus, adjusting for individual admixture and other covariates in a regression model. Where there is evidence for association of a trait with individual admixture, the posterior distribution of the regression coefficient can be estimated in a further analysis. For this application, the dataset should include at least 30 markers informative for ancestry.
3. Admixture mapping: localizing genes that underlie ethnic differences in disease risk.
Where differences in disease risk have a genetic basis, testing for association of the disease with locus ancestry by conditioning on parental admixture can localize genes underlying these differences. This approach is an extension of the principles underlying linkage analysis of an experimental cross. To exploit the full power of admixture mapping, 1000 or more markers informative for ancestry across the genome will be required.
4. Detecting population stratification and identifying admixed individuals
Where no information about the demographic background of the population under study is available, ADMIXMAP can be used to test for population stratification, to determine how many subpopulations are required to model this stratification, and to identify admixed individuals. This is useful when assembling panels of unadmixed individuals to be used for estimating allele frequencies. We emphasize that when the program is run without supplying prior information about allele frequencies in each subpopulation, the subpopulations are not identifiable in the model. Thus inference should be based only on the posterior distribution of variables that are unaffected by permuting the labels of the subpopulations.
5. Testing for associations of a trait with haplotypes and estimating haplotype frequencies from a sample of unrelated individuals
Where two or more loci in the same gene have been typed, ADMIXMAP will model the unobserved haplotypes, conditional on the observed unordered genotypes. Score tests for association of haplotypes with the trait can be obtained, and the posterior distribution of haplotype frequencies can be sampled. This application of the program is not limited to admixed or stratified populations: for a population that is not stratified, the user can simply specify the option populations=1.
For each of these applications, score tests of the appropriate null hypotheses are built into the program.
--------------------------------------------------------------------------------
Modelling admixture and trait values
Any run of two or more loci spanning less than 0.1 cM is modelled as a single "compound locus". Thus if L “simple loci” (SNPs, insertion/deletion polymorphisms or microsatellites) have been typed, and three of these simple loci are in the same gene (with no other loci this close together), the model will have L − 2 compound loci. The program assumes that on any gamete, the ancestry state is the same at all loci within a compound locus. The program allows for allelic association within any compound locus that contains two or more simple loci, and models the unobserved haplotypes at this compound locus.
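The grouping rule can be sketched as follows. The sketch assumes that loci are supplied in map order, with the distance in Morgans to the previous locus, as in locusfile. A locus joins the current compound locus only while the group as a whole still spans less than 0.1 cM (0.001 Morgans). The function name and the example loci are hypothetical, and the program's internal grouping may differ in detail.

```python
from typing import List, Sequence

SPAN_LIMIT_MORGANS = 0.001  # 0.1 cM expressed in Morgans


def group_compound_loci(names: Sequence[str], distances: Sequence[float]) -> List[List[str]]:
    """Group ordered simple loci into compound loci.

    distances[i] is the map distance in Morgans between locus i and locus i - 1
    (distances[0] is ignored). A locus joins the current group only if the
    group would still span less than 0.1 cM.
    """
    groups: List[List[str]] = []
    span = 0.0
    for i, name in enumerate(names):
        if i > 0 and span + distances[i] < SPAN_LIMIT_MORGANS:
            groups[-1].append(name)
            span += distances[i]
        else:
            groups.append([name])
            span = 0.0
    return groups


# Hypothetical example: five simple loci, three of which (B, C, D) lie in the same gene.
loci = ["A", "B", "C", "D", "E"]
gaps = [100.0, 0.3, 0.0, 0.0005, 0.4]
print(group_compound_loci(loci, gaps))  # [['A'], ['B', 'C', 'D'], ['E']]
```

With L = 5 simple loci this yields 3 = L − 2 compound loci, matching the example in the text.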
For each parent of each individual, admixture proportions are defined by a vector with k co-ordinates, where k is the number of ancestral subpopulations that contributed to the admixed population under study. For instance, in a Caribbean population it may be possible to model the gene pool of the admixed population as a mixture of three subpopulations: African, European and Native American. The model of admixture is described by the following hierarchy:-
The population distribution from which the parental admixture proportions are drawn is modelled by a Dirichlet distribution.
The allele or haplotype frequencies at each compound locus are modelled by a Dirichlet distribution, with prior parameters specified by the user.
Locus ancestry is modelled by a multinomial distribution, with cell probabilities specified by the admixture of both parents.
The probabilities of observing each allele or haplotype at each locus on each gamete, given the ancestry of the gamete at that locus, are modelled by a multinomial distribution, with parameters given by the ancestry-specific allele (or haplotype) frequencies.
The stochastic variation of ancestral states along the chromosomes transmitted from each parent is modelled as a mixture of independent Poisson arrival processes with intensities α, β, γ per Morgan (for three-way admixture). For given values of parental admixture, it is only necessary to specify a single parameter for the sum of intensities, σ = α + β + γ.
If an outcome variable is supplied, ADMIXMAP fits a regression model (logistic regression for a binary trait, linear regression for a quantitative trait) with individual admixture proportions and any covariates supplied by the user as explanatory variables.
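The Poisson arrival model for ancestry along a chromosome implies a simple transition probability between two loci on the same gamete. The sketch below assumes that each subpopulation's intensity equals σ times the parent's admixture proportion for that subpopulation, which is why only the sum σ needs to be specified. With no arrival in the interval (probability exp(−σd)), ancestry is unchanged. Otherwise, the last arrival determines the ancestry, and it comes from subpopulation j with probability equal to that admixture proportion. The function name and example values are hypothetical.

```python
import math
from typing import List, Sequence


def ancestry_transition_matrix(admixture: Sequence[float], sigma: float,
                               distance: float) -> List[List[float]]:
    """Ancestry transition probabilities between two loci on one gamete.

    admixture: the parent's admixture proportions (summing to 1).
    sigma: sum of arrival intensities per Morgan.
    distance: map distance between the loci in Morgans.
    Entry [i][j] = P(ancestry j at the second locus | ancestry i at the first locus).
    """
    no_arrival = math.exp(-sigma * distance)  # probability of no change point in the interval
    k = len(admixture)
    return [
        [(no_arrival if i == j else 0.0) + (1.0 - no_arrival) * admixture[j] for j in range(k)]
        for i in range(k)
    ]


# Hypothetical three-way example: admixture (0.6, 0.3, 0.1), sigma = 6 per Morgan, loci 10 cM apart.
for row in ancestry_transition_matrix([0.6, 0.3, 0.1], sigma=6.0, distance=0.1):
    print([round(p, 3) for p in row])
```

At distance 0 the matrix is the identity. For unlinked loci every row tends to the admixture proportions.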
Modelling allele/haplotype frequencies
The program can be run with ancestry-specific allele / haplotype frequencies specified either as fixed or as random variables. If option allelefreqfile is specified, the allele frequencies are specified as fixed at the values supplied. If option populations is specified, the allele frequencies are specified as random variables with reference (uninformative) prior distributions. If option priorallelefreqfile is specified, the allele frequencies are specified as random variables with a prior distribution given by the values in this file. This option is used where allele frequencies have been estimated from samples of unadmixed modern descendants of the ancestral subpopulations that contributed to the admixed population under study. For instance, in a study of a population of mixed European and west African ancestry, allele frequencies at some or all of the loci typed may have been estimated in samples from modern unadmixed west African and European populations. The program will use this information to estimate the ancestry-specific allele frequencies from the unadmixed and admixed population samples simultaneously, allowing for sampling error.
If no information about allele frequencies in the ancestral subpopulations is provided, the ancestry-specific allele frequencies are estimated only from the admixed population under study. If no information about allele frequencies is provided at any locus, the subpopulations are not identifiable in the model. This does not matter when the program is used only to control for confounding by hidden population stratification, as described in Hoggart et al. (2003).
The file priorallelefreqfile specifies the parameters of a Dirichlet prior distribution for the allele frequencies at each locus in each subpopulation. Where the alleles or haplotypes have been counted directly in samples from unadmixed modern descendants, these parameter values should be specified by adding 0.5 to the observed counts of each allele or haplotype in each subpopulation. These parameter values specify the Dirichlet posterior distribution that we would obtain by combining a reference prior with the observed counts. Using this as a prior distribution when analysing data from the admixed population is equivalent to estimating the allele frequencies simultaneously from the admixed and unadmixed population samples, with a reference prior.
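The equivalence claimed above follows from Dirichlet-multinomial conjugacy, and a small sketch makes it concrete. For simplicity it assumes the ancestry of each counted gene copy in the admixed sample is known. In ADMIXMAP itself locus ancestry is unobserved and is sampled by MCMC, so the "admixed" counts here stand in for a single imputed draw. All names and counts are hypothetical.

```python
from typing import List, Sequence

REFERENCE_PRIOR = 0.5  # pseudo-count added to each allele count (reference prior)


def prior_from_counts(counts: Sequence[int]) -> List[float]:
    """Dirichlet parameters for priorallelefreqfile: observed counts plus 0.5."""
    return [n + REFERENCE_PRIOR for n in counts]


def dirichlet_update(params: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Conjugate update: Dirichlet(params) prior plus multinomial counts gives a Dirichlet posterior."""
    return [a + n for a, n in zip(params, counts)]


def dirichlet_mean(params: Sequence[float]) -> List[float]:
    total = sum(params)
    return [a / total for a in params]


unadmixed = [12, 3]  # hypothetical allele counts in an unadmixed sample
admixed = [5, 9]     # hypothetical counts of gene copies assigned this ancestry in the admixed sample
prior = prior_from_counts(unadmixed)          # [12.5, 3.5]
posterior = dirichlet_update(prior, admixed)  # [17.5, 12.5]
# Same result as a reference prior updated with the pooled counts from both samples.
assert posterior == prior_from_counts([u + a for u, a in zip(unadmixed, admixed)])
print(dirichlet_mean(posterior))
```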
For compound loci, where haplotype frequencies have been estimated from unordered genotypes rather than by counting phase-known gametes, the user cannot specify the prior distribution by adding 0.5 to the observed counts of each haplotype in a sample of unadmixed modern descendants. Instead, the posterior distribution of haplotype frequencies in the unadmixed population, given a reference prior and the observed unordered genotype data, should be computed. This can be implemented by providing the genotype data from each unadmixed subpopulation that has been sampled as input to ADMIXMAP, specifying options populations=1 and allelefreqoutputfile to sample the posterior distribution of the haplotype frequencies. ADMIXMAP computes the Dirichlet distribution that most closely approximates the posterior distribution (from the sampled values) and writes the parameter estimates to the file AlleleFreqPosteriorParams.txt. These parameter estimates can then be entered into the file priorallelefreqfile. The parameters are estimated by equating the means and the determinant of the covariance matrix of a Dirichlet distribution with the posterior means and the determinant of the posterior covariance matrix of the allele frequencies.
With options allelefreqfile or priorallelefreqfile, the program fits a model in which the allele frequencies in modern unadmixed descendants of the ancestral subpopulations are identical to the corresponding ancestry-specific allele frequencies in the admixed population under study. The option dispersiontestfile will generate a diagnostic test of this assumption.
With option historicallelefreqfile, the program fits a more general model in which there is dispersion of allele frequencies between the unadmixed and admixed populations.
Inference
ADMIXMAP exploits two approaches to make statistical inference:-
(1) Hypothesis testing. Departures from the model fit by ADMIXMAP are tested against alternatives by using score tests. For a description of the theory underlying this approach, see Hoggart et al. (2003). Several score tests are built into the program, and are described below. Additional score tests can be constructed by the user.
(2) Bayesian inference. Inference can be made on parameters included in the model, such as the effect of individual admixture on the trait, from the posterior samples generated by ADMIXMAP.
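The posterior samples can be summarised as a posterior mean and an equal-tailed 95% credible interval, as in the program output described earlier. The sketch assumes that the draws for one parameter (for example, one regression coefficient) have already been read from an output file, with burn-in excluded. It does not parse ADMIXMAP's files, whose format is not described here. Quantiles are computed by linear interpolation between order statistics, which is one common convention among several.

```python
import math
from typing import Sequence, Tuple


def posterior_summary(draws: Sequence[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Posterior mean and equal-tailed credible interval from MCMC draws.

    Quantiles use linear interpolation between order statistics.
    """
    ordered = sorted(draws)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two draws")
    tail = (1.0 - level) / 2.0

    def quantile(p: float) -> float:
        pos = p * (n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return sum(ordered) / n, quantile(tail), quantile(1.0 - tail)


mean, lower, upper = posterior_summary([float(x) for x in range(101)])
print(mean, lower, upper)  # approximately 50.0, 2.5, 97.5
```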
Comparison with other programs for modelling admixture
The program STRUCTURE (available from http://pritch.bsd.uchicago.edu/) fits a similar hierarchical model for population admixture, given genotype data on admixed and unadmixed individuals, if you specify the “popalphas” option (see documentation for this program at http://pritch.bsd.uchicago.edu/software/readme_structure2.pdf).
The main differences between ADMIXMAP and STRUCTURE are:-
STRUCTURE does not model the dependence of the outcome variable on individual admixture and thus cannot adjust for the effect of individual admixture on the outcome variable.
STRUCTURE does not allow the user to supply prior distributions for the allele frequencies. To use allele frequency data from unadmixed individuals in STRUCTURE, individuals sampled from unadmixed and admixed populations have to be included in the same model. This is not recommended, either with STRUCTURE or ADMIXMAP, because the model assumes a unimodal distribution of individual admixture values in the population. If samples from both unadmixed and admixed populations are included in the same model, the distribution of individual admixture values will generally not be unimodal, and the fit of the model will be poor.
STRUCTURE does not allow for allelic association (other than that generated by admixture), and is therefore unsuitable for analysis of datasets in which two or more tightly-linked loci (for instance SNPs in the same gene) have been typed. ADMIXMAP allows for allelic association (if the distance between loci is coded as zero) and models the unobserved haplotypes.
--------------------------------------------------------------------------------
Running the program
ADMIXMAP runs under Linux and Windows and requires R to be installed. The Windows version comes with an interface that allows the user to specify the location of the input files and gives default names to the output files. The Linux version runs from a Perl script (see, for example, admixmap.pl provided with the tutorial); the user options can be changed by editing this file. The working folder from which the Perl script is run should have a sub-folder named “data” containing the data files. Results will be output to a folder named “results” one level below the current folder. At the end of the simulation, summary statistics and graphs are automatically generated by R.
--------------------------------------------------------------------------------
Options specified by the user
The program requires a list of options to be specified by the user as command-line arguments. This is most conveniently done using the Perl script (“admixmap.pl”) provided. A list of these arguments is given in the following table.
Option name, command-line argument. Required arguments in bold.
Settings
samples
Integer specifying total number of iterations of the Markov chain, including burn-in. A run of at least 20 000 iterations is recommended for inference.
burnin
Integer specifying number of iterations for burn-in of the Markov chain, before posterior samples are output. A burn-in of at least 1000 iterations is recommended for inference.
every
Integer specifying the “thinning” of samples from the posterior distribution that are written to the output files, after the burn-in period. For example, if every=10, sampled values are written to the output files every 10 iterations. We recommend using a value of 10 to keep down the size of the output files. Sampling more frequently than this does not much improve the precision of results, because successive draws are not independent.
locusfortest
Integer specifying a compound locus at which posterior samples of ancestry and haplotypes are written to indadmixturefile. This option allows users to construct their own score tests (for instance to test a specific hypothesis about the effect of a pair of haplotypes). The value is specified as an offset from 1: thus to test locus 1, specify locusfortest=0.
logfile
Pathname of log file written by the program.
The program requires one of the following four options, any one of which specifies the number of subpopulations in the model: populations, allelefreqfile, priorallelefreqfile, or historicallelefreqfile. These options are mutually exclusive.
populations
Integer specifying number of subpopulations that have contributed to the admixed population under study. If specified as 1, the program fits a model based on a single homogeneous population. This option is not required if information about allele frequencies is supplied in allelefreqfile, priorallelefreqfile, or historicallelefreqfile, as the number of columns in any of these files defines the number of subpopulations in the model.
allelefreqfile
Pathname of file containing the allele frequencies of the genotyped loci for each subpopulation. When this option is specified, the model treats the allele frequencies as fixed constants. This option is not recommended unless the sample sizes from which the allele frequencies have been estimated are very large, so that sampling variation can be ignored.
priorallelefreqfile
Pathname of file containing parameters of the prior distributions for allele frequencies (or haplotype frequencies) at each compound locus in each subpopulation. Where allele frequencies have been estimated from a sample of unadmixed individuals, the prior distribution parameters for the corresponding subpopulation should be specified as the observed allele counts plus 0.5. Where no allele frequency data are available, specify the prior parameters as 0.5 for each allele (“reference” prior). When this option is specified, the program fits a model in which the allele frequencies in each subpopulation are estimated simultaneously from the unadmixed samples and the admixed sample under study.
historicallelefreqfile
Pathname of file containing observed allele counts at the genotyped loci from samples of unadmixed individuals in each subpopulation. When this option is specified, the program fits a model that allows the “historic” allele frequencies in the unadmixed population to vary from the corresponding ancestry-specific allele frequencies in the admixed population under study.
genotypesfile, outcomevarfile, locusfile, covariatesfile
Pathnames of other input files; details of file formats are given below under Input files.
targetindicator
Integer specifying the column in outcomevarfile that contains the outcome variable to be modelled. This column number should be specified as an offset from column 1: thus to select the variable in column 1, specify targetindicator=0 (default is 0).
analysistypeindicator
Integer specified as one of the following:-
0 – Affected-only design
1 – No outcome variable
2 - Continuous outcome variable (quantitative trait). The program fits a linear regression model unless option admixturescorefile is specified.
3 - Binary outcome variable (e.g. unaffected / affected, coded as 0 / 1). The program fits a logistic regression model unless option admixturescorefile is specified. This option is appropriate for a cross-sectional study of a binary trait, a case-control study, or a cohort study in which individuals have been classified by disease status only once at follow-up.
5 – Two outcome variables, assumed to be independent given individual admixture proportions and covariates.
paramfile, ergodicaveragefile, indadmixturefile, allelefreqoutputfile, admixturescorefile, allelicassociationscorefile, ancestryassociationscorefile, affectedsonlyscorefile, allelefreqscorefile, stratificationtestfile, dispersiontestfile, haplotypeassociationscorefile
Pathnames of output files; details of file formats are given under Output files.
admixturescorefile
Pathname of file to which results of a score test for the association of the trait with individual admixture will be written. This option is valid only for a study in which an outcome variable has been measured. This option is used only to obtain a formal test of the null hypothesis of no association between the trait and individual admixture. If admixturescorefile is specified, the regression model will not include individual admixture proportions as explanatory variables, and tests for allelic association or linkage will not be adjusted for the effect of individual admixture.
Unless option admixturescorefile is specified or analysistypeindicator is specified as 0 or 1, the program will fit a regression model with the outcome variable as dependent variable and individual admixture proportions (plus any covariates specified in covariatesfile) as explanatory variables.
The options below specify additional tests or output, but do not change the model itself:
allelicassociationscorefile
Pathname of output file containing score tests for association of the outcome variable with alleles at each simple locus, adjusting for individual admixture.
haplotypeassociationscorefile
Pathname of output file containing score tests for association of the outcome variable with haplotypes for all compound loci containing haplotypes, adjusting for individual admixture.
ancestryassociationscorefile
Pathname of output file containing score tests at each compound locus for linkage with genes underlying ethnic variation in the trait. This is a test for association of the trait with locus ancestry, adjusting for individual admixture and covariates. This test should be used in a cross-sectional or cohort study design. For a case-control study of a rare disease, the affected-only test below has greater statistical power.
affectedonlyscorefile
Pathname of output file containing score tests at each compound locus for linkage with ancestry, based on comparing the observed and expected proportions of gene copies at this locus that have ancestry from each subpopulation. This test is calculated from affected individuals only: individuals are their own controls. Even when the sample includes both cases and controls, this test is more powerful than the regression model score test in ancestryassociationscorefile if the disease is rare.
allelefreqscorefile
Pathname of output file containing score tests of misspecified ancestry-specific allele frequencies. This option is valid only when the allele frequencies are fixed, i.e. when option allelefreqfile is specified.
allelefreqoutputfile
Pathname of output file containing samples from the posterior distribution of ancestry-specific allele frequencies. Valid only when the allele frequencies are specified as random variables, i.e. when one of the options populations, priorallelefreqfile or historicallelefreqfile is specified.
stratificationtestfile
Pathname of output file containing test for residual population stratification (stratification not accounted for by the fitted model).
dispersiontestfile
Pathname of output file containing test for dispersion of allele frequencies between the unadmixed populations sampled and the corresponding ancestry-specific allele frequencies in the admixed population under study. This is evaluated for each subpopulation at each locus, and as a global test over all loci. This option is valid only if option priorallelefreqfile is specified. The results are “Bayesian p-values”, as above.
FSToutputfile
Pathname of output file containing posterior samples of Wright’s FST for each locus and each subpopulation. This option is valid only if option historicallelefreqfile is specified. For each subpopulation and each locus, the FST statistic measures the variation between the allele frequencies in the “historic” population (the unadmixed population that was sampled to obtain the allele counts given in the file historicallelefreqfile) and the corresponding ancestry-specific allele frequencies in the admixed population under study.
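Several of the constraints in the table above can be checked before a long run. The sketch below uses a hypothetical dictionary that maps option names to values; it is not how ADMIXMAP or admixmap.pl reads options. It checks a subset of the rules stated above. In particular, it treats "an outcome variable has been measured" as "outcomevarfile is given", which is a simplification. It also estimates the number of thinned posterior draws, assuming one draw is written every `every` iterations after burn-in. The exact count depends on how the program indexes iterations.

```python
from typing import Dict, List

ALLELE_FREQ_OPTIONS = ("populations", "allelefreqfile",
                       "priorallelefreqfile", "historicallelefreqfile")


def check_options(opts: Dict[str, object]) -> List[str]:
    """Return a list of problems found in a dictionary of option names -> values."""
    problems: List[str] = []
    given = [name for name in ALLELE_FREQ_OPTIONS if name in opts]
    if len(given) != 1:
        problems.append("specify exactly one of: " + ", ".join(ALLELE_FREQ_OPTIONS))
    if opts.get("analysistypeindicator") in (2, 3) and "outcomevarfile" not in opts:
        problems.append("outcomevarfile is required when analysistypeindicator is 2 or 3")
    if "admixturescorefile" in opts and "outcomevarfile" not in opts:
        problems.append("admixturescorefile requires an outcome variable")
    requires = {"allelefreqscorefile": "allelefreqfile",
                "dispersiontestfile": "priorallelefreqfile",
                "FSToutputfile": "historicallelefreqfile"}
    for option, needed in requires.items():
        if option in opts and needed not in opts:
            problems.append(f"{option} is valid only with {needed}")
    samples, burnin = opts.get("samples"), opts.get("burnin")
    if isinstance(samples, int) and isinstance(burnin, int) and burnin >= samples:
        problems.append("burnin must be smaller than samples")
    return problems


def retained_draws(samples: int, burnin: int, every: int) -> int:
    """Approximate number of posterior draws written after burn-in with thinning `every`."""
    return (samples - burnin) // every


opts = {"samples": 20000, "burnin": 1000, "every": 10,
        "populations": 2, "analysistypeindicator": 1}
print(check_options(opts))              # []
print(retained_draws(20000, 1000, 10))  # 1900
```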
--------------------------------------------------------------------------------
Input files
Input files should contain no tabs, spaces only as single spaces, no spaces at the beginning or end of a line, and no blank lines (including blank lines at the end of the file).
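A quick check of these layout rules can catch problems before a run. The sketch below interprets the rules literally: no tabs, no consecutive spaces, no leading or trailing spaces, and no blank lines. It assumes that a single newline terminating the last line is acceptable and does not count as a blank line, which is an interpretation rather than something stated above. The function name is hypothetical.

```python
from typing import List


def check_input_format(text: str) -> List[str]:
    """Report violations of the input-file layout rules: no tabs, single spaces only,
    no leading/trailing spaces, no blank lines."""
    problems: List[str] = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # assumption: one terminating newline is not a blank line
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append(f"line {number}: blank line")
            continue
        if "\t" in line:
            problems.append(f"line {number}: contains a tab")
        if "  " in line:
            problems.append(f"line {number}: contains consecutive spaces")
        if line != line.strip(" "):
            problems.append(f"line {number}: leading or trailing space")
    return problems


print(check_input_format('"A" "B"\n"1,2" "2,2"\n'))  # []
print(check_input_format('"A"  "B" \n\n'))
# ['line 1: contains consecutive spaces', 'line 1: leading or trailing space', 'line 2: blank line']
```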
--------------------------------------------------------------------------------
genotypesfile
The first row of the file is a header row listing locus names, enclosed in quotes and separated by spaces. Locus names should be exactly the same as in the file locusfile. Loci must be ordered by their map positions on the genome. Each subsequent row contains genotype data for a single individual: the individual ID, followed by the observed genotypes at each locus. Genotypes are coded as strings enclosed in quotes, separated by a single space. Each string consists of two numbers separated by a comma, with no spaces. Where there are a alleles at a locus, the alleles should be coded as numbers from 1 to a. Missing genotypes are coded as empty strings ("").
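A reader for this layout can be sketched with Python's `shlex` module, which splits on spaces, strips the quotes, and returns an empty string for `""`. The sketch assumes that the header row holds only the quoted locus names and that the individual ID is the first token of each data row. The locus names and IDs in the example are hypothetical.

```python
import shlex
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # None marks a missing genotype


def parse_genotype(token: str) -> Genotype:
    """Convert a genotype string such as 1,2 (quotes already removed) to a pair of alleles."""
    if token == "":
        return None
    first, second = token.split(",")
    return int(first), int(second)


def parse_genotypes_file(text: str) -> Tuple[List[str], Dict[str, List[Genotype]]]:
    lines = text.strip("\n").split("\n")
    loci = shlex.split(lines[0])  # header: quoted locus names
    genotypes: Dict[str, List[Genotype]] = {}
    for line in lines[1:]:
        tokens = shlex.split(line)  # a quoted "" becomes an empty-string token
        individual, values = tokens[0], tokens[1:]
        if len(values) != len(loci):
            raise ValueError(f"{individual}: expected {len(loci)} genotypes, got {len(values)}")
        genotypes[individual] = [parse_genotype(v) for v in values]
    return loci, genotypes


example = '"rs1" "rs2"\nID1 "1,2" ""\nID2 "2,2" "1,1"\n'
print(parse_genotypes_file(example))
```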
For compatibility with existing datasets, we plan to change this file format to one that is similar to the PEDFILE format used with LINKAGE.
--------------------------------------------------------------------------------
locusfile
This file contains information about each simple locus: that is, each locus that is typed. The first row of the file is ignored by the program, and can be used as a header. Each subsequent row contains values of three variables: locus name, number of alleles at this locus, and genetic map distance in Morgans between this locus and the previous locus. Loci must be ordered by their map positions on the genome. Locus names should begin with a letter, and contain only alphanumeric characters (no spaces, dots or hyphens). If the previous locus is unlinked, the genetic map distance in Morgans should be coded as 100. For two or more loci that are so close together that they should be analysed as a single compound locus (as with DRD2Bcl and DRD2Taqd in the example below), map distance should be coded as 0.
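The sketch below reads a locusfile and derives its compound loci. It skips the header, checks the locus-name rule, and merges runs of loci coded with distance 0. For each compound locus it reports the number of possible haplotypes, which is the product of the allele counts. The sketch uses `shlex` so that the rows parse whether or not names are quoted, since quoting is not specified above. The loci FY and GC in the example are hypothetical; DRD2Bcl and DRD2Taqd are the names used in the text.

```python
import re
import shlex
from typing import List, Tuple

NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # starts with a letter, alphanumeric only


def parse_locusfile(text: str) -> List[Tuple[str, int, float]]:
    """Return (name, number of alleles, distance in Morgans) for each row after the header."""
    rows = []
    for line in text.strip("\n").split("\n")[1:]:  # the first row is ignored
        name, n_alleles, distance = shlex.split(line)
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid locus name: {name!r}")
        rows.append((name, int(n_alleles), float(distance)))
    return rows


def compound_loci(rows: List[Tuple[str, int, float]]) -> List[Tuple[List[str], int]]:
    """Merge runs of loci coded with distance 0; count possible haplotypes per group."""
    groups: List[Tuple[List[str], int]] = []
    for i, (name, n_alleles, distance) in enumerate(rows):
        if i > 0 and distance == 0.0:
            names, n_haplotypes = groups[-1]
            groups[-1] = (names + [name], n_haplotypes * n_alleles)
        else:
            groups.append(([name], n_alleles))
    return groups


example = '"Locus" "Alleles" "Distance"\n"FY" 2 100\n"DRD2Bcl" 2 100\n"DRD2Taqd" 2 0\n"GC" 3 100\n'
print(compound_loci(parse_locusfile(example)))
# [(['FY'], 2), (['DRD2Bcl', 'DRD2Taqd'], 4), (['GC'], 3)]
```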
--------------------------------------------------------------------------------
allelefreqfile
This file contains the ancestry-specific allele frequencies at each compound locus in each ancestral subpopulation. The first row contains headers in quotes, separated by spaces. The first string in this row is ignored. Subsequent strings in the first row specify the names of the ancestral subpopulations contributing to the admixed population under study. Subsequent rows specify the ancestry-specific allele frequencies (usually estimated by sampling modern descendants of the subpopulations that underwent admixture). The first column in each row gives the name of the compound locus, in quotes.
For biallelic loci, only the frequency of allele 1 in each population is specified. For each locus with k alleles, there are k - 1 rows specifying frequencies of alleles 1 to k - 1.
Where two or more loci are to be analysed as a single haplotype, the ancestry-specific frequency of each haplotype must be specified. Thus in the example files below, there are two SNPs in the DRD2 gene, giving four possible haplotypes, and four lines specifying the ancestry-specific frequencies of haplotypes 11, 12, 21, 22. The loci in the haplotype are ordered by their map position on the genome, and the haplotypes are ordered by incrementing a counter from right to left. For instance if there were three loci A, B, C, with 4, 2 and 3 alleles, the haplotypes would be listed in the following order: 111, 112, 113, 121, 122, 123, 211, …, 422, 423.
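The right-to-left counter ordering is the same order produced by Python's `itertools.product`, which varies its last argument fastest. The helper below lists the haplotypes for given allele counts. Its name and the optional separator (useful for the hyphenated labels used in priorallelefreqfile) are illustrative choices.

```python
from itertools import product
from typing import List, Sequence


def haplotype_order(allele_counts: Sequence[int], sep: str = "") -> List[str]:
    """List haplotypes with the rightmost locus varying fastest.

    itertools.product iterates its last argument fastest, which matches
    incrementing a counter from right to left.
    """
    ranges = [range(1, a + 1) for a in allele_counts]
    return [sep.join(str(allele) for allele in hap) for hap in product(*ranges)]


print(haplotype_order([2, 2]))  # ['11', '12', '21', '22']
order = haplotype_order([4, 2, 3])
print(order[:7], order[-2:], len(order))
# ['111', '112', '113', '121', '122', '123', '211'] ['422', '423'] 24
```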
Note: we plan to change the format of this file to make it compatible with the format used in historicallelefreqfile.
--------------------------------------------------------------------------------
priorallelefreqfile
This file contains parameter values for the Dirichlet prior distribution of the allele or haplotype frequencies at each compound locus in each subpopulation. At each compound locus with k alleles or k possible haplotypes, a Dirichlet prior distribution is specified by a vector of k positive numbers. Where these alleles or haplotypes have been counted directly in samples from an unadmixed subpopulation, the parameter values should be specified as 0.5 plus the observed counts of each allele. Where no information is available about allele or haplotype frequencies at a compound locus in a subpopulation, or no copies of the allele have been observed in the sample from that subpopulation, specify 0.5 in the corresponding cells. Specifying 0.5 in all cells, with columns for b subpopulations, is equivalent to specifying the option populations = b.
Where haplotype frequencies at a compound locus have been estimated from unordered genotypes, the user should supply the parameters of the Dirichlet distribution that most closely approximates the posterior distribution of haplotype frequencies given the observed genotypes and a reference prior, as described above. The first row is a header row, consisting of strings in quotes, separated by spaces. The first string in this row is ignored, and the subsequent strings specify the names of the ancestral subpopulations contributing to the admixed population.
After the header row, there is one row for each allele (or haplotype) at each compound locus. The first column in each row gives the name of the compound locus in quotes. Subsequent columns give the prior parameters for the frequency of the allele (or haplotype) in each subpopulation, separated by a single space.
If the compound locus consists of two or more simple loci (see notes above), the rows list prior parameters for the haplotypes in the order defined by incrementing a counter from right to left. For instance, if there were three loci A, B and C, with 4, 2 and 3 alleles respectively, the haplotypes would be listed in the following order: 1-1-1, 1-1-2, 1-1-3, 1-2-1, 1-2-2, 1-2-3, 2-1-1, …, 4-2-2, 4-2-3. Estimated counts should be given for all possible haplotypes, however rare: the program will include all possible haplotypes in the model, but will omit rare haplotypes when constructing test statistics.
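As a sketch of the layout just described, the Python fragment below turns observed allele or haplotype counts into rows of priorallelefreqfile (0.5 plus each count) or, with `add_half=False`, rows of historicallelefreqfile. The function names, the placeholder first header string "Locus" (which the program ignores), and the subpopulation names and counts in the usage example are all hypothetical. Rows for a compound locus are assumed to be supplied already in the right-to-left haplotype order.

```python
def prior_params(counts):
    """Dirichlet prior parameters for one allele/haplotype row:
    0.5 plus the observed count in each subpopulation (0 counts give 0.5)."""
    return [0.5 + c for c in counts]


def format_allelefreq_file(subpops, loci, add_half=True):
    """Format priorallelefreqfile (add_half=True) or historicallelefreqfile
    (add_half=False) text. Hypothetical helper, not part of ADMIXMAP.

    subpops: ancestral subpopulation names, in column order.
    loci: list of (locus_name, rows); each row holds one count per
          subpopulation for one allele or haplotype.
    """
    # Header row: quoted strings separated by spaces; the first is ignored.
    lines = [" ".join('"%s"' % s for s in ["Locus"] + list(subpops))]
    for name, rows in loci:
        for counts in rows:
            values = prior_params(counts) if add_half else counts
            # Locus name in quotes, then one value per subpopulation.
            lines.append('"%s" ' % name + " ".join(str(v) for v in values))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical counts for the four haplotypes 11, 12, 21, 22 at one locus.
    counts = [[12, 3], [0, 7], [5, 0], [1, 1]]
    print(format_allelefreq_file(["African", "European"], [("DRD2", counts)]))
```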
--------------------------------------------------------------------------------
historicallelefreqfile
This file contains observed counts of alleles or haplotypes at each compound locus in samples from unadmixed subpopulations. The format of this file is exactly the same as the format of priorallelefreqfile described above. The only difference between the two files is that in historicallelefreqfile, 0.5 is not added to the observed counts.
--------------------------------------------------------------------------------
outcomevarfile
This file is required if analysistypeindicator has been specified as 2 (cross-sectional study with continuous outcome) or 3 (binary outcome). The file contains values of one or more outcome variables. The header row contains the variable labels in quotes, separated by spaces; after the header row, the file has one row per individual. Binary variables should be coded as 1 = affected, 0 = unaffected. If the file contains more than one outcome variable, the column containing the variable of interest should be specified by the command-line option targetindicator.
--------------------------------------------------------------------------------
covariatesfile
This file contains values of covariates to be included in the regression model: it is used only if analysistypeindicator has been specified as 2 or 3, and is optional even then. The header row contains covariate names in quotes, separated by spaces. Subsequent rows contain the observed values of these variables. For computational reasons, the values of the covariates should be centred about their sample means.
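Centring each covariate about its sample mean is a simple column-wise transformation. A minimal Python sketch follows, assuming the covariates are already numeric and held as one list per individual; the function names and the covariate names in the example ("age", "bmi") are hypothetical.

```python
def center_columns(rows):
    """Subtract each column's sample mean from every value in that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def format_covariates_file(names, rows):
    """Quoted header of covariate names, then one row of centred values per individual."""
    header = " ".join('"%s"' % name for name in names)
    body = [" ".join("%.10g" % x for x in row) for row in center_columns(rows)]
    return "\n".join([header] + body)


if __name__ == "__main__":
    # Hypothetical covariate values for three individuals.
    print(format_covariates_file(["age", "bmi"], [[40, 25.0], [50, 30.0], [60, 20.0]]))
```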
--------------------------------------------------------------------------------
Output files
paramfile – Posterior draws of population-level variables, at intervals determined by option every. After the header line, each line contains one draw of the following variables:
Parameters of the Dirichlet distribution for parental admixture: one for each subpopulation
Sum of intensities for the stochastic process of transitions of ancestry on hybrid chromosomes
Intercept and slope parameters of the regression model
Precision (the inverse of the residual variance) in the regression model.
Allele frequency dispersion parameters: one for each subpopulation.
These dispersion parameters are written only if option historicallelefreqfile has been specified.
Median and 95% credible intervals for these parameters are written to the file PosteriorQuantiles.txt.
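Posterior medians and 95% credible intervals are quantiles of the stored draws. The sketch below computes them for one column of paramfile in plain Python. It uses linear-interpolation quantiles (the default in R's `quantile`, type 7); this is an assumption, since the supplied R script may define quantiles differently. The function names are hypothetical.

```python
def quantile(sorted_draws, p):
    """Linear-interpolation quantile (R type 7) of an already sorted list."""
    h = (len(sorted_draws) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (h - lo) * (sorted_draws[hi] - sorted_draws[lo])


def posterior_summary(draws):
    """Return (median, lower 2.5% limit, upper 97.5% limit) of the draws."""
    s = sorted(draws)
    return quantile(s, 0.5), quantile(s, 0.025), quantile(s, 0.975)


if __name__ == "__main__":
    # Hypothetical draws of one parameter.
    print(posterior_summary([0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]))
```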
indadmixturefile – Posterior draws of individual-level variables, at intervals determined by option every. After the header line, each line contains one draw of the following variables:
individual admixture proportions: one number for each subpopulation
predicted value of the outcome variable in the regression model
ancestry of the paternal and maternal gametes (coded as 0, 1, 2, … for subpopulations 1, 2, 3, …) at the compound locus specified by the option locusfortest.
paternal and maternal haplotypes at this locus.
This file is formatted to be read into R as a three-way array (indexed by variables, individuals, draws).
allelefreqoutputfile – Posterior draws of the ancestry-specific allele or haplotype frequencies for each state of ancestry at each compound locus, at intervals determined by option every. These results can be used to compute new parameters for the prior distributions specified in priorallelefreqfile, which can be used in subsequent studies with independent samples.
ergodicaveragefile – Cumulative posterior means over all iterations (“ergodic averages”) for the variables in paramfile, output at intervals of 10 × every iterations. Monitoring these ergodic averages allows the user to determine whether the sampler has been run long enough for the posterior means to have been estimated accurately.
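An ergodic average is the running mean of all draws so far. A minimal sketch for one parameter follows. It assumes the input is the thinned draws from paramfile, so reporting after every 10 draws corresponds to the 10 × every iterations mentioned above; the function name is hypothetical.

```python
def ergodic_averages(draws, report_every=1):
    """Cumulative means of the draws, reported after every report_every-th draw."""
    out, total = [], 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        if i % report_every == 0:
            out.append(total / i)
    return out


if __name__ == "__main__":
    # Hypothetical thinned draws; report every 10 draws (10 x every iterations).
    print(ergodic_averages([float(i % 3) for i in range(40)], report_every=10))
```

A curve of these averages that has flattened out suggests the posterior mean is stable; one still drifting suggests more iterations are needed.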
The output files admixturescorefile, allelicassociationscorefile, ancestryassociationscorefile and affectedsonlyscorefile contain results of score tests obtained by averaging over the posterior distribution. Each table of score test results, based on cumulative averages for the score and information over all posterior samples obtained after the burn-in period, is output at intervals of 10 × every iterations. Monitoring these repeated updates allows the user to determine when the sampler has been run long enough for the test results to be computed accurately. For inference, only the last table in the output file, which is based on the entire posterior sample, is used. The file allelicassociationscorefile is formatted to be read into R as a three-way array (indexed by loci, test statistics and output number).
For univariate null hypotheses (testing the effect of one allele, one haplotype, or one subpopulation against all others) the test statistic is the score divided by the square root of the observed information, which has a standard normal distribution under the null hypothesis. The percent of information extracted (the ratio of observed information to complete information) measures the information obtained about the parameter under test, in comparison with the information that would be obtained if individual admixture, haplotypes at each locus, and gamete ancestry at each locus were measured without error.
For composite null hypotheses, the score $U$ is a vector, the observed information $V$ is a matrix, and the test statistic $U^{\mathsf{T}} V^{-1} U$ has a chi-squared distribution under the null hypothesis.
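The two forms of test statistic described above can be sketched in plain Python, given a score and observed information that have already been averaged over the posterior. This illustrates the formulas only and is not ADMIXMAP's code; the function names are hypothetical, and the degrees of freedom for the composite test are assumed to equal the length of $U$ (with $V$ non-singular), as is usual for score tests.

```python
import math


def univariate_score_test(score, observed_info, complete_info):
    """Return (z, percent of information extracted) for one parameter.

    z = U / sqrt(V) is approximately N(0, 1) under the null hypothesis.
    complete_info is the information that would be available if admixture,
    haplotypes and ancestry were observed without error.
    """
    z = score / math.sqrt(observed_info)
    percent_info = 100.0 * observed_info / complete_info
    return z, percent_info


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def composite_score_test(score, info):
    """Return (U^T V^{-1} U, assumed degrees of freedom = len(U))."""
    stat = sum(u * w for u, w in zip(score, solve(info, score)))
    return stat, len(score)
```

Solving $Vx = U$ rather than forming $V^{-1}$ explicitly is the usual, numerically safer way to evaluate the quadratic form.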
admixturescorefile – test for association of trait with individual admixture. The null hypothesis is no effect of individual admixture in a regression model, with covariates as explanatory variables if specified. The test statistic is computed for the effect of each subpopulation separately, with a summary chi-square test over all subpopulations if there are more than two subpopulations.
allelicassociationscorefile – tests for allelic association at each locus. The null hypothesis is no effect of the alleles or haplotypes in a regression analysis with individual admixture (and covariates if specified) as explanatory variables. The test statistic is computed for each allele or haplotype separately, with a summary chi-square statistic over all alleles or haplotypes at each locus if there are more than two alleles or haplotypes. Rare alleles or haplotypes are grouped together.
This test is appropriate when testing for association of the trait with alleles or haplotypes in a candidate gene.
ancestryassociationscorefile – tests for linkage of each locus with genes underlying ethnic variation in disease risk or trait values. This is a test for association of the trait with ancestry at each compound locus, conditional on parental admixture. The null hypothesis is no effect of locus ancestry in a regression analysis with individual admixture (and covariates if specified) as explanatory variables. The test statistic is computed for the effect of each subpopulation separately, with a summary chi-square statistic over all subpopulations at each locus if there are more than two subpopulations. The proportion of information extracted depends upon the information content for ancestry of the marker locus and other nearby loci. This test is appropriate when the objective of the study is to exploit admixture to localize genes underlying ethnic variation in the trait value, using ancestry-informative markers rather than candidate gene polymorphisms.
affectedsonlyscorefile – tests for linkage of each locus with genes underlying the ethnic difference in disease risk, using only the affected individuals. The null hypothesis is that the ratio of risks between populations accounted for by the locus is 1. The test statistic is computed for the effect of each subpopulation at each locus. At each locus, the test compares the observed and expected proportions of gene copies that have ancestry from the high-risk subpopulation. This is the only test that can be used if the sample consists only of affected individuals. Even if a control group has been typed, for a rare disease the affected-only test is more efficient than the test given in ancestryassociationscorefile based on a regression model. This is because, for a rare disease, the observed and expected proportions of gene copies that have ancestry from the high-risk subpopulation will not differ by very much in unaffected individuals.
allelefreqscorefile – tests for mis-specification of ancestry-specific allele frequencies.
This test is computed only if allele frequencies have been specified as fixed with option allelefreqfile. For each compound locus and each subpopulation, a score test is computed for the null hypothesis that the frequencies of all alleles have been specified correctly. A summary test over all k subpopulations is also computed at each locus.
An R script (AdmixmapOutput.R) is supplied that processes these output files to produce tables of posterior quantiles, frequency plots of the posterior distribution, and plots of the cumulative posterior means for the variables that are output to paramfile. The R script also calculates a summary slope parameter for the effect of admixture from each subpopulation, versus the others. This R script is run automatically from the Perl script (admixmap.pl) that is supplied as a wrapper for the program.
--------------------------------------------------------------------------------
Interpretation of output from the program
These notes are based on the output produced by using the Perl script admixmap.pl to run the main program. Output files produced by the main program are processed by the R script AdmixmapOutput.R, which produces several text files and a file plots.ps containing graphs in PostScript format.
Evaluating the sampler
The adequacy of the burn-in period can be evaluated by the test statistics in file BurninTestStats.txt, which are based on comparing the posterior draws during the first 10% and the last 50% of the sample. If the burn-in period is adequate, the numbers in this table should have approximately a standard normal distribution.
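The exact formula behind BurninTestStats.txt is not given here. Comparing early and late segments of a chain resembles Geweke's convergence diagnostic, so the sketch below computes a simplified Geweke-style z-score for one parameter: the difference between the means of the first 10% and the last 50% of draws, divided by its standard error. As a simplifying assumption the standard error uses plain sample variances and ignores autocorrelation, so it will tend to overstate significance for a slowly mixing chain. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def burnin_z(draws, first=0.1, last=0.5):
    """Simplified Geweke-style z-score comparing early and late draws.

    Standard errors use plain sample variances (autocorrelation ignored).
    """
    n = len(draws)
    early = draws[: int(n * first)]
    late = draws[n - int(n * last):]
    se = math.sqrt(variance(early) / len(early) + variance(late) / len(late))
    return (mean(early) - mean(late)) / se


if __name__ == "__main__":
    # Hypothetical chain whose level shifts halfway through: a large |z|.
    chain = [float(i % 2) + (5.0 if i >= 50 else 0.0) for i in range(100)]
    print(burnin_z(chain))
```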
The mixing of the MCMC sampler can be evaluated by examining the autocorrelation plots. Autocorrelation extending beyond 20 iterations (2 thinned draws if every = 10) indicates slow mixing.
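Sample autocorrelations of a parameter's draws can be computed directly. In the sketch below the input is assumed to be the thinned draws from paramfile, so with every = 10, lag 2 corresponds to 20 iterations. The function name and the example chain are hypothetical.

```python
def autocorrelation(draws, max_lag):
    """Sample autocorrelation of a sequence of draws at lags 1..max_lag."""
    n = len(draws)
    m = sum(draws) / n
    dev = [x - m for x in draws]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    chain = [float(i // 25) for i in range(100)]  # hypothetical slowly drifting chain
    every = 10
    lag = 20 // every  # 20 iterations expressed in thinned draws
    print(autocorrelation(chain, lag)[lag - 1])   # high value: slow mixing
```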
The adequacy of the total number of iterations can be evaluated by examining a plot of the statistic of interest calculated from all iterations since the end of the burn-in period, against the iteration number. Where inference is based on the mean of a parameter, this statistic is an “ergodic average” (cumulative average) over all iterations to that point. Plots of ergodic averages of the population-level parameters are given in file Plots.ps.
Evaluating the fit of the model
The file stratificationtestfile contains results of a diagnostic test for residual population stratification that is not explained by the fitted model. For details of how this test is calculated, and a discussion of how to interpret it, see Hoggart (2003). The test is based on testing for allelic association between unlinked loci that is not explained by the model. The result is a “Bayesian p-value”: p ≪ 0.5 indicates lack of fit. The “Bayesian p-value” calculated by this test is more conservative than a classical p-value. Our experience has been that a test p-value of 0.3 or less is fairly strong evidence for residual stratification. Where this statistic yields evidence of lack of fit, the model should be specified with more subpopulations, unless there is some other reason for lack of fit such as mis-specified allele frequencies.
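The details of the stratification test are in Hoggart (2003). As a generic illustration of how a “Bayesian p-value” (posterior predictive p-value) is formed, and not a reproduction of that specific test, the sketch below assumes that for each posterior draw a discrepancy statistic has been computed both on the observed data and on data replicated under the fitted model. The p-value is the proportion of draws in which the replicated discrepancy is at least as large as the observed one. The function name is hypothetical.

```python
def bayesian_p_value(observed, replicated):
    """Posterior predictive p-value from paired per-draw discrepancies.

    observed[i]:   discrepancy computed from the observed data at draw i
    replicated[i]: the same discrepancy computed from data simulated
                   under the model at draw i
    Values well below 0.5 mean the observed data look worse than the
    model's own replicates, i.e. lack of fit.
    """
    pairs = list(zip(observed, replicated))
    return sum(rep >= obs for obs, rep in pairs) / len(pairs)
```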
The file dispersiontestfile contains results of a diagnostic test for variation between the allele frequencies in the unadmixed populations that have been sampled to calculate the prior parameter values in priorallelefreqfile and the corresponding ancestry-specific allele frequencies in the admixed population under study. Again the results are “Bayesian p-values”, for which the deviation of the test p-value from its expected value of 0.5 does not provide an absolute measure of the strength of evidence for lack of fit. For each subpopulation, the test statistic is calculated as a summary test over all loci and for each locus separately. Examination of the test statistic for each locus may reveal errors in coding, or errors in specifying the prior allele frequencies.
The option dispersiontestfile is meaningful only where option priorallelefreqfile has been specified. Where allele frequencies have been specified as fixed, option allelefreqscorefile should be specified and the output file should be examined.
No diagnostic test for lack of fit of the distribution of individual admixture proportions to the model is yet implemented. However, the plots in file Plots.ps can be examined to compare the estimated distribution of individual admixture proportions (based on the posterior means for individual admixture) with an estimate for the distribution of individual admixture values in the population (based on the posterior means for the Dirichlet parameters of this distribution).
Future enhancements of the ADMIXMAP program
Within the next few months we plan to add the following capabilities:
Ability to read the input file genotypesfile in SPLINK or LINKAGE formats.
A parallelized version that runs on a cluster of Linux workstations
Analysis of X-linked genotype data
Analysis of data on nuclear pedigrees
Genetic Epidemiology Group
Professor of Metabolic and Genetic Epidemiology: Paul McKeigue, PhD, FFPHM
Clinical Lecturer: Mariam Molokhia, MRCGP
Research Fellow: Clive Hoggart, PhD
Research Fellow: Nigel Wetters
Current research
The main focus of our research is on admixture mapping. This is a novel approach to finding genes that underlie ethnic variation in disease risk, based on studying populations of mixed descent. Admixture mapping is based on the same principles as linkage analysis of an experimental cross between inbred strains. Using panels of markers that are chosen to be highly informative for ancestry, it is possible in principle to extend this approach to admixed human populations where the history of admixture is not under experimental control and the ancestral populations are not inbred strains [19]. Our work in this area is an extension of earlier work on the epidemiology of ethnic variation in risk of cardiovascular disease and diabetes.
The advantage of admixture mapping, in comparison with conventional approaches to localizing disease genes based on family linkage studies, is that in principle it has far greater power than family linkage studies to detect genes of modest effect. This is because admixture mapping is based on a direct (fixed effects) comparison, whereas family linkage studies are based on an indirect (random effects) statistical comparison. Applying admixture mapping to search the genome requires development of statistical methods that can be applied to phenotypic traits and marker data to extract the information about genetic linkage that is generated by admixture. We have developed a first working version of ADMIXMAP, a statistical analysis program that can be used to model admixture and to test for linkage. This program is based on a Bayesian approach in which the posterior distribution of parental admixture and individual ancestry at each locus is generated by Markov chain Monte Carlo simulation [8].
Our research on admixture mapping is supported by the US National Institutes of Health, the UK Medical Research Council, the Arthritis Research Campaign, and GlaxoSmithKline. We are working closely with the Department of Anthropology, Penn State University on the application of admixture mapping to African-American populations, and on the development of marker sets for admixture mapping. We are collaborating with other researchers in the Caribbean region.
ADMIXMAP program
This is a general-purpose program for modelling admixture, using marker genotypes and trait data on a sample of individuals from an admixed population (such as African-Americans), where the markers have been chosen to have extreme differentials in allele frequencies between two or more of the ancestral populations between which admixture has occurred. The main difference between ADMIXMAP and classical programs for estimation of admixture such as ADMIX is that ADMIXMAP is based on a multilevel model for the distribution of individual admixture in the population and the stochastic variation of ancestry on hybrid chromosomes. This makes it possible to model the associations of ancestry between linked marker loci, and the association of a trait with individual admixture or with ancestry at a linked marker locus.
Possible uses of the ADMIXMAP program
Modelling the distribution of individual admixture values and the history of admixture (inferred by modelling the stochastic variation of ancestry along chromosomes).
Case-control, cross-sectional or cohort studies that test for a relationship between disease risk and individual admixture
Localizing genes underlying ethnic differences in disease risk by admixture mapping
Controlling for population structure (variation in individual admixture) in genetic association studies so as to eliminate associations with unlinked genes
Reconstructing the genetic structure of an ancestral population where unadmixed modern descendants are not available for study
ADMIXMAP can model admixture between more than two populations, and can use data from multi-allelic or biallelic marker polymorphisms. The program has been developed for application to admixed human populations, but can also be used to model admixture in livestock or for fine mapping of quantitative trait loci in outbred stocks of mice.
A manual for the program is available which describes the statistical model in more detail. Downloads of the program compiled for various platforms are also available. We recommend that before trying to run the program, you consult us first about your requirements.
Download ADMIXMAP
ADMIXMAP documentation
Download ADMIXMAP for Linux admix-1.2-linux.tar.gz
Download ADMIXMAP for Windows
ADMIXMAP tutorial for Windows
ADMIXMAP tutorial (HTML) (pdf)
EUDRAGENE: European collaboration to establish a case-control DNA collection for studying the genetic basis of adverse drug reactions
The specific objective of this proposed collection is to establish a freely shared case-control collection of DNA samples as a resource for studying genetic predictors of adverse drug reactions. Identifying genetic variants that influence susceptibility to adverse reactions will advance understanding of the molecular basis of adverse drug reactions and may also lead to the development of tests that can predict individual susceptibility to adverse reactions, with obvious benefits to human health. This study has received infrastructure funding for 3 years (starting Jan 03) from the EC 5th Framework Quality of Life Program.
Adverse drug reactions (ADRs) are important causes of morbidity and mortality, limit the usefulness of many otherwise effective drugs, and are under strong genetic influence. Identifying genetic variants that influence susceptibility to ADRs has obvious practical applications, and more generally will contribute to understanding of the molecular basis of adverse drug reactions. Research in this area is hampered by the lack of a resource in which to study genetic determinants of susceptibility to ADRs. As most such ADRs are rare, a case-control design is the only feasible approach, and a multicentre European collaboration is necessary as no single country will generate enough cases of any given ADR within a reasonable time.
We propose to establish a freely-shared resource consisting of clinical data and DNA samples from cases of ADRs, together with a control group. In the first year we plan to select for study an initial set of six ADRs that are important because they cause serious illness in a small minority of those exposed to drugs that are otherwise more effective than any alternative, and that are easily identified because they have distinctive manifestations that are not related to the disease for which the drug was prescribed. At least 500 cases of each ADR will be collected, together with an equal number of controls. The collection will be extended to include more ADRs after the first 1-2 years, based on problems of current concern.
Recent publications
Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, McKeigue PM. Control of confounding of genetic associations in stratified populations. Am J Hum Genet. 2003, in press.
Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet. 2003 Mar 8;361(9360):865-72. Review.
Shriver MD, Parra EJ, Dios S, Bonilla C, Norton H, Jovel C, Pfaff C, Jones C, Massac A, Cameron N, Baron A, Jackson T, Argyropoulos G, Jin L, Hoggart CJ, McKeigue PM, Kittles RA. Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet. 2003 Apr;112(4):387-99.
Molokhia M, Hoggart C, Patrick AL, Shriver M, Parra E, Ye J, Silman AJ, McKeigue PM. Relation of risk of systemic lupus erythematosus to west African admixture in a Caribbean population. Hum Genet. 2003 Mar;112(3):310-8.
Reynolds RM, Chapman KE, Seckl JR, Walker BR, McKeigue PM, Lithell HO. Skeletal muscle glucocorticoid receptor density and insulin resistance. JAMA. 2002 May 15;287(19):2505-6.
Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet. 2001 Oct 20;358(9290):1356-60. Review.
Molokhia M, McKeigue PM, Cuadrado M, Hughes G. Systemic lupus erythematosus in migrants from west Africa compared with Afro-Caribbean people in the UK. Lancet. 2001 May 5;357(9266):1414-5.
McKeigue PM, Carpenter JR, Parra EJ, Shriver MD. Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet. 2000 Mar;64(Pt 2):171-86.
Parra EJ, Kittles RA, Argyropoulos G, Pfaff CL, Hiester K, Bonilla C, et al. Ancestral proportions and admixture dynamics in geographically defined African-Americans living in South Carolina. Am J Phys Anthropol. 2001;114:18-29.
Pfaff CL, Parra EJ, Bonilla C, Hiester K, McKeigue PM, Kamboh MI, Hutchinson RG, Ferrell RE, Boerwinkle E, Shriver MD. Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium. Am J Hum Genet. 2001 Jan;68(1):198-207.
McKeigue PM. Multipoint admixture mapping. Genet Epidemiol. 2000 Dec;19(4):464-7.
Molokhia M, McKeigue P. Risk for rheumatic disease in relation to ethnicity and admixture. Arthritis Res. 2000;2(2):115-25. Review.
McKeigue PM. Efficiency of estimation of haplotype frequencies: use of marker phenotypes of unrelated individuals versus counting of phase-known gametes. Am J Hum Genet. 2000 Dec;67(6):1626-7.
Aitman TJ, Cooper LD, Norsworthy PJ, Wahid FN, Gray JK, Curtis BR, McKeigue PM, Kwiatkowski D, Greenwood BM, Snow RW, Hill AV, Scott J. Malaria susceptibility and CD36 mutation. Nature. 2000 Jun 29;405(6790):1015-6.
Zoratti R, Godsland IF, Chaturvedi N, Crook D, Crook D, Stevenson JC, McKeigue PM. Relation of plasma lipids to insulin resistance, nonesterified fatty acid levels, and body fat in men from three ethnic groups: relevance to variation in risk of diabetes and coronary disease. Metabolism. 2000 Feb;49(2):245-52.
Davey G, Ramachandran A, Snehalatha C, Hitman GA, McKeigue PM. Familial aggregation of central obesity in Southern Indians. Int J Obes Relat Metab Disord. 2000 Nov;24(11):1523-7.
Forouhi N, Jenkinson G, Thomas EL, Mierisova S, Bhonsle J, McKeigue PM, et al. Relation of triglyceride stores in skeletal muscle cells to central obesity and insulin sensitivity in South Asian and European men. Diabetologia. 1999;42:932-5.
McKeigue PM. Ethnic variation in insulin resistance and risk of Type 2 diabetes. In: Reaven G, Laws A, eds. Insulin Resistance, Totowa, NJ: Humana, 1999: 35-51.
Al-Mahroos F, McKeigue PM. High prevalence of diabetes mellitus in Bahrainis: associations with ethnicity and raised plasma cholesterol. Diabetes Care. 1998;21:936-42.
McKeigue PM. Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations by conditioning on parental admixture. Am J Hum Genet. 1998;63:241-51.
McKeigue PM. Mapping genes underlying ethnic differences in disease risk by linkage disequilibrium in recently admixed populations. Am J Hum Genet. 1997;60:188-96.
WORKING WITH DNAP IS - Pennington Center for Biomedical Research, Baton Rouge, La., USA
WORKING WITH DNAP IS - National Human Genome Center, Howard University, Washington, DC 20060, USA
WORKING WITH DNAP IS - Department of Anthropology, Penn State University, 409 Carpenter Bld., University Park, PA 16802, USA
Do you think it's bogus?
AncestryByDNA 3.0
As far an AncestryByDNA is concerned, we know from a previous press release about the scope of this product:
"DNAPrint scientists are collaborating with Dr. Shriver to develop more advanced versions of ANCESTRYbyDNA that may be useful for discerning regional heritage proportions in individuals. For example, ANCESTRYbyDNA 3.0 is expected to be capable in the near future of determining whether an individual is of Irish/British, Middle European (French, German), Scandinavian, Mediterranean (Italian, Greek, Spanish) or Eastern European heritage as well as of Western/Central versus East African heritage or of Japanese, Chinese or Korean heritage."
There are some comments by Tony Frudakis, and others, that appeared on the RootsWeb threads about AncestryByDNA 3.0 in April that tell us a little more about the product:
Yesterday you made a number of references to the Version 3.0 of the DNAPrint test; and similarly there is mention of it in the FAQ section of your website. Is it possible at this point to reveal to us what is in the works - in other words what will this test do that Version 2.0 cannot? For example, will it differentiate between European populations and those from, say, Pakistan? Will you be able to separate Koreans from Polynesians or will the latter remain with their "parental stock"? Are there any "ancestral informative markers" that could differentiate between Italians (i.e., Mediterranean populations) from Norwegians (i.e., Nordic populations). I am sure that I can come up with a lot more questions, but the bottom line is, what are you able to say about the new version of the test at present?
Yes, that is the end goal - intracontinental resolution. Chinese from Japanese among East Asians for example, or Scandinavian from Mediterranean from Indo-Pakistani for example within the IndoEuropean group.
Ancestrybydna 3.0 would need to be a DNA chip based product.
3.0 will be able to resolve between intracontinental groups, such as Japanese, Chinese, Korean, Pacific Islander etc. within the East Asian group. A much larger collection of markers will be needed to do this and the test will necessarily be more expensive. We know these markers exist, and we know a reasonable test can be developed to do this, we just haven't finished the work yet but we are working on it. Given the larger number of markers that need to be screened from the genome, a DNA chip based approach is needed which is similar to the approach we now use, but we can cover more ground in a shorter period of time with the chip. It's a technical thing - don't worry about it. Suffice it to say, we need to screen much more of the genome to find these rare markers...
Basically, the allele frequencies for some of the Japanese vs. Chinese markers will be high, and for others low. Finding these markers among the 30,000 or so Ancestry Informative Markers in the human genome is the challenge we at DNAPrint face, but once they are found, they will work exactly as the current set do.
BTW, I notice that Tony has stopped subscribing to the RootsWeb threads. I must say I do not blame him. There are some nonsensical conversations there about the Ancestry test, some axes being well and truly ground, and pretty obvious agendas from some of the regular (dare I say, to borrow a word, "purist") contributors. Here are a couple of interesting quotes from Tony:
Nonetheless there will probably be a "wrong" result reported on some board like this, or in one of the more left leaning newspapers somewhere in the country. How? There is a contingent of "scientists" who are more politicians than scientists, and who do not practice pan-genome screening, who feel our test is immoral and politically unwise, so do not be surprised if one of them "takes" the test and "reports" completely bogus answers. The reason that this is just going to be a matter of time is that some of these people have already said things in the media (to PBS) that are not direct lies but that are very clear attempts to obfuscate the fact that biogeographical ancestry is written in the DNA - they know it is but they do not want tests like ours being sold so to them the means (lying) justifies the end (what they think will be more harmony and world peace). If they will lie on TV to the American Public, they will lie to make their point.
Anyone that says our results are not accurate for minor ancestry either continues to fail to understand the mathematical and molecular biological foundation of the test or does not want to understand. Possibly, there is a personal agenda that is being tended to. A test dealing with "race" is sure to stir up the activist in some people...." For example, if ours was a test for tuberculosis that was accurate for detecting low levels of TB within a few percent, would there be so much obstinance from people to accept the results from such a test? No. It would be used, and used to the benefit of those it was used on! It is simply because this test deals with race that people (some from exceedingly "activist" oriented colleges or places) refuse to accept our test.
Genetics sleuth
DNAPrint Genomics of Sarasota is thrust into the spotlight after its genetics test determines the race of a man sought in serial slayings.
BY MARGARET ANN MIILLE
SARASOTA -- Louisiana authorities had swabbed the mouths of more than 1,000 white men in a DNA dragnet but seemed no closer to finding their serial killer.
They abruptly switched gears after a genetics test conducted by Sarasota's DNAPrint Genomics Inc. indicated the man they sought in the murders of at least five women was 85 percent African and 15 percent American Indian.
In other words, he probably was black.
That information led to a break in the case and resulted days later in the arrest of a suspect.
The many twists and turns of the investigation were documented last month in a "Primetime Thursday" report with Diane Sawyer. The publicity thrust the Sarasota research and development company onto the national forensics radar screen, and boosted the price of the company's stock.
The over-the-counter shares were selling for 7.8 cents at the close of regular trading Friday, up nearly 2 cents.
"There has been a continuous stream of interest from all the major cities because of the case," said Tony Frudakis, DNAPrint's chief science officer. "Detectives from all around the country are saying they want to learn more about it."
The test, marketed to forensics experts as "DNAWitness," analyzes bodily fluids to determine within a few percentage points to what extent a person is of Native American, East Asian, Indo-European and sub-Saharan African heritage. Genealogy buffs who want to explore ambiguous parts of their family trees can buy the same test under the name "ANCESTRYbyDNA2.0."
What makes the Louisiana case so unusual is that it appears to be the first time in U.S. history that DNA was used to cull details of a criminal's physical appearance.
Still, DNAWitness may prove to be as controversial as it is useful.
"There are certain areas of genetic testing that are hot buttons," said Fred Paola, a bioethicist and associate professor of medicine at the University of South Florida in Tampa.
"Race is one of them; sexual preference is another. People who want to find differences between the races will point to a test like this. This is high-tech fodder for them."
New information
For decades, Y-chromosome and mitochondrial DNA tests have been used to track genetic relationships back generations by following paternal and maternal lineages. Those tests show only whether people are related; they don't reveal racial compositions.
DNAPrint's test does, by focusing on the 0.1 percent of DNA that makes us uniquely individual. The test targets regions of DNA that house single nucleotide polymorphisms, or SNPs. These single-letter variations in the DNA sequence are also known as "ancestry informative markers" when their frequencies vary by race.
Mark Shriver, who developed the test with DNAPrint, challenges concerns that test results could be used to promote racial stereotyping. In fact, he said, these measurements show how superficial racial categories are because they define a person only as a member of one race. A person's DNA ancestry is more precise because it reveals a mixture.
"It doesn't make race more than it is," said Shriver, an assistant professor of anthropology and genetics at Pennsylvania State University.
"What we see is a continuum of genetic variation in all populations. I can definitely see using proportional measures as a way to replace simply classifying people as black and white and Hispanic and Asian. It makes a lot more sense to express their proportional ancestry."
DNAWitness costs $1,000, compared with the $158 charged for ANCESTRY. The forensics test is much more involved. Strict documentation is required each time a DNA sample is moved, and there is an assumed risk that the results could be challenged in court.
From five to 20 blind samples are used in each DNAWitness test.
Altogether, DNAPrint so far has tested more than 3,300 samples.
Broadening the search
Derrick Todd Lee, the man arrested in Atlanta in late May in the Louisiana serial slayings, remains in a Baton Rouge-area prison.
A grand jury has indicted him on one charge of first-degree murder in East Baton Rouge. The district attorney's office there says the state will ask for the death penalty and intends to seek indictments related to three other murders in that parish.
The district attorney's office in another Louisiana parish intends to do the same in the case of a fifth murder that occurred there. A warrant for Lee's arrest also is being prepared in a sixth murder in West Baton Rouge Parish.
Before Lee's capture, law enforcement agencies were seeking a 25- to 35-year-old white man, based on contradictory witness accounts and an FBI profile that says serial killers are usually white loners.
DNAPrint offered to help, and analyzed DNA that was extracted by authorities from bodily fluid found at one of the crime scenes.
Mary Ann Godawa, a Baton Rouge police corporal and spokeswoman for the Multi-Agency Homicide Task Force, said that while DNAPrint's test was useful, authorities proceeded with it cautiously.
Rather than narrow their search, they expanded it.
"This is so new and it's so cutting-edge," Godawa said. "We could not completely turn this case around on that. We felt more comfortable explaining to the public that we wanted to broaden the search, that it may not be a white male as we had said before."
Godawa said authorities also asked tipsters to focus not only on color and ethnicity, but on behavior.
About 10 law enforcement agencies, primarily in the Southeast, already were using DNAPrint's test before Lee's arrest hit the national news.
Sales expected to climb
Frudakis, DNAPrint's science officer, said he expects sales to climb because of the heightened visibility of the test, partly because of the potential appeal of using it to help solve "cold" cases.
"You can make some very worthwhile general inferences about physical appearance knowing the ancestry," said Frudakis. It's possible, for example, to identify a range of likely physical characteristics, including nose shape and skin color.
But high interest in the test is tempered with concerns about its potential racial overtones, which have caused some law enforcement agencies to balk.
"Race is a hot potato because of political correctness," Frudakis said. "It's a shame. But every product goes through an acceptance cycle, especially this one."
For now, DNAWitness -- or ANCESTRY, depending on who's using it -- is DNAPrint's sole offering.
In the works are upgraded versions that would further pinpoint the geographical areas of a person's heritage. So are other tests that would help forensic specialists determine eye and hair colors from DNA left at crime scenes.
Forensics is what put DNAPrint on the map, but it's not the only area in which the company delves. DNAPrint researchers continue to explore ways in which genetic information could be used to predict how individuals will react to certain prescription drugs.
Nonetheless, some experts are concerned that DNA testing as a whole can trample on the civil rights of people whose samples have been collected.
One of them is Barry Scheck, director of the Innocence Project at the Cardozo School of Law in New York, which uses DNA to reverse false convictions.
Scheck is unfamiliar with the specifics of DNAPrint's test, but wasn't surprised that a company had found a way to reliably identify genetic markers for race.
His reservations focus on DNA archiving as a whole. Genetic data obtained from crime scenes, convicted offenders and even volunteers are stored -- in the name of quality assurance -- in various governmental databanks long after the purpose for which they were collected is served, Scheck said.
"My concern is that you can go back to those samples and to begin to do DNA analysis looking at other genes, and for what purpose? We now have information that certain genes taken together will show that somebody is a pedophile or has a propensity for violence or mental problems. That is what we are concerned about.
"People don't know that their profiles as well as their original DNA samples are being kept. This opens up a potential for serious abuse."
Noah Rosenberg University of Southern California noahr@usc.edu March 19, 2003 - - http://chimolecularmed.com/postconpdf/mmmtrack1/noah_rosenberg.pdf
My research uses ancestry informative markers to identify genes influencing obesity that have been transmitted to admixed individuals through linkage disequilibrium. - - http://main.uab.edu/show.asp?durki=45336
Genes, Medicine, and the New Race Debate - - http://www.technologyreview.com/forums/forum.asp?forumid=264
Carrie Pfaff looks like a KEY player for us - http://www.rps.psu.edu/0201/genetic.html
You're going to like this read - http://ancestrybydna.com/AnneHart2.pdf
Skin pigmentation, biogeographical ancestry and admixture mapping
Mark D. Shriver (1), Esteban J. Parra (1, 7), Sonia Dios (1), Carolina Bonilla (1), Heather Norton (1), Celina Jovel (1), Carrie Pfaff (1), Cecily Jones (2), Aisha Massac (2), Neil Cameron (3), Archie Baron (3), Tabitha Jackson (3), George Argyropoulos (4), Li Jin (5), Clive J. Hoggart (6), Paul M. McKeigue (6) and Rick A. Kittles (2)
(1) Department of Anthropology, Penn State University, 409 Carpenter Bld., University Park, PA 16802, USA
(2) National Human Genome Center, Howard University, Washington, DC 20060, USA
(3) Takeaway Media, London, EC1R 0BD, UK
(4) Pennington Center for Biomedical Research, Baton Rouge, La., USA
(5) Department of Environmental Health, University of Cincinnati, Cincinnati, Ohio, USA
(6) Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, WC1E 7HT, UK
(7) Present address: Department of Anthropology, University of Toronto at Mississauga, Mississauga, Ontario, Canada
Abstract. Ancestry informative markers (AIMs) are genetic loci showing alleles with large frequency differences between populations. AIMs can be used to estimate biogeographical ancestry at the level of the population, subgroup (e.g. cases and controls) and individual. Ancestry estimates at both the subgroup and individual level can be directly instructive regarding the genetics of the phenotypes that differ qualitatively or in frequency between populations. These estimates can provide a compelling foundation for the use of admixture mapping (AM) methods to identify the genes underlying these traits. We present details of a panel of 34 AIMs and demonstrate how such studies can proceed, by using skin pigmentation as a model phenotype. We have genotyped these markers in two population samples with primarily African ancestry, viz. African Americans from Washington D.C. and an African Caribbean sample from Britain, and in a sample of European Americans from Pennsylvania. In the two African population samples, we observed significant correlations between estimates of individual ancestry and skin pigmentation as measured by reflectometry (R2=0.21, P<0.0001 for the African-American sample and R2=0.16, P<0.0001 for the British African-Caribbean sample). These correlations confirm the validity of the ancestry estimates and also indicate the high level of population structure related to admixture, a level that characterizes these populations and that is detectable by using other tests to identify genetic structure. We have also applied two methods of admixture mapping to test for the effects of three candidate genes (TYR, OCA2, MC1R) on pigmentation. We show that TYR and OCA2 have measurable effects on skin pigmentation differences between the west African and west European parental populations. This work indicates that it is possible to estimate the individual ancestry of a person based on DNA analysis with a reasonable number of well-defined genetic markers.
The implications and applications of ancestry estimates in biomedical research are discussed.
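The abstract does not spell out the estimator. As a rough illustration only, here is a minimal Python sketch of one common way to get a maximum-likelihood individual admixture proportion from AIM genotypes. It assumes two parental populations with known allele frequencies, independent markers and Hardy-Weinberg proportions; the function names and example frequencies are hypothetical, and this is not the code used by the Shriver group or DNAPrint.

```python
import math


def genotype_loglik(g, q):
    """Log-probability of carrying g copies (0, 1 or 2) of an allele with
    frequency q, assuming Hardy-Weinberg proportions."""
    q = min(max(q, 1e-9), 1 - 1e-9)  # keep q away from 0 and 1 to avoid log(0)
    return math.log(math.comb(2, g)) + g * math.log(q) + (2 - g) * math.log(1 - q)


def admixture_loglik(m, genotypes, freq_a, freq_b):
    """Log-likelihood of the genotypes if a fraction m of ancestry comes from
    parental population A and 1 - m from parental population B."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        q = m * pa + (1 - m) * pb  # expected allele frequency for this person
        total += genotype_loglik(g, q)
    return total


def estimate_admixture(genotypes, freq_a, freq_b, steps=1000):
    """Maximum-likelihood estimate of m by a simple grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: admixture_loglik(m, genotypes, freq_a, freq_b))


# Hypothetical panel of 10 AIMs: the allele is common in A and rare in B.
freq_a = [0.9] * 10
freq_b = [0.1] * 10
print(estimate_admixture([1] * 10, freq_a, freq_b))  # all heterozygous; prints 0.5
```

A grid search is used only to keep the sketch dependency-free; a real analysis would more likely use a numerical optimiser or a Bayesian method and would report a confidence interval alongside the point estimate.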
Electronic database information: URLs for the data in this article are as follows:
dbSNP web page, http://www.ncbi.nlm.nih.gov/SNP/
Shriver Lab web page, http://anthro.psu.edu/rsrch/biolab/index.html
McKeigue Lab web site, http://www.lshtm.ac.uk/eph/eu/GeneticEpidemiologyGroup.htm
OMIM web site, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
Characterizing Genome Structure of Admixed Populations for Disease Association Studies
Neil A. Hattangadi
HST MD 2005
David Altshuler, MD, PhD
Whitehead Institute Center for Genome Research
Massachusetts General Hospital, Molecular Biology Department
It is believed that common genetic variants play a significant role in susceptibility to common diseases, such as hypertension, cancer, and autoimmune disorders [1]. The difficulty of finding these common disease-associated polymorphisms arises because the disease phenotypes are complex, with a strong environmental component, and the genetic contribution is multiallelic with varying, incomplete penetrance [2].
The approach we are using to identify disease-associated polymorphisms is to analyze historically divergent populations with varying disease rates. A number of diseases show large differences in prevalence between European and African populations; for example, multiple sclerosis is much more common in Europeans and prostate cancer in Africans. We can search for markers that are associated with these diseases by studying the genomes of African-American individuals, which represent the admixture of European and African genomes. We compute the probability that a given chromosomal region of an individual African-American genome descends from European or African ancestry. Regions of the African-American genome which are enriched for European ancestry in patients with multiple sclerosis, but not in those without the disease, may be associated with MS; the converse is true for prostate cancer.
In this project, a quantitative genotyping methodology has been developed which permits rapid identification of ancestry-informative markers. We have also developed a Hidden Markov Model to measure the size of "ancestry blocks" – regions of the African-American genome of continuous ancestry due to linkage disequilibrium. We have found the ancestry blocks to be in excess of 15 Mb, suggesting far fewer markers are needed for admixture-based disease studies than conventional genetic association studies. We also use the HMM to compute the expected ancestral origin at any point on the admixed chromosome. The model is being applied to two large case-control populations for multiple sclerosis and prostate cancer to identify potential disease-associated loci.
[1] Lander, E.S. (1996) The new genomics: global views of biology. Science 274, 536-539.
[2] Altshuler, D., et al. (2000) Guilt by association. Nature Genetics 26, 135-137.
[3] Stephens, J.C., et al. (1994) Mapping by admixture linkage disequilibrium in human populations: limits and guidelines. Am. J. Hum. Genet. 55, 809-824.
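The Hattangadi abstract mentions a Hidden Markov Model for computing the expected ancestral origin along an admixed chromosome, but gives no details. The following is a minimal, simplified Python sketch of the general idea and not the model used in that project. Assumptions: a single phased chromosome, two hidden ancestry states (African, European), known allele frequencies in each parental population, and an ancestry switch between adjacent markers with probability 1 - exp(-T*d), where d is the genetic distance in Morgans and T the number of generations since admixture. The defaults eur_fraction=0.2 and generations=6 are purely illustrative.

```python
import math


def local_ancestry_posterior(alleles, dists_morgans, freq_afr, freq_eur,
                             eur_fraction=0.2, generations=6):
    """Posterior probability that each marker on one phased chromosome has
    European (rather than African) ancestry, via scaled forward-backward.

    alleles[i] is 0 or 1; dists_morgans[i] is the genetic distance from
    marker i-1 to marker i (dists_morgans[0] is unused).
    """
    prior = [1 - eur_fraction, eur_fraction]  # state 0 = African, 1 = European
    n = len(alleles)

    def emit(i, s):
        p = freq_afr[i] if s == 0 else freq_eur[i]
        return p if alleles[i] == 1 else 1 - p

    def trans(i, s_from, s_to):
        # With prob exp(-T*d) no ancestry switch occurs between markers i-1
        # and i; otherwise the new segment's ancestry is redrawn from the prior.
        keep = math.exp(-generations * dists_morgans[i])
        return keep * (s_from == s_to) + (1 - keep) * prior[s_to]

    def normalise(v):
        total = sum(v)
        return [x / total for x in v], total

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    first, c = normalise([prior[s] * emit(0, s) for s in (0, 1)])
    fwd, scale = [first], [c]
    for i in range(1, n):
        row = [emit(i, t) * sum(fwd[-1][s] * trans(i, s, t) for s in (0, 1))
               for t in (0, 1)]
        row, c = normalise(row)
        fwd.append(row)
        scale.append(c)

    # Backward pass, divided by the same scaling constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(trans(i + 1, s, t) * emit(i + 1, t) * bwd[i + 1][t]
                      for t in (0, 1)) / scale[i + 1]
                  for s in (0, 1)]

    # Combine and normalise to get P(European | all markers) at each marker.
    posterior = []
    for i in range(n):
        w = [fwd[i][s] * bwd[i][s] for s in (0, 1)]
        posterior.append(w[1] / (w[0] + w[1]))
    return posterior


# Hypothetical, highly informative markers 1 cM apart: the first five alleles
# look European and the last five look African.
post = local_ancestry_posterior([1] * 5 + [0] * 5, [0.01] * 10,
                                [0.01] * 10, [0.99] * 10)
print([round(p, 3) for p in post])
```

Runs of markers with high posterior for one state correspond to the "ancestry blocks" the abstract describes. Real admixed genomes are diploid and usually unphased, so a practical model would track three states (0, 1 or 2 European copies) rather than two.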
Using Genetic Admixture to Explain Racial Differences in Insulin-Related Phenotypes - - http://www-hsc.usc.edu/~goran/PDF%20papers/130.pdf
Genetic estimation of biogeographical ancestry. C.L. Pfaff, E.J. Parra, M.D. Shriver. Pennsylvania State University, University Park, PA.
Ethnicity comprises both biological and cultural components. Biogeographical ancestry (BGA) refers to the component of ethnicity that is biologically determined and can be estimated using genetic markers that have distinctive allele frequencies for the populations in question (referred to as population-associated alleles - PAAs). We have developed a method that uses a maximum likelihood (ML) approach to estimate the primary population source(s) of an unknown DNA sample. Once the potential source populations have been narrowed to the two or three populations with the highest log likelihood ratios (LLR), individual admixture proportions are estimated for the multilocus genotype observed in order to characterize the proportional ancestry of the sample.
We have explored the potential of this method by examining the multilocus genotypes of African, European, and African-American DNA samples using a panel of 10 PAAs that have high allele frequency differences between Africans and Europeans. In each of the 906 African and European samples BGA was correctly estimated using a maximum likelihood approach. In 863 cases the LLR for the estimation was > 3 (avg. LLR = 4.7), indicating a strong confidence in the estimation of ancestry. However, as expected, the ML estimate is less precise for African-American samples. In these cases the inaccurate and low-confidence estimates tend to be for individuals with relatively high admixture proportions, making population distinctions more difficult. A second source of inaccuracy is the relatively low number of informative markers currently available. In order to examine this cause, we simulated 2000 individuals with multilocus genotypes at 20 loci. Of these, ancestry estimation was correct in every case, and only 2 individuals had LLR <3. While the utility of this method is currently limited by the restricted number of PAAs available for various populations, it is clear that as larger numbers of ancestry-informative markers become available, estimation of BGA may become a powerful tool for the elucidation of an individual's genetic and population history, as well as the identification of unknown samples in forensic cases.
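The abstract describes the ML assignment step only in outline. Below is a minimal Python sketch of the general idea, not the authors' implementation. It assumes independent biallelic markers in Hardy-Weinberg equilibrium with known allele frequencies for each candidate population, and takes the LLR as the base-10 log-likelihood ratio between the best and second-best population (the abstract does not state the base). The function names and frequencies are hypothetical, and the follow-on admixture-proportion step is not shown.

```python
import math


def log10_genotype_prob(g, p):
    """log10 P(g copies of an allele with population frequency p), under HWE."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # avoid log10(0)
    return math.log10(math.comb(2, g)) + g * math.log10(p) + (2 - g) * math.log10(1 - p)


def assign_population(genotypes, pop_freqs):
    """Return (most likely population, LLR versus the runner-up).

    pop_freqs maps a population name to its per-marker allele frequencies;
    at least two populations are required.
    """
    scores = {
        name: sum(log10_genotype_prob(g, p) for g, p in zip(genotypes, freqs))
        for name, freqs in pop_freqs.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, scores[best] - scores[runner_up]


# Hypothetical 10-marker panel with large African/European frequency differences.
freqs = {"African": [0.85] * 10, "European": [0.15] * 10}
print(assign_population([2, 2, 1, 2, 2, 2, 1, 2, 2, 2], freqs))
```

Under this reading, an LLR above 3 means the best population is more than 1,000 times as likely as the runner-up. It also shows why admixed individuals are harder: their genotypes sit between the parental frequencies, so the two scores move closer together and the LLR shrinks.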
Genetic Identity testing - http://www.dnatesting.biz/Ancestry_Testing/ancestry_testing.html
Ancestry Mix May Be One Key To Obesity -
http://www.sciencedaily.com/releases/2003/07/030711092231.htm
Ancestry and Obesity and dnap -
http://www.sciencentral.com/articles/view.php3?language=english&type=article&article_id=2183...
Ancestry Mix May Be One Key to Obesity
Press Release - July 10, 2003
University Park, Pa. - Estimating proportions of ancestry may provide clues to genetic influences on obesity, osteoporosis and metabolism, and help public health professionals better educate populations, according to an international team of researchers.
"Studies show that women with European-American ancestry have a higher occurrence of osteoporosis than African-American women," says Dr. Mark D. Shriver, assistant professor of anthropology at Penn State. "African-American women have a higher prevalence of obesity, but it appears not to be as dangerous to their health as it is for European-American women."
Although socially many Americans identify with only one racial/ethnic group, U.S. residents are actually highly mixed. All human populations are closely related and there are very few genes that are different between any two populations. Despite this, there are significant health disparities in the United States with historically disadvantaged populations generally suffering higher prevalences of chronic diseases.
Although it is clear to most biomedical researchers that the primary reasons for this difference are differences in wealth, power and privilege, there may be a genetic basis for part of it. According to Shriver, finding these genes might help researchers better understand the progression of these diseases and could lead to new diagnoses and treatments.
The Penn State scientist has developed a method to estimate the ancestral proportions of individuals, the percentage of genes that are West African, Native American or European. Researchers can then use this information to investigate the relationships between admixture and various diseases.
In the study, the researchers looked at body mass index (a measure of weight relative to height), resting metabolic rate, fat mass, fat-free mass and bone mineral density in 145 African-American women from Birmingham, Ala., Baltimore and New York City. They also determined the West African ancestry proportion for these women. Determination of West African ancestry was done at Penn State, using 18 ancestry-informative genetic markers.
In the July issue of Obesity Research, the researchers report their analysis of the physical traits and the measure of West African ancestry using two statistical methods. The results showed that there is an association between body mass index -- a measure of obesity -- and West African ancestry. One analytical approach suggests there is also an association between fat mass and fat-free mass and West African ancestry, while the association with bone mineral density is less clear.
"These results support the use of ancestry informative markers when studying differences among admixed populations in complex biomedical traits, particularly when exploring genetic factors influencing these differences," says Dr. Jose Fernandez, assistant professor of nutrition, University of Alabama at Birmingham, and lead author of the published study. "The differences in the prevalence of obesity-related phenotypes among African American females and European American females could be partly due to genetic factors."
The comparison of resting metabolic rate did not show a significant association, but the researchers believe that the sample size was too small to make this an accurate measure.
Previous studies have explored the relationships between genetic admixture and the risk of systemic lupus erythematosus, and between genetic admixture and diabetes.
"This study supports two things. First, that we are a very mixed country and proportional ancestry is a much more realistic and scientific means to study human variation than is categorical race," says Shriver. "Second, that admixture mapping can and should be an important tool to study both the genetic and environmental causes of obesity."
Besides Shriver and Fernandez, the team included T. Mark Beasley, Nashua Rafla-Demetrious, Jeanine Albu, Roland L. Weinsier, David B. Allison, University of Alabama at Birmingham; Esteban Parra, Penn State; Barbara Nicklas, Wake Forest University and University of Maryland; Alice S. Ryan, University of Maryland and Paul M McKeigue and Clive L. Hoggart, London School of Hygiene and Tropical Medicine.
The researchers suggest that the study needs to be expanded to larger samples, with environmental measures, more markers and other admixed populations such as African Caribbeans.
NIH supported the study.
--- **aem**
Dr. Shriver is at 814-863-1078 or at mds17@psu.edu by e-mail. Dr. Fernandez is at 205-934-2029 or at jose@uab.edu by e-mail.
This press release courtesy of Penn State's Department of Public Information
IVRT, ya, that's a good one. I'll change it to wishbone. LOL
DougS, ya, I did forget to take my Flintstones today. You're right about Stocky and his training video, but it's not to get in shape unless you like the whip. LOL
Oh, I'm out of breath.
Tony, 100 employees in a year? I'll be happy with 50.
It's not just about money; it's about catching the bad guy.
Wait till the Mayor finds out how much of the Dept.'s money you spent on DNA testing only to come up with zip, when you could have made the Mayor and the Dept. look good by going to DNAPrint Genomics for a cheaper price and a much better profile. Man, are you going to be in trouble.
What!! You want to pay more and get less? Tell me, have you taken your Prozac today? DNAP is cheaper, you get more for the dollar, and DEALS.
Eye color & Hair color = $$$$$$$$$