GCTA: A Tool for Genome-wide Complex Trait Analysis

Am J Hum Genet. 2011 Jan 7; 88(1): 76–82.

PMCID: PMC3014363

PMID: 21167468

Jian Yang,1 S. Hong Lee,1 Michael E. Goddard,2,3 and Peter M. Visscher1

Abstract

For most human complex diseases and traits, SNPs identified by genome-wide association studies (GWAS) explain only a small fraction of the heritability. Here we report a user-friendly software tool called genome-wide complex trait analysis (GCTA), which was developed based on a method we recently developed to address the “missing heritability” problem. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait rather than testing the association of any particular SNP to the trait. We introduce GCTA's five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. We focus on the function of estimating the variance explained by all the SNPs on the X chromosome and testing the hypotheses of dosage compensation. The GCTA software is a versatile tool to estimate and partition complex trait variation with large GWAS data sets.

Main Text

Despite the great success of genome-wide association studies (GWAS), which have identified hundreds of SNPs conferring the genetic variation of human complex diseases and traits,1 the genetic architecture of human complex traits still remains largely unexplained. For most traits, the associated SNPs from GWAS only explain a small fraction of the heritability.2,3 There has not been any consensus on the explanation of the “missing heritability.” Possible explanations include a large number of common variants with small effects, rare variants with large effects, and DNA structural variation.2,4 We recently proposed a method of estimating the total amount of phenotypic variance captured by all SNPs on the current generation of commercial genotyping arrays and estimated that ∼45% of the phenotypic variance for human height can be explained by all common SNPs.5 Thus, most of the heritability for height is hiding rather than missing because of many SNPs with small effects.5,6 In contrast to single-SNP association analysis, the basic concept behind our method is to fit the effects of all the SNPs as random effects by a mixed linear model (MLM),

$$y = X\beta + Wu + \varepsilon \quad \text{with} \quad \operatorname{var}(y) = V = WW'\sigma_u^2 + I\sigma_\varepsilon^2,$$

(Equation 1)

where y is an n × 1 vector of phenotypes with n being the sample size, β is a vector of fixed effects such as sex, age, and/or one or more eigenvectors from principal component analysis (PCA), u is a vector of SNP effects with $u \sim N(0, I\sigma_u^2)$, I is an n × n identity matrix, and ɛ is a vector of residual effects with $\varepsilon \sim N(0, I\sigma_\varepsilon^2)$. W is a standardized genotype matrix with the ijth element $w_{ij} = (x_{ij} - 2p_i)/\sqrt{2p_i(1 - p_i)}$, where $x_{ij}$ is the number of copies of the reference allele for the ith SNP of the jth individual and $p_i$ is the frequency of the reference allele. If we define $A = WW'/N$ and define $\sigma_g^2$ as the variance explained by all the SNPs, i.e., $\sigma_g^2 = N\sigma_u^2$, with N being the number of SNPs, then Equation 1 will be equivalent to:7–9

$$y = X\beta + g + \varepsilon \quad \text{with} \quad V = A\sigma_g^2 + I\sigma_\varepsilon^2,$$

(Equation 2)

where g is an n × 1 vector of the total genetic effects of the individuals with $g \sim N(0, A\sigma_g^2)$, and A is interpreted as the genetic relationship matrix (GRM) between individuals. We can therefore estimate $\sigma_g^2$ by the restricted maximum likelihood (REML) approach,10 relying on the GRM estimated from all the SNPs. Here we report a versatile tool called genome-wide complex trait analysis (GCTA), which implements the method of estimating variance explained by all SNPs, and extend the method to partition the genetic variance onto each of the chromosomes and also to estimate the variance explained by the X chromosome and test for dosage compensation in females. We developed GCTA in five function domains: data management, estimation of the GRM from a set of SNPs, estimation of the variance explained by all the SNPs on a single chromosome or the whole genome, estimation of linkage disequilibrium (LD) structure, and simulation.

Estimation of the Genetic Relationship from Genome-wide SNPs

One of the core functions of GCTA is to estimate the genetic relationships between individuals from the SNPs. From the definition above, the genetic relationship between individuals j and k can be estimated by the following equation:

$$A_{jk} = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1 - p_i)}.$$

(Equation 3)
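As a minimal sketch of Equation 3, the following Python function computes the GRM from a small matrix of allele counts. The function name `grm`, the SNP-by-individual list layout, and the simplifications that allele frequencies are supplied and that no genotypes are missing are illustrative assumptions; this is not GCTA's own implementation.

```python
def grm(genotypes, freqs):
    """Genetic relationship matrix following Equation 3.

    genotypes[i][j] is the number of reference-allele copies (0, 1, or 2)
    for SNP i in individual j; freqs[i] is the reference-allele frequency p_i.
    Assumes complete genotypes and 0 < p_i < 1 (illustrative simplifications).
    """
    n_snps = len(genotypes)
    n_ind = len(genotypes[0])
    a = [[0.0] * n_ind for _ in range(n_ind)]
    for x_i, p in zip(genotypes, freqs):
        denom = 2.0 * p * (1.0 - p)
        centered = [x - 2.0 * p for x in x_i]
        # Accumulate the per-SNP contribution for the upper triangle.
        for j in range(n_ind):
            for k in range(j, n_ind):
                a[j][k] += centered[j] * centered[k] / denom
    # Average over SNPs and mirror to the lower triangle.
    for j in range(n_ind):
        for k in range(j, n_ind):
            a[j][k] /= n_snps
            a[k][j] = a[j][k]
    return a


# Toy data: 2 SNPs (rows) by 3 individuals (columns), both SNPs with p = 0.5.
toy_genotypes = [[0, 1, 2], [2, 1, 0]]
toy_freqs = [0.5, 0.5]
toy_grm = grm(toy_genotypes, toy_freqs)
```

In practice the frequencies would usually be estimated from the sample, and the double loop would be replaced by a matrix product for data sets of realistic size.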

We provide a function to iteratively exclude one individual of a pair whose relationship is greater than a specified cutoff value, e.g., 0.025, while retaining the maximum number of individuals in the data. For data collected from family or twin studies, we recommend that users estimate the genetic relationships with all of the autosomal SNPs and then use this option to exclude close relatives. The reason for exclusion is that the objective of the analysis is to estimate genetic variation captured by all the SNPs, just as GWAS does for single SNPs. Including close relatives, such as parent-offspring pairs and siblings, would result in the estimate of genetic variance being driven by the phenotypic correlations for these pairs (just as in pedigree analysis), and this estimate could be a biased estimate of total genetic variance, for example because of common environmental effects. Even if the estimate is not biased, its interpretation is different from the estimate from “unrelated” individuals: a pedigree-based estimator captures the contribution from all causal variants (across the entire allele frequency spectrum), whereas our method captures the contribution from causal variants that are in LD with the genotyped SNPs.
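The exclusion step can be sketched with a simple greedy heuristic: repeatedly drop the individual who appears in the largest number of pairs above the cutoff until no such pair remains. The paper does not spell out the exact procedure GCTA uses, so the function `prune_related` and its tie-breaking rule are assumptions made for illustration only.

```python
def prune_related(grm_matrix, cutoff=0.025):
    """Return indices of individuals kept after greedy relatedness pruning.

    Heuristic (an assumption, not necessarily GCTA's algorithm): at each
    step remove the individual involved in the most pairs whose estimated
    relationship exceeds `cutoff`; stop when no pair exceeds it.
    """
    keep = set(range(len(grm_matrix)))
    while keep:
        counts = {
            j: sum(1 for k in keep if k != j and grm_matrix[j][k] > cutoff)
            for j in keep
        }
        # Ties are broken by the larger index so the result is deterministic.
        worst = max(counts, key=lambda j: (counts[j], j))
        if counts[worst] == 0:
            break
        keep.remove(worst)
    return sorted(keep)


# Individual 0 is related to individuals 1 and 2; individual 3 is unrelated.
example_grm = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.01, 0.00],
    [0.25, 0.01, 1.00, 0.01],
    [0.00, 0.00, 0.01, 1.00],
]
kept = prune_related(example_grm, cutoff=0.025)
```

Removing the most connected individual first tends to retain more samples than removing an arbitrary member of each related pair, which is the stated goal of the option.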

As a by-product, we provide a function in GCTA to calculate the eigenvectors of the GRM, which are asymptotically equivalent to those from the PCA implemented in EIGENSTRAT11 because the GRM (Ajk) defined in GCTA is approximately half of the covariance matrix (Ψjk) used in EIGENSTRAT. The only purpose of developing this function is to calculate eigenvectors and then include them in the model as covariates to capture variance due to population structure. More sophisticated analyses of the population structure can be found in programs such as EIGENSTRAT11 and STRUCTURE.12

Estimation of the Variance Explained by Genome-wide SNPs by REML

The GRM estimated from the SNPs can be fitted subsequently in an MLM to estimate the variance explained by these SNPs via the REML method.10 Previously, we included only one genetic factor in the model. Here we extend the model in a general form as

$$y = X\beta + \sum_{i=1}^{r} g_i + \varepsilon,$$

where $g_i$ is a vector of random genetic effects, which could be the total genetic effects for the whole genome or for a single chromosome. In this model, the phenotypic variance ($\sigma_P^2$) is partitioned into the variance explained by each of the genetic factors and the residual variance,

$$V = \sum_{i=1}^{r} A_i\sigma_i^2 + I\sigma_\varepsilon^2,$$

where $\sigma_i^2$ is the variance of the ith genetic factor with its corresponding GRM, $A_i$.

In GCTA, we provide flexible options to specify different genetic models. For example:

(1) To estimate the variance explained by all autosomal SNPs, we can specify the model as $y = X\beta + g + \varepsilon$ with $V = A_g\sigma_g^2 + I\sigma_\varepsilon^2$, where g is an n × 1 vector of the aggregate effects of all the autosomal SNPs for all of the individuals and $A_g$ is the GRM estimated from these SNPs. This model is the same as Equation 2.

(2) To estimate the variance of genotype-environment interaction effects ($\sigma_{ge}^2$), we can specify the model as $y = X\beta + g + ge + \varepsilon$ with $V = A_g\sigma_g^2 + A_{ge}\sigma_{ge}^2 + I\sigma_\varepsilon^2$, where ge is a vector of genotype-environment interaction effects for all of the individuals with $A_{ge} = A_g$ for the pairs of individuals in the same environment and with $A_{ge} = 0$ for the pairs of individuals in different environments.

(3) To partition genetic variance onto each of the 22 autosomes, we can specify the model as $y = X\beta + \sum_{i=1}^{22} g_i + \varepsilon$ with $V = \sum_{i=1}^{22} A_i\sigma_i^2 + I\sigma_\varepsilon^2$, where $g_i$ is a vector of genetic effects attributed to the ith chromosome and $A_i$ is the GRM estimated from the SNPs on the ith chromosome.
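The genotype-environment relationship matrix in model (2) is a masked copy of the autosomal GRM. The short Python sketch below builds it from a GRM and one environment label per individual; the function name `gxe_grm` and the label coding are illustrative assumptions.

```python
def gxe_grm(a_g, environments):
    """Build A_ge from A_g as described for model (2).

    Entries are copied from a_g for pairs of individuals sharing an
    environment label and set to 0 for pairs in different environments.
    """
    n = len(a_g)
    return [
        [a_g[j][k] if environments[j] == environments[k] else 0.0 for k in range(n)]
        for j in range(n)
    ]


# Three individuals; the first two share environment "rural" (hypothetical labels).
a_g_example = [
    [1.01, 0.02, -0.01],
    [0.02, 0.99, 0.03],
    [-0.01, 0.03, 1.00],
]
a_ge_example = gxe_grm(a_g_example, ["rural", "rural", "urban"])
```

Diagonal entries are always retained because every individual shares an environment with itself.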

GCTA implements the REML method via the average information (AI) algorithm.13 In the REML iteration process, the estimates of variance components from the tth iteration are updated by

$$\theta^{(t+1)} = \theta^{(t)} + \left(AI^{(t)}\right)^{-1} \left.\frac{\partial L}{\partial \theta}\right|_{\theta^{(t)}},$$

where θ is a vector of variance components ($\sigma_1^2, \ldots, \sigma_r^2$ and $\sigma_\varepsilon^2$); L is the log likelihood function of the MLM (ignoring the constant),

$$L = -\tfrac{1}{2}\left(\log|V| + \log|X'V^{-1}X| + y'Py\right) \quad \text{with} \quad P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1};$$

AI is the average of the observed and expected information matrices,

$$AI = \frac{1}{2}\begin{bmatrix} y'PA_1PA_1Py & \cdots & y'PA_1PA_rPy & y'PA_1PPy \\ \vdots & \ddots & \vdots & \vdots \\ y'PA_rPA_1Py & \cdots & y'PA_rPA_rPy & y'PA_rPPy \\ y'PPA_1Py & \cdots & y'PPA_rPy & y'PPPy \end{bmatrix};$$

and ∂L/∂θ is a vector of first derivatives of the log likelihood function with respect to each variance component,13

$$\frac{\partial L}{\partial \theta} = -\frac{1}{2}\begin{bmatrix} \operatorname{tr}(PA_1) - y'PA_1Py \\ \vdots \\ \operatorname{tr}(PA_r) - y'PA_rPy \\ \operatorname{tr}(P) - y'PPy \end{bmatrix}.$$

At the beginning of the iteration process, all of the components are initialized by an arbitrary value, i.e., $\sigma_i^{2(0)} = \sigma_P^2/(r + 1)$, which is subsequently updated by the expectation maximization (EM) algorithm, $\sigma_i^{2(1)} = [\sigma_i^{4(0)} y'PA_iPy + \operatorname{tr}(\sigma_i^{2(0)}I - \sigma_i^{4(0)}PA_i)]/n$. The EM algorithm is used as an initial step to determine the direction of the iteration updates because it is robust to poor starting values. After one EM iteration, GCTA switches to the AI algorithm for the remaining iterations until the iteration converges with the criterion of $L^{(t+1)} - L^{(t)} < 10^{-4}$, where $L^{(t)}$ is the log likelihood of the tth iteration. In the iteration process, any component that escapes from the parameter space (i.e., its estimate is negative) will be set to $10^{-6} \times \sigma_P^2$. If a component keeps escaping from the parameter space, it will be constrained at $10^{-6} \times \sigma_P^2$.

From the REML analysis, GCTA has an option to provide the best linear unbiased prediction (BLUP) of the total genetic effect for all individuals. BLUP is widely used by plant and animal breeders to quantify the breeding value of individuals in artificial selection programs14 and also by evolutionary geneticists.15 Consider Equations 1 and 2, i.e., $y = X\beta + Wu + \varepsilon$ and $y = X\beta + g + \varepsilon$. Because these two models are mathematically equivalent,7–9 the BLUP of g can be transformed to the BLUP of u by $\hat{u} = W'A^{-1}\hat{g}/N$. Here the estimate of $u_i$ corresponds to the coefficient $w_{ij}$, which is then rescaled for the original $x_{ij}$ by $\hat{u}_i^{*} = \hat{u}_i/\sqrt{2p_i(1 - p_i)}$. We could obtain the BLUP of SNP effects in a discovery set by GCTA and predict genetic values of the individuals in a validation set ($\hat{g}_{new} = W_{new}\hat{u}$). For example, GCTA could be used to predict SNP effects in a discovery set, and the SNP effects could be used in PLINK to predict whole-genome profiles via the scoring approach in a validation set. If the predictions are unbiased, then the regression slope of the observed phenotypes on the predicted genetic values is 1.14 In that case, the genetic value calculated based on the BLUP of SNP effects is an unbiased predictor of the true genetic value in the validation set ($g_{new}$), in the sense that $E(g_{new} \mid \hat{g}_{new}) = \hat{g}_{new}$.16,17 Prediction analyses of human complex traits have demonstrated that many SNPs that do not pass the genome-wide significance level have substantial contribution to the prediction.18,19 This option is therefore useful for the whole-genome prediction analysis with all of the SNPs, irrespective of their association p values.
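The rescaling of SNP effects can be illustrated with a small Python sketch. It assumes that the rescaled effect of a SNP is its standardized-scale effect divided by the square root of 2p(1 − p), and shows that an allele-count score built from the rescaled effects differs from the standardized-genotype prediction only by a constant that is the same for every individual. The effect sizes and frequencies are hypothetical values, and the function names are not part of GCTA or PLINK.

```python
import math


def standardized_genotype(x, p):
    """w = (x - 2p) / sqrt(2p(1 - p)) for allele count x and frequency p."""
    return (x - 2.0 * p) / math.sqrt(2.0 * p * (1.0 - p))


def genetic_value(genotypes, freqs, u_hat):
    """Predicted genetic value as the sum of w_i * u_hat_i over SNPs."""
    return sum(
        standardized_genotype(x, p) * u for x, p, u in zip(genotypes, freqs, u_hat)
    )


def rescale_effects(freqs, u_hat):
    """Convert effects on the standardized scale to per-allele effects."""
    return [u / math.sqrt(2.0 * p * (1.0 - p)) for p, u in zip(freqs, u_hat)]


def allele_count_score(genotypes, u_star):
    """Scoring approach: sum of allele count times per-allele effect."""
    return sum(x * u for x, u in zip(genotypes, u_star))


snp_freqs = [0.2, 0.5, 0.35]          # hypothetical allele frequencies
snp_u_hat = [0.03, -0.01, 0.02]       # hypothetical BLUP SNP effects
snp_u_star = rescale_effects(snp_freqs, snp_u_hat)
# Constant shift shared by all individuals: sum of 2 * p_i * u*_i.
score_offset = sum(2.0 * p * u for p, u in zip(snp_freqs, snp_u_star))
```

Because the offset does not depend on the individual, rankings and regression slopes based on the allele-count score match those based on the standardized prediction.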

Estimation of the Variance Explained by the SNPs on the X Chromosome

The method of estimating the genetic relationship from the X chromosome is different to that for the autosomal SNPs, because males have only one X chromosome. We modified Equation 3 for the X chromosome as:

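$$A_{jk}^{M} = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_{ij}^{M} - p_i)(x_{ik}^{M} - p_i)}{p_i(1 - p_i)} \quad \text{for a male-male pair},$$

$$A_{jk}^{F} = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_{ij}^{F} - 2p_i)(x_{ik}^{F} - 2p_i)}{2p_i(1 - p_i)} \quad \text{for a female-female pair, and}$$

$$A_{jk}^{MF} = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_{ij}^{M} - p_i)(x_{ik}^{F} - 2p_i)}{\sqrt{2}\,p_i(1 - p_i)} \quad \text{for a male-female pair},$$

where $x_{ij}^{M}$ and $x_{ij}^{F}$ are the number of copies of the reference allele for an X chromosome SNP for a male and a female, respectively.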
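A compact way to sketch the X chromosome relationship is to standardize each genotype by the mean and standard deviation appropriate to the individual's sex (one allele copy for males, two for females) and then average the products over SNPs. The Python function below does this under the assumptions that males are coded 0 or 1, females 0, 1, or 2, and sex is given as "M" or "F"; the function name and coding are illustrative, not GCTA's interface.

```python
import math


def x_grm_entry(x_j, sex_j, x_k, sex_k, freqs):
    """X chromosome relationship between individuals j and k.

    x_j and x_k are lists of reference-allele counts over SNPs (0/1 for
    males, 0/1/2 for females); freqs holds the allele frequencies p_i.
    """

    def standardized(x, sex, p):
        copies = 1.0 if sex == "M" else 2.0
        # Mean is copies * p and variance is copies * p * (1 - p).
        return (x - copies * p) / math.sqrt(copies * p * (1.0 - p))

    total = sum(
        standardized(a, sex_j, p) * standardized(b, sex_k, p)
        for a, b, p in zip(x_j, x_k, freqs)
    )
    return total / len(freqs)


# Single SNP with p = 0.5: a male carrying the allele, a homozygous female.
mm = x_grm_entry([1], "M", [1], "M", [0.5])
ff = x_grm_entry([2], "F", [2], "F", [0.5])
mf = x_grm_entry([1], "M", [2], "F", [0.5])
```

Under this standardization the denominators are p(1 − p) for male-male pairs, 2p(1 − p) for female-female pairs, and √2·p(1 − p) for male-female pairs.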

Assuming the male-female genetic correlation to be 1, the X-linked phenotypic covariance between a pair of individuals is:20

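$$\operatorname{cov}_X(y_j^{M}, y_k^{M}) = E(A_{jk}^{M})\,\sigma_{X(M)}^2 \quad \text{for a male-male pair},$$

$$\operatorname{cov}_X(y_j^{F}, y_k^{F}) = E(A_{jk}^{F})\,\sigma_{X(F)}^2 \quad \text{for a female-female pair, and}$$

$$\operatorname{cov}_X(y_j^{M}, y_k^{F}) = E(A_{jk}^{MF})\,\sigma_{X(M)}\sigma_{X(F)} \quad \text{for a male-female pair},$$

where $\sigma_{X(M)}^2$ and $\sigma_{X(F)}^2$ are the genetic variance attributed to the X chromosome for males and females, respectively.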

The relative values of $\sigma_{X(M)}^2$ and $\sigma_{X(F)}^2$ depend on the assumption made regarding dosage compensation for X chromosome genes. There are two alleles per locus in females, but only one in males. If we assume that each allele has a similar effect on the trait (i.e., no dosage compensation), the genetic variance on the X chromosome for females is twice that for males: i.e., $\sigma_X^2 = \sigma_{X(F)}^2 = 2\sigma_{X(M)}^2$. Thus,

$$\operatorname{cov}_X(y_j^{M}, y_k^{M}) = \tfrac{1}{2}E(A_{jk}^{M})\,\sigma_X^2 \quad \text{for a male-male pair},$$

$$\operatorname{cov}_X(y_j^{F}, y_k^{F}) = E(A_{jk}^{F})\,\sigma_X^2 \quad \text{for a female-female pair, and}$$

$$\operatorname{cov}_X(y_j^{M}, y_k^{F}) = \sqrt{\tfrac{1}{2}}\,E(A_{jk}^{MF})\,\sigma_X^2 \quad \text{for a male-female pair}.$$

This can be implemented by redefining GRM for the X chromosome as $A_X^{ND} = \tfrac{1}{2}A_X$ for male-male pairs, $A_X^{ND} = A_X$ for female-female pairs, and $A_X^{ND} = \sqrt{1/2}\,A_X$ for male-female pairs. If we assume that each allele in females has only half the effect of an allele in males (i.e., full dosage compensation), the X-linked genetic variance for females is half that for males: i.e., $\sigma_X^2 = \sigma_{X(F)}^2 = \tfrac{1}{2}\sigma_{X(M)}^2$. Thus,

$$\operatorname{cov}_X(y_j^{M}, y_k^{M}) = 2E(A_{jk}^{M})\,\sigma_X^2 \quad \text{for a male-male pair},$$

$$\operatorname{cov}_X(y_j^{F}, y_k^{F}) = E(A_{jk}^{F})\,\sigma_X^2 \quad \text{for a female-female pair, and}$$

$$\operatorname{cov}_X(y_j^{M}, y_k^{F}) = \sqrt{2}\,E(A_{jk}^{MF})\,\sigma_X^2 \quad \text{for a male-female pair}.$$

Therefore, the raw $A_X$ matrix should be parameterized as $A_X^{FD} = 2A_X$ for male-male pairs, $A_X^{FD} = A_X$ for female-female pairs, and $A_X^{FD} = \sqrt{2}\,A_X$ for male-female pairs. The third possibility is to assume equal genetic variance on the X chromosome for males and females, i.e., $\sigma_X^2 = \sigma_{X(F)}^2 = \sigma_{X(M)}^2$, in which case the $A_X$ matrix is not redefined at all.

We can estimate $\sigma_X^2$ by fitting the model $y = X\beta + g_X + g + \varepsilon$, where $g_X$ is a vector of genetic effects attributable to the X chromosome, with $\operatorname{var}(g_X) = A_X^{ND}\sigma_X^2$ assuming no dosage compensation, $\operatorname{var}(g_X) = A_X^{FD}\sigma_X^2$ assuming full dosage compensation, and $\operatorname{var}(g_X) = A_X\sigma_X^2$ assuming equal X-linked genetic variance for males and females. Test of dosage compensation can be achieved by comparing the likelihoods of model fitting under the three assumptions.
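The three parameterizations of the X chromosome GRM can be sketched as a rescaling of each entry by a factor that depends on the sexes of the pair. In the Python sketch below, the male-female factor is taken to be the geometric mean of the male-male and female-female factors, which is what the covariance written as the product of the male and female standard deviations implies; the model labels "none", "full", and "equal" and the function name are assumptions made for illustration.

```python
import math

# Scaling factors per pair type under each dosage compensation assumption.
DOSAGE_FACTORS = {
    "none": {"MM": 0.5, "FF": 1.0, "MF": math.sqrt(0.5)},
    "full": {"MM": 2.0, "FF": 1.0, "MF": math.sqrt(2.0)},
    "equal": {"MM": 1.0, "FF": 1.0, "MF": 1.0},
}


def reparameterize_x_grm(a_x, sexes, model):
    """Rescale a raw X chromosome GRM for the chosen assumption.

    a_x is a square list of lists, sexes is a list of "M"/"F" labels, and
    model is one of the keys of DOSAGE_FACTORS.
    """
    factors = DOSAGE_FACTORS[model]
    n = len(a_x)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if sexes[j] == sexes[k]:
                pair = "MM" if sexes[j] == "M" else "FF"
            else:
                pair = "MF"
            out[j][k] = factors[pair] * a_x[j][k]
    return out


raw_a_x = [[1.0, 0.2], [0.2, 1.0]]
pair_sexes = ["M", "F"]
a_x_nd = reparameterize_x_grm(raw_a_x, pair_sexes, "none")
a_x_fd = reparameterize_x_grm(raw_a_x, pair_sexes, "full")
```

Fitting the mixed model once with each rescaled matrix and comparing the resulting likelihoods corresponds to the test of dosage compensation described above.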

Estimation of the Variance Explained by Genome-wide SNPs for a Case-Control Study

The methodology described above is also applicable for case-control data, for which the estimate of variance explained by the SNPs corresponds to variation on the observed 0–1 scale. Under the assumption of a threshold-liability model for a disease, i.e., disease liability on the underlying scale follows standard normal distribution,21 the estimate of variance explained by the SNPs on the observed 0–1 scale can be transformed to that on the unobserved continuous liability scale by a linear transformation.22 The relationship between additive genetic variance on the observed 0–1 and unobserved liability scales was proposed more than a half century ago,23,24 and we recently extended this transformation to account for ascertainment bias in a case-control study, i.e., a much higher proportion of cases in the sample than in the general population (unpublished data). We provide options in GCTA to analyze a binary trait and to transform the estimate on the 0–1 scale to that on the liability scale with an adjustment for ascertainment bias. There is an important caveat in applying the methods described herein to case-control data. Any batch, plate, or other technical artifact that causes allele frequencies between case and control on average to be more different than that under the null hypothesis stating that the samples come from the same population will contribute to the estimation of spurious genetic variation, because cases will appear to be more related to other cases than to controls. Therefore, stringent quality control is essential when applying GCTA to case-control data. Quantitative traits are less likely to suffer from technical genotyping artifacts because they will generally not lead to spurious association between continuous phenotypes and genotypes.

Estimation of the Inbreeding Coefficient from Genome-wide SNPs

Apart from estimating the genetic relatedness between individuals, GCTA also has a function to estimate the inbreeding coefficient (F) from SNP data, i.e., the relationship between haplotypes within an individual. Two estimates have been used: one based on the variance of additive genetic values (diagonal of the SNP-derived GRM) and the other based on SNP homozygosity (implemented in PLINK).25 Let $(1 - p_i)^2 + p_i(1 - p_i)F$, $2p_i(1 - p_i)(1 - F)$, and $p_i^2 + p_i(1 - p_i)F$ be the frequencies of the three genotypes of a SNP i and let $h_i = 2p_i(1 - p_i)$. The estimate based on the variance of additive genotype values is

$$\hat{F}_i^{I} = [x_i - E(x_i)]^2/h_i - 1 = (x_i - 2p_i)^2/h_i - 1 \quad \text{and} \quad \operatorname{var}(\hat{F}_i^{I} \mid F) = (1 - h_i)/h_i + 7(1 - 2h_i)F/h_i - F^2,$$

where $x_i$ is the number of copies of the reference allele for the ith SNP. This is a special case of Equation 3 for a single SNP when j = k. The estimate based upon excess homozygosity is

$$\hat{F}_i^{II} = [O(\#hom) - E(\#hom)]/[1 - E(\#hom)] = 1 - x_i(2 - x_i)/h_i \quad \text{and} \quad \operatorname{var}(\hat{F}_i^{II} \mid F) = (1 - h_i)/h_i - (1 - 2h_i)F/h_i - F^2,$$

where O(#hom) and E(#hom) are the observed and expected number of homozygous genotypes in the sample, respectively. Both estimators are unbiased estimates of F in the sense that $E(\hat{F}_i^{I} \mid F) = E(\hat{F}_i^{II} \mid F) = F$, but their sampling variances are dependent on allele frequency, i.e., $\operatorname{var}(\hat{F}_i^{I}) = \operatorname{var}(\hat{F}_i^{II}) = (1 - h_i)/h_i$ if F = 0. In addition, the covariance between the two estimators is $(3h_i - 1)/h_i + (1 - 2h_i)F/h_i - F^2$, so that the sampling covariance between the estimators is $(3h_i - 1)/h_i$ and the sampling correlation is $(3h_i - 1)/(1 - h_i)$ when F = 0. We proposed an estimator based upon the correlation between uniting gametes:5

$$\hat{F}_i^{III} = [x_i^2 - (1 + 2p_i)x_i + 2p_i^2]/h_i \quad \text{and} \quad \operatorname{var}(\hat{F}_i^{III} \mid F) = 1 + 2(1 - 2h_i)F/h_i - F^2.$$

$\hat{F}_i^{III}$ is also an unbiased estimator of F in the sense that $E(\hat{F}_i^{III} \mid F) = F$. If F = 0, $\operatorname{var}(\hat{F}_i^{III}) = 1$ regardless of allele frequency, which is smaller than the sampling variance of $\hat{F}_i^{I}$ and $\hat{F}_i^{II}$, i.e., $1 \le (1 - h_i)/h_i$. When 0 < F < 1/3, $\hat{F}_i^{III}$ also has a smaller variance than $\hat{F}_i^{I}$ and $\hat{F}_i^{II}$. In GCTA, we use $1 + \hat{F}_i^{III}$ rather than $1 + \hat{F}_i^{I}$ to calculate the diagonal of the GRM. For multiple SNPs, we average the estimates over all of the SNPs, i.e., $\hat{F} = \frac{1}{N}\sum_{i=1}^{N}\hat{F}_i$.
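The properties of the three single-SNP estimators can be checked by exact enumeration over the three genotypes. The Python sketch below computes the mean and variance of each estimator from the genotype frequencies given above; the function names and the example values p = 0.3 and F = 0.1 are illustrative choices.

```python
def genotype_probs(p, f):
    """Frequencies of genotypes with 0, 1, and 2 reference alleles."""
    pq = p * (1.0 - p)
    return {0: (1.0 - p) ** 2 + pq * f, 1: 2.0 * pq * (1.0 - f), 2: p ** 2 + pq * f}


def f_hat_1(x, p):
    """Estimator based on the variance of additive genotype values."""
    h = 2.0 * p * (1.0 - p)
    return (x - 2.0 * p) ** 2 / h - 1.0


def f_hat_2(x, p):
    """Estimator based on excess homozygosity."""
    h = 2.0 * p * (1.0 - p)
    return 1.0 - x * (2.0 - x) / h


def f_hat_3(x, p):
    """Estimator based on the correlation between uniting gametes."""
    h = 2.0 * p * (1.0 - p)
    return (x ** 2 - (1.0 + 2.0 * p) * x + 2.0 * p ** 2) / h


def mean_and_variance(estimator, p, f):
    """Exact mean and variance of a single-SNP estimator given p and F."""
    probs = genotype_probs(p, f)
    mean = sum(pr * estimator(x, p) for x, pr in probs.items())
    second = sum(pr * estimator(x, p) ** 2 for x, pr in probs.items())
    return mean, second - mean ** 2


p_example, f_example = 0.3, 0.1
h_example = 2.0 * p_example * (1.0 - p_example)
moments = {
    name: mean_and_variance(est, p_example, f_example)
    for name, est in (("I", f_hat_1), ("II", f_hat_2), ("III", f_hat_3))
}
```

For these example values all three means equal F, and the variances agree with the closed-form expressions given in the text.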

Estimating LD Structure

In a standard GWAS, particularly with a large sample size, the mean (λmean) or median (λmedian) of the test statistics for single-SNP associations often deviates from its expected value under the null hypothesis of no association between any SNP and the phenotype, which is usually interpreted as the effect due to population stratification and/or cryptic relatedness.11,26,27 An alternative explanation is that polygenic variation causes the observed inflated test statistic.18 To predict the genomic inflation factors, λmean and λmedian, from polygenic parameters such as the total amount of variance that is explained by all SNPs, we need to quantify the LD structure between SNPs and putative causal variants (unpublished data). GCTA provides a function to search for all the SNPs in LD with the “causal variants” (mimicked by a set of SNPs chosen by the user). Given a causal variant, we use simple regression to test for SNPs in LD with the causal variant within d Mb distance in either direction. PLINK has an option (“show targets”) to select SNPs in LD with a set of target SNPs with LD r2 larger than a user-specified cutoff value. This function is very useful to distinguish independent association signals but less suited to predict λmean and λmedian, because the test statistics of the SNPs in modest LD with causal variants (SNPs at Mb distance with low r2) will also be inflated to a certain extent, and these test statistics will contribute to the genomic inflation factors.

GWAS Simulation

We provided a function to simulate GWAS data based on the observed genotype data. For a quantitative trait, the phenotypes are simulated by the simple additive genetic model $y = Wu + \varepsilon$, where the notation is the same as above. Given a set of SNPs assigned as causal variants, the effects of the causal variants are generated from a standard normal distribution, and the residual effects are generated from a normal distribution with mean of 0 and variance of $\sigma_g^2(1/h^2 - 1)$, where $\sigma_g^2$ is the empirical variance of Wu and $h^2$ is the user-specified heritability. For a case-control study, assuming a threshold-liability model, disease liabilities are simulated in the same way as that for the phenotypes of a quantitative trait. Any individual with disease liability exceeding a certain threshold T is assigned to be a case and a control otherwise, where T is the threshold of normal distribution truncating the proportion of K (disease prevalence). The only purpose of this function is to do a simple simulation based on the observed genotype data. More complicated simulation can be performed with programs such as ms,28 GENOME,29 FREGENE,30 and HAPGEN.31
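The quantitative-trait simulation can be sketched in a few lines of Python. The sketch draws causal effects from a standard normal distribution, computes the genetic values, and adds residuals whose variance is the empirical genetic variance times (1/h² − 1). Because no observed data are available here, the standardized genotypes are themselves randomly generated, which is an assumption of this example; the function names are hypothetical and this is not GCTA's code.

```python
import math
import random


def random_standardized_genotypes(n_ind, freqs, rng):
    """Draw allele counts as two Bernoulli(p) alleles and standardize them."""
    rows = []
    for _ in range(n_ind):
        row = []
        for p in freqs:
            x = (rng.random() < p) + (rng.random() < p)
            row.append((x - 2.0 * p) / math.sqrt(2.0 * p * (1.0 - p)))
        rows.append(row)
    return rows


def simulate_quantitative_trait(w, h2, rng):
    """Simulate y = Wu + e for standardized causal genotypes w[j][i].

    Returns the phenotypes, the genetic values, and the empirical
    variance of the genetic values.
    """
    n_ind = len(w)
    u = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    g = [sum(w_ji * u_i for w_ji, u_i in zip(row, u)) for row in w]
    mean_g = sum(g) / n_ind
    var_g = sum((v - mean_g) ** 2 for v in g) / (n_ind - 1)
    # Residual variance chosen so that var_g / (var_g + var_e) equals h2.
    resid_sd = math.sqrt(var_g * (1.0 / h2 - 1.0))
    y = [g_j + rng.gauss(0.0, resid_sd) for g_j in g]
    return y, g, var_g


sim_rng = random.Random(2011)
sim_w = random_standardized_genotypes(50, [0.2, 0.5, 0.3], sim_rng)
sim_y, sim_g, sim_var_g = simulate_quantitative_trait(sim_w, 0.5, sim_rng)
```

With a heritability of 1 the residual variance is zero and the phenotypes equal the genetic values, which gives a quick sanity check on the scaling.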

Data Management

We chose the PLINK25 compact binary file format (.bed, .bim, and .fam) as the input data format for GCTA because of its popularity in the genetics community and its efficiency of data storage. For the imputed dosage data, we use the output files of the imputation program MACH32 (.mldose.gz and .mlinfo.gz) as the inputs for GCTA. For the convenience of analysis, we provide options to extract a subset of individuals and/or SNPs and to filter SNPs based on certain criteria, such as chromosome position, minor allele frequency (MAF), and imputation R2 (for the imputed data). However, we do not provide functions for a thorough quality control (QC) of the data, such as Hardy-Weinberg equilibrium test and missingness, because these functions have been well developed in many other genetic analysis packages, e.g., PLINK, GenABEL,33 and SNPTEST.34 We assume that the data have been cleaned by a standard QC process before entering into GCTA.

Estimating Total Heritability

The method implemented in GCTA is to estimate the variance explained by chromosome- or genome-wide SNPs rather than the trait heritability. Estimating the heritability (i.e., variance explained by all the causal variants), however, relies on the genetic relationship at causal variants that is predicted with error by the genetic relationship derived from the SNPs as a result of imperfect tagging. We have previously established that the prediction error is c + 1 / N, with c depending on the distribution of the MAF of causal variants. We therefore developed a method based on simple regression to correct for the prediction error by

$$A_{jk}^{*} = \begin{cases} 1 + \beta(A_{jj} - 1), & j = k \\ \beta A_{jk}, & j \neq k, \end{cases}$$

where $\beta = 1 - (c + 1/N)/\operatorname{var}(A_{jk})$. The estimate of variance explained by all of the SNPs after such adjustment is an unbiased estimate of heritability only if the assumption about the MAF distribution of causal variants is correct.
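The adjustment can be sketched as a shrinkage of the GRM toward the identity matrix. In the Python sketch below, var(Ajk) is taken to be the sample variance of the off-diagonal elements of the GRM, which is an assumption of this example, and the value of c is supplied by the caller because it depends on the assumed MAF distribution of causal variants. The function name `adjust_grm` is hypothetical.

```python
def adjust_grm(a, c, n_snps):
    """Shrink a GRM to correct for imperfect tagging of causal variants.

    beta = 1 - (c + 1/N) / var(A_jk), with var(A_jk) computed here as the
    sample variance of the off-diagonal elements (an assumption).
    Diagonal entries become 1 + beta * (A_jj - 1); others become beta * A_jk.
    """
    n = len(a)
    off_diag = [a[j][k] for j in range(n) for k in range(j)]
    mean = sum(off_diag) / len(off_diag)
    var = sum((v - mean) ** 2 for v in off_diag) / (len(off_diag) - 1)
    beta = 1.0 - (c + 1.0 / n_snps) / var
    adjusted = [
        [1.0 + beta * (a[j][j] - 1.0) if j == k else beta * a[j][k] for k in range(n)]
        for j in range(n)
    ]
    return adjusted, beta


small_grm = [
    [1.02, 0.01, -0.01],
    [0.01, 0.98, 0.03],
    [-0.01, 0.03, 1.00],
]
# c = 0 is used only to keep the arithmetic simple; it is not a recommended value.
adjusted_grm, beta_hat = adjust_grm(small_grm, c=0.0, n_snps=10000)
```

In this toy case the off-diagonal variance is 0.0004 and 1/N is 0.0001, so the shrinkage factor is 0.75.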

Efficiency of GCTA Computing Algorithm

GCTA implements the REML method based on the variance-covariance matrix V and the projection matrix P. In some of the mixed model analysis packages, such as ASREML,35 to avoid the inversion of the n × n V matrix, people usually use Gaussian elimination of the mixed model equations (MME) to obtain the AI matrix based on sparse matrix techniques. The SNP-derived GRM matrix, however, is typically dense, so the sparse matrix technique will bring an extra cost of memory and CPU time. Moreover, the dimension of MME depends on the number of random effects in the model, whereas the V matrix does not. For example, when fitting the 22 chromosomes simultaneously in the model, the dimension of MME is 22n × 22n (ignoring the fixed effects), whereas the dimension of V matrix is still n × n. We compared the computational efficiency of GCTA and ASREML. When the sample size is small, e.g., n < 3000, both GCTA and ASREML take a few minutes to run. When the sample size is large, e.g., n > 10,000, especially when fitting multiple GRMs, it takes days for ASREML to finish the analysis, whereas GCTA needs only a few hours.

System Requirements

We have released executable versions of GCTA for the three major operating systems: MS Windows, Linux/Unix, and Mac OS. We have also released the source code so that users can compile it for specific platforms. GCTA requires a large amount of memory when calculating the GRM or performing an REML analysis with multiple genetic components. For example, it requires ∼4.8 GB memory to calculate the GRM for a data set with 3925 individuals genotyped by 294,831 SNPs, and it takes ∼4 CPU hours (AMD Opteron 2.8 GHz) to finish the computation. We therefore recommend using the 64-bit version of GCTA for large memory support.

Nonadditive Genetic Variance

The analysis approach we have adapted is a logical extension of estimation methods based on pedigrees. It allows estimation of additive genetic variation that is captured by SNP arrays and is therefore informative with respect to the genetic architecture of complex traits. The estimate of variance captured by all of the SNPs obtained in GCTA is directly comparable to the heritability estimated from pedigree analysis in family and twin studies, as well as the variance explained by GWAS hits, so that missing and hiding heritability can be quantified.5 Other sources of genetic variations such as dominance, gene-gene interaction, and gene-environment interaction are also important for complex trait variation but are less relevant to the “missing heritability” problem if the total heritability refers to the narrow-sense heritability, i.e., the proportion of phenotypic variance due to additive genetic variance. The current version of GCTA only provides functions to estimate and partition the variances of additive and additive-environment interaction effects. It is technically feasible to extend the analysis to include dominance and/or gene-gene interaction effects in the future. However, the power to detect the high-order genetic variation will be limited, i.e., the sampling variance of estimated variance components will be very large. Future developments will also include options to do multivariate analyses, to read genotype or imputed probability data in different formats, and to implement other applications of whole-genome or chromosome segment approaches.

In summary, we have developed a versatile tool to estimate genetic relationships from genome-wide SNPs that can subsequently be used to estimate variance explained by SNPs via a mixed model approach. We provide flexible options to specify different genetic models to partition genetic variance onto each of the chromosomes. We developed methods to estimate genetic relationships from the SNPs on the X chromosome and to test the hypotheses of dosage compensation. GCTA is not limited to the analysis of data on human complex traits, but in this report we only use examples and specifications (e.g., the number of autosomes) for humans.

Acknowledgments

We thank Bruce Weir for discussions on the sampling variance of estimators of inbreeding coefficients. We thank Allan McRae and David Duffy for discussions and Anna Vinkhuyzen for software testing. We acknowledge funding from the Australian National Health and Medical Research Council (grants 389892 and 613672) and the Australian Research Council (grants DP0770096 and DP1093900).

Web Resources

The URLs for data presented herein are as follows:

References

1. Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA. 2009;106:9362–9367.

2. Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.

3. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456:18–21.

4. Eichler E.E., Flint J., Gibson G., Kong A., Leal S.M., Moore J.H., Nadeau J.H. Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 2010;11:446–450.

5. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569.

6. Gibson G. Hints of hidden heritability in GWAS. Nat. Genet. 2010;42:558–560.

7. Hayes B.J., Visscher P.M., Goddard M.E. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 2009;91:47–60.

8. Strandén I., Garrick D.J. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J. Dairy Sci. 2009;92:2971–2975.

9. VanRaden P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008;91:4414–4423.

10. Patterson H.D., Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554.

11. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909.

12. Falush D., Stephens M., Pritchard J.K. Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587.

13. Gilmour A.R., Thompson R., Cullis B.R. Average information REML: An efficient algorithm for variance parameters estimation in linear mixed models. Biometrics. 1995;51:1440–1450.

14. Henderson C.R. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–447.

15. Kruuk L.E. Estimating genetic parameters in natural populations using the “animal model”. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2004;359:873–890.

16. Goddard M.E., Wray N.R., Verbyla K., Visscher P.M. Estimating effects and making predictions from genome-wide marker data. Stat. Sci. 2009;24:517–529.

17. de Los Campos G., Gianola D., Allison D.B. Predicting genetic predisposition in humans: The promise of whole-genome markers. Nat. Rev. Genet. 2010;11:880–886.

18. Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O'Donovan M.C., Sullivan P.F., Sklar P., International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752.

19. Lango Allen H., Estrada K., Lettre G., Berndt S.I., Weedon M.N., Rivadeneira F., Willer C.J., Jackson A.U., Vedantam S., Raychaudhuri S. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838.

20. Kent J.W., Jr., Dyer T.D., Blangero J. Estimating the additive genetic effect of the X chromosome. Genet. Epidemiol. 2005;29:377–388.

21. Lynch M., Walsh B. Sinauer Associates; Sunderland, MA: 1998. Genetics and Analysis of Quantitative Traits.

22. Falconer D.S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 1965;29:51–76.

23. Dempster E.R., Lerner I.M. Heritability of threshold characters. Genetics. 1950;35:212–236.

24. Robertson A., Lerner I.M. The heritability of all-or-none traits; viability of poultry. Genetics. 1949;34:395–411.

25. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575.

26. Campbell C.D., Ogburn E.L., Lunetta K.L., Lyon H.N., Freedman M.L., Groop L.C., Altshuler D., Ardlie K.G., Hirschhorn J.N. Demonstrating stratification in a European American population. Nat. Genet. 2005;37:868–872.

27. Cardon L.R., Palmer L.J. Population stratification and spurious allelic association. Lancet. 2003;361:598–604.

28. Hudson R.R. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology. 1990;7:1–44.

29. Liang L., Zöllner S., Abecasis G.R. GENOME: A rapid coalescent-based whole genome simulator. Bioinformatics. 2007;23:1565–1567.

30. Hoggart C.J., Chadeau-Hyam M., Clark T.G., Lampariello R., Whittaker J.C., De Iorio M., Balding D.J. Sequence-level population simulations over large genomic regions. Genetics. 2007;177:1725–1731.

31. Spencer C.C., Su Z., Donnelly P., Marchini J. Designing genome-wide association studies: Sample size, power, imputation, and the choice of genotyping chip. PLoS Genet. 2009;5:e1000477.

32. Li Y., Abecasis G.R. Mach 1.0: Rapid Haplotype Reconstruction and Missing Genotype Inference. Am. J. Hum. Genet. 2006;S79:2290.

33. Aulchenko Y.S., Ripke S., Isaacs A., van Duijn C.M. GenABEL: An R library for genome-wide association analysis. Bioinformatics. 2007;23:1294–1296.

34. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.

35. Gilmour A.R., Gogel B.J., Cullis B.R., Thompson R. VSN International; Hemel Hempstead, UK: 2006. ASReml User Guide Release 2.0.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
