Challenges and Opportunities for Genome-Wide Association Studies

Written by Jason Shen on 5/28/2009 for Bio 209, a grad level course at Stanford University


Ever since genes were discovered as the building blocks of life, scientists have tried to understand the role genetics plays in human health. In recent years the concept of "personalized medicine" has gone mainstream, with companies like 23andMe providing customers with genetic information and how it relates to their future health. Much of their data relies on genome-wide association studies (GWAS), which look at genetic data from thousands of individuals to find associations with various diseases and traits. GWAS have gained significant visibility since 2005, when their use in the scientific community became widespread. This paper seeks to elaborate on what GWAS hoped to achieve, what they have provided the fields of genetics and medicine, and what challenges and opportunities lie ahead for them.


Beginning in the 1980s, geneticists harnessed the natural variation in human populations to identify genetic markers for many rare Mendelian diseases, such as cystic fibrosis and Huntington's disease. These and many others were found to have a single mutation or set of mutations that led to the disease. But many common diseases do not have such simple genetic causes or indicators.

As early as 1996, it was suggested that sequencing many individuals and mining that data for correlations could provide the key to understanding these complex, common diseases (1). Even before the Human Genome Project was completed in 2003, work began on the HapMap Project to grasp the scope of human variation across Nigerians, Japanese, Chinese, and European Americans. The result was a catalog of roughly 10 million single nucleotide polymorphisms (SNPs), commonly occurring variations in the human genome, which could be clustered into about 300,000 genetic markers or loci (2). The common-variant hypothesis suggested that most of the genetic influences for a disease would be found in these 300k markers.

Small-scale GWAS began in 2005, and the first major GWAS, a search for Type 2 Diabetes variants, was published in Nature. By 2009, data from over 100 GWAS had been published, yielding more than 250 loci of significance for various genetic traits and diseases (3). In the following paragraphs, we'll take a look at three different GWAS that searched for loci associated with Type 2 Diabetes, psoriasis, and warfarin sensitivity.

Influential Papers

Type 2 Diabetes – In 2007, Scott et al. conducted a GWAS on Type 2 Diabetes. They found one new variant in an intergenic region of chromosome 11p12 and confirmed that variants near 6 other genes (including the first to be identified, TCF7L2) were associated with Type 2 Diabetes (4). They examined 1161 Finnish Type 2 Diabetes cases and 1174 non-cases, genotyping over 315k SNPs and using the HapMap to impute an additional 2M autosomal SNPs.

As with most GWAS, individual genetic data were collected for two stages of testing. Stage 1 data were scrubbed for quality control and then analyzed for genetic association with Type 2 Diabetes. A number of SNPs showed a statistically significant association (p value less than 5 × 10^-8) with Type 2 Diabetes, even after controlling for sex, age category, birthplace, BMI, waist, and systolic blood pressure. Those SNPs were then tested in Stage 2 data to, in effect, back-test the variants against new data, and were again found to be associated with Type 2 Diabetes. The associated regions were then examined in more detail to identify more exact locations of the genes targeted.
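The two-stage logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Scott et al.: it uses a simple allelic chi-square test with no covariate adjustment, the SNP names and allele counts are invented for the example, and the function names are hypothetical. The 5 × 10^-8 threshold is treated here as a Bonferroni correction of 0.05 for roughly one million independent tests, which is a common convention and an assumption of this sketch.

```python
import math

# Assumed convention: 0.05 corrected for ~1 million independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 degree of freedom the upper tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cases = case_alt + case_ref
    ctrls = ctrl_alt + ctrl_ref
    alts = case_alt + ctrl_alt
    refs = case_ref + ctrl_ref
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross ** 2 / (cases * ctrls * alts * refs)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio; values above 1 mean the allele is enriched in cases."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def replicated_snps(stage1, stage2, stage1_alpha=GENOME_WIDE_ALPHA):
    """Return SNPs significant in stage 1 that replicate in stage 2.

    Stage 2 uses a Bonferroni threshold over the SNPs carried forward
    and requires the same direction of effect as in stage 1.
    """
    carried = [s for s, c in stage1.items()
               if allelic_chi_square(*c)[1] < stage1_alpha]
    if not carried:
        return []
    stage2_alpha = 0.05 / len(carried)
    hits = []
    for snp in carried:
        p2 = allelic_chi_square(*stage2[snp])[1]
        same_direction = (odds_ratio(*stage1[snp]) > 1) == (odds_ratio(*stage2[snp]) > 1)
        if p2 < stage2_alpha and same_direction:
            hits.append(snp)
    return hits


# Invented allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
stage1 = {
    "snp_A": (800, 1520, 620, 1722),  # strong signal
    "snp_B": (500, 1820, 350, 1992),  # strong signal that will not replicate
    "snp_C": (700, 1620, 700, 1642),  # no signal
}
stage2 = {
    "snp_A": (760, 1540, 640, 1700),
    "snp_B": (400, 1900, 405, 1935),
}

hits = replicated_snps(stage1, stage2)
print(hits)
```

In this toy data set only snp_A survives both stages. A real analysis would typically fit a regression model so that covariates such as sex, age, and BMI can be controlled for, as the study described above did.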

The authors point out that these results are significant because a total of 10 separate loci had now been confirmed as associated with Type 2 Diabetes, a disease once called "the geneticist's nightmare". The authors were confident that the genes identified thus far would lead to further understanding of the pathways that cause Type 2 Diabetes.

Psoriasis – Psoriasis is a disease that causes red scaly patches on skin and is much less common than Type 2 Diabetes. A paper published by Michelle Cargill and others in 2007 looked at 1446 Americans with psoriasis and 1432 controls, scanning roughly 250k SNPs for gene/disease associations (5). After following essentially the same steps as Scott did for Type 2 Diabetes (two stages, data scrubbing, statistical analysis, etc.), the authors confirmed IL12B and identified IL23R as genes associated with psoriasis. These results indicated that the IL-23 pathway might be an appropriate target for intervention and, combined with other results on inflammatory bowel disease, suggest “common genetic variants that contribute to general immune dysregulation” (5).

Warfarin Sensitivity – Warfarin is a drug used to prevent stroke and other circulatory problems, and GWAS have been used to help researchers understand the genetic basis of the variability in dose requirement. Among Caucasians taking warfarin, the required dose varies over a 20-fold range. One paper looked at 1,053 Swedish patients and found that SNPs near VKORC1, CYP2C9, and CYP4F2 together accounted for 42% of the variability in warfarin sensitivity (6). The first two genes were already known markers of sensitivity, but the third was a new discovery. This paper suggested that using genetic testing to guide warfarin dose prescriptions would be a useful policy, and also that GWAS can provide valuable information about patient-drug interactions. Those interactions may well turn out to be the most important things to come out of GWAS, and may relieve some of the disappointment surrounding these studies.

Current Realities

The goal of GWAS was two-fold: 1) Illuminate genetic markers that would lead to disease and trait mechanisms and 2) Identify a significant portion of the basis for disease and trait variability and aid with disease prediction and prevention. Many researchers believe that while GWAS has been a great technical achievement, it has failed to live up to these promises.

David Goldstein finds it troubling that “most common gene variants that are implicated by such studies are responsible for only a small fraction of the genetic variation that we know exists” (7). He points out that the top seven genetic markers for Type 2 Diabetes account for only a fraction of the total variability. The risk variant of TCF7L2, the strongest known indicator, is associated with a sibling relative risk of just 1.02, while siblings of a Type 2 Diabetic in general are 3x more likely to have the condition.
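The gap between these two figures follows from how weakly a single common variant of modest effect clusters in families. The sketch below assumes a multiplicative single-locus model, under which the sibling relative risk is (1 + w/2)^2 with w = pq(γ − 1)^2 / (pγ + q)^2, where p is the risk allele frequency, q = 1 − p, and γ is the per-allele relative risk. The input values (a frequency of 0.30 and a per-allele relative risk of 1.4) are illustrative placeholders, not figures taken from this paper or its sources, and the function name is hypothetical.

```python
def sibling_relative_risk(risk_allele_freq, per_allele_rr):
    """Sibling relative risk attributable to one biallelic locus.

    Assumes a multiplicative model: genotype risks are proportional to
    1, gamma, and gamma**2 for 0, 1, and 2 copies of the risk allele.
    """
    p = risk_allele_freq
    q = 1.0 - p
    gamma = per_allele_rr
    # w is the variance of the allelic effect scaled by its squared mean.
    w = p * q * (gamma - 1.0) ** 2 / (p * gamma + q) ** 2
    return (1.0 + w / 2.0) ** 2


# Illustrative inputs only (assumed, not taken from the cited studies).
print(round(sibling_relative_risk(0.30, 1.4), 3))
```

With these placeholder inputs the locus raises a sibling's risk by only about 3%, the same order of magnitude as the 1.02 quoted above and far short of the roughly threefold familial risk.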

Height shows a similar gap: the top 20 genetic markers for height account for less than 3% of the genetic variability (8), and an extrapolation of the data would suggest that 93,000 common variant SNPs are required to explain 80% of the population variation. While GWAS have certainly risen to the occasion in terms of producing comprehensive, reproducible and affordable results for common variant genetic markers, more must be done to reach the goal of identifying the major indicators of disease risk and understanding disease mechanisms.

Future Opportunities

More Common Variants – The most obvious solution to the challenges facing GWAS is simply doing more studies with a larger number of SNPs. The 1000 Genomes Project seeks to sequence roughly a thousand full genomes and catalog the SNPs that are found at a frequency of 1% or more in the human population (9). This information, combined with the constant reduction in prices for SNP scans, means future GWAS can be even more comprehensive and lead to further genetic marker discoveries.

Non-coding DNA – Another challenge that GWAS face is understanding the disease-associated genetic markers that map to non-coding DNA. As Hardy and Singleton point out, this could mean that a lot of genetic variability lies in the impact on transcription or translation, rather than in the actual proteins produced (10). If this line of research is pursued, it could lead to great advances in understanding disease mechanisms and genetics more broadly.

Rare Variants – A key to identifying more of the risk factors for disease is searching for rare (as opposed to common) variants. Goldstein points out that most of the significant common variants linked to various traits and diseases have already been identified. Depending on how rare the variants are, we may have to wait for large-scale sequencing efforts to provide the data, and for sophisticated analytic tools that can interpret this data, before we can really search for the rarer variants associated with genetic traits and disease.

Environment & Epigenetics – Clearly, genetics is just part of the picture when it comes to understanding human traits and disease. All phenotypes arise from an interaction between genes and the environment (which can be taken to mean things external to the genes, like methylation and the histone code, as well as the broader environment external to the organism, like temperature and stimuli). We must develop better tools to measure the effects of the outside environment as well as the epigenetic information of our patients when conducting association studies.

Pharmacogenetics – As indicated in the warfarin study (and many others), perhaps one of the greatest things that will come out of all the GWAS research is better personalization of medicine for patients. People can have markedly different reactions to drugs, and identifying when a particular compound is likely to do significant good or harm would be very valuable in saving money and saving lives.


The amount of knowledge generated by genome-wide association studies has been staggering in quantity and pace. Only four years after the first GWAS, we have identified hundreds of new loci that could lead to insights, diagnostics and therapies. Like much scientific research, GWAS have provided more questions than answers. We still need to understand why common variants play such a small role in the variability of genetic diseases and traits. We must invest in developing technology to do more in-depth and comprehensive reading of the genetic and epigenetic information contained within our cells. If we do, the coming years will be exciting as we create new and refined methods of studying, understanding and protecting human life.



1) Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516-1517
2) National Human Genome Research Institute. "2006 Release: About Whole Genome Association Studies" National Institutes of Health. Accessed May 14th, 2009
3) Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science 2008;322:881-888
4) Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007;316:1341-1345

5) Cargill M, Schrodi SJ, Chang M, Garcia VE, Brandon R, Callis KP, et al. A large-scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis-risk genes. Am J Hum Genet 2007;80:273-290

6) Takeuchi F, McGinnis R, Bourgeois S, Barnes C, Eriksson N, et al. A genome-wide association study confirms VKORC1, CYP2C9, and CYP4F2 as principal genetic determinants of warfarin dose. PLoS Genet 2009;5(3):e1000433. doi:10.1371/journal.pgen.1000433

7) Goldstein DB. Common genetic variation and human traits. N Engl J Med 2009;360:1696-1698.

8) Weedon MN, Lango H, Lindgren CM, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008;40:575-583

9) "Project Overview" The 1000 Genomes Project. Accessed May 14th, 2009

10) Hardy J, Singleton A. Genomewide association studies and human disease. N Engl J Med 2009;360:1759-1768