Using prior information from the medical literature in GWAS of oral cancer identifies novel susceptibility variant on chromosome 4: the AdAPT method

Mattias Johansson, Angus Roberts, Dan Chen, Yaoyong Li, Manon Delahaye-Sourdeix, Niraj Aswani, Mark A Greenwood, Simone Benhamou, Pagona Lagiou, Ivana Holcátová, Lorenzo Richiardi, Kristina Kjaerheim, Antonio Agudo, Xavier Castellsagué, Tatiana MacFarlane, Luigi Barzan, Cristina Canova, Nalin S Thakker, David I Conway, Ariana Znaor, Claire M Healy, Wolfgang Ahrens, David Zaridze, Neonilia Szeszenia-Dabrowska, Jolanta Lissowska, Eleonóra Fabiánová, Ioan Nicolae Mates, Vladimir Bencko, Lenka Foretova, Vladimir Janout, Maria Paula Curado, Sergio Koifman, Ana Menezes, Victor Wünsch-Filho, Jose Eluf-Neto, Paolo Boffetta, Silvia Franceschi, Rolando Herrero, Leticia Fernandez Garrote, Renato Talamini, Stefania Boccia, Pilar Galan, Lars Vatten, Peter Thomson, Diana Zelenika, Mark Lathrop, Graham Byrnes, Hamish Cunningham, Paul Brennan, Jon Wakefield, James D McKay

Research output: Contribution to journal › Article › peer-review



BACKGROUND: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS.

METHODS: We developed the Adjusting Association Priors with Text (AdAPT) method, which searches PubMed abstracts for pre-assigned keywords and key concepts and uses this information to assign each single nucleotide polymorphism (SNP) a prior probability of association with the phenotype of interest. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer.
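
To illustrate how per-SNP priors can change a ranking, the following is a minimal sketch of a BFDP calculation, assuming the approximate Bayes factor form commonly used with the BFDP (a normal prior on the log odds ratio). It is not the AdAPT implementation: the function names, the prior standard deviation of 0.2, and the two example SNPs with their effect sizes and prior probabilities are all hypothetical, and the text-mining step that would produce the priors is not shown.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for H0 (no association) versus H1.

    beta_hat: estimated log odds ratio; se: its standard error;
    prior_sd: assumed prior standard deviation of the log odds ratio under H1.
    Values below 1 favour association.
    """
    v = se ** 2            # sampling variance of the estimate
    w = prior_sd ** 2      # prior variance of the true effect
    z2 = (beta_hat / se) ** 2
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))


def bfdp(beta_hat, se, prior_prob, prior_sd=0.2):
    """Posterior probability of no association, given a prior probability
    of association (prior_prob) for this SNP. prior_sd=0.2 is an arbitrary
    illustrative choice."""
    abf = approx_bayes_factor(beta_hat, se, prior_sd)
    prior_odds_null = (1.0 - prior_prob) / prior_prob
    x = abf * prior_odds_null
    return x / (1.0 + x)


# Hypothetical SNPs: (name, log OR, standard error, prior probability).
# snp_a has the stronger signal; snp_b has the higher literature-style prior.
snps = [
    ("snp_a", 0.20, 0.05, 1e-4),
    ("snp_b", 0.18, 0.05, 1e-2),
]

# Rank by BFDP, smallest (most noteworthy) first.
ranked = sorted(snps, key=lambda s: bfdp(s[1], s[2], s[3]))
for name, beta_hat, se, prior in ranked:
    print(name, round(bfdp(beta_hat, se, prior), 3))
```

In this toy input, the SNP with the slightly weaker association signal but the larger prior probability is ranked first, which is the kind of reordering that ranking by BFDP rather than by p-value allows.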

RESULTS: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs. Of these, rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log-additive p-value [p(trend)] = 2.5×10⁻³). The combined odds ratio (OR) for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs in this gene region that are associated with overall upper aerodigestive tract (UADT) cancer. We also investigated whether rs991316 was associated with other UADT cancers, but no additional association signal was found.
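
As a worked illustration of the arithmetic behind such a summary, the sketch below recovers an approximate log odds ratio, standard error and z-score from the combined OR and 95% CI reported above, assuming the interval is a Wald interval that is symmetric on the log scale. The function name is hypothetical, and because the published figures are rounded, the result is only approximate and is not a value reported by the study.

```python
import math


def log_or_and_se(or_est, ci_low, ci_high, z_crit=1.959964):
    """Back out the log odds ratio and its standard error from an OR and a
    95% Wald confidence interval (assumed symmetric on the log scale)."""
    beta = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se


# Combined estimate for rs991316 as given in the abstract.
beta, se = log_or_and_se(0.83, 0.76, 0.90)
z = beta / se
# Two-sided p-value under a standard normal reference distribution.
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

print(f"log OR = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p_two_sided:.1e}")
```

The recovered log odds ratio and standard error are the kind of inputs a Bayes factor calculation needs, alongside a prior probability for the SNP.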

CONCLUSION: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url:

Original language: English
Article number: e36888
Number of pages: 10
Journal: PLoS ONE
Issue number: 5
Publication status: Published - 25 May 2012


  • Bayes theorem
  • chromosomes, human, pair 4
  • computational biology
  • genetic predisposition to disease
  • genome-wide association study
  • humans
  • internet
  • lung neoplasms
  • mouth neoplasms
  • polymorphism, single nucleotide
  • reproducibility of results

