Detailed reference list to SPECIAL
1. Hoffman, G.E. Correcting for population structure and kinship using the linear mixed model:
theory and extensions. PLoS One 8, e75707 (2013).
2. Pritchard, J.K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus
genotype data. Genetics 155, 945-959 (2000).
3. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in
Europeans. Science 201, 786-792 (1978).
4. Ma, J. & Amos, C.I. Theoretical formulation of principal components analysis to detect and
correct for population stratification. PLoS One 5 (2010).
5. Corander, J., Marttinen, P., Siren, J. & Tang, J. Enhanced Bayesian modelling in BAPS software
for learning genetic structures of populations. BMC Bioinformatics 9, 539 (2008).
6. Intarapanich, A., et al. Iterative pruning PCA improves resolution of highly structured
populations. BMC Bioinformatics 10, 382 (2009).
7. Novembre, J. & Peter, B.M. Recent advances in the study of fine-scale population structure in
humans. Curr Opin Genet Dev 41, 98-105 (2016).
8. Fugel, H.J., Nuijten, M., Postma, M. & Redekop, K. Economic Evaluation in Stratified
Medicine: Methodological Issues and Challenges. Front Pharmacol 7, 113 (2016).
9. Fraser, H.B., Lam, L.L., Neumann, S.M. & Kobor, M.S. Population-specificity of human DNA
methylation. Genome Biol 13, R8 (2012).
10. Martin, A.R., et al. Transcriptome sequencing from diverse human populations reveals
differentiated regulatory architecture. PLoS Genet 10, e1004549 (2014).
11. Zhernakova, A., et al. Population-based metagenomics analysis reveals markers for gut
microbiome composition and diversity. Science 352, 565-569 (2016).
12. Cleynen, I., et al. Molecular reclassification of Crohn's disease by cluster analysis of genetic
variants. PLoS One 5, e12952 (2010).
13. Satten, G.A., Flanders, W.D. & Yang, Q. Accounting for unmeasured population substructure
in case-control studies of genetic association using a novel latent-class model. Am J Hum
Genet 68, 466-477 (2001).
14. Bailey, P., et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature
531, 47-52 (2016).
15. Wang, B., et al. Similarity network fusion for aggregating data types on a genomic scale. Nat
Methods 11, 333-337 (2014).
16. Hoadley, K.A., et al. Multiplatform analysis of 12 cancer types reveals molecular classification
within and across tissues of origin. Cell 158, 929-944 (2014).
17. Chen, R., et al. Personal omics profiling reveals dynamic molecular and medical phenotypes.
Cell 148, 1293-1307 (2012).
18. Maus, B., et al. Molecular reclassification of Crohn's disease: a cautionary note on population
stratification. PLoS One 8, e77720 (2013).
19. Alwan, H., et al. Epidemiology of masked and white-coat hypertension: the family-based
SKIPOGH study. PLoS One 9, e92522 (2014).
20. van der Ende, M.Y., et al. The LifeLines Cohort Study: Prevalence and treatment of
cardiovascular disease and risk factors. Int J Cardiol 228, 495-500 (2017).
21. Kuznetsova, T., et al. Quality control of the blood pressure phenotype in the European
Project on Genes in Hypertension. Blood Press Monit 7, 215-224 (2002).
22. Tigchelaar, E.F., et al. Cohort profile: LifeLines DEEP, a prospective, general population cohort
study in the northern Netherlands: study design and baseline characteristics. BMJ Open 5,
e006772 (2015).
23. Chaichoompu, K., et al. IPCAPS: an R package for iterative pruning to capture population
structure (http://www.montefiore.ulg.ac.be/~chaichoompu/download/thesis_papers/) (2017).
24. Chaichoompu, K., et al. Determining fine population structure using iterative pruning
(http://www.montefiore.ulg.ac.be/~chaichoompu/download/icg2017/poster_Kridsadakorn_ICG2017_08032017.pdf)
(2017).
25. Limpiti, T., et al. Study of large and highly stratified population datasets by combining
iterative pruning principal component analysis and structure. BMC Bioinformatics 12, 255
(2011).
26. Lebret, R., et al. Rmixmod: The R Package of the Model-Based Unsupervised, Supervised and
Semi-Supervised Classification Mixmod Library. Journal of Statistical Software 67, 241-270
(2015).
27. Price, A.L., et al. Principal components analysis corrects for stratification in genome-wide
association studies. Nat Genet 38, 904-909 (2006).
28. Lawson, D.J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using
dense haplotype data. PLoS Genet 8, e1002453 (2012).
29. Stephens, M. & Scheet, P. Accounting for decay of linkage disequilibrium in haplotype
inference and missing-data imputation. Am J Hum Genet 76, 449-462 (2005).
30. Stephens, M., Smith, N.J. & Donnelly, P. A new statistical method for haplotype
reconstruction from population data. Am J Hum Genet 68, 978-989 (2001).
31. Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population
genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum
Genet 78, 629-644 (2006).
32. Browning, S.R. & Browning, B.L. Rapid and accurate haplotype phasing and missing-data
inference for whole-genome association studies by use of localized haplotype clustering. Am
J Hum Genet 81, 1084-1097 (2007).
33. Howie, B.N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method
for the next generation of genome-wide association studies. PLoS Genet 5, e1000529 (2009).
34. Durbin, R. Efficient haplotype matching and storage using the positional Burrows-Wheeler
transform (PBWT). Bioinformatics 30, 1266-1272 (2014).
35. Lunter, G. Fast haplotype matching in very large cohorts using the Li and Stephens model.
bioRxiv (2016).
36. Linting, M. & van der Kooij, A. Nonlinear principal components analysis with CATPCA: a
tutorial. J Pers Assess 94, 12-25 (2012).
37. Alanis-Lobato, G., Cannistraci, C.V., Eriksson, A., Manica, A. & Ravasi, T. Highlighting
nonlinear patterns in population genetics datasets. Sci Rep 5, 8140 (2015).
38. Krzanowski, W.J. Distance Between Populations Using Mixed Continuous and Categorical
Variables. Biometrika 70 (1983).
39. Popescu, A.A., Harper, A.L., Trick, M., Bancroft, I. & Huber, K.T. A novel and fast approach for
population structure inference using kernel-PCA and optimization. Genetics 198, 1421-1431
(2014).
40. Chen, G. Deep Learning with Nonparametric Clustering. arXiv:1501.03084 (2015).
41. Popescu, A.A. & Huber, K.T. PSIKO2: a fast and versatile tool to infer population stratification
on various levels in GWAS. Bioinformatics 31, 3552-3554 (2015).
42. Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G. & François, O. Fast inference of admixture
coefficients using sparse non-negative matrix factorization algorithms. Genetics 196, 973-983
(2014).
43. Sheehan, S. & Song, Y.S. Deep Learning for Population Genetic Inference. PLoS Comput Biol
12, e1004845 (2016).
44. Karatzoglou, A., Smola, A., Hornik, K. & Zeileis, A. kernlab - An S4 Package for Kernel Methods
in R. Journal of Statistical Software 11, 1-20 (2004).
45. Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the
Dynamic Tree Cut package for R. Bioinformatics 24, 719-720 (2008).
46. Fouladi, R., Bessonov, K., Van Lishout, F. & Van Steen, K. Model-Based Multifactor
Dimensionality Reduction for Rare Variant Association Analysis. Hum Hered 79, 157-167
(2015).
47. Kondor, R.I. & Lafferty, J. Diffusion kernels on graphs and other discrete structures. in Proc.
Int'l Conf. on Machine Learning (ICML) (2002).
48. Ding, K., et al. The effect of haplotype-block definitions on inference of haplotype-block
structure and htSNPs selection. Mol Biol Evol 22, 148-159 (2005).
49. Van Lishout, F., Gadaleta, F., Moore, J.H., Wehenkel, L. & Van Steen, K. gammaMAXT: a fast
multiple-testing correction algorithm. BioData Min 8, 36 (2015).
50. Cai, Q. & Chan, H.P. A Double Application of the Benjamini-Hochberg Procedure for Testing
Batched Hypotheses. Methodol Comput Appl Probab (2016).
51. Anderson, M.J., Ellingsen, K.E. & McArdle, B.H. Multivariate dispersion as a measure of beta
diversity. Ecol Lett 9, 683-693 (2006).
52. Pouladi, N., Cowper-Sallari, R. & Moore, J.H. Combining functional genomics strategies
identifies modular heterogeneity of breast cancer intrinsic subtypes. BioData Min 7, 27
(2014).
53. Anderson, M.J. Distance-based tests for homogeneity of multivariate dispersions. Biometrics
62, 245-253 (2006).
54. Weiss, S., et al. Correlation detection strategies in microbial data sets vary widely in
sensitivity and precision. ISME J 10, 1669-1681 (2016).
55. Boulesteix, A.-L., De Bin, R., Jiang, X. & Fuchs, M. IPF-LASSO: integrative L1-penalized
regression with penalty factors for prediction based on multi-omics data. (Department of
Statistics, University of Munich, http://www.stat.uni-muenchen.de, 2015).
56. Song, L., Langfelder, P. & Horvath, S. Random generalized linear model: a highly accurate and
interpretable ensemble predictor. BMC Bioinformatics 14, 5 (2013).
57. Nalpathamkalam, T., Derkach, A., Paterson, A.D. & Merico, D. Genetic Analysis Workshop 18
single-nucleotide variant prioritization based on protein impact, sequence conservation, and
gene annotation. BMC Proc 8 (Suppl 1) (2014).
58. Li, J., et al. eSNPO: An eQTL-based SNP Ontology and SNP functional enrichment analysis
platform. Sci Rep 6, 30595 (2016).
59. Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and
visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009).
60. Holm, K., et al. An integrated genomics analysis of epigenetic subtypes in human breast
tumors links DNA methylation patterns to chromatin states in normal mammary cells. Breast
Cancer Res 18, 27 (2016).
61. Van Steen, K. & Malats, N. Perspectives on Data Integration in Human Complex Disease
Analysis. in Big Data Analytics in Bioinformatics and Healthcare (eds. Wang, B., Li, R. &
Perrizo, W.) 284-322 (IGI Global, 2014).
62. Zhang, Y. & Li, T. Consensus Clustering + Meta Clustering = Multiple Consensus Clustering. in
Twenty-Fourth International Florida Artificial Intelligence Research Society Conference (2011).
63. Steinley, D. Properties of the Hubert-Arabie adjusted Rand index. Psychol Methods 9, 386-396
(2004).
64. Langfelder, P., Luo, R., Oldham, M.C. & Horvath, S. Is my network module preserved and
reproducible? PLoS Comput Biol 7, e1001057 (2011).