Midwest Bioinformatics Showcase

Connecting Researchers Across the Midwest

Evaluating Ancestry Adjustment in Multi-Ancestry Epigenome-Wide Analysis


Yueming Liu, PhD Candidate

Division of Biostatistics, Northwestern University

11:00 AM Eastern Time, April 24, 2026


DNA methylation, Epigenome-wide association study (EWAS), Ancestry, Epigenetics, EPISTRUCTURE

Abstract

Yueming Liu (1), Alan Kuang (1), Marie-France Hivert (2, 3, 4), William L Lowe Jr (5), Jami L Josefson (6, 7), Denise M Scholtens (1)

1. Division of Biostatistics and Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine
2. Department of Medicine, Massachusetts General Hospital
3. Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School
4. Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke
5. Department of Medicine, Northwestern University Feinberg School of Medicine
6. Division of Endocrinology, Department of Pediatrics, Ann and Robert H. Lurie Children's Hospital of Chicago
7. Department of Pediatrics, Northwestern University Feinberg School of Medicine

Background: Proper adjustment for population substructure is essential in epigenome-wide association studies (EWAS), particularly in cohorts with diverse ancestries. EPISTRUCTURE offers a genotype-free approach to ancestry inference; it was originally developed using a European reference population from the Cooperative Health Research in the Region of Augsburg (KORA) study. However, its effectiveness in genetically diverse, multi-ancestry cohorts has not been adequately evaluated.
Methods: For EWAS using cord blood samples from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, we systematically compared the ancestry adjustment performance of EPISTRUCTURE principal components (PCs) derived from the widely used KORA-based reference set with that of new reference sets generated from genotyping data of the multi-ancestry HAPO cohort. HAPO-based reference sets were defined by varying the SNP–CpG $R^2$ threshold used to identify ancestry-informative CpGs (e.g., RS30: $R^2 > 0.3$). We applied these reference sets for population substructure adjustment in EWAS of three newborn adiposity traits (birthweight, cord blood C-peptide, and sum of skinfolds) to evaluate their impact on association detection and biological interpretation.
Results: Compared with the KORA reference, the HAPO RS30 reference consistently produced lower genomic inflation and identified more biologically relevant associations for birthweight and cord blood C-peptide in EWAS of HAPO cord blood samples (n = 3,116). Pathway enrichment analyses revealed strong immune and metabolic signals, including pathways captured only when the HAPO-derived reference was used for ancestry adjustment. Trait enrichment using the EWAS Catalog further confirmed associations with fetal growth, maternal metabolic traits, and glucose regulation.
Conclusion: Our findings demonstrate that reference sets derived from multi-ancestry cohorts like HAPO better capture underlying population substructure and improve ancestry adjustment in diverse EWAS settings.
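The abstract describes the reference-set workflow only at a high level. As a rough illustration, the Python sketch below (requires NumPy) runs the same sequence of steps on simulated toy data: select CpGs whose SNP–CpG $R^2$ exceeds 0.3 (mimicking RS30), compute EPISTRUCTURE-style PCs by regressing cell-type proportions out of those CpGs and applying PCA, then compare genomic inflation in an EWAS with and without the PCs. It is not the authors' pipeline or the EPISTRUCTURE implementation. The function names, the pairing of each CpG with a single cis-SNP (rather than taking the maximum $R^2$ over nearby SNPs), the simulation parameters, and the approximation $\lambda = \operatorname{median}(t^2)/0.455$ (treating $t^2$ as a $\chi^2_1$ statistic) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical helper names; simulated toy data only (not HAPO or KORA).
CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def snp_cpg_r2(geno, meth):
    """Squared Pearson correlation of CpG j with its paired SNP j (one SNP per CpG, an assumption)."""
    g = geno - geno.mean(axis=0)
    x = meth - meth.mean(axis=0)
    r = (g * x).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (x ** 2).sum(axis=0))
    return r ** 2


def select_ancestry_cpgs(r2, threshold=0.3):
    """Indices of CpGs whose SNP-CpG R^2 exceeds the threshold (0.3 mimics 'RS30')."""
    return np.flatnonzero(r2 > threshold)


def ancestry_pcs(meth, cpg_idx, cell_props, n_pcs=2):
    """EPISTRUCTURE-style PCs: remove cell-composition effects from selected CpGs, then PCA."""
    x = meth[:, cpg_idx]
    c = np.column_stack([np.ones(len(x)), cell_props])
    beta, *_ = np.linalg.lstsq(c, x, rcond=None)
    resid = x - c @ beta  # column-centred already, because c includes an intercept
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # sample scores on the top principal components


def ewas_tstats(meth, trait, covariates):
    """Per-CpG t-statistic for the trait term in the OLS model CpG ~ 1 + trait + covariates."""
    n = meth.shape[0]
    d = np.column_stack([np.ones(n), trait, covariates])
    coef, *_ = np.linalg.lstsq(d, meth, rcond=None)
    resid = meth - d @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (n - d.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(d.T @ d)[1, 1])
    return coef[1] / se


def genomic_inflation(tstats):
    """lambda = median(t^2) / median(chi2_1); t^2 ~ chi2_1 is a large-sample approximation."""
    return float(np.median(tstats ** 2) / CHI2_1DF_MEDIAN)


# --- Toy simulation: two ancestry groups, no CpG linked to the trait except through ancestry ---
rng = np.random.default_rng(2026)
n, m, k = 400, 300, 100  # samples, CpGs, CpGs driven by an ancestry-differentiated SNP
ancestry = rng.integers(0, 2, size=n).astype(float)
geno = rng.binomial(2, 0.2 + 0.6 * ancestry[:, None], size=(n, m))  # allele freq differs by group
cell = rng.uniform(size=(n, 2))  # stand-in for estimated cell-type proportions
meth = rng.normal(size=(n, m)) + cell @ rng.normal(size=(2, m))
meth[:, :k] += 1.5 * geno[:, :k]  # only the first k CpGs are genetically driven
trait = ancestry + rng.normal(size=n)  # trait differs by ancestry group

idx = select_ancestry_cpgs(snp_cpg_r2(geno, meth), threshold=0.3)
pcs = ancestry_pcs(meth, idx, cell, n_pcs=2)
lam_raw = genomic_inflation(ewas_tstats(meth, trait, cell))
lam_adj = genomic_inflation(ewas_tstats(meth, trait, np.column_stack([cell, pcs])))
print(f"{len(idx)} CpGs selected; lambda without PCs = {lam_raw:.2f}, with PCs = {lam_adj:.2f}")
```

Because the simulated trait differs between the two ancestry groups, CpGs that track ancestry appear associated with it when only cell composition is adjusted for, which inflates $\lambda$; adding the ancestry PCs should bring $\lambda$ back toward 1. In this toy setting the ancestry-informative CpGs are known by construction, whereas the comparison in the abstract concerns which real reference set (KORA-based or HAPO-derived) yields PCs that better capture the cohort's substructure.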

