Unsupervised Discovery of Ancestry Informative Markers and Genetic Admixture Proportions

Поделиться
HTML-код
  • Опубликовано: 22 авг 2024
  • Abstract: Admixture estimation is crucial in ancestry inference and genomewide association studies (GWAS). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken and the egg scenario. This talk presents an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. The simulated and real data examples show that this approach is scalable to modern biobank data sets.
    About the speaker: Dr. Seyoon Ko is a Postdoctoral Scholar working with Dr. Ken Lange and Dr. Hua Zhou in the Department of Computational Medicine. Dr. Ko’s research interests include large-scale computational methods in biostatistics and bioinformatics using parallel and distributed computing. He earned a Ph.D. degree in Statistics from Seoul National University in South Korea, as well as a M.S. degree in Computational Sciences and a B.S. degree in Physics, Mathematical Sciences, and Computational Science.

Комментарии •