GPU-accelerated generalized linear mixed models for biobank-scale association studies

Date:

  • Time : 10:30 - 12:00 (GMT+9, Korea Standard Time)
  • Venue : 27-220, Seoul National University
  • Speaker : 김영대 (UNIST)

Generalized linear mixed models are the statistical foundation of rigorous biobank-scale genetic association studies: fitting the null model under such a model accounts for sample relatedness and population stratification for both binary and quantitative traits. Their computational cost is already substantial for a single genome-wide association study, and phenome-wide analysis amplifies the burden by requiring independent null model fits across thousands of phenotypes, so the speed of each individual model fit becomes essential for scientific discovery at scale. We present a GPU-accelerated framework that targets the dominant computational bottlenecks across the full genome-wide association pipeline: streaming GPU kernels for preprocessing packed genomic data, block conjugate gradient for stochastic trace estimation that exploits shared matrix structure, and blocked GPU association testing with hybrid CPU-GPU routing that preserves full statistical validity. Together, these contributions yield a more than 10x end-to-end speedup on the Million Veteran Program dataset, one of the largest biobank datasets available, with portability validated across multiple GPU architectures.
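As a rough illustration of the stochastic trace estimation step mentioned in the abstract, the following CPU-only NumPy sketch estimates tr(A⁻¹K) with Hutchinson's estimator, solving for all probe vectors at once so that every iteration shares a single matrix product. It is not the speaker's implementation: the function names `batched_cg` and `hutchinson_trace`, the toy matrices `K` and `A = 0.5 K + I`, and the probe count are assumptions made for illustration. The solver is a simpler multi-right-hand-side conjugate gradient in which each column keeps its own step sizes, rather than a true block conjugate gradient with a shared Krylov subspace.

```python
import numpy as np


def batched_cg(A, B, tol=1e-10, max_iter=500):
    """Solve A X = B for symmetric positive definite A, all columns of B at once.

    Each column runs its own CG recurrence, but every iteration uses one
    shared product A @ P, which is the part that maps well to a GPU.
    """
    B = np.asarray(B, dtype=float)
    X = np.zeros_like(B)
    R = B.copy()                      # residuals, one column per right-hand side
    P = R.copy()                      # search directions
    rs = np.sum(R * R, axis=0)        # squared residual norm per column
    threshold = (tol ** 2) * rs       # relative stopping criterion per column
    for _ in range(max_iter):
        active = rs > threshold       # columns that have not converged yet
        if not active.any():
            break
        AP = A @ P
        denom = np.sum(P * AP, axis=0)
        # Converged columns get a zero step so they stay frozen.
        alpha = np.divide(rs, denom, out=np.zeros_like(rs), where=active)
        X += alpha * P
        R -= alpha * AP
        rs_new = np.sum(R * R, axis=0)
        beta = np.divide(rs_new, rs, out=np.zeros_like(rs), where=active)
        P = R + beta * P
        rs = rs_new
    return X


def hutchinson_trace(A, K, num_probes=64, seed=0):
    """Estimate tr(A^{-1} K) as the mean of z^T A^{-1} K z over Rademacher probes z."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(n, num_probes))
    X = batched_cg(A, Z)              # X = A^{-1} Z, valid because A is symmetric
    return float(np.mean(np.sum(X * (K @ Z), axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.standard_normal((50, 200))    # toy stand-in for a genotype matrix
    K = G @ G.T / 200                     # toy relatedness-like matrix
    A = 0.5 * K + np.eye(50)              # hypothetical covariance-like matrix
    print("estimate:", hutchinson_trace(A, K))
    print("exact:   ", np.trace(np.linalg.solve(A, K)))
```

The estimate is random, so it only approximates the exact trace; using more probes reduces the variance at the cost of more right-hand sides per solve.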