Hua Zhou

Feel free to email me if you have any comments.

D Kim, A Jensen, K Jones, S Raghavan, L Phillips, A Hung, Y Sun, G Li, P Reaven, H Zhou, and J Zhou. (2023) A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank, JAMIA Open, 6(1):ooad006. [pdf]

D Kim, A Binder, H Zhou, and S Jung. (2023) DNA methylation patterns associated with breast cancer prognosis that are specific to tumor subtype and menopausal status, Frontiers in Genetics, 14:1133443. [pdf]

X Zhou, Q Heng, E Chi, and H Zhou. (2024) Proximal MCMC for Bayesian inference of constrained and regularized estimation. The American Statistician, in press.

M Xu, H Zhou, Y Hu, and L Duan. (2023) Bayesian inference using the proximal mapping: uncertainty quantification under varying dimensionality. Journal of the American Statistical Association (Theory and Methods), in press. [arXiv]

Q Heng, H Zhou, and E Chi. (2023) Bayesian trend filtering via proximal Markov chain Monte Carlo, Journal of Computational and Graphical Statistics, 32(3):938-949. [pdf]

B Chu, S Ko, JJ Zhou, A Jensen, H Zhou, JS Sinsheimer, and K Lange. (2023) Multivariate genome-wide association analysis by iterative hard thresholding, Bioinformatics, 39(4):btad193. [pdf]

S Ko, B Chu, D Peterson, C Okenwa, J Papp, D Alexander, E Sobel, H Zhou, and K Lange. (2023) Unsupervised discovery of ancestry informative markers and genetic admixture proportions in biobank-scale data sets, The American Journal of Human Genetics, 110:314-325. [pdf]

K Lange and H Zhou. (2022) A legacy of EM algorithms. International Statistical Review, 92(S1): S52-S66. [pdf]

J Won, T Zhang, and H Zhou. (2022) Orthogonal trace-sum maximization: tightness of the semidefinite relaxation and guarantee of locally optimal solutions. SIAM Journal on Optimization, 32(3):2180-2207. [pdf]

A Landeros, O Madrid Padilla, H Zhou, and K Lange. (2022) Extensions to the proximal distance method of constrained optimization, Journal of Machine Learning Research, 23(182):1-45. [pdf]

S Ko, C German, A Jensen, J Shen, A Wang, D Mehrotra, Y Sun, J Sinsheimer, H Zhou, and J Zhou. (2022) GWAS of longitudinal trajectories at biobank scale, The American Journal of Human Genetics, 109(3):433-445. [pdf]

J Kim, A Jensen, S Ko, S Raghavan, L Phillips, A Hung, Y Sun, H Zhou, P Reaven, and J Zhou. (2022) Systematic heritability and heritability enrichment analysis for diabetes complications in UK Biobank and ACCORD studies, Diabetes, 71(5):1137-1148. [pdf]

X Zhou, J Zhou, and H Zhou. (2022) Bag of little bootstraps for massive and distributed longitudinal data, Statistical Analysis and Data Mining, 15(3):314-321. [pdf]

K Doubleday, J Zhou, H Zhou, and H Fu. (2022) Risk controlled decision trees and random forests for precision medicine, Statistics in Medicine, 41(4):719-735. [pdf]

S Li, N Li, H Wang, J Zhou, H Zhou, and G Li. (2022) Efficient algorithms and implementation of a semiparametric joint model for longitudinal and competing risks data with applications to massive biobank data, Computational and Mathematical Methods in Medicine, 2022:1362913. [pdf]

B Chu, E Sobel, R Wasiolek, S Ko, J Sinsheimer, H Zhou, and K Lange. (2021) A fast data-driven method for genotype imputation, phasing, and local ancestry inference: MendelImpute.jl, Bioinformatics, 37(24):4756-4763. [pdf]

S Ko, H Zhou, J Zhou, and J Won. (2022) High-performance statistical computing in the computing environments of the 2020s. Statistical Science, 37(4):494-518. [pdf]

K Lange, J Won, A Landeros, and H Zhou. (2021) Nonconvex optimization via MM algorithms: convergence theory. Wiley StatsRef: Statistics Reference Online. [pdf]

J Kim, J Shen, A Wang, D Mehrotra, S Ko, J Zhou, and H Zhou. (2021) VCSEL: Prioritizing SNP-set by penalized variance component selection, Annals of Applied Statistics, 15(4):1652-1672. [pdf]
Penalization methods for selecting variance components.

C German, J Sinsheimer, J Zhou, and H Zhou. (2021) WiSER: robust and scalable estimation and inference of within-subject variances from intensive longitudinal data, Biometrics, 78(4):1313-1327. [pdf]
Efficient method and software for studying the mean and within-subject variability of longitudinal measurements.

J Won, H Zhou, and K Lange. (2021) Orthogonal trace-sum maximization: applications, local algorithms, and global optimality. SIAM Journal on Matrix Analysis and Applications, 42(2):859-882. [pdf]
A simple certificate for checking when a local solution is globally optimal for generalized canonical correlation analysis (CCA), the generalized Procrustes problem, and other problems.

W Yu, H Zhou, Y Choi, J Goldin, P Teng, W Wong, M McNitt-Gray, M Brown, and G Kim. (2022) Multi-scale, domain knowledge-guided attention + random forest: a two-stage deep learning-based multi-scale guided attention models to diagnose idiopathic pulmonary fibrosis from computed tomography images, Medical Physics, 50(2):894-905. [pdf]

W Yu, H Zhou, J Goldin, W Wong, and G Kim. (2021) End-to-end domain knowledge assisted automatic diagnosis of idiopathic pulmonary fibrosis (IPF) using high resolution computed tomography (HRCT), Medical Physics, 48(5):2458–2467. [pdf]
Deep learning tool for diagnosing IPF from CT scans.

S Ji, C German, K Lange, J Sinsheimer, H Zhou, J Zhou, and E Sobel. (2021) Modern simulation utilities for genetic analysis, BMC Bioinformatics, 22:228. [pdf]

M Zhang, Y Liu, H Zhou, J Watkins, and J Zhou. (2021) A novel nonlinear dimension reduction approach to infer population structure for low-coverage sequencing data, BMC Bioinformatics, 22:348. [pdf]

G Nam, ZF Zhang, JY Rao, H Zhou, and SY Jung. (2021) Interactions between adiponectin-pathway polymorphisms and obesity on postmenopausal breast cancer risk among African American women: the WHI SHARe Study, Frontiers in Oncology, 11:698198. [pdf]

E Chi, B Gaines, W Sun, H Zhou, and J Yang. (2020) Provable convex co-clustering of tensors. Journal of Machine Learning Research, 21(214):1-58. [pdf]
Convex co-clustering of tensor data.

J Zhou, J Zhai, H Zhou, Y Chen, S Guerra, I Robey, G Weinstock, E Weinstock, Q Dong, K Knox, and H Twigg III. (2020) Supraglottic lung microbiome taxa are associated with pulmonary abnormalities in an HIV longitudinal cohort, American Journal of Respiratory and Critical Care Medicine, 202(12):1727-1731. [pdf]

I Ipsen and H Zhou. (2020) Probabilistic error analysis for inner products, SIAM Journal on Matrix Analysis and Applications, 41(4):1726-1741. [pdf]

J Day and H Zhou. (2020) OnlineStats.jl: A Julia package for statistics on data streams, Journal of Open Source Software, 5(46):1816. [pdf]
A highly efficient and extensible framework for performing statistical estimation and inference on streaming data.
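Streaming estimators of this kind keep a small summary that is updated one observation at a time, so the data never need to be stored. As a minimal illustration that assumes nothing about OnlineStats.jl's internals, the sketch below implements Welford's one-pass mean/variance update plus the standard rule for merging two partial summaries. The class `RunningMoments` and its methods are hypothetical names, not the package's API.

```python
class RunningMoments:
    """Count, mean, and variance of a data stream in O(1) memory.

    Hypothetical illustration of Welford's algorithm; not OnlineStats.jl's API.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def fit(self, x):
        """Absorb one observation (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean

    def merge(self, other):
        """Fold in another summary, e.g. one computed on a separate data chunk."""
        n = self.n + other.n
        if n == 0:
            return self
        delta = other.mean - self.mean
        self.mean += delta * other.n / n
        self._m2 += other._m2 + delta * delta * self.n * other.n / n
        self.n = n
        return self

    def var(self):
        """Sample variance with the n - 1 denominator."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
acc = RunningMoments()
for x in stream:
    acc.fit(x)
print(acc.n, acc.mean, acc.var())
```

Splitting `stream` into two chunks, fitting each separately, and calling `merge` should reproduce the same mean and variance up to rounding, which is what makes this style of summary convenient for parallel or distributed data.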

K Keys, H Zhou, and K Lange. (2019) Proximal distance algorithms: theory and practice, Journal of Machine Learning Research, 20(66):1-38. [pdf]
A class of algorithms for constrained optimization based on distance majorization.
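The distance-majorization idea can be sketched on a toy problem chosen here for illustration: minimize f(x) = 0.5*||x - y||^2 subject to x in a box C. The penalized objective f(x) + (rho/2)*dist(x, C)^2 is majorized at the current iterate x_n by f(x) + (rho/2)*||x - P_C(x_n)||^2, because dist(x, C) <= ||x - P_C(x_n)|| with equality at x = x_n. Minimizing that surrogate has a closed form, and rho is increased gradually. The function names, the box constraint, and the rho schedule are assumptions for this sketch, not the paper's settings.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def proximal_distance_box(y, lo=0.0, hi=1.0, rho=1.0, rho_mult=1.2, iters=100):
    """Minimize 0.5*||x - y||^2 over the box via a proximal distance scheme.

    Each step minimizes the surrogate 0.5*||x - y||^2 + (rho/2)*||x - p||^2,
    where p = P_C(x_n) is the projection of the current iterate; this surrogate
    majorizes the penalized objective 0.5*||x - y||^2 + (rho/2)*dist(x, C)^2.
    """
    x = list(y)
    for _ in range(iters):
        p = project_box(x, lo, hi)
        # Closed-form minimizer of the quadratic surrogate.
        x = [(yi + rho * pi) / (1.0 + rho) for yi, pi in zip(y, p)]
        rho *= rho_mult  # tighten the penalty so iterates approach C
    return x


x = proximal_distance_box([1.5, -0.3, 0.4])
print(x)  # close to the projection [1.0, 0.0, 0.4] of y onto the box
```

For this particular f the exact constrained solution is simply the projection P_C(y), which makes the toy case easy to check; the method is more interesting when the objective is harder but the projection onto C remains cheap.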

C German, J Sinsheimer, Y Klimentidis, H Zhou, and J Zhou. (2020) Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale, Genetic Epidemiology, 44:248-260. [pdf]
GWAS tool for ordinal traits.

B Chu, K Keys, C German, H Zhou, J Sinsheimer, and K Lange. (2020) Iterative hard thresholding in GWAS: generalized linear models, prior weights, and double sparsity, GigaScience, 9(6):giaa044. [pdf]

B vonHoldt, A DeCandia, E Heppenheimer, H Janowitz-Koch, R Shi, H Zhou, C German, K Brzeski, K Cassidy, D Stahler, and J Sinsheimer. (2020) Heritability of inter‐pack aggression in a wild pedigreed population of North American gray wolves, Molecular Ecology, 29(10):1764-1775. [pdf]

W Hu, W Shen, H Zhou, and D Kong. (2020) Matrix linear discriminant analysis, Technometrics, 62(2):196-205. [pdf]
Linear discriminant analysis for matrix-variate data.

H Zhou, L Hu, J Zhou, and K Lange. (2019) MM algorithms for variance components models. Journal of Computational and Graphical Statistics, 28(2):350-361. [pdf]
A new optimization algorithm for finding the MLE of variance components models.

H Zhou, J Sinsheimer, D Bates, B Chu, C German, S Ji, K Keys, J Kim, S Ko, G Mosher, J Papp, E Sobel, J Zhai, J Zhou, and K Lange. (2020) OpenMendel: a cooperative programming project for statistical genetics, Human Genetics, 139(1):61-71. [pdf]
Overview of the OpenMendel project.

K Doubleday, H Zhou, H Fu, and J Zhou. (2018) An algorithm for generating individualized treatment decision trees and random forests, Journal of Computational and Graphical Statistics, 27(4):849-860. [pdf]

B Gaines, J Kim, and H Zhou. (2018) Algorithms for fitting the constrained lasso, Journal of Computational and Graphical Statistics, 27(4):861-871. [pdf]

L Hu, W Lu, J Zhou, and H Zhou. (2018) MM algorithms for variance component estimation and selection in logistic linear mixed models, Statistica Sinica, 28:1585-1605. [pdf]

EJ Min, E Chi, and H Zhou. (2020) Tensor canonical correlation analysis. STAT, 8:e253. [pdf]
Canonical correlation analysis (CCA) with tensor data.

X Zhang, L Li, H Zhou, Y Zhou, D Shen, and ADNI. (2019) Tensor generalized estimating equations for longitudinal imaging analysis, Statistica Sinica, 29:1977-2005. [pdf]
GEE for tensor data.

X Li, D Xu, H Zhou, and L Li. (2018) Tucker tensor regression and neuroimaging analysis, Statistics in Biosciences, 10(3):520-545. [pdf]
Tucker version of the tensor regression.

J Zhai, J Kim, K Knox, H Twigg, H Zhou, and J Zhou. (2018) Variance component selection with applications to microbiome taxonomic data, Frontiers in Microbiology, 9:509. [pdf]
Variance component penalization method for microbiome taxa association study.

J Zhai, K Knox, H Twigg, H Zhou, and J Zhou. (2019) Exact variance component tests for longitudinal microbiome studies, Genetic Epidemiology, 43:250-262. [pdf]

J Kim, Y Zhang, J Day, and H Zhou. (2018) MGLM: an R package for multivariate categorical data analysis. The R Journal, 10(1):73-90. [pdf]

C Li and H Zhou. (2017) svt: Singular value thresholding in MATLAB, Journal of Statistical Software, 81(2):1-13. [pdf]

J Zhou, T Hu, D Qiao, M Cho, and H Zhou. (2016) Boosting gene mapping power and efficiency with efficient exact variance component tests of SNP sets, Genetics, 204(3):921-931. [pdf]
Exact score and likelihood ratio tests for testing SNP sets.

Y Zhang, H Zhou, J Zhou, and W Sun. (2017) Regression models for multivariate count data. Journal of Computational and Graphical Statistics, 26(1):1-13. [pdf]
Regression model using Dirichlet-multinomial, negative multinomial, and generalized Dirichlet-multinomial distributions.

H Zhou, J Blangero, T Dyer, K Chan, K Lange, and E Sobel. (2017) Fast genome-wide QTL association mapping on pedigree and population data, Genetic Epidemiology, 41(3):174-186. [pdf]
Method for Mendel software Option 29 (Pedigree GWAS).

B Zhang, H Zhou, L Wang, and C Sung. (2017) Classification based on neuroimaging data by tensor boosting, International Joint Conference on Neural Networks (IJCNN), 1174-1179. [pdf]

H Zhou, J Zhou, T Hu, E Sobel, and K Lange. (2016) Genome-wide QTL and eQTL analyses using Mendel, BMC Proceedings, 10(Suppl 7):10. [pdf]
QTL and eQTL analyses of the GAW19 data.

N Zhao, J Chen, IM Carroll, T Ringel-Kulka, MP Epstein, H Zhou, J Zhou, Y Ringel, HZ Li, and MC Wu. (2015) Testing in microbiome profiling studies with the microbiome regression-based kernel association test (MiRKAT). The American Journal of Human Genetics, 96(5):797-807. [pdf]

W Sun, Y Liu, JJ Crowley, TH Chen, H Zhou, H Chu, S Huang, PF Kuan, Y Li, D Miller, G Shaw, Y Wu, V Zhabotynsky, L McMillan, F Zou, PF Sullivan, and FPM de Villena. (2015) IsoDOT detects differential RNA-isoform expression/usage with respect to a categorical or continuous covariate with high sensitivity and specificity, Journal of the American Statistical Association, 110:975-986. [pdf]

H Zhou and K Lange. (2015) Path following in the exact penalty method of convex programming, Computational Optimization and Applications, 61(3):609-634. [pdf]

W Xiao, Y Wu, and H Zhou. (2015) ConvexLAR: an extension of least angle regression, Journal of Computational and Graphical Statistics, 24(3):603–626. [pdf]
Least angle regression (LAR) with a general convex loss.

H Zhou and Y Wu. (2014) A generic path algorithm for regularized statistical estimation, Journal of the American Statistical Association, 109(506):686-699. [pdf]
Path following for a convex loss plus a generalized lasso penalty.

E Chi, H Zhou, and K Lange. (2014) Distance majorization and its applications, Mathematical Programming Series A, 146(1-2):409-436. [pdf]

K Lange, E Chi, and H Zhou. (2014) A brief survey of modern optimization for statisticians (with discussions by Atchade and Michailidis, Hunter, Robert, and rejoinder), International Statistical Review, 82(1):46-70. [pdf]

H Zhou and L Li. (2014) Regularized matrix regression, Journal of the Royal Statistical Society, Series B, 76(2):463-483. [pdf][software]
Soft-thresholding for regression with matrix covariates.
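With a nuclear-norm penalty lam*||B||_*, the proximal step keeps the singular vectors of B and soft-thresholds its singular values: prox(B) = U diag(max(sigma_i - lam, 0)) V^T. A minimal sketch of that step follows, assuming the SVD factors are already available (in practice they would come from a linear algebra library). The function name `svt` is hypothetical here; this is not the interface of the svt MATLAB package.

```python
def svt(sigmas, U, V, lam):
    """Return U diag(max(sigma_i - lam, 0)) V^T from given SVD factors.

    U and V are lists of singular vectors (one list per vector), so that
    B = sum_i sigmas[i] * outer(U[i], V[i]). Computing the SVD is out of scope.
    """
    shrunk = [max(s - lam, 0.0) for s in sigmas]  # soft-threshold the spectrum
    m, n = len(U[0]), len(V[0])
    out = [[0.0] * n for _ in range(m)]
    for s, u, v in zip(shrunk, U, V):
        if s == 0.0:
            continue  # thresholded components drop out, lowering the rank
        for i in range(m):
            for j in range(n):
                out[i][j] += s * u[i] * v[j]
    return out


# B = 3*e1 e1^T + 0.5*e2 e2^T; thresholding at lam = 1 leaves a rank-1 matrix.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(svt([3.0, 0.5], U, V, lam=1.0))  # [[2.0, 0.0], [0.0, 0.0]]
```

Singular values below lam are zeroed, which is how this penalty produces low-rank coefficient estimates.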

K Lange, JC Papp, JS Sinsheimer, R Sripracha, H Zhou, and E Sobel. (2013) Mendel: the Swiss army knife of genetic analysis programs, Bioinformatics, 29(12):1568-1570. [pdf][Mendel]
Summary of the new version of the comprehensive genetic analysis software Mendel.

E Chi, G Allen, H Zhou, O Kohannim, K Lange, and P Thompson. (2013) Imaging genetics via sparse canonical correlation analysis, 2013 IEEE 10th International Symposium on Biomedical Imaging (ISBI), 740-743. [pdf]
Sparse canonical correlation analysis (CCA) for tensor data.

H Zhou, J Zhou, E Sobel, and K Lange. (2014) Fast genome-wide pedigree QTL analysis using Mendel, BMC Proceedings, 8(Suppl 1):S93. [pdf]
Multivariate QTL analysis of the GAW18 data.

H Zhou, L Li, and H Zhu. (2013) Tensor regression with applications in neuroimaging data analysis, Journal of the American Statistical Association, 108(502):540-552. [pdf][software]
Classical regression takes a vector of covariates. We consider regression with an array (tensor) of covariates, such as the images arising in neuroimaging studies.
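For a matrix covariate X of size p1 x p2, a rank-R CP (CANDECOMP/PARAFAC) coefficient B = sum_r b1_r b2_r^T cuts the number of coefficients from p1*p2 to R*(p1 + p2), and the linear predictor <B, X> can be evaluated as sum_r b1_r^T X b2_r without forming B. The sketch below checks that both computations agree on a small hypothetical example. It covers only the two-way (matrix) case, omits model fitting, and all function names are illustrative assumptions.

```python
def cp_coefficient(factors1, factors2):
    """Form B = sum_r outer(b1_r, b2_r) from lists of factor vectors."""
    p1, p2 = len(factors1[0]), len(factors2[0])
    B = [[0.0] * p2 for _ in range(p1)]
    for b1, b2 in zip(factors1, factors2):
        for i in range(p1):
            for j in range(p2):
                B[i][j] += b1[i] * b2[j]
    return B


def inner(B, X):
    """Trace inner product <B, X> = sum_ij B_ij * X_ij."""
    return sum(b * x for brow, xrow in zip(B, X) for b, x in zip(brow, xrow))


def cp_linear_predictor(factors1, factors2, X):
    """Evaluate <B, X> as sum_r b1_r^T X b2_r without forming B."""
    total = 0.0
    for b1, b2 in zip(factors1, factors2):
        Xb2 = [sum(x * b for x, b in zip(row, b2)) for row in X]
        total += sum(a * v for a, v in zip(b1, Xb2))
    return total


f1 = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]  # b1_r vectors (length p1 = 3), R = 2
f2 = [[1.0, -1.0], [2.0, 0.0]]           # b2_r vectors (length p2 = 2)
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(inner(cp_coefficient(f1, f2), X), cp_linear_predictor(f1, f2, X))  # 13.0 13.0
```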

E Chi, H Zhou, G Chen, D Ortega, and K Lange. (2013) Genotype imputation via matrix completion, Genome Research, 23:509-518. [pdf][Mendel Impute]
We apply matrix completion to the difficult problem of genotype imputation. It achieves imputation accuracy similar to current methods in an order of magnitude less time.

K Lange and H Zhou. (2014) MM algorithms for geometric and signomial programming, Mathematical Programming Series A, 143(1-2):339-356. [pdf]
A simple algorithm for minimizing posynomials and signomials.

L Riley, H Zhou, K Lange, J Sinsheimer, and M Sehl. (2012) Determining duration of HER2-targeted therapy using stem cell extinction models, PLoS ONE, 7(12):e46613. [pdf]
Prediction of the optimal duration of stem-cell-targeted treatment based on a stochastic model.

H Zhu, L Li, and H Zhou. (2012) Nonlinear dimension reduction with Wright-Fisher kernel for genotype aggregation and association mapping, Bioinformatics, 28:i375–i381. [pdf]
Nonlinear dimension reduction using Markov chain kernels.

H Zhou and K Lange. (2013) A path algorithm for constrained estimation, Journal of Computational and Graphical Statistics, 22(2):261-283. [pdf]
Path following for a quadratic criterion plus a generalized lasso penalty.

H Zhou and Y Zhang. (2012) EM vs MM: a case study, Computational Statistics & Data Analysis, 56:3909–3920. [pdf]
A comparison of EM (expectation-maximization) and MM (minorization-maximization) algorithms in the case of MLE for the Dirichlet-Multinomial distribution.
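For Dirichlet-multinomial counts x_ij with batch sizes m_i, minorizing the log(alpha_j + k) terms by Jensen's inequality and the -log(|alpha| + k) terms by a tangent line yields the multiplicative update alpha_j <- alpha_j * S_j / T, where S_j = sum_i sum_{k=0}^{x_ij - 1} 1/(alpha_j + k) and T = sum_i sum_{k=0}^{m_i - 1} 1/(|alpha| + k). The minimal sketch below assumes this standard derivation; the paper's exact scheme and its EM comparison may differ, and the toy counts are made up. By the ascent property of MM, the log-likelihood should never decrease.

```python
import math


def dm_loglik(alpha, counts):
    """Dirichlet-multinomial log-likelihood, dropping multinomial coefficients."""
    a_sum = sum(alpha)
    ll = 0.0
    for x in counts:
        for a, xj in zip(alpha, x):
            ll += sum(math.log(a + k) for k in range(xj))  # log Gamma(a+xj)/Gamma(a)
        ll -= sum(math.log(a_sum + k) for k in range(sum(x)))
    return ll


def dm_mm_update(alpha, counts):
    """One MM step: alpha_j <- alpha_j * S_j / T."""
    a_sum = sum(alpha)
    T = sum(1.0 / (a_sum + k) for x in counts for k in range(sum(x)))
    return [
        a * sum(1.0 / (a + k) for x in counts for k in range(x[j])) / T
        for j, a in enumerate(alpha)
    ]


counts = [[3, 1, 0], [2, 2, 1], [0, 4, 2], [5, 0, 1]]  # made-up toy data
alpha = [1.0, 1.0, 1.0]
lls = [dm_loglik(alpha, counts)]
for _ in range(200):
    alpha = dm_mm_update(alpha, counts)
    lls.append(dm_loglik(alpha, counts))
print(alpha, lls[0], lls[-1])
```

The update needs no step size or line search and keeps alpha positive automatically, which is part of the appeal of MM in this setting.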

M Sehl, H Zhou, J Sinsheimer, and K Lange. (2011) Extinction models for cancer stem cell therapy, Mathematical Biosciences, 234(2):132-146. [pdf]
A Markov process model for the stem cells in cancer patients under treatment. The analysis combines several techniques, including extreme value theory, the fast Fourier transform, and orthogonal polynomial expansions.

H Zhou, D Alexander, and K Lange. (2011) A quasi-Newton acceleration for high-dimensional optimization algorithms, Statistics and Computing, 21(2):261-273. [pdf]
A new quasi-Newton acceleration scheme particularly suited to EM/MM algorithms on high-dimensional problems.

H Zhou and K Lange. (2011) A fast procedure for calculating importance weights in bootstrap sampling, Computational Statistics & Data Analysis, 55(1):26-33. [pdf]
The use of importance sampling can dramatically reduce the variance of bootstrap estimates. However, modern sample sizes are often huge, on the order of 10^6 to 10^9, and computing the importance weights then becomes nontrivial. Here we propose a fast procedure that scales well with sample size.

H Zhou, D Alexander, M Sehl, J Sinsheimer, E Sobel, and K Lange. (2011) Penalized regression for genome-wide association screening of sequence data, Pacific Symposium on Biocomputing, 16:106-117. [pdf][Mendel]
This companion paper introduces weights to calibrate the penalties and covers more details on the implementation of penalized regression in the statistical genetics analysis software Mendel.

H Zhou, M Sehl, J Sinsheimer, and K Lange. (2010) Association screening of common and rare genetic variants by penalized regression, Bioinformatics, 26(19):2375-2382. [pdf]
Application of penalized linear and logistic regressions to genome-wide association studies (GWAS). A mixture of lasso and group penalties is used to select causal rare variants present in sequence data.

H Zhou, K Lange, and M Suchard. (2010) Graphics processing units and high-dimensional optimization, Statistical Science, 25:311-324. [pdf]
Combining the MM principle with modern graphics processing unit (GPU) technology accelerates the classical EM/MM-type algorithms widely used in statistics.

H Zhou and K Lange. (2010) On the bumpy road to the dominant mode, Scandinavian Journal of Statistics, 37:612-631. [pdf]
We propose several variants of deterministic annealing for finding the dominant mode in maximum likelihood estimation, illustrated on several classical statistical problems.

H Zhou and K Lange. (2010) MM algorithms for some discrete multivariate distributions, Journal of Computational and Graphical Statistics, 19(3):656-665. [pdf]
We design MM algorithms (a generalization of the EM algorithm) for maximum likelihood estimation of several multivariate distributions that frequently occur in applications. Specific examples include the Dirichlet-multinomial, generalized Dirichlet-multinomial, and negative multinomial distributions, a distribution due to Neerchal and Morel, the Ewens and Pitman sampling distributions, and the zero-truncated and zero-inflated versions of these distributions.

M Sehl, J Sinsheimer, H Zhou, and K Lange. (2009) Differential destruction of stem cells: implications for targeted cancer stem cell therapy, Cancer Research, 69(24):9481-9489. [pdf]
Clinical implications of the cancer extinction model are summarized in this paper.

H Zhou and K Lange. (2009) Rating movies and rating the raters who rate them, The American Statistician, 63(4):297–307. [pdf]
If you have heard of the Netflix Prize, you might be interested in this short report. Rather than the prediction problem posed by Netflix, we focus on modeling. Our simple model identifies quirky raters, supplies a ranking of movies, and should be able to predict unseen ratings. We fit the model only to the MovieLens data set; interested readers may want to try it on the much larger Netflix data set.

H Zhou and K Lange. (2009) Composition Markov chains of multinomial type, Advances in Applied Probability, 41(1):270-291. [pdf]
We describe a class of Markov chains that take a system of multivariate Krawtchouk polynomials constructed by Robert Griffiths as eigenfunctions.

K Khare and H Zhou. (2009) Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions, Annals of Applied Probability, 19(2):737-777. [pdf]
We obtain the convergence rates of the multivariate versions of several classical Markov chains using spectral methods. Specific examples include the multivariate Moran process in population genetics and its variants in community ecology, the Dirichlet-multinomial Gibbs sampler, generalizations of Ehrenfest chains, and multivariate normal autoregressive processes.

H Zhou and K Dorman. (2005) A branching process model of drug resistant HIV, book chapter in Deterministic and stochastic models of AIDS epidemics and HIV infections with intervention, 457-496, World Sci. Publ., Hackensack, NJ.
We apply the numerical methods for continuous-time multi-type branching processes with immigration to study the development of drug-resistant HIV in vivo.