Please feel free to email me at email@example.com if you have any comments.
K Doubleday, H Zhou, H Fu, and J Zhou. (2018) A novel algorithm for generating individualized treatment decision trees and random forests, Journal of Computational and Graphical Statistics, in press.
B Gaines, J Kim, and H Zhou. (2018) Algorithms for fitting the constrained lasso, Journal of Computational and Graphical Statistics, in press.
L Hu, W Lu, J Zhou, and H Zhou. (2018) MM algorithms for variance component estimation and selection in logistic linear mixed models, Statistica Sinica, in press.
X Zhang, L Li, H Zhou, Y Zhou, D Shen, and ADNI. (2018) Tensor generalized estimating equations for longitudinal imaging analysis, Statistica Sinica, in press.
X Li, D Xu, H Zhou, and L Li. (2018) Tucker tensor regression and neuroimaging analysis, Statistics in Biosciences, in press.
Tucker version of the tensor regression.
J Zhai, J Kim, K Knox, H Twigg, H Zhou, and J Zhou. (2018) Variance component selection with applications to microbiome taxonomic data, Frontiers in Microbiology, 9:509. [pdf]
Variance component penalization method for microbiome taxa association study.
J Zhou, T Hu, D Qiao, M Cho, and H Zhou. (2016) Boosting gene mapping power and efficiency with efficient exact variance component tests of SNP sets, Genetics, 204(3):921-931. [pdf]
Exact score and likelihood ratio tests for testing SNP sets.
Y Zhang, H Zhou, J Zhou, and W Sun. (2017) Regression models for multivariate count data. Journal of Computational and Graphical Statistics, 26(1):1-13. [pdf]
Regression model using Dirichlet-multinomial, negative multinomial, and generalized Dirichlet-multinomial distributions.
H Zhou, J Blangero, T Dyer, K Chan, K Lange, and E Sobel. (2017) Fast genome-wide QTL association mapping on pedigree and population data, Genetic Epidemiology, 41(3):174-186. [pdf]
Method for Mendel software Option 29 (Pedigree GWAS).
B Zhang, H Zhou, L Wang, and C Sung. (2017) Classification based on neuroimaging data by tensor boosting, International Joint Conference on Neural Networks (IJCNN), 1174-1179. [pdf]
N Zhao, J Chen, IM Carroll, T Ringel-Kulka, MP Epstein, H Zhou, J Zhou, Y Ringel, HZ Li, and MC Wu. (2015) Testing in microbiome profiling studies with the microbiome regression-based kernel association test (MiRKAT). The American Journal of Human Genetics, 96(5):797-807. [pdf]
W Sun, Y Liu, JJ Crowley, TH Chen, H Zhou, H Chu, S Huang, PF Kuan, Y Li, D Miller, G Shaw, Y Wu, V Zhabotynsky, L McMillan, F Zou, PF Sullivan, and FPM de Villena. (2015) IsoDOT detects differential RNA-isoform expression/usage with respect to a categorical or continuous covariate with high sensitivity and specificity, Journal of the American Statistical Association, 110:975-986. [pdf]
W Xiao, Y Wu and H Zhou. (2015) ConvexLAR: an extension of least angle regression, Journal of Computational and Graphical Statistics, 24(3):603–626. [pdf]
Least angle regression (LAR) with a general convex loss.
H Zhou and Y Wu. (2014) A generic path algorithm for regularized statistical estimation, Journal of the American Statistical Association, 109(506):686-699. [pdf]
Path following for a convex loss plus a generalized lasso penalty.
K Lange, E Chi, and H Zhou. (2014) A brief survey of modern optimization for statisticians (with discussions by Atchade and Michailidis, Hunter, Robert, and rejoinder), International Statistical Review, 82(1):46-70. [pdf]
K Lange, JC Papp, JS Sinsheimer, R Sripracha, H Zhou, and E Sobel. (2013) Mendel: the Swiss army knife of genetic analysis programs, Bioinformatics, 29(12):1568-1570. [pdf][Mendel]
Summary of the new version of the comprehensive genetic analysis software Mendel.
E Chi, G Allen, H Zhou, O Kohannim, K Lange, and P Thompson. (2013) Imaging genetics via sparse canonical correlation analysis, Biomedical Imaging (ISBI), 2013 IEEE 10th International Symposium on, pp. 740-743. [pdf]
Sparse canonical correlation analysis (CCA) for tensor data.
H Zhou, L Li, and H Zhu. (2013) Tensor regression with applications in neuroimaging data analysis, Journal of the American Statistical Association, 108(502):540-552. [pdf][software]
Traditional regression takes a vector of covariates. We consider regression that takes an array, aka tensor, of covariates, such as in neuroimaging studies.
E Chi, H Zhou, G Chen, D Ortega, and K Lange. (2013) Genotype imputation via matrix completion, Genome Research, 23:509-518. [pdf][Mendel Impute]
We successfully applied a matrix completion method to the difficult genotype imputation problem. Similar imputation accuracy is achieved in an order of magnitude less time than with current methods.
K Lange and H Zhou. (2014) MM algorithms for geometric and signomial programming, Mathematical Programming Series A, 143(1-2):339-356. [pdf]
A simple algorithm for minimizing posynomials and signomials.
L Riley, H Zhou, K Lange, J Sinsheimer, and M Sehl. (2012) Determining duration of HER2-targeted therapy using stem cell extinction models, PLoS ONE, 7(12):e46613. [pdf]
Prediction of the optimal duration of stem-cell-targeted treatment based on a stochastic model.
H Zhu, L Li, and H Zhou. (2012) Nonlinear dimension reduction with Wright-Fisher kernel for genotype aggregation and association mapping, Bioinformatics, 28:i375–i381. [pdf]
Nonlinear dimension reduction using Markov chain kernels.
H Zhou and K Lange. (2013) A path algorithm for constrained estimation, Journal of Computational and Graphical Statistics, 22(2):261-283. [pdf]
Path following for a quadratic criterion plus a generalized lasso penalty.
H Zhou and Y Zhang. (2012) EM vs MM: a case study, Computational Statistics & Data Analysis, 56:3909–3920. [pdf]
A comparison of EM (expectation-maximization) and MM (minorization-maximization) algorithms in the case of MLE for the Dirichlet-Multinomial distribution.
M Sehl, H Zhou, J Sinsheimer, and K Lange. (2011) Extinction models for cancer stem cell therapy, Mathematical Biosciences, 234(2):132-146. [pdf]
A Markov process model for the stem cells in cancer patients under treatment. A mix of techniques, e.g., extreme value theory, fast Fourier transform, and orthogonal polynomial expansions, is used.
H Zhou, D Alexander, and K Lange. (2011) A quasi-Newton acceleration for high-dimensional optimization algorithms, Statistics and Computing, 21(2):261-273. [pdf]
A new quasi-Newton acceleration scheme particularly suitable for EM/MM algorithms on high dimensional problems.
H Zhou and K Lange. (2011) A fast procedure for calculating importance weights in bootstrap sampling, Computational Statistics & Data Analysis, 55(1):26-33. [pdf]
The use of importance sampling can dramatically reduce the variance of bootstrap estimates. However, sample sizes are now often huge, say on the order of 10^6 to 10^9, so the calculation of importance weights becomes nontrivial. Here we propose a fast procedure that scales well with sample size.
H Zhou, D Alexander, M Sehl, J Sinsheimer, E Sobel, and K Lange. (2011) Penalized regression for genome-wide association screening of sequence data, Pacific Symposium on Biocomputing, 16:106-117. [pdf][Mendel]
This companion paper introduces weights to calibrate the penalties and covers more details on the implementation of penalized regression in the statistical genetics analysis software Mendel.
H Zhou, M Sehl, J Sinsheimer, and K Lange. (2010) Association screening of common and rare genetic variants by penalized regression, Bioinformatics, 26(19):2357-2382. [pdf][Mendel]
Application of penalized linear and logistic regressions to genome-wide association studies (GWAS). A mixture of lasso and group penalties is used to select causal rare variants present in sequence data.
H Zhou, K Lange, and M Suchard. (2010) Graphics processing units and high-dimensional optimization, Statistical Science, 25:311-324. [pdf]
The marriage of the MM principle and modern graphics processing unit (GPU) technology gives a boost to the classical EM/MM type algorithms widely used in statistics.
H Zhou and K Lange. (2010) On the bumpy road to the dominant mode, Scandinavian Journal of Statistics, 37:612-631. [pdf]
We propose several variants of deterministic annealing for finding the dominant mode in maximum likelihood estimation for some classical statistical problems.
H Zhou and K Lange. (2010) MM algorithms for some discrete multivariate distributions, Journal of Computational and Graphical Statistics, 19(3):656-665. [pdf]
We designed MM algorithms (a generalization of the EM algorithm) for maximum likelihood estimation of some multivariate distributions frequently occurring in applications. Specific examples include Dirichlet-Multinomial, generalized Dirichlet-Multinomial, negative multinomial, a distribution due to Neerchal and Morel, the Ewens and Pitman sampling distributions, and the zero-truncated and zero-inflated versions of these distributions.
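To give a flavor of such an MM iteration, here is a minimal Python sketch for the plain Dirichlet-Multinomial case only. It uses the usual minorization of the log-likelihood terms, which leads to a simple multiplicative update. The function names and the toy counts are made up for illustration and are not taken from the paper's software; treat this as a sketch under those assumptions rather than the published implementation.

```python
import math

def dm_loglik(counts, alpha):
    """Dirichlet-Multinomial log-likelihood, dropping the multinomial coefficient."""
    total = sum(alpha)
    ll = 0.0
    for row in counts:
        for x, a in zip(row, alpha):
            ll += sum(math.log(a + k) for k in range(x))
        ll -= sum(math.log(total + k) for k in range(sum(row)))
    return ll

def dm_mm_step(counts, alpha):
    """One MM update: alpha_j <- alpha_j * numer_j / denom."""
    total = sum(alpha)
    # Shared denominator: sum over observations i and k < m_i of 1 / (|alpha| + k).
    denom = sum(1.0 / (total + k) for row in counts for k in range(sum(row)))
    new_alpha = []
    for j, a in enumerate(alpha):
        # Numerator: sum over observations i and k < x_ij of 1 / (alpha_j + k).
        numer = sum(1.0 / (a + k) for row in counts for k in range(row[j]))
        new_alpha.append(a * numer / denom)
    return new_alpha

if __name__ == "__main__":
    # Hypothetical overdispersed count vectors (rows are observations).
    counts = [[8, 1, 1], [1, 7, 2], [2, 2, 6], [9, 0, 1]]
    alpha = [1.0, 1.0, 1.0]
    for _ in range(200):
        alpha = dm_mm_step(counts, alpha)
    print(alpha, dm_loglik(counts, alpha))
```

Each update keeps the parameters positive and, by the MM principle, never decreases the log-likelihood, so `dm_loglik` can be used to monitor convergence.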
M Sehl, J Sinsheimer, H Zhou, and K Lange. (2009) Differential destruction of stem cells: implications for targeted cancer stem cell therapy, Cancer Research, 69(24):9481-9489. [pdf]
Clinical implications of the cancer extinction model are summarized in this paper.
H Zhou and K Lange. (2009) Rating movies and rating the raters who rate them, The American Statistician, 63(4):297–307. [pdf]
If you have ever heard of the Netflix Grand Prize, you might be interested in reading this short report. Instead of the prediction problem posed by Netflix, we focus more on the modeling. Our simple model is able to identify quirky raters, supply a ranking of movies, and should be able to predict unseen ratings. We only fit the model to the MovieLens data set. Interested readers should try this on the (much larger) Netflix data set.
H Zhou and K Lange. (2009) Composition Markov chains of multinomial type, Advances in Applied Probability, 41(1):270-291. [pdf]
We describe a class of Markov chains that take a system of multivariate Krawtchouk polynomials constructed by Robert Griffiths as eigenfunctions.
K Khare and H Zhou. (2009) Rates of convergence of some multivariate Markov chains with polynomial eigenfunctions, Annals of Applied Probability, 19(2):737-777. [pdf]
We obtain the convergence rates of the multivariate versions of several classical Markov chains using spectral methods. Specific examples include the multivariate Moran process in population genetics and its variants in community ecology, the Dirichlet-Multinomial Gibbs sampler, generalizations of Ehrenfest chains, multivariate normal autoregressive processes, and so on.
H Zhou and K Dorman. (2005) A branching process model of drug resistant HIV, book chapter in Deterministic and stochastic models of AIDS epidemics and HIV infections with intervention, 457-496, World Sci. Publ., Hackensack, NJ.
We apply numerical methods for continuous-time multi-type branching processes with immigration to study the development of drug-resistant HIV in vivo.
H Zhou. (2006) Scrambling a Rubik’s cube. [pdf]
We compute the character tables of the pocket cube (2-by-2-by-2 Rubik’s cube) group and the Rubik’s cube group and consider some specific random walks on these groups. The question of “How many twists are needed to thoroughly scramble the full Rubik’s cube?” is still open.