Biostat M280 Homework 5

Due June 16 @ 11:59PM

Consider again the MLE of the Dirichlet-multinomial model. In HW4, we worked out Newton's method. In this homework, we explore the EM and MM approaches.

Q1

Show that, given iid observations $\mathbf{x}_1,\ldots,\mathbf{x}_n$, the log-likelihood can be written as $$ L(\alpha) = \sum_{i=1}^n \ln \binom{|\mathbf{x}_i|}{\mathbf{x}_i} +\sum_{i=1}^n \sum_{j=1}^d \sum_{k=0}^{x_{ij}-1} \ln(\alpha_j+k) - \sum_{i=1}^n \sum_{k=0}^{|\mathbf{x}_i|-1} \ln(|\alpha|+k). $$ Hint: $\Gamma(a + k) / \Gamma(a) = a (a + 1) \cdots (a+k-1)$.
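For orientation, taking logarithms in the hint, first with $a = \alpha_j$, $k = x_{ij}$ and then with $a = |\alpha|$, $k = |\mathbf{x}_i|$, gives $$ \ln \frac{\Gamma(\alpha_j + x_{ij})}{\Gamma(\alpha_j)} = \sum_{k=0}^{x_{ij}-1} \ln(\alpha_j + k), \qquad \ln \frac{\Gamma(|\alpha| + |\mathbf{x}_i|)}{\Gamma(|\alpha|)} = \sum_{k=0}^{|\mathbf{x}_i|-1} \ln(|\alpha| + k), $$ which are exactly the sums appearing in $L(\alpha)$.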

Q2

Suppose $(P_1,\ldots,P_d) \in \Delta_d = \{\mathbf{p}: p_i \ge 0, \sum_i p_i = 1\}$ follows a Dirichlet distribution with parameter $\alpha = (\alpha_1,\ldots,\alpha_d)$. Show that $$ \mathbf{E}(\ln P_j) = \Psi(\alpha_j) - \Psi(|\alpha|), $$ where $\Psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function and $|\alpha| = \sum_{j=1}^d \alpha_j$. Hint: Differentiate the identity $$ 1 = \int_{\Delta_d} \frac{\Gamma(|\alpha|)}{\prod_{j=1}^d \Gamma(\alpha_j)} \prod_{j=1}^d p_j^{\alpha_j-1} \, d\mathbf{p}. $$
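A sketch of the key step: differentiating both sides of the identity under the integral sign with respect to $\alpha_j$ gives $$ 0 = \int_{\Delta_d} \left[\Psi(|\alpha|) - \Psi(\alpha_j) + \ln p_j\right] \frac{\Gamma(|\alpha|)}{\prod_{j'=1}^d \Gamma(\alpha_{j'})} \prod_{j'=1}^d p_{j'}^{\alpha_{j'}-1} \, d\mathbf{p} = \Psi(|\alpha|) - \Psi(\alpha_j) + \mathbf{E}(\ln P_j), $$ and rearranging yields the claim.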

Q3

The admixture representation of the Dirichlet-multinomial distribution suggests that we can treat the unobserved multinomial parameters $\mathbf{p}_1,\ldots,\mathbf{p}_n$ as missing data and derive an EM algorithm. Show that the Q function is $$ Q(\alpha|\alpha^{(t)}) = \sum_{j=1}^d \sum_{i=1}^n \alpha_j \left[\Psi(x_{ij}+\alpha_j^{(t)}) - \Psi(|\mathbf{x}_i|+|\alpha^{(t)}|)\right] - n \sum_{j=1}^d \ln \Gamma(\alpha_j) + n \ln \Gamma(|\alpha|) + c^{(t)}, $$ where $c^{(t)}$ is a constant irrelevant to optimization. Comment on why it is not easy to maximize the Q function.
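One intermediate fact worth recording, which follows from Q2 together with conjugacy (the conditional distribution of $\mathbf{p}_i$ given $\mathbf{x}_i$ under $\alpha^{(t)}$ is Dirichlet with parameter $\alpha^{(t)} + \mathbf{x}_i$): $$ \mathbf{E}\left[\ln p_{ij} \mid \mathbf{x}_i, \alpha^{(t)}\right] = \Psi(x_{ij}+\alpha_j^{(t)}) - \Psi(|\mathbf{x}_i|+|\alpha^{(t)}|). $$ This expectation is exactly the bracketed term in the Q function.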

Q4

We derive an MM algorithm for maximizing $L$. Consider the formulation of the log-likelihood that contains terms $\ln (\alpha_j + k)$ and $- \ln (|\alpha|+k)$. Applying Jensen's inequality to the concave term $\ln (\alpha_j + k)$ and the supporting hyperplane inequality to the convex term $- \ln (|\alpha|+k)$, show that a minorizing function to $L(\alpha)$ is $$ g(\alpha|\alpha^{(t)}) = - \sum_{k=0}^{\max_i |\mathbf{x}_i|-1} \frac{r_k}{|\alpha^{(t)}|+k} |\alpha| + \sum_{j=1}^d \sum_{k=0}^{\max_i x_{ij}-1} \frac{s_{jk} \alpha_j^{(t)}}{\alpha_j^{(t)}+k} \ln \alpha_j + c^{(t)}, $$ where $s_{jk} = \sum_{i=1}^n 1_{\{x_{ij} > k\}}$, $r_k = \sum_{i=1}^n 1_{\{|\mathbf{x}_i| > k\}}$, and $c^{(t)}$ is a constant irrelevant to optimization. Maximizing the surrogate function $g(\alpha|\alpha^{(t)})$ is trivial since the $\alpha_j$ are separated. Show that the MM updates are $$ \alpha_j^{(t+1)} = \frac{\sum_{k=0}^{\max_i x_{ij}-1} \frac{s_{jk}}{\alpha_j^{(t)}+k}}{\sum_{k=0}^{\max_i |\mathbf{x}_i|-1} \frac{r_k}{|\alpha^{(t)}|+k}} \alpha_j^{(t)}, \quad j=1,\ldots,d. $$ The quantities $s_{jk}$, $r_k$, $\max_i x_{ij}$ and $\max_i |\mathbf{x}_i|$ only depend on the data and can be pre-computed. Comment on whether the MM updates respect the parameter constraint $\alpha_j>0$.
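For concreteness, here is a minimal Julia sketch of one MM iteration implementing the update above. The name `mm_update`, the `d`-by-`n` layout of `X` (matching Q5), and the recomputation of $s_{jk}$ and $r_k$ inside the function are illustrative simplifications, not a required interface:

# One MM iteration (sketch): X is a d-by-n count matrix, α the current iterate
function mm_update(X::AbstractMatrix, α::Vector)
    d = size(X, 1)
    sums = vec(sum(X, dims = 1))          # batch sizes |x_i|, i = 1,…,n
    # denominator: ∑_k r_k / (|α| + k), where r_k = #{i : |x_i| > k}
    denom = sum((count(>(k), sums) / (sum(α) + k) for k in 0:maximum(sums)-1);
                init = 0.0)
    αnew = similar(α, Float64)
    for j in 1:d
        xj = view(X, j, :)
        # numerator: ∑_k s_jk / (α_j + k), where s_jk = #{i : x_ij > k}
        numer = sum((count(>(k), xj) / (α[j] + k) for k in 0:maximum(xj)-1);
                    init = 0.0)
        αnew[j] = α[j] * numer / denom
    end
    return αnew
end

A full `dirmult_mm` (Q5) would iterate this update and stop once the relative change in the Q1 log-likelihood falls below a tolerance; for speed, $s_{jk}$, $r_k$, and the loop bounds should be pre-computed from the data once, as the problem notes, rather than recomputed each iteration as in this sketch.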

Q5

Write a function that finds the MLE of the Dirichlet-multinomial distribution given iid observations $\mathbf{x}_1,\ldots,\mathbf{x}_n$, using the MM algorithm.

In [1]:
"""
    dirmult_mm(X)

Find the MLE of Dirichlet-multinomial distribution using MM algorithm.

# Argument
* `X`: a `d`-by-`n` matrix of counts; each column is one data point.

# Optional arguments
* `α0`: starting point (defaults to `dirmult_mom(X)`).
* `maxiters`: the maximum number of MM iterations (default 100).
* `tolfun`: the tolerance for relative change in objective values (default 1e-6).

# Output
* `logl`: the log-likelihood at the MLE.
* `niter`: the number of iterations performed.
* `α`: the MLE.
* `∇`: the gradient at the MLE.
* `obsinfo`: the observed information matrix at the MLE.
"""
function dirmult_mm(
        X::AbstractMatrix; 
        α0::Vector = dirmult_mom(X), 
        maxiters::Int = 100, 
        tolfun = 1e-6,
    )
    
    # your code here
    
end
Out[1]:
dirmult_mm
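A hypothetical call, assuming the outputs come back in the order documented above (adjust to your actual return convention):

logl, niter, α̂, ∇, obsinfo = dirmult_mm(X; maxiters = 500, tolfun = 1e-8)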

Q6

Re-do HW4 Q9 using your new `dirmult_mm` function. Compare the number of iterations and the run time of the MM algorithm to those of Newton's method. Comment on the efficiency of Newton's method vs the MM algorithm for this problem.

Q7

Finally, let us reconsider the EM algorithm. The difficulty with the M step in the EM algorithm can be remedied. Discuss how we can further minorize the $\ln \Gamma(|\alpha|)$ term in the $Q$ function to produce a minorizing function with all $\alpha_j$ separated. For this homework, you do not need to implement this EM-MM hybrid algorithm. Hint: $z \mapsto \ln \Gamma(z)$ is a convex function.
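To make the hint concrete: since $z \mapsto \ln \Gamma(z)$ is convex, the supporting hyperplane inequality gives $$ \ln \Gamma(|\alpha|) \ge \ln \Gamma(|\alpha^{(t)}|) + \Psi(|\alpha^{(t)}|) \left(|\alpha| - |\alpha^{(t)}|\right), $$ and the right-hand side is linear in $|\alpha| = \sum_j \alpha_j$. Substituting it for the $\ln \Gamma(|\alpha|)$ term therefore yields a surrogate that still minorizes $Q$ and in which the $\alpha_j$ are separated.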