# A Note on Angular Central Gaussian Distribution and its Matrix Variant

notes
geometric statistics
Author

Kisung You

Published

August 12, 2022

# Introduction

Probability distribution with explicit forms of densities are core elements of statistical inference. In this post, we review angular central gaussian (ACG) distribution on a unit hypersphere $$\mathbb{S}^{p-1} \subset \mathbb{R}^p$$ and its extension - matrix angular central gaussian (MACG) - defined on Stiefel $$St(p,r)$$ and Grassman $$Gr(p,r)$$ manifolds.

# Angular Central Gaussian Distribution

On $$\mathbb{S}^{p-1}$$, the ACG distribution $$ACG_p (A)$$ as a density

$f_{ACG} (x\vert A) = |A|^{-1/2} (x^\top Ax)^{-p/2}$

for $$x \in \mathbb{S}^{p-1}$$ and $$A$$ a symmetric positive-definite matrix, i.e., $$A=A^\top \in \mathbb{R}^{p\times p}$$ with $$\lambda_{min}(A)>0$$. Let’s recap some properties of ACG distribution.

• Property 1. $$f_{ACG}(x|A) = f_{ACG}(-x|A)$$. This enables ACG as a distribution on the real projective space $$\mathbb{R}P^{p-1} = \mathbb{S}^{p-1}/\lbrace +1, -1 \rbrace$$.

• Property 2. $$f_{ACG}(x|A) = f_{ACG}(x|cA),~c>0$$. Common convention is to normalize the matrix $$A$$ by a constraint $$\textrm{tr}(A) = p$$, which is useful (or even essential) in maximum likelihood estimation of the parameter to ensure algorithmic stability. If you want to show this property, simply use the fact that $$|cA| = c^p|A|$$.

• Property 3. When $$x\sim \mathcal{N}_p (0,A) \rightarrow x/\|x\| \sim ACG_p (A)$$. This property is indeed an intuition behind its origination , which can be used for sampling.

### Maximum Likelihood Estimation

Given a random sample $$x_1, \ldots, x_p \sim ACG_p (A)$$, Tyler (1987) proposed an iterative updating scheme to estimate the parameter $$A$$ by

$\hat{A}_{k+1} = p \left\lbrace \sum_{i=1}^n \frac{1}{x_i^\top \hat{A}_k^{-1} x_i} \right\rbrace^{-1} \sum_{i=1}^n \frac{x_i x_i^\top}{x_i^\top \hat{A}_k^{-1} x_i}, \tag{1}$

where $$\hat{A}_k$$ is the $$k$$-th iterate of an estimator with an initial starting point of an identity matrix $$\hat{A}_0 = I_p$$. While Equation 1 guarantees the convergence under mild conditions and abides by the constraint $$\textrm{tr}(\hat{A}_k) = p$$, it is from the author’s previous work on $$M$$-estimation of the scatter matrix. Here, we provide a naive derivation of 2-step fixed-point iteration algorithm for pedagogical purpose.

$\hat{A}_{k'} = \frac{p}{n}\sum_{i=1}^n \frac{x_i x_i^\top}{x_i^\top \hat{A}_k^{-1} x_i}\,\,\textrm{and}\,\, \hat{A}_{k+1} = \frac{p}{\textrm{tr}(\hat{A}_{k'})} \hat{A}_{k'}. \tag{2}$

First, let’s write the log-likelihood

$\log L = -\frac{n}{2}\log\det(A) - \frac{p}{2} \sum_{i=1}^n \log (x_i^\top A^{-1} x_i),$

and recall two facts from matrix calculus that

$\frac{\partial \log\det(A)}{\partial A} = A^{-1}\,\,\textrm{and}\,\, \frac{\partial x^\top A^{-1} x}{\partial A} = -A^{-1}xx^\top A^{-1}.$

Then, the first-order condition for the log-likelihood can be written as

$\begin{gather*} \frac{\partial \log L}{\partial A} = -\frac{n}{2} A^{-1} + \frac{p}{2} \sum_{i=1}^n \frac{A^{-1} x_i x_i^\top A^{-1}}{x_i^\top A^{-1} x_i} \\ A^{-1} = \frac{p}{n} \sum_{i=1}^n \frac{A^{-1} x_i x_i^\top A^{-1}}{x_i^\top A^{-1} x_i} \\ A = \frac{p}{n} \sum_{i=1}^n \frac{ x_i x_i^\top }{x_i^\top A^{-1} x_i} \end{gather*}$

where the last equality comes from multiplying $$A$$ from left and right. Therefore, $$\hat{A}$$ is a solution of the matrix equation in a form $$X = f(X)$$ where $$f$$ is a contraction mapping under some conditions . This leads to Equation 2 while projection step is added to keep $$\text{tr}(\hat{A}_k) = p$$ for all $$k=1,2,\cdots$$.

# Matrix Angular Central Gaussian Distribution

Chikuse (1990) extended the distribution to the matrix case, namely Stiefel and Grassmann manifolds

$\begin{gather*} St(p,r) = \{X\in \mathbb{R}^{p\times r} ~\vert~ X^\top X = I_p\}\\ Gr(p,r) = \{\text{Span}(X) ~\vert~ X \in \mathbb{R}^{p\times r},~\text{rank}(X)=r\} \end{gather*}$

which are sets of orthonormal $$k$$-frames and $$k$$-subspaces. The Matrix Angular Central Gaussian (MACG) distribution $$MACG_{p,r}(\Sigma)$$ has a density

$f_{MACG}(X\vert \Sigma) = |\Sigma|^{-r/2} |X^\top \Sigma^{-1} X|^{-p/2}$

where $$\Sigma$$ is a symmetric positive-definite matrix. Note that the density is very similar to what we had before for vector-valued distribution. Likewise, it shares properties as before.

• Property 1. $$f_{MACG}(X|\Sigma) = f_{MACG}(-X|\Sigma)$$.
• Property 2. $$f_{MACG}(X|\Sigma) = f_{MACG}(X|c\Sigma),~c>0$$.
• Property 3. $$f_{MACG}(X|\Sigma) = f_{MACG}(XR|\Sigma)$$ for $$R\in O(r)$$. This property enables to consider MACG as a distribution on Grassmann manifold, which are quotient by modulo orthogonal transformation.

### Sampling from MACG

In order to draw random samples from $$MACG_{p,r}(\Sigma)$$, we need the following steps, which are common in directional statistics with Stiefel/Grassmann manifolds . First, draw $$r$$ random vectors $$x_1,\ldots,x_r \sim \mathcal{N}_p (0,\Sigma)$$ and stack them as columns $$X=[x_1|\cdots|x_r] \in \mathbb{R}^{p\times r}$$. Then,

$Y = X (X^\top X)^{-1/2} \sim MACG_{p,r}(\Sigma)$

where the negative square root for a symmetric positive-definite matrix can be obtained from eigen-decomposition,

$\begin{gather*} \Omega = UDU^\top \rightarrow \Omega^{-1/2} = UD^{-1/2} U^\top \\ \left[D^{-1/2}\right]_{ij} = \frac{1}{\sqrt{d_{ij}}} \textrm{ when } i = j \textrm{ and }0\textrm{ otherwise.} \end{gather*}$

### Maximum Likelihood Estimation

Similar to the ACG case, given a random sample $$X_1,X_2,\ldots,X_n \sim MACG_{p,r}(\Sigma)$$, we can obtain a two-step iterative scheme to estimate the parameter $$\Sigma$$,

$\begin{gather*} \hat{\Sigma}_{k'} = \frac{p}{nr} \sum_{i=1}^n X_i (X_i^\top \Sigma^{-1} X_i)^{-1} X_i \\ \hat{\Sigma}_{k+1} = \frac{p}{\text{tr}(\hat{\Sigma}_{k'})} \hat{\Sigma}_{k'}.\end{gather*} \tag{3}$

Derivation of formula Equation 3 follows the similar line of before. We need another fact from matrix calculus that

$\frac{\partial }{\partial \Sigma} \log\det(X^\top \Sigma^{-1} X) = - \Sigma^{-1} X (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1}.$

First, log-likelihood is written as

$\log L = -\frac{nr}{2}\log\det(\Sigma) - \frac{p}{2}\sum_{i=1}^n \log\det (X_i^\top \Sigma^{-1} X_i)$

where the first-order condition gives

$\begin{gather*} \frac{\partial \log L}{\partial \Sigma} = -\frac{nr}{2}\Sigma^{-1} + \frac{p}{2}\sum_{i=1}^n \left( \Sigma^{-1} X_i (X_i^\top \Sigma^{-1} X_i)^{-1} X_i^\top \Sigma^{-1} \right)\\ \frac{nr}{2} \Sigma^{-1} = \frac{p}{2}\sum_{i=1}^n \left( \Sigma^{-1} X_i (X_i^\top \Sigma^{-1} X_i)^{-1} X_i^\top \Sigma^{-1} \right) \\ \Sigma = \frac{p}{nr} \sum_{i=1}^n X_i (X_i^\top \Sigma^{-1} X_i)^{-1} X_i^\top \end{gather*}$

where the last equality comes from multiplying $$\Sigma$$ from left and right. Therefore, $$\hat{\Sigma}$$ is a solution of the matrix equation, leading to the formula of Equation 3 with an additional projection step to keep $$\text{tr}(\hat{\Sigma}_k) = p$$ for all $$k=1,2,\cdots$$. Note that this matrix equation, up to my knowledge, has not known whether the mapping is contraction or not.

# Conclusion

ACG and MACG distributions are simple yet rather little used in directional statistics. We hope that this brief note boosts probabilistic inference on corresponding manifolds at ease. An R package Riemann, which is also available on CRAN, implements density evaluation, random sample generation, and maximum likelihood estimation of the scatter parameters $$A$$ and $$\Sigma$$ in the light of expecting handy utilization of the distributions we introduced.

## Citation

BibTeX citation:
@online{you2022,
author = {You, Kisung},
title = {A {Note} on {Angular} {Central} {Gaussian} {Distribution} and
Its {Matrix} {Variant}},
date = {2022-08-12},
url = {https://kisungyou.com/posts/note003-angular-gaussian},
langid = {en}
}