Standardizing a random vector and applications
Motivation
In probability and statistics, standardization is a process that centers and scales variables. For a scalar random variable $X$, its standardization is the variable $Z$ defined by
\[Z = \frac{X - \mathbb{E} X}{\sqrt{\operatorname{Var}(X)}}.\]Standardization puts different variables on a common scale, allowing for easier comparison and more meaningful interpretation. A good example of this is the use of standardized coefficients in regression.
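Note that $Z$ indeed has mean zero and unit variance: by linearity of expectation and the scaling property of variance,
\[\mathbb{E}Z=\frac{\mathbb{E}X-\mathbb{E}X}{\sqrt{\operatorname{Var}(X)}}=0\qquad\text{and}\qquad\operatorname{Var}(Z)=\frac{\operatorname{Var}(X)}{\operatorname{Var}(X)}=1.\]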
If $X$ is instead a random vector, the above formula falls short, since $\operatorname{Var}(X)$ is a covariance matrix rather than a scalar. In this short exposition, the above formula is generalized to the vector (a.k.a. multivariate) setting.
As an application, we consider a well-known recipe for sampling from multivariate random normal distributions using a standard random normal sampler.
Prerequisites
- Independence
- Random vector
- Covariance matrix
- Positive definite matrix
- Cholesky factorization
- Multivariate normal distribution
Standardizing a random vector
Standardization in multiple dimensions is an immediate consequence of the following result:
Proposition (First and Second Moments of Affine Transform). Let $U$ and $V$ be random vectors, $A$ be a (deterministic) matrix and $b$ be a (deterministic) vector such that
\[U = A V + b.\]Then,
\[\begin{align*} \mathbb{E}U & =A\mathbb{E}V+b\\ \operatorname{Var}(U) & =A\operatorname{Var}(V)A^{\intercal}. \end{align*}\]You can prove the above by direct computation, using linearity of expectation and the identity $\operatorname{Var}(Y) = \mathbb{E}[YY^\intercal] - (\mathbb{E}Y)(\mathbb{E}Y)^\intercal$.
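For the covariance, for example, expanding and cancelling the cross terms gives
\[\begin{align*} \operatorname{Var}(U) & =\mathbb{E}[UU^{\intercal}]-(\mathbb{E}U)(\mathbb{E}U)^{\intercal}\\ & =\mathbb{E}[(AV+b)(AV+b)^{\intercal}]-(A\mathbb{E}V+b)(A\mathbb{E}V+b)^{\intercal}\\ & =A\left(\mathbb{E}[VV^{\intercal}]-(\mathbb{E}V)(\mathbb{E}V)^{\intercal}\right)A^{\intercal}\\ & =A\operatorname{Var}(V)A^{\intercal}. \end{align*}\]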
Corollary (Standardization). Let $\mu$ be a vector and $\Sigma$ be a positive definite matrix so that it admits a Cholesky factorization $LL^\intercal = \Sigma$. Let $X$ and $Z$ be random vectors satisfying
\[Z = L^{-1} \left( X - \mu \right).\]Then, $X$ has mean $\mu$ and covariance matrix $\Sigma$ if and only if $Z$ has mean $\mathbf{0}$ and covariance matrix $I$.
Note that in the above, $L$ generalizes $\sqrt{\operatorname{Var}(X)}$ from the scalar case.
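As a quick numerical sanity check of the corollary, we can draw from a distribution with known mean and covariance and standardize the draws (a minimal sketch; the constants below are illustrative):
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1., -2.])
sigma = np.array([[2., 1.],
                  [1., 4.]])
chol = np.linalg.cholesky(sigma)

# Draw X with mean mu and covariance sigma, then standardize: Z = L^{-1} (X - mu).
x = rng.multivariate_normal(mu, sigma, size=100_000)
z = np.linalg.solve(chol, (x - mu).T).T

print(z.mean(axis=0))           # approximately zero
print(np.cov(z, rowvar=False))  # approximately the identity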
Bivariate case
It is useful, for the sake of reference, to derive closed forms for the Cholesky factorization in the $d = 2$ case. Writing $\sigma_1, \sigma_2 > 0$ for the standard deviations and $\rho \in (-1, 1)$ for the correlation, the covariance matrix takes the form
\[\Sigma=\begin{pmatrix}\sigma_{1}\\ & \sigma_{2} \end{pmatrix}\begin{pmatrix}1 & \rho\\ \rho & 1 \end{pmatrix}\begin{pmatrix}\sigma_{1}\\ & \sigma_{2} \end{pmatrix}.\]It is easy to verify (by matrix multiplication) that the Cholesky factorization is given by
\[L=\begin{pmatrix}\sigma_{1}\\ & \sigma_{2} \end{pmatrix}\begin{pmatrix}1 & 0\\ \rho & \sqrt{1-\rho^{2}} \end{pmatrix}.\]Its inverse is
\[L^{-1}=\begin{pmatrix}1 & 0\\ -\frac{\rho}{\sqrt{1-\rho^{2}}} & \frac{1}{\sqrt{1-\rho^{2}}} \end{pmatrix}\begin{pmatrix}\frac{1}{\sigma_{1}}\\ & \frac{1}{\sigma_{2}} \end{pmatrix}.\]
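These closed forms are easy to check numerically (a minimal sketch; the values of $\sigma_1$, $\sigma_2$ and $\rho$ are illustrative):
import numpy as np

sigma_1, sigma_2, rho = 1.5, 0.8, 0.3
covariance = np.array([[sigma_1**2, rho * sigma_1 * sigma_2],
                       [rho * sigma_1 * sigma_2, sigma_2**2]])

# Closed-form Cholesky factor derived above.
chol = np.array([[sigma_1, 0.],
                 [rho * sigma_2, sigma_2 * np.sqrt(1. - rho**2)]])

assert np.allclose(chol @ chol.T, covariance)
assert np.allclose(chol, np.linalg.cholesky(covariance))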
Sampling a multivariate random normal distribution
A multivariate random normal variable is defined as any random vector $X$ which can be written in the form
\[X = LZ + \mu\]for some (deterministic) matrix $L$ and vector $\mu$, where $Z$ is a random vector whose coordinates are independent standard normal variables. From our results above, we know that $X$ has mean $\mu$ and covariance $\Sigma = LL^\intercal$.
Remark. This is only one of many equivalent ways to define a multivariate random normal variable.
This gives us a recipe for simulating draws from an arbitrary multivariate random normal distribution given only a standard random normal sampler such as np.random.randn():
import numpy as np


def sample_multivariate_normal(
    mean: np.ndarray,
    covariance: np.ndarray,
    n_samples: int = 1,
) -> np.ndarray:
    """Samples a multivariate random normal distribution.

    Parameters
    ----------
    mean
        ``(d, )`` shaped mean
    covariance
        Positive definite ``(d, d)`` shaped covariance matrix
    n_samples
        Number of samples

    Returns
    -------
    ``(n_samples, d)`` shaped array where each row corresponds to a single
    draw from a multivariate normal.
    """
    # X = L Z + mu, where L L^T = covariance and the columns of Z are
    # independent standard normal draws.
    chol = np.linalg.cholesky(covariance)
    rand = np.random.randn(mean.shape[0], n_samples)
    return (chol @ rand).T + mean
We can verify this is working as desired with a small test and visualization:
import matplotlib.pyplot as plt

mean = np.array([5., 10.])
covariance = np.array([[2., 1.],
                       [1., 4.]])
samples = sample_multivariate_normal(mean=mean,
                                     covariance=covariance,
                                     n_samples=10_000)

# Scatter plot of the draws.
x, y = samples.T
plt.scatter(x, y, alpha=0.1)

# Compare the empirical covariance of the draws to the true covariance.
empirical_covariance = np.cov(samples, rowvar=False)
print(f"""
Empirical covariance:
{empirical_covariance}

True covariance:
{covariance}
""")
Empirical covariance:
[[1.96748266 1.02186592]
[1.02186592 3.98317796]]
True covariance:
[[2. 1.]
[1. 4.]]