Most introductions to hypothesis testing are targeted at non-mathematicians. This short post aims to be a precise introduction to the subject for mathematicians.

Consider a parametric model with parameter set $$\Theta$$. The model generates realizations $$X_1, \ldots, X_n$$.

Example (Coin Flip). We are given a coin. The coin has probability $$\theta$$ in $$\Theta \equiv [0, 1]$$ of showing heads. We flip the coin $$n$$ times and record $$X_i = 1$$ if the $$i$$-th flip is heads and $$0$$ otherwise.

Throughout this article, we use the above coin flip model to illustrate the ideas.

In hypothesis testing, we start with a hypothesis (also called the null hypothesis). Specifying a null hypothesis is equivalent to picking some nonempty subset $$\Theta_0$$ of the parameter set $$\Theta$$. Precisely, the null hypothesis is the assumption that realizations are being generated by the model parameterized by some $$\theta$$ in $$\Theta_0$$.

Example (Coin Flip). Our hypothesis is $$\Theta_0 \equiv \{ 1 / 2 \}$$. That is, we hypothesize that the coin is fair.

For brevity, let $$X \equiv (X_1, \ldots, X_n)$$. To specify when the null hypothesis is rejected, we define a rejection function $$R$$ such that $$R(X)$$ is an indicator random variable whose unit value corresponds to rejection.

Example (Coin Flip). Let

$$$R(x_1, \ldots, x_n) \equiv \begin{cases} 1, & \text{if } \left| \left(x_1 + \cdots + x_n \right) / n - 1 / 2 \right| \geq \epsilon \\ 0, & \text{otherwise}. \end{cases}$$$

This corresponds to rejecting the null hypothesis whenever we see “significantly” more heads than tails (or vice versa). Our notion of significance is controlled by $$\epsilon$$.

Note that nothing stops us from making a bad test. For example, taking $$\epsilon = 0$$ in the above example yields a test that always rejects. Conversely, taking $$\epsilon > 1/2$$ yields a test that never rejects.
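The rejection function above can be sketched in a few lines of Python. This is only an illustration: the name `reject` and the sample data are hypothetical, and the comparison uses floating point, so thresholds that sit exactly on a boundary may behave unexpectedly.

```python
def reject(xs, eps):
    """Return 1 if the sample mean is at least eps away from 1/2, else 0."""
    n = len(xs)
    return 1 if abs(sum(xs) / n - 0.5) >= eps else 0

# Hypothetical data: 8 heads in 10 flips.
flips = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

print(reject(flips, 0.2))  # the sample mean is 0.8, which is more than 0.2 away from 0.5
print(reject(flips, 0.0))  # eps = 0: the condition |mean - 1/2| >= 0 always holds
print(reject(flips, 0.6))  # eps > 1/2: the condition can never hold
```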

Definition (Power). The power

$$$\operatorname{Power}(\theta, R) \equiv \mathbb{P}_\theta \left\{ R(X) = 1 \right\}$$$

gives the probability of rejection assuming that the true model parameter is $$\theta$$.

Example (Coin Flip). Let $$F_\theta$$ denote the CDF of a binomial distribution with $$n$$ trials and success probability $$\theta$$. Let $$S \equiv X_1 + \cdots + X_n$$. Then, assuming $$\epsilon$$ is positive,

\begin{align*} \operatorname{Power}(\theta,R) & = 1 - \mathbb{P}_{\theta} \left\{ \left|S/n-1/2\right| < \epsilon \right\} \\ & = 1 - \mathbb{P}_{\theta} \left\{ n/2-\epsilon n < S < n/2+\epsilon n \right\} \\ & = 1 - F_\theta(\left(n/2+\epsilon n\right)-) + F_\theta(n/2-\epsilon n) \end{align*}

where $$F_\theta(x-) = \lim_{y \uparrow x} F_\theta(y)$$ is a left-hand limit.
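As a sanity check on the formula above, the following sketch computes the power in two ways: by summing the binomial probability mass function over the rejection region, and through the CDF expression with its left-hand limit. All function names are hypothetical, and the values $$n = 10$$ and $$\epsilon = 0.25$$ are chosen only so that no threshold lands exactly on an integer.

```python
from math import ceil, comb, floor

def binom_pmf(k, n, theta):
    """P_theta{S = k} for S binomial with n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

def power_direct(theta, n, eps):
    """P_theta{|S/n - 1/2| >= eps}, summed over the rejection region."""
    return sum(binom_pmf(k, n, theta) for k in range(n + 1) if abs(k / n - 0.5) >= eps)

def cdf(x, n, theta):
    """F_theta(x) = P_theta{S <= x}."""
    return sum(binom_pmf(k, n, theta) for k in range(0, min(n, floor(x)) + 1))

def cdf_left(x, n, theta):
    """F_theta(x-) = P_theta{S < x}."""
    return sum(binom_pmf(k, n, theta) for k in range(0, min(n, ceil(x) - 1) + 1))

def power_via_cdf(theta, n, eps):
    """1 - F_theta((n/2 + eps n)-) + F_theta(n/2 - eps n)."""
    return 1 - cdf_left(n / 2 + eps * n, n, theta) + cdf(n / 2 - eps * n, n, theta)

for theta in (0.5, 0.7, 0.9):
    print(theta, power_direct(theta, 10, 0.25), power_via_cdf(theta, 10, 0.25))
```

The two columns should agree up to rounding error. The power grows as $$\theta$$ moves away from $$1/2$$, which is what one would hope for from a reasonable test.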

Definition (Size). The size of a test

$$$\operatorname{Size}(R) \equiv \sup \left \{ \operatorname{Power}(\theta, R) \colon \theta \in \Theta_0 \right \}$$$

gives, assuming that the null hypothesis is true, the “worst-case” probability of rejection.

Rejecting the null hypothesis erroneously is called a type I error (see the table below). The size is an upper bound on the probability of making a type I error.

| | Retain Null | Reject Null |
|---|---|---|
| Null Hypothesis is True | No error | Type I error |
| Null Hypothesis is False | Type II error | No error |

Example (Coin Flip). Since $$\Theta_0 = \{ 1 / 2 \}$$ is a singleton, $$\operatorname{Size}(R) = \operatorname{Power}(1/2, R)$$.
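To make the link between size and type I errors concrete, here is a small Monte Carlo sketch. It repeatedly flips a simulated fair coin (so the null hypothesis is true) and records how often the test rejects. The values $$n = 10$$, $$\epsilon = 0.25$$, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random

def reject(xs, eps):
    """Return 1 if the sample mean is at least eps away from 1/2, else 0."""
    return 1 if abs(sum(xs) / len(xs) - 0.5) >= eps else 0

rng = random.Random(0)  # fixed seed so the run is reproducible
n, eps, trials = 10, 0.25, 20000

# Every simulated data set comes from a fair coin, so each rejection is a type I error.
rejections = sum(
    reject([1 if rng.random() < 0.5 else 0 for _ in range(n)], eps)
    for _ in range(trials)
)
rate = rejections / trials
print(rate)  # should be close to the exact size, 112/1024 = 0.109375
```

With these choices the test rejects exactly when $$S \leq 2$$ or $$S \geq 8$$, so the exact size is $$2(1 + 10 + 45)/2^{10} = 112/1024$$.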

Definition (p-value). Let $$(R_\alpha)_\alpha$$ be a collection of rejection functions. Define

$$$\operatorname{p-value} \equiv \inf \left\{ \operatorname{Size}(R_\alpha) \colon R_\alpha(X) = 1 \right\}$$$

as the smallest size (more precisely, the infimum of the sizes) among those tests in the collection that reject the null hypothesis.

Unlike the size, the p-value is itself a random variable. The smaller the p-value, the stronger the evidence against the null hypothesis. A common threshold for rejection is a p-value smaller than 0.01. For tests of the form considered in Theorem 1 below, this rule rejects a true null hypothesis with probability at most 0.01. It should not be read as saying that the null hypothesis is false with probability 99%.

Theorem 1. Suppose we have a collection of rejection functions $$(R_{\alpha})_{\alpha}$$ of the form

$$$R_{\alpha}(x_1, \ldots, x_n) \equiv \begin{cases} 1, & \text{if } f(x_1, \ldots, x_n) \geq c_{\alpha} \\ 0, & \text{otherwise} \end{cases}$$$

where $$f$$ does not vary with $$\alpha$$. Suppose also that for each point $$y$$ in the range of $$f$$, there exists $$\alpha$$ such that $$c_{\alpha} = y$$. Then,

$$$\operatorname{p-value}(\omega) = \sup\left\{ \mathbb{P}_{\theta} \left\{ f(X) \geq f(X(\omega)) \right\} \colon \theta \in \Theta_0 \right\}.$$$

In other words, the p-value (under the setting of Theorem 1) is the worst-case probability of sampling $$f(X)$$ at least as large as what was observed, $$f(X(\omega))$$. Note that in the above, we have used $$\omega$$ to distinguish between the actual random variable $$X$$ and $$X(\omega)$$, the observation.

Proof. Note that

$$$\operatorname{p-value}(\omega) = \inf \left\{ \sup \left\{ \mathbb{P}_{\theta} \left\{ f(X)\geq c_{\alpha} \right\} \colon \theta \in \Theta_0 \right\} \colon f(X(\omega)) \geq c_{\alpha} \right\}.$$$

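The map $$c \mapsto \sup \left\{ \mathbb{P}_{\theta} \left\{ f(X) \geq c \right\} \colon \theta \in \Theta_0 \right\}$$ is nonincreasing, so the infimum is achieved at the largest admissible threshold, namely at the value of $$\alpha$$ for which $$c_{\alpha}=f(X(\omega))$$. Such an $$\alpha$$ exists by assumption. $$\square$$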
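Theorem 1 can be checked numerically in the coin flip model. The sketch below computes the p-value once from the definition (minimizing the size over all rejecting tests) and once from the formula in Theorem 1. As an assumption made to keep the arithmetic in integers, it uses the statistic $$|2S - n|$$, which is $$2n$$ times $$|S/n - 1/2|$$ and therefore induces the same family of tests. The function names are hypothetical.

```python
from math import comb

def size(c, n):
    """Size of the test that rejects when |2S - n| >= c, under theta = 1/2."""
    return sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= c) / 2**n

def p_value_by_definition(s, n):
    """Infimum of the sizes of all tests in the family that reject the observation s."""
    thresholds = {abs(2 * k - n) for k in range(n + 1)}  # every value in the range of f
    return min(size(c, n) for c in thresholds if abs(2 * s - n) >= c)

def p_value_by_theorem(s, n):
    """P_{1/2}{ f(X) >= f(observation) }, as given by Theorem 1."""
    return size(abs(2 * s - n), n)

n = 10
for s in range(n + 1):
    print(s, p_value_by_definition(s, n), p_value_by_theorem(s, n))
```

The two columns coincide for every possible number of heads, as the theorem predicts.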

Example (Coin Flip). We flip the coin $$n$$ times and observe $$S(\omega)$$ heads. By Theorem 1,

$$$\operatorname{p-value}(\omega) = \mathbb{P}_{1/2} \left\{ \left|S/n - 1/2\right| \geq \left|S(\omega)/n - 1/2\right| \right\}.$$$

Denoting by $$K(\omega) = S(\omega) - n/2$$,

\begin{align*} \operatorname{p-value}(\omega) & = 1 - \mathbb{P}_{1/2} \left\{ n/2 - \left|K(\omega)\right| < S < n/2 + \left|K(\omega)\right| \right\} \\ & = 1 - F_{1/2}(\left(n/2 + \left|K(\omega)\right|\right)-) + F_{1/2}(n/2 - \left|K(\omega)\right|). \end{align*}
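For a concrete number, the sketch below evaluates this p-value exactly using rational arithmetic. The function name `coin_p_value` and the sample inputs are hypothetical. The event $$|S - n/2| \geq |K(\omega)|$$ is rewritten as $$|2S - n| \geq |2S(\omega) - n|$$ to avoid fractions inside the comparison.

```python
from fractions import Fraction
from math import comb

def coin_p_value(s, n):
    """Exact p-value for the fair-coin null after observing s heads in n flips."""
    k_abs = abs(2 * s - n)  # equals 2 |K(omega)|
    # Count the outcomes at least as far from n/2 as the observation, weighted by C(n, k).
    tail = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= k_abs)
    return Fraction(tail, 2**n)

print(coin_p_value(8, 10))  # 7/64
print(coin_p_value(5, 10))  # 1
```

Observing 8 heads in 10 flips gives a p-value of $$7/64 \approx 0.109$$, which would not lead to rejection at the 0.01 threshold. Observing exactly half heads gives a p-value of 1.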

Theorem 2. Suppose the setting of Theorem 1 and that, in addition, $$\Theta_0 = \{\theta_0\}$$ is a singleton and $$f(X)$$ has a continuous and strictly increasing CDF under $$\theta_0$$. Then, the p-value has a uniform distribution on $$[0,1]$$ under $$\theta_0$$.

In other words, if the null hypothesis is true, the p-value (under the setting of Theorem 2) is uniformly distributed on $$[0, 1]$$.

Proof. Denote by $$G$$ the CDF of $$f(X)$$ under $$\theta_0$$. Since $$G$$ is continuous, $$f(X)$$ takes any fixed value with probability zero, and hence

$$$\mathbb{P}_{\theta_0} \left\{ f(X) \geq f(X(\omega)) \right\} = 1 - G[f(X(\omega))].$$$

Then,

\begin{align*} \mathbb{P}_{\theta_0} \left\{ \omega \colon \operatorname{p-value}(\omega) \leq u \right\} & = 1 - \mathbb{P}_{\theta_0} \left\{ \omega \colon G[f(X(\omega))] \leq 1 - u \right\} \\ & = 1 - \mathbb{P}_{\theta_0} \left\{ \omega \colon f(X(\omega)) \leq G^{-1}(1 - u) \right\} \\ & = 1 - G(G^{-1}(1 - u)) = u. \square \end{align*}
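The coin flip model does not satisfy the hypotheses of Theorem 2 because $$S$$ is discrete. To illustrate the theorem by simulation, the sketch below therefore switches to a different model, which is an assumption introduced here and not part of the discussion above: $$X_i$$ are independent normal random variables with mean $$\theta$$ and unit variance, $$\Theta_0 = \{0\}$$, and $$f(X)$$ is the sample mean. Under the null, $$f(X)$$ is normal with mean zero and variance $$1/n$$, so its CDF is continuous and strictly increasing.

```python
import random
from statistics import NormalDist

def p_value(xs):
    """One-sided p-value P_0{ f(X) >= observed mean } for the normal model described above."""
    n = len(xs)
    mean = sum(xs) / n
    # Under the null, sqrt(n) * mean is standard normal.
    return 1 - NormalDist().cdf(n**0.5 * mean)

rng = random.Random(0)  # fixed seed so the run is reproducible
n, trials = 5, 20000

# Draw data under the null hypothesis and record the resulting p-values.
pvals = [p_value([rng.gauss(0, 1) for _ in range(n)]) for _ in range(trials)]

# Compare the empirical CDF of the p-values with the uniform CDF u -> u.
for u in (0.1, 0.25, 0.5, 0.75, 0.9):
    frac = sum(p <= u for p in pvals) / trials
    print(u, round(frac, 3))
```

Each printed fraction should be close to the corresponding $$u$$, consistent with the p-value being uniformly distributed on $$[0, 1]$$ under the null hypothesis.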