Most introductions to hypothesis testing are targeted at non-mathematicians. This short post aims to be a precise introduction to the subject for mathematicians.

Remark. While the presentation may differ, some of the notation in this article is from L. Wasserman’s All of Statistics: a Concise Course in Statistical Inference.

Consider a parametric model with parameter set \(\Theta\). The model generates realizations \(X_1, \ldots, X_n\).

Example (Coin Flip). We are given a coin. The coin has probability \(\theta\) in \(\Theta \equiv [0, 1]\) of showing heads. We flip the coin \(n\) times and record \(X_i = 1\) if the \(i\)-th flip is heads and \(0\) otherwise.

Throughout this article, we use the above coin flip model to illustrate the ideas.

In hypothesis testing, we start with a hypothesis (also called the null hypothesis). Specifying a null hypothesis is equivalent to picking some nonempty subset \(\Theta_0\) of the parameter set \(\Theta\). Precisely, the null hypothesis is the assumption that realizations are being generated by the model parameterized by some \(\theta\) in \(\Theta_0\).

Example (Coin Flip). Our hypothesis is \(\Theta_0 \equiv \{ 1 / 2 \}\). That is, we hypothesize that the coin is fair.

For brevity, let \(X \equiv (X_1, \ldots, X_n)\). To specify when the null hypothesis is rejected, we define a rejection function \(R\) such that \(R(X)\) is an indicator random variable whose unit value corresponds to rejection.

Example (Coin Flip). Let

\[\begin{equation} R(x_1, \ldots, x_n) \equiv \begin{cases} 1, & \text{if } \left| \left(x_1 + \cdots + x_n \right) / n - 1 / 2 \right| \geq \epsilon \\ 0, & \text{otherwise}. \end{cases} \end{equation}\]

This corresponds to rejecting the null hypothesis whenever we see “significantly” more heads than tails (or vice versa). Our notion of significance is controlled by \(\epsilon\).

Note that nothing stops us from making a bad test. For example, taking \(\epsilon = 0\) in the above example yields a test that always rejects. Conversely, taking \(\epsilon > 1/2\) yields a test that never rejects.
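The rejection function above can be sketched in a few lines of Python. This is only an illustration: the name `reject` and the sample data are hypothetical and do not come from the text.

```python
def reject(xs, eps):
    """Return 1 if the sample mean deviates from 1/2 by at least eps, else 0."""
    n = len(xs)
    return 1 if abs(sum(xs) / n - 0.5) >= eps else 0


# Hypothetical data: 8 heads in 10 flips.
flips = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

print(reject(flips, 0.2))  # deviation is about 0.3, so the test rejects
print(reject(flips, 0.0))  # eps = 0 always rejects
print(reject(flips, 0.6))  # eps > 1/2 never rejects
```

Note that the comparison is done in floating point, so observations lying exactly on the boundary may be misclassified; exact rational arithmetic avoids this.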

Definition (Power). The power

\[\begin{equation} \operatorname{Power}(\theta, R) \equiv \mathbb{P}_\theta \left\{ R(X) = 1 \right\} \end{equation}\]

gives the probability of rejection assuming that the true model parameter is \(\theta\).

Example (Coin Flip). Let \(F_\theta\) denote the CDF of a binomial distribution with \(n\) trials and success probability \(\theta\). Let \(S \equiv X_1 + \cdots + X_n\). Then, assuming \(\epsilon\) is positive,

\[\begin{align*} \operatorname{Power}(\theta,R) & = 1 - \mathbb{P}_{\theta} \left\{ \left|S/n-1/2\right| < \epsilon \right\} \\ & = 1 - \mathbb{P}_{\theta} \left\{ n/2-\epsilon n < S < n/2+\epsilon n \right\} \\ & = 1 - F_\theta(\left(n/2+\epsilon n\right)-) + F_\theta(n/2-\epsilon n) \end{align*}\]

where \(F(x-) = \lim_{y \uparrow x} F(y)\) is a left-hand limit.
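As a sanity check, the power can also be computed by summing the binomial probability mass function over the rejection region, which is equivalent to the CDF expression above. The sketch below assumes hypothetical values \(n = 10\) and \(\epsilon = 1/5\); the function name `power` is illustrative.

```python
from fractions import Fraction
from math import comb


def power(theta, n, eps):
    """P_theta{ |S/n - 1/2| >= eps } for S ~ Binomial(n, theta).

    Pass eps as a Fraction so that boundary cases are compared exactly.
    """
    eps = Fraction(eps)
    half = Fraction(1, 2)
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) >= eps
    )


n, eps = 10, Fraction(1, 5)  # hypothetical choices
print(power(0.5, n, eps))  # 0.34375
print(power(0.8, n, eps))  # larger: a biased coin is rejected more often
```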

Definition (Size). The size of a test

\[\begin{equation} \operatorname{Size}(R) \equiv \sup \left \{ \operatorname{Power}(\theta, R) \colon \theta \in \Theta_0 \right \} \end{equation}\]

gives, assuming that the null hypothesis is true, the “worst-case” probability of rejection.

Rejecting the null hypothesis erroneously is called a type I error (see the table below). The size is an upper bound on the probability of making a type I error.

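\[\begin{array}{l|cc} & \text{Retain Null} & \text{Reject Null} \\ \hline \text{Null Hypothesis is True} & \text{No error} & \text{Type I error} \\ \text{Null Hypothesis is False} & \text{Type II error} & \text{No error} \end{array}\]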

Example (Coin Flip). Since \(\Theta_0 = \{ 1 / 2 \}\) is a singleton, \(\operatorname{Size}(R) = \operatorname{Power}(1/2, R)\).
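
As a quick numerical illustration, take the hypothetical values \(n = 10\) and \(\epsilon = 1/5\) (these are chosen only for this illustration). The test then rejects exactly when \(S \leq 3\) or \(S \geq 7\), so

\[\begin{align*} \operatorname{Size}(R) & = \mathbb{P}_{1/2} \left\{ S \leq 3 \right\} + \mathbb{P}_{1/2} \left\{ S \geq 7 \right\} \\ & = 2 \cdot \frac{\binom{10}{0} + \binom{10}{1} + \binom{10}{2} + \binom{10}{3}}{2^{10}} = \frac{2 \cdot 176}{1024} = \frac{11}{32} \approx 0.344. \end{align*}\]

That is, with these choices the test would reject a fair coin roughly a third of the time.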

Definition (p-value). Let \((R_\alpha)_\alpha\) be a collection of rejection functions. Define

\[\begin{equation} \operatorname{p-value} \equiv \inf \left\{ \operatorname{Size}(R_\alpha) \colon R_\alpha(X) = 1 \right\} \end{equation}\]

as the smallest size for which the null hypothesis is rejected.

Unlike the size, the p-value is itself a random variable. The smaller the p-value, the stronger the evidence against the null hypothesis. A common threshold for rejection is a p-value smaller than 0.01. A rejection at this threshold means that some test in the collection whose size is below 0.01 rejects the null hypothesis. It should not be read as a 99% probability that the null hypothesis is false.

Theorem 1. Suppose we have a collection of rejection functions \((R_{\alpha})_{\alpha}\) of the form

\[\begin{equation} R_{\alpha}(x_1, \ldots, x_n) \equiv \begin{cases} 1, & \text{if } f(x_1, \ldots, x_n) \geq c_{\alpha} \\ 0, & \text{otherwise} \end{cases} \end{equation}\]

where \(f\) does not vary with \(\alpha\). Suppose also that for each point \(y\) in the range of \(f\), there exists \(\alpha\) such that \(c_{\alpha} = y\). Then,

\[\begin{equation} \operatorname{p-value}(\omega) = \sup\left\{ \mathbb{P}_{\theta} \left\{ f(X) \geq f(X(\omega)) \right\} \colon \theta \in \Theta_0 \right\}. \end{equation}\]

In other words, the p-value (under the setting of Theorem 1) is the worst-case probability of sampling \(f(X)\) at least as large as what was observed, \(f(X(\omega))\). Note that in the above, we have used \(\omega\) to distinguish between the actual random variable \(X\) and \(X(\omega)\), the observation.

Proof. Note that

\[\begin{equation} \operatorname{p-value}(\omega) = \inf \left\{ \sup \left\{ \mathbb{P}_{\theta} \left\{ f(X)\geq c_{\alpha} \right\} \colon \theta \in \Theta_0 \right\} \colon f(X(\omega)) \geq c_{\alpha} \right\}. \end{equation}\]

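The inner supremum is nonincreasing in \(c_{\alpha}\), and the constraint requires \(c_{\alpha} \leq f(X(\omega))\). Since \(f(X(\omega))\) lies in the range of \(f\), there is some \(\alpha\) with \(c_{\alpha}=f(X(\omega))\), and the infimum is achieved at that value of \(\alpha\). \(\square\)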
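Theorem 1 can be checked numerically for the coin flip model. The sketch below compares the p-value computed directly from the definition (an infimum over the sizes of all rejecting tests) with the expression given by the theorem. It assumes \(\Theta_0 = \{1/2\}\) and a hypothetical \(n = 10\), and it uses the rescaled statistic \(|2S - n|\) in place of \(|S/n - 1/2|\) so that all comparisons are exact integer comparisons. The function names are illustrative.

```python
from math import comb


def size(d, n):
    """Size of the test rejecting when |2S - n| >= d, under theta = 1/2."""
    return sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= d) / 2**n


def p_value_by_definition(s, n):
    """Infimum of the sizes of all tests in the collection that reject s."""
    thresholds = {abs(2 * k - n) for k in range(n + 1)}  # range of the statistic
    return min(size(d, n) for d in thresholds if abs(2 * s - n) >= d)


def p_value_by_theorem(s, n):
    """Theorem 1: probability of a statistic at least as large as observed."""
    return size(abs(2 * s - n), n)


n = 10  # hypothetical number of flips
agree = all(
    p_value_by_definition(s, n) == p_value_by_theorem(s, n) for s in range(n + 1)
)
print(agree)  # True
```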

Example (Coin Flip). We flip the coin \(n\) times and observe \(S(\omega)\) heads. By Theorem 1,

\[\begin{equation} \operatorname{p-value}(\omega) = \mathbb{P}_{1/2} \left\{ \left|S/n - 1/2\right| \geq \left|S(\omega)/n - 1/2\right| \right\}. \end{equation}\]

Denoting by \(K(\omega) = S(\omega) - n/2\) and assuming \(K(\omega) \neq 0\) (if \(K(\omega) = 0\), the p-value is trivially \(1\)),

\[\begin{align*} \operatorname{p-value}(\omega) & = 1 - \mathbb{P}_{1/2} \left\{ n/2 - \left|K(\omega)\right| < S < n/2 + \left|K(\omega)\right| \right\} \\ & = 1 - F_{1/2}(\left(n/2 + \left|K(\omega)\right|\right)-) + F_{1/2}(n/2 - \left|K(\omega)\right|). \end{align*}\]
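The p-value of the coin flip example is easy to evaluate exactly. The following sketch sums the \(\operatorname{Binomial}(n, 1/2)\) probability mass function over all outcomes at least as far from \(n/2\) as the observation, which avoids the left-hand limit in the CDF expression. The function name `p_value` and the sample inputs are hypothetical.

```python
from math import comb


def p_value(s, n):
    """Two-sided p-value for s heads in n flips under theta = 1/2.

    The comparison |k - n/2| >= |s - n/2| is done in integers to stay exact.
    """
    return sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= abs(2 * s - n)
    ) / 2**n


print(p_value(8, 10))  # 0.109375
print(p_value(5, 10))  # 1.0
```

For instance, observing 8 heads in 10 flips gives a p-value of about 0.11, which would not lead to rejection at the 0.01 threshold.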

Theorem 2. Suppose the setting of Theorem 1 and that, in addition, \(\Theta_0 = \{\theta_0\}\) is a singleton and \(f(X)\) has a continuous and strictly increasing CDF under \(\theta_0\). Then, the p-value has a uniform distribution on \([0,1]\) under \(\theta_0\).

In other words, if the null hypothesis is true, the p-value (under the setting of Theorem 2) is uniformly distributed on \([0, 1]\).

Proof. Denote by \(G\) the CDF of \(f(X)\) under \(\theta_0\). First, note that

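\[\begin{equation} \mathbb{P}_{\theta_0} \left\{ f(X) \geq f(X(\omega)) \right\} = 1 - G[f(X(\omega))] \end{equation}\]

because \(G\) is continuous. By Theorem 1, the left-hand side is \(\operatorname{p-value}(\omega)\). Hence, for each \(u\) in \((0, 1)\),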

\[\begin{align*} \mathbb{P}_{\theta_0} \left\{ \omega \colon \operatorname{p-value}(\omega) \leq u \right\} & = 1 - \mathbb{P}_{\theta_0} \left\{ \omega \colon G[f(X(\omega))] \leq 1 - u \right\} \\ & = 1 - \mathbb{P}_{\theta_0} \left\{ \omega \colon f(X(\omega)) \leq G^{-1}(1 - u) \right\} \\ & = 1 - G(G^{-1}(1 - u)) = u. \square \end{align*}\]
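
The coin flip statistic is discrete, so Theorem 2 does not apply to it directly. To see the theorem in action, the simulation below uses a different, purely illustrative model that is not part of the discussion above: \(X_i\) are independent \(N(\theta, 1)\) draws, \(\Theta_0 = \{0\}\), and \(f\) is the sample mean, so that \(f(X)\) has a continuous and strictly increasing CDF under \(\theta_0 = 0\). The sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def p_value(xs):
    """One-sided p-value for H0: theta = 0, with f(x) = sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    # Under theta = 0 the sample mean is N(0, 1/n), so
    # P{f(X) >= mean} = 1 - Phi(sqrt(n) * mean).
    return 1.0 - NormalDist().cdf(mean * n**0.5)


random.seed(0)
n, trials = 20, 20000
ps = [p_value([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(trials)]

for u in (0.1, 0.5, 0.9):
    frac = sum(p <= u for p in ps) / trials
    print(u, frac)  # each empirical fraction should be close to u
```

If the p-value is uniform under the null hypothesis, the fraction of simulated p-values at or below \(u\) should be close to \(u\) for every \(u\) in \((0, 1)\).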