## 1.

The posterior density is

$f(\theta \mid X^n)\propto \mathcal{L}(\theta)f(\theta)\propto\exp(-g(\theta))$

where

\begin{align*} 2 g(\theta) &=\frac{1}{\sigma^{2}}\sum_{i}\left(X_{i}-\theta\right)^{2}+\frac{1}{b^{2}}\left(\theta-a\right)^{2}\\ &=\frac{b^{2}\sum_{i}\left(X_{i}^{2}-2X_{i}\theta+\theta^{2}\right)+\sigma^{2}\left(\theta^{2}-2a\theta+a^{2}\right)}{\sigma^{2}b^{2}}\\ &=\frac{\theta^{2}\left(nb^{2}+\sigma^{2}\right)-2\theta\left(\sigma^{2}a+b^{2}\sum_{i}X_{i}\right)}{\sigma^{2}b^{2}}+\text{const.}\\ &=\frac{nb^{2}+\sigma^{2}}{\sigma^{2}b^{2}}\left(\theta-\frac{\sigma^{2}a+b^{2}\sum_{i}X_{i}}{nb^{2}+\sigma^{2}}\right)^{2}+\text{const.} \end{align*}

It follows that the posterior density is normal with variance

$\tau^{2}=\frac{\sigma^{2}b^{2}}{nb^{2}+\sigma^{2}}=\left(\frac{n}{\sigma^{2}}+\frac{1}{b^{2}}\right)^{-1}=\left(\frac{1}{\operatorname{se}^{2}}+\frac{1}{b^{2}}\right)^{-1}$

and mean

$\bar{\theta}=\frac{\sigma^{2}a+b^{2}\sum_{i}X_{i}}{nb^{2}+\sigma^{2}}=\frac{1}{b^{2}/\operatorname{se}^{2}+1}a+\frac{1}{1+\operatorname{se}^{2}/b^{2}}\frac{1}{n}\sum_{i}X_{i}.$
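The equivalent expressions for $$\tau^{2}$$ and $$\bar{\theta}$$ can be checked numerically. The hyperparameters and simulated data below are illustrative assumptions, not part of the problem.

```python
import numpy as np

# Numerically check the equivalent forms of the posterior variance and
# mean derived above; a, b, sigma, and the data are illustrative.
rng = np.random.default_rng(0)
a, b, sigma, n = 2.0, 1.5, 1.0, 50
X = rng.normal(3.0, sigma, size=n)

se2 = sigma**2 / n  # squared standard error of the sample mean
tau2 = sigma**2 * b**2 / (n * b**2 + sigma**2)
assert np.isclose(tau2, 1. / (n / sigma**2 + 1. / b**2))
assert np.isclose(tau2, 1. / (1. / se2 + 1. / b**2))

theta_bar = (sigma**2 * a + b**2 * X.sum()) / (n * b**2 + sigma**2)
w = 1. / (b**2 / se2 + 1.)  # weight on the prior mean
assert np.isclose(theta_bar, w * a + (1. - w) * X.mean())
```

The weight $$w$$ shrinks $$\bar{\theta}$$ toward the prior mean $$a$$ as $$b\to0$$ and toward the sample mean as $$b\to\infty$$.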

## 2.

### a)

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

np.random.seed(1)
samples = np.random.randn(100) + 5.
```


### b)

Under the flat prior $$f(\mu)=1$$, the posterior is

$f(\mu\mid X^{n})\propto\mathcal{L}(\mu)f(\mu)=\mathcal{L}(\mu)\propto\exp\left(-\frac{1}{2\sigma^{2}}\sum_{i}\left(X_{i}-\mu\right)^{2}\right).$

Note that

$\begin{multline*} \frac{1}{n}\sum_{i}\left(X_{i}-\mu\right)^{2}=\frac{1}{n}\sum_{i}\left(X_{i}^{2}-2\mu X_{i}+\mu^{2}\right)=\mu^{2}-\frac{2\mu}{n}\sum_{i}X_{i}+\text{const.}\\ =\left(\mu-\frac{1}{n}\sum_{i}X_{i}\right)^{2}+\text{const.} \end{multline*}$

Therefore, the posterior is a normal distribution with mean $$n^{-1}\sum_{i}X_{i}$$ and variance $$\sigma^{2}/n$$ (see Part (c) for a plot).

### c)

The plot is generated by the code below.

```python
post_mu_mean = np.mean(samples)
post_mu_std = np.sqrt(1. / samples.size)
post_mu_grid = np.linspace(post_mu_mean - 4. * post_mu_std,
                           post_mu_mean + 4. * post_mu_std, 1000)
post_mu_pdf = scipy.stats.norm.pdf(post_mu_grid, loc=post_mu_mean,
                                   scale=post_mu_std)
np.random.seed(1)
post_mu_samples = post_mu_std * np.random.randn(1000) + post_mu_mean

plt.figure(figsize=(1.618*5., 5.))
plt.hist(post_mu_samples, bins=100, density=True)
plt.plot(post_mu_grid, post_mu_pdf, linewidth=4)
plt.title('Posterior of $\\mu$: sampled vs. analytic')
plt.xlabel('$\\mu$')
```


### d)

$$\theta \mid X^n$$ is log-normally distributed since $$\log \theta = \mu$$ and $$\mu \mid X^n$$ is normally distributed.

The plot is generated by the code below.

```python
post_theta_med = np.exp(post_mu_mean)
post_theta_std = np.sqrt((np.exp(post_mu_std**2) - 1.)
                         * np.exp(2. * post_mu_mean + post_mu_std**2))
post_theta_grid = np.linspace(post_theta_med - 4. * post_theta_std,
                              post_theta_med + 4. * post_theta_std, 1000)
post_theta_pdf = scipy.stats.lognorm.pdf(post_theta_grid, post_mu_std,
                                         scale=post_theta_med)
post_theta_samples = np.exp(post_mu_samples)

plt.figure(figsize=(1.618*5., 5.))
plt.hist(post_theta_samples, bins=100, density=True)
plt.plot(post_theta_grid, post_theta_pdf, linewidth=4)
plt.title('Posterior of $\\theta$: sampled vs. analytic')
plt.xlabel('$\\theta$')
```


### e)

Evaluating the code below yields [4.86, 5.25] as an approximate 95% posterior interval for $$\mu$$.

```python
index = int(post_mu_samples.size * 0.025)
sorted_post_mu_samples = np.sort(post_mu_samples)
post_mu_ci_95_lo = sorted_post_mu_samples[index]
post_mu_ci_95_hi = sorted_post_mu_samples[-(index + 1)]
```


### f)

Since $$\mathbb{P}(\mu\leq b)=\mathbb{P}(\theta\leq e^{b})$$ and $$\mathbb{P}(\mu\geq a)=\mathbb{P}(\theta\geq e^{a})$$, it is sufficient to exponentiate the lower and upper bounds from Part (e) to get a 95% posterior interval for $$\theta$$.

Evaluating the code below yields [129.64, 190.77] as an approximate 95% posterior interval for $$\theta$$.

```python
post_ci_95_theta_lo = np.exp(post_mu_ci_95_lo)
post_ci_95_theta_hi = np.exp(post_mu_ci_95_hi)
```


## 3.

The posterior density is proportional to

$f(\theta\mid X^{n})\propto\mathcal{L}(\theta)f(\theta)=\frac{1}{\theta}\prod_{i}\left[\frac{1}{\theta}I_{(X_{i},\infty)}(\theta)\right]=\frac{1}{\theta^{n+1}}I_{(X_{(n)},\infty)}(\theta).$

In particular, $$f(\theta\mid X^{n})$$ is a power law (Pareto) density with normalizing constant

$c=\int_{X_{(n)}}^{\infty}\frac{1}{\theta^{n+1}}d\theta=\frac{1}{nX_{(n)}^{n}}.$
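As a sanity check, the normalizing constant can be compared against numerical integration. The sample size and the Uniform$(0,\theta)$ draws below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

# Compare the analytic normalizing constant with numerical integration;
# n and the Uniform(0, theta) data are illustrative.
rng = np.random.default_rng(0)
n = 5
X = rng.uniform(0., 1., size=n)
x_max = X.max()

c_numeric, _ = quad(lambda t: t**(-(n + 1)), x_max, np.inf)
c_analytic = 1. / (n * x_max**n)
assert np.isclose(c_numeric, c_analytic)
```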

## 4.

### a)

By equivariance, the MLE is

$\hat{\tau}=\hat{p}_{2}-\hat{p}_{1}=\frac{40}{50}-\frac{30}{50}=0.2.$

The standard error is (see Chapter 9 Question 7 Part (c))

$\widehat{\operatorname{se}}(\hat{\tau})=\sqrt{\frac{\hat{p}_{1}\left(1-\hat{p}_{1}\right)}{50}+\frac{\hat{p}_{2}\left(1-\hat{p}_{2}\right)}{50}}\approx 0.089.$

Therefore, a 90% confidence interval for $$\tau$$ is

$\hat{\tau}\pm1.645\cdot\widehat{\operatorname{se}}(\hat{\tau})\approx[0.053, 0.35].$

The computation above is replicated in the code below.

```python
n_patients = 50
placebo_success = 30
treatment_success = 40

mle_p1 = placebo_success / n_patients
mle_p2 = treatment_success / n_patients
mle_tau = mle_p2 - mle_p1
mle_tau_se = np.sqrt(mle_p1 * (1. - mle_p1) / n_patients
                     + mle_p2 * (1. - mle_p2) / n_patients)
mle_tau_ci_90_lo = mle_tau - 1.645 * mle_tau_se
mle_tau_ci_90_hi = mle_tau + 1.645 * mle_tau_se
```


### b)

Evaluating the code below yields approximately the same standard error and confidence interval found in Part (a).

```python
n_sims = 10**6

np.random.seed(1)
bstrap_p1_samples = np.random.binomial(n=n_patients, p=mle_p1,
                                       size=n_sims) / n_patients
bstrap_p2_samples = np.random.binomial(n=n_patients, p=mle_p2,
                                       size=n_sims) / n_patients
bstrap_tau_samples = bstrap_p2_samples - bstrap_p1_samples
bstrap_tau_mean = np.mean(bstrap_tau_samples)
bstrap_tau_std = np.std(bstrap_tau_samples)
bstrap_tau_ci_90_lo = bstrap_tau_mean - 1.645 * bstrap_tau_std
bstrap_tau_ci_90_hi = bstrap_tau_mean + 1.645 * bstrap_tau_std
```


### c)

Under the prior $$f(p_{1},p_{2})=1$$, the posterior is proportional to

$f(p_{1},p_{2}\mid X_{1},X_{2})\propto p_{1}^{X_{1}}(1-p_{1})^{n-X_{1}}p_{2}^{X_{2}}(1-p_{2})^{n-X_{2}}.$

By Theorem 2.33, the posterior is a product of independent distributions with densities

$g_{i}(p_{i})\propto p_{i}^{X_{i}}(1-p_{i})^{n-X_{i}}.$

It follows that each is a Beta distribution with parameters $$\alpha_{i}=X_{i}+1$$ and $$\beta_{i}=n-X_{i}+1$$.

Evaluating the code below yields a posterior mean of approximately 0.19 and a 90% posterior interval of [0.047, 0.34].

```python
np.random.seed(1)
post_p1_samples = np.random.beta(a=placebo_success + 1,
                                 b=n_patients - placebo_success + 1,
                                 size=n_sims)
post_p2_samples = np.random.beta(a=treatment_success + 1,
                                 b=n_patients - treatment_success + 1,
                                 size=n_sims)
post_tau_samples = post_p2_samples - post_p1_samples
post_tau_mean = np.mean(post_tau_samples)
index = int(n_sims * 0.05)
sorted_post_tau_samples = np.sort(post_tau_samples)
post_tau_ci_90_lo = sorted_post_tau_samples[index]
post_tau_ci_90_hi = sorted_post_tau_samples[-(index + 1)]
```


### d)

Let

$g(p_{1},p_{2})=\log\left(\left(\frac{p_{1}}{1-p_{1}}\right)\div\left(\frac{p_{2}}{1-p_{2}}\right)\right).$

By equivariance, the MLE is $$\hat{\psi}=g(\hat{p}_{1},\hat{p}_{2})\approx-0.98$$. The Fisher information matrix is (see Chapter 9 Question 7 Part (b))

$I(p_{1},p_{2})=\operatorname{diag}\left(\frac{n}{p_{1}\left(1-p_{1}\right)},\frac{n}{p_{2}\left(1-p_{2}\right)}\right).$

Moreover,

$\nabla g(p_{1},p_{2})^{\intercal}=\left(\frac{1}{p_{1}\left(1-p_{1}\right)},-\frac{1}{p_{2}\left(1-p_{2}\right)}\right).$

Therefore, by the delta method,

$\widehat{\operatorname{se}}(\hat{\psi})=\sqrt{\hat{\nabla}g^{\intercal}I(\hat{p}_{1},\hat{p}_{2})^{-1}\hat{\nabla}g}=\sqrt{\frac{1}{\hat{p}_{1}\left(1-\hat{p}_{1}\right)n}+\frac{1}{\hat{p}_{2}\left(1-\hat{p}_{2}\right)n}}\approx0.46.$

A 90% confidence interval for $$\psi$$ is

$\hat{\psi}\pm1.645\,\widehat{\operatorname{se}}(\hat{\psi})\approx[-1.73,-0.23].$

The computation above is replicated in the code below.

```python
mle_p1_ratio = mle_p1 / (1. - mle_p1)
mle_p2_ratio = mle_p2 / (1. - mle_p2)
mle_psi = np.log(mle_p1_ratio / mle_p2_ratio)
mle_psi_se = np.sqrt(1. / (mle_p1 * (1. - mle_p1) * n_patients)
                     + 1. / (mle_p2 * (1. - mle_p2) * n_patients))
mle_psi_ci_90_lo = mle_psi - 1.645 * mle_psi_se
mle_psi_ci_90_hi = mle_psi + 1.645 * mle_psi_se
```


### e)

Evaluating the code below yields a posterior mean of approximately -0.95 and a 90% posterior interval of [-1.70, -0.22].

```python
post_p1_samples_ratio = post_p1_samples / (1. - post_p1_samples)
post_p2_samples_ratio = post_p2_samples / (1. - post_p2_samples)
post_psi_samples = np.log(post_p1_samples_ratio / post_p2_samples_ratio)
post_psi_mean = np.mean(post_psi_samples)
sorted_post_psi_samples = np.sort(post_psi_samples)
post_psi_ci_90_lo = sorted_post_psi_samples[index]
post_psi_ci_90_hi = sorted_post_psi_samples[-(index + 1)]
```


## 5.

Let $$n$$ be the number of trials and $$k$$ the number of successes (in this case, 10 and 2). Then,

$f(p\mid X^{n})\propto\mathcal{L}(p)f(p)\propto p^{k}\left(1-p\right)^{n-k}p^{\alpha-1}\left(1-p\right)^{\beta-1}$

and hence the prior is a conjugate prior. In particular, the posterior is a Beta distribution with $$\bar{\alpha}=k+\alpha$$ and $$\bar{\beta}=n-k+\beta$$.
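A minimal sketch of the conjugate update, assuming a flat Beta(1, 1) prior for illustration:

```python
import numpy as np
from scipy.stats import beta

# Beta-Binomial conjugate update with n = 10 trials and k = 2 successes;
# the Beta(1, 1) prior is an illustrative assumption.
n, k = 10, 2
alpha, beta_ = 1., 1.
posterior = beta(a=k + alpha, b=n - k + beta_)
post_mean = posterior.mean()  # (k + alpha) / (n + alpha + beta_) = 0.25
assert np.isclose(post_mean, (k + alpha) / (n + alpha + beta_))
```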

## 6.

### a)

The likelihood is

$\mathcal{L}(\lambda)=\prod_{i}\frac{\lambda^{X_{i}}e^{-\lambda}}{X_{i}!}\propto\lambda^{\sum_{i}X_{i}}e^{-n\lambda}.$

The prior is a Gamma distribution:

$f(\lambda)\propto\lambda^{\alpha-1}e^{-\beta\lambda}.$

It follows that the posterior is also a Gamma distribution with parameters $$\bar{\alpha}=\alpha+\sum_{i}X_{i}$$ and $$\bar{\beta}=\beta+n$$. The posterior mean is $$\bar{\alpha} / \bar{\beta}$$.
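A minimal numerical sketch of this Gamma-Poisson update; the hyperparameters, rate, and simulated data below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

# Gamma-Poisson conjugate update; alpha, beta, lam, and n are illustrative.
rng = np.random.default_rng(0)
alpha, beta_, lam, n = 2.0, 1.0, 3.0, 100
X = rng.poisson(lam, size=n)

alpha_bar = alpha + X.sum()
beta_bar = beta_ + n
posterior = gamma(a=alpha_bar, scale=1. / beta_bar)
assert np.isclose(posterior.mean(), alpha_bar / beta_bar)
```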

### b)

The Jeffreys prior is

$f(\lambda)\propto I(\lambda)^{1/2}=\lambda^{-1/2}.$

Combining this with the likelihood computed in Part (a), the posterior is a Gamma distribution with parameters $$\bar{\alpha}=1/2+\sum_{i}X_{i}$$ and $$\bar{\beta}=n$$.

## 7.

Note that

$\mathbb{E}\hat{\psi}=\frac{1}{n}\sum_{i}\mathbb{E}\left[\frac{R_{i}Y_{i}}{\xi_{X_{i}}}\right]=\mathbb{E}\left[\frac{R_{1}Y_{1}}{\xi_{X_{1}}}\right]$

and

$\begin{multline*} \mathbb{E}\left[\frac{R_{1}Y_{1}}{\xi_{X_{1}}}\right]=\sum_{j}\frac{1}{\xi_{j}}\mathbb{E}\left[R_{1}\middle|X_{1}=j\right]\mathbb{E}\left[Y_{1}\middle|X_{1}=j\right]\mathbb{P}(X_{1}=j)\\ =\sum_{j}\mathbb{E}\left[Y_{1}\middle|X_{1}=j\right]\mathbb{P}(X_{1}=j)=\mathbb{E}Y_{1}. \end{multline*}$

Therefore, $$\mathbb{E}\hat{\psi}=\mathbb{E}Y_{1}$$. Similarly,

$\mathbb{V}(\hat{\psi})=\frac{1}{n}\mathbb{V}\left(\frac{R_{1}Y_{1}}{\xi_{X_{1}}}\right)=\frac{1}{n}\left( \mathbb{E}\left[\left(\frac{R_{1}Y_{1}}{\xi_{X_{1}}}\right)^{2}\right]-\mathbb{E}\left[Y_{1}\right]^{2}\right)$

and

$\begin{multline*} \mathbb{E}\left[\left(\frac{R_{1}Y_{1}}{\xi_{X_{1}}}\right)^{2}\right]=\sum_{j}\frac{1}{\xi_{j}^{2}}\mathbb{E}\left[R_{1}^{2}\mid X_{1}=j\right]\mathbb{E}\left[Y_{1}^{2}\mid X_{1}=j\right]\mathbb{P}(X_{1}=j)\\ =\sum_{j}\frac{1}{\xi_{j}}\mathbb{E}\left[Y_{1}\mid X_{1}=j\right]\mathbb{P}(X_{1}=j)\le\frac{1}{\delta}\mathbb{E}\left[Y_{1}\right]. \end{multline*}$

Therefore,

$\mathbb{V}(\hat{\psi})\leq\frac{1}{n}\left( \frac{1}{\delta}\mathbb{E}\left[Y_{1}\right]-\mathbb{E}\left[Y_{1}\right]^{2}\right) \leq\frac{1}{n\delta}.$
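A Monte Carlo check of the unbiasedness claim and the variance bound; the distributions of $$X$$ and $$Y$$ and the selection probabilities $$\xi_{j}$$ below are illustrative assumptions.

```python
import numpy as np

# Simulate psi-hat = (1/n) sum_i R_i Y_i / xi_{X_i} and check that
# E[psi-hat] = E[Y_1] and V(psi-hat) <= 1 / (n * delta).
rng = np.random.default_rng(0)
n, n_reps = 200, 2000
p_x = np.array([0.2, 0.5, 0.3])   # P(X = j)
xi = np.array([0.3, 0.5, 0.9])    # P(R = 1 | X = j), all >= delta
p_y = np.array([0.1, 0.6, 0.4])   # P(Y = 1 | X = j)
delta = xi.min()

psi_hats = np.empty(n_reps)
for r in range(n_reps):
    X = rng.choice(3, size=n, p=p_x)
    R = rng.random(n) < xi[X]
    Y = rng.random(n) < p_y[X]
    psi_hats[r] = np.mean(R * Y / xi[X])

EY = float(p_x @ p_y)             # E[Y_1] = 0.44 here
assert abs(psi_hats.mean() - EY) < 0.01
assert psi_hats.var() <= 1. / (n * delta)
```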

## 8.

Let $$X_{1},\ldots,X_{n}\sim N(\mu,1)$$. The MLE is $$\hat{\mu}=n^{-1}\sum_{i}X_{i}$$ with standard error $$\operatorname{se}(\hat{\mu})=1/\sqrt{n}$$. Therefore, the Wald statistic is $$W=\sqrt{n}\hat{\mu}$$, with p-value $$2\Phi(-\sqrt{n}|\hat{\mu}|)$$. Clearly, if $$\mu\neq0$$, then $$\sqrt{n}\hat{\mu}$$ diverges and hence the p-value converges to zero, correctly rejecting the null.

Next, note that

$\mathcal{L}(\mu)f_{H_{1}}(\mu)=\frac{1}{b}\left(\frac{1}{\sqrt{2\pi}}\right)^{n+1}\exp\left\{ -\frac{1}{2}\left[\frac{1}{b^{2}}\mu^{2}+\sum_{i}\left(\mu-X_{i}\right)^{2}\right]\right\} .$

Let

$\sigma^{2}=\frac{b^{2}}{1+nb^{2}}.$

Then,

\begin{align*} \frac{1}{b^{2}}\mu^{2}+\sum_{i}\left(\mu-X_{i}\right)^{2} & =\left(\frac{1}{b^{2}}+n\right)\mu^{2}-2\mu\sum_{i}X_{i}+\sum_{i}X_{i}^{2}\\ & =\frac{1}{\sigma^{2}}\mu^{2}-2\mu\sum_{i}X_{i}+\sum_{i}X_{i}^{2}\\ & =\frac{1}{\sigma^{2}}\left(\mu^{2}-2\sigma^{2}\mu\sum_{i}X_{i}\right)+\sum_{i}X_{i}^{2}\\ & =\frac{1}{\sigma^{2}}\left(\mu-\sigma^{2}\sum_{i}X_{i}\right)^{2}+\sum_{i}X_{i}^{2}-\left(\sigma\sum_{i}X_{i}\right)^{2}. \end{align*}

Therefore,

$\mathcal{L}(\mu)f_{H_{1}}(\mu)=c\cdot\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{ -\frac{1}{2\sigma^{2}}\left(\mu-\sigma^{2}\sum_{i}X_{i}\right)^{2}\right\},$

where

$c=\frac{\sigma}{b}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left\{ \frac{1}{2}\left(\sigma\sum_{i}X_{i}\right)^{2}-\frac{1}{2}\sum_{i}X_{i}^{2}\right\} .$

In particular, $$\int\mathcal{L}(\mu)f_{H_{1}}(\mu)d\mu=c$$. By the derivation in Section 11.8,

$\mathbb{P}(H_{0}\mid X^{n}=x^{n})=\frac{\mathcal{L}(0)}{\mathcal{L}(0)+c}=\frac{1}{1+c/\mathcal{L}(0)}.$

Moreover,

$\mathcal{L}(0)=\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left(-\frac{1}{2}\sum_{i}X_{i}^{2}\right)$

and hence

$\frac{c}{\mathcal{L}(0)}=\frac{\sigma}{b}\exp\left\{ \frac{1}{2}\left(\sigma\sum_{i}X_{i}\right)^{2}\right\} .$

If $$\mu\neq0$$, then $$c/\mathcal{L}(0)$$ diverges and hence $$\mathbb{P}(H_{0}\mid X^{n}=x^{n})$$ converges to zero, correctly rejecting the null.

However, for finite $$n$$, the two tests can disagree. For example, using $$n = 1400$$ samples and a true mean of $$\mu=0.1$$, an extremely small Wald test p-value of approximately $$1.05 \times 10^{-7}$$ is observed. On the other hand, $$\mathbb{P}(H_0 \mid X^n)$$ can be close to one for particular choices of $$b$$.

Code to compute the p-value and plot $$\mathbb{P}(H_0 \mid X^n)$$ as a function of $$b$$ is given below.

```python
mu = 0.1
n_samples = 1400

np.random.seed(1)
samples = np.random.randn(n_samples) + mu

# Wald test
mle_mu = np.mean(samples)
wald = np.sqrt(n_samples) * mle_mu
p_value = 2. * scipy.stats.norm.cdf(-np.abs(wald))

# Bayesian test: c / L(0) = (sigma / b) * exp(0.5 * (sigma * sum(X_i))**2)
b = np.logspace(-2., 6., 10**6)
sigma2 = b**2 / (1. + n_samples * b**2)
c_over_l0 = np.sqrt(sigma2) / b * np.exp(0.5 * sigma2 * np.sum(samples)**2)
post_prob_null = 1. / (1. + c_over_l0)

plt.figure(figsize=(1.618 * 5., 5.))
plt.semilogx(b, post_prob_null)
plt.title('$\\mathbb{P}(H_0 \\mid X^n)$')
plt.xlabel('$b$')
```