1.

Let \(\hat{F}\) be the empirical distribution. The method of moments estimators (MMEs) satisfy

\[\begin{align*} \mathbb{E}_{\hat{F}}X & =\frac{\tilde{\alpha}}{\tilde{\beta}}\\ \mathbb{E}_{\hat{F}}[X^{2}] & =\frac{\tilde{\alpha}\left(\tilde{\alpha}+1\right)}{\tilde{\beta}^{2}}. \end{align*}\]

Solving for \(\tilde{\alpha}\) and \(\tilde{\beta}\),

\[\begin{align*} \tilde{\alpha} & =\frac{\left(\mathbb{E}_{\hat{F}}X\right)^{2}}{\operatorname{Var}_{\hat{F}}(X)}\\ \tilde{\beta} & =\frac{\mathbb{E}_{\hat{F}}X}{\operatorname{Var}_{\hat{F}}(X)}. \end{align*}\]
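
As a quick numerical check, the snippet below computes these estimates on simulated data (a minimal sketch; the rate parameterization implied by the moments above and the true parameter values are assumptions made for illustration).

import numpy as np

# Method-of-moments estimates for Gamma(alpha, beta) in the rate
# parameterization, on simulated data with arbitrary true parameters.
rng = np.random.default_rng(0)
alpha_true, beta_true = 2.0, 3.0
x = rng.gamma(alpha_true, 1. / beta_true, size=10**5)

mean, var = np.mean(x), np.var(x)  # empirical (plug-in) moments
alpha_mme = mean**2 / var          # ~ 2.0
beta_mme  = mean / var             # ~ 3.0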

2.

a)

The MMEs satisfy

\[\begin{align*} \mathbb{E}_{\hat{F}}X & =\frac{\tilde{a}+\tilde{b}}{2}\\ \operatorname{Var}_{\hat{F}}(X) & =\frac{1}{12} \left(\tilde{b}-\tilde{a}\right)^{2}. \end{align*}\]

Define \(\tilde{c}=\tilde{b}-\tilde{a}\). It follows that

\[\tilde{c}=2\sqrt{3} \operatorname{Std}_{\hat{F}}(X).\]

Since \(\tilde{a}+\tilde{b}=2\tilde{a}+\tilde{c}\) and \(\tilde{a}+\tilde{b}=2\tilde{b}-\tilde{c}\),

\[\begin{align*} \tilde{a} & =\mathbb{E}_{\hat{F}}X-\sqrt{3} \operatorname{Std}_{\hat{F}}(X)\\ \tilde{b} & =\mathbb{E}_{\hat{F}}X+\sqrt{3} \operatorname{Std}_{\hat{F}}(X). \end{align*}\]
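
The same kind of sanity check for the uniform case (a sketch; the endpoints below are arbitrary choices):

import numpy as np

# Method-of-moments estimates for Uniform(a, b) on simulated data.
rng = np.random.default_rng(0)
a_true, b_true = 1., 3.
x = rng.uniform(a_true, b_true, size=10**5)

a_mme = np.mean(x) - np.sqrt(3.) * np.std(x)  # ~ 1.0
b_mme = np.mean(x) + np.sqrt(3.) * np.std(x)  # ~ 3.0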

b)

The maximum likelihood estimators (MLEs) maximize

\[\mathcal{L}(a,b)=\prod_{i}f(X_{i};a,b)=\frac{1}{\left(b-a\right)^{n}}\prod_{i}I_{[a,b]}(X_{i}).\]

The maximum occurs at \(\hat{a}=\min_{i}X_{i}\) and \(\hat{b}=\max_{i}X_{i}\).

c)

By equivariance, the MLE of \(\tau = (a + b) / 2\) is \(\hat{\tau} = (\hat{a} + \hat{b}) / 2\).

d)

| Estimator | Mean squared error (MSE) |
| --- | --- |
| MLE | 0.015 |
| Non-parametric plug-in estimate | 0.033 |

The MSE of \(\hat{\tau}\), the MLE of the mean, is estimated by simulation using the code below.

import numpy as np

a = 1.
b = 3.
n = 10
n_sims = 10**6

samples = np.random.uniform(low=a, high=b, size=[n_sims, n])

a_mle = np.min(samples, axis=1)  # Maximum likelihood estimator of a.
b_mle = np.max(samples, axis=1)  # Maximum likelihood estimator of b.

tau     = (a     + b    ) / 2.   # Mean of Uniform(a, b).
tau_mle = (a_mle + b_mle) / 2.   # Maximum likelihood estimator of the mean.

mse = np.mean((tau_mle - tau)**2)

The non-parametric plug-in estimate of \(\tau\) is \(\tilde{\tau} = \mathbb{E}_{\hat{F}} X\), the sample mean. Since this estimator is unbiased, its MSE is

\[\operatorname{Var}(\tilde{\tau})=\frac{1}{n}\operatorname{Var}(X_{1})=\frac{1}{12n}\left(b-a\right)^{2},\]

which equals \(1/30 \approx 0.033\) for \(a=1\), \(b=3\), and \(n=10\), in agreement with the table.
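
A Monte Carlo cross-check of this closed-form value, under the same setup as the simulation above (\(a=1\), \(b=3\), \(n=10\)), is sketched below.

import numpy as np

# Monte Carlo estimate of the plug-in estimator's MSE versus the closed form.
a, b, n, n_sims = 1., 3., 10, 10**6
samples = np.random.uniform(low=a, high=b, size=[n_sims, n])
tau = (a + b) / 2.

mse_plugin = np.mean((np.mean(samples, axis=1) - tau)**2)  # ~ 0.033
mse_plugin_exact = (b - a)**2 / (12. * n)                  # = 1/30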

3.

a)

Let \(Z\) be a standard normal random variable so that

\[0.95=\mathbb{P}(X<\tau)=\mathbb{P}(Z<\left(\tau-\mu\right)/\sigma)=F_{Z}(\left(\tau-\mu\right)/\sigma)\]

and hence

\[\tau=g(\mu, \sigma)\equiv F_{Z}^{-1}(0.95)\sigma+\mu.\]

The MLEs of the mean and standard deviation of the original distribution are \(\hat{\mu}=\mathbb{E}_{\hat{F}}X\) and \(\hat{\sigma}=\operatorname{Std}_{\hat{F}}(X)\). By equivariance, the MLE of \(\tau\) is \(\hat{\tau} = g(\hat{\mu}, \hat{\sigma})\).

b)

By the delta method,

\[\begin{multline*} \hat{\operatorname{se}}(\hat{\tau})=\sqrt{\nabla g^{\intercal}\hat{J}_{n}\nabla g}=\sqrt{\frac{1}{n}\begin{pmatrix}1\\ F_{Z}^{-1}(0.95) \end{pmatrix}^{\intercal}\begin{pmatrix}\hat{\sigma}^{2} & 0\\ 0 & \frac{\hat{\sigma}^{2}}{2} \end{pmatrix}\begin{pmatrix}1\\ F_{Z}^{-1}(0.95) \end{pmatrix}}\\ =\hat{\sigma}\sqrt{\frac{1}{n}+\frac{1}{2n}\left(F_{Z}^{-1}(0.95)\right)^{2}}. \end{multline*}\]

A \(1 - \alpha\) confidence interval is \(\hat{\tau} \pm z_{\alpha / 2} \cdot \hat{\operatorname{se}}(\hat{\tau}).\)

c)

| Method | SE |
| --- | --- |
| Delta Method | 0.558 |
| Parametric Bootstrap (100,000 samples) | 0.557 |

Code for the delta method and the parametric bootstrap is given below.

import numpy as np
from scipy.stats import norm

data = np.array([ 3.23, -2.50,  1.88, -0.68,  4.43,  0.17,
                  1.03, -0.07, -0.01,  0.76,  1.76,  3.18,
                  0.33, -0.31,  0.30, -0.61,  1.52,  5.43,
                  1.54,  2.28,  0.42,  2.33, -1.03,  4.00,
                  0.39])
n_samples = data.size
n_sims = 10**5

# Delta method standard error.
se_delta_method = np.std(data) \
                * np.sqrt(1. / n_samples * (1. + 0.5 * norm.ppf(0.95)**2))

# Parametric bootstrap: sample from the fitted normal and recompute the MLE.
samples = np.std(data) * np.random.randn(n_sims, n_samples) + np.mean(data)
tau_mles = np.std(samples, axis=1) * norm.ppf(0.95) + np.mean(samples, axis=1)
se_parametric_bootstrap = np.std(tau_mles)

4.

By Question 2 (b), the MLE of \(\theta\) is \(Y = \max_i X_i\). Its CDF is \(F_Y = F_{X_1}^n\). Therefore, for any \(\epsilon > 0\), \(F_Y(\theta - \epsilon) < 1\) and \(F_Y(\theta + \epsilon) = 1\). It follows that

\[\begin{multline*} \mathbb{P}(\left|Y-\theta\right|>\epsilon)=1-\mathbb{P}(\theta-\epsilon\leq Y\leq\theta+\epsilon)=1-F_{Y}(\theta+\epsilon)+F_{Y}(\theta-\epsilon)\\ =F_{Y}(\theta-\epsilon)=F_{X_{1}}(\theta-\epsilon)^{n} \end{multline*}\]

Since \(F_{X_{1}}(\theta-\epsilon)<1\), this probability tends to zero as \(n\rightarrow\infty\), which is the desired result.
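
A small simulation illustrates how quickly this probability vanishes; the sketch below assumes \(X_{i}\sim\operatorname{Uniform}(0,\theta)\) with arbitrary choices of \(\theta\), \(\epsilon\), and \(n\).

import numpy as np

# Empirical P(|Y - theta| > eps) versus the exact value F(theta - eps)^n.
theta, eps, n_sims = 1., 0.05, 10**4
rng = np.random.default_rng(0)
for n in [10, 50, 250]:
    y = rng.uniform(0., theta, size=[n_sims, n]).max(axis=1)
    print(n, np.mean(np.abs(y - theta) > eps), (1. - eps / theta)**n)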

5.

Since \(\lambda = \mathbb{E} X_1\), the MME is the sample mean.

The MLE is also the sample mean. To see this, note that

\[\log f(X_{i};\lambda)=\log\left(\frac{\lambda^{X_{i}}e^{-\lambda}}{X_{i}!}\right)=X_{i}\log\lambda-\lambda-\log(X_{i}!)\]

and hence

\[s(X_{i};\lambda)\equiv\frac{d}{d\lambda}\left[\log f(X_{i};\lambda)\right]=\frac{X_{i}}{\lambda}-1.\]

Therefore, the derivative of the log likelihood is

\[\frac{d}{d\lambda}\left[\log\mathcal{L}(\lambda)\right]=\sum_{i}s(X_{i};\lambda)=\frac{n}{\lambda}\mathbb{E}_{\hat{F}}X-n.\]

Setting this to zero and solving for the parameter yields the desired result.

The Fisher information is

\[I(\lambda)=\operatorname{Var}(s(X_{i};\lambda))=\frac{1}{\lambda}.\]
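
This can be checked by Monte Carlo; the sketch below uses an arbitrary rate.

import numpy as np

# Var(s(X; lambda)) = Var(X / lambda - 1) should equal 1 / lambda.
lam = 3.5
rng = np.random.default_rng(0)
x = rng.poisson(lam, size=10**6)
print(np.var(x / lam - 1.), 1. / lam)  # both ~ 0.286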

6.

a)

\(\hat{\theta}=\mathbb{E}_{\hat{F}}X\) is the MLE of the mean. Let \(Z\) be a standard normal random variable. Since

\[\psi=\mathbb{P}(Y_{1}=1)=\mathbb{P}(X_{1}>0)=\mathbb{P}(Z>-\theta)=F_{Z}(\theta),\]

the MLE of \(\psi\) is \(\hat{\psi}=F_{Z}(\hat{\theta})\) by equivariance.

b)

An approximate 95% confidence interval (CI) for \(\psi\) is \(\hat{\psi}\pm 2\hat{\operatorname{se}}(\hat{\psi})\) where, by the delta method,

\[\hat{\operatorname{se}}(\hat{\psi})=\left|f_{Z}(\hat{\theta})\right|\operatorname{se}(\hat{\theta})=f_{Z}(\hat{\theta})\frac{1}{\sqrt{n}}\]
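
A minimal sketch of the estimator and interval on simulated \(N(\theta, 1)\) data (the value of \(\theta\) and the sample size are arbitrary choices):

import numpy as np
from scipy.stats import norm

theta, n = 0.5, 100
rng = np.random.default_rng(0)
x = rng.normal(theta, 1., size=n)

theta_mle = np.mean(x)                        # MLE of theta
psi_mle   = norm.cdf(theta_mle)               # MLE of psi by equivariance
se_psi    = norm.pdf(theta_mle) / np.sqrt(n)  # delta method standard error
ci = (psi_mle - 2. * se_psi, psi_mle + 2. * se_psi)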

c)

By the law of large numbers (LLN), \(\tilde{\psi}\) converges in probability to \(\mathbb{E}Y_{1}=\mathbb{P}(Y_{1}=1)=\psi\).

d)

Similarly to Part (b),

\[\operatorname{se}(\hat{\psi})=f_{Z}(\theta)\frac{1}{\sqrt{n}}.\]

Moreover,

\[\operatorname{se}(\tilde{\psi})=\frac{1}{\sqrt{n}}\operatorname{se}(Y_{1})=\sqrt{\frac{\psi\left(1-\psi\right)}{n}}.\]

Therefore,

\[\operatorname{ARE}(\tilde{\psi},\hat{\psi})=\frac{\left(f_{Z}(\theta)\right)^{2}}{\psi\left(1-\psi\right)}\]

It’s possible to show that the ARE achieves its maximum value of \(2 / \pi \approx 0.637\) at \(\theta = 0\). Note that this quantity is necessarily less than one due to the asymptotic optimality of the MLE (Theorem 9.23).
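
The value at \(\theta = 0\) and the decay away from it can be checked numerically (a sketch):

import numpy as np
from scipy.stats import norm

def are(theta):
    """ARE of psi-tilde relative to psi-hat at a given theta."""
    psi = norm.cdf(theta)
    return norm.pdf(theta)**2 / (psi * (1. - psi))

print(are(0.), 2. / np.pi)        # both ~ 0.6366
print(are(1.), are(2.), are(3.))  # decreases away from theta = 0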

e)

Suppose the distribution is not normal. Under sufficient regularity, the LLN still guarantees that the sample mean \(\hat{\theta}\) converges in probability to the true mean \(\mu\) of the distribution. As such, \(\hat{\psi} = F_Z(\hat{\theta})\) converges in probability to \(F_Z(\mu)\), which need not equal \(\psi = \mathbb{P}(X_1 > 0)\). By contrast, the argument in Part (c) still applies, so \(\tilde{\psi}\) remains consistent for \(\psi\).

7.

a)

The log-likelihood for drug \(i=1,2\) is

\[\log f(X_{i};p_{i})=\log\binom{n_{i}}{X_{i}}+X_{i}\log p_{i}+\left(n_{i}-X_{i}\right)\log\left(1-p_{i}\right).\]

Differentiating with respect to \(p_{i}\) gives the score

\[s(X_{i};p_{i})\equiv\frac{d}{dp_{i}}\left[\log f(X_{i};p_{i})\right]=\frac{X_{i}}{p_{i}}+\frac{X_{i}-n_{i}}{1-p_{i}}=\frac{X_{i}-n_{i}p_{i}}{p_{i}\left(1-p_{i}\right)}.\]

Setting this to zero, the MLE is \(\hat{p}_{i}=X_{i}/n_{i}\). By equivariance, the MLE of \(\psi=p_{1}-p_{2}\) is \(\hat{\psi}=\hat{p}_{1}-\hat{p}_{2}\).

b)

The Fisher information of drug \(i=1,2\) is

\[\operatorname{Var}(s(X_{i};p_{i}))=\operatorname{Var}\left(\frac{X_{i}}{p_{i}}+\frac{X_{i}-n_{i}}{1-p_{i}}\right)=\frac{n_{i}}{p_{i}\left(1-p_{i}\right)}.\]

Since the two trials are independent, the complete Fisher information is

\[I(p_{1},p_{2})=\operatorname{diag}\left(\frac{n_{1}}{p_{1}\left(1-p_{1}\right)},\frac{n_{2}}{p_{2}\left(1-p_{2}\right)}\right).\]

c)

By the delta method,

\[\hat{\operatorname{se}}(\hat{\psi})=\sqrt{\begin{pmatrix}+1\\ -1 \end{pmatrix}^{\intercal}I(\hat{p}_{1},\hat{p}_{2})^{-1}\begin{pmatrix}+1\\ -1 \end{pmatrix}}=\sqrt{\frac{\hat{p}_{1}\left(1-\hat{p}_{1}\right)}{n_{1}}+\frac{\hat{p}_{2}\left(1-\hat{p}_{2}\right)}{n_{2}}}.\]

d)

| Method | 90% CI Lower bound | 90% CI Upper bound |
| --- | --- | --- |
| Delta Method | -0.009 | 0.129 |
| Parametric Bootstrap (100,000 samples) | -0.009 | 0.129 |

Code for both methods is given below.

import numpy as np
from scipy.stats import norm

n      = 200
x1     = 160
x2     = 148
n_sims = 10**5

p1_mle   = x1/n
p2_mle   = x2/n
psi_mle  = p1_mle - p2_mle
ppf_0p95 = norm.ppf(0.95)

# Delta method.
se_delta = np.sqrt(p1_mle * (1. - p1_mle) / n + p2_mle * (1. - p2_mle) / n)
print('90% CI delta method: [{:.3f}, {:.3f}]'.format(
    psi_mle - ppf_0p95 * se_delta, psi_mle + ppf_0p95 * se_delta))

# Parametric bootstrap: resample the counts and recompute the MLE of psi.
samples1 = np.random.binomial(n, p1_mle, size=[n_sims])
samples2 = np.random.binomial(n, p2_mle, size=[n_sims])
psi_mles = samples1 / n - samples2 / n
se_parametric_bootstrap = np.std(psi_mles)
print('90% CI parametric bootstrap: [{:.3f}, {:.3f}]'.format(
    psi_mle - ppf_0p95 * se_parametric_bootstrap,
    psi_mle + ppf_0p95 * se_parametric_bootstrap))

8.

The log likelihood is

\[\mathcal{L}(\mu,\sigma)=\sum_{i}\log f(X_{i};\mu,\sigma)=-\frac{n}{2}\log(2\pi)-n\log(\sigma)-\frac{1}{2\sigma^{2}}\sum_{i}\left(X_{i}-\mu\right)^{2}.\]

Taking derivatives,

\[\begin{align*} \frac{\partial\mathcal{L}}{\partial\mu} & =\frac{\sum_{i}\left(X_{i}-\mu\right)}{\sigma^{2}}\\ \frac{\partial^{2}\mathcal{L}}{\partial\mu^{2}} & =-\frac{n}{\sigma^{2}}\\ \frac{\partial\mathcal{L}}{\partial\sigma} & =-\frac{n}{\sigma}+\frac{1}{\sigma^{3}}\sum_{i}\left(X_{i}-\mu\right)^{2}\\ \frac{\partial^{2}\mathcal{L}}{\partial\mu\,\partial\sigma} & =-\frac{2\sum_{i}\left(X_{i}-\mu\right)}{\sigma^{3}}\\ \frac{\partial^{2}\mathcal{L}}{\partial\sigma^{2}} & =\frac{n}{\sigma^{2}}-\frac{3}{\sigma^{4}}\sum_{i}\left(X_{i}-\mu\right)^{2}. \end{align*}\]

Taking expectations,

\[\begin{align*} \mathbb{E}\left[\frac{\partial^{2}\mathcal{L}}{\partial\mu\,\partial\sigma}\right] & =0\\ \mathbb{E}\left[\frac{\partial^{2}\mathcal{L}}{\partial\sigma^{2}}\right] & =-\frac{2n}{\sigma^{2}}. \end{align*}\]

Therefore, the Fisher information is

\[I(\mu,\sigma)=-\mathbb{E}\left[\nabla^{2}\mathcal{L}\right]=\frac{n}{\sigma^{2}}\begin{pmatrix}1 & 0\\ 0 & 2 \end{pmatrix}.\]
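
A Monte Carlo check of this matrix, averaging the negated second derivatives above over simulated samples (a sketch; \(\mu\), \(\sigma\), and \(n\) are arbitrary choices):

import numpy as np

mu, sigma, n, n_sims = 1., 2., 50, 10**5
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=[n_sims, n])

d2_mu_sigma = -2. * np.sum(x - mu, axis=1) / sigma**3
d2_sigma    = n / sigma**2 - 3. * np.sum((x - mu)**2, axis=1) / sigma**4

# The (mu, mu) entry is exactly n / sigma^2, since d2L/dmu2 = -n / sigma^2.
print(-np.mean(d2_mu_sigma), 0.)              # off-diagonal entry, both ~ 0
print(-np.mean(d2_sigma), 2. * n / sigma**2)  # both ~ 25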

9.

a)

Results are given below.

| Method | 95% CI Lower bound | 95% CI Upper bound |
| --- | --- | --- |
| Delta Method | 126.146 | 189.219 |
| Parametric Bootstrap (100,000 samples) | 126.076 | 189.288 |
| Non-parametric Bootstrap (100,000 samples) | 129.553 | 185.812 |

The MLE of \(\theta\) is \(\hat{\theta}=g(\hat{\mu})\equiv\exp\hat{\mu}\) where \(\hat{\mu}=\mathbb{E}_{\hat{F}}X\) is the MLE of the mean. By the delta method,

\[\hat{\operatorname{se}}(\hat{\theta})=\left|g^{\prime}(\hat{\mu})\right|\operatorname{se}(\hat{\mu})=\frac{\exp(\hat{\mu})}{\sqrt{n}}.\]

Therefore, a 95% CI for \(\theta\) is \((1\pm2n^{-1/2})\exp(\hat{\mu})\).

Code for all three methods is given below.

import numpy as np

np.random.seed(1)
data = np.random.randn(100) + 5.
mu_mle = np.mean(data)

n_sims = 10**5
n_samples = data.size

print('95% CI delta method: [{:.3f}, {:.3f}]'.format(
    (1. - 2. / np.sqrt(n_samples)) * np.exp(mu_mle),
    (1. + 2. / np.sqrt(n_samples)) * np.exp(mu_mle)))

# Parametric bootstrap: sample from N(mu_mle, 1) and recompute the MLE.
samples = np.random.randn(n_sims, n_samples) + mu_mle
theta_mles = np.exp(np.mean(samples, axis=1))
se_parametric_bootstrap = np.std(theta_mles)
print('95% CI parametric bootstrap: [{:.3f}, {:.3f}]'.format(
    np.exp(mu_mle) - 2. * se_parametric_bootstrap,
    np.exp(mu_mle) + 2. * se_parametric_bootstrap))

# Non-parametric bootstrap: resample the data with replacement and recompute
# the MLE for each resample.
indices = np.random.randint(n_samples, size=[n_sims, n_samples])
theta_mles = np.exp(np.mean(data[indices], axis=1))
se_bootstrap = np.std(theta_mles)
print('95% CI non-parametric bootstrap: [{:.3f}, {:.3f}]'.format(
    np.exp(mu_mle) - 2. * se_bootstrap,
    np.exp(mu_mle) + 2. * se_bootstrap))

b)

10.

a)

The CDF of \(\hat{\theta}\) is \(F(y) = (y / \theta)^n\) for \(0 \leq y \leq \theta\). This, along with the CDFs of the bootstrap estimators, is plotted below.
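
The plot can be reproduced along the following lines (a sketch; \(\theta\), \(n\), and the number of bootstrap replications are arbitrary choices).

import numpy as np

# Exact CDF of theta-hat versus the two bootstrap distributions.
theta, n, n_boot = 1., 50, 10**4
rng = np.random.default_rng(0)
data = rng.uniform(0., theta, size=n)
theta_hat = data.max()

parametric    = rng.uniform(0., theta_hat, size=[n_boot, n]).max(axis=1)
nonparametric = rng.choice(data, size=[n_boot, n], replace=True).max(axis=1)
exact_cdf = lambda y: (y / theta)**n

# The empirical CDFs of `parametric` and `nonparametric` can then be plotted
# against exact_cdf (e.g., with matplotlib step plots) to reproduce the figure.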

b)

The parametric bootstrap estimator of this parameter has a continuous distribution and hence zero probability of being equal to \(\hat{\theta}\) exactly. Let \(\hat{\theta}^{*}=\max_{1\leq i\leq n}X_{i}^{*}\) be the non-parametric bootstrap estimator. Note that

\[\mathbb{P}(\hat{\theta}^{*}=\hat{\theta})=1-\mathbb{P}(\hat{\theta}^{*}\neq X_{(n)})=1-\left(1-1/n\right)^{n}\rightarrow1-1/e\approx0.632.\]

That is, the non-parametric bootstrap estimator has a good chance of being exactly equal to the MLE. These phenomena are visible in the plot above.
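
A quick simulation of this probability (a sketch; \(\theta\), \(n\), and the replication count are arbitrary choices):

import numpy as np

# Fraction of non-parametric bootstrap maxima that coincide with the MLE.
theta, n, n_boot = 1., 50, 10**5
rng = np.random.default_rng(0)
data = rng.uniform(0., theta, size=n)
theta_hat = data.max()

boot_max = rng.choice(data, size=[n_boot, n], replace=True).max(axis=1)
print(np.mean(boot_max == theta_hat), 1. - (1. - 1. / n)**n)  # both ~ 0.64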