## Optimal stopping I: a dynamic programming principle

The following is an expository post in which a dynamic programming principle is derived for an optimal stopping problem. The exposition is inspired by a proof in N. Touzi's textbook [1], an invaluable resource.

Before we begin, let's give some motivation. As an example, consider a risk-neutral stock given by the process $(X_t)_{t\geq 0}$. Optimal stopping describes the price of an American option paying off $g(X_t)$ at time $t$. Through this three-part series of posts, the reader is shown that the value of such an option is the unique viscosity solution of a partial differential equation (in particular, a single-obstacle variational inequality).

Consider a filtered probability space (with filtration $(\mathcal{F}_{t})_{t\geq0}$) satisfying the usual conditions, on which a standard Brownian motion $W_{t}$ is defined. Let $X_{s}^{t,x}$ denote the strong solution of the stochastic differential equation (SDE) $$dX_{s}=b(s,X_{s})ds+\sigma(s,X_{s})dW_{s}\text{ for }s>t\text{ and }X_{t}=x.$$ To ensure its existence and uniqueness, we need:

$b$ and $\sigma$ are Lipschitz and of linear growth in $x$ uniformly in $t$.

Let $T<\infty$ and $\mathscr{T}_{[t,T]}$ be the set of $[t,T]$ stopping times independent of $\mathcal{F}_{t}$. Consider the problem $$u(t,x)=\sup_{\tau\in\mathscr{T}_{[t,T]}}J(t,x;\tau)\text{ where }J(t,x;\tau)=\mathbb{E}\left[g(\tau,X_{\tau}^{t,x})\right]$$and $g$ is a given function. To ensure this is well-defined, we take the following:

$g:[0,T]\times\mathbb{R}^d$ is continuous and of quadratic growth (i.e., $|g(t,x)|\leq K(1+|x|^2)$ for some constant $K>0$ independent of $(t,x)$).

The above assumption implies that for all $s$ and $\tau\in\mathscr{T}_{[s,T]}$, the function $(t,x)\mapsto J(t,x;\tau)$ is continuous on $[0,s]\times\mathbb{R}^{d}$ by the following argument. Let $(t_{n}^\prime,x_{n}^\prime)_{n}$ be a sequence converging to $(t^\prime,x^\prime)\in[0,s]\times\mathbb{R}^{d}$. If we can show that $(g(\tau,X_{\tau}^{t_{n}^\prime,x_{n}^\prime}))_n$ is dominated by an integrable function, we can apply the dominated convergence theorem to get \begin{align*} \lim_{n\rightarrow\infty}J(t_{n}^\prime,x_{n}^\prime;\tau) & =\lim_{n\rightarrow\infty}\mathbb{E}\left[g(\tau,X_{\tau}^{t_{n}^\prime,x_{n}^\prime})\right]\\ & =\mathbb{E}\left[\lim_{n\rightarrow\infty}g(\tau,X_{\tau}^{t_{n}^\prime,x_{n}^\prime})\right]\\ & =\mathbb{E}\left[g(\tau,\lim_{n\rightarrow\infty}X_{\tau}^{t_{n}^\prime,x_{n}^\prime})\right]\\ & =\mathbb{E}\left[g(\tau,X_{\tau}^{t^\prime,x^\prime})\right]\\ & =J(t^\prime,x^\prime;\tau). \end{align*} Moreover, since $g$ is of quadratic growth, \begin{align*} \mathbb{E}\left[\left|g(\tau,X_{\tau}^{t_{n}^\prime,x_{n}^\prime})\right|\right] & \leq\mathbb{E}\left[K\left(1+\left|X_{\tau}^{t_{n}^\prime,x_{n}^\prime}\right|^{2}\right)\right]\\ & =K\left(1+\mathbb{E}\left[\left|X_{\tau}^{t_{n}^\prime,x_{n}^\prime}\right|^{2}\right]\right)\\ & \leq K_{0}\left(1+\left|x_{n}^\prime\right|^{2}\right) \end{align*} where $K_{0}$ can depend on $T$ (by the usual argument for Ito processes using Gronwall's lemma). Since $x_{n}^\prime\rightarrow x^\prime$, domination follows.

We denote by $f^{*}$ and $f_{*}$ the upper and lower semicontinuous envelopes of a function $f\colon Y\rightarrow[-\infty,\infty]$, where $Y$ is a given topological space.

Let $\theta\in\mathscr{T}_{[t,T]}$ be a stopping time such that $t < \theta < T$ and $X_{\theta}^{t,x}\in\mathbb{L}^{\infty}$. The following dynamic programming principle holds: \begin{align*} u(t,x) & \leq\sup_{\tau\in\mathscr{T}_{[t,T]}}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u^{*}(\theta,X_{\theta}^{t,x})\right].\\ u(t,x) & \geq\sup_{\tau\in\mathscr{T}_{[t,T]}}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u_{*}(\theta,X_{\theta}^{t,x})\right]. \end{align*}

Note that if $u$ is continuous, the above dynamic programming principle becomes, by virtue of $u=u_{*}=u^{*}$, $$u(t,x)=\sup_{\tau\in\mathscr{T}_{[t,T]}}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u(\theta,X_{\theta}^{t,x})\right].$$

Intuition behind proof: The $\leq$ inequality is established by the tower property (see the formal proof below). The $\geq$ inequality requires more work. Intuitively, we can take an $\epsilon$-optimal control $\tau^{\epsilon}(\theta)$ as follows: $$u(\theta,X_{\theta}^{t,x})\leq J(\theta,X_{\theta}^{t,x};\tau^{\epsilon}(\theta))+\epsilon.$$ Now, let $\tau$ be an arbitrary stopping time and $$\hat{\tau}=\tau\mathbf{1}_{\left\{ \tau<\theta\right\} }+\tau^{\epsilon}(\theta)\mathbf{1}_{\left\{ \tau\geq\theta\right\} }.$$ Then, \begin{align*} u(t,x) & \geq J(t,x;\hat{\tau})\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau^{\epsilon}(\theta),X_{\tau^{\epsilon}(\theta)}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau^{\epsilon}(\theta),X_{\tau^{\epsilon}(\theta)}^{t,x})\mid\mathcal{F}_{\theta}\right]\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }J(\theta,X_{\theta}^{t,x};\tau^{\epsilon}(\theta))\right]\\ & \geq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u(\theta,X_{\theta}^{t,x})\right]-\epsilon. \end{align*} The desired result follows since $\tau$ and $\epsilon$ are arbitrary (take a sup over $\tau$ on both sides of the inequality). However, $\hat{\tau}$ is not a $\mathscr{T}_{[t,T]}$ stopping time, so the first inequality fails. In the proof below, this apparent issue is dealt with rigorously. We also mention another, perhaps less grave, issue: in the event that $u$ is not continuous, we cannot say anything about its measurability so that the expectation involving $u$ at a future time is ill-defined (this is the reason we use upper and lower semicontinuous envelopes in the above).

The $\leq$ inequality follows directly from the tower property: \begin{align*} J(t,x;\tau) & =\mathbb{E}\left[g(\tau,X_{\tau}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau,X_{\tau}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau,X_{\tau}^{t,x})\mid\mathcal{F}_{\theta}\right]\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }J(\theta,X_{\theta}^{t,x};\tau)\right]\\ & \leq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u^{*}(\theta,X_{\theta}^{t,x})\right]. \end{align*} Now, take the supremum over all stopping times $\tau$ on both sides to arrive at the desired result. The $\geq$ inequality requires more work. For brevity, let $\mathcal{O}=(t,T)\times\mathbb{R}^{d}$ for the remainder. Let $\epsilon>0$ and $\varphi\colon[0,T]\times\mathbb{R}^d\rightarrow\mathbb{R}$ be an upper semicontinuous function satisfying $u\geq\varphi$. For each $(s,y)\in \mathcal{O}$, there exists $\tau^{s,y}\in\mathscr{T}_{[s,T]}$ such that $$u(s,y)\leq J(s,y;\tau^{s,y})+\epsilon.$$ Using the upper semicontinuity of $\varphi$ and the continuity of $J$ (see above), we can find a family $(r^{s,y})$ of positive constants such that $$\epsilon\geq\varphi(t^{\prime},x^{\prime})-\varphi(s,y)\text{ and }J(s,y;\tau^{s,y})-J(t^{\prime},x^{\prime};\tau^{s,y})\leq\epsilon \text{ for }(t^{\prime},x^{\prime})\in B(s,y;r^{s,y})$$ where $$B(s,y;r)=(s-r,s)\times\left\{ x\in\mathbb{R}^d\colon\left|x-y\right| < r\right\}.$$ This seemingly strange choice for the sets above is justified later. Since $$\left\{ B(s,y;r^{s,y})\colon(s,y)\in \mathcal{O}\right\}$$ forms a cover of $\mathcal{O}$ by open sets, Lindelöf's lemma yields a countable subcover $\{B(t_{i},x_{i};r_{i})\}$. Let $C_{0}=\emptyset$, and $$A_{i+1}=B(t_{i+1},x_{i+1};r_{i+1})\setminus C_{i}\text{ where }C_{i+1}=A_{1}\cup\cdots\cup A_{i+1}\text{ for }i\geq0.$$ Note that the countable family $\{A_{i}\}$ is disjoint by construction, and that $X_{\theta}^{t,x}\in\cup_{i\geq1}A_{i}$ a.s. (recall that $X_{\theta}^{t,x}\in\mathbb{L}^{\infty}$ and $t < \theta < T$ by definition). Moreover, letting $\tau^{i}=\tau^{t_{i},x_{i}}$ for brevity, \begin{align*} J(t^{\prime},x^{\prime};\tau^{i}) & \geq J(t_{i},x_{i};\tau^{i})-\epsilon\\ & \geq u(t_{i},x_{i})-2\epsilon\\ & \geq\varphi(t_{i},x_{i})-2\epsilon\\ & \geq\varphi(t^{\prime},x^{\prime})-3\epsilon & \text{for }(t^{\prime},x^{\prime})\in B(t_{i},x_{i};r_{i})\supset A_{i}. \end{align*} Now, let $A^{n}=\cup_{i\leq n}A_{i}$ for $n\geq1$. Given a stopping time $\tau\in\mathscr{T}_{[t,T]}$, let $$\hat{\tau}^{n}=\tau\mathbf{1}_{\left\{ \tau<\theta\right\} }+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\left(T\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})+\sum_{i=1}^{n}\tau^{i}\mathbf{1}_{A_{i}}(\theta,X_{\theta}^{t,x})\right).$$ In particular, since $B(t_{i},x_{i};r_{i})\supset A_{i}$ was picked such that for all $(t^{\prime},x^{\prime})\in B(t_{i},x_{i};r_{i})$, $t^{\prime}\leq t_{i}$, we have that $\hat{\tau}^n\in\mathscr{T}_{[t,T]}$. If we had instead chosen the open sets $B_{r_{i}}(t_{i},x_{i})$ to form our cover, we would not be able to use $\tau^{i}$ in the above definition of $\hat{\tau}^{n}$ without violating--roughly speaking--the condition that "stopping times cannot peek into the future." We first write \begin{align*} u(t,x) & \geq J(t,x;\hat{\tau}^{n})\\ & =\mathbb{E}\left[\left(\mathbf{1}_{\left\{ \tau<\theta\right\} }+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right)g(\hat{\tau}^{n},X_{\hat{\tau}^{n}}^{t,x})\right] \end{align*} and consider the terms in the summation separately. It follows from our choice of $A^{n}$ and the tower property that \begin{align*} \mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\hat{\tau}^{n},X_{\hat{\tau}^{n}}^{t,x})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right] & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau^{i},X_{\tau^{i}}^{t,x})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\tau^{i},X_{\tau^{i}}^{t,x})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\mid\mathcal{F}_{\theta}\right]\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }J(\theta,X_{\theta}^{t,x};\tau^{i})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right]\\ & \geq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\left(\varphi(\theta,X_{\theta}^{t,x})-3\epsilon\right)\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right]\\ & \geq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi(\theta,X_{\theta}^{t,x})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right]-3\epsilon. \end{align*} Moreover, $$\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(\hat{\tau}^{n},X_{\hat{\tau}^{n}}^{t,x})\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})=\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(T,X_{T}^{t,x})\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})\leq|g(T,X_{T}^{t,x})|$$ and hence the dominated convergence theorem yields $$\lim_{n\rightarrow\infty}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(T,X_{T}^{t,x})\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})\right] =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }g(T,X_{T}^{t,x})\lim_{n\rightarrow\infty}\mathbf{1}_{\mathcal{O}\setminus A^{n}}(\theta,X_{\theta}^{t,x})\right]=0$$ since we can (a.s.) find $i$ such that $(\theta,X_{\theta}^{t,x})\in A_{i}$. By Fatou's lemma, \begin{align*} \liminf_{n\rightarrow\infty}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi(\theta,X_{\theta}^{t,x})\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right] & \geq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi(\theta,X_{\theta}^{t,x})\liminf_{n\rightarrow\infty}\mathbf{1}_{A^{n}}(\theta,X_{\theta}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi(\theta,X_{\theta}^{t,x})\right]. \end{align*} Note that we were able to use Fatou's lemma since $\varphi(\theta,X_\theta^{t,x})$ is bounded due to the assumption $X_{\theta}^{t,x}\in\mathbb{L}^{\infty}$. Since $\tau$ and $\epsilon$ were arbitrary, we have that $$u(t,x)\geq\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi(\theta,X_{\theta}^{t,x})\right] \text{ for } \tau\in\mathscr{T}_{[t,T]}.$$ For the last step, let $r>0$ such that $|X_{\theta}^{t,x}|\leq r$ a.s. (recall $X_{\theta}^{t,x}\in\mathbb{L}^{\infty}$). We can find a sequence of continuous functions $(\varphi_{n})$ such that $\varphi_{n}\leq u_{*}$ and converges pointwise to $u_{*}$ on $[0,T]\times B_{r}(0)$. Letting $\varphi^N = \min_{n\geq N} \varphi_n$ denote a nondecreasing modification of this sequence, by the monotone convergence theorem, we get \begin{align*} u(t,x) & \geq\lim_{N\rightarrow\infty}\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }\varphi^N(\theta,X_{\theta}^{t,x})\right]\\ & =\mathbb{E}\left[\mathbf{1}_{\left\{ \tau<\theta\right\} }g(\tau,X_{\tau}^{t,x})+\mathbf{1}_{\left\{ \tau\geq\theta\right\} }u_{*}(\theta,X_{\theta}^{t,x})\right] & \text{ for } \tau \in \mathscr{T}_{[t,T]}.\end{align*} Now, simply take supremums on both sides to arrive at the desired result.

## Bibliography

1. Touzi, Nizar. Optimal stochastic control, stochastic target problems, and backward SDE. Vol. 29. Springer Science & Business Media, 2012.