Building a Model for Returns
Published September 28, 2024
Volatility drag in Gaussian stochastic return models
Who needs determinism anyway?
Quantitative finance follows in the tradition set by Bachelier in his 1900 paper "Théorie de la spéculation" [Bachelier 1900], in which he models asset prices as fundamentally random. In quantitative finance, we throw up our hands and leave the business of trying to forecast prices to the value investors and the day-traders. Once we acquiesce to a nondeterministic universe, we ask "ok, now what can we do?"
Bachelier essentially started with an arithmetic binomial model and considered the transition probability between timesteps to obtain a stochastic diffusion process:
Into the 60s and 70s, however, a preference emerged to model asset prices as stochastic processes which were geometric, both in drift and diffusion. This is certainly an improvement, but it leads to subtleties which are not present in Bachelier’s arithmetic model.
We begin by considering a discrete time series indexed by time.
The series describes the stock price at particular times, and crucially, we declare it to be random. In this probabilistic setting proposed by Bachelier, descriptive statistics of some underlying random variable can grant us an initial foothold in describing the jagged, chaotic time series of asset prices. Of course, efficient markets mostly strip these statistics of useful information, but their properties are instructive.
We might consider the entire sequence to be a random variable, but it’s not clear how we would extend this definition or sample from its distribution in practice. The individual prices might be another choice, though their distributions clearly depend on the realized previous price (in other words, the amount by which the price can change over a timestep appears bounded). Bachelier considered the sequence of absolute differences to solve this problem, but we can do better.
To proceed, we make an empirical observation.
The average change in $S$ over an interval is proportional to $S$ and to the length of the interval.
This suggests that the stock’s movement has something to do with exponential growth, and it’s why a geometric stochastic process model is a better choice.
A couple important points:
-
It’s important to recognize the above as a purely empirical fact. This is not generally true for an arbitrary sequence of numbers.
-
The noise in real-world market data can make it difficult to convince yourself of this unless you look at sufficiently small intervals across sufficiently large time scales, and I wonder if that’s why Bachelier didn’t bother with this mathematical framing. Across short time scales, with the resolution of price data Bachelier had access to, the relationship would have appeared linear.
-
Even still, the statement is a simplification which does not survive statistical scrutiny. This is a restatement of the classic (over)simplification that prices are lognormally distributed – this is only an approximation.
This also means that, while studying the properties of the price differences indeed leads to the same answers, the math gets a bit cluttered, as the distribution of a difference depends on the price level. Thus, a natural choice is to reframe the model in terms of the main character of this essay: returns.
Returns are noisy and somewhat Gaussian, and since annual return is simply compounded daily returns, it’s fair to think of the asset price path as "growth plus noise." The subtle interplay of this growth and noise is the subject of this essay.
A tale of two growth rates
Let’s start by formalizing our choice of random variable. Empirically, we approximated that
so a natural first step is to model a sequence of simple returns.
Here, I introduce notation to indicate that this return is defined across a timestep of a particular length. The notation may seem cumbersome, but it will prove quite useful later on.
But we can eliminate our dependence on the price another way: by considering log returns.
Under this view, the sequence becomes
The implication of modeling the asset price as random is to claim that it could have ended differently. So what is the event space? Recall the empirical observation from earlier, this time in terms of returns:
We’ll label these proportionality constants $\mu$ and $\nu$ respectively, so that
Now that we have chosen simple or log returns as the engine of the price sequence, we can reason about the event space. Both approaches have the effect of turning the geometric problem into a linear one. Furthermore, we can see that, under our assumptions, these proportionality constants do not depend on time. Thus, if we have a sufficiently large number of returns, we can replace our sample averages with the expectations.
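To make the two definitions concrete, here’s a quick sketch in code (the five-point price series is made up for illustration):

```python
import numpy as np

# Hypothetical price series (illustrative only, not real market data).
prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1])

# Simple returns: R_t = S_t / S_{t-1} - 1
simple = prices[1:] / prices[:-1] - 1

# Log returns: r_t = ln(S_t / S_{t-1})
log_ret = np.log(prices[1:] / prices[:-1])

# Log returns telescope: their sum recovers the total growth exactly,
# while simple returns must be compounded multiplicatively.
print(np.exp(log_ret.sum()))   # equals prices[-1] / prices[0]
print(np.prod(1 + simple))     # same number, via compounding
```

Both constructions recover the same total growth; the difference is whether the per-step quantities add or multiply.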
This is a good start, but we have a mystery on our hands. If we think of a stock price as "growth plus noise", which growth rate is the correct one? And what does the other one represent?
The growth rate associated with simple returns might feel more legitimate at first. After all, simple returns are all that really matters in finance – they are the actual amount of money gained or lost.
For kicks, let’s superimpose these growth rates atop the same SPY time series.
In some sense, this view makes the log growth rate seem more legitimate, since it preserves total return! If indeed our stock price is "growth plus noise," surely this growth curve is the ideal deterministic basis with which to compose the noise.
We can rewrite these random variables equivalently as discrete random growth factors (i.e. price ratios).
We can safely ignore terms smaller than order $\Delta t$. But be careful – we cannot ignore terms of order $\Delta t$ itself.
We can see that these price ratio expectations are the arithmetic and geometric means of the price ratios, respectively. The AM-GM inequality tells us that the arithmetic mean will always be larger than the geometric one.
This view also explains why log returns recover the total return.
The mean log return simply answers the question "what growth rate, if compounded $n$ times, would result in the correct total growth?"
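We can verify both facts numerically. A sketch with synthetic growth factors (the drift and noise parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily growth factors S_t / S_{t-1} (arbitrary illustrative parameters).
ratios = np.exp(rng.normal(0.0005, 0.01, size=252))

am = ratios.mean()                         # arithmetic mean of the price ratios
gm = ratios.prod() ** (1.0 / len(ratios))  # geometric mean of the price ratios

# AM-GM: the arithmetic mean can never fall below the geometric mean.
print(am >= gm)   # True

# Compounding the geometric mean n times recovers the realized total growth,
# which is exactly why the mean log return preserves total return.
print(np.isclose(gm ** len(ratios), ratios.prod()))   # True
```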
You might not think that this difference is a big deal; for the period of SPY prices shown above, the two differ by only about 3.6%.
But consider pathological cases where these two quantities have different signs. As it turns out, in markets, this happens all the time. I tested 252-day windows for 15 stocks, and for most of them, a whopping 5% of their year-long windows exhibit this behavior – a positive average daily return, but a negative total return! This should convince you that understanding the difference between these two growth rates is worthwhile.
Think about the implication:
-
An investor who buys the asset each morning and sells it the following morning, immediately rebuying at the same price, makes money.
-
An investor who buys and holds loses money.
This exercise also gives us a clue – more volatile stocks have a higher frequency of these pathological windows, and for any single stock, they cluster in volatile stretches. Whatever this thing is, it has something to do with volatility.
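For readers who want to poke at this themselves, here’s a self-contained version of the experiment on simulated (not real) returns; the drift and volatility are invented, chosen high enough to make the pathology common:

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, window = 2520, 252   # ~10 years of daily data, 1-year windows

# Simulated daily simple returns for a volatile stock (made-up parameters).
daily = rng.normal(0.0005, 0.03, size=n_days)

pathological = 0
for start in range(n_days - window):
    w = daily[start:start + window]
    if w.mean() > 0 and np.prod(1 + w) - 1 < 0:
        # Positive average daily return, yet buy-and-hold loses money.
        pathological += 1

print(f"{pathological} of {n_days - window} windows are pathological")
```

Cranking the volatility up or down changes the frequency dramatically, which foreshadows the punchline of this essay.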
Indeed, this phenomenon is called volatility drag. To understand it better, we need to make a stop at the casino.
Great average returns do not a winner make
An American roulette table has 38 squares. Two squares are green, 18 are red, and 18 are black. The game starts by betting money on red or black. A ball is then spun around the wheel, randomly landing on a color. A correct guess doubles your money, and an incorrect guess loses the entire bet.
The expected winnings on a $1 bet are trivial to calculate:
Indeed, after $n$ games, the expected P&L is $-n/19$ dollars.
So far so good. This is why no mathematicians play roulette. We can make roulette bets analogous to stock investing by simulating a single-period "buy and hold" strategy for some finite number of plays – we let our bet ride for $n$ rounds. Unsurprisingly, the expected final payout is abysmal:
But something quite interesting happens when we change the game slightly. Let’s imagine a roulette table which has inverted the odds in the gambler’s favor. At this table, the green squares are freebies – they always count as a win.
Now the expected return for a single play is positive.
Not only that, but the expected growth in wealth after $n$ plays is also positive, guaranteeing that the casino loses money in the long run.
Should we play this game?
We’ve just calculated the expected payout, but that number would only be realized after a sufficiently large number of $n$-round plays. We might also wonder about the expected outcome of a single play. Because individual rounds multiply the starting wealth, the order doesn’t matter. Thus, the final wealth is fully determined by the number of wins. Each win occurs with probability $20/38$, so the expected number of wins in an $n$-round play is simply
Then the final return for a game which has the expected number of wins is
Therefore, it’s most likely that you will walk away with nothing each time you sit down to play! Sure, you could walk out the door with $1,024 after 10 rounds, but that happens only with a $(20/38)^{10} \approx 0.16\%$ chance! On the other hand, if you play a 10-round game 400 times, there’s nearly a coin-flip chance (about 48%) that you’ll have won $1,024 at some point, only sacrificing $399 for the privilege.
We can also see a cartoon example of someone with positive average returns who still ends up a loser. Consider the unlucky soul who won 9 times in a row and lost on their 10th play. Their average return was a whopping 80% per play!
But of course, they ended up with a total return of -100%. The bigger the rise, the harder the fall.
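A simulation of the gambler-favorable table makes the point vividly (the stake doubles on a win, with probability 20/38, and is wiped out on any loss):

```python
import numpy as np

rng = np.random.default_rng(7)
p_win, n_rounds, n_games = 20 / 38, 10, 200_000

# Let a $1 bet ride for n_rounds: the stake doubles on each win
# and a single loss wipes it out entirely.
wins = rng.random((n_games, n_rounds)) < p_win
survived = wins.all(axis=1)
payout = np.where(survived, 2.0 ** n_rounds, 0.0)

print("theoretical expected payout:", (2 * p_win) ** n_rounds)  # ≈ 1.67
print("simulated mean payout:     ", payout.mean())
print("fraction leaving with $0:  ", 1.0 - survived.mean())     # ≈ 99.8%
```

A positive-expectation game in which almost every sitting ends at zero: the mean is propped up by a tiny set of enormous payouts.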
Whether or not you would take this bet given finite money (i.e. finite number of plays) depends on your level of risk tolerance.
This is a variation on the St. Petersburg Paradox, which highlights the difference between the mathematical value of a game and a rational individual’s willingness to play it. This topic was also explored in a setting similar to this one by Paul Samuelson in 1971 (Samuelson 1971).
Now, take a closer look at that quantity. Taking the log of both sides of the above equation,
This is the log return!
To summarize,
-
Simple returns tell us about the dynamics of the expected value of the price across many possible realizations of the path.
-
Log returns tell us about the dynamics of the expected path that the asset price takes.
This tracks with our discussion about the SPY time series. By accepting that time series data as our sample and declaring that it perfectly models the population statistics, we are not saying "this sample has the mean growth rate;" rather, we are saying "this sample has the most likely growth rate."
So, we’ve tracked down the difference in these two growth rates, but we’ll get a better feel for the machinery by building a bridge between the stock price and the casino. So far we’ve only been concerned with the first moment of our random returns, but in order to develop our model, we need to consider the second moment as well.
The binomial model
The binomial model provides the perfect setting for a random walk. It simplifies the math while providing some key ingredients:
-
a natural structure for geometric growth
-
sufficient degrees of freedom to tune distributions to our liking
-
in the limit, converges on Geometric Brownian Motion
A geometric version of the binomial model was introduced by Cox, Ross, and Rubinstein (1979) to price options. I’ll refer to it as the Geometric Binomial Random Walk (GBRW). They also demonstrated that this option pricing method converges on the Black-Scholes-Merton solution. We won’t be pricing options here, but we’ll use a similar convergence technique to study and extend our stock price model.
In the GBRW model, the asset price $S$ rises to $uS$ with probability $p$ or falls to $dS$ with probability $1-p$ in a single time step.
This is simply a generalization of the Roulette game from above.
One time step is not enough for our needs – we need to develop a procedure for extending the model to time scales larger and smaller than $\Delta t$ so we can examine the statistics.
We begin by examining a simpler case: the Arithmetic Binomial Random Walk (ABRW).
Arithmetic binomial random walk
The arithmetic binomial random walk is characterized by step sizes that do not change with position or time. Here, I’ve kept the model general by allowing for asymmetric step sizes and probabilities.
There are many ways to frame and study this setting mathematically, but as a computer scientist by trade, I like the framing that facilitates simulation (i.e. one that allows repeated sampling). As it turns out, this approach also builds a nice bridge to stochastic calculus.
We will analyze the binomial model via the properties of a random walk along its tree-like structure. The engine of the random walk is the per-step random variable.
This is already enough setup to do some interesting math, but the real meat comes from extending the definition of our model.
We recognize that daily returns for assets do not tell the full story. Rather, they are a snapshot at a resolution of . Thus, daily returns (simple or log) cannot be the atomic random variables! Just like how annual return is a random quantity constructed from random daily returns, so too are daily returns merely a summary of higher frequency trading. In the limit, we consider trading to occur continuously.
Therefore, in order for the binomial model to be useful to us, we need to be able to reason about its properties at time scales both larger and smaller than the timescale over which it is initially defined.
I like to think of it this way: by defining the sample space and probability measure at a particular time horizon, what we’re really doing is setting the parameters of a distribution at that time horizon. Then, we analyze the properties of the model at discrete interval lengths to find a continuous map from time to the distribution.
Think of $\Delta t$ as the resolution of the grid. It is the mesh of the partition of the time axis.
Now, consider the process
The logarithm turns the geometric binomial grid above into an arithmetic one.
Equivalently, one could also consider the random variable to be the number of up-moves, analogous to the number of wins in the Roulette example. Across one timestep, this is a boolean random variable.
At this point, the grid is only defined at integer multiples of the timestep. So we look at how the expectation and variance evolve over multiple steps. It’s relatively easy to work this out using the binomial distribution, or you can simply recognize that we are accumulating independent Bernoulli trials.
Because the per-unit-time expectation and variance are invariant at any resolution, this gives us a natural procedure to extend the binomial grid definition to intermediate times, i.e. to increase the resolution: iteratively partition the interval while preserving these two invariants.
With this framing in mind, we’ll indicate the hard-coded parameters for the $\Delta t$-resolution grid by defining constants.
We did a similar substitution at the beginning of this essay for the SPY time series. But in this case, the substitution results directly from how we’ve defined the binomial grid, not an empirical observation like before. Writing the parameters this way is useful because later on we will manipulate $\Delta t$ like a variable. It is justified because, as shown above, the expectation is linear in time, insofar as you consider a discrete function of time:
This essentially recognizes these constants as the growth and volatility per unit time of the process, respectively.
If the goal is to simulate a random walk on the binomial grid (e.g. with a computer), all we really need is a formula for the binomial random variable from which to sample, so that we can construct a random walk with timesteps of arbitrary length. We simply shrink the timestep by an integer factor and define the corresponding step random variable. Let
Then, we can simply increase the number of subdivisions until we have sufficient precision.
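As a sketch of that procedure (the drift and volatility numbers are arbitrary), we can check that the per-unit-time mean and variance survive refinement:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T = 0.05, 0.2, 1.0   # per-unit-time drift and volatility (arbitrary)

def abrw_terminal(n_steps, n_paths=20_000):
    """Sample terminal values of the arithmetic binomial random walk."""
    dt = T / n_steps
    flips = rng.integers(0, 2, size=(n_paths, n_steps)) * 2 - 1  # fair ±1 coins
    # Each step has mean mu*dt and variance sigma^2*dt, so the per-unit-time
    # invariants hold at every grid resolution.
    steps = mu * dt + sigma * np.sqrt(dt) * flips
    return steps.sum(axis=1)

for n in (4, 64, 1024):
    x = abrw_terminal(n)
    print(n, round(x.mean(), 4), round(x.var(), 4))  # near mu*T and sigma^2*T
```

No matter how finely we partition the interval, the terminal distribution keeps the same mean and variance per unit time.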
These two constraints in two unknowns give a unique solution (up to a reflection about the first moment):
What’s more, we can let the number of subdivisions become large, and with a healthy bit of applied math hand-waving, let
Voilà! We’ve discovered Brownian motion and stochastic increments!
It should be clear that we aren’t in Kansas anymore, and that the normal rules of calculus don’t apply here. In traditional calculus, a differential is defined as the linear component of a function’s change, given by its derivative:
But at no point (including in the limit) is this process differentiable!
Here we’ve approached the limit through a process of smaller and more frequent Bernoulli trials, so the increment takes the form of a Bernoulli random variable. The more common (functionally equivalent) form is to write the increment as a Gaussian random variable.
with $\epsilon \sim \mathcal{N}(0, 1)$.
All that’s left to do is integrate and exponentiate to obtain our simulated path.
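That final step (accumulate the increments, then exponentiate) might look like this in code; all parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, sigma, S0, T, n = 0.08, 0.2, 100.0, 1.0, 252  # illustrative parameters
dt = T / n

# Gaussian log increments: mean nu*dt, standard deviation sigma*sqrt(dt).
dx = rng.normal(nu * dt, sigma * np.sqrt(dt), size=n)

# "Integrate" (cumulative sum of the increments), then exponentiate
# to turn the arithmetic walk into a price path.
log_path = np.concatenate(([0.0], np.cumsum(dx)))
path = S0 * np.exp(log_path)

print(path[0], path[-1])   # starts at S0; the terminal price is random
```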
Geometric binomial random walk
Now let’s find out what happens when we apply the same procedure for the geometric case.
Start by choosing a random variable which drives the random walker across the grid.
We would rather not have our random variable depend on the current price (such that its distribution changes after each trial), so instead, we express the problem geometrically.
Whether you consider the return, the price itself, or even a growth factor like
to be the random variable, it makes no difference.
The key characteristic is that we are sampling an independent sequence of binomial random variables and accumulating them geometrically to walk along the binomial tree. These forms are all equivalent and lead to the same results below. Using returns or growth factors makes the math a bit cleaner (we don’t get the price popping up everywhere).
The expectation and variance across a single timestep are
Because these values are just constants, we’ll write them as such.
Again, we write them this way to create quantities which are expressed per unit time, in anticipation of scaling the time interval of our trials later on. But at this point, $\Delta t$ is just a number, not a variable!
The next step is to find the first and second moments of the derived random variable. Recall from the linear example that the goal is to establish a relationship between these moments and the length of the interval, one which can be extended to intervals of any length.
Across $n$ timesteps of length $\Delta t$, the expectation is binomial.
This also follows from the expectation of a product of independent variates.
This confirms that the expected total growth and thus the expected value of grows exponentially, which is unsurprising.
The variance isn’t quite as clean:
Nonetheless, we have the necessary formulae to determine our binomial random variables for any grid resolution. Again, we refine the grid by shrinking the timestep.
Since the timestep becomes small, we can simplify.
Now we notice something very interesting. Just like in the linear case, the random variables scale with the timestep in the same way. But instead of the growth and volatility constants which describe the final distribution, we have new quantities.
What’s more, we can see that these values represent an expected value and variance of some other distribution on the finer time scale.
What are these spooky quantities? One option here is to recall that we ignore anything smaller than order $\Delta t$, in which case we simply have
But recall that by definition, at any grid resolution,
Thus,
This is our log return, corrected for convexity! I’ll spare the reader the algebra, but the variance works out to
We’ve played fast and loose with our approximations here, so it’s worth taking a beat to notice that the growth and volatility densities depend on $\Delta t$, the starting resolution of the grid. This was not the case in the arithmetic binomial grid. In other words, in the process of refining the grid above, we swapped out the original parameters for the corrected ones, but only at the $\Delta t$ resolution. This means that the error introduced at the $\Delta t$ resolution doesn’t vanish for smaller resolutions, so $\Delta t$ should be sufficiently small to begin with.
In the limit, when the increments are Gaussian, these equations are exact:
Similarly, for the variance:
Once again, let’s consolidate the random variable and write its equivalent stochastic differential equation form:
Visualization
The arithmetic and geometric approaches are equivalent at each resolution (to a close approximation).
Another view, for good measure:
Conclusion - What a drag!
We have seen how the relationship between geometric growth and randomness leads to the phenomenon of volatility drag:
It can be described in many different, yet equivalent ways:
-
the difference between expected simple and log returns at any time scale
-
the convexity correction for exponential growth
-
the difference in growth between the mean price path and the median price path
-
the difference between arithmetic and geometric mean of growth rates
-
minus half the quadratic variation of the log price, per unit time
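All of these descriptions can be checked numerically. Here’s a sketch assuming lognormal growth factors (the parameters are invented): fix the simple-return growth rate and watch the mean log return come out lower by half the variance.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 0.08, 0.25, 2_000_000   # simple-return drift and vol (invented)

# Lognormal growth factors with E[S1/S0] = e^mu exactly:
# log(S1/S0) ~ N(mu - sigma^2/2, sigma^2).
log_growth = rng.normal(mu - sigma**2 / 2, sigma, size=n)
growth = np.exp(log_growth)

print("mean growth factor:", growth.mean())          # ≈ e^0.08 ≈ 1.083
print("mean log return:   ", log_growth.mean())      # ≈ mu - sigma^2/2 = 0.04875
print("volatility drag:   ", mu - log_growth.mean()) # ≈ sigma^2/2 = 0.03125
```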
Bear in mind that the distinction between simple and log returns has nothing to do with "compounding," as I have seen claimed online.
All geometric (exponential) growth compounds, by definition.
Practitioners often model asset prices as Geometric Brownian Motion (if only as a precursor to more sophisticated methods) and sample historical price data at a particular resolution in order to estimate the drift and volatility parameters for e.g. Monte Carlo simulations. As was stated earlier in this essay, by declaring that the historical (discrete!) time series data is representative of the population, we implicitly declare that our sample path grows at the median rate, not the mean rate!
There’s just one more loose end to tie up. If volatility drag is a negative correction, why did the analysis from the binomial model section end up with a positive correction? Most online materials and papers write Geometric Brownian Motion in terms of increments in the price (as opposed to increments in the log price).
Indeed, the growth rate and volatility in this equation should be the simple-return growth rate and volatility! In the notation used in this essay, the correct formula is
It is a common mistake to use the log return growth and volatility in this equation. I believe the mistake comes from the habit of working with log returns by default due to their nice properties, and failing to correct the quantities when plugging them into this most common GBM form.
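To see the size of the mistake, here’s a sketch with invented numbers, comparing a correctly parameterized lognormal sample against one where the log-return growth rate was plugged into the slot meant for the simple-return growth rate:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 0.10, 0.30, 1_000_000   # intended simple-return drift/vol (invented)
nu = mu - sigma**2 / 2                 # corresponding log-return drift

# Correct: the Gaussian log increments get drift nu = mu - sigma^2/2.
correct = np.exp(rng.normal(nu, sigma, size=n))

# Bug: the log-return drift nu is treated as if it were mu, so the
# convexity correction gets applied twice.
buggy = np.exp(rng.normal(nu - sigma**2 / 2, sigma, size=n))

print("target mean growth:", np.exp(mu))      # ≈ 1.105
print("correct estimate:  ", correct.mean())  # ≈ 1.105
print("buggy estimate:    ", buggy.mean())    # ≈ exp(mu - sigma^2/2) ≈ 1.057
```

The buggy sample undershoots the intended expected growth by exactly one convexity correction.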
I do wonder if there’s any money to be made in the market by arbitraging this bug, which is no doubt running in some trading algorithm in some hedge fund somewhere, but the errors introduced by assuming Gaussian returns and assuming historical returns are predictive of future returns probably far outweigh the upside.
Additionally, in the Black-Scholes framework, the whole point is to eliminate the drift by hedging anyway. This has the effect of setting the portfolio growth rate to $r$, the continuously compounded risk-free rate. But of course, rates are quoted in simple terms, not log terms! That’s why, under the risk-neutral measure, the solution applies the convexity correction downwards:
The same correction is seen in the solution to the Black-Scholes differential equation for call options.
References
Bachelier, Louis. 1900. “Théorie de La Spéculation.” Annales Scientifiques de L’Ecole Normale Supérieure 17: 21–88.
Cox, John C., Stephen A. Ross, and Mark Rubinstein. 1979. “Option Pricing: A Simplified Approach.” Journal of Financial Economics 7 (3): 229–63. https://doi.org/10.1016/0304-405X(79)90015-1.
Samuelson, Paul A. 1971. “The "Fallacy" of Maximizing the Geometric Mean in Long Sequences of Investing or Gambling.” Proceedings of the National Academy of Sciences of the United States of America 68 (10): 2493–96. http://www.jstor.org/stable/61075.