How Not to Visualize Martingales

Published February 13, 2024

A cautionary tale about the subtleties of stochastic process simulation


TL;DR

While exploring the properties of various continuous-time stochastic processes, I had a bit of fun simulating them in Python using discrete approximations generated via the Euler-Maruyama method.

In order to check my work when manipulating various stochastic random variables using Itô calculus, I built a Martingale Detector - a crude tool which tests a drift hypothesis via Monte Carlo methods.

When the naive detection scheme yielded unexpected results for certain processes, I discovered that I had implicitly baked in a faulty assumption in my code.

I believe that the bug described below is an instructive example of the consequences of eschewing rigor when designing simulations. This essay is equal parts mathematical candy and postmortem.

While my use for the tool was rather pedestrian (checking my work on some practice problems), testing for the Martingale and Markov properties in stochastic time series is actually of great interest in fields like finance and physics.

The usual disclaimer: The purpose of this essay is to document my experience and present some food for thought to the reader. The procedures described below should not be relied upon for anything important.

A brief review of Martingales

Assume a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, \mathbb{P})$ is given. An $\mathcal{F}_t$-adapted process $Y_t$ is a continuous-time martingale if and only if $\mathbb{E}\left[|Y_t|\right] < \infty$ and

$$\mathbb{E}\left[Y_t | \mathcal{F}_s\right] = Y_s, \quad \forall \; 0 \le s \le t$$

Proving the martingale conditions for simple stochastic processes which can be expressed in closed form often requires little more than some massaging using the tools of Itô calculus and basic probability theory. For example, for processes $F_t = F(W_t, t)$ which can be expressed as functions of a standard Wiener process $W_t$ and time $t$, it is sufficient to prove

$$\frac{\partial F}{\partial t} + \frac{1}{2} \frac{\partial^2 F}{\partial W^2} = 0$$
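
As a quick machine check of this condition, here is a minimal SymPy sketch (my own, not part of the tool described below) verifying it for the process $e^{t/2} \cos W_t$ that appears later in this essay:

```python
import sympy as sp

W, t = sp.symbols("W t", real=True)

# Candidate Martingale: F(W, t) = exp(t/2) * cos(W)
F = sp.exp(t / 2) * sp.cos(W)

# Martingale condition: dF/dt + (1/2) * d^2F/dW^2 == 0
drift = sp.diff(F, t) + sp.Rational(1, 2) * sp.diff(F, W, 2)
print(sp.simplify(drift))  # prints 0, so the condition holds
```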

Still, I wanted a quick-and-dirty way to check my work, preferably with a method that makes few assumptions about the underlying dynamics.

The Naive Martingale Detector

The problem domain

I didn’t bother formalizing requirements for the types of processes which the tool should handle. The general form of stochastic differential equations describes far too broad a domain of processes and admits many pathological examples which could evade any numerical drift detector I throw at them [1].

$$dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dW_t$$

I essentially restricted $\mu(t, X_t)$ and $\sigma(t, X_t)$ to analytic functions or Itô integrals of analytic functions. I may revisit this topic in the future to further generalize the revised tool presented at the end of this essay.
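
For concreteness, a bare-bones Euler-Maruyama integrator for this restricted class might look like the sketch below (the function and parameter names are my own illustrations, not the code behind the figures in this essay):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T=1.0, dt=1e-3, n_paths=1_000, seed=0):
    """Simulate dX = mu(t, X) dt + sigma(t, X) dW on [0, T]."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    X = np.empty((n_paths, n_steps + 1))
    X[:, 0] = x0
    for i in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)  # Wiener increments
        X[:, i + 1] = (X[:, i] + mu(i * dt, X[:, i]) * dt
                       + sigma(i * dt, X[:, i]) * dW)
    return X

# Example: geometric Brownian motion, dX = 0.1 X dt + 0.2 X dW
paths = euler_maruyama(lambda t, x: 0.1 * x, lambda t, x: 0.2 * x, x0=1.0)
```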

While fancy non-parametric techniques like bootstrapping might be a more mathematically sound tool for the job, Monte Carlo will essentially give me a population standard deviation estimate ($\hat{\sigma} = s$) to any degree of precision my CPU can stand, thus making a $z$-test a reasonable choice. CLT, take the wheel!

A better name for this tool might be the Drift Detector [2], since we will be evaluating a Martingale null hypothesis.

$$H_0: \mathbb{E}\left[Y_t\right] - Y_0 = 0 \quad \forall \, 0 \le t \le T$$

Spoiler alert:

The above null is NOT equivalent to the Martingale property; a process can satisfy it at every timestep without being a Martingale. The nature of this mistake is the primary subject of this essay.

Aiming for simplicity, I ran a $z$-test for each timestep $t = i \, \delta t$ and rejected the null upon discovery of any breaches.

Detection procedure

For the following, let $t_i \equiv i \, \delta t$. [3]

My naive (indeed faulty!) scheme for drift detection is essentially a series of hypothesis tests driven by Monte Carlo simulation. The details:

  1. Sample $n \times T$ stochastic increments $dW_t \equiv \delta W_{t_i}; \ i \in \{0, \dots, T - 1\}$ from a Gaussian [4]; i.e. $\delta W_{t_i} \sim \mathcal{N}(0, \delta t)$.

  2. Perform discrete integration of the $dW_t$ to obtain $W_t$, ensuring that $W_0 = 0$.

  3. Shift the $dW_t$ to the left such that any vectorized operations on $W_t$ and $dW_t$ are properly aligned (I may describe this step in further detail later, but it is largely unimportant for this discussion).

  4. Construct $Y_t = f(W_t, t)$ via Euler-Maruyama. At this point, path realizations of $Y_t$ can be rendered to get a sense of their dynamics.

  5. Drift detection step: Compute $z$-statistics for each timestep [5] and test the null (a code sketch of the full procedure follows this list).

    $$z_t = \frac{\bar{Y}_t - Y_0}{\hat{\sigma}_t / \sqrt{n}} \approx \frac{\frac{1}{n} \sum_{i=1}^n Y_t^{(i)} - Y_0}{s_t / \sqrt{n}}$$
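
Gluing the steps together, the naive scheme amounts to something like the following sketch (a simplification of my actual code: names are illustrative, and $Y_t$ is built directly from its closed form rather than via Euler-Maruyama):

```python
import numpy as np
from scipy import stats

def naive_martingale_detector(f, T=1.0, dt=1e-3, n_paths=10_000,
                              alpha=0.05, seed=0):
    """Flawed test of H0: E[Y_t] - Y_0 = 0 at every timestep."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    t = np.arange(n_steps + 1) * dt
    # Steps 1-2: Gaussian increments, integrated so that W_0 = 0.
    dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
    # Step 4: construct Y_t = f(W_t, t).
    Y = f(W, t)
    # Step 5: z-statistic per timestep (t = 0 is degenerate, so skip it).
    z = (Y[:, 1:].mean(axis=0) - Y[0, 0]) / (
        Y[:, 1:].std(axis=0, ddof=1) / np.sqrt(n_paths))
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return np.abs(z) > z_crit  # True wherever the null is breached

# The Martingale e^{t/2} cos(W_t) should produce (almost) no breaches.
breaches = naive_martingale_detector(lambda W, t: np.exp(t / 2) * np.cos(W))
print(f"{breaches.sum()} breaches out of {breaches.size} timesteps")
```

Testing every timestep and rejecting on any breach invites multiple-comparisons trouble; footnote [5] hints at better aggregation schemes.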

Below are some examples of the detector working as designed.

Figure 1: A Monte Carlo path simulation with significance level $\alpha = 0.05$ shown as black lines. The test correctly fails to reject the null for Martingale $e^{t/2} \cos W_t$ and correctly rejects the Martingale null for near-Martingale $e^{t/1.9} \cos W_t$ (breaches shown as red circles).

Notice the growing critical region boundaries (in black) as the processes diffuse.

For good measure, I included the Wiener process, which passes unsurprisingly. I also cooked up some more exotic test cases, like processes which are locally driftless at the boundaries of the interval. This test will catch these types of processes, up to the resolution of the discretization.

Figure 2: This process satisfies the Martingale properties locally but not globally.

Alas, it was too good to be true!

I quickly noticed the detector making Type 2 errors for certain processes, some of which are shown below.

Figure 3: Examples of non-Martingales which caused Type 2 errors in the detector.

In the case of $Y_t = (W_t - t) e^{W_t + 2t}$, we do indeed see a few breaches. However, they do not appear on all seeds, and when they do, the $p$-values are very close to the significance level. We will treat this example as a false negative.

I suspect this is a sampling error which is easily resolved by generating more paths. Nevertheless, I have kept this seed in the essay to demonstrate that getting these types of tests right is tricky!

In order to diagnose the problem, we need to analyze the two non-Martingales shown above [6].

  1. $X_t = e^{\alpha t} \cos W_t$ (correctly identified)

  2. $Y_t = (W_t - t) e^{W_t + k t}$ (false negative)

I set out to build a tool to sanity-check my math; now we must rely on the math to debug the tool!

Non-Martingale examples

Non-Martingale 1: correctly identified

Consider the process

$$X_t = e^{\alpha t} \cos W_t; \quad \alpha \in \mathbb{R}$$

Then $\mathbb{E}\left[X_t | \mathcal{F}_s\right] = e^{\alpha t} \mathbb{E}\left[\cos W_t | \mathcal{F}_s\right]$. To proceed further, define $f(W_t, t) = e^{t/2} \cos W_t$. By Itô’s lemma,

$$df_t = -e^{t/2} \sin W_t \, dW_t$$

This stochastic differential equation is equivalent to the more flexible integral form:

$$f_t - f_s = - \int_s^t e^{u/2} \sin W_u \, dW_u$$

Rearranging, and noting that the Itô integral on the right has zero expectation conditional on $\mathcal{F}_s$,

$$\begin{gathered} \cos W_t = e^{-(t - s)/2} \cos W_s - \int_s^t e^{(u - t)/2} \sin W_u \, dW_u \\ \therefore \mathbb{E}\left[\cos W_t | \mathcal{F}_s\right] = e^{-(t - s)/2} \cos W_s \end{gathered}$$

Finally, we have

$$\mathbb{E}\left[X_t | \mathcal{F}_s\right] = e^{\alpha t} e^{-(t - s)/2} \cos W_s = e^{\left(\alpha - \frac{1}{2}\right) (t - s)} X_s$$

So $X_t$ is not a martingale for any $\alpha \ne 1/2$.

Non-Martingale 2: false negative

Consider the following process:

$$Y_t = (W_t - t) e^{W_t + k t}; \quad k \in \mathbb{R}$$

The conditional expectation is relatively easy to calculate. There may be more elegant methods, but I computed the expectation by brute force, taking advantage of the normality of the quantity $W_t - W_s$.

$$\begin{aligned} \mathbb{E}\left[Y_t | \mathcal{F}_s\right] &= e^{kt} \mathbb{E}\left[(W_t - t) e^{W_t} | \mathcal{F}_s\right] \\ &= -t \mathbb{E}\left[e^{W_t} | \mathcal{F}_s\right] + \mathbb{E}\left[W_t e^{W_t} | \mathcal{F}_s\right] \\ &= -t \mathbb{E}\left[e^{W_t} | \mathcal{F}_s\right] + \mathbb{E}\left[(W_t - W_s) e^{W_t - W_s} e^{W_s} | \mathcal{F}_s\right] + \mathbb{E}\left[W_s e^{W_t - W_s} e^{W_s} | \mathcal{F}_s\right] \\ &= -t \mathbb{E}\left[e^{W_t} | \mathcal{F}_s\right] + e^{W_s} \mathbb{E}\left[(W_t - W_s) e^{W_t - W_s} | \mathcal{F}_s\right] + W_s e^{W_s} \mathbb{E}\left[e^{W_t - W_s} | \mathcal{F}_s\right] \end{aligned}$$

There are then just two functions of Gaussian quantities for which the expectation must be found. Let $x \equiv W_t - W_s$.

$$\begin{aligned} \mathbb{E}\left[e^{W_t - W_s} | \mathcal{F}_s\right] &= \frac{1}{\sqrt{2 \pi (t - s)}} \int_\mathbb{R} e^x e^{-\frac{x^2}{2 (t - s)}} \, dx \\ \mathbb{E}\left[(W_t - W_s) e^{W_t - W_s} | \mathcal{F}_s\right] &= \frac{1}{\sqrt{2 \pi (t - s)}} \int_\mathbb{R} x e^x e^{-\frac{x^2}{2 (t - s)}} \, dx \end{aligned}$$

It is trivial to show that

$$\begin{aligned} \mathbb{E}\left[e^{W_t - W_s} | \mathcal{F}_s\right] &= e^{(t - s)/2} \\ \mathbb{E}\left[(W_t - W_s) e^{W_t - W_s} | \mathcal{F}_s\right] &= (t - s) e^{(t - s)/2} \end{aligned}$$
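
For the skeptical reader, both results follow from completing the square in the exponent (the familiar Gaussian moment generating function computation). For the first integral,

$$x - \frac{x^2}{2(t - s)} = \frac{t - s}{2} - \frac{\left(x - (t - s)\right)^2}{2(t - s)}$$

so the integrand is $e^{(t - s)/2}$ times a normal density with mean and variance $t - s$, which integrates to 1. The second expectation is the mean of that shifted Gaussian times the same factor, giving $(t - s) \, e^{(t - s)/2}$.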

Finally,

$$\begin{aligned} \mathbb{E}\left[Y_t | \mathcal{F}_s\right] &= e^{kt} \left( -t e^{(t - s)/2} e^{W_s} + e^{W_s} (t - s) e^{(t - s)/2} + W_s e^{W_s} e^{(t - s)/2} \right) \\ &= e^{kt} e^{(t - s)/2} e^{W_s} \left( -t + (t - s) + W_s \right) \\ &= e^{kt - ks} e^{(t - s)/2} (W_s - s) e^{W_s + k s} \\ &= e^{\left(k + \frac{1}{2}\right) (t - s)} Y_s \end{aligned}$$

So $Y_t$ is not a martingale for any $k \ne -1/2$.

This bears a striking resemblance to the result for $\mathbb{E}\left[X_t | \mathcal{F}_s\right]$. So why does the martingale detection method work for $X_t = e^{\alpha t} \cos W_t$, but not for $Y_t = (W_t - t) e^{W_t + k t}$?

So, what went wrong?

The Monte Carlo techniques used by the Naive Martingale Detector work by path averaging, i.e.

$$\bar{Y}_t = \frac{1}{n} \sum_{i=1}^n Y_t^{(i)} \xrightarrow{P} \mathbb{E}\left[Y_t\right]$$

where convergence in probability is guaranteed by the Weak Law of Large Numbers.

Spot the mistake? $\bar{Y}_t$ converges on the unconditional expectation of $Y_t$, whereas the martingale property is defined by the expectation of $Y_t$ conditional upon the filtration $\{\mathcal{F}_s\}_{0 \le s \le t}$!

For stochastic processes whose expectation conditioned on $\mathcal{F}_0$ is always the static value $Y_0$, the Naive Martingale Detector will "report" a Martingale regardless of any conditional drift — a false positive!

The originally proposed null hypothesis is invalid because it constrains only the expectation conditioned on $\mathcal{F}_0$ and on no later information. It is not equivalent to the Martingale drift property.

Hopefully it’s clear now why the Naive Martingale Detector failed or succeeded for the processes discussed earlier.

  • The process $X_t = e^{\alpha t} \cos W_t$ has unconditional expectation $\mathbb{E}\left[X_t\right] = e^{\left(\alpha - \frac{1}{2}\right) t}$.
    NMD correctly identified the drift.

  • The process $Y_t = (W_t - t) e^{W_t + k t}$ has unconditional expectation $\mathbb{E}\left[Y_t\right] = 0 \; \forall t$.
    NMD false negative.

  • The process $Z_t = \frac{W_t^2}{t}$ has unconditional expectation $\mathbb{E}\left[Z_t\right] = 1 \; \forall t$.
    NMD false negative.
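
To make the failure concrete, here is a small numeric illustration (my own, separate from the detector code) for $Y_t = (W_t - t) e^{W_t + 2t}$: the path average hovers near zero at every $t$, while conditioning on information at an intermediate time $s$ exposes the drift factor $e^{\left(k + \frac{1}{2}\right)(t - s)}$ derived above.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps, dt = 50_000, 200, 5e-3
t = np.arange(1, n_steps + 1) * dt
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
Y = (W - t) * np.exp(W + 2 * t)  # k = 2: NOT a Martingale

# Unconditional mean stays near zero, so the naive z-test sees nothing.
print(np.abs(Y.mean(axis=0)).max())  # small relative to Y's spread

# Conditioning reveals the drift. By the tower property,
# E[Y_T | Y_s > x] = e^{(k + 1/2)(T - s)} * E[Y_s | Y_s > x].
s, T_idx = n_steps // 2 - 1, n_steps - 1
mask = Y[:, s] > np.quantile(Y[:, s], 0.75)   # upper-quartile filter
print(Y[mask, T_idx].mean() / Y[mask, s].mean())  # Monte Carlo ratio
print(np.exp(2.5 * (t[T_idx] - t[s])))            # theoretical ratio
```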

Fixing the Naive Martingale Detector

There are many ways to fix the Naive Martingale Detector; here are two. The first is my solution and admittedly a bit hacky; the second is the test used in most papers on this topic.

Solution 1: Mind the filtration

This solution is a bit crude but allows us to keep most of the code from the Naive Martingale Detector.

After implementing this, I was pleased to find that a more refined version of this strategy was used successfully in Park and Whang (2005).

Modify the hypothesis:

$$H_0(s, x): \mathbb{E}\left[Y_t - Y_s | Y_s > x\right] = 0 \quad \forall \, s \le t \le T$$

for some $0 \le s \le T$ and some $x \in \mathbb{R}$.

The conditioning event is the key here; there may be many expressions which work. I found a simple threshold filter to yield good results.

The best results are obtained by setting test parameters $s$ and $x$ manually for each process; luckily, this is not hard to do. A filter at timestep $\left\lfloor \frac{T}{4 \delta t} \right\rfloor$ (about 25% along the $t$ axis) works for all processes, but doesn’t always lead to good visuals. At first, I somewhat arbitrarily used $x = 2 \hat{\sigma}(s)$ but switched to filtering the upper quartile to accommodate processes with highly clustered distributions.
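
In code, the filtered test is a small modification of the naive detector. Here is a minimal sketch (again with illustrative names of my own) which operates on a path array `Y` generated as before:

```python
import numpy as np
from scipy import stats

def filtered_drift_test(Y, alpha=0.05):
    """Test H0(s, x): E[Y_t - Y_s | Y_s > x] = 0 for s < t <= T.

    Y has shape (n_paths, n_steps + 1); s is fixed ~25% along the
    horizon and x at the upper quartile of Y_s, as described above.
    """
    n_steps = Y.shape[1] - 1
    s = n_steps // 4
    x = np.quantile(Y[:, s], 0.75)        # upper-quartile threshold
    kept = Y[Y[:, s] > x]                 # keep only paths with Y_s > x
    diffs = kept[:, s + 1:] - kept[:, [s]]
    z = diffs.mean(axis=0) / (diffs.std(axis=0, ddof=1) / np.sqrt(len(kept)))
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return np.abs(z) > z_crit             # breaches of H0(s, x)
```

Run against $Y_t = (W_t - t) e^{W_t + 2t}$, this version flags the drift that the unconditional test missed.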

Here, we revisit the false negatives from above, this time armed with the new detector.

Figure 4: Corrected Detector with unconditional expectation (red) and filtered expectation (orange).

Success! Now, we must ensure there is no regression for the first examples in this essay.

Figure 5: Qualitatively, these Martingales exhibit much less drift after filtering when compared to their non-Martingale cousins above.

Unfortunately, we have indeed regressed here, erroneously rejecting the null hypothesis for the process $Y_t = e^{t/2} \cos W_t$. On the bright side, it’s not due to a flaw in logic, but it does expose a weakness of this approach. I believe sampling error is the culprit in this instance, since we filter out 75% of the samples before testing. It’s trivial to compensate for this with a larger initial sample size.

I am confident that one could construct a clever process which could systematically break this revised procedure. If you can find one, I’d love to hear about it!

Solution 2: Autoregressive model

Up to this point, every solution discussed (valid or otherwise) just throws CPU power at the problem quite brutishly. This approach is often tempting (and incredibly powerful), especially when time is of the essence.

But I would be doing the reader a great disservice if I did not mention the right way to solve this problem. This approach is not my own; full credit goes to the many people who have contributed to its development.

Instead of analyzing cross-sections of samples, model the sequence and test for the strength of the drift term in an AR(p) model. For AR(1), model

$$X_t = \mu + \theta X_{t-1} + \varepsilon_t$$

To make this useful, assume a unit root ($\theta = 1$) so that

$$X_t = \mu t + \sum_{s=1}^t \varepsilon_s + X_0$$

Thus, we have our Martingale null hypothesis:

$$H_0 : \mathbb{E}\left[\Delta X_t - \mu \;\big\vert\; \mathcal{F}_{t-1}\right] = 0$$

Clever indeed! For more, see Phillips and Jin (2014).
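
A bare-bones rendition of this idea is a Dickey-Fuller-style regression; the sketch below is my own simplification, not the estimator from the paper. Regress $\Delta X_t$ on a constant and $X_{t-1}$, and test whether the slope is zero (i.e. $\theta = 1$ in the AR(1) form):

```python
import numpy as np

def unit_root_t_stat(x):
    """Regress dX_t = a + b * X_{t-1} + e_t and return the t-statistic
    for b. Under the Martingale null b = 0, but note that the statistic
    follows the Dickey-Fuller distribution (5% critical value around
    -2.86 for this specification), not the standard normal."""
    dx, lag = np.diff(x), x[:-1]
    A = np.column_stack([np.ones_like(lag), lag])  # [constant, X_{t-1}]
    beta, *_ = np.linalg.lstsq(A, dx, rcond=None)
    resid = dx - A @ beta
    s2 = resid @ resid / (len(dx) - A.shape[1])    # residual variance
    cov = s2 * np.linalg.inv(A.T @ A)              # OLS covariance
    return beta[1] / np.sqrt(cov[1, 1])

# A driftless random walk should (usually) fail to reject the null.
rng = np.random.default_rng(1)
print(unit_root_t_stat(np.cumsum(rng.normal(size=5_000))))
```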

Final thoughts

I’d like to document some of my takeaways here, written as advice to my future self.

Just because it works in practice, doesn’t mean it works in theory!

Wikipedia lists the 4 defining properties of the Wiener process $W_t$. I designed the path generation procedure described in this essay to preserve properties 1, 3, and 4 (obviously sacrificing continuity). The path generation itself worked well.

However, the mere existence of a functioning discretization mechanism should not lead you to assume that some continuous analog of that mechanism even exists, much less is the fundamental engine of the object of study!

For a while, I operated with the mental model that a realization of the Wiener process was constructed by some mysterious mathematical supercomputer running my same algorithm but with an uncountably infinite array of $dW_t$. But of course, this is the wrong mathematical framing! Instead, entire paths (on some interval) are indexed by samples $\omega \in \Omega$, and we observe the properties of these objects through the $t$-indexed lenses provided by a filtration (a proxy for real-world ignorance) on the underlying probability space.

Notice that $dW_t$ appears nowhere in the 4 defining properties of the Wiener process. $dW_t$ really comes from Itô calculus [7], when Itô generalized the Riemann-Stieltjes integral to stochastic settings in the 1940s, two decades after Wiener’s formalization.

More lessons: lightning round

  1. There is a place for rigor when designing simulations, and it can be rewarding to nail down seemingly abstract mathematical objects.

  2. When designing and debugging simulations, be explicit about your assumptions; own and challenge them. Understand what you lose when discretizing.

  3. Use caution when switching between mathematical contexts. Brownian motion and its cousins are interesting objects of study because they are fundamentally random and fundamentally continuous.

    Simulations always make compromises. But simulations like this have additional (sometimes quite subtle) sources of potential error (e.g. discretization error, sampling error, even accumulating floating point errors).

  4. Quality testing of code is critical; focus on edge cases and combinations of edge cases.

References

Park, Joon Y., and Yoon-Jae Whang. 2005. “A Test of the Martingale Hypothesis.” Studies in Nonlinear Dynamics & Econometrics 9 (2): 1–32. https://doi.org/10.2202/1558-3708.1163.

Phillips, Peter C. B., and Sainan Jin. 2014. “Testing the Martingale Hypothesis.” Journal of Business & Economic Statistics 32 (4): 537–54. https://doi.org/10.1080/07350015.2014.908780.


  1. For example, hide drift at some irrational number $t$ that the discretization $i \, \delta t$ will miss.
  2. but Martingale Detector is better for branding!
  3. In order to simplify the notation throughout, I often switch between continuous-time and discrete-time notation without much justification. I’ll hand-wave here and appeal to the strong convergence of order $p = 1/2$ of Euler-Maruyama; discretization does not play a major role in the arguments herein.
  4. Of course, this need not be a Gaussian so long as the first and second moments are correct!
  5. I’m sure there is some kind of fancy aggregation statistic or early-stopping method which is more optimal, but this worked fine for a first attempt.
  6. I’ve added constants ($\alpha$ and $k$) to $X_t$ and $Y_t$ to make the math more general. As you will see, only particular values of these constants make the processes Martingales, and those values were not used above.
  7. I’m not sure if stochastic increments appeared before this; the point is that Wiener’s original model did not use them.

© 2018-2024 Brendan Schlaman. All rights reserved.