I was fortunate enough to talk about my latest open source work with Brian Peterson at the R/Finance conference in Chicago a little less than a month ago. It was my first time at the conference, and I will certainly be back. The talks and their presentations are available on the website. With this post, I hope to share the main ideas of my talk.
Back in August 2017 I wrote a post about a new function in the R blotter package called mcsim(). The objective of that post was to introduce the function and explain how and why it may be useful when evaluating quantitative trading strategies. In this article I aim to achieve the same goal for a new function named txnsim(), which has a similar application but more analytical value.
With mcsim() an analyst is able to simulate a trading strategy’s portfolio P&L, which gives some information about the statistical properties of the strategy overall. Simulating portfolio P&L has drawbacks, however; in particular, simulations at the portfolio level may lack transparency or may not line up with historical market regimes. In these instances, sampling round turn trades can be more useful.
With txnsim() the analyst is able to construct a random strategy that preserves as many of the stylized facts (or style) of the observed strategy as possible, while demonstrating no skill. The round turn trades of the random replicate strategies, while outwardly resembling the original strategy in summary time series statistics, are the result of random combinations of observed features taking place at random times in the tested time period. This effectively creates simulated traders with the same style but without skill. For this reason txnsim() is most appropriate for discerning skill vs. luck or overfitting.
Performance Simulations in the Literature
- Pat Burns (2004) covers the use of random portfolios for performance measurement and, in a subsequent 2006 paper, for evaluating trading strategies, which he terms a related but distinct task. He goes on to mention in the latter paper that statistical tests of a signal’s predictiveness were generally possible even in the presence of potential data snooping bias. Things have likely changed in the 12 years since: data snooping has become more prevalent, with more data, significantly more computing power, and the ability to fit an open source model to almost any dataset.
Pat Burns – “If we generate a random subset of the paths, then we can make statistical statements about the quality of the strategy.”
- Jaekle & Tomasini, in their Trading Systems book, analyze trading systems using Monte Carlo analysis of trade P&L. In particular they mention the benefit of estimating a confidence interval for maximum drawdown.
Jaekle & Tomasini – “Changing the order of the performed trades gives you valuable estimations about expected maximum drawdowns.”
- In their 2015 paper The Probability of Backtest Overfitting, Lopez de Prado et al present a method for assessing data snooping as it relates to backtests, which investment firms and portfolio managers use to allocate capital.
Lopez de Prado et al – “…because the signal-to-noise ratio is so weak, often the result of such calibration is that parameters are chosen to profit from past noise rather than future signal.”
With Matt Barry’s consent, we will soon be implementing code from his pbo package in blotter, extending it to fit the general framework of blotter and our ambition of broadening the overfit-detection methods available in quantstrat and blotter.
- Harvey et al, in their series of papers including Backtesting and the Cross-Section of Expected Returns, discuss their general dismay at the reported significance of papers attempting to explain the cross-section of expected returns. They propose a method for deflating the Sharpe ratio to account for the data snooping bias that arises from multiple hypothesis testing.
Harvey et al – “We argue that most claimed research findings in financial economics are likely false.”
We implemented Harvey’s haircut Sharpe ratio model in quantstrat.
What all these methods have in common is an element of random sampling subject to some constraint. What we propose in txnsim() is random sampling of round turn trades, bound by the constraint of the stylized facts of the observed strategy.
Compared with more well-known simulation methods, such as simulating portfolio P&L, Round Turn Trade Simulation has the following benefits:
- Increased transparency, since you can view the simulation detail down to the exact transaction, thereby comparing the original strategy being simulated to random entries and exits with the same overall dynamic.
- More realistic, since you sample from trade durations and quantities actually observed in the strategy, thereby creating a distribution around the trading dynamics, not just the daily P&L.
What all this means, of course, is you are effectively creating simulated traders with the same style but zero skill.
If you consider the stylized facts of a series of transactions that are the output of a discretionary or systematic trading strategy, it should be clear that there is a lot of information available to work with. The stylized facts txnsim() uses for simulating round turns include:
- Round turn trade durations
- Ratio of long:short durations
- Quantity of each round turn trade
- Direction of round turns
- Number of layers entered, limited by maximum position
Using these stylized facts, txnsim() samples either with or without replacement between flat periods, short periods and long periods and then layers onto these periods the sampled quantities from the original strategy with their respective durations.
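txnsim() itself is implemented in R inside blotter; purely to illustrate the sampling step described above, here is a minimal Python sketch (all names are mine, not blotter’s) of drawing observed period durations, with or without replacement, until the backtest’s total duration is covered:

```python
import random

def sample_periods(durations, total, replacement=True, seed=42):
    """Draw observed period durations (flat/long/short) until their sum
    covers the backtest's total duration, truncating the final draw.
    Illustrative only: txnsim() does this in R using the stylized facts
    extracted from the observed strategy."""
    rng = random.Random(seed)
    pool = list(durations)
    sampled, covered = [], 0
    while covered < total:
        if replacement or not pool:
            d = rng.choice(durations)
        else:
            d = pool.pop(rng.randrange(len(pool)))
        d = min(d, total - covered)  # truncate the draw that overshoots
        sampled.append(d)
        covered += d
    return sampled

periods = sample_periods([5, 3, 8, 2, 6], total=20)
assert sum(periods) == 20
```

The sampled quantities from the original strategy are then attached to these periods, period by period, to build a replicate.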
Round Turn Trades
In order to sample round turn trades, the analyst first needs to define what a round turn trade is for their purposes. In txnsim() there is a parameter named tradeDef which can take one of three values: “flat.to.flat”, “flat.to.reduced”, or “increased.to.reduced”. The argument is passed through to the blotter::perTradeStats() function, from which we extract the original strategy’s stylized facts.
For a more comprehensive explanation of the different trade definitions, I will refer you to the help documentation for the perTradeStats() function as well as the documentation for the txnsim() function in blotter.
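To make the simplest definition concrete, here is a small Python sketch (hypothetical names; the real logic lives in blotter::perTradeStats()) that counts flat.to.flat round turns from a series of transaction quantities:

```python
def flat_to_flat_round_turns(txn_qtys):
    """Count 'flat.to.flat' round turns: each excursion from a flat
    (zero) position back to flat is one round turn. The other two
    definitions subdivide such an excursion, either at every position
    reduction ('flat.to.reduced') or by pairing each increase with the
    reduction that offsets it ('increased.to.reduced'); this sketch
    covers only the simplest case."""
    pos, turns, in_trade = 0, 0, False
    for q in txn_qtys:
        pos += q
        if pos != 0:
            in_trade = True
        elif in_trade:  # returned to flat: one round turn closed
            turns += 1
            in_trade = False
    return turns

# buy 100, buy 100, sell 200 -> one flat.to.flat round turn
assert flat_to_flat_round_turns([100, 100, -200]) == 1
# a long excursion then a short excursion -> two round turns
assert flat_to_flat_round_turns([100, -100, -50, 50]) == 2
```

Under flat.to.reduced or increased.to.reduced, the same transactions can yield more round turns, which is why those definitions require the layering logic discussed later.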
The first empirical example we will take a look at is an analysis using txnsim() and the longtrend demo in blotter. My only modification to the demo was to end the strategy in Dec 2017, purely for the purposes of replicating my results.
As we can see in the blue Positionfill window, the strategy only enters into a position once, before exiting.
If we look at how the strategy performed overall (after setting the seed to lucky number 333), relative to its random replicates we can see fairly quickly that it is a difficult strategy to beat.
To observe the stylized facts of the original versus the winning replicate strategy, we can contrast the Positionfills of both.
Replicate number 664 out of 1,000 was the most profitable strategy overall (we will confirm this in a moment), and we get a good sense of how txnsim() honoured the stylized facts of the original strategy when determining the random entries and exits of the ultimately winning replicate.
One of the many slots returned in the txnsim object is named “replicates” and includes the replicate start timestamps, the durations and the corresponding quantities. With the duration information in particular, we are able to chart the distribution of long period durations and flat period durations.
Perhaps not surprisingly, we see the original strategy duration for long and flat periods roughly in the middle of the distributions of the replicates.
Since longtrend was a long only strategy, the flat period distribution is a mirror image of the long period distribution.
Included in the returned list object of class txnsim are ranks and p-values, which summarise the performance of the originally observed strategy versus the random replicates.
As we can see, longtrend was in the 90th percentile for all performance metrics analysed except stddev, where it ranked in the 84th percentile.
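The ranks themselves are simple order statistics. As a minimal Python sketch (not the blotter code) of how a percentile rank for one metric can be computed against the replicates:

```python
def percentile_rank(original, replicate_values):
    """Fraction of replicate values the original strategy beats. A rank
    near 1 suggests the observed performance is hard to attribute to a
    'same style, zero skill' random trader; a one-sided p-value can then
    be derived from the rank (exact conventions vary). Illustrative only."""
    beaten = sum(1 for v in replicate_values if v < original)
    return beaten / len(replicate_values)

reps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
assert percentile_rank(0.85, reps) == 8 / 9
```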
Using the hist method for objects of type txnsim we can plot any of the performance metric distributions to gauge how the observed strategy fared overall. Looking at the Sharpe ratio, we see graphically how longtrend does relative to the replicates as well as relative to configurable confidence intervals.
Maximum drawdown is another one of the performance metrics used and generally a favorite for simulations. We can see again, visually, that longtrend outperforms most random replicates on this measure.
Layers and Long/Short strategies with ‘bbands’
For any round turn trade methodology which is not measuring round turns as flat.to.flat, things get more complicated.
The first major complication with any trade that layers into a position is that the sum of trade durations will be longer than the market data. The general pattern of the solution is to sample as usual, up to a duration equal to that of the first layer of the strategy; in essence, we sample as if round turns were defined as “flat.to.flat”. Any sampled durations beyond this first layer are overlapped onto it. The number of layers is determined by how many times the first layer’s total duration divides into the total trade duration. In this way the total number of layers and their durations are directly related to the original strategy.
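The layer-count arithmetic is simple enough to show directly; a Python sketch (hypothetical names, not blotter code) under the assumption that durations are measured in bars:

```python
def layer_count(total_trade_duration, first_layer_duration):
    """Number of layers implied by the observed strategy: how many times
    the first layer's total duration divides into the summed duration of
    all round turns. Layered strategies give a ratio > 1 because their
    per-trade durations overlap in calendar time. Sketch only."""
    return max(1, round(total_trade_duration / first_layer_duration))

# e.g. 360 bars of summed round turn duration over a 120-bar first layer
assert layer_count(360, 120) == 3
# a flat.to.flat strategy never sums past its first layer
assert layer_count(100, 120) == 1
```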
The next complication is maximum position. A strategy may or may not utilize position limits, but this is irrelevant: we have no idea which parameters are used within a strategy, only what is observable ex post. For this reason we store the maximum long and short positions observed as a stylized fact. To ensure we do not breach these observed maxima during layering, we keep track of the running sum of each layered long and short trade.
For any trade definition other than flat.to.flat, we also need to be cognizant of flat periods when layering, to ensure we do not layer into an otherwise sampled flat period. For this reason we match, in every replicate, the total duration of flat periods in the original strategy. To complete the first layer with long and short periods, we sample these separately and truncate whichever sampled long or short duration takes us over our target. When determining the target total long and short durations to sample to, we use the ratio of long to short periods from the original strategy to decide the direction of non-flat periods.
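Putting the max-position and flat-period constraints together, a Python sketch (hypothetical and bar-indexed for simplicity; the blotter implementation works on timestamps) of the check applied before overlaying a sampled trade onto the first layer might look like:

```python
def can_place(start, dur, qty, pos_path, max_pos):
    """Check a candidate layered trade against two constraints observed
    ex post in the original strategy: never exceed the maximum observed
    position, and never layer into a period sampled as flat (zero
    position). `pos_path` is the replicate's running position per bar."""
    window = pos_path[start:start + dur]
    if len(window) < dur:
        return False                    # trade would run off the data
    for p in window:
        if p == 0:                      # flat period: no layering allowed
            return False
        if abs(p + qty) > max_pos:      # would breach observed max position
            return False
    return True

def apply_trade(start, dur, qty, pos_path):
    """Overlay an accepted trade onto the running position path."""
    for i in range(start, start + dur):
        pos_path[i] += qty

# first layer: long 100 for bars 0-9, flat for bars 10-14
path = [100] * 10 + [0] * 5
assert can_place(2, 5, 100, path, max_pos=300)      # fits inside the long period
assert not can_place(8, 5, 100, path, max_pos=300)  # spills into the flat period
assert not can_place(0, 5, 300, path, max_pos=300)  # breaches max position
apply_trade(2, 5, 100, path)
assert path[2] == 200
```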
To highlight the ability of txnsim() to capture the stylized facts of more comprehensive strategies, including long/short strategies with layering, we use a variation of the ‘bbands’ strategy. Since we apply a sub-optimal position-sizing adjustment to the original demo strategy in order to illustrate layering, we do not expect the strategy to outperform the majority of its random counterparts.
A quick look at the chart.Posn() output of bbands should highlight the difference in characteristics between longtrend and bbands.
We run 1,000 replicates, and the resulting equity curves confirm our suspicions: this version of ‘bbands’ has a lower probability of outperforming its random replicates. In fact, there are periods during the backtest where it severely underperforms the random agents. Something I touch on in the future work section is adding something similar to Burns’ non-overlapping periodic p-values, so we can better visualize how the backtest performed through time.
Taking a closer look at the performance and position-taking of the “winning” random replicate, we get a sense of how the replicate attempts to mirror the original in terms of position sizing and the duration of long versus short positions overall. It should also be evident how the replicate has honoured the maximum long and short positions observed in the original strategy.
Comparing the position fills of the original strategy and the winning replicate more directly we get a better sense of the overall dynamic of both the original and the winning replicate.
What is potentially a red flag in this chart is the difference in padding. The replicate clearly has less padding, meaning the total duration it is in the market is less than the original’s.
When we plot the long and short duration distributions of the replicates and compare them to the original strategy, the magnitude of the discrepancy becomes clear. It is something we hope to resolve soon, so that we can move on to the txnsim vignette and, hopefully, the start of a paper on Round Turn Trade Simulation.
As mentioned previously and in no particular order, future work items will include:
- Refining the layering process to better replicate total trade duration of the original strategy
- Adding p-value visualization through time, similar to Pat Burns’ 10-day non-overlapping p-values
- Adding other simulation methodologies
- Simulation studies of multivariate portfolios bound by capital constraints
- Basing simulations on simulated or resampled market data
- Applying txnsim stylized facts to “OOS” market data
- And of course, a vignette and hopefully a paper on Round Turn Trade Simulation
Round turn trade Monte Carlo simulates random traders who behave in a similar manner to an observed series of real or backtest transactions. We feel that round turn trade simulation offers insights significantly beyond what is available from:
- equity curve Monte Carlo (implemented in blotter in mcsim()),
- simple resampling (e.g. from pbo or boot),
- or the use of simulated input data (which typically fails to recover many important stylized facts of real market data).
Round turn trade Monte Carlo as implemented in txnsim directly analyzes what types of trades and P&L were plausible with a similar trade cadence to the observed series. It acts on the same real market data as the observed trades, efficiently searching the feasible space of possible trades given the stylized facts. It is, in our opinion, a significant contribution for any analyst seeking to evaluate the question of “skill vs. luck” of the observed trades, or for more broadly understanding what is theoretically possible with a certain trading cadence and style.
The source code for my talk and this post is on GitHub.
Thanks to my co-author Brian Peterson!