My experience in financial markets to date has mostly been in trading and investment banking: from executing index arbitrage and various other strategies, to product managing a team building execution algorithms that automate strategies, minimize market impact and make relative and conditional trading decisions. More recently, following a move from South Africa to Canada, I have had the opportunity to cut my teeth on smart order routing, including latency optimization to minimize latency arbitrage and information leakage. In my spare time I contribute to various open source packages in the R/Finance ecosystem, mostly quantstrat and blotter, which are amazingly powerful packages for building and testing quantitative strategies.

None of the above skills transfer directly to a portfolio optimization problem. So why bother with portfolio optimization from a personal perspective? Well, a few reasons. Firstly, it is an opportunity to learn and use some of the portfolio analytics functionality available as open source in various R/Finance packages. Secondly, it is an opportunity to apply institutional techniques to a personal portfolio. This is important. As someone whose day job does not involve staring at screens of asset prices (those were the days), I simply do not have the time to follow the daily ebb and flow of the markets, nor to pick stocks with much due diligence. Of course, stock picking is a net loser's game for the majority, albeit with entertainment value for those who enjoy the gyrations of their portfolio net equity. The third reason, related to the second, is that I need a process I can apply to my disposable income, and ideally one with less volatility than your average meme stock basket.

Below is a view of a portfolio I started towards the end of last year. Whilst the cumulative time-weighted return is outperforming my rudimentary S&P TSX Composite Index benchmark, that was not the case during part of May and June. Fortunately the portfolio recovered, helped by some speculative plays in BlackBerry and by averaging into some technology names that took an absolute beating despite their prospects being unchanged or even improving with each quarter. Nevertheless, it was a time of introspection and an acknowledgement that I needed a process for at least part of my portfolio: the non-stock-picking allocation. With a diligent portfolio optimization process I should be able to reduce the risk in my portfolio, with relatively infrequent rebalancing requirements, which will help minimize execution fees.

This blog post documents my recently conceived process for future reference, to share with others who may find it valuable and, of course, to receive feedback and/or criticism where the process is flawed or can be improved.

### Investable Universe

The starting point for any portfolio allocation process is to select a range of possible investments. This universe will be largely dependent on your objectives (preserve capital, exceed inflation, outperform the S&P 500 in absolute or risk-adjusted terms, minimize drawdowns, etc.) and constraints (market access, capital available, etc.). For a Global Tactical Asset Allocation (GTAA) approach to selecting your investment universe, there are several starting points you could consider. One of the seminal papers on the topic is Meb Faber's A Quantitative Approach to Tactical Asset Allocation. Meb starts with the following assets:

US Large Cap (S&P 500), Foreign Developed (MSCI EAFE), US 10-Year Government Bonds, Commodities (Goldman Sachs Commodity Index), Real Estate Investment Trusts (NAREIT Index).

In a paper published by the team at ReSolve Asset Management, Adaptive Asset Allocation: A Primer, the authors use 10 asset classes spanning U.S. stocks, European stocks, Japanese stocks, Emerging market stocks, U.S. REITs, International REITs, U.S. 7-10 year Treasuries, U.S. 20+ year Treasuries, Commodities and Gold.

For my purposes, considering this is currently a pet project for a tiny portfolio allocation, I figured I would limit my range of possible investments to the range of iShares ETFs offered by BlackRock and select the top-ranked securities per major asset class using trailing 6m returns. As at 14 July 2021 there were 394 possible ETFs. When downloading historical data for these names (more on the data in a bit) I was unable to download data for 15 of the 394 ETFs. The breakdown of the remaining ETFs by asset class using Net Assets is below.

So we have a healthy mix of Asset Classes represented in the range of iShares ETFs offered by BlackRock. Clearly Net Assets are skewed towards Equity and Fixed Income. This is not surprising. What about the count of ETFs offered per asset class above?

As expected, the number of ETFs across asset classes is slightly less skewed than when looking at Net Assets.

The next question I asked myself was how the absolute value of Net Assets was distributed across each of the iShares ETFs.

The notable outlier at almost 300bn USD in Net Assets is the iShares Core S&P 500 ETF (not observable in the plot, but in an interactive session hovering over the bar with plotly will highlight the name of the ETF). This is not surprising, and highlights why securities can move like they do when it is announced they will be added to the S&P 500 Index. Tesla and Moderna are recent examples.

Ok, with this information it is clear there are many fringe ETFs with low Net Assets. At this point I chose a Net Assets threshold of 5bn USD: any ETF with less than 5bn USD would be automatically excluded from my investment universe. With that filter, the investment universe reduces to 84 ETFs. When deciding on an appropriate USD cutoff, I looked at a scatterplot of Net Assets against Days Since Inception. Clearly, the longer an ETF has been around, the more opportunity it has had to receive investments. Of course, there is also more opportunity for drawdowns and redemptions to reduce Net Assets, but on the whole there was a positive relationship between Net Assets and Days Since Inception. *I excluded ETFs with Net Assets > 50bn USD from this plot, for better visualization.*

There are several ETFs above the threshold which have existed for at least 4-5 years, so I am happy with this rudimentary cutoff. I will revisit it in future iterations of my process. Ultimately it is the liquidity of the underlying ETFs and their constituents that matters most, but for now we will use the Net Assets of each ETF as a proxy.
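The threshold filter itself is trivial; a minimal sketch in Python with made-up screener rows (pandas assumed, column names hypothetical):

```python
import pandas as pd

# Hypothetical extract from the product screener: ticker and Net Assets in bn USD
etfs = pd.DataFrame({
    "ticker": ["IVV", "IAU", "FRINGE1", "TLT"],
    "net_assets_bn": [297.0, 29.0, 0.4, 14.0],
})

# Exclude any ETF with less than 5bn USD in Net Assets
universe = etfs[etfs["net_assets_bn"] >= 5.0].reset_index(drop=True)
```

Applied to the full screener download, this is the step that reduces the 379 ETFs to 84.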

So, with the remaining list of 84 ETFs what is the distribution across Asset Class, Sub Asset Class, Region, Market and Investment Style?

It's clear the remaining 84 ETFs are nicely distributed across Asset Class, Region, Market, etc. The sole Active ETF is the *BlackRock Ultra Short-Term Bond ETF*. At the right price, there is a market for anything.

### Data

Most of the data for the above plots comes from the product screener downloadable from the iShares website. For the portfolio optimization process, which requires daily returns going as far back as possible, I used the IBrokers package in R. With relatively little effort I could configure my local instance of Trader Workstation (TWS) to accept API connections. Once I had signed up for the relevant market data subscriptions with Interactive Brokers (1 USD per month for the Cboe One Add-On Bundle, *waived if monthly commissions reach 5 USD*, and 10 USD per month for the US Securities Snapshot and Futures Value Bundle, *waived if monthly commissions reach 30 USD*) I was able to download historical bar data, as well as real-time quote and bar data, for all the iShares ETFs.

In a previous call to download data for all 394 ETFs, 15 failed, hence my starting point of 379 ETFs. Given the 84 ETFs with Net Assets greater than 5bn USD, my next task was to download monthly bar data from which to compute my momentum filter, using 6m returns. Before doing that, however, I downloaded a snapshot of real-time quotes for all 84 ETFs, just to get an idea of the spreads (in basis points) in the market at the time. That chart is below.

These spreads appear pretty narrow, which bodes well for trading purposes and serves as another proxy for liquidity. A closer look at the tick size of the spread showed that most were as narrow as they could be, given the tick size constraint of one penny.
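For reference, the quoted spread in basis points is just the bid-ask difference scaled by the midpoint; a minimal sketch:

```python
def spread_bps(bid: float, ask: float) -> float:
    """Quoted spread in basis points relative to the midpoint."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid * 1e4

# A one-penny spread on a ~100 USD ETF works out to roughly 1bp
print(round(spread_bps(99.99, 100.00), 2))  # 1.0
```

So for the larger ETFs, the one-penny tick size alone pins the quoted spread to around a basis point.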

### Momentum Filter

After downloading bar data for the remaining 84 ETFs, I was able to compute the 6m return per asset class. I did this somewhat crudely, merely converting the monthly prices to quarterly returns (using `periodReturn` from the `quantmod` package) and summing the last 2 observations. Below are bar plots of the distribution of returns per Asset Class for the remaining 84 ETFs.

Now of course, momentum, both in time series and cross-sectionally, is a well-studied and well-documented factor. My choice of 6m is based on what I have seen in the literature. Future iterations may look at alternative horizons and alternative metrics for computing a momentum ranking. For now it will suffice.
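The crude calculation can be sketched in Python (hypothetical month-end prices; the R version uses `periodReturn`, but the arithmetic is the same):

```python
def six_month_momentum(monthly_prices: list) -> float:
    """Crude 6m momentum: sum of the last two quarterly simple returns.

    Mirrors converting monthly prices to quarterly returns and summing
    the final two observations (summed, not compounded).
    """
    p = monthly_prices
    r_q1 = p[-4] / p[-7] - 1.0  # return over the older quarter (t-6m to t-3m)
    r_q2 = p[-1] / p[-4] - 1.0  # return over the most recent quarter (t-3m to t)
    return r_q1 + r_q2

# Illustrative month-end closes, oldest to newest
mom = six_month_momentum([100, 101, 102, 105, 106, 107, 110])
```

Ranking each ETF on this number within its asset class is all the momentum filter does.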

We notice in the above plots how pervasive Equity ETFs are, followed by Fixed Income ETFs, then 2 Commodity ETFs (Gold and Silver), with the sole Real Estate ETF being the iShares US Real Estate ETF. For future iterations I will need to look at a broader representation of these asset classes, likely by reducing or even eliminating my 5bn USD constraint for starters. Again, for now it will suffice. The goal is to get to portfolio optimization using the Hierarchical Risk Parity technique documented by Marcos Lopez de Prado.

With the above information, we will select a single security from each Asset Class to start: the ETF with the highest returns. In some papers the authors discard the 10th and 90th percentiles in order to minimize potential idiosyncratic risk (thanks Brian for the reminder). For ETFs, this is less of a concern.
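That selection step can be sketched as a simple group-by (illustrative tickers and returns, pandas assumed):

```python
import pandas as pd

# Hypothetical trailing 6m returns for a handful of the 84 ETFs
ranked = pd.DataFrame({
    "ticker": ["IGV", "IVV", "TLT", "IAU", "SLV", "IYR"],
    "asset_class": ["Equity", "Equity", "Fixed Income",
                    "Commodity", "Commodity", "Real Estate"],
    "ret_6m": [0.18, 0.15, -0.02, 0.04, 0.01, 0.12],  # made-up numbers
})

# Keep the highest trailing 6m return within each asset class
top = ranked.loc[ranked.groupby("asset_class")["ret_6m"].idxmax()]
```

With these illustrative numbers the winners happen to be IGV, TLT, IAU and IYR, the four names carried through the rest of the post.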

### Hierarchical Risk Parity

Right, given a list of prospective investments, how do I determine the weights to assign to each? There are many approaches one can take here, from quadratic optimization using a mean-variance approach to global solvers using particle swarm optimization or genetic algorithms. For a thorough implementation with functionality to specify multiple objectives and constraints, using an open source package in the R/Finance ecosystem, you can refer to the introductory vignette for the `PortfolioAnalytics` package.

However, Marcos Lopez de Prado writes about Building Diversified Portfolios that Outperform Out-of-Sample and suggests an approach which is meant to outperform more familiar approaches. At this point I was not going to test different approaches, but rather learn the particular approach suggested by Marcos. In future iterations I will definitely be doing portfolio backtesting and comparisons across multiple portfolio optimization techniques and objectives. In the paper he goes on to illustrate, with an example and code (don't you just love papers that are published with reproducible results and the accompanying code), the idea of a Hierarchical Risk Parity approach to optimizing a given portfolio. Gautier Marti, who now works with Marcos, gives a slightly more refined version in his blog post, with code ready to run inside a Jupyter Notebook. Gautier's code is in Python 3, and Marcos' is in Python 2. In order to get Marcos' code to run in Python 3, you will need to edit a few lines of code: in particular, replace `xrange` with `range` and use `//` instead of a single `/` in the `getRecBipart` function from the paper.
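The reason the `//` edit matters is that Python 3's `/` returns a float, which breaks the list slicing in the paper's bisection step; a minimal sketch of the fixed halving logic (my own simplification, not the paper's verbatim code):

```python
def bisect(items: list) -> tuple:
    """Split an ordered index list into two halves, as in the bisection step."""
    # In Python 2, len(items)/2 was integer division; in Python 3 it yields
    # a float and list slicing raises a TypeError, hence floor division.
    k = len(items) // 2
    return items[:k], items[k:]

print(bisect([3, 0, 2, 1]))  # ([3, 0], [2, 1])
```

An odd-length cluster simply leaves the extra element in the second half.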

Being a stronger R user than a Python user, my next task was to use the code kindly contributed by Ilya Kipnis in his blog post here. Using the Python `generateData` function from the paper, I was only able to reproduce half of the data inside the correlation and covariance matrices, which was kindly disclosed in Ilya's blog post but not included in the paper. I tried in both Python 2 and 3, and neither worked. This was not a showstopper; it just means I will be unable to reproduce the results from the paper exactly. My goal from that point was to get identical results with the correlation and covariance matrices I did have, and to see if I could get similar results using Ilya's code and Marcos' code.

Once I had figured out some Python 2 and Python 3 conflicts, I was getting identical results. Great: this means I have 2 implementations of the HRP approach proposed by Marcos in his paper. From here I needed real data, specifically correlation and covariance matrices for my prospective investments. Using the `IBrokers` package again (shoutout to Jeff Ryan and Josh Ulrich, the author and maintainer of this package respectively), I downloaded daily data over a 10 year period for each of IGV, TLT, IAU and IYR. Since I could only get data as far back as 2016 for all the ETFs, I cut off significant amounts of data for the ETFs which had the entire 10 year history. Again, this is not ideal, but for getting an MVP solution off the ground it will suffice.

Using the `Return.calculate` function from the popular `PerformanceAnalytics` package to compute returns, I then computed the correlation and covariance matrices. From there, I had the option of using either R or Python for computing the necessary weights. One thing to note is that Marcos' code outputs weights in their cluster order, whereas Ilya's outputs them in the order they appear in the correlation and covariance matrices. Using the matrices computed from my daily returns for the 4 ETFs, I was able to replicate the weights proposed by both Marcos' code and Ilya's code.

The theory behind the approach is discussed at length by Marcos and elaborated on by Gautier, and Ilya does a great job of distilling it down to the main components in his blog post. So I will refrain from commenting on the method, except to say that I need to spend some time in the Recursive Bisection function to properly understand the mechanism for assigning weights using a minimum variance and clustered approach.

The weights for my investments IGV, TLT, IAU and IYR respectively were:

```
[1] 0.1043488 0.3841945 0.3761123 0.1353444
```

The code to reproduce these results in Python and R is included below, along with the correlation and covariance matrices (I could only upload them as .xls documents, so you will need to convert them to .csv or edit the code to read .xls files):

```r
# R code
corMat <- read.csv("corMat_23072021.csv")[, -1]
covMat <- read.csv("covMat_23072021.csv")[, -1]

clustOrder <- hclust(dist(corMat), method = 'single')$order
clustOrder

# Plot dendrogram for some visualization of the order
clust <- hclust(dist(corMat), method = 'single')
plot(clust)

getIVP <- function(covMat) {
  # get inverse variance portfolio from diagonal of covariance matrix
  invDiag <- 1/diag(as.matrix(covMat))
  weights <- invDiag/sum(invDiag)
  return(weights)
}

getClusterVar <- function(covMat, cItems) {
  # compute cluster variance from the inverse variance portfolio above
  covMatSlice <- covMat[cItems, cItems]
  weights <- getIVP(covMatSlice)
  cVar <- t(weights) %*% as.matrix(covMatSlice) %*% weights
  return(cVar)
}

getRecBipart <- function(covMat, sortIx) {
  # keep track of the weights vector in the global environment
  assign("w", value = rep(1, ncol(covMat)), envir = .GlobalEnv)
  # run recursion function
  recurFun(covMat, sortIx)
  return(w)
}

recurFun <- function(covMat, sortIx) {
  # get first half of sortIx, which is a cluster order
  subIdx <- 1:trunc(length(sortIx)/2)
  # subdivide ordering into first half and second half
  cItems0 <- sortIx[subIdx]
  cItems1 <- sortIx[-subIdx]
  # compute cluster variances of covariance matrices indexed
  # on first half and second half of ordering
  cVar0 <- getClusterVar(covMat, cItems0)
  cVar1 <- getClusterVar(covMat, cItems1)
  alpha <- 1 - cVar0/(cVar0 + cVar1)
  # update weights outside the function using scoping mechanics
  w[cItems0] <<- w[cItems0] * alpha
  w[cItems1] <<- w[cItems1] * (1 - alpha)
  # re-run the function on a half if its length is greater than 1
  if(length(cItems0) > 1) {
    recurFun(covMat, cItems0)
  }
  if(length(cItems1) > 1) {
    recurFun(covMat, cItems1)
  }
}

out <- getRecBipart(covMat, clustOrder)
out
```
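The R code above follows Ilya's convention of returning weights in original asset order. For completeness, here is a compact Python 3 sketch of the same recursive bisection (my own port, not Marcos' verbatim code; it assumes a NumPy covariance matrix and a precomputed cluster order):

```python
import numpy as np

def get_ivp(cov):
    """Inverse-variance weights from the diagonal of a covariance matrix."""
    ivp = 1.0 / np.diag(cov)
    return ivp / ivp.sum()

def cluster_var(cov, idx):
    """Variance of the inverse-variance portfolio over a cluster slice."""
    sub = cov[np.ix_(idx, idx)]
    w = get_ivp(sub)
    return float(w @ sub @ w)

def rec_bipart(cov, sort_ix):
    """Recursive bisection: repeatedly split the cluster order in half and
    scale each half's weights by the inverse of its relative variance.
    Returns weights in original asset order (Ilya's convention)."""
    w = np.ones(len(sort_ix))
    clusters = [list(sort_ix)]
    while clusters:
        nxt = []
        for c in clusters:
            if len(c) <= 1:
                continue
            k = len(c) // 2
            c0, c1 = c[:k], c[k:]
            v0, v1 = cluster_var(cov, c0), cluster_var(cov, c1)
            alpha = 1.0 - v0 / (v0 + v1)
            w[c0] *= alpha        # lower-variance half gets more weight
            w[c1] *= 1.0 - alpha
            nxt += [c0, c1]
        clusters = nxt
    return w
```

Feeding it the covariance matrix and `hclust` order from the R code should reproduce the same four weights.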

For future iterations I will need to:

1. Understand whether expenses (and distributions, for that matter) are reflected in the prices I'm using for returns, or adjust using expense ratio info

2. Look at multiple metrics for "momentum"

3. Explore different objective functions and constraints

4. Try different rebalancing periods

5. Backtest multiple portfolio optimization techniques

6. Refine how many assets to pick from each asset class, using a top-down approach. I suspect my choice of benchmark will matter here, and that will depend on my ultimate objectives and constraints.

7. Reconsider my 5bn USD threshold; there are better liquidity filters to use without risking the exclusion of otherwise valuable and appropriate ETFs

Thanks for reading!