This document walks through the process of optimizing the hedge ratio among three to five equity instruments.
To do this, I first analyzed an array of ETFs from different sectors and determined the five with the smallest covariances among the group.
Using these five ETFs, I then built portfolios with varying equity weights to look for the optimal ratio between the instruments. The optimal ratio is determined by minimizing the portfolio variance, based on the work found here and here. This is done three times, each pass with increasing precision, using step sizes of 0.1, 0.025, and 0.005 (see below for further explanation).
Throughout this document, the Yahoo! Finance API and the pandas library are used to conduct the research.
import yahoo_finance as yf
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
To start, I set some basic variables. Performance is analyzed over the past 2.5 years using a portfolio size of $100,000 (note: the optimal weights found in this document may not hold for all portfolio sizes).
Some top-performing ETFs from six different sectors were compiled in order to find the five best instruments to use. Given differing sector performance, the hope was to reduce volatility by including ETFs from counter-cyclical sectors. The ETFs were drawn from various articles, like this one for consumer retail and this one for industrials.
start_date = '2013-01-02'
end_date = '2015-06-23'
p_size = 100000
consumer_etfs = ['XRT', 'XLY']
financial_etfs = ['VFH', 'KRE']
healthcare_etfs = ['IHF', 'XLV']
industrial_etfs = ['XLI', 'VIS']
technology_etfs = ['XLK', 'VGT']
utility_etfs = ['XLU', 'VPU']
all_equities = consumer_etfs + financial_etfs + healthcare_etfs + industrial_etfs + technology_etfs + utility_etfs
To analyze these equities, each instrument's pricing data was pulled in using the Yahoo Finance API. From there, the data was piped into a pandas DataFrame and cleaned for later analysis.
I also pulled in data for the S&P 500, to be used for performance metrics (beta, etc.) and for comparison against the final optimized portfolio.
# given a df piped directly from the Yahoo Finance API, clean it up for further analysis
def clean_historical_df(df):
    # check if df has already been cleaned
    if 'Date' not in df.columns:
        return df
    # convert dates to datetime format and make the date column the df index
    df.index = [pd.to_datetime(date) for date in df['Date']]
    df.drop('Date', axis=1, inplace=True)
    del df.index.name
    # convert data from string to int / float types
    df['Volume'] = df['Volume'].astype(int)
    for column in ['Adj_Close', 'Close', 'High', 'Low', 'Open']:
        df[column] = df[column].astype(float)
    return df
# pull in data from Yahoo Finance for each equity in the array and return a pandas df of prices
def get_equity_prices(equities, start_date, end_date):
    df = pd.DataFrame(index = pd.date_range(start_date, end_date))
    # for each equity, pull in all historical data and store the closing price in df
    for equity in equities:
        e_data = clean_historical_df(pd.DataFrame(yf.Share(equity).get_historical(start_date, end_date)))
        df[equity] = pd.Series(e_data['Close'], index = df.index)
    return df.dropna()
prices = get_equity_prices(all_equities, start_date, end_date)
# plot price data (simply as a check for any strange behavior like stock splits)
f = plt.figure()
f.suptitle('Equity Prices - ' + ', '.join(all_equities), fontsize=12)
f.set_figwidth(10)
f.set_figheight(5)
for equity in prices.columns:
    plt.plot(prices.index, prices[equity], '-', label=equity)
plt.legend(loc=2)
# pull in historical S&P-500 data (for index reference)
SPX = clean_historical_df(pd.DataFrame(yf.Share('^GSPC').get_historical(start_date, end_date)))
SPX.sort_index(ascending=True, inplace=True)
SPX['Return'] = SPX['Close'].pct_change()
I generated the covariance matrix for all of the equities to look for the five ETFs with the smallest covariances between them. These instruments should yield the lowest-volatility portfolio.
cov_mtx = prices.pct_change().cov()
cov_mtx
equities = cov_mtx.mean().order()[:5].index.tolist()  # take the five ETFs with the smallest mean covariances
equities
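As a quick sanity check on why minimizing covariance matters: for a weight vector w, the portfolio variance is w'Σw, where Σ is the covariance matrix of daily returns. A minimal sketch for the five chosen ETFs follows (the equal weights are purely illustrative, not the optimized values found later):
# sanity check: portfolio variance = w' * Sigma * w, with Sigma the covariance
# matrix of daily returns for the five chosen ETFs
sigma = prices[equities].pct_change().cov()
w = np.array([0.2] * 5)  # illustrative equal weights, not the optimized values
print "equal-weight portfolio variance: %.8f" % np.dot(w, np.dot(sigma.values, w))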
With the five ETFs chosen, I analyzed the performance of various portfolios with different weights for each equity.
To do so, each portfolio was built from the given weights and the prices on the start date (note: the portfolio value was only allowed to be equal to or less than the $100,000 of capital). From there, the portfolio's daily values were calculated and used to compute returns. Each portfolio's performance was then stored in a dataframe for later comparison.
# build a portfolio with the maximum number of positions given the weights
def build_portfolio(size, date, prices, equities, weights):
    df = pd.DataFrame(columns = ['price', 'positions'])
    # find the maximum number of positions to hold given weight, price, and portfolio size
    for equity in equities:
        price = prices.loc[date][equity]
        positions = int(size * weights[equities.index(equity)] / price)
        df.loc[equity] = {'price': price, 'positions': positions}
    return df
# generate returns for a portfolio with given positions / prices
def generate_returns(portfolio, prices, start_date, end_date):
    df = pd.DataFrame(index = pd.date_range(start_date, end_date))
    # build the value of the portfolio given equity prices and number of positions
    df['Value'] = 0.0
    start_val = 0
    for e in portfolio.index:
        df['Value'] += prices[e] * portfolio.loc[e]['positions']
        start_val += portfolio.loc[e]['positions'] * portfolio.loc[e]['price']
    # given the portfolio value, calculate daily returns (the first day's return
    # is measured against the portfolio's cost basis)
    df['Return'] = df['Value'].pct_change()
    df.loc[start_date, 'Return'] = (df.loc[start_date]['Value'] - start_val) / start_val
    return df.dropna()
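As a quick usage sketch of the two functions above (assuming the prices DataFrame and chosen equities from earlier; the equal weights here are illustrative only):
# illustrative only: an equal-weight portfolio over the five chosen ETFs
toy_portfolio = build_portfolio(p_size, start_date, prices, equities, [0.2] * 5)
toy_returns = generate_returns(toy_portfolio, prices, start_date, end_date)
print toy_returns['Return'].describe()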
# beta of a portfolio relative to an index: cov(r_p, r_i) / var(r_i)
def get_beta(index_df, equity_df):
    return index_df['Return'].cov(equity_df['Return']) / index_df['Return'].var()

# CAPM alpha: the portfolio's return in excess of what beta predicts
def get_alpha(index_df, portfolio_df, rate):
    p_return = (portfolio_df.iloc[-1]['Value'] - portfolio_df.iloc[0]['Value']) / portfolio_df.iloc[0]['Value']
    i_return = (index_df.iloc[-1]['Close'] - index_df.iloc[0]['Close']) / index_df.iloc[0]['Close']
    return p_return - (rate + get_beta(index_df, portfolio_df) * (i_return - rate))

# Sharpe ratio of daily returns (expects a daily risk-free rate)
def get_sharpe(df, riskfree):
    excess_return = df['Return'] - riskfree
    return excess_return.mean() / excess_return.std()
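The hard-coded daily rate used later (0.000133681) follows from the 5% annual risk-free return mentioned in the comments; a minimal sketch of the conversion, assuming daily compounding over 365 days:
# convert the 5% annual risk-free rate to a daily rate via compounding:
# (1 + 0.05) ** (1 / 365.0) - 1 ~= 0.000133681
annual_riskfree = 0.05
daily_riskfree = (1 + annual_riskfree) ** (1 / 365.0) - 1
print "daily risk-free rate: %.9f" % daily_riskfree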
### Helper functions used throughout the document ###

# generate a range of percentages with the given start, stop, and step values
# (repeated float addition accumulates small rounding errors, which is why the
# weights are rounded before being stored below)
def drange(start, stop, step):
    l = [start]
    while l[-1] + step <= stop:
        l.append(l[-1] + step)
    return l

# simple helper to generate a "<equity> Weight" column name
def weight_str(e, n=-1):
    if n == -1:
        return str(e) + ' Weight'
    else:
        return str(e[n]) + ' Weight'
def generate_all_portfolios(size, prices, equities, range_data, start_date, end_date, log_progress):
    # build a dataframe with a column for each equity weight plus performance metrics
    df_cols = []
    for equity in equities:
        df_cols.append(weight_str(equity))
    df = pd.DataFrame(columns = df_cols + ['Return', 'Beta', 'Alpha', 'Sharpe', 'Volatility'])
    # loop over the step ranges and, for each valid weight combination, generate a portfolio
    for w1 in drange(range_data['starts'][0], range_data['ends'][0], range_data['step']):
        for w2 in drange(range_data['starts'][1], range_data['ends'][1], range_data['step']):
            for w3 in drange(range_data['starts'][2], range_data['ends'][2], range_data['step']):
                for w4 in drange(range_data['starts'][3], range_data['ends'][3], range_data['step']):
                    if w1 + w2 + w3 + w4 <= 1:
                        w5 = 1 - w1 - w2 - w3 - w4
                        # generate the portfolio and its returns and store them in the dataframe
                        portfolio = build_portfolio(size, start_date, prices, equities, [w1, w2, w3, w4, w5])
                        returns = generate_returns(portfolio, prices, start_date, end_date)
                        df.loc[len(df) + 1] = {weight_str(equities, 0): round(w1, 3),
                                               weight_str(equities, 1): round(w2, 3),
                                               weight_str(equities, 2): round(w3, 3),
                                               weight_str(equities, 3): round(w4, 3),
                                               weight_str(equities, 4): round(w5, 3),
                                               'Return': returns['Return'].mean(),
                                               'Beta': get_beta(SPX, returns),
                                               # using annual risk-free return of 0.05 (daily rate used for Sharpe below)
                                               'Alpha': get_alpha(SPX, returns, 0.05),
                                               'Sharpe': get_sharpe(returns, 0.000133681),
                                               'Volatility': returns['Return'].var()}
                        if log_progress and len(df) % 250 == 0:
                            print "#%d: %f, %f, %f, %f, %f" % (len(df), w1, w2, w3, w4, w5)
    return df
Because it isn't efficient to loop over every single possible weight combination, I perform a coarse-to-fine grid search: I start by looking at weights from 0 to 1 in steps of 0.1, then narrow the ranges and reduce the step size with each subsequent trial.
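To see why the coarse first pass matters, here is a quick back-of-the-envelope count of the grid sizes (count_grid is a hypothetical helper added for illustration; it counts the non-negative lattice points with w1 + w2 + w3 + w4 <= 1):
# the number of valid (w1..w4) combinations on a grid with n = 1/step points
# per weight and w1 + w2 + w3 + w4 <= 1 is C(n + 4, 4)
def count_grid(step):
    n = int(round(1 / step))
    return (n + 4) * (n + 3) * (n + 2) * (n + 1) / 24
print "portfolios at step 0.1:   %d" % count_grid(0.1)    # 1,001
print "portfolios at step 0.005: %d" % count_grid(0.005)  # ~70 million over the full cube
Narrowing the ranges before shrinking the step is what keeps the later trials tractable.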
Portfolios are first built using a search step of 0.1. To find the best combination, I analyzed portfolios containing the three, four, and five best ETFs to see how each combination performed. These portfolios are then plotted to check that their performance traces out the Markowitz curve. The ultimate goal is to find the optimal portfolio weights at the tip of the "bullet".
# Trial A: all five weights free over [0, 1]
range_info = {'step': .1, 'starts': [0,0,0,0], 'ends': [1,1,1,1]}
wdf_A = generate_all_portfolios(p_size, prices, equities, range_info, start_date, end_date, True)
# three-ETF portfolios: the third and fourth weights are pinned at 0 by their ranges
range_info = {'step': .1, 'starts': [0,0,0,0], 'ends': [1,1,0,0]}
wdf_with3 = generate_all_portfolios(p_size, prices, ['XLU', 'VPU', 'XLK', 'XLV', 'IHF'], range_info, start_date, end_date, True)
# four-ETF portfolios: the fourth weight is pinned at 0
range_info = {'step': .1, 'starts': [0,0,0,0], 'ends': [1,1,1,0]}
wdf_with4 = generate_all_portfolios(p_size, prices, ['XLU', 'VPU', 'XLK', 'XLV', 'IHF'], range_info, start_date, end_date, True)
print "~~~ Done ~~~"
# Plot Return vs. Volatility Frontier (showing efficient frontier)
f = plt.figure()
f.suptitle('Markowitz Bullet - ' + ', '.join(equities), fontsize=12)
f.set_figwidth(8)
f.set_figheight(5)
# limits of data to use for axis limits
print "y-limits (return): (%f, %f)" % (wdf_A['Return'].min(), wdf_A['Return'].max())
print "x-limits (vol): (%f, %f)" % (wdf_A['Volatility'].min(), wdf_A['Volatility'].max())
plt.scatter(wdf_A['Volatility'], wdf_A['Return'], label = 'With Top 5')
plt.scatter(wdf_with4['Volatility'], wdf_with4['Return'], c='y', label = 'With Top 4')
plt.scatter(wdf_with3['Volatility'], wdf_with3['Return'], c='r', label = 'With Top 3')
plt.ylabel('Return')
plt.xlabel('Volatility')
plt.ylim(0.00025, 0.00125)
plt.xlim(0.00004, 0.00008)
plt.legend(loc=2)
In the plot above, we can see that the three-ETF portfolios performed significantly worse than the four- and five-ETF portfolios, so we can disregard them. Next is deciding between four and five instruments. Since both have similar volatility but the five-ETF portfolios appear to offer slightly higher returns, I decided to go with a five-ETF portfolio (using the five instruments with the lowest mean covariances).
From here, I use the best weights from Trial A to narrow the ranges and increase the precision in Trial B, now searching with a step of 0.025 (a sketch of how such bounds could be derived appears after the table below).
wdf_A.sort(['Volatility','Sharpe'], ascending=[1,0])[:5]
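The Trial B bounds below were picked by inspecting the top rows above; for illustration, a hypothetical helper like the following sketch could derive similar bounds programmatically (narrow_ranges and its padding rule are my assumptions, not the method actually used here):
# hypothetical helper: bound each free weight by the min / max seen in the top
# rows, padded by one step on each side and clipped to [0, 1]
def narrow_ranges(top_rows, equities, step, new_step):
    starts, ends = [], []
    for e in equities[:4]:  # the fifth weight is implied by the other four
        col = top_rows[weight_str(e)]
        starts.append(max(col.min() - step, 0))
        ends.append(min(col.max() + step, 1))
    return {'step': new_step, 'starts': starts, 'ends': ends}
# e.g. narrow_ranges(wdf_A.sort(['Volatility','Sharpe'], ascending=[1,0])[:5], equities, 0.1, 0.025)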
range_info = {'step': .025, 'starts': [.15,.1,.15,.325], 'ends': [.35,.3,.25,.45]}
wdf_B = generate_all_portfolios(p_size, prices, equities, range_info, start_date, end_date, True)
wdf_B.sort(['Volatility','Sharpe'], ascending=[1,0])[:5]
Using the results from Trial B, I repeat the process used for Trials A and B, now searching with a step of 0.005.
With these newly optimized portfolios, I again plot all of the portfolios, forming the Markowitz bullet, and verify that the optimal weighting sits at the tip of the bullet.
range_info = {'step': .005, 'starts': [.23,.13,.185,.35], 'ends': [.26,.16,.215,.40]}
wdf_C = generate_all_portfolios(p_size, prices, equities, range_info, start_date, end_date, True)
wdf_C.sort(['Volatility','Sharpe'], ascending=[1,0])[:5]
# Second check for markowitz curve
f = plt.figure()
f.suptitle('Markowitz Bullet - ' + ', '.join(equities), fontsize=12)
f.set_figwidth(12)
f.set_figheight(5)
plt.scatter(wdf_A['Volatility'], wdf_A['Return'], c = 'b', label = 'Trial A')
plt.scatter(wdf_B['Volatility'], wdf_B['Return'], c = 'c', label = 'Trial B')
plt.scatter(wdf_C['Volatility'], wdf_C['Return'], c = 'y', label = 'Trial C')
ideal_weight = wdf_C.sort(['Volatility','Sharpe'], ascending=[1,0])[:1]
plt.scatter(ideal_weight['Volatility'], ideal_weight['Return'], c = 'r', s = 50, label = 'Optimized Weight')
plt.ylabel('Return')
plt.xlabel('Volatility')
plt.ylim(0.0005, 0.00085)
plt.xlim(0.000043, 0.0000515)
plt.legend(loc=2)
Since the calculated optimized weighting does sit at the tip of the efficient frontier, it appears that this weighting is the most efficient for the equities used.
Next, I look into how each trial improved the performance of the portfolio by taking the best portfolio from each trial and comparing them. Finally, I see how the optimized weighting compares to the S&P 500, simply to gauge performance.
# extract the list of equity weights from a one-row dataframe of results
def get_weights(row):
    ws = []
    for col in row.columns:
        if " Weight" in col:
            ws.append(row[col].iloc[0])
    return ws
best_A_row = wdf_A.sort(['Volatility','Sharpe'], ascending=[1,0])[:1]
best_B_row = wdf_B.sort(['Volatility','Sharpe'], ascending=[1,0])[:1]
optimal = wdf_C.sort(['Volatility','Sharpe'], ascending=[1,0])[:1]
p = build_portfolio(p_size, start_date, prices, equities, get_weights(optimal))
r = generate_returns(p, prices, start_date, end_date)
spx_row = pd.DataFrame([{'Return': SPX['Return'].mean(),
                         'Beta': get_beta(SPX, SPX),
                         # using annual risk-free return of 0.05 (daily rate used for Sharpe below)
                         'Alpha': get_alpha(SPX, SPX.rename(columns={'Close': 'Value'}), 0.05),
                         'Sharpe': get_sharpe(SPX, 0.000133681),
                         'Volatility': SPX['Return'].var()}])
result = pd.concat([best_A_row, best_B_row, optimal, spx_row])
result.index = ['Trial A', 'Trial B', 'Trial C (Optimized)', 'S&P 500']
result[best_A_row.columns]
f = plt.figure()
f.suptitle('Optimized Weighting - ' + ', '.join(equities), fontsize=12)
f.set_figwidth(15)
f.set_figheight(5)
l = plt.plot(r.index, r['Value'], 'r-', label='Final Optimized Weight')
sp = plt.plot(SPX.index, SPX['Close'] * int(p_size / SPX.iloc[0]['Close']), 'b-', label='S&P 500 (Scaled)')
plt.legend(loc=2)
Considering Modern Portfolio Theory and the efficient frontier (seen above), this final optimized portfolio is the best hedge ratio for the equities used. We can see a similar result by comparing its returns against those of the S&P 500 and Trial A.
The biggest potential flaw in the method used is the selection of the instruments themselves. Picking the ideal equities from which to build the optimal portfolio is a complex question that remains without a solid answer in industry. My aim was to obtain the best portfolio ratio by reducing the portfolio's volatility, and with so many ways to do this, I went with the method that offered the simplest route to a good approximation. Other options include principal component analysis (PCA) on the different ETFs or the use of non-ETF instruments. Given the stated goal, I stand by the decision to choose the five ETFs that were used.
With all of this in mind, the optimized weights (as percentages of the portfolio, using $100,000 in capital) can be found below.
optimal