Creating a Value-Weighted Stock Index¶

This project uses a data set consisting of companies listed on three major stock exchanges. A value-weighted index is created using a selection of these companies.

Ref: https://app.datacamp.com/learn/courses/manipulating-time-series-data-in-python

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [2]:
os.chdir('/content/drive/MyDrive/Colab Notebooks/Datacamp/Manipulating Time Series')
The data set contains info such as the Company Name, Stock Symbol, Sector, Market Cap, and Last Sale values for each listed company¶
In [3]:
listings = pd.read_csv('listings mod.csv')
In [4]:
listings.set_index('Stock Symbol', inplace=True) # reset index
In [5]:
listings.dropna(subset=['Sector'], inplace=True) # drop rows that have no data for Sector
The components of the index will be the companies we have stock info for (in a separate file): AAPL, AMGN, AMZN, CPRT, EL, GS, ILMN, MA, PAA, RIO, TEF, UPS¶
In [6]:
components = listings.loc[['AAPL', 'AMGN', 'AMZN', 'CPRT', 'EL', 'GS', 'ILMN', 'MA', 'PAA', 'RIO', 'TEF', 'UPS']]
In [7]:
# Pull just the stock symbols
tickers = components.index
In [8]:
# Pull out specific info on the components
component_info = listings.loc[tickers.to_list(), ['Company Name', 'Market Capitalization', 'Last Sale']]
pd.options.display.float_format  = '{:,.2f}'.format
These companies are the components of the index¶
In [9]:
component_info
Out[9]:
Company Name Market Capitalization Last Sale
Stock Symbol
AAPL Apple Inc. 740,024.47 141.05
AMGN Amgen Inc. 118,927.21 161.61
AMZN Amazon.com, Inc. 422,138.53 884.67
CPRT Copart, Inc. 13,620.92 29.65
EL Estee Lauder Companies, Inc. (The) 31,122.51 84.94
GS Goldman Sachs Group, Inc. (The) 88,840.59 223.32
ILMN Illumina, Inc. 25,409.38 173.68
MA Mastercard Incorporated 123,330.09 111.22
PAA Plains All American Pipeline, L.P. 22,223.00 30.72
RIO Rio Tinto Plc 70,431.48 38.94
TEF Telefonica SA 54,609.81 10.84
UPS United Parcel Service, Inc. 90,180.89 103.74
In [10]:
# Get stock price info for the companies of interest
prices = pd.read_csv('stock_data.csv', index_col='Date', parse_dates=True)
The stock price data contains daily prices for each company between 2010 and 2016¶
In [11]:
prices.head()
Out[11]:
AAPL AMGN AMZN CPRT EL GS ILMN MA PAA RIO TEF UPS
Date
2010-01-04 30.57 57.72 133.90 4.55 24.27 173.08 30.55 25.68 27.00 56.03 28.55 58.18
2010-01-05 30.63 57.22 134.69 4.55 24.18 176.14 30.35 25.61 27.30 56.90 28.53 58.28
2010-01-06 30.14 56.79 132.25 4.53 24.25 174.26 32.22 25.56 27.29 58.64 28.23 57.85
2010-01-07 30.08 56.27 130.00 4.50 24.56 177.67 32.77 25.39 26.96 58.65 27.75 57.41
2010-01-08 30.28 56.77 133.52 4.52 24.66 174.31 33.15 25.40 27.05 59.30 27.57 60.17
In [12]:
# Calculate number of shares for each company
num_shares = component_info['Market Capitalization'] / component_info['Last Sale']
In [13]:
# Create a market cap series by multiplying the stock price of each company by the number of shares
market_cap_series = prices.mul(num_shares)
In [14]:
# Sum the market cap values for each row
raw_index = market_cap_series.sum(axis=1)
In [15]:
# Create a normalized index
index = raw_index.div(raw_index.iloc[0]).mul(100)
A plot of the index shows how the aggregated market cap changed over time¶
In [16]:
index.plot(title='Market Cap Weighted Index')
Out[16]:
<Axes: title={'center': 'Market Cap Weighted Index'}, xlabel='Date'>
No description has been provided for this image
Total value added¶
In [17]:
# Total value added by all companies over the time period
value_added = raw_index.iloc[-1] - raw_index.iloc[0]
In [18]:
print(f'Total Value Added: {round(value_added, 2)}')
Total Value Added: 880043.98
Contribution of each company¶
In [ ]:
# Contribution of each company
change = pd.concat([market_cap_series.first('D'), market_cap_series.last('D')])
In [21]:
change.diff().iloc[-1].sort_values() # last row of the result is each companies' contribution
Out[21]:
2016-12-30
TEF -97,481.53
RIO -31,779.17
PAA 3,826.81
CPRT 4,272.33
ILMN 14,262.79
EL 19,133.71
GS 26,403.14
UPS 49,080.52
AMGN 65,118.92
MA 86,016.14
AMZN 293,922.79
AAPL 447,267.53

In [22]:
# Contribution of each company to the index's return (percentage)
market_cap = component_info['Market Capitalization']
weights = market_cap.div(market_cap.sum())
weights.sort_values().mul(100) # displays the weights
Out[22]:
Market Capitalization
Stock Symbol
CPRT 0.76
PAA 1.23
ILMN 1.41
EL 1.73
TEF 3.03
RIO 3.91
GS 4.93
UPS 5.01
AMGN 6.60
MA 6.85
AMZN 23.44
AAPL 41.09

Return¶
In [23]:
# Total index return
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100
In [24]:
print(f'Total Index Return: {round(index_return, 2)}%')
Total Index Return: 126.66%
In [25]:
# Weighted returns
weighted_returns = weights.mul(index_return)
weighted_returns.sort_values().plot(kind='barh', title='Contribution of Each Component to Total Return')
Out[25]:
<Axes: title={'center': 'Contribution of Each Component to Total Return'}, ylabel='Stock Symbol'>
No description has been provided for this image
Evaluate performance compared to the S&P 500¶
In [26]:
perf_data = index.to_frame('Index')
In [27]:
perf_data['SP500'] = pd.read_csv('sp500.csv', index_col='date', parse_dates=True).loc['2010':'2016']
In [28]:
# Normalize the S&P 500 data
perf_data['SP500'] = perf_data['SP500'].div(perf_data['SP500'].iloc[0]).mul(100)
The plot shows the performance of the Index against the S&P 500¶
In [29]:
perf_data.plot(title='Index vs S&P 500')
Out[29]:
<Axes: title={'center': 'Index vs S&P 500'}, xlabel='Date'>
No description has been provided for this image
Performance in 30-day periods¶

Allows you to spot the periods where the index out/under-performed the benchmark

In [30]:
def multi_period_return(r):
  return (np.prod(r + 1) - 1) * 100
perf_data.pct_change().rolling('30D').apply(multi_period_return).plot()
Out[30]:
<Axes: xlabel='Date'>
No description has been provided for this image