Creating a Value-Weighted Stock Index¶
This project uses a data set consisting of companies listed on three major stock exchanges. A value-weighted index is created using a selection of these companies.
Ref: https://app.datacamp.com/learn/courses/manipulating-time-series-data-in-python
In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [2]:
os.chdir('/content/drive/MyDrive/Colab Notebooks/Datacamp/Manipulating Time Series')
The data set contains info such as the Company Name, Stock Symbol, Sector, Market Cap, and Last Sale values for each listed company¶
In [3]:
listings = pd.read_csv('listings mod.csv')
In [4]:
listings.set_index('Stock Symbol', inplace=True) # reset index
In [5]:
listings.dropna(subset=['Sector'], inplace=True) # drop rows that have no data for Sector
The components of the index will be the companies we have stock info for (in a separate file): AAPL, AMGN, AMZN, CPRT, EL, GS, ILMN, MA, PAA, RIO, TEF, UPS¶
In [6]:
components = listings.loc[['AAPL', 'AMGN', 'AMZN', 'CPRT', 'EL', 'GS', 'ILMN', 'MA', 'PAA', 'RIO', 'TEF', 'UPS']]
In [7]:
# Pull just the stock symbols
tickers = components.index
In [8]:
# Pull out specific info on the components
component_info = listings.loc[tickers.to_list(), ['Company Name', 'Market Capitalization', 'Last Sale']]
pd.options.display.float_format = '{:,.2f}'.format
These companies are the components of the index¶
In [9]:
component_info
Out[9]:
Company Name | Market Capitalization | Last Sale | |
---|---|---|---|
Stock Symbol | |||
AAPL | Apple Inc. | 740,024.47 | 141.05 |
AMGN | Amgen Inc. | 118,927.21 | 161.61 |
AMZN | Amazon.com, Inc. | 422,138.53 | 884.67 |
CPRT | Copart, Inc. | 13,620.92 | 29.65 |
EL | Estee Lauder Companies, Inc. (The) | 31,122.51 | 84.94 |
GS | Goldman Sachs Group, Inc. (The) | 88,840.59 | 223.32 |
ILMN | Illumina, Inc. | 25,409.38 | 173.68 |
MA | Mastercard Incorporated | 123,330.09 | 111.22 |
PAA | Plains All American Pipeline, L.P. | 22,223.00 | 30.72 |
RIO | Rio Tinto Plc | 70,431.48 | 38.94 |
TEF | Telefonica SA | 54,609.81 | 10.84 |
UPS | United Parcel Service, Inc. | 90,180.89 | 103.74 |
In [10]:
# Get stock price info for the companies of interest
prices = pd.read_csv('stock_data.csv', index_col='Date', parse_dates=True)
The stock price data contains daily prices for each company between 2010 and 2016¶
In [11]:
prices.head()
Out[11]:
AAPL | AMGN | AMZN | CPRT | EL | GS | ILMN | MA | PAA | RIO | TEF | UPS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | ||||||||||||
2010-01-04 | 30.57 | 57.72 | 133.90 | 4.55 | 24.27 | 173.08 | 30.55 | 25.68 | 27.00 | 56.03 | 28.55 | 58.18 |
2010-01-05 | 30.63 | 57.22 | 134.69 | 4.55 | 24.18 | 176.14 | 30.35 | 25.61 | 27.30 | 56.90 | 28.53 | 58.28 |
2010-01-06 | 30.14 | 56.79 | 132.25 | 4.53 | 24.25 | 174.26 | 32.22 | 25.56 | 27.29 | 58.64 | 28.23 | 57.85 |
2010-01-07 | 30.08 | 56.27 | 130.00 | 4.50 | 24.56 | 177.67 | 32.77 | 25.39 | 26.96 | 58.65 | 27.75 | 57.41 |
2010-01-08 | 30.28 | 56.77 | 133.52 | 4.52 | 24.66 | 174.31 | 33.15 | 25.40 | 27.05 | 59.30 | 27.57 | 60.17 |
In [12]:
# Calculate number of shares for each company
num_shares = component_info['Market Capitalization'] / component_info['Last Sale']
In [13]:
# Create a market cap series by multiplying the stock price of each company by the number of shares
market_cap_series = prices.mul(num_shares)
In [14]:
# Sum the market cap values for each row
raw_index = market_cap_series.sum(axis=1)
In [15]:
# Create a normalized index
index = raw_index.div(raw_index.iloc[0]).mul(100)
A plot of the index shows how the aggregated market cap changed over time¶
In [16]:
index.plot(title='Market Cap Weighted Index')
Out[16]:
<Axes: title={'center': 'Market Cap Weighted Index'}, xlabel='Date'>
Total value added¶
In [17]:
# Total value added by all companies over the time period
value_added = raw_index.iloc[-1] - raw_index.iloc[0]
In [18]:
print(f'Total Value Added: {round(value_added, 2)}')
Total Value Added: 880043.98
Contribution of each company¶
In [ ]:
# Contribution of each company
change = pd.concat([market_cap_series.first('D'), market_cap_series.last('D')])
In [21]:
change.diff().iloc[-1].sort_values() # last row of the result is each companies' contribution
Out[21]:
2016-12-30 | |
---|---|
TEF | -97,481.53 |
RIO | -31,779.17 |
PAA | 3,826.81 |
CPRT | 4,272.33 |
ILMN | 14,262.79 |
EL | 19,133.71 |
GS | 26,403.14 |
UPS | 49,080.52 |
AMGN | 65,118.92 |
MA | 86,016.14 |
AMZN | 293,922.79 |
AAPL | 447,267.53 |
In [22]:
# Contribution of each company to the index's return (percentage)
market_cap = component_info['Market Capitalization']
weights = market_cap.div(market_cap.sum())
weights.sort_values().mul(100) # displays the weights
Out[22]:
Market Capitalization | |
---|---|
Stock Symbol | |
CPRT | 0.76 |
PAA | 1.23 |
ILMN | 1.41 |
EL | 1.73 |
TEF | 3.03 |
RIO | 3.91 |
GS | 4.93 |
UPS | 5.01 |
AMGN | 6.60 |
MA | 6.85 |
AMZN | 23.44 |
AAPL | 41.09 |
Return¶
In [23]:
# Total index return
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100
In [24]:
print(f'Total Index Return: {round(index_return, 2)}%')
Total Index Return: 126.66%
In [25]:
# Weighted returns
weighted_returns = weights.mul(index_return)
weighted_returns.sort_values().plot(kind='barh', title='Contribution of Each Component to Total Return')
Out[25]:
<Axes: title={'center': 'Contribution of Each Component to Total Return'}, ylabel='Stock Symbol'>
Evaluate performance compared to the S&P 500¶
In [26]:
perf_data = index.to_frame('Index')
In [27]:
perf_data['SP500'] = pd.read_csv('sp500.csv', index_col='date', parse_dates=True).loc['2010':'2016']
In [28]:
# Normalize the S&P 500 data
perf_data['SP500'] = perf_data['SP500'].div(perf_data['SP500'].iloc[0]).mul(100)
The plot shows the performance of the Index against the S&P 500¶
In [29]:
perf_data.plot(title='Index vs S&P 500')
Out[29]:
<Axes: title={'center': 'Index vs S&P 500'}, xlabel='Date'>
Performance in 30-day periods¶
Allows you to spot the periods where the index out/under-performed the benchmark
In [30]:
def multi_period_return(r):
return (np.prod(r + 1) - 1) * 100
perf_data.pct_change().rolling('30D').apply(multi_period_return).plot()
Out[30]:
<Axes: xlabel='Date'>