In this project we analysed the dynamics of financial markets through network analysis. We built networks of the equities in the S&P 500 index over a range of time periods between 2007 and 2017, using the Winner Take All method and the Minimum Spanning Tree method. Both methods use the correlation coefficient computed between attributes of these stocks such as price, volume and returns. Community detection techniques were then applied to the constructed networks, and the resulting communities were compared for consistency with the market sectors identified by Standard Industrial Classification (SIC) codes. We also studied the evolution of the network and its communities over the study period and found interesting behaviours. For each analysis we compared the results from both methods. We created a GEXF file for this dynamic network and visualised it in Gephi, an open source network visualisation tool. The visualisations offer an intuitive way to look at the overall correlation structure of the equities in the S&P 500 and the evolution of these networks over time.
Network analysis of equities is an extensively researched topic, and in section 3 we detail the literature used in this project. The existing studies focus on the properties of the market in a stationary view over a fixed time period. Leveraging the techniques from this literature, we built multiple networks of the stocks in the S&P 500 index over multiple non-overlapping windows of time period (T) between 2007 and 2017, and studied how the network evolves and how the communities in the network behave with the changing dynamics of the market.
Below are the key objectives of the project. For each item we compared the results for the networks built with both methods.
1. Build networks for the stocks in the S&P 500 index based on the correlations between prices/volumes for multiple time periods, using the Winner Take All method and the Minimum Spanning Tree method.
2. Analyse the topology of the networks in multiple time periods. Does the network of stocks exhibit scale-free properties in each time period?
3. Detect communities in these networks and find out whether the stocks actually trade in groups based on the SIC (Standard Industrial Classification) code.
4. Study the evolution of these communities.
5. Find important stocks, and the sectors they belong to, based on the network properties at different time periods.
6. Visualise the network and its dynamic evolution by building dynamic graphs using Gephi.
Current studies of network analysis for the stock market can be classified into the following categories:
(1) Applying network analysis techniques to different markets and analysing the topological characteristics of each market ("Statistical Analysis of Financial Markets"; "Hierarchical structure in financial markets").
(2) Proposing different correlation metrics among various stock markets to suggest different definitions of edges between stocks, and studying the impact of the edge definition on the network ("Network analysis of a financial market based on genuine correlation and threshold method"; "Network of Equities in Financial Markets"; "A network perspective of the stock market").
The approach to constructing the edges of a stock market network is not unique. In the current literature, multiple measures have been investigated to construct the edges between nodes, namely zero-lag correlation, detrended covariance, and time-lag correlations of price changes over a certain period of time.
Studies have covered both emerging and mature markets. The authors claim that understanding the topological properties can help in understanding correlation patterns among stocks, thus providing guidance for risk management. Topological properties often of interest include degree distribution, clustering and component structure. In studies of this subcategory, usually only one correlation measure is proposed to establish the connections between nodes. The introduction section of "Statistical Analysis of Financial Markets" covers a wide range of previous studies in this category.
We collected prices for the stocks that trade on the NASDAQ and NYSE stock exchanges from EOD Data. The data consists of the opening price, closing price and volume for each trading day from 2007 to 2017. From this data we selected only the stocks that are part of the S&P 500. We chose the S&P 500 because the index is a well-balanced portfolio of stocks from different industry segments.
## Read S and P 500 list
import pandas as pd
import numpy as np
dfsp500 = pd.read_csv('data/SANDP500.csv')
companies=dfsp500['Symbol'].tolist()
companies=np.random.choice(companies, size=500, replace=False)
import glob
import os
path = r'data/NASDAQ'
all_files = glob.glob(os.path.join(path, "*.txt"))
df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df_NAS = pd.concat(df_from_each_file, ignore_index=True)
concatenated_df_NAS=concatenated_df_NAS[concatenated_df_NAS['<ticker>'].isin(companies)]
path = r'data/NYSE'
all_files = glob.glob(os.path.join(path, "*.txt"))
df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df_NYS = pd.concat(df_from_each_file, ignore_index=True)
concatenated_df_NYS=concatenated_df_NYS[concatenated_df_NYS['<ticker>'].isin(companies)]
concatenated_df = pd.concat([concatenated_df_NAS,concatenated_df_NYS])
col_p = 'close'  # price column used for the correlation analysis
concatenated_df.columns = ['ticker','date','open','high','low','close','vol']
concatenated_df = concatenated_df[concatenated_df['ticker'].isin(companies)]
concatenated_df = concatenated_df.merge(dfsp500, left_on='ticker', right_on='Symbol')
# Drop exact duplicate rows; copy so the assignments below do not act on a view
df_price = concatenated_df[['ticker','date',col_p]].drop_duplicates(keep='last').copy()
df_price['date'] = pd.to_datetime(df_price['date'], format='%Y%m%d')
df_price.set_index(['date','ticker'], inplace=True)
# Pivot to a wide table: one row per date, one column per ticker
df_price = df_price.unstack()[col_p]
df_price.reset_index(inplace=True)
# Fill gaps: back-fill first, then forward-fill whatever remains at the end
df_price.bfill(inplace=True)
df_price.ffill(inplace=True)
Of the methods in the current literature we chose time-lag correlations of price changes over a certain period of time. One of the key challenges in computing correlations on stock prices is that the values are trending time series, which can lead to spurious correlations if the data is not properly normalised.
A time series of prices can be written as
[P0, P1, P2, ..., PN], or equivalently [P0, P0+R1, P0+R1+R2, ..., P0+R1+...+RN], where Ri = Pi - P(i-1).
Written this way we can see that the first price change R1 contributes to every subsequent entry in the series, whereas the last contributes to only one. This gives the early values more weight in the correlation of prices than they should have.
So for computing the correlation we take the difference between the prices on consecutive days, giving us the return for each day. We computed log returns because they have the key benefit of being additive over multiple time periods.
Although log returns can be computed over longer horizons of 7, 30, 60 or 100 days, for simplicity we kept the return window at 1 day.
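The additivity of log returns can be checked on a few numbers. The sketch below uses made-up prices (the values and variable names are illustrative assumptions, not data from this project) and compares the sum of one-day returns with the return over the whole period.

```python
import math

# Hypothetical closing prices for four consecutive days (illustrative values only)
prices = [100.0, 110.0, 99.0, 120.0]

# One-day log returns: r_i = log(P_i) - log(P_(i-1))
log_returns = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# One-day simple returns: (P_i - P_(i-1)) / P_(i-1)
simple_returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

# Return over the whole period, computed directly from the first and last price
total_log_return = math.log(prices[-1] / prices[0])
total_simple_return = (prices[-1] - prices[0]) / prices[0]

# Log returns telescope, so their sum equals the whole-period log return
print(sum(log_returns), total_log_return)
# Simple returns do not add up to the whole-period simple return
print(sum(simple_returns), total_simple_return)
```

The same property is what would make longer return horizons easy to derive from daily log returns if a different window were wanted.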
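t = 1  # return horizon in trading days
for key in df_price.columns:
    if key not in companies:
        continue  # skip non-ticker columns such as 'date'
    try:
        # Log return over t days: log(P_i) - log(P_(i-t))
        df_price[key] = np.log(df_price[key]) - np.log(df_price[key].shift(t))
    except Exception:
        print(key)  # report tickers whose prices could not be converted
df_price.set_index('date', inplace=True)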
## A quick visualization: detrended data
import matplotlib.pyplot as plt
%matplotlib inline
NUM_COLORS = len(companies)
cm = plt.get_cmap('gist_rainbow')
colors = [cm(i/NUM_COLORS) for i in range(NUM_COLORS)]
fig, ax = plt.subplots(nrows=5, ncols=2, figsize=(20, 20))
y = 2007
for row in ax:
    for col in row:
        # One panel per calendar year, 2007 through 2016
        yfs = str(y) + '0101'
        yfe = str(y) + '1231'
        col.set_ylim([-0.5, 0.5])
        for n, i in enumerate(df_price.columns):
            df_price.loc[yfs:yfe][i].plot(ax=col, color=colors[n])
        y = y + 1
plt.tight_layout()
plt.show()
Next we computed the Pearson correlation between the log returns. The data is divided into windows of width T in order to uncover the dynamic characteristics of the networks. The window width corresponds to the number of daily returns included in the computation of the correlation between stocks. The method of dividing time into windows to construct asset graphs is described in "Asset trees and asset graphs in financial markets".
To determine the ideal window length we computed the mean absolute correlation for window widths of 21, 42, 63, 84 and 105 days and plotted how it varies over time. As seen in the plot, a window of 63 days captures the fluctuations of the market well: shorter windows are too noisy, while longer windows lose sensitivity to changes in the market. From a market perspective, 63 trading days also corresponds roughly to the quarterly reporting cycle of these companies, so we felt it was an appropriate choice.
The correlation matrix is then computed with this window length of 63 by dividing the period between 2007 and 2017 into multiple non-overlapping windows.
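As a minimal sketch of the windowing step, the example below splits a small table of synthetic returns into non-overlapping windows and computes one Pearson correlation matrix per window. The tickers, the random data and the helper name `windowed_correlations` are assumptions made for illustration; only the `DataFrame.corr` call mirrors the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily log returns for three hypothetical tickers (not real market data)
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(12, 3)), columns=["AAA", "BBB", "CCC"])

def windowed_correlations(df, width):
    """Pearson correlation matrix for each non-overlapping window of `width` rows."""
    return {
        start: df.iloc[start:start + width].corr(method="pearson")
        for start in range(0, len(df), width)
    }

corr_by_window = windowed_correlations(returns, width=4)

# Mean absolute correlation per window, the quantity used to compare window widths
mean_abs_corr = {start: float(np.abs(c.values).mean()) for start, c in corr_by_window.items()}
print(sorted(corr_by_window))
print(mean_abs_corr)
```

Keying each matrix by the row offset at which its window starts keeps the windows easy to line up with the time axis later on.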
import matplotlib.pyplot as plt
%matplotlib inline
# Mean absolute correlation per window, for several candidate window widths
for w in range(21,126,21):
    x = []
    y = []
    # Start at row t: the first t rows of log returns are NaN after the shift
    for i in range(t, len(df_price), w):
        corr = df_price.iloc[i:(i+w)].corr(method='pearson').fillna(0)
        x.append(i)
        y.append(np.abs(corr.values).mean())
    plt.plot(x, y)
plt.xlabel('Days')
plt.ylabel('Mean Correlation')
plt.legend(list(range(21,126,21)), loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
# Final correlation matrices: non-overlapping windows of W = 63 trading days,
# keyed by the row offset at which each window starts
W = 63
corr_dict = {}
for i in range(t, len(df_price), W):
    corr_dict[i] = df_price.iloc[i:(i+W)].corr(method='pearson').fillna(0)
The paper "A network perspective of the stock market" details the Winner Take All method, in which an edge is added between two stocks if the correlation coefficient between them is greater than a threshold.
What is the ideal value of the threshold? Since one of the key objectives of the study is to find out whether stocks behave in groups, we wanted to select a threshold that maximises the modularity. However, we were also conscious not to overfit the data, in which case we might lose some of the underlying dynamics at work in the market.
In the next step we built the network for thresholds ranging from 0.6 to 0.99 for each window, and picked the threshold at which the modularity is maximum. We noted that most networks had high modularity for thresholds between 0.75 and 0.85. In certain windows the best modularity occurred at a threshold greater than 0.9, but at that threshold the number of edges was very small, so in such cases we set the threshold to 0.8 and built the network (in the code, any best threshold above 0.8 is capped at 0.8). We treated positive and negative correlations the same and looked at the absolute value.
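The thresholding rule can be illustrated on a small matrix. This is a sketch under stated assumptions: the correlation values, the tickers and the helper `threshold_edges` are invented for the example, and the absolute value is used so that strong negative correlations also create edges.

```python
import numpy as np

# Hypothetical correlation matrix for four tickers (illustrative values only)
tickers = ["AAA", "BBB", "CCC", "DDD"]
corr = np.array([
    [ 1.00,  0.90, -0.85,  0.10],
    [ 0.90,  1.00,  0.30,  0.20],
    [-0.85,  0.30,  1.00,  0.70],
    [ 0.10,  0.20,  0.70,  1.00],
])

def threshold_edges(corr, labels, threshold):
    """Return (i, j, |c_ij|) for every unordered pair whose |correlation| exceeds threshold."""
    edges = []
    n = len(labels)
    for a in range(n):
        for b in range(a + 1, n):  # upper triangle only: skips self-loops and duplicates
            strength = abs(corr[a, b])
            if strength > threshold:
                edges.append((labels[a], labels[b], float(strength)))
    return edges

edges = threshold_edges(corr, tickers, threshold=0.8)
print(edges)
```

Lowering the threshold admits more edges, which is why the choice of threshold controls how dense and how noisy the resulting network is.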
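To make the selection criterion concrete, the sketch below computes modularity for a given partition of a toy graph and then picks a capped threshold from a list of modularity scores. It is a plain-Python stand-in written for illustration: the notebook itself relies on the `community` package, and the function names `modularity` and `pick_threshold`, the toy graph and the scores are all assumptions.

```python
def modularity(edges, partition):
    """Newman modularity Q = sum_c [L_c/m - (d_c/(2m))^2] for an unweighted graph.

    edges: list of (u, v) pairs; partition: dict mapping node -> community id.
    """
    m = len(edges)
    if m == 0:
        return 0.0  # undefined for an empty graph; 0 is used here as a fallback
    internal = {}    # L_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())

def pick_threshold(thresholds, modularities, cap=0.8):
    """Threshold with the highest modularity, capped at `cap`."""
    best = max(range(len(thresholds)), key=lambda k: modularities[k])
    return min(thresholds[best], cap)

# Two triangles joined by a single bridge edge (toy graph)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
good_split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in good_split}

q_good = modularity(edges, good_split)
q_single = modularity(edges, one_group)
print(q_good, q_single)
print(pick_threshold([0.6, 0.7, 0.9], [0.1, 0.3, 0.5]))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the behaviour the threshold search exploits.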
import networkx as nx
import community  # python-louvain package
def get_modularity(y, threshold):
    """Build the thresholded network for window y and return its Louvain modularity."""
    df_price_corr = corr_dict[y]
    elist = []
    outdict = df_price_corr.to_dict()
    for i in outdict.keys():
        for j in outdict[i].keys():
            # Keep each unordered pair once (i < j) when |correlation| exceeds the threshold
            if i < j and abs(outdict[i][j]) > threshold:
                elist.append([i, j, dict(weight=abs(outdict[i][j]), start=y, end=y+W)])
    G = nx.Graph()
    G.add_edges_from(elist)
    partition = community.best_partition(G)
    try:
        m = community.modularity(partition, G)
    except Exception:
        m = 0  # modularity is undefined for a graph with no edges
    return m
# Thresholds to evaluate for every window
fractions = np.linspace(0.6, 0.99, 20)
M_list = {}
for y in corr_dict.keys():
    M_list[y] = [get_modularity(y, frac) for frac in fractions]
The plot below shows the computed modularity against the threshold value for the various windows.
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(20,10))
for y in corr_dict.keys():
    plt.plot(fractions, M_list[y], lw=2)
plt.legend(list(M_list.keys()), loc='center left', bbox_to_anchor=(1, 0.5), ncol=2)
plt.xlabel('Threshold')
plt.ylabel('Modularity')
plt.show()
# Pick value of threshold for each Window (T)
T_val = {}
for y in M_list.keys():
    # Index of the threshold with the highest modularity (ties go to the larger index)
    val, idx = max((val, idx) for (idx, val) in enumerate(M_list[y]))
    # Cap the chosen threshold at 0.8 so the network keeps enough edges
    T_val[y] = min(fractions[idx], 0.8)
# Create Edge List
elist_dict = {}
for y in corr_dict.keys():
    df_price_corr = corr_dict[y]
    threshold = T_val[y]
    elist = []
    outdict = df_price_corr.to_dict()
    for i in outdict.keys():
        for j in outdict[i].keys():
            # Keep each unordered pair once (i < j) when |correlation| exceeds the threshold
            if i < j and abs(outdict[i][j]) > threshold:
                elist.append([i, j, dict(weight=1, start=y, end=y+W-1)])
    elist_dict[y] = elist
# Constructing the graph for the different windows
# Note: this cell uses the networkx 1.x API (G.node, dict-returning nx.degree,
# and set_node_attributes(G, name, values)); networkx 2.x changed these calls.
import networkx as nx
import community
G_dict = {}
for y in elist_dict.keys():
    G = nx.Graph()
    elist = elist_dict[y]
    G.add_edges_from(elist)
    values = dfsp500.set_index('Symbol').to_dict(orient='dict')['Sector']
    for node, value in values.items():
        try:
            G.node[node]['Sector'] = value
        except KeyError:
            pass  # symbol has no edges in this window, so it is not in the graph
    partition = community.best_partition(G)
    deg_cent = dict((k, float(v)) for k, v in nx.degree_centrality(G).items())
    degree = dict((k, float(v)) for k, v in nx.degree(G).items())
    #katz_cent=nx.katz_centrality(G)
    #eigen_cent= dict((k,float(v)) for k,v in nx.eigenvector_centrality(G).items())
    close_cent = dict((k, float(v)) for k, v in nx.closeness_centrality(G).items())
    betw_cent = dict((k, float(v)) for k, v in nx.betweenness_centrality(G).items())
    # Store the community and centrality measures as node attributes
    nx.set_node_attributes(G, "community", partition)
    nx.set_node_attributes(G, "degreecent", deg_cent)
    nx.set_node_attributes(G, "degree", degree)
    #nx.set_node_attributes(G, "katz", katz_cent)
    #nx.set_node_attributes(G, "eigenvector", eigen_cent)
    nx.set_node_attributes(G, "closeness", close_cent)
    nx.set_node_attributes(G, "betweenness", betw_cent)
    # Window boundaries, applied to every node
    nx.set_node_attributes(G, 'start', y)
    nx.set_node_attributes(G, 'end', y+W-1)
    G_dict[y] = G
# Collect the node level attributes for the nodes for all the windows
df_list = []
for k in G_dict.keys():
    G = G_dict[k]
    a = G.node
    df_list.append(pd.DataFrame(a).T.reset_index())
attrib_df = pd.concat(df_list)
attrib_df.fillna(0,inplace=True)
attrib_df1=attrib_df.merge(dfsp500,left_on='index',right_on='Symbol')
attrib_df = attrib_df1[['index','Sector_x','betweenness','closeness','community','degree','degreecent','start','Name']]
attrib_df.columns=['ticker','Sector','Betweeness','Closeness','Community','Degree','DegreeCent','start','Name']
# Collect the graph level attributes for all the windows
from scipy.stats import linregress
G_val_dict = {}
for Y in G_dict.keys():
    G_val = {}
    G = G_dict[Y]
    G_val['nodes'] = int(nx.number_of_nodes(G))
    G_val['edges'] = int(nx.number_of_edges(G))
    #G_val['AvgDegree'] = nx.average_degree(G)
    G_val['AvgClustering'] = nx.average_clustering(G)
    # Path-based metrics are undefined for disconnected graphs; 99999 marks those windows
    try:
        G_val['AvgShortestPathLength'] = nx.average_shortest_path_length(G)
    except Exception:
        G_val['AvgShortestPathLength'] = 99999
    try:
        G_val['Diameter'] = nx.diameter(G)
    except Exception:
        G_val['Diameter'] = 99999
    # Degree distribution: degs[k] = number of nodes with degree k
    degs = {}
    for n in G.nodes():
        deg = G.degree(n)
        degs[deg] = degs.get(deg, 0) + 1
    items = sorted(degs.items())
    # Fit a line to log(count) vs log(degree); its slope estimates the power-law exponent
    xlog = np.array([np.log(k) for (k, v) in items])
    ylog = np.array([np.log(v) for (k, v) in items])
    slope, intercept, rvalue, pvalue, stderr = linregress(xlog, ylog)
    G_val['Slope'] = slope
    # .loc replaces the deprecated .ix for label-based lookup
    G_val['No of Communities'] = attrib_df.groupby(by=['start'])['Community'].nunique().loc[Y]
    G_val_dict[Y] = G_val
Gvaldf = pd.DataFrame(G_val_dict).T
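The table below lists the network properties for the networks created using the Winner Take All method over multiple time periods. A value of 99999 marks windows where the network is disconnected, so the average shortest path length and the diameter are undefined.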
pd.options.display.max_rows = 999
Gvaldf
Window start (day) | AvgClustering | AvgShortestPathLength | Diameter | No of Communities | Slope | edges | nodes |
---|---|---|---|---|---|---|---|
1 | 0.437301 | 99999.000000 | 99999.0 | 17.0 | -0.734717 | 462.0 | 139.0 |
64 | 0.583921 | 99999.000000 | 99999.0 | 19.0 | -0.555489 | 897.0 | 178.0 |
127 | 0.538656 | 99999.000000 | 99999.0 | 16.0 | -1.081694 | 307.0 | 112.0 |
190 | 0.464502 | 99999.000000 | 99999.0 | 17.0 | -0.917059 | 222.0 | 84.0 |
253 | 0.343108 | 99999.000000 | 99999.0 | 21.0 | -1.244365 | 379.0 | 149.0 |
316 | 0.337280 | 99999.000000 | 99999.0 | 20.0 | -1.183432 | 253.0 | 121.0 |
379 | 0.472535 | 99999.000000 | 99999.0 | 22.0 | -0.892980 | 781.0 | 171.0 |
442 | 0.497699 | 99999.000000 | 99999.0 | 11.0 | -0.829045 | 3798.0 | 344.0 |
505 | 0.439454 | 99999.000000 | 99999.0 | 18.0 | -1.018640 | 696.0 | 192.0 |
568 | 0.429914 | 99999.000000 | 99999.0 | 20.0 | -0.967200 | 1108.0 | 215.0 |
631 | 0.397613 | 99999.000000 | 99999.0 | 18.0 | -1.110171 | 303.0 | 120.0 |
694 | 0.489646 | 99999.000000 | 99999.0 | 20.0 | -1.072235 | 266.0 | 110.0 |
757 | 0.351498 | 99999.000000 | 99999.0 | 14.0 | -1.209373 | 142.0 | 83.0 |
820 | 0.378896 | 99999.000000 | 99999.0 | 22.0 | -1.131378 | 470.0 | 173.0 |
883 | 0.618682 | 99999.000000 | 99999.0 | 14.0 | -0.514376 | 6146.0 | 309.0 |
946 | 0.340429 | 99999.000000 | 99999.0 | 18.0 | -1.257725 | 219.0 | 106.0 |
1009 | 0.111712 | 99999.000000 | 99999.0 | 11.0 | -1.735520 | 33.0 | 37.0 |
1072 | 0.128993 | 99999.000000 | 99999.0 | 22.0 | -1.509773 | 59.0 | 64.0 |
1135 | 0.371843 | 99999.000000 | 99999.0 | 22.0 | -1.279903 | 381.0 | 173.0 |
1198 | 0.732433 | 1.884846 | 5.0 | 4.0 | -0.251248 | 22994.0 | 409.0 |
1261 | 0.586980 | 99999.000000 | 99999.0 | 13.0 | -0.751500 | 2997.0 | 271.0 |
1324 | 0.151453 | 99999.000000 | 99999.0 | 17.0 | -1.969121 | 53.0 | 59.0 |
1387 | 0.374351 | 99999.000000 | 99999.0 | 20.0 | -1.112039 | 464.0 | 172.0 |
1450 | 0.123333 | 99999.000000 | 99999.0 | 15.0 | -2.144086 | 39.0 | 50.0 |
1513 | 0.235170 | 99999.000000 | 99999.0 | 22.0 | -1.584047 | 70.0 | 70.0 |
1576 | 0.050292 | 99999.000000 | 99999.0 | 24.0 | -2.596231 | 35.0 | 57.0 |
1639 | 0.369880 | 99999.000000 | 99999.0 | 26.0 | -1.521337 | 301.0 | 165.0 |
1702 | 0.422181 | 99999.000000 | 99999.0 | 16.0 | -1.106065 | 129.0 | 71.0 |
1765 | 0.203037 | 99999.000000 | 99999.0 | 20.0 | -2.030411 | 60.0 | 69.0 |
1828 | 0.260765 | 99999.000000 | 99999.0 | 20.0 | -1.401260 | 93.0 | 72.0 |
1891 | 0.304440 | 99999.000000 | 99999.0 | 22.0 | -1.711807 | 95.0 | 79.0 |
1954 | 0.448140 | 99999.000000 | 99999.0 | 21.0 | -0.996135 | 234.0 | 100.0 |
2017 | 0.344473 | 99999.000000 | 99999.0 | 36.0 | -1.140130 | 514.0 | 206.0 |
2080 | 0.515361 | 99999.000000 | 99999.0 | 32.0 | -0.840034 | 706.0 | 187.0 |
2143 | 0.323645 | 99999.000000 | 99999.0 | 27.0 | -0.978314 | 244.0 | 116.0 |
2206 | 0.537092 | 99999.000000 | 99999.0 | 26.0 | -0.815566 | 3269.0 | 342.0 |
2269 | 0.443097 | 99999.000000 | 99999.0 | 37.0 | -1.063193 | 521.0 | 178.0 |
2332 | 0.301949 | 99999.000000 | 99999.0 | 43.0 | -1.113032 | 557.0 | 223.0 |
2395 | 0.469023 | 99999.000000 | 99999.0 | 31.0 | -0.826418 | 472.0 | 136.0 |
2458 | 0.454574 | 99999.000000 | 99999.0 | 32.0 | -0.914775 | 1348.0 | 246.0 |
2521 | 0.446138 | 99999.000000 | 99999.0 | 37.0 | -1.093756 | 1307.0 | 313.0 |
2584 | 0.470094 | 99999.000000 | 99999.0 | 23.0 | -0.797446 | 306.0 | 111.0 |
2647 | 0.439164 | 99999.000000 | 99999.0 | 28.0 | -0.794508 | 328.0 | 109.0 |
2710 | 0.352363 | 99999.000000 | 99999.0 | 40.0 | -1.442425 | 297.0 | 190.0 |
One drawback of the Winner Take All method is that in certain periods the threshold gave us highly noisy networks: there were too many edges in the graph, and the results for those periods are difficult to visualise. In this section we built networks from the same data using the Minimum Spanning Tree method described in "Network of Equities in Financial Markets".
In the Minimum Spanning Tree method, a distance metric $d_{ij}$ is calculated from the cross-correlation matrix:
$$d_{ij} = \sqrt{2\,(1 - C_{ij})}$$
where $C_{ij}$ is the correlation coefficient between stock $i$ and stock $j$, and $d_{ij}$ is the resulting edge distance between them.
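The distance transform and the tree construction can be sketched end to end on four tickers. The correlations below are invented, and the Kruskal routine is a self-contained stand-in written for this example; the notebook itself delegates the tree construction to `nx.minimum_spanning_tree`.

```python
import math

def corr_to_distance(c):
    """Distance d = sqrt(2 * (1 - c)) for a correlation coefficient c in [-1, 1]."""
    return math.sqrt(2.0 * (1.0 - c))

def kruskal_mst(nodes, weighted_edges):
    """Minimum spanning tree via Kruskal's algorithm with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical pairwise correlations between four tickers (illustrative values only)
correlations = {
    ("AAA", "BBB"): 0.9, ("AAA", "CCC"): 0.5, ("AAA", "DDD"): 0.1,
    ("BBB", "CCC"): 0.6, ("BBB", "DDD"): 0.2, ("CCC", "DDD"): 0.8,
}
nodes = ["AAA", "BBB", "CCC", "DDD"]
edges = [(u, v, corr_to_distance(c)) for (u, v), c in correlations.items()]
mst = kruskal_mst(nodes, edges)
print([(u, v, round(w, 3)) for u, v, w in mst])
```

Perfectly correlated stocks are at distance 0 and perfectly anti-correlated stocks at distance 2, so the tree keeps the most strongly correlated links that connect every stock.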
To find the ideal window size for constructing the network we computed the mean distance for multiple window widths ranging from 21 to 105 days. The plot of the results is below. The results largely follow the same pattern as the mean correlation we saw for the Winner Take All method. Here too the window size of 63 reasonably captures the fluctuations in the market, so we used a window width of 63 to compute the correlation and the corresponding distance metric, and built the networks using the Minimum Spanning Tree method.
import math
import matplotlib.pyplot as plt
%matplotlib inline
def calc_d(x):
    """Convert a correlation coefficient into the distance sqrt(2 * (1 - x))."""
    x = round(x, 3)
    return math.sqrt(2 * (1 - x))
# Mean distance per window, for several candidate window widths
for w in range(21,126,21):
    x = []
    y = []
    for i in range(t, len(df_price), w):
        # Missing correlations are set to 1, i.e. distance 0
        corr = df_price.iloc[i:(i+w)].corr(method='pearson').fillna(1)
        dist = corr.applymap(calc_d)
        x.append(i)
        y.append(dist.values.mean())
    plt.plot(x, y)
plt.xlabel('Days')
plt.ylabel('Mean Distance')
plt.legend(list(range(21,126,21)), loc='center left', bbox_to_anchor=(1, 0.5), ncol=1)
plt.show()
# Final correlation and distance matrices for non-overlapping windows of W = 63 days
W = 63
corr_dict = {}
corr_dist_dict = {}
for i in range(t, len(df_price), W):
    corr_dict[i] = df_price.iloc[i:(i+W)].corr(method='pearson').fillna(1)
    corr_dist_dict[i] = corr_dict[i].applymap(calc_d)
#MST Start
elistmst_dict = {}
for y in corr_dist_dict.keys():
    df_price_corr = corr_dist_dict[y]
    elistmst = []
    outdict = df_price_corr.to_dict()
    for i in outdict.keys():
        for j in outdict[i].keys():
            # One entry per unordered pair (i > j); zero distances are skipped
            if abs(outdict[i][j]) > 0 and i > j:
                elistmst.append([i, j, dict(weight=abs(outdict[i][j]), start=y, end=y+W-1)])
    elistmst_dict[y] = elistmst
# Note: this cell uses the networkx 1.x API (G.node, dict-returning nx.degree,
# and set_node_attributes(G, name, values)); networkx 2.x changed these calls.
import networkx as nx
import community
GMST_dict = {}
for y in elistmst_dict.keys():
    G = nx.Graph()
    elist = elistmst_dict[y]
    G.add_edges_from(elist)
    # Reduce the dense distance graph to its minimum spanning tree
    G = nx.minimum_spanning_tree(G)
    values = dfsp500.set_index('Symbol').to_dict(orient='dict')['Sector']
    for node, value in values.items():
        try:
            G.node[node]['Sector'] = value
        except KeyError:
            pass  # symbol is not a node in this window's graph
    partition = community.best_partition(G)
    deg_cent = dict((k, float(v)) for k, v in nx.degree_centrality(G).items())
    degree = dict((k, float(v)) for k, v in nx.degree(G).items())
    #katz_cent=nx.katz_centrality(G)
    #eigen_cent= dict((k,float(v)) for k,v in nx.eigenvector_centrality(G).items())
    close_cent = dict((k, float(v)) for k, v in nx.closeness_centrality(G).items())
    betw_cent = dict((k, float(v)) for k, v in nx.betweenness_centrality(G).items())
    # Store the community and centrality measures as node attributes
    nx.set_node_attributes(G, "community", partition)
    nx.set_node_attributes(G, "degreecent", deg_cent)
    nx.set_node_attributes(G, "degree", degree)
    #nx.set_node_attributes(G, "katz", katz_cent)
    #nx.set_node_attributes(G, "eigenvector", eigen_cent)
    nx.set_node_attributes(G, "closeness", close_cent)
    nx.set_node_attributes(G, "betweenness", betw_cent)
    nx.set_node_attributes(G, 'start', y)
    nx.set_node_attributes(G, 'end', y+W-1)  # inclusive end, matching the edge attribute
    GMST_dict[y] = G
df_list = []
for k in GMST_dict.keys():
    G = GMST_dict[k]
    a = G.node
    df_list.append(pd.DataFrame(a).T.reset_index())
attribMST_df = pd.concat(df_list)
attribMST_df.fillna(0,inplace=True)
attribMST_df1=attribMST_df.merge(dfsp500,left_on='index',right_on='Symbol')
attribMST_df = attribMST_df1[['index','Sector_x','betweenness','closeness','community','degree','degreecent','start','Name']]
attribMST_df.columns=['ticker','Sector','Betweeness','Closeness','Community','Degree','DegreeCent','start','Name']
from scipy.stats import linregress
G_valMST_dict = {}
for Y in GMST_dict.keys():
    G_val = {}
    G = GMST_dict[Y]
    G_val['nodes'] = nx.number_of_nodes(G)
    G_val['edges'] = nx.number_of_edges(G)
    #G_val['AvgDegree'] = nx.average_degree(G)
    G_val['AvgClustering'] = nx.average_clustering(G)
    # Path-based metrics are undefined for disconnected graphs; 99999 marks those windows
    try:
        G_val['AvgShortestPathLength'] = nx.average_shortest_path_length(G)
    except Exception:
        G_val['AvgShortestPathLength'] = 99999
    try:
        G_val['Diameter'] = nx.diameter(G)
    except Exception:
        G_val['Diameter'] = 99999
    # Degree distribution: degs[k] = number of nodes with degree k
    degs = {}
    for n in G.nodes():
        deg = G.degree(n)
        degs[deg] = degs.get(deg, 0) + 1
    items = sorted(degs.items())
    # Fit a line to log(count) vs log(degree); its slope estimates the power-law exponent
    xlog = np.array([np.log(k) for (k, v) in items])
    ylog = np.array([np.log(v) for (k, v) in items])
    slope, intercept, rvalue, pvalue, stderr = linregress(xlog, ylog)
    G_val['Slope'] = slope
    # .loc replaces the deprecated .ix for label-based lookup
    G_val['No of Communities'] = attribMST_df.groupby(by=['start'])['Community'].nunique().loc[Y]
    G_valMST_dict[Y] = G_val
GMST_df = pd.DataFrame(G_valMST_dict).T
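The table below lists the network properties for the networks created using the Minimum Spanning Tree method over multiple time periods. Because every network here is a tree, the number of edges is always one less than the number of nodes and the average clustering coefficient is 0.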
GMST_df
Window start (day) | AvgClustering | AvgShortestPathLength | Diameter | No of Communities | Slope | edges | nodes |
---|---|---|---|---|---|---|---|
1 | 0.0 | 10.006982 | 25.0 | 23.0 | -2.302370 | 429.0 | 430.0 |
64 | 0.0 | 11.656817 | 30.0 | 24.0 | -2.511209 | 431.0 | 432.0 |
127 | 0.0 | 11.685365 | 25.0 | 24.0 | -2.414082 | 434.0 | 435.0 |
190 | 0.0 | 12.705015 | 33.0 | 23.0 | -2.094925 | 436.0 | 437.0 |
253 | 0.0 | 13.100791 | 30.0 | 24.0 | -2.414212 | 436.0 | 437.0 |
316 | 0.0 | 12.929396 | 34.0 | 25.0 | -2.226890 | 440.0 | 441.0 |
379 | 0.0 | 11.736004 | 37.0 | 23.0 | -2.397362 | 442.0 | 443.0 |
442 | 0.0 | 9.870432 | 24.0 | 23.0 | -2.176792 | 444.0 | 445.0 |
505 | 0.0 | 11.700028 | 29.0 | 26.0 | -2.423807 | 448.0 | 449.0 |
568 | 0.0 | 11.991151 | 28.0 | 25.0 | -2.337993 | 448.0 | 449.0 |
631 | 0.0 | 12.206835 | 32.0 | 25.0 | -2.094288 | 447.0 | 448.0 |
694 | 0.0 | 11.966274 | 28.0 | 23.0 | -2.401395 | 448.0 | 449.0 |
757 | 0.0 | 10.225780 | 26.0 | 25.0 | -2.400149 | 448.0 | 449.0 |
820 | 0.0 | 10.720473 | 25.0 | 22.0 | -2.286885 | 450.0 | 451.0 |
883 | 0.0 | 9.268146 | 24.0 | 25.0 | -2.270480 | 451.0 | 452.0 |
946 | 0.0 | 7.232595 | 17.0 | 24.0 | -2.094576 | 453.0 | 454.0 |
1009 | 0.0 | 11.460324 | 27.0 | 23.0 | -2.404244 | 455.0 | 456.0 |
1072 | 0.0 | 9.898856 | 27.0 | 23.0 | -2.202451 | 457.0 | 458.0 |
1135 | 0.0 | 11.048054 | 26.0 | 24.0 | -2.306812 | 458.0 | 459.0 |
1198 | 0.0 | 10.048669 | 24.0 | 25.0 | -2.201338 | 459.0 | 460.0 |
1261 | 0.0 | 9.375795 | 21.0 | 26.0 | -2.121953 | 468.0 | 469.0 |
1324 | 0.0 | 10.258025 | 25.0 | 22.0 | -2.509338 | 466.0 | 467.0 |
1387 | 0.0 | 10.286589 | 26.0 | 26.0 | -2.438308 | 468.0 | 469.0 |
1450 | 0.0 | 9.377163 | 25.0 | 22.0 | -2.249755 | 469.0 | 470.0 |
1513 | 0.0 | 11.060595 | 28.0 | 24.0 | -2.509590 | 470.0 | 471.0 |
1576 | 0.0 | 9.406258 | 27.0 | 23.0 | -2.332476 | 471.0 | 472.0 |
1639 | 0.0 | 9.620221 | 21.0 | 23.0 | -2.281254 | 475.0 | 476.0 |
1702 | 0.0 | 10.191933 | 27.0 | 27.0 | -2.419526 | 475.0 | 476.0 |
1765 | 0.0 | 9.666815 | 26.0 | 25.0 | -2.572109 | 479.0 | 480.0 |
1828 | 0.0 | 10.022886 | 25.0 | 23.0 | -2.278903 | 479.0 | 480.0 |
1891 | 0.0 | 13.378051 | 41.0 | 27.0 | -2.429328 | 481.0 | 482.0 |
1954 | 0.0 | 12.746888 | 38.0 | 26.0 | -2.497661 | 482.0 | 483.0 |
2017 | 0.0 | 12.801533 | 33.0 | 25.0 | -2.468109 | 483.0 | 484.0 |
2080 | 0.0 | 11.445382 | 30.0 | 26.0 | -2.523927 | 486.0 | 487.0 |
2143 | 0.0 | 11.845514 | 36.0 | 25.0 | -2.380275 | 485.0 | 486.0 |
2206 | 0.0 | 11.853195 | 31.0 | 26.0 | -2.531080 | 491.0 | 492.0 |
2269 | 0.0 | 14.269999 | 38.0 | 25.0 | -2.344246 | 493.0 | 494.0 |
2332 | 0.0 | 11.389670 | 27.0 | 24.0 | -2.401746 | 494.0 | 495.0 |
2395 | 0.0 | 15.049847 | 33.0 | 25.0 | -2.560258 | 496.0 | 497.0 |
2458 | 0.0 | 12.346634 | 32.0 | 26.0 | -2.450851 | 497.0 | 498.0 |
2521 | 0.0 | 16.771448 | 47.0 | 25.0 | -2.231318 | 497.0 | 498.0 |
2584 | 0.0 | 15.656228 | 36.0 | 25.0 | -2.619341 | 498.0 | 499.0 |
2647 | 0.0 | 14.443294 | 41.0 | 25.0 | -2.464404 | 497.0 | 498.0 |
2710 | 0.0 | 14.880367 | 34.0 | 26.0 | -2.511511 | 497.0 | 498.0 |
We plotted the degree distribution as a histogram and also on a log-log plot, and fitted a regression line whose slope gives the power-law exponent. The plots show that the network exhibits scale-free properties in most of the windows. The scale-free nature is more evident in the networks generated with the Minimum Spanning Tree method.
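The slope estimate can be sanity-checked on a degree sequence whose exponent is known in advance. The sketch below is an illustration only: the degree sequence is synthetic, the helper `powerlaw_slope` is a made-up name, and `np.polyfit` is used in place of `linregress` to keep the example dependency-light. A straight-line fit on log-log counts is a rough estimator, so the result is best read as indicative.

```python
from collections import Counter
import numpy as np

def powerlaw_slope(degrees):
    """Slope of the least-squares line through (log k, log count(k))."""
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined, so skip isolated nodes
    ks = np.array(sorted(counts))
    ns = np.array([counts[k] for k in ks])
    slope, intercept = np.polyfit(np.log(ks), np.log(ns), 1)
    return slope

# Synthetic degree sequence whose counts follow count(k) = 1000 * k**-2 exactly
degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope = powerlaw_slope(degrees)
print(round(slope, 6))
```

For this constructed sequence the fitted slope recovers the exponent of -2 up to floating-point error; real degree distributions scatter around the line, especially in the sparse high-degree tail.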
## Explore graph properties: log-log degree distribution (Winner Take All)
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import linregress
fig, ax = plt.subplots(nrows=10, ncols=4, figsize=(20, 30))
Y = 1
for row in ax:
    for col in row:
        # Degree distribution for the window starting on day Y
        degs = {}
        for n in G_dict[Y].nodes():
            deg = G_dict[Y].degree(n)
            degs[deg] = degs.get(deg, 0) + 1
        items = sorted(degs.items())
        xlog = np.array([np.log(k) for (k, v) in items])
        ylog = np.array([np.log(v) for (k, v) in items])
        col.scatter(xlog, ylog)
        slope, intercept, rvalue, pvalue, stderr = linregress(xlog, ylog)
        col.plot(xlog, (slope * xlog + intercept), color='red')
        col.set_title(" Day :" + str(Y) + " - Slope :" + str(round(slope, 2)), fontsize=8)
        Y = Y + W
plt.tight_layout()
plt.show()
## Explore graph properties: degree histogram (Winner Take All)
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=10, ncols=4, figsize=(20, 30))
y = 1
for row in ax:
    for col in row:
        deg_dist = [v for k, v in nx.degree(G_dict[y]).items()]
        deg_dist.sort(reverse=True)
        pdf, bins, patch = col.hist(deg_dist, bins=10)
        col.set_title(" Day :" + str(y), fontsize=8)
        y = y + W
plt.tight_layout()
plt.show()
## Explore graph properties: log-log degree distribution (MST)
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import linregress
fig, ax = plt.subplots(nrows=10, ncols=4, figsize=(20, 30))
Y = 1
for row in ax:
    for col in row:
        # Degree distribution for the window starting on day Y
        degs = {}
        for n in GMST_dict[Y].nodes():
            deg = GMST_dict[Y].degree(n)
            degs[deg] = degs.get(deg, 0) + 1
        items = sorted(degs.items())
        xlog = np.array([np.log(k) for (k, v) in items])
        ylog = np.array([np.log(v) for (k, v) in items])
        col.scatter(xlog, ylog)
        slope, intercept, rvalue, pvalue, stderr = linregress(xlog, ylog)
        col.plot(xlog, (slope * xlog + intercept), color='red')
        col.set_title(" Day :" + str(Y) + " - Slope :" + str(round(slope, 2)), fontsize=8)
        Y = Y + W
plt.tight_layout()
plt.show()
## Explore graph properties: degree histogram (MST)
import matplotlib.pyplot as plt
%matplotlib inline
fig, ax = plt.subplots(nrows=10, ncols=4, figsize=(20, 30))
y = 1
for row in ax:
    for col in row:
        deg_dist = [v for k, v in nx.degree(GMST_dict[y]).items()]
        deg_dist.sort(reverse=True)
        pdf, bins, patch = col.hist(deg_dist, bins=10)
        col.set_title(" Degree Dist Day :" + str(y), fontsize=8)
        y = y + W
plt.tight_layout()
plt.show()
We looked at the change in the average degree over time. For the networks from the Winner Take All method the average degree varies widely over time, indicating the dynamic nature of the stock market. The peaks in the graph correspond well to major events in the market such as the 2008-2009 subprime crisis. For the networks based on the MST method, however, the average degree is constant: a spanning tree on n nodes always has n-1 edges, so its average degree is 2(n-1)/n, which is just under 2. This is one of the key drawbacks of the MST method, where major fluctuations in the market are not well represented in the network. We can also see in the plot that during normal time periods the average degree of the networks from the two methods is about the same.
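The fixed average degree of a tree follows from simple counting, which the short sketch below works through. The helper `average_degree` is a name introduced for illustration; the node and edge counts in the last two lines are taken from the first row of each network property table.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge adds 1 to the degree of two nodes."""
    return 2.0 * num_edges / num_nodes

# A spanning tree on n nodes has exactly n - 1 edges, so its average degree is 2(n-1)/n
for n in (10, 430, 498):
    print(n, average_degree(n, n - 1))

# First window of each method: MST (430 nodes, 429 edges), Winner Take All (139 nodes, 462 edges)
mst_avg = average_degree(430, 429)
wta_avg = average_degree(139, 462)
print(mst_avg, wta_avg)
```

Because the tree's edge count is tied to its node count, the average degree cannot respond to market stress, whereas the thresholded network gains or loses edges freely.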
import matplotlib.pyplot as plt
%matplotlib inline
avgdf=attrib_df.groupby(by=['start'])['Degree'].mean()
avgdf.plot()
avgdf=attribMST_df.groupby(by=['start'])['Degree'].mean()
avgdf.plot()
plt.xlabel('Days')
plt.ylabel('Average Degree')
plt.legend(['Winner Take All','MST'], loc='center left', bbox_to_anchor=(1, 0.5),ncol=1)
plt.show()
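
The contrast between the two curves can be reproduced on toy data. The sketch below is only an illustration under stated assumptions: it assumes the Winner Take All network keeps every pair of stocks whose correlation exceeds a threshold (the value 0.7 is hypothetical), it uses 1 - correlation as the MST edge weight, and the two 6 x 6 correlation matrices are invented. Kruskal's algorithm is written out with the standard library so that the block runs on its own.

```python
from itertools import combinations

def toy_corr(n, base):
    """Invented symmetric correlation matrix with off-diagonal values near `base`."""
    return [[1.0 if i == j else base + 0.01 * (i + j) for j in range(n)]
            for i in range(n)]

def threshold_edges(corr, theta):
    """Assumed Winner Take All rule: keep pairs with correlation above theta."""
    n = len(corr)
    return [(i, j) for i, j in combinations(range(n), 2) if corr[i][j] > theta]

def mst_edges(corr):
    """Kruskal's algorithm with edge weight 1 - correlation (union-find)."""
    n = len(corr)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    ordered = sorted(combinations(range(n), 2),
                     key=lambda e: 1.0 - corr[e[0]][e[1]])
    for i, j in ordered:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

def average_degree(edges, n):
    return 2.0 * len(edges) / n

n = 6
calm = toy_corr(n, base=0.2)    # weakly correlated "normal" window
crisis = toy_corr(n, base=0.8)  # strongly correlated "crisis" window

for name, corr in (("calm", calm), ("crisis", crisis)):
    print(name,
          "threshold:", average_degree(threshold_edges(corr, 0.7), n),
          "MST:", round(average_degree(mst_edges(corr), n), 3))
```

The threshold network goes from no edges to a complete graph between the two toy windows, while the MST has n - 1 edges in both, so its average degree cannot register the change.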
We looked at the highest-degree stocks in the network in each window to identify the important stocks: those that have high influence, or that are a good indicator of how the stock market as a whole is moving. The results are below, first for the Winner Take All networks and then for the MST networks. Stocks from the Financials sector have the highest degree in a number of windows. Is there a pattern here?
attrib_df.sort_values(['start','Degree'],ascending=False).groupby(['start']).head(3)[['start','ticker','Name','Degree','Sector']]
| | start | ticker | Name | Degree | Sector |
---|---|---|---|---|---|
995 | 2710 | DVN | Devon Energy Corp. | 14.0 | Energy |
2774 | 2710 | RIG | Transocean | 14.0 | Energy |
2026 | 2710 | MRO | Marathon Oil Corp. | 12.0 | Energy |
2576 | 2647 | PRU | Prudential Financial | 24.0 | Financials |
3202 | 2647 | UNM | Unum Group | 24.0 | Financials |
2516 | 2647 | PNC | PNC Financial Services | 23.0 | Financials |
3572 | 2584 | AEE | Ameren Corp | 18.0 | Utilities |
4092 | 2584 | LNT | Alliant Energy Corp | 18.0 | Utilities |
4257 | 2584 | PNW | Pinnacle West Capital | 18.0 | Utilities |
2058 | 2521 | MS | Morgan Stanley | 39.0 | Financials |
4819 | 2521 | DFS | Discover Financial Services | 38.0 | Financials |
2271 | 2521 | O | Realty Income Corporation | 36.0 | Real Estate |
172 | 2458 | AMP | Ameriprise Financial | 60.0 | Financials |
4514 | 2458 | ETN | Eaton Corporation | 56.0 | Industrials |
5307 | 2458 | MCO | Moody's Corp | 56.0 | Financials |
2572 | 2395 | PRU | Prudential Financial | 29.0 | Financials |
546 | 2395 | C | Citigroup Inc. | 28.0 | Financials |
2056 | 2395 | MS | Morgan Stanley | 28.0 | Financials |
2055 | 2332 | MS | Morgan Stanley | 35.0 | Financials |
2511 | 2332 | PNC | PNC Financial Services | 33.0 | Financials |
475 | 2332 | BK | The Bank of New York Mellon Corp. | 32.0 | Financials |
413 | 2269 | BBT | BB&T Corporation | 27.0 | Financials |
2510 | 2269 | PNC | PNC Financial Services | 26.0 | Financials |
2570 | 2269 | PRU | Prudential Financial | 26.0 | Financials |
5261 | 2206 | HON | Honeywell Int'l Inc. | 105.0 | Industrials |
3391 | 2206 | WFC | Wells Fargo | 96.0 | Financials |
6474 | 2206 | BRK.B | Berkshire Hathaway | 96.0 | Financials |
509 | 2143 | BXP | Boston Properties | 18.0 | Real Estate |
2716 | 2143 | REG | Regency Centers Corporation | 17.0 | Real Estate |
1263 | 2143 | FRT | Federal Realty Investment Trust | 16.0 | Real Estate |
167 | 2080 | AMP | Ameriprise Financial | 27.0 | Financials |
3021 | 2080 | STI | SunTrust Banks | 25.0 | Financials |
3390 | 2080 | WFC | Wells Fargo | 25.0 | Financials |
166 | 2017 | AMP | Ameriprise Financial | 25.0 | Financials |
1989 | 2017 | MET | MetLife Inc. | 25.0 | Financials |
1503 | 2017 | HES | Hess Corporation | 21.0 | Energy |
845 | 1954 | D | Dominion Resources | 20.0 | Utilities |
3353 | 1954 | WEC | Wec Energy Group Inc | 18.0 | Utilities |
3468 | 1954 | XEL | Xcel Energy Inc | 18.0 | Utilities |
649 | 1891 | CMS | CMS Energy | 10.0 | Utilities |
3467 | 1891 | XEL | Xcel Energy Inc | 9.0 | Utilities |
3078 | 1891 | TMK | Torchmark Corp. | 8.0 | Financials |
163 | 1828 | AMP | Ameriprise Financial | 11.0 | Financials |
2563 | 1828 | PRU | Prudential Financial | 8.0 | Financials |
1862 | 1828 | LNC | Lincoln National | 7.0 | Financials |
3388 | 1765 | WFC | Wells Fargo | 7.0 | Financials |
5812 | 1765 | L | Loews Corp. | 6.0 | Financials |
162 | 1765 | AMP | Ameriprise Financial | 4.0 | Financials |
2342 | 1702 | PEG | Public Serv. Enterprise Inc. | 12.0 | Utilities |
3153 | 1702 | UDR | UDR Inc | 12.0 | Real Estate |
3349 | 1702 | WEC | Wec Energy Group Inc | 12.0 | Utilities |
2371 | 1639 | PFG | Principal Financial Group | 21.0 | Financials |
3348 | 1639 | WEC | Wec Energy Group Inc | 17.0 | Utilities |
4077 | 1639 | LNT | Alliant Energy Corp | 16.0 | Utilities |
1858 | 1576 | LNC | Lincoln National | 5.0 | Financials |
2560 | 1576 | PRU | Prudential Financial | 3.0 | Financials |
130 | 1576 | AMG | Affiliated Managers Group Inc | 2.0 | Financials |
3462 | 1513 | XEL | Xcel Energy Inc | 8.0 | Utilities |
39 | 1513 | AEP | American Electric Power | 6.0 | Utilities |
943 | 1513 | DTE | DTE Energy Co. | 6.0 | Utilities |
2291 | 1450 | OXY | Occidental Petroleum | 5.0 | Energy |
3461 | 1450 | XEL | Xcel Energy Inc | 4.0 | Utilities |
5136 | 1450 | IVZ | Invesco Ltd. | 4.0 | Financials |
465 | 1387 | BK | The Bank of New York Mellon Corp. | 20.0 | Financials |
1257 | 1387 | FRT | Federal Realty Investment Trust | 19.0 | Real Estate |
1855 | 1387 | LNC | Lincoln National | 19.0 | Financials |
2230 | 1324 | NTRS | Northern Trust Corp. | 8.0 | Financials |
532 | 1324 | C | Citigroup Inc. | 6.0 | Financials |
3385 | 1324 | WFC | Wells Fargo | 5.0 | Financials |
5808 | 1261 | L | Loews Corp. | 119.0 | Financials |
5255 | 1261 | HON | Honeywell Int'l Inc. | 99.0 | Industrials |
127 | 1261 | AMG | Affiliated Managers Group Inc | 98.0 | Financials |
2905 | 1198 | SNA | Snap-On Inc. | 294.0 | Consumer Discretionary |
5197 | 1198 | CTAS | Cintas Corporation | 289.0 | Industrials |
5624 | 1198 | EFX | Equifax Inc. | 280.0 | Industrials |
1852 | 1135 | LNC | Lincoln National | 18.0 | Financials |
205 | 1135 | APA | Apache Corporation | 17.0 | Energy |
2961 | 1135 | SPG | Simon Property Group Inc | 16.0 | Real Estate |
3274 | 1072 | VNO | Vornado Realty Trust | 9.0 | Real Estate |
98 | 1072 | AIV | Apartment Investment & Mgmt | 6.0 | Real Estate |
498 | 1072 | BXP | Boston Properties | 6.0 | Real Estate |
3181 | 1009 | UNM | Unum Group | 5.0 | Financials |
280 | 1009 | AVB | AvalonBay Communities, Inc. | 4.0 | Real Estate |
1099 | 1009 | EQR | Equity Residential | 4.0 | Real Estate |
4208 | 946 | PCAR | PACCAR Inc. | 22.0 | Industrials |
1913 | 946 | MAA | Mid-America Apartments | 17.0 | Real Estate |
2902 | 946 | SNA | Snap-On Inc. | 16.0 | Consumer Discretionary |
5251 | 883 | HON | Honeywell Int'l Inc. | 169.0 | Industrials |
5194 | 883 | CTAS | Cintas Corporation | 152.0 | Industrials |
730 | 883 | COL | Rockwell Collins | 149.0 | Industrials |
3270 | 820 | VNO | Vornado Realty Trust | 22.0 | Real Estate |
2590 | 820 | PSA | Public Storage | 21.0 | Real Estate |
1096 | 820 | EQR | Equity Residential | 20.0 | Real Estate |
276 | 757 | AVB | AvalonBay Communities, Inc. | 12.0 | Real Estate |
493 | 757 | BXP | Boston Properties | 12.0 | Real Estate |
3269 | 757 | VNO | Vornado Realty Trust | 11.0 | Real Estate |
2466 | 694 | PLD | Prologis | 20.0 | Real Estate |
492 | 694 | BXP | Boston Properties | 17.0 | Real Estate |
1094 | 694 | EQR | Equity Residential | 17.0 | Real Estate |
2170 | 631 | NOV | National Oilwell Varco Inc. | 20.0 | Energy |
2283 | 631 | OXY | Occidental Petroleum | 20.0 | Energy |
2640 | 631 | PXD | Pioneer Natural Resources | 19.0 | Energy |
1628 | 568 | JPM | JPMorgan Chase & Co. | 50.0 | Financials |
2493 | 568 | PNC | PNC Financial Services | 44.0 | Financials |
3098 | 568 | TROW | T. Rowe Price Group | 42.0 | Financials |
489 | 505 | BXP | Boston Properties | 29.0 | Real Estate |
2463 | 505 | PLD | Prologis | 28.0 | Real Estate |
3848 | 505 | EMN | Eastman Chemical | 28.0 | Materials |
806 | 442 | CVX | Chevron Corp. | 110.0 | Energy |
919 | 442 | DIS | The Walt Disney Company | 109.0 | Consumer Discretionary |
5366 | 442 | A | Agilent Technologies Inc | 104.0 | Health Care |
3095 | 379 | TROW | T. Rowe Price Group | 39.0 | Financials |
2949 | 379 | SPG | Simon Property Group Inc | 38.0 | Real Estate |
1243 | 379 | FRT | Federal Realty Investment Trust | 37.0 | Real Estate |
523 | 316 | C | Citigroup Inc. | 18.0 | Financials |
1088 | 316 | EQR | Equity Residential | 15.0 | Real Estate |
1697 | 316 | KIM | Kimco Realty | 15.0 | Real Estate |
1696 | 253 | KIM | Kimco Realty | 23.0 | Real Estate |
3261 | 253 | VNO | Vornado Realty Trust | 22.0 | Real Estate |
2691 | 253 | REG | Regency Centers Corporation | 21.0 | Real Estate |
1695 | 190 | KIM | Kimco Realty | 19.0 | Real Estate |
2581 | 190 | PSA | Public Storage | 19.0 | Real Estate |
267 | 190 | AVB | AvalonBay Communities, Inc. | 18.0 | Real Estate |
520 | 127 | C | Citigroup Inc. | 22.0 | Financials |
2728 | 127 | RF | Regions Financial Corp. | 16.0 | Financials |
2063 | 127 | MTB | M&T Bank Corp. | 14.0 | Financials |
997 | 64 | ED | Consolidated Edison | 24.0 | Utilities |
931 | 64 | DTE | DTE Energy Co. | 23.0 | Utilities |
2916 | 64 | SO | Southern Co. | 23.0 | Utilities |
142 | 1 | AMP | Ameriprise Financial | 21.0 | Financials |
387 | 1 | BBT | BB&T Corporation | 21.0 | Financials |
2687 | 1 | REG | Regency Centers Corporation | 20.0 | Real Estate |
attribMST_df.sort_values(['start','Degree'],ascending=False).groupby(['start']).head(3)[['start','ticker','Name','Degree','Sector']]
| | start | ticker | Name | Degree | Sector |
---|---|---|---|---|---|
8578 | 2710 | HON | Honeywell Int'l Inc. | 9.0 | Industrials |
8314 | 2710 | HCP | HCP Inc. | 7.0 | Real Estate |
14900 | 2710 | RIG | Transocean | 7.0 | Energy |
3996 | 2647 | CMS | CMS Energy | 11.0 | Utilities |
1757 | 2647 | APH | Amphenol Corp | 8.0 | Information Technology |
14987 | 2647 | ROK | Rockwell Automation Inc. | 8.0 | Industrials |
5395 | 2584 | DOV | Dover Corp. | 9.0 | Industrials |
11918 | 2584 | MRO | Marathon Oil Corp. | 9.0 | Energy |
13622 | 2584 | PEP | PepsiCo Inc. | 9.0 | Consumer Staples |
2767 | 2521 | BRK.B | Berkshire Hathaway | 8.0 | Financials |
3774 | 2521 | CL | Colgate-Palmolive | 8.0 | Consumer Staples |
18505 | 2521 | XEL | Xcel Energy Inc | 8.0 | Utilities |
6361 | 2458 | ETN | Eaton Corporation | 12.0 | Industrials |
3949 | 2458 | CMI | Cummins Inc. | 10.0 | Industrials |
11388 | 2458 | MCO | Moody's Corp | 10.0 | Financials |
1313 | 2395 | AMG | Affiliated Managers Group Inc | 9.0 | Financials |
9585 | 2395 | ITW | Illinois Tool Works | 8.0 | Industrials |
3992 | 2395 | CMS | CMS Energy | 7.0 | Utilities |
37 | 2332 | A | Agilent Technologies Inc | 12.0 | Health Care |
11958 | 2332 | MS | Morgan Stanley | 9.0 | Financials |
14146 | 2332 | PPG | PPG Industries | 9.0 | Materials |
4342 | 2269 | COP | ConocoPhillips | 9.0 | Energy |
10111 | 2269 | KMB | Kimberly-Clark | 9.0 | Consumer Staples |
8571 | 2269 | HON | Honeywell Int'l Inc. | 8.0 | Industrials |
8570 | 2206 | HON | Honeywell Int'l Inc. | 12.0 | Industrials |
11384 | 2206 | MCO | Moody's Corp | 11.0 | Financials |
9846 | 2206 | JPM | JPMorgan Chase & Co. | 10.0 | Financials |
430 | 2143 | ADP | Automatic Data Processing | 14.0 | Information Technology |
17399 | 2143 | UTX | United Technologies | 12.0 | Industrials |
11647 | 2143 | MMC | Marsh & McLennan | 11.0 | Financials |
1396 | 2080 | AMP | Ameriprise Financial | 14.0 | Financials |
9580 | 2080 | ITW | Illinois Tool Works | 11.0 | Industrials |
11690 | 2080 | MMM | 3M Company | 10.0 | Industrials |
1395 | 2017 | AMP | Ameriprise Financial | 14.0 | Financials |
6926 | 2017 | FIS | Fidelity National Information Services | 12.0 | Information Technology |
9579 | 2017 | ITW | Illinois Tool Works | 9.0 | Industrials |
11688 | 1954 | MMM | 3M Company | 10.0 | Industrials |
1394 | 1954 | AMP | Ameriprise Financial | 9.0 | Financials |
2626 | 1954 | BLK | BlackRock | 8.0 | Financials |
8565 | 1891 | HON | Honeywell Int'l Inc. | 14.0 | Industrials |
13831 | 1891 | PH | Parker-Hannifin | 9.0 | Industrials |
16471 | 1891 | TMK | Torchmark Corp. | 9.0 | Financials |
16470 | 1828 | TMK | Torchmark Corp. | 18.0 | Financials |
1260 | 1828 | AME | AMETEK Inc | 13.0 | Industrials |
1392 | 1828 | AMP | Ameriprise Financial | 9.0 | Financials |
13697 | 1765 | PFG | Principal Financial Group | 12.0 | Financials |
18009 | 1765 | WFC | Wells Fargo | 11.0 | Financials |
1391 | 1765 | AMP | Ameriprise Financial | 10.0 | Financials |
1390 | 1702 | AMP | Ameriprise Financial | 13.0 | Financials |
8562 | 1702 | HON | Honeywell Int'l Inc. | 12.0 | Industrials |
13564 | 1702 | PEG | Public Serv. Enterprise Inc. | 9.0 | Utilities |
13695 | 1639 | PFG | Principal Financial Group | 18.0 | Financials |
17963 | 1639 | WEC | Wec Energy Group Inc | 11.0 | Utilities |
1653 | 1639 | APC | Anadarko Petroleum Corp | 10.0 | Energy |
1388 | 1576 | AMP | Ameriprise Financial | 17.0 | Financials |
16554 | 1576 | TROW | T. Rowe Price Group | 11.0 | Financials |
10670 | 1576 | LNC | Lincoln National | 10.0 | Financials |
18621 | 1513 | XOM | Exxon Mobil Corp. | 11.0 | Energy |
10669 | 1513 | LNC | Lincoln National | 10.0 | Financials |
1299 | 1513 | AMG | Affiliated Managers Group Inc | 9.0 | Financials |
18488 | 1450 | XEL | Xcel Energy Inc | 13.0 | Utilities |
18926 | 1450 | IVZ | Invesco Ltd. | 13.0 | Financials |
1386 | 1450 | AMP | Ameriprise Financial | 12.0 | Financials |
15011 | 1387 | ROP | Roper Industries | 14.0 | Industrials |
5948 | 1387 | EMN | Eastman Chemical | 12.0 | Materials |
10667 | 1387 | LNC | Lincoln National | 10.0 | Financials |
11678 | 1324 | MMM | 3M Company | 11.0 | Industrials |
12854 | 1324 | NTRS | Northern Trust Corp. | 10.0 | Financials |
19399 | 1324 | L | Loews Corp. | 10.0 | Financials |
8555 | 1261 | HON | Honeywell Int'l Inc. | 14.0 | Industrials |
19398 | 1261 | L | Loews Corp. | 14.0 | Financials |
416 | 1261 | ADP | Automatic Data Processing | 9.0 | Information Technology |
15536 | 1198 | SNA | Snap-On Inc. | 19.0 | Consumer Discretionary |
4589 | 1198 | CTAS | Cintas Corporation | 13.0 | Industrials |
14216 | 1198 | PRU | Prudential Financial | 11.0 | Financials |
2393 | 1135 | BEN | Franklin Resources | 11.0 | Financials |
1601 | 1135 | APA | Apache Corporation | 9.0 | Energy |
4544 | 1135 | CSX | CSX Corp. | 9.0 | Industrials |
18920 | 1072 | IVZ | Invesco Ltd. | 12.0 | Financials |
17382 | 1072 | UTX | United Technologies | 11.0 | Industrials |
18482 | 1072 | XEL | Xcel Energy Inc | 11.0 | Utilities |
412 | 1009 | ADP | Automatic Data Processing | 11.0 | Information Technology |
16545 | 1009 | TROW | T. Rowe Price Group | 9.0 | Financials |
17161 | 1009 | UNM | Unum Group | 9.0 | Financials |
13376 | 946 | PCAR | PACCAR Inc. | 21.0 | Industrials |
15532 | 946 | SNA | Snap-On Inc. | 13.0 | Consumer Discretionary |
16544 | 946 | TROW | T. Rowe Price Group | 13.0 | Financials |
8549 | 883 | HON | Honeywell Int'l Inc. | 19.0 | Industrials |
4232 | 883 | COL | Rockwell Collins | 11.0 | Industrials |
3704 | 883 | CINF | Cincinnati Financial | 9.0 | Financials |
2388 | 820 | BEN | Franklin Resources | 10.0 | Financials |
8548 | 820 | HON | Honeywell Int'l Inc. | 10.0 | Industrials |
19391 | 820 | L | Loews Corp. | 10.0 | Financials |
5850 | 757 | EIX | Edison Int'l | 12.0 | Utilities |
6950 | 757 | FISV | Fiserv Inc | 11.0 | Information Technology |
8547 | 757 | HON | Honeywell Int'l Inc. | 11.0 | Industrials |
11888 | 694 | MRO | Marathon Oil Corp. | 13.0 | Energy |
14120 | 694 | PPG | PPG Industries | 12.0 | Materials |
19389 | 694 | L | Loews Corp. | 10.0 | Financials |
6332 | 631 | ETN | Eaton Corporation | 8.0 | Industrials |
8545 | 631 | HON | Honeywell Int'l Inc. | 8.0 | Industrials |
11447 | 631 | MET | MetLife Inc. | 8.0 | Financials |
4491 | 568 | CSCO | Cisco Systems | 12.0 | Information Technology |
6331 | 568 | ETN | Eaton Corporation | 12.0 | Industrials |
15922 | 568 | SWK | Stanley Black & Decker | 11.0 | Consumer Discretionary |
5934 | 505 | EMN | Eastman Chemical | 16.0 | Materials |
5362 | 505 | DOV | Dover Corp. | 10.0 | Industrials |
19386 | 505 | L | Loews Corp. | 9.0 | Financials |
4885 | 442 | DD | Du Pont (E.I.) | 12.0 | Materials |
15920 | 442 | SWK | Stanley Black & Decker | 12.0 | Consumer Discretionary |
535 | 442 | AEE | Ameren Corp | 10.0 | Utilities |
12663 | 379 | NOV | National Oilwell Varco Inc. | 10.0 | Energy |
3305 | 379 | CCL | Carnival Corp. | 9.0 | Consumer Discretionary |
10651 | 379 | LNC | Lincoln National | 9.0 | Financials |
1280 | 316 | AMG | Affiliated Managers Group Inc | 9.0 | Financials |
6503 | 316 | EXPD | Expeditors Int'l | 9.0 | Industrials |
1676 | 316 | APD | Air Products & Chemicals Inc | 8.0 | Materials |
1631 | 253 | APC | Anadarko Petroleum Corp | 12.0 | Energy |
2379 | 253 | BEN | Franklin Resources | 9.0 | Financials |
11661 | 253 | MMM | 3M Company | 9.0 | Industrials |
2070 | 190 | BAC | Bank of America Corp | 12.0 | Financials |
16532 | 190 | TROW | T. Rowe Price Group | 12.0 | Financials |
575 | 190 | AEP | American Electric Power | 8.0 | Utilities |
15915 | 127 | SWK | Stanley Black & Decker | 12.0 | Consumer Discretionary |
10647 | 127 | LNC | Lincoln National | 9.0 | Financials |
2905 | 127 | C | Citigroup Inc. | 8.0 | Financials |
12834 | 64 | NTRS | Northern Trust Corp. | 12.0 | Financials |
15826 | 64 | STT | State Street Corp. | 10.0 | Financials |
15562 | 64 | SO | Southern Co. | 9.0 | Utilities |
13669 | 1 | PFG | Principal Financial Group | 13.0 | Financials |
2199 | 1 | BBT | BB&T Corporation | 12.0 | Financials |
18465 | 1 | XEL | Xcel Energy Inc | 12.0 | Utilities |
We plotted, for each sector, how many times its stocks appear among the three highest-degree stocks across all windows. Financials stocks are clearly at the center of the market network. This is plausible: Financials stocks are structurally dependent on what happens in the other sectors, so we can expect them to be correlated with many other stocks in the market.
df1=attrib_df.sort_values(['start','Degree'],ascending=False).groupby(['start']).head(3).groupby(['Sector']).count()['ticker']
df2=attribMST_df.sort_values(['start','Degree'],ascending=False).groupby(['start']).head(3).groupby(['Sector']).count()['ticker']
df3=pd.concat([df1, df2], axis=1).fillna(0)
df3.columns = ['t1','t2']
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(10,10))
df3['t1'].plot(kind='bar',color='red', position=0, width=0.25)
df3['t2'].plot(kind='bar',color='blue', position=1, width=0.25)
plt.xticks(rotation=90,fontsize=8)
plt.ylabel('Count')
plt.legend(['Winner Take All','MST'], loc='center left', bbox_to_anchor=(1, 0.5),ncol=1)
plt.show()
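
The selection and counting steps used above (`sort_values`, then `groupby(...).head(...)`, then a count per sector) can be checked on a small frame. The sketch below uses invented tickers, degrees and sectors and keeps the top two rows per window instead of three, so the result is easy to verify by hand; only the column names are taken from the notebook.

```python
import pandas as pd

# Hypothetical data: two windows (start = 1 and start = 64), four made-up tickers.
toy = pd.DataFrame({
    "start":  [1, 1, 1, 1, 64, 64, 64, 64],
    "ticker": ["AAA", "BBB", "CCC", "DDD", "AAA", "BBB", "CCC", "DDD"],
    "Degree": [5, 9, 2, 7, 4, 1, 8, 6],
    "Sector": ["Financials", "Energy", "Utilities", "Financials",
               "Financials", "Energy", "Utilities", "Financials"],
})

# Sort by window and degree (both descending), then keep the first two rows
# of each window. head() preserves the sorted row order within each group.
top = (toy.sort_values(["start", "Degree"], ascending=False)
          .groupby("start")
          .head(2))

# Count how often each sector appears among the top rows across all windows.
sector_counts = top.groupby("Sector")["ticker"].count()

print(top[["start", "ticker", "Degree", "Sector"]])
print(sector_counts)
```

Because a stock can appear in the top rows of several windows, the counts measure appearances, not distinct stocks; in the toy data DDD accounts for both Financials appearances.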
We looked at the stocks with high betweenness centrality in the network in each window to identify stocks that, because of their position in the network, may be good predictors of price movements in other stocks. The results are below. Financials stocks still lead in most periods.
attrib_df.sort_values(['start','Betweeness'],ascending=False).groupby(['start']).head(3)[['start','ticker','Name','Betweeness','Sector']]
| | start | ticker | Name | Betweeness | Sector |
---|---|---|---|---|---|
6727 | 2710 | FOXA | Twenty-First Century Fox Class A | 0.007932 | Consumer Discretionary |
5244 | 2710 | HOLX | Hologic | 0.006344 | Health Care |
6673 | 2710 | MDLZ | Mondelez International | 0.004696 | Consumer Staples |
3202 | 2647 | UNM | Unum Group | 0.011359 | Financials |
2576 | 2647 | PRU | Prudential Financial | 0.007036 | Financials |
1653 | 2647 | JPM | JPMorgan Chase & Co. | 0.005982 | Financials |
1874 | 2584 | LNC | Lincoln National | 0.006935 | Financials |
1363 | 2584 | GS | Goldman Sachs Group | 0.006339 | Financials |
3547 | 2584 | ZION | Zions Bancorp | 0.005442 | Financials |
6454 | 2521 | MMC | Marsh & McLennan | 0.260341 | Financials |
5493 | 2521 | BDX | Becton Dickinson | 0.246517 | Health Care |
5376 | 2521 | A | Agilent Technologies Inc | 0.240836 | Health Care |
5221 | 2458 | DOV | Dover Corp. | 0.098698 | Industrials |
4514 | 2458 | ETN | Eaton Corporation | 0.052089 | Industrials |
2182 | 2458 | NOV | National Oilwell Varco Inc. | 0.051739 | Energy |
2572 | 2395 | PRU | Prudential Financial | 0.016133 | Financials |
171 | 2395 | AMP | Ameriprise Financial | 0.008358 | Financials |
2056 | 2395 | MS | Morgan Stanley | 0.005031 | Financials |
3792 | 2332 | DHR | Danaher Corp. | 0.114964 | Health Care |
6731 | 2332 | XYL | Xylem Inc. | 0.111858 | Industrials |
2414 | 2332 | PH | Parker-Hannifin | 0.101198 | Industrials |
544 | 2269 | C | Citigroup Inc. | 0.015129 | Financials |
689 | 2269 | COF | Capital One Financial | 0.012840 | Financials |
4141 | 2269 | MSFT | Microsoft Corp. | 0.010529 | Information Technology |
5261 | 2206 | HON | Honeywell Int'l Inc. | 0.178959 | Industrials |
5790 | 2206 | KMB | Kimberly-Clark | 0.099971 | Consumer Staples |
3504 | 2206 | XOM | Exxon Mobil Corp. | 0.096947 | Energy |
4569 | 2143 | NOC | Northrop Grumman Corp. | 0.003051 | Industrials |
4615 | 2143 | UTX | United Technologies | 0.003051 | Industrials |
4085 | 2143 | LNT | Alliant Energy Corp | 0.002730 | Utilities |
1614 | 2080 | ITW | Illinois Tool Works | 0.024743 | Industrials |
442 | 2080 | BEN | Franklin Resources | 0.021179 | Financials |
2411 | 2080 | PH | Parker-Hannifin | 0.017146 | Industrials |
1047 | 2017 | EMR | Emerson Electric Company | 0.043283 | Industrials |
441 | 2017 | BEN | Franklin Resources | 0.043158 | Financials |
6016 | 2017 | ROP | Roper Industries | 0.036402 | Industrials |
845 | 1954 | D | Dominion Resources | 0.010613 | Utilities |
1146 | 1954 | ESS | Essex Property Trust, Inc. | 0.008675 | Real Estate |
2343 | 1954 | PEG | Public Serv. Enterprise Inc. | 0.006047 | Utilities |
5258 | 1891 | HON | Honeywell Int'l Inc. | 0.068598 | Industrials |
3078 | 1891 | TMK | Torchmark Corp. | 0.047453 | Financials |
2504 | 1891 | PNC | PNC Financial Services | 0.039960 | Financials |
3077 | 1828 | TMK | Torchmark Corp. | 0.077264 | Financials |
6446 | 1828 | MMC | Marsh & McLennan | 0.076123 | Financials |
163 | 1828 | AMP | Ameriprise Financial | 0.072502 | Financials |
3388 | 1765 | WFC | Wells Fargo | 0.031168 | Financials |
5812 | 1765 | L | Loews Corp. | 0.024583 | Financials |
3755 | 1765 | CINF | Cincinnati Financial | 0.018437 | Financials |
2342 | 1702 | PEG | Public Serv. Enterprise Inc. | 0.013637 | Utilities |
104 | 1702 | AIV | Apartment Investment & Mgmt | 0.013070 | Real Estate |
3349 | 1702 | WEC | Wec Energy Group Inc | 0.012457 | Utilities |
2371 | 1639 | PFG | Principal Financial Group | 0.111754 | Financials |
2790 | 1639 | ROK | Rockwell Automation Inc. | 0.061661 | Industrials |
2013 | 1639 | MRO | Marathon Oil Corp. | 0.043269 | Energy |
1858 | 1576 | LNC | Lincoln National | 0.010065 | Financials |
159 | 1576 | AMP | Ameriprise Financial | 0.006494 | Financials |
130 | 1576 | AMG | Affiliated Managers Group Inc | 0.003896 | Financials |
1857 | 1513 | LNC | Lincoln National | 0.052643 | Financials |
1348 | 1513 | GS | Goldman Sachs Group | 0.040921 | Financials |
3387 | 1513 | WFC | Wells Fargo | 0.020460 | Financials |
2291 | 1450 | OXY | Occidental Petroleum | 0.040816 | Energy |
158 | 1450 | AMP | Ameriprise Financial | 0.029762 | Financials |
5136 | 1450 | IVZ | Invesco Ltd. | 0.017857 | Financials |
3735 | 1387 | CBG | CBRE Group | 0.174827 | Real Estate |
6013 | 1387 | ROP | Roper Industries | 0.113051 | Industrials |
3106 | 1387 | TROW | T. Rowe Price Group | 0.111419 | Financials |
2230 | 1324 | NTRS | Northern Trust Corp. | 0.059286 | Financials |
532 | 1324 | C | Citigroup Inc. | 0.039322 | Financials |
3385 | 1324 | WFC | Wells Fargo | 0.035995 | Financials |
4237 | 1261 | PNW | Pinnacle West Capital | 0.119275 | Utilities |
3498 | 1261 | XOM | Exxon Mobil Corp. | 0.096802 | Energy |
5808 | 1261 | L | Loews Corp. | 0.051756 | Financials |
5358 | 1198 | WYN | Wyndham Worldwide | 0.025580 | Consumer Discretionary |
2905 | 1198 | SNA | Snap-On Inc. | 0.020653 | Consumer Discretionary |
755 | 1198 | COP | ConocoPhillips | 0.018428 | Energy |
4539 | 1135 | GE | General Electric | 0.092994 | Industrials |
5211 | 1135 | DOV | Dover Corp. | 0.051663 | Industrials |
1852 | 1135 | LNC | Lincoln National | 0.049190 | Financials |
3274 | 1072 | VNO | Vornado Realty Trust | 0.014088 | Real Estate |
498 | 1072 | BXP | Boston Properties | 0.009269 | Real Estate |
2960 | 1072 | SPG | Simon Property Group Inc | 0.007442 | Real Estate |
280 | 1009 | AVB | AvalonBay Communities, Inc. | 0.020106 | Real Estate |
2593 | 1009 | PSA | Public Storage | 0.019841 | Real Estate |
1137 | 1009 | ESS | Essex Property Trust, Inc. | 0.016667 | Real Estate |
1579 | 946 | HST | Host Hotels & Resorts | 0.204877 | Real Estate |
4208 | 946 | PCAR | PACCAR Inc. | 0.191795 | Industrials |
3101 | 946 | TROW | T. Rowe Price Group | 0.117985 | Financials |
5194 | 883 | CTAS | Cintas Corporation | 0.065674 | Industrials |
730 | 883 | COL | Rockwell Collins | 0.043076 | Industrials |
5251 | 883 | HON | Honeywell Int'l Inc. | 0.033122 | Industrials |
5803 | 820 | L | Loews Corp. | 0.147791 | Financials |
6323 | 820 | NWS | News Corp. Class B | 0.132988 | Consumer Discretionary |
2590 | 820 | PSA | Public Storage | 0.127557 | Real Estate |
2096 | 757 | MUR | Murphy Oil | 0.073472 | Energy |
2007 | 757 | MRO | Marathon Oil Corp. | 0.061427 | Energy |
2642 | 757 | PXD | Pioneer Natural Resources | 0.061126 | Energy |
4997 | 694 | PPG | PPG Industries | 0.256966 | Materials |
2362 | 694 | PFG | Principal Financial Group | 0.255081 | Financials |
864 | 694 | DD | Du Pont (E.I.) | 0.243555 | Materials |
3407 | 631 | WMB | Williams Cos. | 0.152520 | Energy |
5800 | 631 | L | Loews Corp. | 0.145706 | Financials |
1969 | 631 | MET | MetLife Inc. | 0.145504 | Financials |
5682 | 568 | FLS | Flowserve Corporation | 0.138415 | Industrials |
4205 | 568 | PCAR | PACCAR Inc. | 0.136017 | Industrials |
4119 | 568 | MMM | 3M Company | 0.102029 | Industrials |
3848 | 505 | EMN | Eastman Chemical | 0.228678 | Materials |
5798 | 505 | L | Loews Corp. | 0.121144 | Financials |
3097 | 505 | TROW | T. Rowe Price Group | 0.120766 | Financials |
5124 | 442 | IVZ | Invesco Ltd. | 0.064671 | Financials |
4632 | 442 | CSCO | Cisco Systems | 0.055969 | Information Technology |
919 | 442 | DIS | The Walt Disney Company | 0.055591 | Consumer Discretionary |
1966 | 379 | MET | MetLife Inc. | 0.063732 | Financials |
4535 | 379 | GE | General Electric | 0.047302 | Industrials |
1840 | 379 | LNC | Lincoln National | 0.045903 | Financials |
523 | 316 | C | Citigroup Inc. | 0.140048 | Financials |
1624 | 316 | JPM | JPMorgan Chase & Co. | 0.051430 | Financials |
1839 | 316 | LNC | Lincoln National | 0.048165 | Financials |
348 | 253 | BAC | Bank of America Corp | 0.104403 | Financials |
2065 | 253 | MTB | M&T Bank Corp. | 0.075769 | Financials |
1470 | 253 | HD | Home Depot | 0.065637 | Consumer Discretionary |
2581 | 190 | PSA | Public Storage | 0.115885 | Real Estate |
2729 | 190 | RF | Regions Financial Corp. | 0.074904 | Financials |
320 | 190 | AXP | American Express Co | 0.031737 | Financials |
389 | 127 | BBT | BB&T Corporation | 0.184087 | Financials |
4576 | 127 | TRV | The Travelers Companies Inc. | 0.183383 | Financials |
3446 | 127 | XEL | Xcel Energy Inc | 0.176740 | Utilities |
4301 | 64 | STT | State Street Corp. | 0.076092 | Financials |
3482 | 64 | XOM | Exxon Mobil Corp. | 0.066149 | Energy |
2539 | 64 | PRU | Prudential Financial | 0.046668 | Financials |
142 | 1 | AMP | Ameriprise Financial | 0.026139 | Financials |
116 | 1 | AMG | Affiliated Managers Group Inc | 0.025558 | Financials |
1032 | 1 | EMR | Emerson Electric Company | 0.022638 | Industrials |
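
When comparing betweenness values between the two kinds of network, it helps to keep in mind how strongly the measure depends on network density. The sketch below is a toy illustration, not the notebook's computation: it assumes the `Betweeness` column was produced with something like `networkx.betweenness_centrality` using its default normalization, and it compares a star (a tree, like an MST) with a complete graph on the same five nodes.

```python
import networkx as nx

# A star is a tree: node 0 is the hub, nodes 1..4 are leaves.
star = nx.star_graph(4)
# A complete graph on the same number of nodes: every pair is directly linked.
dense = nx.complete_graph(5)

# Normalized betweenness: the share of shortest paths between other node
# pairs that pass through a given node.
star_bc = nx.betweenness_centrality(star)
dense_bc = nx.betweenness_centrality(dense)

print("star hub:", star_bc[0], "star leaf:", star_bc[1])
print("complete graph max:", max(dense_bc.values()))
```

In a tree there is exactly one path between any two nodes, so a hub lies on a large share of all paths and its normalized betweenness can approach 1. In a dense network most pairs are linked directly or through many alternative routes, so individual values stay small. Betweenness rankings are therefore best compared within one type of network, not across the two.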
attribMST_df.sort_values(['start','Betweeness'],ascending=False).groupby(['start']).head(3)[['start','ticker','Name','Betweeness','Sector']]
| | start | ticker | Name | Betweeness | Sector |
---|---|---|---|---|---|
2418 | 2710 | BEN | Franklin Resources | 0.649713 | Financials |
18946 | 2710 | IVZ | Invesco Ltd. | 0.581805 | Financials |
8578 | 2710 | HON | Honeywell Int'l Inc. | 0.477113 | Industrials |
17187 | 2647 | UNM | Unum Group | 0.570723 | Financials |
2109 | 2647 | BAC | Bank of America Corp | 0.565530 | Financials |
9589 | 2647 | ITW | Illinois Tool Works | 0.548996 | Industrials |
10686 | 2584 | LNC | Lincoln National | 0.623492 | Financials |
11038 | 2584 | MA | Mastercard Inc. | 0.599686 | Information Technology |
13710 | 2584 | PFG | Principal Financial Group | 0.509717 | Financials |
2767 | 2521 | BRK.B | Berkshire Hathaway | 0.618615 | Financials |
11653 | 2521 | MMC | Marsh & McLennan | 0.543698 | Financials |
2371 | 2521 | BDX | Becton Dickinson | 0.524445 | Health Care |
1402 | 2458 | AMP | Ameriprise Financial | 0.654670 | Financials |
11388 | 2458 | MCO | Moody's Corp | 0.632002 | Financials |
6361 | 2458 | ETN | Eaton Corporation | 0.462152 | Industrials |
9453 | 2395 | IR | Ingersoll-Rand PLC | 0.625261 | Industrials |
18941 | 2395 | IVZ | Invesco Ltd. | 0.606101 | Financials |
1313 | 2395 | AMG | Affiliated Managers Group Inc | 0.553152 | Financials |
2104 | 2332 | BAC | Bank of America Corp | 0.720245 | Financials |
11958 | 2332 | MS | Morgan Stanley | 0.605990 | Financials |
2940 | 2332 | C | Citigroup Inc. | 0.493196 | Financials |
18017 | 2269 | WFC | Wells Fargo | 0.675696 | Financials |
3726 | 2269 | CINF | Cincinnati Financial | 0.547420 | Financials |
2763 | 2269 | BRK.B | Berkshire Hathaway | 0.489289 | Financials |
9846 | 2206 | JPM | JPMorgan Chase & Co. | 0.680760 | Financials |
14232 | 2206 | PRU | Prudential Financial | 0.601696 | Financials |
15816 | 2206 | STI | SunTrust Banks | 0.509497 | Financials |
430 | 2143 | ADP | Automatic Data Processing | 0.642566 | Information Technology |
11647 | 2143 | MMC | Marsh & McLennan | 0.627954 | Financials |
17399 | 2143 | UTX | United Technologies | 0.569319 | Industrials |
1396 | 2080 | AMP | Ameriprise Financial | 0.721395 | Financials |
9580 | 2080 | ITW | Illinois Tool Works | 0.666124 | Industrials |
11558 | 2080 | MKC | McCormick & Co. | 0.279403 | Consumer Staples |
1395 | 2017 | AMP | Ameriprise Financial | 0.750213 | Financials |
15021 | 2017 | ROP | Roper Industries | 0.543654 | Industrials |
15857 | 2017 | STT | State Street Corp. | 0.501954 | Financials |
2626 | 1954 | BLK | BlackRock | 0.657465 | Financials |
11468 | 1954 | MET | MetLife Inc. | 0.614263 | Financials |
14228 | 1954 | PRU | Prudential Financial | 0.511184 | Financials |
8565 | 1891 | HON | Honeywell Int'l Inc. | 0.641060 | Industrials |
17395 | 1891 | UTX | United Technologies | 0.638150 | Industrials |
9577 | 1891 | ITW | Illinois Tool Works | 0.495383 | Industrials |
16470 | 1828 | TMK | Torchmark Corp. | 0.720504 | Financials |
11642 | 1828 | MMC | Marsh & McLennan | 0.528795 | Financials |
10674 | 1828 | LNC | Lincoln National | 0.479425 | Financials |
1391 | 1765 | AMP | Ameriprise Financial | 0.751050 | Financials |
19406 | 1765 | L | Loews Corp. | 0.505638 | Financials |
13697 | 1765 | PFG | Principal Financial Group | 0.450503 | Financials |
1390 | 1702 | AMP | Ameriprise Financial | 0.836793 | Financials |
14532 | 1702 | R | Ryder System | 0.409247 | Industrials |
19014 | 1702 | DFS | Discover Financial Services | 0.320586 | Financials |
13695 | 1639 | PFG | Principal Financial Group | 0.847559 | Financials |
2577 | 1639 | BK | The Bank of New York Mellon Corp. | 0.352938 | Financials |
15543 | 1639 | SNA | Snap-On Inc. | 0.340600 | Consumer Discretionary |
1388 | 1576 | AMP | Ameriprise Financial | 0.802485 | Financials |
10670 | 1576 | LNC | Lincoln National | 0.483489 | Financials |
16554 | 1576 | TROW | T. Rowe Price Group | 0.348268 | Financials |
10669 | 1513 | LNC | Lincoln National | 0.680996 | Financials |
18621 | 1513 | XOM | Exxon Mobil Corp. | 0.601379 | Energy |
4330 | 1513 | COP | ConocoPhillips | 0.507735 | Energy |
1386 | 1450 | AMP | Ameriprise Financial | 0.752720 | Financials |
18926 | 1450 | IVZ | Invesco Ltd. | 0.627248 | Financials |
13252 | 1450 | OXY | Occidental Petroleum | 0.332039 | Energy |
15011 | 1387 | ROP | Roper Industries | 0.660618 | Industrials |
16551 | 1387 | TROW | T. Rowe Price Group | 0.586394 | Financials |
5948 | 1387 | EMN | Eastman Chemical | 0.555885 | Materials |
12854 | 1324 | NTRS | Northern Trust Corp. | 0.663861 | Financials |
18924 | 1324 | IVZ | Invesco Ltd. | 0.633735 | Financials |
11678 | 1324 | MMM | 3M Company | 0.539462 | Industrials |
1383 | 1261 | AMP | Ameriprise Financial | 0.703993 | Financials |
19398 | 1261 | L | Loews Corp. | 0.545069 | Financials |
17253 | 1261 | UPS | United Parcel Service | 0.449057 | Industrials |
15536 | 1198 | SNA | Snap-On Inc. | 0.704655 | Consumer Discretionary |
5813 | 1198 | EFX | Equifax Inc. | 0.546698 | Industrials |
4589 | 1198 | CTAS | Cintas Corporation | 0.530249 | Industrials |
2393 | 1135 | BEN | Franklin Resources | 0.666307 | Financials |
14127 | 1135 | PPG | PPG Industries | 0.571020 | Materials |
4544 | 1135 | CSX | CSX Corp. | 0.549951 | Industrials |
18920 | 1072 | IVZ | Invesco Ltd. | 0.684374 | Financials |
17382 | 1072 | UTX | United Technologies | 0.618229 | Industrials |
2392 | 1072 | BEN | Franklin Resources | 0.519780 | Financials |
16545 | 1009 | TROW | T. Rowe Price Group | 0.659050 | Financials |
5414 | 1009 | DOW | Dow Chemical | 0.563267 | Materials |
17161 | 1009 | UNM | Unum Group | 0.558852 | Financials |
13376 | 946 | PCAR | PACCAR Inc. | 0.769189 | Industrials |
15532 | 946 | SNA | Snap-On Inc. | 0.619479 | Consumer Discretionary |
16544 | 946 | TROW | T. Rowe Price Group | 0.426928 | Financials |
8549 | 883 | HON | Honeywell Int'l Inc. | 0.775314 | Industrials |
4584 | 883 | CTAS | Cintas Corporation | 0.551220 | Industrials |
18171 | 883 | WMB | Williams Cos. | 0.499975 | Energy |
11626 | 820 | MMC | Marsh & McLennan | 0.647800 | Financials |
8548 | 820 | HON | Honeywell Int'l Inc. | 0.545726 | Industrials |
16542 | 820 | TROW | T. Rowe Price Group | 0.537194 | Financials |
19390 | 757 | L | Loews Corp. | 0.685513 | Financials |
11889 | 757 | MRO | Marathon Oil Corp. | 0.625609 | Energy |
8547 | 757 | HON | Honeywell Int'l Inc. | 0.413511 | Industrials |
14120 | 694 | PPG | PPG Industries | 0.644525 | Materials |
13680 | 694 | PFG | Principal Financial Group | 0.556997 | Financials |
16012 | 694 | SWN | Southwestern Energy | 0.511855 | Energy |
18167 | 631 | WMB | Williams Cos. | 0.654137 | Energy |
12667 | 631 | NOV | National Oilwell Varco Inc. | 0.574091 | Energy |
13239 | 631 | OXY | Occidental Petroleum | 0.520811 | Energy |
13370 | 568 | PCAR | PACCAR Inc. | 0.598444 | Industrials |
6331 | 568 | ETN | Eaton Corporation | 0.580038 | Industrials |
5363 | 568 | DOV | Dover Corp. | 0.558415 | Industrials |
5934 | 505 | EMN | Eastman Chemical | 0.743458 | Materials |
1283 | 505 | AMG | Affiliated Managers Group Inc | 0.526646 | Financials |
16537 | 505 | TROW | T. Rowe Price Group | 0.470168 | Financials |
17372 | 442 | UTX | United Technologies | 0.709607 | Industrials |
7 | 442 | A | Agilent Technologies Inc | 0.599770 | Health Care |
5141 | 442 | DIS | The Walt Disney Company | 0.457273 | Consumer Discretionary |
16535 | 379 | TROW | T. Rowe Price Group | 0.627205 | Financials |
17635 | 379 | VNO | Vornado Realty Trust | 0.544936 | Real Estate |
15611 | 379 | SPG | Simon Property Group Inc | 0.532274 | Real Estate |
6503 | 316 | EXPD | Expeditors Int'l | 0.630099 | Industrials |
13674 | 316 | PFG | Principal Financial Group | 0.579633 | Financials |
10650 | 316 | LNC | Lincoln National | 0.545993 | Financials |
17193 | 253 | UNP | Union Pacific | 0.610661 | Industrials |
12057 | 253 | MTB | M&T Bank Corp. | 0.568301 | Financials |
2071 | 253 | BAC | Bank of America Corp | 0.532743 | Financials |
14728 | 190 | RF | Regions Financial Corp. | 0.664726 | Financials |
2070 | 190 | BAC | Bank of America Corp | 0.533955 | Financials |
2862 | 190 | BXP | Boston Properties | 0.514363 | Real Estate |
2905 | 127 | C | Citigroup Inc. | 0.668533 | Financials |
793 | 127 | AIG | American International Group, Inc. | 0.544492 | Financials |
16575 | 127 | TRV | The Travelers Companies Inc. | 0.536425 | Financials |
12834 | 64 | NTRS | Northern Trust Corp. | 0.600108 | Financials |
15826 | 64 | STT | State Street Corp. | 0.581946 | Financials |
9812 | 64 | JPM | JPMorgan Chase & Co. | 0.549021 | Financials |
13669 | 1 | PFG | Principal Financial Group | 0.658857 | Financials |
2067 | 1 | BAC | Bank of America Corp | 0.626898 | Financials |
1363 | 1 | AMP | Ameriprise Financial | 0.535597 | Financials |
We plotted, for each sector, the number of times its stocks appear among the three highest-betweenness stocks of a window. Financial stocks still lead, with Industrials in second place.
df1=attrib_df.sort_values(['start','Betweeness'],ascending=False).groupby(['start']).head(3).groupby(['Sector']).count()['ticker']
df2=attribMST_df.sort_values(['start','Betweeness'],ascending=False).groupby(['start']).head(3).groupby(['Sector']).count()['ticker']
df3=pd.concat([df1, df2], axis=1).fillna(0)
df3.columns = ['t1','t2']
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(10,10))
df3['t1'].plot(kind='bar',color='red', position=0, width=0.25)
df3['t2'].plot(kind='bar',color='blue', position=1, width=0.25)
plt.xticks(rotation=90,fontsize=8)
plt.ylabel('Count')
plt.legend(['Winner Take All','MST'], loc='center left', bbox_to_anchor=(1, 0.5),ncol=1)
plt.show()
We then looked at the number of communities detected in each window. The variation in this number reflects the dynamic nature of the market: new communities of stocks form in each window, and each of them can either die out or continue into subsequent periods.
import matplotlib.pyplot as plt
%matplotlib inline
cdf=attrib_df.groupby(by=['start'])['Community'].nunique()
cdf.plot()
cdf=attribMST_df.groupby(by=['start'])['Community'].nunique()
cdf.plot()
plt.legend(['Winner Take All','MST'], loc='center left', bbox_to_anchor=(1, 0.5),ncol=1)
plt.xlabel('Days')
plt.ylabel('No of Communities')
plt.show()
One of the key points we wanted to examine in this project was whether the network built from stock prices has any relationship to the SIC (Standard Industrial Classification) code of these stocks. That is, do the Financial, Industrial and IT stocks each trade as a group? To study the similarity between the detected communities and the SIC codes of the stocks in the S and P 500, we computed the Jaccard similarity coefficient between the communities in each time window and the stock list grouped by industry code.
We converted both the stock lists of the detected communities and the stock lists per SIC code into a binary matrix. The presence of a stock in a community/group is denoted by 1 and its absence by 0.
Given two objects, A and B, each with n binary attributes, the Jaccard coefficient is a useful measure of the overlap that A and B share with their attributes. Each attribute of A and B can either be 0 or 1. The total number of each combination of attributes for both A and B are specified as follows:
- M_11 represents the total number of attributes where A and B both have a value of 1.
- M_01 represents the total number of attributes where the attribute of A is 0 and the attribute of B is 1.
- M_10 represents the total number of attributes where the attribute of A is 1 and the attribute of B is 0.
- M_00 represents the total number of attributes where A and B both have a value of 0.
Each attribute must fall into one of these four categories, meaning that M_11 + M_01 + M_10 + M_00 = n.
The Jaccard similarity coefficient, J, is given as

$$J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$$
If a community has a Jaccard coefficient of more than 0.25 with the group of stocks classified under an SIC code, we labelled the community with that SIC code. The plot is below and provides an interesting result.
As the plot shows, some sectors are well correlated or connected, and the communities detected clearly indicate that their stocks trade in groups over different periods:
- Financials
- Real Estate
- Utilities
- Energy
- Telecommunication
- Industrials
However, the interesting result is that certain sectors don't trade as groups. We don't see any communities being detected in these sectors. This seems to indicate that other, stock-specific factors are at play for these sectors and that their stocks don't generally trade as a group:
- Information Technology
- Materials
- Consumer Staples
- Consumer Discretionary
- Health Care
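As a minimal sketch of the labelling rule described above, the toy example below computes the Jaccard coefficient from 0/1 membership vectors and applies the 0.25 threshold. The ticker universe, the sector groups, the community and the helper names (`jaccard`, `to_binary`, `label_community`) are all made up for illustration and are not part of the project code.

```python
def jaccard(a, b):
    """Jaccard coefficient of two equal-length 0/1 vectors: M11 / (M01 + M10 + M11)."""
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    m01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    m10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    denom = m01 + m10 + m11
    return m11 / denom if denom else 0.0


def to_binary(members, universe):
    """Encode group membership as a 0/1 vector over a fixed ticker order."""
    return [1 if ticker in members else 0 for ticker in universe]


def label_community(community, sectors, universe, threshold=0.25):
    """Return the sectors whose Jaccard coefficient with the community exceeds the threshold."""
    c = to_binary(community, universe)
    scores = {name: jaccard(to_binary(members, universe), c)
              for name, members in sectors.items()}
    return {name: score for name, score in scores.items() if score > threshold}


# Hypothetical toy data (not taken from the study).
universe = ["AMP", "BAC", "PFG", "XOM", "COP", "HON"]
sectors = {
    "Financials": {"AMP", "BAC", "PFG"},
    "Energy": {"XOM", "COP"},
    "Industrials": {"HON"},
}
community = {"AMP", "BAC", "XOM"}

labels = label_community(community, sectors, universe)
print(labels)  # {'Financials': 0.5}
```

In this toy case the community shares two of four stocks with Financials (J = 0.5) and is labelled accordingly, while Energy scores exactly 0.25 and is excluded because the threshold is strict.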
cmatrix = attrib_df.groupby(by=['start','Community','ticker'])['ticker'].count().unstack()
cmatrix.fillna(0,inplace=True)
smatrix = attrib_df.groupby(by=['Sector','ticker'])['ticker'].count().unstack()
smatrix.fillna(0,inplace=True)
smatrix[smatrix != 0] = 1
csmatrix=pd.concat([smatrix,cmatrix])
jval = {}
for idxs,rows in smatrix.iterrows():
    xlist = []
    ylist = []
    for idxc,rowc in cmatrix.iterrows():
        x = [int(i) for i in rows]
        y = [int(i) for i in rowc]
        jc = (np.double(np.bitwise_and(x, y).sum()) / np.double(np.bitwise_or(x, y).sum()))
        if jc > 0.25:
            xlist.append(idxc[0])
            ylist.append(jc)
    jval[idxs] = [xlist,ylist]
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(20,10))
cm = plt.get_cmap('gist_rainbow')
NUM_COLORS = 11
colors = [cm(i) for i in np.linspace(0,1,NUM_COLORS)]
color_dict=dict(zip(list(jval.keys()), colors))
for key in jval:
    x = jval[key][0]
    y = jval[key][1]
    plt.scatter(x,y,color=color_dict[key],label=key)
#plt.legend(list(jval.keys()), loc='center left', bbox_to_anchor=(1, 0.5))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.xlabel('Days')
plt.ylabel('Jaccard Similarity')
plt.title('Community Evolution-Winner Take All')
plt.show()
cmatrix = attribMST_df.groupby(by=['start','Community','ticker'])['ticker'].count().unstack()
cmatrix.fillna(0,inplace=True)
smatrix = attribMST_df.groupby(by=['Sector','ticker'])['ticker'].count().unstack()
smatrix.fillna(0,inplace=True)
smatrix[smatrix != 0] = 1
csmatrix=pd.concat([smatrix,cmatrix])
jval = {}
for idxs,rows in smatrix.iterrows():
    xlist = []
    ylist = []
    for idxc,rowc in cmatrix.iterrows():
        x = [int(i) for i in rows]
        y = [int(i) for i in rowc]
        jc = (np.double(np.bitwise_and(x, y).sum()) / np.double(np.bitwise_or(x, y).sum()))
        if jc > 0.25:
            xlist.append(idxc[0])
            ylist.append(jc)
    jval[idxs] = [xlist,ylist]
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(20,10))
cm = plt.get_cmap('gist_rainbow')
NUM_COLORS = 11
colors = [cm(i) for i in np.linspace(0,1,NUM_COLORS)]
color_dict=dict(zip(list(jval.keys()), colors))
for key in jval:
    x = jval[key][0]
    y = jval[key][1]
    plt.scatter(x,y,color=color_dict[key],label=key)
#plt.legend(list(jval.keys()), loc='center left', bbox_to_anchor=(1, 0.5))
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.xlabel('Days')
plt.ylabel('Jaccard Similarity')
plt.title('Community Evolution-MST')
plt.show()
We then merged the graphs from the different windows into a single graph and exported it in GEXF format to visualise the dynamic evolution of the network. Since the networkx GEXF export doesn't support dynamic attributes, we used XML parsing to rewrite the GEXF file so that it includes the dynamic attributes, and then visualised the network and its dynamic evolution in Gephi.
A video of the result is available here:
https://www.youtube.com/watch?v=KGpEs97YWJ4
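The XML rewriting step can be illustrated in isolation. The sketch below is not the project code: it uses a hand-written one-node GEXF fragment, a made-up `window_values` dictionary and an assumed window length `W = 63` (the spacing between window starts in the table above). It only shows how `start`/`end` attributes can be attached to each `attvalue` element with `xml.etree.ElementTree`, and does not claim to produce a complete Gephi-ready file.

```python
import xml.etree.ElementTree as ET

NS = "http://www.gexf.net/1.1draft"  # same namespace as in the cells below
ET.register_namespace("", NS)

# Hypothetical static export containing a single node with one attribute value.
STATIC_GEXF = f"""<gexf xmlns="{NS}" version="1.1">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="AMP" label="AMP">
        <attvalues>
          <attvalue for="0" value="0"/>
        </attvalues>
      </node>
    </nodes>
  </graph>
</gexf>"""

W = 63  # assumed window length in days
# Made-up per-window attribute values: window start -> {node id -> [values]}.
window_values = {1: {"AMP": ["10"]}, 64: {"AMP": ["12"]}}

root = ET.fromstring(STATIC_GEXF)
# Materialise the node list first so the tree is not modified while iterating.
for node in list(root.iter(f"{{{NS}}}node")):
    attvalues = node.find(f"{{{NS}}}attvalues")
    # Drop the static values written by the exporter.
    for old in list(attvalues):
        attvalues.remove(old)
    # Add one time-stamped value per window in which the node appears.
    for start in sorted(window_values):
        for idx, value in enumerate(window_values[start].get(node.get("id"), [])):
            av = ET.SubElement(attvalues, f"{{{NS}}}attvalue")
            av.set("for", str(idx))
            av.set("value", value)
            av.set("start", str(start))
            av.set("end", str(start + W - 1))

dynamic_xml = ET.tostring(root, encoding="unicode")
print(dynamic_xml)
```

Each window contributes its own `attvalue` with a validity interval, which is the same idea the cells below apply to every node of the merged graph.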
# Constructing a graph
import networkx as nx
M=nx.MultiGraph()
for y in G_dict.keys():
    M.add_nodes_from(G_dict[y].nodes(data=True))
    M.add_edges_from(G_dict[y].edges(data=True))
nx.write_gexf(M,'data/price-sp500.gexf')
import xml.etree.ElementTree as ET
tree = ET.parse('data/price-sp500.gexf')
root = tree.getroot()
for element in root.iter("{http://www.gexf.net/1.1draft}node"):
    noden = element.attrib['id']
    for child in element:
        # remove the static attribute values written by the exporter
        for i in range(len(child)):
            child.remove(child[0])
        # add one time-stamped value per attribute for every window
        for y in G_dict.keys():
            T = G_dict[y]
            n = 0
            try:
                # note: networkx >= 2.4 removed G.node; use T.nodes[noden] there
                for att in T.node[noden]:
                    new = ET.Element("{http://www.gexf.net/1.1draft}attvalue")
                    new.set('for',str(n))
                    new.set('value',str(T.node[noden][att]))
                    new.set('start',str(y))
                    new.set('end',str(y+W-1))
                    child.append(new)
                    n = n + 1
            except KeyError:
                # the node is not present in this window
                continue
# copy each edge's start/end onto its attribute values
for element in root.iter("{http://www.gexf.net/1.1draft}edge"):
    for child in element:
        for i in range(len(child)):
            child[i].set('start',str(element.attrib['start']))
            child[i].set('end',str(element.attrib['end']))
tree.write('data/price-sp500-o.gexf')
tree.write('data/price-sp500-o.gexf')
# Constructing a graph
import networkx as nx
M=nx.MultiGraph()
for y in GMST_dict.keys():
    M.add_nodes_from(GMST_dict[y].nodes(data=True))
    M.add_edges_from(GMST_dict[y].edges(data=True))
nx.write_gexf(M,'data/price-MST-sp500.gexf')
import xml.etree.ElementTree as ET
tree = ET.parse('data/price-MST-sp500.gexf')
root = tree.getroot()
for element in root.iter("{http://www.gexf.net/1.1draft}node"):
    noden = element.attrib['id']
    for child in element:
        # remove the static attribute values written by the exporter
        for i in range(len(child)):
            child.remove(child[0])
        # add one time-stamped value per attribute for every window
        for y in GMST_dict.keys():
            T = GMST_dict[y]
            n = 0
            try:
                # note: networkx >= 2.4 removed G.node; use T.nodes[noden] there
                for att in T.node[noden]:
                    new = ET.Element("{http://www.gexf.net/1.1draft}attvalue")
                    new.set('for',str(n))
                    new.set('value',str(T.node[noden][att]))
                    new.set('start',str(y))
                    new.set('end',str(y+W-1))
                    child.append(new)
                    n = n + 1
            except KeyError:
                # the node is not present in this window
                continue
# copy each edge's start/end onto its attribute values
for element in root.iter("{http://www.gexf.net/1.1draft}edge"):
    for child in element:
        for i in range(len(child)):
            child[i].set('start',str(element.attrib['start']))
            child[i].set('end',str(element.attrib['end']))
tree.write('data/price-MST-sp500-o.gexf')
In conclusion, we have been able to answer our questions and to find some interesting results on the dynamics of the equities in the S and P 500 index through Network Analysis.
We have been able to
1. Build networks of the stocks in the S and P 500 for the various phases of the stock market between 2007 and 2017, split into time periods, based on the Winner Take All and the Minimum Spanning Tree methods
2. Analyse the networks and find that they do exhibit scale-free properties
3. Detect communities in these networks and compare them with the groups of stocks defined by the SIC code using Jaccard similarity
4. Study the evolution of these communities and note the key result that some sectors trade together as a group and some don't
5. Leverage the properties of the network to find the important stocks and sectors which reflect the movement of other stocks in the network
6. Identify stocks/sectors which, based on their betweenness centrality, are ideally placed to predict the movement of prices in the market
7. Visualise the dynamic evolution of the network by building dynamic graphs using Gephi
The analysis and the results provide an interesting insight into the stock market and its dynamic nature. The results from this study, together with further research, can be used in areas such as portfolio optimisation and risk mitigation. Further analysis can also be done by building networks across different asset classes, such as currencies and commodities, and studying their evolution and behaviour over time.
from IPython.display import Image
Image(filename='data/MST-Period-1702.png')
from IPython.display import Image
Image(filename='data/WTA-period-1702.png')
from IPython.display import Image
Image(filename='data/WTA-period-568.png')