Awards and streams
We wanted to see whether the number of awards an artist has won increases with the artist's number of streams, in other words whether artists who are better liked by the critics also get more streams on Spotify. For the streams we used the Spotify Weekly Top 200 Songs Streaming Data. For the awards won by each artist we used Data on Songs from Billboard 1999-2019, specifically grammyAlbums_199-2019.csv and grammySongs_1999-2019.csv. We merged these datasets and plotted awards against streams with a trendline. Judging by the trendline and Pearson’s r (a measure of the strength of a linear correlation), which is about 0.10, there is hardly any linear correlation between the two.
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
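# Load the cleaned streaming data
df = pd.read_csv('../cleaned/final.csv', low_memory=False)
# Drop rows whose 'streams' value is the literal text 'streams' (presumably repeated header rows)
df = df.loc[df['streams'] != 'streams']
# Use 64-bit integers explicitly: per-artist totals run into the billions
df['streams'] = df['streams'].astype('int64')
# Total streams per artist
df = df.groupby('artist_individual', as_index=False).agg({'streams': 'sum'})
df = df.rename(columns={'artist_individual': 'Artist'})
df = df.sort_values('streams')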
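# Load the Grammy album and song awards and stack them into one table
df2 = pd.read_csv("../cleaned/grammyAlbums_199-2019.csv", index_col=0)
df3 = pd.read_csv("../cleaned/grammySongs_1999-2019.csv", index_col=0)
df3.rename(columns={'GrammyAward': 'Award'}, inplace=True)
df2 = pd.concat([df2, df3])
# count() tallies non-null values per artist, so 'Award' becomes the number of awards
df2 = df2.groupby('Artist', as_index=False).count()
df2 = df2.loc[df2['Artist'] != 'Various Artists']
# Keep only artists with more than one award
df2 = df2.loc[df2['Award'] > 1]
df2.sort_values('Award', inplace=True)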
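# Inner merge: keep only artists that appear in both the streaming and the awards data
df4 = pd.merge(df, df2, on='Artist')
px.scatter(df4, x='streams', y='Award', hover_data=['Artist'], trendline='ols', trendline_color_override="orange", title='Awards and streams per artist').show()
# Pearson's r between total streams and number of awards
print(df4['streams'].corr(df4['Award']))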
0.10362306158825516
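As a reference for the number printed above, here is a minimal sketch of what Pearson’s r measures: the covariance of the two variables divided by the product of their standard deviations. The `pearson_r` helper and the `toy` values are hypothetical and made up for illustration; they are not taken from the datasets used here. The sketch only checks that a hand-written version agrees with `Series.corr`, whose default method is Pearson.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson's r: sum of co-deviations over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up illustrative values (assumption), not real artists
toy = pd.DataFrame({
    "streams": [1.2e9, 3.4e9, 0.8e9, 5.1e9, 2.2e9],
    "Award": [2, 3, 2, 6, 4],
})

manual = pearson_r(toy["streams"], toy["Award"])
builtin = toy["streams"].corr(toy["Award"])  # pandas uses method="pearson" by default
print(manual, builtin)
```

The value always lies between -1 and 1. Values near 0 mean there is little linear relationship, although a non-linear one may still exist.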
It is interesting to note that a trend starts to form when we remove the outliers, that is, the artists with more than 10 billion streams and the artists with fewer than 500 million streams. Here Pearson’s r is about 0.54, which indicates a moderately strong positive linear relationship within this range of streams.
# Keep only artists with more than 500 million and fewer than 10 billion total streams
df = df.loc[(df['streams'] > 500_000_000) & (df['streams'] < 10_000_000_000)]
df4 = pd.merge(df, df2, on='Artist')
px.scatter(df4, x='streams', y='Award', hover_data=['Artist'], trendline='ols', trendline_color_override="orange", title='Awards and streams per artist').show()
# Pearson's r on the filtered data
print(df4['streams'].corr(df4['Award']))
0.5366735869775413
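Because the result depends on where the cut-offs are placed, it can be useful to see how Pearson’s r changes with the chosen stream range. The following is a minimal sketch under stated assumptions: `corr_in_range` is a hypothetical helper, and the data is synthetic (random numbers plus three artificial extreme points), so the printed values say nothing about the real artists.

```python
import numpy as np
import pandas as pd


def corr_in_range(frame, low, high, x="streams", y="Award"):
    """Return (Pearson's r, row count) for rows with low < frame[x] < high."""
    subset = frame.loc[(frame[x] > low) & (frame[x] < high)]
    return subset[x].corr(subset[y]), len(subset)


# Synthetic data for illustration only (assumption): awards grow with streams plus noise
rng = np.random.default_rng(0)
streams = rng.uniform(5e8, 1e10, size=50)
awards = np.round(2 + streams / 2e9 + rng.normal(0, 1, size=50))
toy = pd.DataFrame({"streams": streams, "Award": awards})

# Three artificial extreme points: very many streams, few awards
extreme = pd.DataFrame({"streams": [3e10, 4e10, 5e10], "Award": [2, 2, 3]})
toy = pd.concat([toy, extreme], ignore_index=True)

r_all, n_all = corr_in_range(toy, 0, float("inf"))
r_trim, n_trim = corr_in_range(toy, 500_000_000, 10_000_000_000)
print(f"all rows:     r={r_all:.2f} (n={n_all})")
print(f"trimmed rows: r={r_trim:.2f} (n={n_trim})")
```

Reporting the number of rows next to each coefficient makes it visible how much data a given cut-off removes. A correlation that appears only after trimming describes the remaining range, not all artists.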