I am sure you might have experienced many racing charts. In this note, we will make an attempt to create one to describe the Indian telecom subscribers' growth since 2013 till date. You will be surprised that this is pretty simple unless we plan to customize it for more flexibility.
Data Preparation
The raw looks something like this.
Now we have to process the data so that it can be used for the purpose of visualization. The following code serves the purpose.
import os
import pandas as pd
from datetime import datetime
#changing directory
os.chdir("/Users/pkar/Documents/Documents - Parag’s MacBook Pro/parag_qc_files/2022/2022_11_26_New_Data_Processing/07_TRAI_Files") #working dir
print(os.getcwd())
#reading file from disk
df = pd.read_excel("2020_05_14_Telecom_Wireless_&_Wireline.xlsx")
#filtering all data below the year 2013
df = df.query("Year > 2012")
#converting each columns of the dataframe to specific data type
convert_dict = {'Year':int, 'Months':int, 'Circle':str, 'Category':str, 'Bharti':int, 'BSNL':int,
'Voda Idea':int, 'MTNL':int, 'RCOM':int, 'RJIO':int, 'Vodafone':int, 'Tata':int, 'Idea':int,
'Aircel':int, 'Telenor':int, 'MTS':int, 'Loop':int, 'Videocon':int, 'Quad':int, 'BSNL VNO':int}
df = df.astype(convert_dict)
#converting the date column data to date type
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by=["Year","Date"])
df.reset_index(drop=True)
#grouping the data anchored around Date and Year
df1 = df.groupby(["Date","Year"]).sum()
df1=df1.reset_index()
#dropping all irrelavent columns from dataframe
df1.drop(["Months","Year"], axis = 1, inplace = True)
#tranposing the dataframe so that subscriber values of all operators falls in one single column
df1=df1.melt(id_vars=['Date'], value_vars=columns)
#naming the columns of new data frame properly
df1.columns = ["Date","Operator","Subscriber"]
#formating the date column so that it is palatable for procesing.
df1['Date'] = pd.to_datetime(df1['Date'], format='%d-%b-%Y').dt.date
#converting subscriber numbers to millions
df1["Subscriber"]=df1["Subscriber"]/1000000
df1["Subscriber"] = df1["Subscriber"].round()
This code is self-explanatory, the output data frame that emanates out of this processing looks something like this.
Data Visualization
Once the data gets processed in the proper format then the next steps are pretty simple. Here is the code that we have used for this purpose.
import pandas as pd
import plotly
import plotly.express as px
%matplotlib inline
init_notebook_mode(connected=True)
from raceplotly.plots import barplot
my_raceplot = barplot(df1,
item_column='Operator',
value_column='Subscriber',
time_column='Date',
top_entries= 10)
fig = my_raceplot.plot(item_label = 'Indian Telecom Operators',
value_label = 'Subscribers in Millions',
frame_duration = 200)
fig.update_layout(
title="Indian Top 10 Operator's Subscribers Trend",
title_x=0.10,
width=800,
height=600,
paper_bgcolor="floralwhite",
)
plotly.offline.plot(fig,filename = "Telecom Subs Racing Bars.html")
The final output looks something like this.
For experiencing the live link, please visit the following site.
https://paragkar.com/wp-content/uploads/2023/03/Telecom-Subs-Racing-Bars.html
Documentation for raceplotly is here — https://pypi.org/project/raceplotly/
Thanks for reading.
(I am aggregating all the articles on this topic here, for easy discovery and reference.)