Python — Subplot for displaying the Inflation of Indian States
The purpose of this note is to create a set of subplots that display the rising inflation of the top 12 Indian states using Python. In other words, the chart should display not only the latest inflation number but also the CPI trend for both the Urban and Rural dimensions.
Data Preparation
The Ministry of Statistics publishes the CPI data for every Indian state on a monthly basis. Unfortunately, this data is embedded in PDF tables, which we need to extract and aggregate into a single Excel sheet for processing. The following chart gives us a snapshot of this raw data, which we will process using Python to create the subplots.
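The processing code in this note relies on a handful of column names in the aggregated sheet. As a rough guide only, the sketch below builds a tiny stand-in table with those columns and checks that none is missing. Every number and the state names "State A" and "State B" are made up for illustration (they are not ministry data), and the helper names `make_sample_cpi` and `missing_columns` are hypothetical.

```python
import pandas as pd

# Column names taken from the processing and plotting code in this note.
EXPECTED_COLUMNS = [
    "State/UT", "Date", "Rural Index", "Urban Index",
    "Combined Index", "Combined Weights", "Combined inflation",
]

def make_sample_cpi():
    """Build a tiny stand-in table; all values are invented placeholders."""
    rows = [
        # state, date, rural, urban, combined, weight, inflation (as a fraction)
        ("Pan India", "2022-12-31", 170.0, 168.0, 169.0, 100.0, 0.050),
        ("State A", "2022-12-31", 171.0, 167.0, 169.5, 10.0, 0.040),
        ("State B", "2022-12-31", 172.0, 170.0, 171.0, 15.0, 0.060),
        ("Pan India", "2023-01-31", 171.0, 169.0, 170.0, 100.0, 0.052),
        ("State A", "2023-01-31", 172.0, 168.0, 170.5, 10.0, 0.041),
        ("State B", "2023-01-31", 173.0, 171.0, 172.0, 15.0, 0.063),
    ]
    df = pd.DataFrame(rows, columns=EXPECTED_COLUMNS)
    df["Date"] = pd.to_datetime(df["Date"])  # dates parsed so they can be filtered
    return df

def missing_columns(df):
    """Return the expected columns that are absent from df."""
    return [c for c in EXPECTED_COLUMNS if c not in df.columns]

sample = make_sample_cpi()
print(missing_columns(sample))  # an empty list means the layout matches
```

Running `missing_columns` on the real sheet right after loading it is a cheap way to catch a renamed or misspelled header before any plotting starts.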
Data Processing
The following code processes this data so that it can be fed into the plotting module.
#importing libraries
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
#setting plotting style
plt.style.use('fivethirtyeight')
color_pal=plt.rcParams["axes.prop_cycle"].by_key()["color"]
#changing dir
os.chdir("path of the working dir") #working dir
print(os.getcwd())
#naming file
file ="2022_05_18_State_CPI_Index.xlsx"
#loading file from disc
df= pd.read_excel(file)
#extracting the rows for the latest month (one row per state)
latest = df.query("Date == '2023-01-31'")
#keeping the state name, combined weight and latest combined inflation in one single dataframe
#(taking all three columns from the same rows keeps the values aligned for each state)
df1 = latest[["State/UT","Combined Weights","Combined inflation"]].copy()
#sorting the dataframe by "Combined Weights"
df1 = df1.sort_values(by=["Combined Weights"], axis=0, ascending=False)
#resetting the row index of df1
df1.reset_index(drop=True, inplace = True)
#dropping the first row, which is "Pan India"
df1 = df1[1:]
#storing the names of the states for iterating
states = df1["State/UT"]
The purpose of the above code is to create another DataFrame that holds the names of the states, their weights, and the latest inflation number (for January 2023). The output DataFrame is shown in the following picture.
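The same summary step can also be wrapped in a small function, which makes it easy to rerun for another month. This is only a sketch under stated assumptions: the function name `latest_summary` and its parameters are hypothetical, it drops the all-India row by its label (assumed here to be exactly "Pan India") rather than by position, and the demo rows are invented placeholders.

```python
import pandas as pd

def latest_summary(df, date, top_n=12, total_label="Pan India"):
    """Return state, weight and inflation for one date, heaviest states first."""
    cols = ["State/UT", "Combined Weights", "Combined inflation"]
    # All three columns come from the same rows, so they stay aligned per state.
    latest = df.loc[df["Date"] == pd.Timestamp(date), cols]
    # Drop the all-India aggregate by label (assumed spelling).
    latest = latest[latest["State/UT"] != total_label]
    return (latest.sort_values("Combined Weights", ascending=False)
                  .head(top_n)
                  .reset_index(drop=True))

# Invented placeholder rows, not ministry data.
demo = pd.DataFrame({
    "State/UT": ["Pan India", "State A", "State B"] * 2,
    "Date": pd.to_datetime(["2022-12-31"] * 3 + ["2023-01-31"] * 3),
    "Combined Weights": [100.0, 10.0, 15.0, 100.0, 10.0, 15.0],
    "Combined inflation": [0.050, 0.040, 0.060, 0.052, 0.041, 0.063],
})

summary = latest_summary(demo, "2023-01-31")
print(summary)
```

Filtering by label instead of slicing off the first row means the result does not depend on the all-India row happening to carry the largest weight.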
Creating Subplots
The following is the code that I have used for creating the subplots.
#laying out the subplots canvas
fig, axs = plt.subplots(4,3, figsize=(25,25), sharex=True)
axs = axs.flatten()
#looping over the top 12 states only (the canvas has 4 x 3 = 12 slots)
for i, state in enumerate(states.iloc[:12]):
    #filtering the original DF for the specific state and resetting its row index
    df2 = df[df["State/UT"]==state].reset_index(drop=True)
    df2 = df2.replace("--",0, regex=True) #replacing chr with zero
    df2 = df2.replace("-",0, regex=True) #replacing chr with zero
    ax2 = axs[i].twinx() #creating a twin axis in the plot canvas
    #preparing axis for the first plot = "Rural CPI"
    ax1 = df2.plot(ax=axs[i], kind='line', x="Date", y="Rural Index", lw=4)
    #preparing axis for the second plot = "Urban CPI"
    ax2 = df2.plot(ax=ax2, kind='line', x="Date", y="Urban Index", lw=4, color=color_pal[1])
    # ax3 = df1.plot(ax = axs[i], kind='line',x="Date",y="Combined Index",lw=3.5,color = color_pal[2])
    inflation = round(df1.values[i,2]*100,2) #extracting the value of inflation for display
    weight = round(df1.values[i,1],2) #extracting the value of the weight for display
    legend = state+" ("+str(inflation)+"%)"+" ["+str(weight)+"]" #preparing the title text
    ax1.set_title(legend, fontsize=24) #printing the title
    ax1.legend().remove() #preventing the legend from display
    ax2.legend().remove() #preventing the legend from display
    ax2.grid(False) #removing the grid for the 2nd plot
    # ax3.legend().remove()
    # ax3.grid(False)
    ax1.set_facecolor("white") #setting up the background as white
plt.tight_layout()
plt.show()
The code is commented throughout to make it largely self-explanatory. It simply loops over the states (already sorted by their weights) and plots each one in its own slot on the canvas that was laid out to hold these subplots.
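For reuse with other metrics, the loop can be reduced to a small function that draws one panel per state with a twin axis. The sketch below is one possible arrangement, not the exact code used for the figure: the name `plot_state_grid`, its parameters, the fixed "tab:blue" and "tab:red" colors, and the demo values are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_state_grid(df, states, nrows=1, ncols=2):
    """Draw one panel per state: rural index on the left axis, urban on a twin axis."""
    fig, axs = plt.subplots(nrows, ncols, figsize=(6 * ncols, 4 * nrows),
                            sharex=True, squeeze=False)
    panels = []
    # zip stops at the shorter input, so extra states beyond the grid are ignored.
    for ax_left, state in zip(axs.flatten(), states):
        one = df[df["State/UT"] == state]
        ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
        ax_left.plot(one["Date"], one["Rural Index"], color="tab:blue", lw=2)
        ax_right.plot(one["Date"], one["Urban Index"], color="tab:red", lw=2)
        ax_right.grid(False)  # avoid two overlapping grids
        ax_left.set_title(state)
        panels.append((ax_left, ax_right))
    fig.tight_layout()
    return fig, panels

# Invented placeholder values, not ministry data.
demo = pd.DataFrame({
    "State/UT": ["State A"] * 3 + ["State B"] * 3,
    "Date": pd.to_datetime(["2022-11-30", "2022-12-31", "2023-01-31"] * 2),
    "Rural Index": [170.0, 171.0, 172.0, 168.0, 169.5, 171.0],
    "Urban Index": [166.0, 167.0, 168.5, 169.0, 170.0, 171.5],
})

fig, panels = plot_state_grid(demo, ["State A", "State B"])
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the result. For the 12-state layout described in this note, the call would use `nrows=4, ncols=3`.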
The output is captured in the figure below.
In this figure, the red line indicates the Urban CPI trend, and the blue line indicates the Rural CPI trend. The values enclosed in round brackets are the latest inflation numbers, and the ones in square brackets are the weights attached to each state. Note that these top 12 states make up 80% of the overall basket, and hence they have the maximum impact on the overall Pan-India numbers. Though the “Rural” and “Urban” CPI numbers are closely related, some divergence can be seen in some states. The code is written so that it can be used as a reference for processing and plotting other metrics where such a bird’s eye view is needed. The processed version of the same data using Tableau can be found here. Thanks for reading.
(I am aggregating all the articles on this topic here, for easy discovery and reference.)