Python Matplotlib Tutorial | Data Visualization in Python - Part 1

Author : Analytics Educator

Import the library matplotlib.pyplot

matplotlib is a Python library for creating many types of data visualization; pyplot is the module of it that we use for plotting.

In [1]:
import matplotlib.pyplot

We will learn how to create a scatter plot

We will create 2 data sets (stored as lists in Python)

Weight of the car (wt) & miles per gallon (mpg), i.e. the mileage of the car

In [2]:
# creating lists
mpg = [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4]
wt = [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78]
In [3]:
# Create the scatter plot
# The first list (mpg) goes on the x axis, the second list (wt) on the y axis
matplotlib.pyplot.scatter(mpg,wt)
Out[3]:
<matplotlib.collections.PathCollection at 0x18ec74a8>

We may give matplotlib.pyplot a shorter name (an alias) while importing the library

In [9]:
import matplotlib.pyplot as plt
#(any name may be used instead of plt, but plt is the name data scientists use by convention)
In [5]:
# Create the scatter plot once again with the shorter name
plt.scatter(mpg,wt) 
# We will get the same result as above
Out[5]:
<matplotlib.collections.PathCollection at 0x6888358>
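
To see that the alias changes nothing, the short sketch below checks that both names refer to the same module object. The variable name same_module is illustrative, and the Agg backend is an assumption made only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this also runs as a script
import matplotlib.pyplot
import matplotlib.pyplot as plt

# "plt" is only a second name for the same module, not a copy of it
same_module = plt is matplotlib.pyplot
print(same_module)  # True
```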
In [6]:
# You may change the colour with the "c" parameter
plt.scatter(mpg,wt, c= "red")
Out[6]:
<matplotlib.collections.PathCollection at 0x68edcf8>
In [10]:
# You may also give the colour as a hex colour code
plt.scatter(mpg,wt, c= "#ed7c31")
# "#ed7c31" is orange; "#0273bf" is blue
Out[10]:
<matplotlib.collections.PathCollection at 0x5536128>
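
A hex colour code has the form "#RRGGBB": two hexadecimal digits each for red, green and blue. As a small sketch (the variable names are illustrative), matplotlib.colors.to_rgb converts such a code into three fractions between 0 and 1, which we scale back to the familiar 0 to 255 range:

```python
import matplotlib.colors as mcolors

# Convert the orange used above into red, green and blue fractions (0 to 1)
orange = mcolors.to_rgb("#ed7c31")

# Scale back to 0..255 to compare with the hex digits
red_255 = round(orange[0] * 255)    # hex ed = 237
green_255 = round(orange[1] * 255)  # hex 7c = 124
blue_255 = round(orange[2] * 255)   # hex 31 = 49
print(red_255, green_255, blue_255)
```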
In [15]:
# Change the size of the dots with the "s" parameter
plt.scatter(mpg,wt, c= "purple",s=50)
Out[15]:
<matplotlib.collections.PathCollection at 0x5670d30>
In [16]:
# Change the limits of the x axis with plt.xlim and of the y axis with plt.ylim

plt.scatter(mpg,wt, c= "red",s=50)
plt.xlim(10,40)
plt.ylim(0,6)
Out[16]:
(0, 6)
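
The scatter options above can be combined in one cell. The sketch below uses only the first five values of the mpg and wt lists; the axis labels and title text are illustrative (plt.xlabel, plt.ylabel and plt.title are covered in the line chart section), and the Agg backend is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# First five values of the tutorial's mpg and wt lists
mpg_sample = [21, 21, 22.8, 21.4, 18.7]
wt_sample = [2.62, 2.875, 2.32, 3.215, 3.44]

plt.figure()
points = plt.scatter(mpg_sample, wt_sample, c="#0273bf", s=50)
plt.xlim(10, 40)
plt.ylim(0, 6)
plt.xlabel("Miles per gallon (mpg)")  # first argument of scatter is the x axis
plt.ylabel("Weight (wt)")             # second argument of scatter is the y axis
plt.title("Weight vs. mileage")       # illustrative title

n_points = len(points.get_offsets())  # number of dots drawn
```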

We will learn how to create a line chart

We will create 2 data sets (stored as lists in Python)

Goals scored by 2 football teams, East Bengal (EB) & Mohun Bagan (MB)

In [7]:
# Creating the lists

EB = [6 ,	0 ,	2 ,	1 ,	2 ,	1 ,	2 ,	2 ,	2 ,	0 ,	1 ,	3 ,	0 ,	3 ,	1 ,	0 ,	4 ,	1 ,	3 ,	3 ,	6 ,	2 ,	0 ,	0 ,	4 ,	1 ,	2 ,	1 ,	1 ,	0 ,	1 ,	3 ,	0 ,	1 ,	1 ,	1 ,	3 ,	1 ,	3 ,	0 ,	2 ,	2 ,	2 ,	1 ,	2 ,	2 ,	1 ,	2 ,	1 ,	2 ,	1 ,	2 ,	4 ,	0 ,	1 ,	1 ,	2 ,	2 ,	2 ,	0 ,	2 ,	2 ,	0 ,	0 ,	2 ,	3 ,	2 ,	2 ,	3 ,	0 ,	3 ,	3 ,	0 ,	1 ,	0 ,	0 ,	2 ,	1 ,	1 ,	3 ,	1 ,	3 ,	1 ,	1 ,	0 ,	2 ,	0 ,	3 ,	3 ,	3 ,	2 ,	0 ,	0 ,	1 ,	4 ,	1 ,	0 ,	2 ,	1 ,	1 ,	2 ,	2 ,	0 ,	2 ,	1 ,	1 ,	2 ,	3 ,	1 ,	4 ,	4 ,	1 ,	2 ,	2 ,	2 ,	1 ,	0 ,	0 ,	5 ,	1 ,	1 ,	1 ,	0 ,	1 ,	1 ,	0 ,	2 ,	1 ,	3 ,	2 ,	1 ,	1 ,	1 ,	0 ,	1 ,	1 ,	3 ,	2 ,	1 ,	3 ,	1 ,	2 ,	1 ,	1 ]
MB = [0 ,	2 ,	1 ,	0 ,	0 ,	1 ,	1 ,	1 ,	2 ,	2 ,	0 ,	1 ,	2 ,	1 ,	0 ,	0 ,	4 ,	2 ,	0 ,	1 ,	0 ,	1 ,	4 ,	0 ,	1 ,	3 ,	0 ,	1 ,	0 ,	2 ,	5 ,	0 ,	0 ,	1 ,	3 ,	2 ,	1 ,	0 ,	3 ,	1 ,	3 ,	0 ,	1 ,	1 ,	1 ,	1 ,	1 ,	2 ,	0 ,	0 ,	2 ,	0 ,	0 ,	0 ,	1 ,	1 ,	2 ,	1 ,	1 ,	0 ,	1 ,	1 ,	1 ,	3 ,	0 ,	1 ,	1 ,	1 ,	3 ,	3 ,	1 ,	0 ,	0 ,	3 ,	2 ,	0 ,	0 ,	3 ,	2 ,	0 ,	0 ,	0 ,	1 ,	0 ,	3 ,	5 ,	2 ,	2 ,	1 ,	1 ,	3 ,	0 ,	1 ,	0 ,	2 ,	2 ,	1 ,	2 ,	2 ,	4 ,	1 ,	1 ,	0 ,	2 ,	1 ,	1 ,	1 ,	1 ,	3 ,	1 ,	0 ,	1 ,	1 ,	1 ,	3 ,	1 ,	2 ,	0 ,	0 ,	1 ,	0 ,	0 ,	1 ,	0 ,	0 ,	1 ,	2 ,	0 ,	1 ,	0 ,	1 ,	1 ,	0 ,	1 ,	2 ,	1 ,	0 ,	1 ,	2 ,	0 ,	1 ,	0 ,	1 ,	2 ]
In [10]:
# Create a line chart for EB (plt.plot draws a line chart)
plt.plot(EB)
Out[10]:
[<matplotlib.lines.Line2D at 0x18fb6da0>]
In [20]:
# Draw both lines in the same graph and limit the x axis to 0 to 145 to remove the white space at the end
plt.plot(EB)
plt.plot(MB)
plt.xlim(0,145)
Out[20]:
(0, 145)
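
When plt.plot receives a single list, it treats the values as y values and uses their positions (0, 1, 2, ...) as x values. A minimal sketch with a hypothetical short list (the Agg backend is an assumption so the snippet also runs outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

goals = [6, 0, 2, 1]             # hypothetical short series
plt.figure()
line, = plt.plot(goals)          # only the y values are given

x_used = list(line.get_xdata())  # positions supplied by matplotlib
y_used = list(line.get_ydata())  # the values we passed in
print(x_used, y_used)
```

This is why the x axis of the EB and MB charts runs from 0 up to the number of matches.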
In [21]:
# Change the colour to brown & show the charts separately by putting plt.show() between the plot statements of EB & MB
# Lines of code that are new in a cell (not present in the previous cell) are marked with #new code
plt.plot(EB, c="brown")#new code
plt.show()#new code
plt.plot(MB)
plt.xlim(0,145)
Out[21]:
(0, 145)
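
Another way to keep two charts apart is to open each one explicitly with plt.figure(). This is a sketch of that alternative, not what the cell above does; the lists team_a and team_b are hypothetical, and the Agg backend is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

plt.close("all")               # start from a clean state
team_a = [6, 0, 2, 1]          # hypothetical data
team_b = [0, 2, 1, 0]

plt.figure()                   # first figure
plt.plot(team_a, c="brown")

plt.figure()                   # second, separate figure
plt.plot(team_b)

n_figures = len(plt.get_fignums())  # how many figures are open
print(n_figures)
```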
In [22]:
# Add a title with plt.title (a title is given to each of the two charts)

plt.plot(EB, c="brown")
plt.title("Plot no. 1")#new code
plt.show()
plt.plot(MB)
plt.title("Plot no. 2")#new code
plt.xlim(0,145)
Out[22]:
(0, 145)
In [23]:
# Name the x axis and y axis with plt.xlabel and plt.ylabel
# Here we have removed plt.show(), hence both lines are plotted on the same graph
plt.plot(EB, c="brown")
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")#new code
plt.ylabel("Mohun Bagan")#new code
Out[23]:
Text(0,0.5,'Mohun Bagan')

The x axis & y axis are marked with values such as 0, 20, 40 & 0, 1, 2 respectively; these marks are called ticks. If we wish, we may replace them with custom tick labels, e.g. 0 = Bad, 60 = Average & 100 = Good

In [24]:
# Custom tick positions and labels for the x axis or y axis can be set with plt.xticks or plt.yticks

plt.plot(EB, c="brown")
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.xticks([0,60,100,140],["Bad","Avg","Good","super",])#new code
                
Out[24]:
([<matplotlib.axis.XTick at 0x1a549518>,
  <matplotlib.axis.XTick at 0x1a542e10>,
  <matplotlib.axis.XTick at 0x1a542a58>,
  <matplotlib.axis.XTick at 0x1a565d68>],
 <a list of 4 Text xticklabel objects>)
In [27]:
# We may change the angle of the tick labels (Bad, Avg etc.) with the rotation parameter
plt.plot(EB, c="brown")
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.xticks([0,60,100,140],["Bad","Avg","Good","super",],
           rotation = 45)#new code; we may also set it to 90 degrees
Out[27]:
([<matplotlib.axis.XTick at 0x1a463f98>,
  <matplotlib.axis.XTick at 0x1a4549b0>,
  <matplotlib.axis.XTick at 0x1a4545f8>,
  <matplotlib.axis.XTick at 0x1a40ceb8>],
 <a list of 4 Text xticklabel objects>)
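
plt.xticks takes two lists of the same length: the positions on the axis and the text to show at each position. It returns the tick locations and the label objects, so we can check what was set. A small sketch with hypothetical data (the Agg backend is an assumption so the snippet also runs outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 50, 100])  # hypothetical data, plotted at x = 0, 1, 2

# first list: positions on the x axis; second list: text shown there
locs, labels = plt.xticks([0, 1, 2], ["Bad", "Avg", "Good"], rotation=45)

label_texts = [t.get_text() for t in labels]       # text of each tick label
label_angles = [t.get_rotation() for t in labels]  # angle of each label in degrees
print(label_texts, label_angles)
```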
In [28]:
## We may also draw the line with dashes or dots instead of a solid line
# Change the line style of EB with the ls (linestyle) parameter

plt.plot(EB, c="brown", ls="--")#new code
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.xticks([0,60,100,140],["Bad","Avg","Good","super",],rotation=45)
Out[28]:
([<matplotlib.axis.XTick at 0x1a3bcf98>,
  <matplotlib.axis.XTick at 0x1a3b1c88>,
  <matplotlib.axis.XTick at 0x1a3b18d0>,
  <matplotlib.axis.XTick at 0x6befc18>],
 <a list of 4 Text xticklabel objects>)
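
Besides "--", the ls parameter accepts a few other short codes. The sketch below draws one hypothetical line per style: "-" (solid), "--" (dashed), "-." (dash-dot) and ":" (dotted). The Agg backend is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

styles = ["-", "--", "-.", ":"]  # solid, dashed, dash-dot, dotted

plt.figure()
lines = []
for offset, style in enumerate(styles):
    # shift each line up by `offset` so the four styles do not overlap
    line, = plt.plot([offset, offset + 1, offset], ls=style)
    lines.append(line)

used_styles = [line.get_linestyle() for line in lines]
print(used_styles)
```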

You may also add extra text to the chart to point something out; this is called an annotation. For example, we may want to point out the highest spike of the graph.

In [29]:
# Annotate
# Use plt.annotate to add an annotation
# xy is the point we want to highlight (its x and y coordinates)
# Here, EB reaches its maximum of 6 at x = 20 (it is also 6 at x = 0); we highlight the point x = 20, y = 6
# xytext is the coordinate where the remark (the text displayed in the graph) is placed
# arrowprops means an arrow is drawn from the text to the highlighted point
# dict(...) creates a dictionary of arrow properties; here we have set only the colour

plt.plot(EB, c="brown")
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.annotate("Max value", xy=(20,6), xytext=(70,6),
arrowprops=dict(facecolor="magenta"))#new code
Out[29]:
Text(70,6,'Max value')
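
Instead of reading the peak off the chart, its coordinates can be computed from the list. This is a sketch with hypothetical scores, not the tutorial's data; note that list.index returns the first position when the maximum occurs more than once. The Agg backend is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

goals = [1, 0, 4, 2, 4, 1]    # hypothetical scores
peak_y = max(goals)           # highest value in the list
peak_x = goals.index(peak_y)  # position of its first occurrence

plt.figure()
plt.plot(goals, c="brown")
note = plt.annotate("Max value", xy=(peak_x, peak_y),
                    xytext=(peak_x + 2, peak_y),  # text placed 2 units to the right
                    arrowprops=dict(facecolor="magenta"))
print(peak_x, peak_y)
```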
In [30]:
# We may increase the figure size with plt.figure
# Here we set 10 as the width and 5 as the height (both in inches)

plt.figure(figsize=(10,5))#new code
plt.plot(EB, c="brown")
plt.title("Plot no. 1")
plt.plot(MB)
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.annotate("Max value", xy=(20,6), xytext=(70,6),
arrowprops=dict(facecolor="magenta"))
Out[30]:
Text(70,6,'Max value')
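
figsize is given as (width, height) in inches; the size in pixels is the size in inches multiplied by the figure's dpi (dots per inch). A minimal sketch, with illustrative variable names and the Agg backend assumed so it also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 5))  # width 10 inches, height 5 inches
width_in, height_in = fig.get_size_inches()

# pixels = inches x dpi (the dpi value depends on your settings)
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(width_in, height_in)
```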
In [31]:
# A legend can be added by giving each line a label and then calling plt.legend()

plt.figure(figsize=(10,5))
plt.plot(EB, c="brown", label="East Bengal")#new code
plt.title("Plot no. 1")
plt.plot(MB, label="Mohun Bagan")#new code
plt.legend()#new code
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.annotate("Max value", xy=(20,6), xytext=(70,6),
arrowprops=dict(facecolor="magenta"))
Out[31]:
Text(70,6,'Max value')
In [11]:
# The legend location can be set with loc, e.g. "upper right", "upper left", "lower right", "lower left"
# If loc is not given, recent matplotlib versions choose the "best" location automatically (older versions used "upper right")

plt.figure(figsize=(10,5))
plt.plot(EB, c="brown", label="East Bengal")
plt.title("Plot no. 1")
plt.plot(MB, label="Mohun Bagan")
plt.legend(loc = "upper left")#new code; try "lower right" etc. as well
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.annotate("Max value", xy=(20,6), xytext=(70,6),
arrowprops=dict(facecolor="magenta"))
Out[11]:
Text(70,6,'Max value')
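
plt.legend returns the legend object, which lets us check the entries it built from the label arguments. The sketch below uses hypothetical team names and data; other valid loc strings include "best", "center", "upper center" and "lower center". The Agg backend is an assumption so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3], c="brown", label="Team A")  # hypothetical names and data
plt.plot([3, 2, 1], label="Team B")
legend = plt.legend(loc="lower right")

# one legend entry per labelled line, in plotting order
entries = [t.get_text() for t in legend.get_texts()]
print(entries)
```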
In [ ]:
# We may save the figure as pdf, png, jpeg etc. with plt.savefig
# We may also give a full path to save the file in a particular folder

plt.figure(figsize=(10,5))
plt.plot(EB, c="brown", label="East Bengal")
plt.title("Plot no. 1")
plt.plot(MB, label="Mohun Bagan")
plt.legend(loc = "upper right")
plt.xlim(0,145)
plt.xlabel("East Bengal")
plt.ylabel("Mohun Bagan")
plt.annotate("Max value", xy=(20,6), xytext=(70,6),
arrowprops=dict(facecolor="magenta"))
plt.savefig("C:/Users/Desktop/python plot/Line chart.pdf")#new code
#the chart is saved as "Line chart.pdf" in the "python plot" folder on the desktop (the folder must already exist)
#change the extension to .jpeg or .png to save it in another format
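
Since savefig does not create missing folders, it can help to create the folder first. The sketch below is one way to do that under stated assumptions: a temporary folder stands in for your own target folder, the data is hypothetical, and the Agg backend is used so the snippet also runs outside a notebook.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Assumption: a temporary folder stands in for your own target folder
out_dir = os.path.join(tempfile.mkdtemp(), "python plot")
os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing

plt.figure(figsize=(10, 5))
plt.plot([1, 3, 2])                  # hypothetical data

png_path = os.path.join(out_dir, "Line chart.png")
plt.savefig(png_path, dpi=150)       # the format is taken from the file extension
saved = os.path.exists(png_path)
print(saved)
```

The dpi argument is optional; a higher value gives a sharper (and larger) image file.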

We will learn how to create a histogram

We will be using the same lists, EB & MB

In [12]:
# We can create a histogram with plt.hist
plt.hist(EB)     
Out[12]:
(array([28., 49.,  0., 38.,  0., 20.,  6.,  0.,  1.,  2.]),
 array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4, 6. ]),
 <a list of 10 Patch objects>)
In [13]:
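# By default plt.hist uses 10 equal-width bins (here 0 to 0.6, 0.6 to 1.2, ...), which do not line up with whole goal counts and leave empty gaps
# We will set bins=range(8) so that each bin covers exactly one goal count (0, 1, ..., 6)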
plt.hist(EB, bins=range(8))  
Out[13]:
(array([28., 49., 38., 20.,  6.,  1.,  2.]),
 array([0, 1, 2, 3, 4, 5, 6, 7]),
 <a list of 7 Patch objects>)
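
plt.hist returns the counts, the bin edges and the drawn bars. With integer edges, each bin collects one whole value; the last bin also includes its right edge. A small sketch with hypothetical scores (the Agg backend is an assumption so the snippet also runs outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

goals = [0, 0, 1, 2, 2, 2]  # hypothetical scores
plt.figure()
counts, edges, patches = plt.hist(goals, bins=range(4))

# edges 0, 1, 2, 3 define three bins: [0,1), [1,2) and [2,3]
counts_list = [int(c) for c in counts]
edges_list = [int(e) for e in edges]
print(counts_list, edges_list)
```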
In [14]:
# We will add figsize and legends

plt.figure(figsize=(8,5))#figure size
plt.hist(EB, bins=range(8), label="East Bengal")
plt.legend()#legends        
Out[14]:
<matplotlib.legend.Legend at 0x19117400>
In [15]:
# We will remove the extra space from both sides by using xlim

plt.figure(figsize=(8,5))
plt.hist(EB, bins=range(8), label="East Bengal")
plt.legend()        
plt.xlim(0,7)#removes the extra white space
plt.xlabel("Goals scored")#x axis name
plt.ylabel("No of Goals")#y axis name
Out[15]:
Text(0,0.5,'No of Goals')

We will now plot 2 variables, EB & MB, simultaneously in the same chart to compare them

In [18]:
# We may plot goals of EB & MB in same chart

plt.figure(figsize=(8,5))
plt.hist((EB,MB), bins=range(8),
color=["red","green"],#add colors
label=["East Bengal","Mohun Bagan"])
plt.legend()        
plt.xlim(0,7)#removes the extra white space
plt.xlabel("Goals scored")#x axis name
plt.ylabel("No of Goals")#y axis name
Out[18]:
Text(0,0.5,'No of Goals')
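
When several lists are passed together, plt.hist returns one row of counts per list, in the same order as the lists. A sketch with two hypothetical teams (names and scores are made up; the Agg backend is an assumption so the snippet also runs outside a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

team_a = [0, 1, 1, 2]  # hypothetical scores
team_b = [0, 0, 2, 2]

plt.figure()
counts, edges, _ = plt.hist((team_a, team_b), bins=range(4),
                            color=["red", "green"],
                            label=["Team A", "Team B"])
plt.legend()

counts_a = [int(c) for c in counts[0]]  # bar heights for team_a
counts_b = [int(c) for c in counts[1]]  # bar heights for team_b
print(counts_a, counts_b)
```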

End note

We hope this tutorial helps you create visualizations on your own. If you face any problem or want to send feedback, please write to us at analyticseducator@gmail.com