All customers having their different kind of needs. Increasing customer’s transaction and customer base it is not very much easy to understand the requirement of each customer. Identifying potential customers can improve the marketing campaign, which ultimately increases the sales and generate more cash for business. We are using customer Segmentation for grouping those customers into various segments
RFM Analysis:
RFM Analysisis a customer segmentation method based on:
Benefit:
Use Cases:
#import modules
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
Click here to download the data file.
############ Load Data ###############
data = pd.read_excel("E://Data//Online Retail.xlsx")
data.head()
############## Removing Duplicate Record ################
Python Offers (drop_duplicates) function to remove repeated or duplicate record
data= data[pd.notnull(data['CustomerID'])]
filtered_data=data[['Country','CustomerID']].drop_duplicates()
#Top ten country's customer
filtered_data.Country.value_counts()[:10]
We can observe the most of the Customers are from UK. So we can further drill down to UK
uk_data=data[data.Country=='United Kingdom']
uk_data.describe ()
In the above picture we can see customers have ordered in a negative value which cannot be possible so we need to filter quantity >0
uk_data = uk_data[(uk_data['Quantity']>0)]
uk_data.info()
#Filter Required coloumn for RFM Analysis
Invoice Dates helps us to calculate Recency, Invoice no helps to count the no of transaction (Frequency) & Quantity purchase of each transaction * unit price calculate to monetary value
uk_data=uk_data[['CustomerID','InvoiceDate','InvoiceNo','Quantity','UnitPrice']]
uk_data['TotalPrice'] = uk_data['Quantity'] * uk_data['UnitPrice']
uk_data['InvoiceDate'].min(),uk_data['InvoiceDate'].max()
PRESENT = dt.datetime(2011,12,10)
uk_data['InvoiceDate'] = pd.to_datetime(uk_data['InvoiceDate'])
#### RFM Analysis #############
rfm= uk_data.groupby('CustomerID').agg({'InvoiceDate': lambda date: (PRESENT - date.max()).days,
'InvoiceNo': lambda num: len(num),
'TotalPrice': lambda price: price.sum()})
rfm.columns
# Change the name of columns
rfm.columns=['monetary','frequency','recency']
rfm.head()
Customers with lowest recency value and highest frequency & monetary to be considered as Best customer . We have used Quantile statistical techniques to assigning the score
In Quantiles techniques we have we can make 4 equal parts based on available values.
#computing the Quantile of RFM value for segmentation
rfm['r_quartile'] = pd.qcut(rfm['recency'], 4, ['1','2','3','4'])
rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, ['4','3','2','1'])
rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, ['4','3','2','1'])
rfm.head()
Now we can combine all three Quantile score in a single column, and it will help us to create the logic base Customer segmentation as per the business criteria.
111 will be considered as the best performing customer as they have lowest recency and highest frequency and monetary. 444 will be the worst segment
We can consider with the following marketing segmenting path to boost the sales
Quantile 1 Considered as Top score and Quantile 4 Considered as the lowest score for all Recency Frequency & Monetary
#### Combine RFM Quantile score
rfm['RFM_Score'] = rfm.r_quartile.astype(str)+ rfm.f_quartile.astype(str) + rfm.m_quartile.astype(str)
rfm.head()
# Filter out Top/Best customers as (111)
rfm[rfm['RFM_Score']=='111'].sort_values('monetary', ascending=False).head()
Now we can group RFM score into a value range based segment and create attractive visualization graph for customer segmentation dashboard
I am attaching one sample tree map segment graph for visualizing our customer segment which you can create with various visualization tools like Tableau, Qliksens, Spotfire etc.
Conclusion:
In this Blog you covered a lot of detail about customer segmentation, Types of segmentation, RFM techniques, RFM implementation with Python,
We have also covered some basic data handling part of python & quantile based statistical techniques to create a scorecard of RFM