Customer Segmentation Using RFM with Python

Author : Samrat Chakraborty, Sr. Data Scientist, TCS Kolkata

In this blog you are going to learn how to implement customer segmentation using RFM (Recency, Frequency, and Monetary) analysis from scratch in Python

In Retail & e-Commerce sectors the chain of Supermarkets, Stores & Lots of e-Commerce Channel generating large amount of data on daily basis across all the stores. This wide range of customer’s transaction we need to analyze for making profitable strategies and decision.

All customers having their different kind of needs. Increasing customer’s transaction and customer base it is not very much easy to understand the requirement of each customer. Identifying potential customers can improve the marketing campaign, which ultimately increases the sales and generate more cash for business. We are using customer Segmentation for grouping those customers into various segments

RFM Analysis:
RFM Analysisis a customer segmentation method based on:

  • Recency: number of days that have passed since the customer last purchased - How recently did the customer purchase?
  • Frequency: number of purchases in a specific period (for example, last 12 months) - How often do they purchase.
  • Monetary: Value of Order in the specific period – How much do they spend.
It is based on Pareto principle: Targeting 20% of customers generate 80% of revenue.

Benefit:

  • It is very easy to group or segment the customer with These RFM technique
  • Good to fit with direct marketing : which are the best Target segment
  • Helps to Improve Customer life time value
  • Help to reduce customer churn
  • Provide Insight about Customer Loyalties

Use Cases:

  • Forecasting and sales MIS reporting based on RFM Scorecard
  • Churn Analysis
  • Customer Cluster Analysis
  • Marketing Analytics (Mailing ,Campaign)

Identify Potential Customer Segments using RFM in Python

#import modules import pandas as pd # for dataframes import matplotlib.pyplot as plt # for plotting graphs import seaborn as sns # for plotting graphs import datetime as dt

Click here to download the data file.

############ Load Data ############### data = pd.read_excel("E://Data//Online Retail.xlsx") data.head() ############## Removing Duplicate Record ################

Python Offers (drop_duplicates) function to remove repeated or duplicate record

data= data[pd.notnull(data['CustomerID'])] filtered_data=data[['Country','CustomerID']].drop_duplicates() #Top ten country's customer filtered_data.Country.value_counts()[:10]

We can observe the most of the Customers are from UK. So we can further drill down to UK

uk_data=data[data.Country=='United Kingdom'] uk_data.describe ()

In the above picture we can see customers have ordered in a negative value which cannot be possible so we need to filter quantity >0

uk_data = uk_data[(uk_data['Quantity']>0)] uk_data.info() #Filter Required coloumn for RFM Analysis

Invoice Dates helps us to calculate Recency, Invoice no helps to count the no of transaction (Frequency) & Quantity purchase of each transaction * unit price calculate to monetary value

uk_data=uk_data[['CustomerID','InvoiceDate','InvoiceNo','Quantity','UnitPrice']] uk_data['TotalPrice'] = uk_data['Quantity'] * uk_data['UnitPrice'] uk_data['InvoiceDate'].min(),uk_data['InvoiceDate'].max() PRESENT = dt.datetime(2011,12,10) uk_data['InvoiceDate'] = pd.to_datetime(uk_data['InvoiceDate'])

#### RFM Analysis ############# rfm= uk_data.groupby('CustomerID').agg({'InvoiceDate': lambda date: (PRESENT - date.max()).days, 'InvoiceNo': lambda num: len(num), 'TotalPrice': lambda price: price.sum()}) rfm.columns # Change the name of columns rfm.columns=['monetary','frequency','recency'] rfm.head()

Customers with lowest recency value and highest frequency & monetary to be considered as Best customer . We have used Quantile statistical techniques to assigning the score

In Quantiles techniques we have we can make 4 equal parts based on available values.

#computing the Quantile of RFM value for segmentation rfm['r_quartile'] = pd.qcut(rfm['recency'], 4, ['1','2','3','4']) rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, ['4','3','2','1']) rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, ['4','3','2','1']) rfm.head()

Now we can combine all three Quantile score in a single column, and it will help us to create the logic base Customer segmentation as per the business criteria.

111 will be considered as the best performing customer as they have lowest recency and highest frequency and monetary. 444 will be the worst segment

We can consider with the following marketing segmenting path to boost the sales

Quantile 1 Considered as Top score and Quantile 4 Considered as the lowest score for all Recency Frequency & Monetary

#### Combine RFM Quantile score rfm['RFM_Score'] = rfm.r_quartile.astype(str)+ rfm.f_quartile.astype(str) + rfm.m_quartile.astype(str) rfm.head() # Filter out Top/Best customers as (111) rfm[rfm['RFM_Score']=='111'].sort_values('monetary', ascending=False).head()

Now we can group RFM score into a value range based segment and create attractive visualization graph for customer segmentation dashboard

I am attaching one sample tree map segment graph for visualizing our customer segment which you can create with various visualization tools like Tableau, Qliksens, Spotfire etc.

Conclusion:
In this Blog you covered a lot of detail about customer segmentation, Types of segmentation, RFM techniques, RFM implementation with Python,

We have also covered some basic data handling part of python & quantile based statistical techniques to create a scorecard of RFM