Create a Python application to measure customer lifetime value (CLV)

Pan Chuang AI sharing

Author amitvkulkarni

Compile | Flin

Source | analyticsvidhya

[guide] "if you don't care about your customers, your competitors will care" -- Bob Hooey


Customer lifetime value is the profit that an enterprise obtains from a specific customer during the period when a specific customer is associated with the enterprise. Each industry has its own set of indicators, which can be tracked and measured to help enterprises target the right customers and predict the future customer base.

CLV enables various departments such as marketing and sales to plan their strategies and provide specific products or customized services to the most valuable customers. It also provides a framework for customer service teams to understand the efforts needed to develop and retain customers.

When CLV is applied together with other tools (such as customer segmentation, pricing and marketing strategy), it is the most effective and adds great value, which means that it tells us who our most profitable customers are, but it doesn't tell us which products need to be sold at what price and quantity.

Therefore, CLV should be applied wisely, not as the only standard for making business decisions. CLV may change according to the business model and its objectives, which means that its definition and calculation need to be reviewed regularly.

The following is how CLV is used in various industries

Insurance: the marketing team wants to know which customers are most likely to pay high premiums rather than claims, which in turn helps them acquire new customers and grow their business.

Telecom: the predicted CLV is used to understand the likelihood of current customer loyalty and their continued use of programs or subscriptions.

Retail: CLV is used to understand purchase behavior and consume for specific customers with customized offers / discounts.

Benefits of CLV

  • Acquisition cost: help determine acceptable acquisition cost and where to put the marketing work
  • Prospects: help determine the future value of existing and potential new customers
  • Customer relationship: be able to establish a stronger and effective relationship with customers
  • Brand loyalty: good relationships help build brand loyalty


We will explore the following steps and, at the end of this blog, build a customer lifetime value simulator application using plot dash.

  • Customer lifetime value (CLV) overview
  • Benefits of CLV
  • Data exploration
  • CLV calculation
  • Developing applications using Plotly dash
  • Conclusion


We will use the machine learning repository from UCI( to build Python applications.

The property description can be found in the URL above.

Let's load the dataset and view the data.

data = pd.read_excel("./data/Online_Retail.xlsx")
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   InvoiceNo    40 non-null     int64         
 1   StockCode    40 non-null     object        
 2   Description  40 non-null     object        
 3   Quantity     40 non-null     int64         
 4   InvoiceDate  40 non-null     datetime64[ns]
 5   UnitPrice    40 non-null     float64       
 6   CustomerID   40 non-null     int64         
 7   Country      40 non-null     object        
dtypes: datetime64[ns](1), float64(1), int64(3), object(3)

Data preprocessing

Let's clean up the data and create the new functions we need to calculate the CLV at a later stage.

  1. Data cleaning: delete duplicate records
  2. Quantity: we will only consider positive quantity. Any negative value means that the product is returned for some reason.
  3. Total purchase: this will be the unit price x quantity of the product
  4. Aggregation: since the data is at the transaction level, we aggregate the data by CustomerID and Country. We will do this by using the group by function.
  5. Average order value: this will be the ratio of the amount spent to the number of transactions
  6. Purchase frequency: This is the ratio of the total number of transactions to the total number of transactions. It is the average number of orders per customer.
  7. Churn rate: This is the percentage of customers who have not reordered.
  8. CLTV: (average order value x purchase frequency) / turnover rate)
  9. In addition, let's rename some column names to make them easy to track.

After completing the above steps, the data will be as follows.

As we progress, we will further address this issue.

The complete code can be obtained from

Developing applications using Plotly Dash

We will use plot dash to develop our application, a Python framework for building data applications. Let's create a file called and start by loading the library.

Step 1: load the library
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
import datetime as dt
import numpy as np
import dash
import dash_table
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output, State
Step 2: design layout (UI)

Card: all 4 KPI s we are tracking will be at the top of the page. Defines font size, color, etc. In addition, the unique ID of each card will later be used to populate the value.

html.H2('Total Customers', style={
    'font-weight': 'normal'}),
html.H2(id='id_total_customer', style = {'color': 'DarkSlateGray'}),
], className='box_emissions'),
    html.H2('Total Transactions', style={
        'font-weight': 'normal'}),
    html.H2(id='id_total_transactions', style = {'color': 'DarkSlateGray'}),
], className='box_emissions'),
    html.H2('Total Sales($)', style={
        'font-weight': 'normal'}),
    html.H2(id='id_total_sales', style = {'color': 'DarkSlateGray'}),
], className='box_emissions'),
    html.H2('Avg Order Value($)', style={
        'font-weight': 'normal'}),
    html.H2(id='id_order_value', style = {'color': 'DarkSlateGray'}),
], className='box_emissions'),

Chart: we have 2 charts, one bar chart shows the best-selling products, and the second shows the purchase trend of countries / regions.

df_plot_bar = pp.filtered_data.groupby('Description').agg({'TotalPurchase':'sum'}).sort_values(by = 'TotalPurchase', ascending=False).reset_index().head(5)
df_plot_bar['percent'] = round((df_plot_bar['TotalPurchase'] / df_plot_bar['TotalPurchase'].sum()) * 100,2).apply(lambda x : "{:,}".format(x))
fir_plotbar =, y='percent', x='Description', title='TOP SELLING PRODUCTS', text='percent', color='percent',)
fir_plotbar.update_traces(texttemplate='%{text:.2s}', textposition='inside')
fir_plotbar.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', showlegend=False)
Scatter diagram:
df_plot = df.groupby(['Country','Description','UnitPrice','Quantity']).agg({'TotalPurchase': 'sum'},{'Quantity':'sum'}).reset_index()
fig_UnitPriceVsQuantity = px.scatter(df_plot[:25000], x="UnitPrice", y="Quantity", color = 'Country', 
        size='TotalPurchase',  size_max=20, log_y= True, log_x= True, title= "PURCHASE TREND ACROSS COUNTRIES")

Note: similar to cards and drawings, other UI components such as sidebars and tables for displaying results are designed. Please access the complete code from Github:

Step 3: define interactivity (callback)

We defined an update_output_All() function, which takes the value of the control as input and executes logic, which means generating visualization and data tables, which will be filled into the UI. Lack of interactivity in two aspects:

  1. Application loading: all cards, charts, KPI s and tables will contain numbers from all countries.
  2. User selection: once a user selects a specific country, all cards, charts and tables will contain data specific to the selected country.
def update_output_All(country_selected):
        if (country_selected != 'All' and country_selected != None):
            df_selectedCountry = pp.filtered_data.loc[pp.filtered_data['Country'] == country_selected]
            df_selectedCountry_p = pp.filtered_data_group.loc[pp.filtered_data_group['Country'] == country_selected]
            cnt_transactions = df_selectedCountry.Country.shape[0]
            cnt_customers = len(df_selectedCountry.CustomerID.unique())
            cnt_sales = round(df_selectedCountry.groupby('Country').agg({'TotalPurchase':'sum'})['TotalPurchase'].sum(),2)
            return [cnt_customers, cnt_transactions, cnt_sales, cnt_avgsales,  df_selectedCountry_p.drop(['num_days','num_units'], axis = 1).to_dict('records'),
                    fig_UnitPriceVsQuantity_country, fir_plotbar]
            cnt_transactions = pp.filtered_data.shape[0]
            cnt_customers = len(pp.filtered_data.CustomerID.unique())
            cnt_sales = round(pp.filtered_data.groupby('Country').agg({'TotalPurchase':'sum'})['TotalPurchase'].sum(),2)
            cnt_avgsales = round(pp.filtered_data_group.groupby('Country').agg({'avg_order_value': 'mean'})['avg_order_value'].mean())
            return [cnt_customers, cnt_transactions, cnt_sales,cnt_avgsales,  pp.filtered_data_group.drop(['num_days','num_units'], axis = 1).to_dict('records'),
                    pp.fig_UnitPriceVsQuantity, fir_plotbar]
    except Exception as e:
        logging.exception('Something went wrong with interaction logic:', e)

The complete code can be accessed from


The purpose of this blog is to introduce a formula method for calculating customer life cycle value (CLV) using Python, and build a dashboard / Web application that can help business users make real-time decisions.

We also cover all aspects of building data applications, from data exploration to formulas, and some industry cases that can take advantage of CLV.

  • This project setting can be used as a template to quickly copy it for other use cases.
  • You can build more complex prediction models to calculate CLV.
  • Add more controls and drawings related to your case and have more interactivity.

Posted by toolman on Fri, 19 Nov 2021 00:12:36 -0800