Food for thought: it costs 5-10 times more to recruit a new customer than to retain an existing one. In the telecommunications industry, the annual churn rate is estimated at 30-35 percent on average. In other industries, too, customer retention has become even more important than customer acquisition. Some researchers have found that a 5% increase in customer retention can increase company profitability by 25% to 85%.

Customer Lifetime Value (CLV), also called Lifetime Value (LTV), is a key marketing metric: a prediction of the net profit from the entire future relationship with a customer. To summarise, knowing the lifetime value of your customers helps you:

• Segment your customers and develop and deliver segment-specific marketing treatments.
• Define your return on investment.
• Forecast customer satisfaction.
• Innovate and optimize marketing tools, tactics and channels.
• Adjust communication campaigns and messages.
• Conduct profitable loyalty programs.
• Cross-sell and up-sell based on individual patterns of buying.

There are tons of different ways to calculate the lifetime value of customers, such as naive approaches, RFM, Markov chains, hazard functions, survival regressions, machine learning approaches and distribution-based approaches. In this post, we briefly explain the Recency Frequency Monetary (RFM) approach as well as distribution-based approaches.

### RFM Method

The most popular approach for measuring customers' lifetime value is RFM. RFM is a modelling technique that uses the following three factors from client records:

• Recency: Period since last purchase.
• Frequency: How many purchases an individual made during the observation period.
• Monetary: Cumulative total spent by client during observation period.
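To make the three factors concrete, here is a minimal sketch that derives them from a raw transaction log using only the Python standard library. The `rfm_table` function, the tuple layout of `transactions` and the sample values are all hypothetical, chosen for illustration; recency is measured in days from the last purchase to the end of the observation period.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, observation_end):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Keep the most recent purchase date seen for each customer
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # number of purchases in the period
        monetary[customer_id] += amount  # cumulative spend in the period
    return {
        cid: ((observation_end - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2023, 1, 5), 20.0),
    ("A", date(2023, 3, 1), 35.0),
    ("B", date(2023, 2, 10), 15.0),
]
rfm = rfm_table(transactions, observation_end=date(2023, 3, 31))
for cid, (recency, freq, spend) in sorted(rfm.items()):
    print(cid, recency, freq, spend)
```

In practice the same table is usually built with a data-frame group-by; the dictionary version is used here only to keep the sketch dependency-free.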

Bruce Hardie developed an easy-to-use spreadsheet that implements RFM.

RFM values can be grouped with clustering algorithms, which are a form of unsupervised machine learning. A popular way to choose the number of clusters is Hartigan’s Rule, which “essentially compares the ratio of the within-cluster sum of squares for a clustering with k clusters and one with k + 1 clusters, accounting for the number of rows and clusters. If that number is greater than 10, then it is worth using k + 1 clusters.” In the example below, we have two columns, the frequency and recency of our customers, and we cluster the customers on these two factors using k-means in R:

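library(ggplot2)
# mydata holds two numeric columns: frequency and recency
# Run k-means with 3 clusters and 20 random starts
mydataCluster <- kmeans(mydata, 3, nstart = 20)
# Treat the cluster labels as a factor so each cluster gets its own colour
mydataCluster$cluster <- as.factor(mydataCluster$cluster)
ggplot(mydata, aes(frequency, recency, color = mydataCluster$cluster)) + geom_point()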

This code clusters our data into three groups and plots them.
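Hartigan's Rule as quoted above can be sketched in a few lines. The version below is a pure-Python illustration, not the R `kmeans` routine: it assumes the usual form of the statistic, (WSS_k / WSS_{k+1} - 1) × (n - k - 1), and it swaps the random starts for a deterministic farthest-point seeding so that the result is reproducible. The function names and the toy `points` data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_wss(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    # Assumption: farthest-point seeding instead of random starts
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centre
        new_centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def hartigan(points, k):
    """Hartigan's statistic for moving from k to k + 1 clusters."""
    n = len(points)
    return (kmeans_wss(points, k) / kmeans_wss(points, k + 1) - 1) * (n - k - 1)


def choose_k(points, threshold=10, max_k=10):
    """Add clusters while the statistic stays above the threshold."""
    k = 1
    while k < max_k and hartigan(points, k) > threshold:
        k += 1
    return k


# Toy (frequency, recency) pairs forming three well-separated groups
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (20, 0), (20, 1), (21, 0), (21, 1),
          (0, 20), (0, 21), (1, 20), (1, 21)]
print(choose_k(points))
```

The sketch assumes `max_k` stays below the number of distinct points; otherwise the within-cluster sum of squares can reach zero and the ratio is undefined.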

### Distribution-Based Non-Contractual Models

An alternative to RFM is a more complex approach: the distribution-based non-contractual model. The three most popular stochastic models for calculating CLV are BG/NBD, BG/BB and Pareto/NBD. A comparative summary of these models is in the following table:

Bruce Hardie has a number of Excel spreadsheets and explanations for these models:

An R package called BTYD can be used to fit these models.

In addition, Lifetimes is a Python library for calculating CLV. The following example shows how this package can be used to measure CLV. It uses cdnow_customers.csv, located in the library's datasets/ directory.

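from lifetimes.datasets import load_cdnow

# Load the bundled CDNOW summary, indexed by customer ID
# (recent Lifetimes releases name this loader load_cdnow_summary)
data = load_cdnow(index_col=[0])
print(data.head())

    frequency  recency      T
ID
1           2    30.43  38.86
2           1     1.71  38.86
3           0     0.00  38.86
4           0     0.00  38.86
5           0     0.00  38.86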

T represents the age of the customer in whatever time unit is chosen. In this example, the time unit is weeks.

This example uses the BG/NBD model; however, you can try different models by importing one of these: 'BetaGeoFitter', 'ParetoNBDFitter', 'GammaGammaFitter', 'ModifiedBetaGeoFitter', 'BetaGeoBetaBinomFitter'.

We build the model:

from lifetimes import BetaGeoFitter

# Fit the BG/NBD model to the frequency, recency and T columns
bgf = BetaGeoFitter()
bgf.fit(data['frequency'], data['recency'], data['T'])
print(bgf)

Assuming all is fine so far, we should have this output:

<lifetimes.BetaGeoFitter: fitted with 2357 subjects, r: 0.24, alpha: 4.41, a: 0.79, b: 2.43>
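The four fitted parameters can be read with a little arithmetic. Under the standard BG/NBD assumptions (which this post does not spell out), each customer's purchase rate is drawn from a gamma distribution with shape r and rate alpha, and the probability of dropping out after a purchase is drawn from a beta distribution with parameters a and b. The sketch below computes the means of those two distributions from the rounded values printed above. Treat it as a rough interpretation aid rather than library output.

```python
# Parameters as printed by the fitted model above (rounded to two decimals)
r, alpha, a, b = 0.24, 4.41, 0.79, 2.43

# Mean of Gamma(shape=r, rate=alpha): average purchases per week while active
mean_purchase_rate = r / alpha

# Mean of Beta(a, b): average chance of dropping out after a purchase
mean_dropout_prob = a / (a + b)

print(f"mean purchase rate: {mean_purchase_rate:.3f} per week")
print(f"mean dropout probability: {mean_dropout_prob:.3f}")
```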

Now, we plot our model:

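from lifetimes.plotting import plot_frequency_recency_matrix, plot_period_transactions

# Heatmap of expected future purchases by frequency and recency
plot_frequency_recency_matrix(bgf)

# Expected purchases per customer over the next t periods (weeks here)
t = 1
data['predicted_purchases'] = data.apply(lambda r: bgf.conditional_expected_number_of_purchases_up_to_time(t, r['frequency'], r['recency'], r['T']), axis=1)
data.sort_values('predicted_purchases').tail(5)  # DataFrame.sort was removed from pandas

# Compare observed repeat-purchase counts with counts simulated from the model
plot_period_transactions(bgf)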