Starbucks Offer Analysis
The dataset contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Starbucks sends offers to mobile app users. An offer can be a simple advertisement for a drink or an actual reward such as a discount or BOGO (buy one, get one free). Not all users receive the same offer, and some users might not receive any offer in a given week. The aim, then, is to estimate how likely a customer is to view an offer after receiving it and go on to complete a transaction.
Problem Statement
The goal of this project is to combine transaction, demographic, and offer data to determine which demographic groups respond to which offer type. This goal can be achieved with the following strategy:
1. Exploring and visualizing the data
2. Data pre-processing
3. Analysis of the cleaned data
4. Scaling numerical features
5. Trying supervised learning models
6. Evaluating model performance and choosing the best one
7. Using GridSearchCV to find optimal parameters for model training
Dataset Description
The data is contained in three files:
- portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
- profile.json — demographic data for each customer
- transcript.json — records for transactions, offers received, offers viewed, and offers completed
Here is the schema and explanation of each variable in the files:
portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer, i.e. BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)
profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income
transcript.json
- event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) — customer id
- time (int) — time in hours since start of test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
Data Exploration:
To analyze the problem better, we will explore the dataset by checking for missing values, visualizing the data distributions, etc. This way we gain a better understanding of the data and can featurize it to make it ready for modeling.
We can see there are no nulls in the portfolio dataset. We will need to one-hot encode the channels column. We will also rename id to offer_id.
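A minimal sketch of this step in pandas, assuming the file path and the four channel values (web, email, mobile, social) that appear in the portfolio data:

```python
import pandas as pd

# Load the offer portfolio (path is an assumption)
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)

# One-hot encode the channels column (each row holds a list of channels)
for channel in ['web', 'email', 'mobile', 'social']:
    portfolio[channel] = portfolio['channels'].apply(lambda ch: int(channel in ch))
portfolio = portfolio.drop(columns=['channels'])

# Rename id to offer_id so merges with the transcript are unambiguous
portfolio = portfolio.rename(columns={'id': 'offer_id'})
```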
The profile data uses 118 as a dummy value in the age column wherever the age is null. On further analysis, we observed that all such records also have no gender and no income, so we will drop rows with missing gender and income.
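One way to apply this cleanup in pandas (the file path and the customer_id rename are assumptions for illustration):

```python
import pandas as pd

profile = pd.read_json('data/profile.json', orient='records', lines=True)

# Age 118 is a placeholder that co-occurs with missing gender and income,
# so dropping rows with null gender/income also removes the dummy ages
profile = profile.dropna(subset=['gender', 'income'])

# Rename id to customer_id for clearer joins later (illustrative choice)
profile = profile.rename(columns={'id': 'customer_id'})
```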
Income and age distribution plots
Preprocessing of transcript dataset:
Convert time from hours to days.
Process the value column. Since it packs multiple pieces of information into a single dict, we first perform some basic manipulation to split it into separate columns.
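A rough sketch of these two steps, assuming the value dicts use the keys 'offer id' / 'offer_id' for offer events and 'amount' for transactions, as in the raw transcript:

```python
import pandas as pd

transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

# Convert event time from hours to days
transcript['time'] = transcript['time'] / 24.0

# Split the value dict into dedicated columns: offer events carry an offer id,
# transaction events carry an amount
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript = transcript.drop(columns=['value'])
```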
Data Preprocessing
We will first need to define an ‘effective offer’ and then identify its main drivers. We also need to process the data to merge the events of each specific offer sent, so as to find out which offers were received, viewed, or completed. We will create 4 groups of customers:
- People who are influenced by an offer and successfully complete it — effective offers
- People who received and viewed an offer but did not complete it — ineffective offers
- People who purchase or complete offers regardless of any awareness of them
- People who received offers but took no action
Target variable: ‘effective offer’ is our target variable, i.e. the group 1 customers. Group 2 customers are clearly not part of the effective-offer group. Group 3 consists of users who may have received offers but would have purchased regardless, so we won’t send offers to them. Group 4 customers are low priority since they are inactive anyway.
Our data shows that offer_id is not directly associated with transactions, so we will define a methodology to link them.
For BOGO and discount offers, an offer is effective when it is completed. Informational offers, however, have no ‘offer completed’ event; a conversion is still just a transaction.
Next, we need to extract the transactions that actually completed an offer. We can do this by checking whether the offer id seen before a transaction matches the offer id of the completion event.
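A hedged sketch of this attribution logic, walking a single customer's events in time order. The function name and the exact windowing rule are illustrative assumptions rather than the project's exact implementation, and it assumes the transcript has already been merged with the portfolio so offer rows carry offer_type and duration:

```python
def label_effective_offers(events):
    """Mark an offer as effective when it was viewed and then completed,
    or, for informational offers, followed by a transaction within the
    offer's duration. `events` is one customer's slice of the transcript."""
    effective = set()
    last_viewed = None  # (offer_id, view_time, offer_type, duration)
    for _, row in events.sort_values('time').iterrows():
        if row['event'] == 'offer viewed':
            last_viewed = (row['offer_id'], row['time'],
                           row['offer_type'], row['duration'])
        elif row['event'] == 'offer completed':
            # Completion counts only if the same offer was viewed beforehand
            if last_viewed and last_viewed[0] == row['offer_id']:
                effective.add(row['offer_id'])
        elif row['event'] == 'transaction' and last_viewed:
            offer_id, view_time, offer_type, duration = last_viewed
            # Informational offers have no completion event, so treat a
            # transaction inside the offer window as a conversion
            if offer_type == 'informational' and row['time'] - view_time <= duration:
                effective.add(offer_id)
    return effective
```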
Our final processed dataset is ready.
Feature Engineering
- The became_member_on column was in date format; we convert it into membership length in days
- The number of offers received per person can have an impact on whether an offer is effective or not
- Filter out transactions that are not related to any offer
- Merge the datasets created above, then drop missing values, create dummy variables for gender, and split channels into categorical variables (see the sketch after this list)
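A sketch of these feature-engineering steps in pandas, assuming the profile and transcript frames from the earlier steps; the membership_days and offers_received column names are made up for illustration:

```python
import pandas as pd

# Membership tenure in days, measured relative to the most recent sign-up date
profile['became_member_on'] = pd.to_datetime(
    profile['became_member_on'].astype(str), format='%Y%m%d')
profile['membership_days'] = (profile['became_member_on'].max()
                              - profile['became_member_on']).dt.days

# Number of offers received per person
offers_received = (transcript[transcript['event'] == 'offer received']
                   .groupby('person').size()
                   .rename('offers_received'))
profile = profile.merge(offers_received, left_on='customer_id',
                        right_index=True, how='left')

# Dummy variables for gender (channels were one-hot encoded earlier)
profile = pd.get_dummies(profile, columns=['gender'])
```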
Model Building
Our target variable is effective_offer. Since we have 3 different offer types, we will build 3 models. We will compare the performance of a simple decision tree classifier with an ensemble random forest classifier. We chose tree-based models because we want interpretability: the model should both predict whether a customer will respond to a given offer type and expose which features drive that prediction. We also selected random forest as an alternate model to compare against the baseline. We first check whether the dataset is balanced; since the classes for all three models turn out to be imbalanced, I decided to use both accuracy and F1 score as evaluation metrics. F1 gives a better sense of model performance than accuracy alone because it accounts for both false positives and false negatives through precision and recall.
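A minimal sketch of this comparison with scikit-learn, assuming a feature matrix X and a binary effective_offer target y for one offer type; hyperparameters are left at their defaults and are not the project's final settings:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def fit_and_score(X, y):
    """Train the baseline decision tree and the random forest on one
    offer type's data and report accuracy and F1 for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    results = {}
    for name, model in [('decision_tree', DecisionTreeClassifier(random_state=42)),
                        ('random_forest', RandomForestClassifier(random_state=42))]:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        results[name] = {'accuracy': accuracy_score(y_test, preds),
                         'f1': f1_score(y_test, preds)}
    return results
```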
1) BOGO Model
The random forest results are better than the decision tree's.
2) Discount Offer Model
The random forest again outperforms the decision tree.
3) Informational Offer Model
We will use GridSearchCV to find optimal parameters for tuning the model
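A hedged sketch of such a search with scikit-learn, assuming the random forest from the earlier comparison and a training split X_train, y_train; the parameter grid is illustrative only:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameters (illustrative, not the grid used in the project)
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_leaf': [1, 5],
}

# Optimize F1 since the classes are imbalanced
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring='f1', cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```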
Conclusion
This project was challenging, mainly because of the structure of the transcript dataset. We started with 2 questions:
1. What are the main features influencing the effectiveness of an offer?
2. Can the data predict whether a user will take up an offer?
For BOGO and discount offers, the distribution of feature importances was relatively even; for informational offers, the distribution is slightly more balanced.
The three separate models built to predict the effectiveness of each offer type ended up with good accuracy for the BOGO and discount models, and slightly lower accuracy for informational offers.
Github link: https://github.com/chetnashahi/Starbucks-Project/tree/master
Improvements
I can try more model performance tuning and see whether any new features improve performance. We could also apply unsupervised learning to cluster the customers.