Starbucks Offer Analysis
The dataset contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Starbucks sends offers to mobile app users. An offer can be a simple advertisement for a drink or an actual reward such as a discount or BOGO (buy one, get one free). Not all users receive the same offer, and some users might not receive any offer in a given week. The aim, then, is to estimate how likely a customer is to view an offer after receiving it and go on to complete a transaction.
Problem Statement
The goal of this project is to combine transaction, demographic, and offer data to determine which demographic groups respond to which offer type. This goal can be achieved with the following strategy:
1. Exploring and visualizing the data
2. Data pre-processing
3. Analysis of the cleaned data
4. Scaling numerical features
5. Trying supervised learning models
6. Evaluating model performance and choosing the best one
7. Using GridSearchCV to find optimal parameters for model training
Dataset Description
The data is contained in three files:
- portfolio.json — containing offer ids and meta data about each offer (duration, type, etc.)
- profile.json — demographic data for each customer
- transcript.json — records for transactions, offers received, offers viewed, and offers completed
Here is the schema and explanation of each variable in the files:
portfolio.json
- id (string) — offer id
- offer_type (string) — type of offer, i.e. BOGO, discount, informational
- difficulty (int) — minimum required spend to complete an offer
- reward (int) — reward given for completing an offer
- duration (int) — time for offer to be open, in days
- channels (list of strings)
profile.json
- age (int) — age of the customer
- became_member_on (int) — date when customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer id
- income (float) — customer’s income
transcript.json
- event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
- person (str) — customer id
- time (int) — time in hours since start of test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
Data Exploration:
To analyze the problem better, we will explore the dataset by checking for missing values, visualizing the data distributions, etc. This way we gain a better understanding of the data and can featurize it to make it ready for modeling.
We can see there are no nulls in the portfolio dataset. We will need to one-hot encode the channels column. We will also rename id to offer_id.
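A minimal sketch of this step in pandas, assuming the file path and the four channel values (web, email, mobile, social) that appear in the portfolio data:

```python
import pandas as pd

# Load the offer portfolio (path is an assumption)
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)

# One-hot encode the channels column (each row holds a list of channels)
for channel in ['web', 'email', 'mobile', 'social']:
    portfolio[channel] = portfolio['channels'].apply(lambda ch: int(channel in ch))
portfolio = portfolio.drop(columns=['channels'])

# Rename id to offer_id so merges with the transcript are unambiguous
portfolio = portfolio.rename(columns={'id': 'offer_id'})
```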
The profile data uses 118 as a dummy value in the age column wherever the age is null. On further analysis, we observed that all such records also have no gender and no income, so we will drop rows with missing gender and income.
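One way to apply this cleanup in pandas (the file path and the customer_id rename are assumptions for illustration):

```python
import pandas as pd

profile = pd.read_json('data/profile.json', orient='records', lines=True)

# Age 118 is a placeholder that co-occurs with missing gender and income,
# so dropping rows with null gender/income also removes the dummy ages
profile = profile.dropna(subset=['gender', 'income'])

# Rename id to customer_id for clearer joins later (illustrative choice)
profile = profile.rename(columns={'id': 'customer_id'})
```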
Income and age distribution plots
Preprocessing of transcript dataset:
Convert time from hours to days.
Process the value column. Since it packs multiple pieces of information into a single dict, we first perform some basic manipulation to split it into separate columns.
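A rough sketch of these two steps, assuming the value dicts use the keys 'offer id' / 'offer_id' for offer events and 'amount' for transactions, as in the raw transcript:

```python
import pandas as pd

transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

# Convert event time from hours to days
transcript['time'] = transcript['time'] / 24.0

# Split the value dict into dedicated columns: offer events carry an offer id,
# transaction events carry an amount
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript = transcript.drop(columns=['value'])
```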
Data Preprocessing
We will first need to define an ‘effective offer’ and then identify its main drivers. We also need to process the data to merge the events of each specific offer sent, so as to find out which offers were received, viewed, or completed. We will create 4 groups of customers:
- People who are influenced by an offer and successfully complete it — effective offers
- People who received and viewed an offer but did not complete it — ineffective offers
- People who purchase or complete offers regardless of any awareness of them
- People who received offers but took no action
Target variable: ‘effective offer’ is our target variable, i.e. the group 1 customers. Group 2 customers are clearly not part of the effective-offer group. Group 3 consists of users who may have received offers but would have purchased regardless, so we won’t send offers to them. Group 4 customers are low priority since they are inactive anyway.
Our data shows that offer_id is not directly associated with transactions, so we will define a methodology to link them.
For BOGO and discount offers, an offer is effective when it is completed. Informational offers, however, have no ‘offer completed’ event; a conversion is still just a transaction.
Next, we need to extract the transactions that actually completed an offer. We can do this by checking whether the offer id seen before a transaction matches the offer id of the completion event.
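A hedged sketch of this attribution logic, walking a single customer's events in time order. The function name and the exact windowing rule are illustrative assumptions rather than the project's exact implementation, and it assumes the transcript has already been merged with the portfolio so offer rows carry offer_type and duration:

```python
def label_effective_offers(events):
    """Mark an offer as effective when it was viewed and then completed,
    or, for informational offers, followed by a transaction within the
    offer's duration. `events` is one customer's slice of the transcript."""
    effective = set()
    last_viewed = None  # (offer_id, view_time, offer_type, duration)
    for _, row in events.sort_values('time').iterrows():
        if row['event'] == 'offer viewed':
            last_viewed = (row['offer_id'], row['time'],
                           row['offer_type'], row['duration'])
        elif row['event'] == 'offer completed':
            # Completion counts only if the same offer was viewed beforehand
            if last_viewed and last_viewed[0] == row['offer_id']:
                effective.add(row['offer_id'])
        elif row['event'] == 'transaction' and last_viewed:
            offer_id, view_time, offer_type, duration = last_viewed
            # Informational offers have no completion event, so treat a
            # transaction inside the offer window as a conversion
            if offer_type == 'informational' and row['time'] - view_time <= duration:
                effective.add(offer_id)
    return effective
```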
Our final processed dataset is ready.
Feature Engineering
- The became_member_on column was in date format; we convert it into membership length in days
- The number of offers received per person can have an impact on whether an offer is effective or not
- Filter out transactions that are not related to any offer
- Merge the datasets created above, then drop missing values, create dummy variables for gender, and split channels into categorical variables (see the sketch after this list)
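A sketch of these feature-engineering steps in pandas, assuming the profile and transcript frames from the earlier steps; the membership_days and offers_received column names are made up for illustration:

```python
import pandas as pd

# Membership tenure in days, measured relative to the most recent sign-up date
profile['became_member_on'] = pd.to_datetime(
    profile['became_member_on'].astype(str), format='%Y%m%d')
profile['membership_days'] = (profile['became_member_on'].max()
                              - profile['became_member_on']).dt.days

# Number of offers received per person
offers_received = (transcript[transcript['event'] == 'offer received']
                   .groupby('person').size()
                   .rename('offers_received'))
profile = profile.merge(offers_received, left_on='customer_id',
                        right_index=True, how='left')

# Dummy variables for gender (channels were one-hot encoded earlier)
profile = pd.get_dummies(profile, columns=['gender'])
```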
Model Building
Our target variable is effective_offer. Since we have 3 different offer types, we will build 3 models. We will compare the performance of a simple decision tree classifier with an ensemble random forest classifier. We chose tree-based models because we want interpretability: the model should both predict whether a customer will respond to a given offer type and expose which features drive that prediction. We also selected random forest as an alternate model to compare against the baseline. We first check whether the dataset is balanced; since the classes for all three models turn out to be imbalanced, I decided to use both accuracy and F1 score as evaluation metrics. F1 gives a better sense of model performance than accuracy alone because it accounts for both false positives and false negatives through precision and recall.
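A minimal sketch of this comparison with scikit-learn, assuming a feature matrix X and a binary effective_offer target y for one offer type; hyperparameters are left at their defaults and are not the project's final settings:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def fit_and_score(X, y):
    """Train the baseline decision tree and the random forest on one
    offer type's data and report accuracy and F1 for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    results = {}
    for name, model in [('decision_tree', DecisionTreeClassifier(random_state=42)),
                        ('random_forest', RandomForestClassifier(random_state=42))]:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        results[name] = {'accuracy': accuracy_score(y_test, preds),
                         'f1': f1_score(y_test, preds)}
    return results
```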
1) BOGO Model
The random forest results are better than the decision tree's.
2) Discount Offer Model
The random forest again outperforms the decision tree.
3) Informational Offer Model
We will use GridSearchCV to find optimal parameters for tuning the model
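A hedged sketch of such a search with scikit-learn, assuming the random forest from the earlier comparison and a training split X_train, y_train; the parameter grid is illustrative only:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameters (illustrative, not the grid used in the project)
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_leaf': [1, 5],
}

# Optimize F1 since the classes are imbalanced
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring='f1', cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```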
Conclusion
This project was challenging, mainly because of the structure of the transcript dataset. We started with 2 questions:
1. What are the main features influencing the effectiveness of an offer?
2. Can the data predict whether a user will take up an offer?
For BOGO and discount offers, the distribution of feature importances was relatively even; for informational offers, the distribution is slightly more balanced.
The three separate models built to predict the effectiveness of each offer type ended up with good accuracy for the BOGO and discount models, and slightly lower accuracy for informational offers.
Github link: https://github.com/chetnashahi/Starbucks-Project/tree/master
Improvements
I can try more model performance tuning and see whether any new features improve performance. We could also apply unsupervised learning to cluster the customers.