Analyze Starbucks App Customer Data using Python/pandas

Credits: Ye Joo Park and Sandip Sonawane, University of Illinois Urbana-Champaign

<aside> 📌 The main goal of this data project is to explore the demographics of Starbucks App users in depth and to examine the effects of the loyalty offers Starbucks has sent them in the past.

</aside>

The datasets used in this project come directly from Starbucks (they were originally provided to Udacity).

Transcript: a log of all purchases (transactions) and of events related to loyalty offers
Profile: demographic data for each customer in the Rewards app; demographic fields that a customer did not provide show up as np.nan
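As a minimal sketch of how missing demographics surface once a CSV is loaded, the toy CSV text below stands in for the profile data. Its column names (`member_id`, `gender`, `age`, `income`) and values are hypothetical, not taken from the real file; the point is only that empty fields become `NaN` after `pd.read_csv`, which `isna()` can then count.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for profile.csv; column names and values
# are illustrative only. Customer b2 left every demographic field empty.
csv_text = """member_id,gender,age,income
a1,F,55,112000
b2,,,
c3,M,68,70000
"""

profiles = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN, so isna() flags them column by column.
missing_per_column = profiles.isna().sum()
print(missing_per_column)
```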

Load pandas, NumPy, and datasets

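```python
import pandas as pd
import numpy as np

# Load the transaction log and the customer profiles directly from GitHub
df_transcript = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/transcript.v2.csv.gz')
df_profiles = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/profile.csv')

# Keep untouched copies so the raw data can be restored after cleaning
df_transcript_backup = df_transcript.copy()
df_profiles_backup = df_profiles.copy()

# Preview both datasets (print() so both show, not only the last expression)
print(df_transcript.tail(10))
print(df_profiles.head(10))
```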
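The backups rely on `DataFrame.copy()`, which by default (`deep=True`) copies the underlying data, so later in-place edits to the working frame leave the backup untouched. A plain assignment would only bind a second name to the same object. The small frame below is hypothetical and only illustrates the difference.

```python
import pandas as pd

# Hypothetical working frame; values are illustrative only.
df = pd.DataFrame({"event": ["offer received", "transaction"], "amount": [None, 12.5]})

alias = df            # a second name for the same object, not a backup
backup = df.copy()    # deep copy by default: independent data and index

# Modify the working frame in place, as a cleaning step might.
df.drop(columns=["amount"], inplace=True)

print(alias.columns.tolist())   # ['event']  (the alias sees the change)
print(backup.columns.tolist())  # ['event', 'amount']  (the backup does not)
```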

The preview shows the last 10 rows of the transcript dataset and the first 10 rows of the profiles dataset.

Find the number of unique event types

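```python
# List the distinct event types and count them
unique_events = df_transcript['event'].unique()
print(f'Event types: {unique_events}')
print(f'Number of unique event types: {len(unique_events)}')
```

```
Event types: ['offer received' 'offer viewed' 'transaction' 'offer completed']
Number of unique event types: 4
```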
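`unique()` lists the distinct event types; `nunique()` counts them directly, and `value_counts()` adds how often each type occurs. A minimal sketch on a hypothetical miniature transcript (the rows are made up, not drawn from the real data):

```python
import pandas as pd

# Hypothetical miniature transcript; only the 'event' column matters here.
transcript = pd.DataFrame({
    "event": [
        "offer received", "offer viewed", "transaction",
        "transaction", "offer completed", "transaction",
    ]
})

n_event_types = transcript["event"].nunique()
event_counts = transcript["event"].value_counts()  # sorted by frequency, descending

print(f"Number of unique event types: {n_event_types}")
print(event_counts)
```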

Cleaning the datasets

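```python
# Check how much of the profile data is missing
num_rows, num_cols = df_profiles.shape
num_missing = df_profiles['gender'].isna().sum()
print(f'{num_rows} rows, {num_cols} columns, {num_missing} rows missing gender')

# Remove profiles with no recorded gender
df_profiles = df_profiles[df_profiles['gender'].notna()]

# Keep only purchases (assumed: rows whose event is 'transaction')
# and drop the columns that do not apply to purchases
df_transactions = df_transcript[df_transcript['event'] == 'transaction'].drop(columns=['event', 'time', 'offer_id'])
```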
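Only `gender` is checked above. A per-column summary with `isna().sum()` shows whether other demographic fields share the same gaps before any rows are dropped, and `dropna(subset=...)` is a compact alternative to the `notna()` filter. This is a sketch on a hypothetical frame (column names and values are illustrative); whether missing gender coincides with other missing fields in the real profile data is not established here and can be checked with the same call.

```python
import numpy as np
import pandas as pd

# Hypothetical profiles; column names and values are illustrative only.
profiles = pd.DataFrame({
    "member_id": ["a1", "b2", "c3", "d4"],
    "gender": ["F", np.nan, "M", np.nan],
    "income": [112000.0, np.nan, 70000.0, 53000.0],
})

# Count missing values in every column, not just 'gender'.
missing_summary = profiles.isna().sum()
print(missing_summary)

# Share of rows that would be lost by dropping missing genders.
share_missing_gender = profiles["gender"].isna().mean()
print(f"Rows missing gender: {share_missing_gender:.0%}")

# Keep only rows with a recorded gender (same result as a notna() filter).
cleaned = profiles.dropna(subset=["gender"])
print(cleaned)
```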

Merge profiles dataset into transactions dataset