Once the Predictive Score definition is set up, an AI model is trained on historical data to learn behavioral patterns that separate converting users from non-converting users. We automatically derive user behaviors from clickstream events and campaign engagements. Specifically, we derive hundreds of behaviors per event, such as recency, frequency, time spent, and catalog affinities (category affinity, brand affinity, product attribute affinity). These behaviors, along with user attributes, are referred to as features (or variables) and are fed into the AI model. The model then learns the combination of features that best predicts conversion, producing a scoring function.
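To make this concrete, here is a minimal sketch of how behaviors like recency, frequency, and category affinity could be derived from raw clickstream events. The event tuples, field layout, and affinity formula are illustrative assumptions, not the actual pipeline:

```python
from datetime import datetime

# Hypothetical clickstream: (user_id, event, category, timestamp)
events = [
    ("u1", "view", "shoes", datetime(2024, 1, 10)),
    ("u1", "view", "shoes", datetime(2024, 1, 12)),
    ("u1", "add_to_cart", "shoes", datetime(2024, 1, 12)),
    ("u2", "view", "shirts", datetime(2024, 1, 3)),
]
now = datetime(2024, 1, 15)

def derive_features(user_id):
    user_events = [e for e in events if e[0] == user_id]
    # Recency: days since the user's most recent event
    recency_days = (now - max(e[3] for e in user_events)).days
    # Frequency: total event count in the window
    frequency = len(user_events)
    # Category affinity: share of events in the user's top category
    counts = {}
    for _, _, cat, _ in user_events:
        counts[cat] = counts.get(cat, 0) + 1
    top_cat, top_n = max(counts.items(), key=lambda kv: kv[1])
    return {
        "recency_days": recency_days,
        "frequency": frequency,
        "top_category": top_cat,
        "category_affinity": top_n / frequency,
    }

print(derive_features("u1"))
```

In production, hundreds of such features would be computed per event type and joined with user attributes before being fed to the model.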
We use a white-box approach to building the AI model, so it can be interpreted for both insights and performance. The dashboard presents the following reports, each aimed at a different aspect of the model.
- Analyze and Chart Scores
  - How predictive are my scores for each probability bucket from 0 to 100?
  - Intended for all dashboard users to slice and dice predicted probability scores against observed conversion rates
- Model accuracy
  - What data is used for training and testing, plus standard model accuracy metrics
  - Intended for Data Analysts and Data Scientists to judge model accuracy
- Feature importance
  - Which features (derived behaviors and user attributes) are most meaningful in predicting the goal events?
  - Intended for all dashboard users to understand which features/variables contribute the most to prediction
Predictive Scores are likelihood scores between 0 and 100, where a lower score represents a lower chance of conversion and a higher score a higher chance. The expectation from an ideal model is that a group of users with score X converts at an X% rate, i.e., the conversion rate for users with score 0 is 0%, score 1 is 1%, ..., and score 100 is 100%.
To analyze performance, we test the model on past unseen users (users intentionally set aside for testing). We score these users, group them by score decile (0-10, 10-20, 20-30, ...), and compare each decile with its actual conversion rate. A good model will have a conversion rate for each decile close to the decile boundaries (within ±10). If that is not the case, we might need more data, or the model has likely overfit. Sample model data is shown below.
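The decile calibration check described above could be sketched like this. The scored holdout data and the ±10 tolerance rule are illustrative; the function name and data shape are assumptions:

```python
# Hypothetical scored holdout users: (predicted score 0-100, converted 0/1)
scored = (
    [(5, 1)] + [(5, 0)] * 9          # 0-10 bucket: 10% observed conversion
    + [(35, 1)] * 3 + [(35, 0)] * 7  # 30-40 bucket: 30% observed
    + [(85, 1)] * 8 + [(85, 0)] * 2  # 80-90 bucket: 80% observed
)

def decile_calibration(scored):
    buckets = {}
    for score, converted in scored:
        decile = min(score // 10, 9)  # group scores into 0-10, 10-20, ...
        buckets.setdefault(decile, []).append(converted)
    report = {}
    for decile, outcomes in sorted(buckets.items()):
        observed = round(100 * sum(outcomes) / len(outcomes), 1)
        lo, hi = decile * 10, decile * 10 + 10
        # Well calibrated if the observed rate is within +-10 of the bucket
        calibrated = lo - 10 <= observed <= hi + 10
        report[(lo, hi)] = (observed, calibrated)
    return report

print(decile_calibration(scored))
```

Each entry maps a decile range to its observed conversion rate and whether it passes the calibration tolerance.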
We present more advanced statistics on the model for our data science users: the sizes of the training and validation datasets, AUC (Area Under the ROC Curve), and log loss for the underlying AI model. AUCs above 0.65 are considered good, while an AUC of 0.65 or below indicates low predictive accuracy. The chart below shows data for the Training and Holdout datasets: training data is used to build the model, while the model is tested on the holdout dataset to create the performance reports.
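For readers who want to see what these metrics measure, here is a small sketch that computes AUC (via its rank interpretation: the probability that a random positive outranks a random negative) and log loss from scratch. The labels and probabilities are made-up holdout data:

```python
import math

# Hypothetical holdout labels and predicted conversion probabilities
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.8, 0.2, 0.7, 0.4, 0.6]

def auc(y_true, y_prob):
    # AUC = chance a random positive is scored above a random negative
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, y_prob, eps=1e-15):
    # Average negative log-likelihood of the true labels; lower is better
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(auc(y_true, y_prob), log_loss(y_true, y_prob))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values meaningfully above 0.65 are treated as good here.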
A ranked list of feature/variable importance is also shown, where the top-ranked features are used most for decision making in the AI model. The length of each bar shows the relative strength of the feature.
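A simple text rendering of such a ranked importance chart might look like the following. The feature names and importance values are hypothetical, and bars are scaled relative to the strongest feature as the report describes:

```python
# Hypothetical feature importances from a trained model (sum to 1.0)
importances = {
    "recency_days": 0.05,
    "frequency": 0.30,
    "category_affinity": 0.45,
    "time_spent": 0.20,
}

def importance_report(importances, width=40):
    # Rank features and draw a bar proportional to the strongest feature
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1]
    lines = []
    for name, imp in ranked:
        bar = "#" * round(width * imp / top)
        lines.append(f"{name:<20} {bar} {imp:.2f}")
    return lines

print("\n".join(importance_report(importances)))
```

In the dashboard, the same ranking is drawn as a horizontal bar chart rather than text.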