Lab Objective
Analyze customer data to identify churn patterns and build a predictive model. You will write SQL queries, perform data analysis, and create visualizations to support business decisions.
Configure Dataset Settings
Configure these EXACT values in the Dataset Configuration panel:
- 1. Select Dataset:
E-Commerce Transactions (500K rows)
- 2. Date Range: Start:
2024-01-01 | End: 2024-12-31
- 3. Sample Size:
50000
- 4. Aggregation Level:
Monthly
- 5. Customer Segment:
Churned Customers (90+ days inactive)
Pro Tip
Monthly aggregation helps identify seasonal churn patterns while keeping the dataset manageable.
Configure Analysis Settings
Set these EXACT values in the Analysis Settings panel:
- 6. Churn Threshold (days):
90
- 7. RFM Recency Weight:
0.4
- 8. RFM Frequency Weight:
0.3
- 9. RFM Monetary Weight:
0.3
- 10. Statistical Test:
Chi-Square Test
Note
RFM weights must sum to 1.0 (0.4 + 0.3 + 0.3 = 1.0). Recency gets higher weight because recent inactivity is the strongest churn predictor.
Run SQL Query in Terminal
Copy and paste this EXACT SQL query into the SQL terminal:
SELECT c.customer_id, c.customer_name, c.signup_date,
COUNT(o.order_id) as total_orders,
SUM(o.amount) as lifetime_value,
DATEDIFF(NOW(), MAX(o.order_date)) as days_since_order,
CASE WHEN DATEDIFF(NOW(), MAX(o.order_date)) > 90
THEN 'High' ELSE 'Low' END as churn_risk
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name, c.signup_date
ORDER BY churn_risk DESC, days_since_order DESC;
Click query to copy
Then click "Run Query". The query must contain at least 5 of these 7 keywords: SELECT, FROM, customers, orders, JOIN, GROUP BY, churn
Important
The terminal strictly validates your query. Invalid queries will show an error with missing keywords. Copy the query exactly as shown above.
Configure ML & Visualization Settings
Set these EXACT values in the ML Model Settings and Visualization panels:
- 11. Prediction Model:
Random Forest
- 12. Train/Test Split:
80
- 13. Confidence Level:
95%
- 14. Cross-Validation Folds:
5
- 15. Chart Type:
Bar Chart
- 16. Color Scheme:
Viridis
- 17. Output Format:
CSV Export
- ☑️ Show Grid Lines - Checked
- ☑️ Show Legend - Checked
- ☑️ Show Confidence Intervals - Checked
Pro Tip
Random Forest with 5-fold cross-validation provides robust churn predictions while avoiding overfitting.
Submit Analysis
Review your findings and click "Submit Analysis" to complete the lab. Your analysis should identify at-risk customers for the retention campaign.
Hint
A good analysis identifies both high-value churning customers (prioritize) and patterns that predict churn early.