Interview Questions
ve 4x higher churn risk.' I'd use a simple decision tree visualization if possible."
9. Practice Problems with Solutions
Problem 1: Gini Calculation
Q: A node has 100 samples: 60 Class A, 25 Class B, 15 Class C. Calculate the Gini Index.
Solution:
p(A) = 60/100 = 0.60
p(B) = 25/100 = 0.25
p(C) = 15/100 = 0.15
Gini = 1 - (0.60² + 0.25² + 0.15²)
= 1 - (0.36 + 0.0625 + 0.0225)
= 1 - 0.445
= 0.555 ✅
This is moderately impure — the node is not dominated
by a single class, so further splitting may help.
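The calculation above can be sketched in plain Python (the `gini_index` helper is illustrative, not from the source):

```python
def gini_index(counts):
    """Gini = 1 - sum(p_i^2) over the class proportions in a node."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Problem 1's node: 60 Class A, 25 Class B, 15 Class C
print(round(gini_index([60, 25, 15]), 3))  # 0.555
```

A pure node (all samples in one class) gives Gini = 0, the lower bound.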
Problem 2: Confusion Matrix Metrics
Q: A model produces these results on 500 test samples:
- TP = 80, FP = 20, FN = 30, TN = 370
Calculate Accuracy, Precision, Recall, and F1 Score.
Solution:
Accuracy = (80+370)/500 = 450/500 = 90.0%
Precision = 80/(80+20) = 80/100 = 80.0%
Recall = 80/(80+30) = 80/110 = 72.7%
F1 Score = 2×(0.80×0.727)/(0.80+0.727) = 1.163/1.527 = 76.2%
Key Insight: Accuracy looks great at 90%, but recall is only 73% —
the model misses 27% of positive cases. If these are fraud cases
or disease diagnoses, 30 missed cases could be costly.
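The four metrics follow directly from the confusion-matrix counts, as this quick check shows:

```python
# Problem 2's confusion-matrix counts
TP, FP, FN, TN = 80, 20, 30, 370

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.1%}")   # 90.0%
print(f"Precision: {precision:.1%}")  # 80.0%
print(f"Recall:    {recall:.1%}")     # 72.7%
print(f"F1 Score:  {f1:.1%}")         # 76.2%
```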
Problem 3: Which Model to Choose?
Q: You're predicting customer churn for a telecom. You have two models:
- Model A: Accuracy 92%, Precision 85%, Recall 60%
- Model B: Accuracy 88%, Precision 70%, Recall 85%
Which do you recommend, and why?
Solution:
For churn prediction, MISSING a churner (FN) is expensive:
- A missed churner leaves → lost revenue (₹5000+ per customer)
- A false alarm (FP) only costs a retention call (₹200)
Model B is better because:
- Recall = 85% → catches 85% of churners (vs only 60% in A)
- Even though precision is lower (more false alarms),
retention calls are cheap compared to losing customers
Model A looks better by accuracy, but accuracy is misleading
when classes are imbalanced (usually only 10-20% churn).
F1 scores: Model A = 70.3%, Model B = 76.8% → Confirms B is better ✅
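The F1 comparison can be verified in a couple of lines (the `f1` helper is illustrative):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"Model A F1: {f1(0.85, 0.60):.1%}")  # 70.3%
print(f"Model B F1: {f1(0.70, 0.85):.1%}")  # 76.8%
```

Because F1 is a harmonic mean, it penalizes Model A's low recall more heavily than an arithmetic average would.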
Problem 4: Overfitting Diagnosis
Q: Your decision tree gives these results:
- Training accuracy: 99.5%
- Test accuracy: 72.3%
What's happening, and how would you fix it?
Solution:
Diagnosis: OVERFITTING
The 27-point gap between training and test accuracy indicates
the model memorized the training data, noise included.
Fixes (in priority order):
1. Prune the tree: Set max_depth=5, min_samples_leaf=20
2. Use Random Forest: Ensemble reduces variance
3. Cross-validation: 5-fold CV to tune hyperparameters
4. Get more data: More training examples reduce overfitting
5. Feature selection: Remove noisy irrelevant features
After pruning (max_depth=5):
- Training: 85% (lower, but that's expected)
- Test: 83% (much better generalization!) ✅
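The pruning fix can be sketched with scikit-learn (a reasonable assumption, since max_depth and min_samples_leaf are its parameter names). Here make_classification stands in for the real dataset, so the exact accuracies will differ from the problem's numbers:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training data
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unpruned tree: grows until leaves are pure, so it memorizes the training set
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pruned tree: cap the depth and require a minimum leaf size
pruned = DecisionTreeClassifier(
    max_depth=5, min_samples_leaf=20, random_state=42
).fit(X_train, y_train)

for name, model in [("unpruned", deep), ("pruned", pruned)]:
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```

The unpruned tree's training accuracy is near 100% while the pruned tree trades some training accuracy for a smaller train/test gap, which is the pattern the solution describes.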
Problem 5: Feature Importance Interpretation
Q: A churn prediction model reports these feature importances:
- Monthly Spend: 0.35
- Contract Length: 0.28
- Support Calls: 0.22
- Tenure: 0.10
- Age: 0.05
What business actions would you recommend?
Solution:
Top 3 drivers explain 85% of churn predictions:
1. Monthly Spend (35%): High spenders who churn = biggest revenue loss
→ Action: Create loyalty benefits for high-spend customers
→ Priority: Premium retention packages
2. Contract Length (28%): Short contracts = high churn risk
→ Action: Incentivize longer contracts (discounts for 12+ months)
→ Priority: Offer renewal bonuses at contract-end
3. Support Calls (22%): Frustrated customers leave
→ Action: Proactive outreach to customers with >3 calls/month
→ Priority: Implement escalation process for repeat callers
Age (5%) is nearly irrelevant — don't waste resources
segmenting by age for churn prevention.
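The "top 3 drivers explain 85%" claim can be read off the reported importances directly (the dictionary below just transcribes the numbers from the question):

```python
importances = {
    "Monthly Spend": 0.35,
    "Contract Length": 0.28,
    "Support Calls": 0.22,
    "Tenure": 0.10,
    "Age": 0.05,
}

# Rank features by importance and sum the top 3
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
top3 = sum(score for _, score in ranked[:3])
print(f"Top 3 features cover {top3:.0%} of total importance")  # 85%
```

In scikit-learn these numbers would come from a fitted model's feature_importances_ attribute, which sums to 1 across all features.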