
Interview Questions

ve 4x higher churn risk.' I'd use a simple decision tree visualization if possible."


9. Practice Problems with Solutions

Problem 1: Gini Calculation

Q: A node has 100 samples: 60 Class A, 25 Class B, 15 Class C. Calculate the Gini Index.

Solution:

p(A) = 60/100 = 0.60
p(B) = 25/100 = 0.25
p(C) = 15/100 = 0.15

Gini = 1 - (0.60² + 0.25² + 0.15²)
= 1 - (0.36 + 0.0625 + 0.0225)
= 1 - 0.445
= 0.555 ✅

This is moderately impure — the node is not dominated
by a single class, so further splitting may help.
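The calculation above can be sketched as a small helper in plain Python (a minimal illustrative function, not from any particular library):

```python
def gini(counts):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The node from Problem 1: 60 Class A, 25 Class B, 15 Class C
print(round(gini([60, 25, 15]), 3))  # 0.555
```

A pure node (all samples in one class) gives `gini([100, 0, 0]) == 0.0`, the minimum.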

Problem 2: Confusion Matrix Metrics

Q: A model produces these results on 500 test samples:

  • TP = 80, FP = 20, FN = 30, TN = 370

Calculate Accuracy, Precision, Recall, and F1 Score.

Solution:

Accuracy  = (80+370)/500 = 450/500 = 90.0%
Precision = 80/(80+20) = 80/100 = 80.0%
Recall = 80/(80+30) = 80/110 = 72.7%
F1 Score = 2×(0.80×0.727)/(0.80+0.727) = 1.163/1.527 = 76.2%

Key Insight: Accuracy looks great at 90%, but recall is only 73% —
the model misses 27% of positive cases. If these are fraud cases
or disease diagnoses, 30 missed cases could be costly.
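All four metrics follow mechanically from the confusion-matrix counts, so a small plain-Python helper (illustrative, not tied to any library) makes the arithmetic easy to reuse:

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts from Problem 2
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=30, tn=370)
print(f"Acc {acc:.1%}  Prec {prec:.1%}  Rec {rec:.1%}  F1 {f1:.1%}")
```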

Problem 3: Which Model to Choose?

Q: You're predicting customer churn for a telecom. You have two models:

  • Model A: Accuracy 92%, Precision 85%, Recall 60%
  • Model B: Accuracy 88%, Precision 70%, Recall 85%

Which do you recommend and why?

Solution:

For churn prediction, MISSING a churner (FN) is expensive:
- A missed churner leaves → lost revenue (₹5000+ per customer)
- A false alarm (FP) only costs a retention call (₹200)

Model B is better because:
- Recall = 85% → catches 85% of churners (vs only 60% in A)
- Even though precision is lower (more false alarms),
retention calls are cheap compared to losing customers

Model A looks better by accuracy, but accuracy is misleading
when classes are imbalanced (usually only 10-20% churn).

F1 scores: Model A = 70.3%, Model B = 76.8% → Confirms B is better ✅
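The cost argument can be made concrete with a rough back-of-envelope model. The customer count (1000) and 15% churn base rate below are assumed for illustration; the ₹5000 and ₹200 costs come from the solution above:

```python
def expected_cost(n_customers, churn_rate, recall, precision,
                  cost_missed_churner, cost_retention_call):
    """Rough expected cost of deploying a churn model (illustrative sketch)."""
    churners = n_customers * churn_rate
    tp = churners * recall                  # churners the model catches
    fn = churners - tp                      # churners it misses (lost revenue)
    fp = tp * (1 - precision) / precision   # false alarms implied by precision
    return fn * cost_missed_churner + fp * cost_retention_call

# Assumed: 1000 customers, 15% churn rate; costs of ₹5000 per missed
# churner and ₹200 per retention call, as in the solution above.
cost_a = expected_cost(1000, 0.15, recall=0.60, precision=0.85,
                       cost_missed_churner=5000, cost_retention_call=200)
cost_b = expected_cost(1000, 0.15, recall=0.85, precision=0.70,
                       cost_missed_churner=5000, cost_retention_call=200)
print(cost_a > cost_b)  # Model B is far cheaper despite more false alarms
```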

Problem 4: Overfitting Diagnosis

Q: Your decision tree gives these results:

  • Training accuracy: 99.5%
  • Test accuracy: 72.3%

What's happening and how would you fix it?

Solution:

Diagnosis: OVERFITTING
The 27.2-point gap between training and test accuracy indicates
the model has memorized the training data (including its noise).

Fixes (in priority order):
1. Prune the tree: Set max_depth=5, min_samples_leaf=20
2. Use Random Forest: Ensemble reduces variance
3. Cross-validation: 5-fold CV to tune hyperparameters
4. Get more data: More training examples reduce overfitting
5. Feature selection: Remove noisy irrelevant features

After pruning (max_depth=5):
- Training: 85% (lower, but that's expected)
- Test: 83% (much better generalization!) ✅
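A check like this can be automated with a simple rule of thumb. The 5-point gap threshold and the 75% underfitting cutoff below are illustrative assumptions, not standard values:

```python
def diagnose_fit(train_acc, test_acc, max_gap=0.05):
    """Rule-of-thumb fit diagnosis from a train/test accuracy pair.
    max_gap (5 points) and the 0.75 underfitting cutoff are assumptions."""
    gap = train_acc - test_acc
    if gap > max_gap:
        return "overfitting", gap
    if train_acc < 0.75:
        return "underfitting", gap
    return "ok", gap

print(diagnose_fit(0.995, 0.723))  # before pruning: flagged as overfitting
print(diagnose_fit(0.85, 0.83))    # after max_depth=5: healthy gap
```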

Problem 5: Feature Importance Interpretation

Q: A churn prediction model reports these feature importances:

  • Monthly Spend: 0.35
  • Contract Length: 0.28
  • Support Calls: 0.22
  • Tenure: 0.10
  • Age: 0.05

What business actions would you recommend?

Solution:

The top 3 drivers account for 85% of total feature importance:

1. Monthly Spend (35%): High spenders who churn = biggest revenue loss
→ Action: Create loyalty benefits for high-spend customers
→ Priority: Premium retention packages

2. Contract Length (28%): Short contracts = high churn risk
→ Action: Incentivize longer contracts (discounts for 12+ months)
→ Priority: Offer renewal bonuses as contracts near their end date

3. Support Calls (22%): Frustrated customers leave
→ Action: Proactive outreach to customers with >3 calls/month
→ Priority: Implement escalation process for repeat callers

Age (5%) is nearly irrelevant — don't waste resources
segmenting by age for churn prevention.
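Picking the smallest set of features that covers most of the total importance can be automated with a greedy pass; the 80% coverage target here is an assumed convention for illustration:

```python
# Importances reported in Problem 5
importances = {
    "Monthly Spend": 0.35,
    "Contract Length": 0.28,
    "Support Calls": 0.22,
    "Tenure": 0.10,
    "Age": 0.05,
}

def top_drivers(importances, coverage=0.80):
    """Smallest prefix of features (highest importance first) whose
    cumulative importance reaches the coverage target (assumed 80%)."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    chosen, running = [], 0.0
    for name, weight in ranked:
        chosen.append(name)
        running += weight
        if running >= coverage:
            break
    return chosen

print(top_drivers(importances))  # the three drivers discussed above
```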