# Decision Trees ⭐

## 2. Decision Trees: The Core Algorithm 🌳
A Decision Tree is a supervised learning algorithm that recursively splits data based on feature values to create a flowchart-like structure for classification or regression.
### 2.1 How It Works
The tree asks a series of questions, splitting data into increasingly pure groups:
```
                Is Monthly Spend > ₹5000?
               /                        \
             YES                         NO
              |                           |
   Contract > 12 months?        Support Calls > 4?
      /           \                /           \
    YES            NO            YES            NO
     |              |             |              |
 Will STAY ✅   Will CHURN ❌  Will CHURN ❌  Will STAY ✅
 (High spender, (Short contract (Unhappy      (Low spender,
 long contract)  = churn risk)   customer)     but satisfied)
```
Structure:
- Root Node: Topmost node; represents the entire dataset
- Decision Nodes: Where data splits based on a feature
- Leaf Nodes: Final output; the predicted class or value
- Branch: The outcome of a split (e.g., "Yes" or "No")
- Depth: Number of edges from root to the deepest leaf
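Conceptually, the tree above is just nested if/else checks. A minimal sketch in plain Python (the function name and example calls are illustrative; the thresholds come from the diagram):

```python
def predict_churn(monthly_spend, contract_months, support_calls):
    """Hand-coded version of the churn tree in the diagram above."""
    if monthly_spend > 5000:          # root split
        if contract_months > 12:
            return "STAY"             # high spender, long contract
        return "CHURN"                # short contract = churn risk
    if support_calls > 4:
        return "CHURN"                # unhappy customer
    return "STAY"                     # low spender, but satisfied

print(predict_churn(8000, 24, 1))     # high spender, long contract
print(predict_churn(2000, 6, 7))      # low spender, many support calls
```

Training a decision tree is essentially learning which questions (features and thresholds) to nest, and in what order.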
### 2.2 Splitting Criteria: How the Tree Decides Which Question to Ask
The tree selects the feature that produces the purest split, meaning the resulting groups are as homogeneous as possible.
#### Impurity
Impurity measures how mixed a group is. A bucket with only red balls = pure (impurity = 0). A bucket with 50% red, 50% blue = maximum impurity.
The tree's goal: each split should reduce impurity as much as possible.
#### Gini Index

Formula: Gini = 1 − Σ(pᵢ²), where pᵢ = proportion of class i
Worked Example: Complete Gini Calculation

Dataset: 20 customers deciding between Churn/Stay. Feature: "Has Complaint" (Yes/No).

Overall: 12 Stay, 8 Churn → Gini(parent) = 1 − (0.6² + 0.4²) = 0.48

Split by "Has Complaint":
- Complaint = Yes (10 customers): 3 Stay, 7 Churn
  Gini(Yes) = 1 − (0.3² + 0.7²) = 1 − (0.09 + 0.49) = 0.42
- Complaint = No (10 customers): 9 Stay, 1 Churn
  Gini(No) = 1 − (0.9² + 0.1²) = 1 − (0.81 + 0.01) = 0.18

Weighted Gini = (10/20)×0.42 + (10/20)×0.18 = 0.21 + 0.09 = 0.30
Gini Gain = 0.48 − 0.30 = 0.18 → this split reduces impurity ✅
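The arithmetic above is easy to verify with a few lines of Python (a quick checking sketch, not library code):

```python
def gini(counts):
    """Gini impurity from raw class counts: 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

g_parent = gini([12, 8])                      # 12 Stay, 8 Churn
g_yes    = gini([3, 7])                       # Complaint = Yes
g_no     = gini([9, 1])                       # Complaint = No

weighted = (10 / 20) * g_yes + (10 / 20) * g_no
gain     = g_parent - weighted

print(round(g_parent, 2))   # 0.48
print(round(weighted, 2))   # 0.30
print(round(gain, 2))       # 0.18
```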
| Gini Value | Meaning |
|---|---|
| 0 | Perfectly pure (all one class) |
| 0.5 | Maximum impurity for binary classification |
#### Entropy & Information Gain

Entropy Formula: E = −Σ pᵢ × log₂(pᵢ)
Information Gain = Entropy(parent) - Weighted Average Entropy(children)
The feature with the highest Information Gain is chosen for the split.
Worked Example:

Before split: 100 customers (50 churn, 50 stay)
Entropy = −0.5×log₂(0.5) − 0.5×log₂(0.5) = 1.0 (maximum uncertainty)

Split by "Contract Length > 12 months":
- Left child: 60 customers (10 churn, 50 stay) → Entropy ≈ 0.65
- Right child: 40 customers (40 churn, 0 stay) → Entropy = 0.0 (pure!)

Weighted Entropy = (60/100)×0.65 + (40/100)×0.0 = 0.39
Information Gain = 1.0 − 0.39 = 0.61 → high gain = great feature ✅
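As with Gini, a short verification sketch in Python confirms the numbers (the `if c > 0` guard handles the pure right child, since log₂(0) is undefined):

```python
import math

def entropy(counts):
    """Shannon entropy from raw class counts: -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

e_parent = entropy([50, 50])     # 50 churn, 50 stay
e_left   = entropy([10, 50])     # 10 churn, 50 stay
e_right  = entropy([40, 0])      # 40 churn, 0 stay (pure)

weighted = (60 / 100) * e_left + (40 / 100) * e_right
gain     = e_parent - weighted

print(round(e_parent, 2))   # 1.0
print(round(e_left, 2))     # 0.65
print(round(gain, 2))       # 0.61
```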
🧠 "What's the difference between Gini and Entropy?" Both measure impurity. Gini is computationally faster; Entropy is more theoretically grounded (it comes from information theory). In practice the results are very similar, and scikit-learn uses Gini by default.
### 2.3 Overfitting: The Biggest Problem

Overfitting occurs when a tree becomes too complex: it memorizes the training data (including noise) and performs poorly on new, unseen data.
Analogy: A student who memorizes past exam papers word-for-word. They score 100% on those exact questions but fail on any new question.
Signs: Very high training accuracy but poor test accuracy. Deep tree with hundreds of leaves.
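These signs are easy to reproduce. A small sketch using scikit-learn on deliberately noisy synthetic data (the dataset parameters here are illustrative): an unconstrained tree memorizes the training set, while its test score lags behind.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 mislabels ~20% of points on purpose, simulating noise
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42)

deep    = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)  # no limits
shallow = DecisionTreeClassifier(max_depth=3,
                                 random_state=42).fit(X_tr, y_tr)  # pre-pruned

# The unconstrained tree scores perfectly on training data (it memorized
# the noise) but noticeably worse on the held-out test set.
print("deep:    train", deep.score(X_tr, y_tr),
      "test", round(deep.score(X_te, y_te), 2))
print("shallow: train", round(shallow.score(X_tr, y_tr), 2),
      "test", round(shallow.score(X_te, y_te), 2))
```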
Solution: Pruning
| Type | What It Does | Parameters |
|---|---|---|
| Pre-pruning | Stops the tree from growing too deep | max_depth, min_samples_split, min_samples_leaf |
| Post-pruning | Grows the full tree, then removes unhelpful branches | ccp_alpha parameter |
```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(
    max_depth=5,           # Tree won't go deeper than 5 levels
    min_samples_split=20,  # A node needs ≥ 20 samples to split
    min_samples_leaf=10,   # Each leaf must have ≥ 10 samples
    random_state=42
)
model.fit(X_train, y_train)
```
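Post-pruning from the table above can be sketched with scikit-learn's `cost_complexity_pruning_path`, which grows the full tree and returns the candidate `ccp_alpha` values; the synthetic data here is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Step 1: grow the full tree and ask sklearn for its pruning schedule
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)

# Step 2: refit with a larger ccp_alpha => more aggressive pruning
full   = DecisionTreeClassifier(random_state=42, ccp_alpha=0.0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=42,
                                ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print("full tree leaves:  ", full.get_n_leaves())
print("pruned tree leaves:", pruned.get_n_leaves())
```

In practice you would pick `ccp_alpha` by cross-validation over `path.ccp_alphas`, choosing the value with the best validation score.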
🧠 Memorize the key parameters: max_depth = how deep the tree can go, min_samples_split = minimum samples needed to split a node, min_samples_leaf = minimum samples in each leaf. All of these keep the tree's growth in check.
### 2.4 Complete Scikit-learn Pipeline
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# 1. Load and prepare data
df = pd.read_csv('customer_churn.csv')
X = df[['monthly_spend', 'contract_length', 'support_calls', 'tenure']]
y = df['churned']  # 1 = Churned, 0 = Stayed

# 2. Split data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Build and train model
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# 4. Predict and evaluate
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# 5. Feature importance
for name, imp in zip(X.columns, model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```
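To inspect what a trained tree actually learned, `sklearn.tree.export_text` prints the split rules as plain text. A self-contained sketch on synthetic stand-in data (the feature names are illustrative, echoing the pipeline above; the CSV itself is not included here):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the churn data
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
names = ['monthly_spend', 'contract_length', 'support_calls', 'tenure']

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

# One line per node: the learned if/else rules, readable without plotting
rules = export_text(model, feature_names=names)
print(rules)
```

For a graphical version, `sklearn.tree.plot_tree(model)` draws the same structure with matplotlib.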