LEVEL 2: SQL + PYTHON ANALYSIS CASE STUDIES (Medium)
🧠 Yahan se SQL + Python dono use karoge. Interview mein typically bolte hain: "Describe your approach and write key code snippets." Poora running code expected nahi hota, but approach crystal clear hona chahiye.
Case Study 4: Customer Segmentation (RFM Analysis)
Common in CPG, Retail, and E-Commerce — DecisionTree's top 3 industries
The Problem
"An e-commerce client wants to segment their 50,000 customers to run targeted marketing campaigns. How would you approach this?"
Step 1: Extract Data with SQL
-- Pull the raw RFM data from the database
SELECT
customer_id,
MAX(order_date) AS last_order_date,
COUNT(DISTINCT order_id) AS frequency,
SUM(total_amount) AS monetary
FROM orders
WHERE status = 'completed'
AND order_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY customer_id;
Step 2: Calculate RFM Scores in Python
import pandas as pd
import numpy as np
# Load the SQL output
rfm = pd.read_sql(query, connection) # or pd.read_csv('rfm_data.csv')
# Calculate Recency (days since last order)
reference_date = rfm['last_order_date'].max() + pd.Timedelta(days=1)
rfm['recency'] = (reference_date - rfm['last_order_date']).dt.days
# Assign quintile scores (1-5)
# Recency: LOWER is better → label 5,4,3,2,1
rfm['R'] = pd.qcut(rfm['recency'], q=5, labels=[5, 4, 3, 2, 1])
# Frequency: HIGHER is better
rfm['F'] = pd.qcut(rfm['frequency'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5])
# Monetary: HIGHER is better
rfm['M'] = pd.qcut(rfm['monetary'].rank(method='first'), q=5, labels=[1, 2, 3, 4, 5])
# Create combined score
rfm['RFM_score'] = rfm[['R', 'F', 'M']].astype(int).sum(axis=1)
🧠 RFM framework ratt lo — R = Recency (kab aakhri baar aaya), F = Frequency (kitni baar aaya), M = Monetary (kitna paisa kharch kiya). Har customer ko 3 scores do, phir segments banao.
Step 3: Create Business Segments
def segment_customer(row):
r, f, m = int(row['R']), int(row['F']), int(row['M'])
if r >= 4 and f >= 4 and m >= 4: return 'Champions'
elif r >= 3 and f >= 3 and m >= 3: return 'Loyal Customers'
elif r >= 4 and f <= 2: return 'New Customers'
elif r <= 2 and f >= 3 and m >= 3: return 'At Risk'
elif r <= 2 and f >= 4 and m >= 4: return "Can't Lose Them"
elif r <= 2 and f <= 2 and m <= 2: return 'Lost'
else: return 'Potential Loyalists'
rfm['segment'] = rfm.apply(segment_customer, axis=1)
print(rfm['segment'].value_counts())
Step 4: Marketing Actions Per Segment
| Segment | Who Are They | Action |
|---|---|---|
| Champions | Best customers — buy often, spend a lot, recent | Reward programs, early access, referral rewards |
| Loyal | Regular buyers | Upsell, loyalty tiers |
| New Customers | Recently joined, few orders | Welcome series, first-purchase incentive |
| At Risk | Were active, now going silent | Win-back campaign, "We miss you" email with discount |
| Can't Lose | Former VIPs who stopped | Urgent outreach, personal call from account manager |
| Lost | Long gone | Low-cost reactivation or remove from active marketing |
🧠 Interview mein yeh table dekhao aur bolo: "Each segment gets a DIFFERENT marketing strategy — we don't waste premium outreach on Lost customers, and we don't ignore Champions who drive most of our revenue." This is connecting data to business decisions — exactly what DecisionTree does.
Case Study 5: Sales Dashboard Design
Based on DecisionTree's "Visualization & Business Insights" capability
The Problem
"A multi-brand packaging distributor needs a unified dashboard showing sales, finance, and supply chain KPIs. Currently, each team uses separate Excel files."
Step 1: Identify KPIs Per Stakeholder
| Stakeholder | KPIs They Need |
|---|---|
| CEO | Total Revenue, YoY Growth, Top/Bottom Products, Profit Margin |
| Sales Head | Revenue by Region, Salesperson Performance, Pipeline Value |
| Finance | Gross Margin, EBITDA, Accounts Receivable Days, Cash Flow |
| Supply Chain | Inventory Turnover, Out-of-Stock Rate, Lead Time, Fill Rate |
🧠 Different stakeholders ko different data chahiye. CEO ko poora picture chahiye, Sales Head ko region-wise breakdown, Finance ko margins. Dashboard mein sab ke liye sections rakhne chahiye.
Step 2: Dashboard Layout Design
┌──────────────────────────────────────────────────────┐
│ HEADER: Company Name | Date Range Filter | Region │
├───────────┬───────────┬───────────┬──────────────────┤
│ Revenue │ Growth │ Margin │ Orders Today │
│ ₹45.2Cr │ +12.3% │ 28.5% │ 347 │
│ SCORECARD │ SCORECARD │ SCORECARD │ SCORECARD │
├───────────┴───────────┴───────────┴──────────────────┤
│ │
│ [LINE CHART: Monthly Revenue Trend — 12 months] │
│ │
├──────────────────────────┬────────────────────────────┤
│ │ │
│ [BAR: Revenue by Region]│ [PIE: Revenue by Product] │
│ │ │
├──────────────────────────┴────────────────────────────┤
│ │
│ [TABLE: Top 10 Customers by Revenue — with trend] │
│ │
└───────────────────────────────────────────────────────┘
Step 3: Key SQL for Dashboard Backend
-- KPI 1: Total Revenue with YoY comparison
SELECT
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2024 THEN amount END) AS current_year,
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END) AS previous_year,
ROUND(
(SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2024 THEN amount END) -
SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END)) * 100.0 /
NULLIF(SUM(CASE WHEN EXTRACT(YEAR FROM order_date) = 2023 THEN amount END), 0),
2
) AS yoy_growth_pct
FROM orders WHERE status = 'completed';
-- KPI 2: Revenue by Region (for bar chart)
SELECT region, SUM(amount) AS revenue
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed' AND o.order_date >= '2024-01-01'
GROUP BY region ORDER BY revenue DESC;
Step 4: Key Design Decisions
| Decision | Recommendation | Why |
|---|---|---|
| Update frequency | Daily refresh (scheduled at 6 AM) | Balance between freshness and DB load |
| Connection type | Extract (not live) | Dashboard loads in < 3 seconds |
| Interactivity | Region, Date Range, Product filters | Users can drill into their area |
| Mobile-friendly | Yes — responsive layout | Executives check dashboards on phones |