📗 Excel for Data Analysts — Complete Guide
From Scratch for Fresher Data Analyst at DecisionTree Analytics
Why Excel matters: Excel is still the #1 tool in analytics consulting. At DecisionTree, you'll receive raw data in Excel, do quick ad-hoc analysis in Excel, and often hand over workbooks to clients. Interviewers will test your fluency.
1. VLOOKUP vs INDEX-MATCH — The Most Asked Excel Question
VLOOKUP
Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Example: Find the price of "Laptop" from a product table
=VLOOKUP("Laptop", A2:C10, 3, FALSE)
A2:C10 → Table containing Product Name, Category, Price
3 → Return the 3rd column (Price)
FALSE → Exact match (ALWAYS use FALSE in interviews!)
Limitations of VLOOKUP:
- ❌ Can only look right — lookup column must be the leftmost
- ❌ Adding/deleting columns breaks the column index number
- ❌ Slower on large datasets
- ❌ Cannot return values from a column to the LEFT of the lookup column
INDEX-MATCH (The Superior Alternative)
Syntax: =INDEX(return_range, MATCH(lookup_value, lookup_range, 0))
Example: Same lookup — find price of "Laptop"
=INDEX(C2:C10, MATCH("Laptop", A2:A10, 0))
MATCH("Laptop", A2:A10, 0) → Finds the row position of "Laptop"
INDEX(C2:C10, position) → Returns the value from that row in column C
Why INDEX-MATCH is better:
- ✅ Can look in ANY direction (left, right, any column)
- ✅ Column insertions/deletions don't break it
- ✅ Faster on large datasets
- ✅ More flexible — can be used with multiple criteria
XLOOKUP (Excel 365 / 2021+)
=XLOOKUP("Laptop", A2:A10, C2:C10, "Not Found")
Simplest syntax, handles errors natively, looks in any direction.
Use when available, but know VLOOKUP/INDEX-MATCH for compatibility.
🧠 Interview mein bolo: "I prefer INDEX-MATCH over VLOOKUP because it's not limited to left-to-right lookups and doesn't break when columns are rearranged. But I'm comfortable with all three — VLOOKUP, INDEX-MATCH, and XLOOKUP."
Comparison Table
| Feature | VLOOKUP | INDEX-MATCH | XLOOKUP |
|---|---|---|---|
| Direction | Right only | Any direction | Any direction |
| Speed | Slower | Faster | Fastest |
| Column changes | Breaks | Resilient | Resilient |
| Error handling | Needs IFERROR wrapper | Needs IFERROR wrapper | Built-in |
| Availability | All versions | All versions | Excel 365+ only |
2. Pivot Tables — The Analyst's Swiss Army Knife
A Pivot Table summarizes large datasets into meaningful insights with drag-and-drop simplicity.
How to Create
1. Select your data range (Ctrl+A if in a table)
2. Insert → Pivot Table → New Worksheet
3. Drag fields:
- ROWS: What you want to group by (e.g., Region, Product)
- COLUMNS: Secondary grouping (e.g., Month, Year)
- VALUES: What to measure (e.g., SUM of Revenue, COUNT of Orders)
- FILTERS: Overall filter (e.g., Year = 2024)
Common Pivot Table Operations
| Operation | How | Example |
|---|---|---|
| Group dates | Right-click date → Group → Months/Quarters | See monthly trends |
| Calculated field | Analyze → Fields, Items → Calculated Field | Profit = Revenue - Cost |
| Show as % of total | Value Field Settings → Show Values As → % of Grand Total | Category contribution |
| Sort | Right-click value → Sort → Largest to Smallest | Top products by revenue |
| Slicers | Insert → Slicer → Select field | Interactive dashboard filter |
| Drill down | Double-click on any value cell | See underlying rows |
Pivot Table Interview Trick
Scenario: "Given sales data, show me the Top 5 products by revenue for each region."
Steps:
1. Create Pivot: Rows = Region + Product, Values = SUM(Revenue)
2. Right-click any product → Filter → Top 10
3. Set: Top 5, by Sum of Revenue
4. Now each region shows only its top 5 products ✅
🧠 Pivot Table = 2 minute mein analysis. Interview mein agar Excel question aaye, pehle Pivot suggest karo. Interviewer ko lagega expert hai.
3. Essential Formulas — Must Know
Conditional Logic
=IF(A1>100, "High", "Low")
=IF(A1>100, "High", IF(A1>50, "Medium", "Low")) -- Nested IF
=IFS(A1>100, "High", A1>50, "Medium", TRUE, "Low") -- Cleaner (365+)
Counting & Summing with Conditions
| Formula | What It Does | Example |
|---|---|---|
COUNTIF | Count cells matching ONE condition | =COUNTIF(B:B, "Delhi") |
COUNTIFS | Count cells matching MULTIPLE conditions | =COUNTIFS(B:B, "Delhi", C:C, ">50000") |
SUMIF | Sum values matching ONE condition | =SUMIF(B:B, "Delhi", D:D) |
SUMIFS | Sum values matching MULTIPLE conditions | =SUMIFS(D:D, B:B, "Delhi", C:C, "Technology") |
AVERAGEIF | Average matching a condition | =AVERAGEIF(B:B, "Delhi", D:D) |
Text Functions
| Formula | What It Does | Example |
|---|---|---|
LEFT(A1, 3) | First 3 characters | "Delhi" → "Del" |
RIGHT(A1, 2) | Last 2 characters | "Delhi" → "hi" |
MID(A1, 2, 3) | 3 chars starting from position 2 | "Delhi" → "elh" |
LEN(A1) | Length of text | "Delhi" → 5 |
TRIM(A1) | Remove extra spaces | " Delhi " → "Delhi" |
CONCATENATE(A1, " ", B1) | Join text | "First" & "Last" |
TEXT(A1, "MMM-YY") | Format numbers/dates as text | 45000 → "Jan-23" |
UPPER/LOWER/PROPER | Change case | "delhi" → "DELHI" / "Delhi" |
Date Functions
=TODAY() -- Current date
=YEAR(A1) -- Extract year
=MONTH(A1) -- Extract month
=DATEDIF(A1, B1, "M") -- Months between two dates
=EOMONTH(A1, 0) -- End of current month
=NETWORKDAYS(A1, B1) -- Working days between dates (excludes weekends)
4. Data Cleaning in Excel
| Problem | Solution |
|---|---|
| Extra spaces | =TRIM(A1) |
| Inconsistent case | =PROPER(A1) or =UPPER(A1) |
| Remove duplicates | Data → Remove Duplicates |
| Find blanks | Ctrl+G → Special → Blanks |
| Convert text to numbers | Data → Text to Columns → Finish |
| Split full name | Data → Text to Columns → Delimited → Space |
| Flash Fill (pattern-based) | Type pattern in first cell → Ctrl+E |
5. Conditional Formatting
Highlight cells > 1000: Home → Conditional Formatting → Highlight Rules
Color scales (heatmap effect): Home → Conditional Formatting → Color Scales
Data bars (in-cell bar chart): Home → Conditional Formatting → Data Bars
Top/Bottom rules: Home → Conditional Formatting → Top 10%
🧠 Quick tip: Color scales on a Pivot Table turn it into a heatmap instantly — great for spotting patterns.
6. Keyboard Shortcuts (Interviewers Notice Speed)
| Shortcut | Action |
|---|---|
Ctrl+Shift+L | Toggle filters on/off |
Ctrl+T | Convert range to Table |
Ctrl+; | Insert today's date |
Ctrl+Shift+; | Insert current time |
Ctrl+D | Fill down |
Ctrl+Shift+End | Select to last used cell |
Alt+= | Auto SUM |
F2 | Edit cell |
F4 | Toggle absolute reference ($A$1) |
Ctrl+`` | Show formulas instead of values |
Ctrl+Shift+~ | Apply General format |
Alt+Enter | New line within a cell |
7. Named Ranges & Tables
Named Range: Select cells → Name Box (left of formula bar) → Type name
Use: =SUMIF(RegionRange, "North", RevenueRange)
Excel Tables (Ctrl+T): Auto-expand, structured references
Use: =SUM(SalesTable[Revenue]) -- Much clearer than =SUM(D:D)
Benefits of Tables:
- Auto-expand when new rows are added
- Built-in filters and alternating row colors
- Structured references (
[@Revenue]instead ofD2) - Automatic totals row
8. Interview Questions (12 Questions)
Q1: "VLOOKUP vs INDEX-MATCH?"
Answer: "VLOOKUP searches the leftmost column and returns a value from a specified column to the right. INDEX-MATCH is more flexible — MATCH finds the row position, INDEX returns the value from any column. I prefer INDEX-MATCH because it works in any direction, is faster on large datasets, and doesn't break when columns are inserted or deleted."
Q2: "How would you handle 100,000 rows of messy data in Excel?"
Answer: "First, I'd convert the range to a Table (Ctrl+T) for structured references. Then: (1) Remove duplicates via Data tab. (2) Use TRIM and PROPER to clean text. (3) Text to Columns for splitting combined fields. (4) Conditional Formatting to spot outliers. (5) Pivot Table for summary analysis. If the dataset requires heavy transformation, I'd suggest moving to Power Query or Python."
Q3: "What are Pivot Tables and when would you use them?"
Answer: "Pivot Tables summarize large datasets by aggregating values across categories — like total revenue by region and product category. I use them for quick exploratory analysis: finding top customers, comparing monthly trends, spotting anomalies. The drag-and-drop interface makes it fast to iterate on different views without writing formulas."
Q4: "How do you find and remove duplicates?"
Answer: "Three methods: (1) Data → Remove Duplicates for quick removal. (2) COUNTIF to FLAG duplicates first: =COUNTIF($A$2:$A2, A2) — values > 1 are duplicates. (3) Conditional Formatting → Highlight Duplicates for visual identification before deciding what to remove."
Q5: "Explain absolute vs relative references."
Answer: "Relative references (A1) shift when you copy a formula — row and column adjust automatically. Absolute references ($A$1) stay fixed. Mixed references ($A1 or A$1) lock only the row or column. I use F4 to toggle between them. Common use: tax rate in a fixed cell — =B2*$C$1 so C1 doesn't shift when I drag down."
Q6: "What is a Pivot Chart?"
Answer: "A Pivot Chart is a visual representation of a Pivot Table — it updates automatically when the Pivot Table changes. I'd use it for quick dashboard-style visuals. However, for production dashboards, I'd move to Power BI or Tableau since Pivot Charts have limited formatting options."
Q7: "How do you handle errors like #N/A, #VALUE!, #DIV/0!?"
Answer: "Use IFERROR to wrap formulas: =IFERROR(VLOOKUP(...), 'Not Found'). For specific errors: #N/A usually means VLOOKUP didn't find a match — check for extra spaces with TRIM. #DIV/0! means division by zero — use =IF(B1=0, 0, A1/B1). #VALUE! means wrong data type — check if numbers are stored as text."
Q8: "What's the difference between COUNT, COUNTA, COUNTBLANK?"
Answer: "COUNT counts cells with numbers only. COUNTA counts all non-empty cells (numbers + text). COUNTBLANK counts empty cells. For counting specific values, I use COUNTIF. In interviews, I'd mention that COUNTA can be misleading if cells contain spaces — they look empty but aren't."
Q9: "How would you create a dropdown list in Excel?"
Answer: "Data → Data Validation → List → specify the source range or comma-separated values. I'd combine this with INDIRECT for dependent dropdowns — selecting 'North' in the first dropdown shows only North region cities in the second. This is useful for creating data entry forms."
Q10: "What are array formulas?"
Answer: "Array formulas perform calculations on multiple values simultaneously. In older Excel, they needed Ctrl+Shift+Enter. Modern Excel 365 has dynamic arrays that spill automatically. Example: =UNIQUE(A2:A100) returns all unique values. =SORT(FILTER(A2:C100, C2:C100>1000)) returns filtered and sorted results without helper columns."
Q11: "How do you protect a workbook vs a worksheet?"
Answer: "Worksheet protection (Review → Protect Sheet) prevents users from editing cells but allows viewing. I can selectively unlock cells for input. Workbook protection prevents adding/deleting/renaming sheets. For client deliverables, I'd protect the dashboard sheets but leave an 'Input' sheet editable for them to update parameters."
Q12: "When would you move from Excel to a BI tool like Power BI?"
Answer: "Three triggers: (1) Data exceeds 500K rows — Excel gets slow. (2) Multiple people need to view the same live report — Power BI Service handles sharing and auto-refresh. (3) Need for drill-through, row-level security, or complex DAX — Excel's analytical capabilities are limited. At DecisionTree, I'd start exploration in Excel and graduate to Power BI for the production deliverable."