The 4 Types of Data Analyst Interview Questions
- Technical Skills (SQL, Python, Statistics)
- Business Acumen (Understanding metrics, making decisions)
- Behavioral (Past experiences, teamwork)
- Case Studies (Live problem-solving)
SQL Interview Questions
Q1: What’s the difference between WHERE and HAVING?
Answer: - WHERE: Filters rows BEFORE grouping - HAVING: Filters groups AFTER aggregation
Example:
-- WHERE filters individual rows
SELECT department, AVG(salary)
FROM employees
WHERE salary > 50000 -- Filter individuals
GROUP BY department;
-- HAVING filters aggregated results
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 70000; -- Filter groupsQ2: Explain different types of JOINs
Answer: - INNER JOIN: Only matching rows from both tables - LEFT JOIN: All from left table + matches from right - RIGHT JOIN: All from right table + matches from left
- FULL OUTER JOIN: All rows from both tables
When to use: - INNER: Find customers who placed orders - LEFT: Find all customers (including those with no orders) - FULL OUTER: Rarely used, mostly for data audits
Q3: Write a query to find the 2nd highest salary
Answer:
-- Method 1: Using LIMIT/OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Method 2: Using subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Method 3: Using window function (best)
SELECT DISTINCT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
FROM employees
) ranked
WHERE rnk = 2;Python Interview Questions
Q4: What’s the difference between a list and a tuple?
Answer: - List: Mutable (can change), uses [] - Tuple: Immutable (cannot change), uses ()
Example:
# List - can modify
my_list = [1, 2, 3]
my_list[0] = 10 # Works
# Tuple - cannot modify
my_tuple = (1, 2, 3)
my_tuple[0] = 10 # Error!
# When to use tuple:
# - Fixed data (days of week, coordinates)
# - Dictionary keys (must be immutable)
# - Faster than listsQ5: Explain how you would remove duplicates from a pandas DataFrame
Answer:
import pandas as pd
# Remove all duplicates
df_clean = df.drop_duplicates()
# Remove duplicates based on specific columns
df_clean = df.drop_duplicates(subset=['customer_id'])
# Keep last occurrence instead of first
df_clean = df.drop_duplicates(keep='last')
# Before removal, check how many
print(f"Duplicates: {df.duplicated().sum()}")Statistics Interview Questions
Q6: Explain p-value in simple terms
Answer: “A p-value is the probability of observing results at least as extreme as what we got, if there really is no effect.
Example: We test a new ad campaign. P-value = 0.03 means there’s only a 3% chance these results happened by random luck if the campaign doesn’t actually work.
Since 3% is less than our threshold (usually 5%), we conclude the campaign likely does work.”
Q7: When would you use median instead of mean?
Answer: “Use median when data is skewed or has outliers, because median isn’t affected by extreme values.
Example: House prices in a neighborhood: - $200K, $210K, $205K, $215K, $5M (celebrity’s house) - Mean: $967K (misleading!) - Median: $210K (representative)
Use cases: Salaries, house prices, customer spending”
Q8: What’s the difference between Type I and Type II error?
Answer: - Type I (False Positive): Saying there’s an effect when there isn’t - Type II (False Negative): Missing a real effect
Medical Test Analogy: - Type I: Test says you’re sick, but you’re healthy - Type II: Test says you’re healthy, but you’re sick
Business Example: - Type I: Launching a useless feature thinking it’s good - Type II: Rejecting a good feature thinking it’s useless”
Business/Product Questions
Q9: How would you measure the success of Facebook’s News Feed?
Answer structure: 1. Clarify goal: User engagement, revenue, user growth? 2. Define metrics: - Primary: Daily Active Users (DAU), Time Spent - Secondary: Posts viewed, likes, comments, shares - Guardrail: User retention, complaints 3. Tradeoffs: More engagement vs. user well-being 4. How to measure: A/B testing, user surveys
Q10: Our website traffic is down 20% this week. How would you investigate?
Answer (systematic approach):
1. Validate the data: - Is tracking working correctly? - Any recent code changes? - Compare multiple data sources
2. Segment the drop: - Which pages? - Which traffic sources (organic, paid, direct)? - Which devices (mobile, desktop)? - Which countries/regions?
3. Check external factors: - Holidays or weekends? - Competitor campaigns? - Industry trends? - Technical issues (site speed, downtime)?
4. Correlate with changes: - Recent marketing campaigns ended? - SEO ranking changes? - Price changes? - Site redesign?
5. Recommend action based on findings
Technical Case Study
Q11: Design a dashboard for an e-commerce company
Answer structure:
1. Understand stakeholders: - CEO: Revenue, growth metrics - Marketing: Traffic, conversion, CAC - Operations: Orders, inventory, fulfillment
2. Key metrics: - Revenue metrics: Total revenue, AOV, revenue by category - Traffic metrics: Visits, unique visitors, traffic sources - Conversion: Conversion rate, cart abandonment - Customer: New vs. returning, LTV, retention
3. Dashboard design:
+------------------+------------------+
| Total Revenue | Total Orders |
| $1.2M | 15,234 |
+------------------+------------------+
| Revenue Trend Over Time |
| (Line chart) |
+-------------------------------------+
| Top Products | Traffic Sources |
| (Bar chart) | (Pie chart) |
+------------------+------------------+
4. Interactivity: - Date range filter - Category selector - Drill-down capabilities
Behavioral Questions
Q12: Tell me about a time you made a mistake in your analysis
Answer using STAR method:
Situation: “In my previous role, I was analyzing customer churn…”
Task: “I needed to calculate monthly retention rate…”
Action: “I initially calculated retention incorrectly by [specific error]. When my manager questioned the numbers, I realized my mistake. I immediately: 1. Acknowledged the error 2. Redid the analysis correctly 3. Documented the issue to prevent future mistakes 4. Created a checklist for similar analyses”
Result: “The correct analysis showed retention was actually 5% lower than reported, which led to earlier intervention and prevented further losses. I learned to always have someone peer-review my work and to create validation checks.”
Q13: Describe a time you had to explain technical results to non-technical stakeholders
Answer: “I once needed to present a machine learning model’s churn prediction to the CEO.
Challenge: CEO wanted to know ‘how it works’ but had no ML background.
Approach: 1. Used analogy: ‘Like how Netflix recommends shows based on similar users, our model predicts churn based on patterns from past churners’ 2. Showed business impact first: ‘85% accuracy means we can save $500K annually’ 3. Used visualizations instead of math 4. Prepared for ‘what if’ questions with interactive dashboard
Result: Got approval to implement, and CEO became advocate for data-driven decisions.”
Live Case Study Tips
Q14: You’re given a dataset and asked to find insights (30 minutes)
Approach (spend time wisely):
Minutes 0-5: Understand the data
import pandas as pd
df = pd.read_csv('data.csv')
# Quick checks
print(df.info())
print(df.describe())
print(df.head())
print(df.isnull().sum())Minutes 5-10: Clean data
# Handle missing values
# Remove duplicates
# Fix data typesMinutes 10-20: Explore & visualize
# Key distributions
# Correlations
# Trends over time
# Comparisons across categoriesMinutes 20-25: Find 3 insights - Make them specific and actionable - Quantify the impact
Minutes 25-30: Prepare presentation - 1 slide per insight - Visualization + interpretation + recommendation
Questions to Ask the Interviewer
About the Role: 1. “What does a typical day look like?” 2. “What tools and tech stack does the team use?” 3. “What are the biggest data challenges you’re facing?” 4. “How do you measure success for this role?”
About the Team: 5. “How large is the data team?” 6. “What’s the team’s background (technical vs. business)?” 7. “How does the team collaborate with other departments?”
About Growth: 8. “What opportunities are there for professional development?” 9. “What skills should I develop to advance?”
About the Company: 10. “How does the company use data to make decisions?” 11. “What’s the company’s biggest priority right now?”
Interview Preparation Checklist
1 Week Before: - [ ] Research company (products, mission, recent news) - [ ] Review job description - [ ] Prepare STAR stories (5-7 examples) - [ ] Practice SQL on StrataScratch - [ ] Refresh statistics concepts - [ ] Review your portfolio projects
1 Day Before: - [ ] Review common interview questions - [ ] Prepare questions to ask - [ ] Test tech setup (if remote) - [ ] Prepare notebook and pen
Day Of: - [ ] Arrive 10 minutes early - [ ] Bring copies of resume - [ ] Bring portfolio (printed or tablet)
FREE Interview Prep Resources
- StrataScratch - Real interview questions
- DataLemur - Company-specific SQL questions
- Interview Query - Practice problems
- Glassdoor - Company interview reviews
- LeetCode Database - SQL practice
Mock Interview (Do This!)
Find a partner or use Pramp: 1. Take turns being interviewer 2. Practice answering out loud 3. Time yourself 4. Get feedback
Record yourself: - Video yourself answering questions - Watch for nervous habits - Improve clarity and confidence
Related Posts: - Build a Portfolio That Gets You Hired - Master SQL in 30 Days - Statistics for Data Analysts
Tags: #Interview #Career #DataAnalyst #JobSearch #Preparation #SQL