SQL Interview Questions for Data Engineers — Complete Guide
SQL is tested in virtually every data engineering interview, even at companies that use Spark for everything. The SQL bar is high — interviewers expect you to write correct, efficient queries without hesitation. This guide covers the topics that come up most often.
Window Functions — 40% of SQL Interviews
Window functions are the single most tested SQL topic for data engineers. You must be completely fluent with ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM OVER, and AVG OVER.
-- Classic interview question: find employees earning above their dept average
SELECT name, dept, salary, avg_salary
FROM (
SELECT name, dept, salary,
AVG(salary) OVER (PARTITION BY dept) as avg_salary
FROM employees
) t
WHERE salary > avg_salary;
-- Find the most recent order per customer
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
FROM orders
) t WHERE rn = 1;CTEs — Writing Readable Complex Queries
CTEs (Common Table Expressions) are essential for structuring complex queries. Interviewers evaluate both correctness and code quality — a well-structured CTE scores higher than a single unreadable nested subquery.
-- Multi-step analysis using CTEs
WITH monthly_revenue AS (
SELECT user_id, DATE_TRUNC('month', order_date) as month,
SUM(revenue) as monthly_rev
FROM orders
GROUP BY 1, 2
),
user_growth AS (
SELECT user_id, month, monthly_rev,
LAG(monthly_rev) OVER (PARTITION BY user_id ORDER BY month) as prev_month_rev
FROM monthly_revenue
)
SELECT user_id, month,
monthly_rev,
ROUND((monthly_rev - prev_month_rev) * 100.0 / prev_month_rev, 2) as pct_growth
FROM user_growth
WHERE prev_month_rev IS NOT NULL
ORDER BY pct_growth DESC;Tricky Interview Questions
These questions trip up many candidates. Practice them until they are automatic.
-- Q: Find users active on at least 3 consecutive days
WITH daily_active AS (
SELECT DISTINCT user_id, DATE(event_time) as active_date FROM events
),
with_gaps AS (
SELECT user_id, active_date,
active_date - INTERVAL (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY active_date)) DAY as grp
FROM daily_active
)
SELECT user_id FROM with_gaps
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;
-- Q: Median salary per department (no MEDIAN() in standard SQL)
SELECT dept, salary as median_salary
FROM (
SELECT dept, salary,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary)
OVER (PARTITION BY dept) as pct_median,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary) as rn,
COUNT(*) OVER (PARTITION BY dept) as cnt
FROM employees
) t WHERE rn = CEIL(cnt / 2.0);Query Optimisation Questions
Senior data engineering interviews always include at least one query optimisation question. Know these patterns: • Use EXISTS instead of IN for large subqueries (EXISTS short-circuits) • Avoid SELECT * — specify columns for column pruning • Filter before joining — push WHERE conditions as early as possible • Use partitioned tables and ensure your WHERE clause uses the partition column • EXPLAIN ANALYZE is your friend — always check the query plan before shipping slow queries