PracticeTutorialsBlogPricing
Back to blog
SQL

SQL Interview Questions for Data Engineers — Complete Guide

SQL is tested in virtually every data engineering interview, even at companies that use Spark for everything. The SQL bar is high — interviewers expect you to write correct, efficient queries without hesitation. This guide covers the topics that come up most often.

Window Functions — 40% of SQL Interviews

Window functions are the single most tested SQL topic for data engineers. You must be completely fluent with ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, SUM OVER, and AVG OVER.

-- Classic interview question: find employees earning above their dept average
SELECT name, dept, salary, avg_salary
FROM (
  SELECT name, dept, salary,
         AVG(salary) OVER (PARTITION BY dept) as avg_salary
  FROM employees
) t
WHERE salary > avg_salary;

-- Find the most recent order per customer
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
  FROM orders
) t WHERE rn = 1;

CTEs — Writing Readable Complex Queries

CTEs (Common Table Expressions) are essential for structuring complex queries. Interviewers evaluate both correctness and code quality — a well-structured CTE scores higher than a single unreadable nested subquery.

-- Multi-step analysis using CTEs
WITH monthly_revenue AS (
  SELECT user_id, DATE_TRUNC('month', order_date) as month,
         SUM(revenue) as monthly_rev
  FROM orders
  GROUP BY 1, 2
),
user_growth AS (
  SELECT user_id, month, monthly_rev,
         LAG(monthly_rev) OVER (PARTITION BY user_id ORDER BY month) as prev_month_rev
  FROM monthly_revenue
)
SELECT user_id, month,
       monthly_rev,
       ROUND((monthly_rev - prev_month_rev) * 100.0 / prev_month_rev, 2) as pct_growth
FROM user_growth
WHERE prev_month_rev IS NOT NULL
ORDER BY pct_growth DESC;

Tricky Interview Questions

These questions trip up many candidates. Practice them until they are automatic.

-- Q: Find users active on at least 3 consecutive days
WITH daily_active AS (
  SELECT DISTINCT user_id, DATE(event_time) as active_date FROM events
),
with_gaps AS (
  SELECT user_id, active_date,
         active_date - INTERVAL (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY active_date)) DAY as grp
  FROM daily_active
)
SELECT user_id FROM with_gaps
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;

-- Q: Median salary per department (no MEDIAN() in standard SQL)
SELECT dept, salary as median_salary
FROM (
  SELECT dept, salary,
         PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary)
           OVER (PARTITION BY dept) as pct_median,
         ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary) as rn,
         COUNT(*) OVER (PARTITION BY dept) as cnt
  FROM employees
) t WHERE rn = CEIL(cnt / 2.0);

Query Optimisation Questions

Senior data engineering interviews always include at least one query optimisation question. Know these patterns: • Use EXISTS instead of IN for large subqueries (EXISTS short-circuits) • Avoid SELECT * — specify columns for column pruning • Filter before joining — push WHERE conditions as early as possible • Use partitioned tables and ensure your WHERE clause uses the partition column • EXPLAIN ANALYZE is your friend — always check the query plan before shipping slow queries

Practice SQL problems with real execution