PracticeTutorialsBlogPricing
Back to blog
SQL

Top 10 SQL Window Functions Every Data Engineer Must Know

SQL window functions are tested in almost every data engineering interview. Yet most candidates fail these questions — not because they do not know the syntax, but because they do not understand when to use one function over another. This guide covers the 10 you must know, with real interview examples for each.

What is a Window Function?

A window function performs a calculation across a set of rows that are related to the current row — without collapsing them into a single output row like GROUP BY does. The OVER() clause defines the window: which rows to include, how to partition them, and how to order them.

SELECT
  user_id,
  revenue,
  SUM(revenue) OVER (PARTITION BY country ORDER BY date) AS running_total
FROM orders;

1. ROW_NUMBER()

Assigns a unique sequential integer to each row within a partition. Unlike RANK, it never repeats numbers even when rows are tied. Use it to deduplicate — keep only the most recent record per user.

-- Keep only the latest order per user
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
  FROM orders
) t WHERE rn = 1;

2. RANK() and 3. DENSE_RANK()

Both assign rankings, but differ in how they handle ties. RANK() skips numbers after a tie (1, 2, 2, 4). DENSE_RANK() does not skip (1, 2, 2, 3). Interview question: "Find the top 3 revenue-generating products per category." Dense_rank is usually the right choice because the question means top 3 distinct revenue levels.

SELECT * FROM (
  SELECT
    category,
    product,
    revenue,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS dr
  FROM products
) t WHERE dr <= 3;

4. LAG() and 5. LEAD()

LAG accesses the previous row's value; LEAD accesses the next row's value. Essential for time-series analysis — month-over-month growth, session gaps, consecutive day streaks.

-- Month-over-month revenue change
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month)              AS prev_revenue,
  revenue - LAG(revenue) OVER (ORDER BY month)    AS change,
  ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
        / LAG(revenue) OVER (ORDER BY month), 2)  AS pct_change
FROM monthly_revenue;

6. SUM() OVER — Running Totals

Running totals are one of the most common interview questions. The trick is the frame clause: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

SELECT
  date,
  daily_sales,
  SUM(daily_sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
  AVG(daily_sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)         AS rolling_7day_avg
FROM daily_sales;

7. NTILE(n)

Divides rows into n equal buckets. Use it to segment users into quartiles, deciles, or percentiles for cohort analysis.

-- Segment users into 4 revenue quartiles
SELECT
  user_id, revenue,
  NTILE(4) OVER (ORDER BY revenue) AS quartile
FROM users;

8. FIRST_VALUE() and 9. LAST_VALUE()

Return the first or last value in the window frame. LAST_VALUE has a common gotcha: by default the frame ends at the current row, not the partition end. You need to specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the true last value.

SELECT
  user_id,
  event,
  FIRST_VALUE(event) OVER (PARTITION BY user_id ORDER BY ts)  AS first_event,
  LAST_VALUE(event)  OVER (
    PARTITION BY user_id ORDER BY ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  )                                                           AS last_event
FROM events;

10. COUNT() OVER — Partition Size

Less obvious but frequently useful: COUNT(*) OVER (PARTITION BY ...) gives the total row count in each partition without collapsing rows. Use it to calculate percentages inline.

SELECT
  country,
  user_id,
  revenue,
  COUNT(*) OVER (PARTITION BY country)              AS country_users,
  ROUND(100.0 * revenue / SUM(revenue) OVER (PARTITION BY country), 2) AS pct_of_country
FROM users;

The fastest way to master all 10 of these is to solve real problems — not just read syntax. Every function above has a dedicated practice question on DataCodingHub.

Practice SQL window functions