Data Engineer 面试 SQL 高频题全解析(2026 更新):窗口函数 + 复杂查询
SQLdata-engineerwindow-functionsinterviewquery-optimizationETL

Data Engineer 面试 SQL 高频题全解析(2026 更新):窗口函数 + 复杂查询

Data Engineer 面试的 SQL 题比普通 SDE 难得多,窗口函数、复杂 JOIN、子查询是必考内容。本文精选 10 道高频 SQL 面试题并详细解析。

Sam · · 18 分钟阅读

Data Engineer 面试中 SQL 是必考环节,而且难度明显高于 SDE 面试。Meta、Google、Amazon、TikTok 等公司的 DE 面试都会考复杂的 SQL 查询。


题型一:窗口函数

题目一:部门薪资排名

题目:找出每个部门薪资前三的员工。

WITH ranked AS (
    SELECT 
        employee_id,
        name,
        department_id,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department_id 
            ORDER BY salary DESC
        ) as rank
    FROM employees
)
SELECT employee_id, name, department_id, salary
FROM ranked
WHERE rank <= 3;

关键知识点

  • RANK() vs DENSE_RANK() vs ROW_NUMBER()
    • RANK:1, 2, 2, 4(跳过排名)
    • DENSE_RANK:1, 2, 2, 3(不跳过)
    • ROW_NUMBER:1, 2, 3, 4(强制唯一)

题目二:连续登录天数

题目:找出连续登录 3 天以上的用户。

WITH login_dates AS (
    SELECT DISTINCT user_id, login_date
    FROM user_logins
),
with_gap AS (
    SELECT 
        user_id,
        login_date,
        DATE_SUB(login_date, INTERVAL ROW_NUMBER() OVER (
            PARTITION BY user_id 
            ORDER BY login_date
        ) DAY) as gap
    FROM login_dates
),
consecutive AS (
    SELECT user_id, gap, COUNT(*) as consecutive_days
    FROM with_gap
    GROUP BY user_id, gap
    HAVING COUNT(*) >= 3
)
SELECT user_id, consecutive_days
FROM consecutive;

思路:用行号减去日期,连续日期的差值是相同的。


题型二:复杂 JOIN

题目三:缺失记录查找

题目:找出有订单但没有用户信息的订单。

SELECT o.*
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL;

变种:找出两个表都有的记录(INNER JOIN)和只在一个表中有的记录(FULL OUTER JOIN)。

题目四:自连接

题目:找出每个员工的经理。

SELECT 
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

题型三:聚合与分组

题目五:活跃用户统计

题目:计算过去 30 天每天的活跃用户数。

SELECT 
    DATE(event_time) as day,
    COUNT(DISTINCT user_id) as active_users
FROM events
WHERE event_time >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
GROUP BY DATE(event_time)
ORDER BY day;

注意:面试官可能会问”如果某天没有数据怎么办”,这时需要生成日期序列。

题目六:留存率计算

题目:计算第 1、7、30 天的用户留存率。

WITH first_login AS (
    SELECT 
        user_id,
        MIN(DATE(login_date)) as first_day
    FROM user_logins
    GROUP BY user_id
),
retention AS (
    SELECT 
        fl.user_id,
        fl.first_day,
        u.login_date,
        DATEDIFF(u.login_date, fl.first_day) as days_since_first
    FROM first_login fl
    JOIN user_logins u ON fl.user_id = u.user_id
)
SELECT 
    CASE 
        WHEN days_since_first = 0 THEN 'Day 0'
        WHEN days_since_first = 1 THEN 'Day 1'
        WHEN days_since_first = 7 THEN 'Day 7'
        WHEN days_since_first = 30 THEN 'Day 30'
        ELSE 'Other'
    END as retention_day,
    COUNT(DISTINCT user_id) as retained_users,
    COUNT(DISTINCT user_id) * 100.0 / (
        SELECT COUNT(DISTINCT user_id) FROM first_login
    ) as retention_rate
FROM retention
WHERE days_since_first IN (0, 1, 7, 30)
GROUP BY retention_day;

题型四:子查询与 CTE

题目七:累计求和

题目:计算每天的交易总额及累计总额。

SELECT 
    transaction_date,
    daily_total,
    SUM(daily_total) OVER (
        ORDER BY transaction_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_total
FROM (
    SELECT 
        DATE(transaction_time) as transaction_date,
        SUM(amount) as daily_total
    FROM transactions
    GROUP BY DATE(transaction_time)
) daily;

题目八:Top N per Group

题目:找出每个类别下销量最高的 3 个产品。

WITH ranked AS (
    SELECT 
        category,
        product_id,
        product_name,
        total_sales,
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY total_sales DESC
        ) as rn
    FROM product_sales
)
SELECT category, product_id, product_name, total_sales
FROM ranked
WHERE rn <= 3;

题型五:数据清洗与转换

题目九:字符串处理

题目:将邮箱地址解析为域名部分并统计各域名的用户数。

SELECT 
    SUBSTRING(email, POSITION('@' IN email) + 1) as domain,
    COUNT(*) as user_count
FROM users
GROUP BY domain
ORDER BY user_count DESC;

注意:不同数据库的字符串函数不同(MySQL vs PostgreSQL vs BigQuery)。

题目十:NULL 处理

题目:找出有缺失值的字段。

SELECT 
    COUNT(*) as total_rows,
    COUNT(name) as non_null_names,
    COUNT(email) as non_null_emails,
    COUNT(phone) as non_null_phones,
    COUNT(*) - COUNT(name) as null_names,
    COUNT(*) - COUNT(email) as null_emails,
    COUNT(*) - COUNT(phone) as null_phones
FROM users;

SQL 面试技巧

1. 先理清逻辑再写 SQL

  • 先说清楚你要查什么
  • 分步骤写,用 CTE 让逻辑更清晰
  • 最后再优化

2. 注意边界情况

  • 空表怎么办?
  • 有 NULL 值怎么办?
  • 有重复数据怎么办?

3. 性能意识

  • 避免 SELECT *
  • 用 EXISTS 代替 IN(子查询大时)
  • 注意索引的使用

FAQ

SQL 面试用哪个数据库?

大部分公司用 MySQL 或 PostgreSQL,但有些(如 Google)用 BigQuery。面试前确认一下。

可以查文档吗?

通常不行,但面试官一般不要求记住所有语法细节。


💡 需要面试辅导?

如果你对准备技术面试感到迷茫,或者想要个性化的面试指导和简历优化,欢迎联系 Interview Coach Pro 获取一对一辅导服务。

👉 联系我们 获取专属面试准备方案

准备好拿下下一次面试了吗?

获取针对你的目标岗位和公司的个性化辅导方案。

联系我们