SQLdata-engineerwindow-functionsinterviewquery-optimizationETL
Data Engineer 面试 SQL 高频题全解析(2026 更新):窗口函数 + 复杂查询
Data Engineer 面试的 SQL 题比普通 SDE 难得多,窗口函数、复杂 JOIN、子查询是必考内容。本文精选 10 道高频 SQL 面试题并详细解析。
Sam · · 18 分钟阅读
Data Engineer 面试中 SQL 是必考环节,而且难度明显高于 SDE 面试。Meta、Google、Amazon、TikTok 等公司的 DE 面试都会考复杂的 SQL 查询。
题型一:窗口函数
题目一:部门薪资排名
题目:找出每个部门薪资前三的员工。
WITH ranked AS (
SELECT
employee_id,
name,
department_id,
salary,
DENSE_RANK() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) as rank
FROM employees
)
SELECT employee_id, name, department_id, salary
FROM ranked
WHERE rank <= 3;
关键知识点:
RANK()vsDENSE_RANK()vsROW_NUMBER()- RANK:1, 2, 2, 4(跳过排名)
- DENSE_RANK:1, 2, 2, 3(不跳过)
- ROW_NUMBER:1, 2, 3, 4(强制唯一)
题目二:连续登录天数
题目:找出连续登录 3 天以上的用户。
WITH login_dates AS (
SELECT DISTINCT user_id, login_date
FROM user_logins
),
with_gap AS (
SELECT
user_id,
login_date,
DATE_SUB(login_date, INTERVAL ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY login_date
) DAY) as gap
FROM login_dates
),
consecutive AS (
SELECT user_id, gap, COUNT(*) as consecutive_days
FROM with_gap
GROUP BY user_id, gap
HAVING COUNT(*) >= 3
)
SELECT user_id, consecutive_days
FROM consecutive;
思路:用行号减去日期,连续日期的差值是相同的。
题型二:复杂 JOIN
题目三:缺失记录查找
题目:找出有订单但没有用户信息的订单。
SELECT o.*
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL;
变种:找出两个表都有的记录(INNER JOIN)和只在一个表中有的记录(FULL OUTER JOIN)。
题目四:自连接
题目:找出每个员工的经理。
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
题型三:聚合与分组
题目五:活跃用户统计
题目:计算过去 30 天每天的活跃用户数。
SELECT
DATE(event_time) as day,
COUNT(DISTINCT user_id) as active_users
FROM events
WHERE event_time >= DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)
GROUP BY DATE(event_time)
ORDER BY day;
注意:面试官可能会问”如果某天没有数据怎么办”,这时需要生成日期序列。
题目六:留存率计算
题目:计算第 1、7、30 天的用户留存率。
WITH first_login AS (
SELECT
user_id,
MIN(DATE(login_date)) as first_day
FROM user_logins
GROUP BY user_id
),
retention AS (
SELECT
fl.user_id,
fl.first_day,
u.login_date,
DATEDIFF(u.login_date, fl.first_day) as days_since_first
FROM first_login fl
JOIN user_logins u ON fl.user_id = u.user_id
)
SELECT
CASE
WHEN days_since_first = 0 THEN 'Day 0'
WHEN days_since_first = 1 THEN 'Day 1'
WHEN days_since_first = 7 THEN 'Day 7'
WHEN days_since_first = 30 THEN 'Day 30'
ELSE 'Other'
END as retention_day,
COUNT(DISTINCT user_id) as retained_users,
COUNT(DISTINCT user_id) * 100.0 / (
SELECT COUNT(DISTINCT user_id) FROM first_login
) as retention_rate
FROM retention
WHERE days_since_first IN (0, 1, 7, 30)
GROUP BY retention_day;
题型四:子查询与 CTE
题目七:累计求和
题目:计算每天的交易总额及累计总额。
SELECT
transaction_date,
daily_total,
SUM(daily_total) OVER (
ORDER BY transaction_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as cumulative_total
FROM (
SELECT
DATE(transaction_time) as transaction_date,
SUM(amount) as daily_total
FROM transactions
GROUP BY DATE(transaction_time)
) daily;
题目八:Top N per Group
题目:找出每个类别下销量最高的 3 个产品。
WITH ranked AS (
SELECT
category,
product_id,
product_name,
total_sales,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY total_sales DESC
) as rn
FROM product_sales
)
SELECT category, product_id, product_name, total_sales
FROM ranked
WHERE rn <= 3;
题型五:数据清洗与转换
题目九:字符串处理
题目:将邮箱地址解析为域名部分并统计各域名的用户数。
SELECT
SUBSTRING(email, POSITION('@' IN email) + 1) as domain,
COUNT(*) as user_count
FROM users
GROUP BY domain
ORDER BY user_count DESC;
注意:不同数据库的字符串函数不同(MySQL vs PostgreSQL vs BigQuery)。
题目十:NULL 处理
题目:找出有缺失值的字段。
SELECT
COUNT(*) as total_rows,
COUNT(name) as non_null_names,
COUNT(email) as non_null_emails,
COUNT(phone) as non_null_phones,
COUNT(*) - COUNT(name) as null_names,
COUNT(*) - COUNT(email) as null_emails,
COUNT(*) - COUNT(phone) as null_phones
FROM users;
SQL 面试技巧
1. 先理清逻辑再写 SQL
- 先说清楚你要查什么
- 分步骤写,用 CTE 让逻辑更清晰
- 最后再优化
2. 注意边界情况
- 空表怎么办?
- 有 NULL 值怎么办?
- 有重复数据怎么办?
3. 性能意识
- 避免 SELECT *
- 用 EXISTS 代替 IN(子查询大时)
- 注意索引的使用
FAQ
SQL 面试用哪个数据库?
大部分公司用 MySQL 或 PostgreSQL,但有些(如 Google)用 BigQuery。面试前确认一下。
可以查文档吗?
通常不行,但面试官一般不要求记住所有语法细节。
💡 需要面试辅导?
如果你对准备技术面试感到迷茫,或者想要个性化的面试指导和简历优化,欢迎联系 Interview Coach Pro 获取一对一辅导服务。
👉 联系我们 获取专属面试准备方案