登录
  • #刷题
  • #工作信息
  • #求职
  • #leetcode
  • #攻略

Leetcode SQL 刷完之后的考点汇总

爷爷刘大
3971
12
太简单的或者太常用的我没加进来。这个是markdown 用notion编辑的,如有雷同那就是我。

# rank dense_rank 用法

- rank 不能直接命名成rank,必须加双引号as "rank"

- 不能直接用where,要在外面套用一层select 再用where

- 区别rank 和dense_rank, dense 连续编号

# Distinct 用法

count(distinct xxx)

实际上可以用在任何地方

有很多问题distinct 是关键

distinct 两列: distinct a, b

# Left join, inner join, cross join

using(id)

on [a.id](a.id) = b.id

join 有可能不止依赖一个条件,用and 隔开

很难想到用join的一道题,一列表示若干个点的坐标,求最短的线段。明显一个线段由两个点组成,所以把一个表当成两个表join

反复join 的另一个例题,1440,

```sql

select a.xxx , (v1.value > v2.value then xxx)

from tableA a left join tableB v1 on a.left = [v1.name](v1.name) left join tableB v2 on a.right = v2.name

```

# Function

begin()里面不加分号

# dual 用法

虚拟表

用于增加返回空

select (select xxx from) as "column" from dual, 这里from dual 可以不写

# 聚类函数用法

当聚类函数用在where中时,

例子:

错:where salary < max(salary)

对:where salary < (select max(salary) from xxx)

这里需要区分where 和 having的用法

带聚类函数的要用在having中

# null相关

- isnull ifnull nullif 区别:isnull(xxx) 返回0或1; ifnull(e1, e2) 如果e1是null返回e2; nullif(e1,e2)比较e1和e2是否相等,如果等返回null,不等返回e1

- where xx is not null

# 技巧:把一张表当两张表用

from Table a, Table b where [a.id](a.id) = b.subid and xxxx

# in 语法

where (a, b) in (select a, b from xxx)

# delete 语句

语法:delete from Table where xxxxx

注意:delete 在mysql中不可以直接使用where接过滤条件,需要添加中间表。

```sql

delete from Person where

id not in

(select t.* from (select min(id) from Person group by email) t);

```

这里t就是中间表

# time 相关

如果只有date,可以直接写 between "2013-10-01" and "2013-10-03"

或者 date > "2013-10-01"

date 格式是 "2013-10-01"

yea(date) ≥ 2020

给定日期"2013-10-01",选取年月

date_format(order_date,'%Y-%m') as 'month'

left(order_date, 7)

substring(date, 1, 7)

DATEDIFF(end_date, start_date) 是 end_date - start_date

date_column +/- interval n day

# with as

with cte as (select xxx from xxx)

select * from cte;

with a as (xxx), b as (xxx)

select xxxx;

# round

round(xxx, 2) 给xx精确到小数点两位

# avg() 函数

算百分比非常有用的函数,如果统计有没有,可以和case when 1 else 0 end配合使用

# case

```sql

#简单case函数

case sex

when '1' then '男'

when '2' then '女’

else '其他' end;

```

```sql

#case搜索函数

case when sex = '1' then '男'

when sex = '2' then '女'

else '其他' end

```

# limit

一个非常有用的用法,当where 不能用(在出结果之后在排除某些行),用limit抛去offset

可以减少一次select 非常重要的考点

# aggregation 函数

如果是类似

select a, b from table group by a having min(c);

b 其实是随机返回的值,并不是像我们想的那样返回一对。

要用in 或者left join

同理,select a, min(b), c, d from table group by a 并不能保证c,d的一一对应关系,而是随机返回的

两个aggregation 连续用:max(count())是不能用的,所以要拆分成两个select,或者三个,用 with t, select from t where a = (select max(xx) from t) 或者 select from t where count() = select (rank limit 1)

在case 或者 if 的判断式中使用类似 max ,count, 不能直接使用,而要包含一个子查询,

eg.: (case id = (select max(id) from table) then id-1 end) id from table;

在where里直接用max也不行,必须加select子查询

当group by 使用aggregation不现实的时候,改成window函数 partition by

max的一个特殊用法:618, 始终不让一个set被数据库自动返回一个值,保持set并且可以被取值的状态。

# like

xxx like "% xxx%"

# group by

可以直接写成group by a, b

用两列group

# count(1) 就是 count(*)

# group_concat(distinct product order by product) as products

把group之后所有里面的元素连起来

# Union / Union All

把多张表格合并(多个select合并)每个表格用同样的列。

Union 会消除重复,Union All 不会

limit order by 如果作用在select里面,需要加小括号,否则就作用在union之后

在某些特殊的题目中,from 后面的东西可以把两张表先union起来在合起来用比较快

# update

update table set column = value, column2 = value2 where xxxxx

如果不加where则更新所有行

# 把字母变成ASCII,再变回字母

CHAR(ASCII('a'))

# Consecutive 类型的题目

要连续几次就把表重复join几次

# 对原来的表加一列

select column 1, column2, ...., new column from original table;

# 几个常用函数

trim

upper

lower

left

right

substring

length

concat

组合可以得到首字母大写

# 易混淆函数

sum vs count

sum(case 1 else 0) /count(1) = avg

# Sales Analysis 系列特点

用group 后加having 条件,统计sum(待条件) 或者sum(不待条件) 并判断

# 正则表达式

1. column REGEXP '正则 '

2. regexp_like(column, '正则')

3. like "%xxx%"

# if

if(exp1, exp2, exp3) 表示 exp1 is true ? exp2 : exp3

# join ... on (xx or xx)

这是一个很神奇的用法,可以join两次

# mysql没有full join

用两次select并且union

# 窗口函数的特殊用法 window function

lead(column, offset, default) 从当前的下面

lag(column, offset, default) 从当前的上面

order by 之后的不完整偏移量用法:

rows between 1 preceding and 0 following

rows unbounded preceding

RANGE BETWEEN unbounded preceding AND CURRENT ROW

range between interval 2 day preceding and interval 2 day following

window 函数的考法非常多,另外一种考法就是和group by互换。能用上rows的就是非常难的题目了

一般考的全是rank sum count

难一点就考lead 考连续值

虽然连续两个sum不合法,但是sum(sum(amount)) over (order by visited_on rows 6 preceding)是合法的

log 用法eg:求一个人是否连续五天登陆,log(date, 4) in column 或者 date, rank() 并且count(1) > 4 再 date - rank in column

把window 当aggragation用 sum() over (什么都不写)

# 连续序列生成

```sql

with recursive month_2020 as

(

select 1 as month union all

select month+1 from month_2020

where month < 12

)

select * from month_2020

```

```sql

(select 1 as month)

union (select 2 as month)

union (select 3 as month)

union (select 4 as month)

union (select 5 as month)

union (select 6 as month)

union (select 7 as month)

union (select 8 as month)

union (select 9 as month)

union (select 10 as month)

union (select 11 as month)

union (select 12 as month)

```

```sql

select months.column_0 clolumn_name FROM (

VALUES ROW(1), ROW(2), ROW(3), ROW(4), ROW(5), ROW(6), ROW(7), ROW(8), ROW(9), ROW(10), ROW(11), ROW(12)

) AS months

```

```sql

select * from



(select distinct spend_date, 'desktop' platform from spending

union all

select distinct spend_date, 'mobile' platform from spending

union all

select distinct spend_date, 'both' platform from spending) c

```

# pivot

旋转90度,mysql没有这个语句,用ifelse/case 替代

618看上去像pivot类型,但是其实他是多列合并类型的题目,并且利用了自定义变量的技巧,参考下面的经典题型和自定义变量相关内容

# recursive 用法

```sql

with recursive **cte** as (

select base from xxx

union all

select xx from **cte** where xxx

)

```

这里需要注意的是避免死循环,比如1270

```sql

with recursive cte as

(select employee_id from Employees where manager_id=1 and employee_id!=1

union all select e.employee_id from Employees e inner join cte c on e.manager_id = c.employee_id)

select * from cte

```

如果把1放入cte 则会永远跳不出循环

注意:cte引用不可以放入聚合函数,have group order limit 等等,除非在initial query中

为了解决这个问题,用inner join

如果只有一列, 用如下语法表示列名

```sql

WITH RECURSIVE cte_count (n)

AS (

SELECT 1

UNION ALL

SELECT n + 1

FROM cte_count

WHERE n < 3

)

SELECT n

FROM cte_count;

```

这种语法的可以有效避免distinct使用(用union而不是union all)

如果有多个cte, 需要recursive的放在前面

# 经典题型:将离散集合变成区间 1285

一串ID,把连续的id写成start | end的取件形式。

两种典型写法

```sql

select a.log_id start_id, min(b.log_id) end_id from

(select log_id from Logs where log_id - 1 not in (select log_id from Logs)) a,

(select log_id from Logs where log_id + 1 not in (select log_id from Logs)) b where a.log_id <= b.log_id

group by start_id order by start_id asc

```

```sql

select min(log_id) start_id, max(log_id) end_id from

(select log_id, log_id - (row_number() over (order by log_id asc)) dif from logs) a

group by dif order by start_id asc

```

第一种是用了一个小技巧找到边界值,然后做笛卡尔积,再钉死左边界,把找最小值为右边界

第二种是把每一行加一个行号,再做差,根据差值group (group by id - row_number)

总的来说,本质上把一列变成两列(包括变成若干列的题目,例如618)都是要**寻找各个列之间的共同点然后join 再group。**

类似的题目还有1225

```sql

SELECT period_state, MIN(days) start_date, MAX(days) end_date

FROM

(SELECT period_state, days,

RANK() OVER(PARTITION BY period_state ORDER BY days) rk

FROM

(SELECT 'failed' period_state, fail_date days

FROM failed

UNION ALL

SELECT 'succeeded' period_state, success_date days

FROM succeeded) a

WHERE YEAR(days) = '2019') b

GROUP BY 1, date_add(days, INTERVAL -rk DAY)

ORDER BY 2

```

```sql

SELECT period_state, MIN(d) AS start_date, MAX(d) AS end_date

FROM (SELECT fail_date AS d, 'failed' AS period_state,

DATE_SUB(fail_date, INTERVAL ROW_NUMBER() OVER(ORDER BY fail_date) DAY) AS day

FROM Failed

WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'

UNION ALL

SELECT success_date AS d, 'succeeded' AS period_state,

DATE_SUB(success_date, INTERVAL ROW_NUMBER() OVER(ORDER BY success_date) DAY) AS day

FROM Succeeded

WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31') AS l

GROUP BY period_state, day

ORDER BY start_date

```

这个解法关注点在与如何把day 和rank结合使用求行号

我的解法复杂一点,但是用了lead 和 lag 做group依据

```sql

with a as (select ri, period_state, lag(period_state,1) over(order by ri asc) prev_ , lead(period_state,1 ) over(order by ri asc) next_ from (select fail_date ri, "failed" period_state from Failed

union

select success_date ri, "succeeded" period_state from Succeeded) a where ri between "2019-01-01" and "2019-12-31" order by ri asc)

select b.period_state, b.start_date, c.end_date from

(select row_number() over(order by ri) rn, a.ri start_date, a.period_state from a where prev_ is null or prev_ <> period_state) b join (select row_number() over(order by ri) rn, a.ri end_date from a where next_ is null or next_ <> period_state) c using(rn) order by start_date asc

```

# 经典题型:求中位数

569

判断标准:

```sql

row_num between cnt/2.0 and cnt/2.0+1

```

```sql

RN_ASC BETWEEN RN_DESC - 1 AND RN_DESC + 1 # 两个方向各排一次

```

```sql

rnk>=total_counts/2 and rnk-1<=total_counts/2

```

```sql

SELECT

Id, Company, Salary

FROM

(SELECT

e.Id,

e.Salary,

e.Company,

IF(@prev = e.Company, @Rank:=@Rank + 1, @Rank:=1) AS rank,

@prev:=e.Company

FROM

Employee e, (SELECT @Rank:=0, @prev:=0) AS temp

ORDER BY e.Company , e.Salary , e.Id) Ranking

INNER JOIN

(SELECT

COUNT(*) AS totalcount, Company AS name

FROM

Employee e2

GROUP BY e2.Company) companycount ON companycount.name = Ranking.Company

WHERE

Rank = FLOOR((totalcount + 1) / 2)

OR Rank = FLOOR((totalcount + 2) / 2)

```

```sql

SELECT

Employee.Id, Employee.Company, Employee.Salary

FROM

Employee,

Employee alias

WHERE

Employee.Company = alias.Company

GROUP BY Employee.Company , Employee.Salary

HAVING SUM(CASE

WHEN Employee.Salary = alias.Salary THEN 1

ELSE 0

END) >= ABS(SUM(SIGN(Employee.Salary - alias.Salary)))

ORDER BY Employee.Id

;

```

571

下边界:floor((total+1)/2)

上边界:floor((total+2)/2)

```sql

WITH tb1 AS (

SELECT *,

-- There are cum_num numbers in TABLE numbers less than or equal to number in that record

-- e.g. There are cum_num = 8 numbers in TABLE numbers less than or equal to 1

-- so you will see [1,1,8,12] AS [Number, Frequency, cum_num, num]

SUM(frequency) OVER (ORDER BY number) AS cum_num,

SUM(frequency) OVER () AS num

FROM numbers

)

SELECT AVG(number*1.0) AS median

FROM tb1

WHERE num / 2.0 BETWEEN cum_num - frequency AND cum_num;

```

```sql

WITH tmp AS (SELECT Number, Frequency,

SUM(Frequency) OVER (ORDER BY Number ASC) rk1,

SUM(Frequency) OVER (ORDER BY Number DESC) rk2

FROM Numbers

order by Number)

select avg(Number) as median

from tmp

where abs(rk1-rk2) <=Frequency;

```

下面是我自己的解法,练习自定义变量的

```sql

with a as (select number, frequency, sum(Frequency) over (order by number asc) ps from numbers)

select (t.u+k.l)/2 "median" from

(select min(number) u from a a, (select @down:=floor((sum(frequency)+1)/2),@upper:=floor((sum(frequency)+2)/2) from numbers) t

where a.ps >= @upper) t

join

(select min(number) l from a where a.ps >= @down) k

```

# 经典题型:把学生分成若干列:618

```sql

select min(case when continent = "America" then name end) America,

min(case when continent = "Asia" then name end) Asia,

min(case when continent = "Europe" then name end) Europe

from (select name, continent, row_number()

over(partition by continent order by name asc) nb from student) t

group by nb

```

注意这里min的用法,也可以替换成max,总之就是某一个聚类函数。这里加了聚类函数避免了group完之后只能返回一个set中的一个值,避免了null。这个max/min的用法**惊为天人 (group by id)**



```sql

SELECT MIN(America) America,MIN(Asia) Asia, MIN(Europe) Europe FROM

(SELECT row_number() OVER(PARTITION BY continent ORDER BY name ASC) AS n,

CASE WHEN continent='America' THEN name END AS America,

CASE WHEN continent='Asia' THEN name END AS Asia,

CASE WHEN continent='Europe' THEN name END AS Europe

FROM

student) t

GROUP BY n

```

下面是第二种解法,用自定义变量

```sql

select max(America) America, max(Asia) Asia, max(Europe) Europe from

(select

if(s.continent = 'America', name, null) as America,

if(s.continent = 'Europe', name, null) as Europe,

if(s.continent = 'Asia',name ,null) as Asia,

(case when s.continent = 'America' then @r1:= @r1+1

when s.continent = 'Europe' then @r2:= @r2+1

when s.continent = 'Asia' then @r3:= @r3+1 end) as row_id

from student s, (select @r1:= 0, @r2:= 0, @r3:= 0) t

order by name) m

group by row_id order by row_id

```

此处还是和之前的解法相似,注意自定义变量的用法

# 自定义变量

基本模式:把变量结合到表中

```sql

select * , (@var := @var + 1) "name"

from Table a, (select @var := 0) t

```

定义变量用

```sql

select @var1 := 0, @var2 := 0 ....

```

如果是命令行中定义用set,既可以是:=也可以是=。select中只允许:=

:= 的优先级是所有语句中最低的,所以要加上括号

注意刚select是不能列出表的,必须select两次才能显式看到表格

另一种变量@@var注意不能用在此处

大小写不敏感

未定义的就是null

用在where和having中必须另写一个select
12条回复
热度排序

发表回复