Problem link: 595. Big Countries
Topics: row filtering with OR
Table: World
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| name | varchar |
| continent | varchar |
| area | int |
| population | int |
| gdp | bigint |
+-------------+---------+
In SQL, name is the primary key column for this table.
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.
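A country is big if:
it has an area of at least three million (i.e., 3000000 km²), or
it has a population of at least twenty-five million (i.e., 25000000).
Write a solution to find the name, population, and area of the big countries.
Return the result table in any order.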
The result format is in the following example.
Example 1:
Input:
World table:
+-------------+-----------+---------+------------+--------------+
| name | continent | area | population | gdp |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
| Albania | Europe | 28748 | 2831741 | 12960000000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000000 |
| Andorra | Europe | 468 | 78115 | 3712000000 |
| Angola | Africa | 1246700 | 20609294 | 100990000000 |
+-------------+-----------+---------+------------+--------------+
Output:
+-------------+------------+---------+
| name | population | area |
+-------------+------------+---------+
| Afghanistan | 25500100 | 652230 |
| Algeria | 37100000 | 2381741 |
+-------------+------------+---------+
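Summary:
Find all big countries. A country is big if it satisfies condition A or condition B.
pandas approach 1:
Filter the rows on the two conditions; note that they are combined with OR (|).
pandas implementation 1: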
import pandas as pd
def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    res = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
    return res[['name', 'population', 'area']]
pandas approach 2:
The same two-condition filter, but using loc[] to select the rows and the columns in one step.
pandas implementation 2:
import pandas as pd
def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world.loc[(world['area'] >= 3000000) | (world['population'] >= 25000000), ['name', 'population', 'area']]
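With boolean masks, each comparison has to be wrapped in parentheses because Python's | and & bind more tightly than >=. As an alternative (not one of the solutions above), DataFrame.query() takes the condition as a string and accepts or/and directly. A minimal sketch on the example rows:

```python
import pandas as pd

# Sample rows copied from the example World table above.
world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    # query() parses the expression itself, so `or` works and no extra
    # parentheses are needed around each comparison.
    return world.query('area >= 3000000 or population >= 25000000')[['name', 'population', 'area']]

result = big_countries(world)
print(result)
```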
MySQL approach:
Filter with WHERE; the two conditions are combined with OR.
MySQL implementation:
SELECT
name,
population,
area
FROM
World
WHERE
area >= 3000000
OR population >= 25000000
Problem link: 1757. Recyclable and Low Fat Products
Topics: row filtering with AND
Table: Products
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| low_fats | enum |
| recyclable | enum |
+-------------+---------+
In SQL, product_id is the primary key for this table.
low_fats is an ENUM of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.
recyclable is an ENUM of types ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.
Find the ids of products that are both low fat and recyclable.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Products table:
+-------------+----------+------------+
| product_id | low_fats | recyclable |
+-------------+----------+------------+
| 0 | Y | N |
| 1 | Y | Y |
| 2 | N | Y |
| 3 | Y | Y |
| 4 | N | N |
+-------------+----------+------------+
Output:
+-------------+
| product_id |
+-------------+
| 1 |
| 3 |
+-------------+
Explanation: Only products 1 and 3 are both low fat and recyclable.
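Summary:
Return the ids of products that satisfy both condition A and condition B.
pandas approach:
Filter on the two conditions. The only difference from the previous problem is AND (&) instead of OR (|); loc works here too.
pandas implementation: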
import pandas as pd
def find_products(products: pd.DataFrame) -> pd.DataFrame:
    res = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
    return res[['product_id']]
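The same filter written with loc, which selects the matching rows and the product_id column in one step. A minimal sketch on the example rows:

```python
import pandas as pd

# Sample rows copied from the example Products table above.
products = pd.DataFrame({
    'product_id': [0, 1, 2, 3, 4],
    'low_fats': ['Y', 'Y', 'N', 'Y', 'N'],
    'recyclable': ['N', 'Y', 'Y', 'Y', 'N'],
})

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    # Both masks must be True, so they are combined with & (AND).
    mask = (products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')
    # loc[rows, columns]: filter rows and pick the output column together.
    return products.loc[mask, ['product_id']]

print(find_products(products))
```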
MySQL approach:
Filter with WHERE; the two conditions are combined with AND.
MySQL implementation:
SELECT
product_id
FROM
Products
WHERE
low_fats = 'Y'
AND recyclable = 'Y'
Problem link: 183. Customers Who Never Order
Topics: merging, selecting null values, exclusion
Table: Customers
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| name | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.
Table: Orders
+-------------+------+
| Column Name | Type |
+-------------+------+
| id | int |
| customerId | int |
+-------------+------+
id is the primary key (column with unique values) for this table.
customerId is a foreign key (reference columns) of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.
Write a solution to find all customers who never order anything.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Customers table:
+----+-------+
| id | name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
Output:
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+
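Summary:
Given a customers table and an orders table, return the names of customers who have never placed an order.
pandas approach 1:
Left-merge the two tables. Customers who never ordered end up with a null customerId, so keep only those rows.
pandas implementation 1: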
import pandas as pd
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    tmp = pd.merge(customers, orders, how='left', left_on='id', right_on='customerId')
    # Customers with no matching order have NaN in customerId.
    # Chaining rename() avoids calling inplace=True on a filtered slice.
    tmp2 = tmp[tmp['customerId'].isna()].rename(columns={'name': 'Customers'})
    return tmp2[['Customers']]
pandas approach 2:
Select the customers whose id never appears in the customerId column of orders.
pandas implementation 2:
import pandas as pd
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Keep customers whose id does not appear in orders
    df = customers[~customers['id'].isin(orders['customerId'])]
    # Rename the column
    df = df[['name']].rename(columns={'name': 'Customers'})
    return df
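A third pandas option is an anti-join: a left merge with indicator=True adds a _merge column whose value is 'left_only' for customers without any order. A minimal sketch on the example data; only the customerId column of orders is merged, to avoid a clash between the two id columns:

```python
import pandas as pd

# Sample rows copied from the example tables above.
customers = pd.DataFrame({'id': [1, 2, 3, 4], 'name': ['Joe', 'Henry', 'Sam', 'Max']})
orders = pd.DataFrame({'id': [1, 2], 'customerId': [3, 1]})

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # indicator=True adds '_merge': 'both' if the customer matched an order,
    # 'left_only' if it did not. drop_duplicates keeps one row per ordering customer.
    merged = customers.merge(
        orders[['customerId']].drop_duplicates(),
        how='left', left_on='id', right_on='customerId', indicator=True,
    )
    never = merged[merged['_merge'] == 'left_only']
    return never[['name']].rename(columns={'name': 'Customers'})

print(find_customers(customers, orders))
```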
MySQL approach 1:
LEFT JOIN the two tables, then use WHERE to keep the rows whose customerId is NULL.
MySQL implementation 1:
SELECT
name AS Customers
FROM
Customers a
LEFT JOIN Orders b ON a.id = b.customerId
WHERE
b.customerId IS NULL
MySQL approach 2:
Use a subquery to fetch the customerId values from Orders, then exclude them with NOT IN.
MySQL implementation 2:
SELECT
customers.name AS Customers
FROM
customers
WHERE
customers.id NOT IN ( SELECT customerid FROM orders )
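One caveat with NOT IN: if the subquery returns any NULL, `id NOT IN (...)` is FALSE for ids in the list and NULL (not TRUE) for every other id, so no rows come back at all. Here customerId is a foreign key, so NULLs are unlikely, but NOT EXISTS is immune to the problem. A small demonstration with Python's built-in sqlite3 module (SQLite rather than MySQL, assuming both follow standard three-valued logic here; the order with a NULL customerId is hypothetical):

```python
import sqlite3

# In-memory SQLite database, used only to illustrate the NULL behaviour.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customerId INTEGER);
    INSERT INTO Customers VALUES (1, 'Joe'), (2, 'Henry'), (3, 'Sam'), (4, 'Max');
    -- Order 3 is hypothetical: a NULL customerId exposes the NOT IN pitfall.
    INSERT INTO Orders VALUES (1, 3), (2, 1), (3, NULL);
""")

# NOT IN: the NULL in the subquery makes the test unknown for Henry and Max.
not_in = conn.execute(
    "SELECT name FROM Customers WHERE id NOT IN (SELECT customerId FROM Orders) ORDER BY id"
).fetchall()

# NOT EXISTS: only asks whether a matching order row exists.
not_exists = conn.execute(
    """SELECT c.name FROM Customers c
       WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.customerId = c.id)
       ORDER BY c.id"""
).fetchall()

print(not_in)
print(not_exists)
```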
Problem link: 1148. Article Views I
Topics: deduplication, sorting
Table: Views
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |
+---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date.
Note that equal author_id and viewer_id indicate the same person.
Write a solution to find all the authors that viewed at least one of their own articles.
Return the result table sorted by id in ascending order.
The result format is in the following example.
Example 1:
Input:
Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date |
+------------+-----------+-----------+------------+
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |
+------------+-----------+-----------+------------+
Output:
+------+
| id |
+------+
| 4 |
| 7 |
+------+
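Summary:
The table records the article id, author id, viewer id, and view date. Return the ids of authors who viewed at least one of their own articles, sorted by id in ascending order.
pandas approach:
After filtering the rows, remove duplicates with drop_duplicates(), then sort with sort_values().
pandas implementation: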
import pandas as pd
def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Rows where the author is also the viewer
    tmp = views.loc[views['author_id'] == views['viewer_id'], ['author_id']]
    # Chain the steps instead of using inplace=True on a filtered slice
    tmp = tmp.rename(columns={'author_id': 'id'})
    tmp = tmp.drop_duplicates(subset='id', keep='first')
    return tmp.sort_values(by='id')
MySQL approach:
Filter with WHERE, remove duplicates with DISTINCT, and sort with ORDER BY.
MySQL implementation:
SELECT DISTINCT
author_id AS id
FROM
Views
WHERE
author_id = viewer_id
ORDER BY
author_id
Problem link: 1683. Invalid Tweets
Topics: strings, row filtering, loc
Table: Tweets
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| tweet_id | int |
| content | varchar |
+----------------+---------+
In SQL, tweet_id is the primary key for this table.
This table contains all the tweets in a social media app.
Find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Tweets table:
+----------+----------------------------------+
| tweet_id | content |
+----------+----------------------------------+
| 1 | Vote for Biden |
| 2 | Let us make America great again! |
+----------+----------------------------------+
Output:
+----------+
| tweet_id |
+----------+
| 2 |
+----------+
Explanation:
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.
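Summary:
Return the ids of invalid tweets (content longer than 15 characters).
pandas approach 1:
Test the length of the content column directly.
pandas implementation 1: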
import pandas as pd
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    invalid = tweets['content'].str.len() > 15  # a boolean Series
    return tweets[invalid][['tweet_id']]
# One step with loc (recommended)
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    return tweets.loc[tweets['content'].str.len() > 15, ['tweet_id']]
pandas approach 2:
Write a predicate function and apply() it. For this problem that is actually unnecessary.
pandas implementation 2:
import pandas as pd
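def check(s: str) -> bool:
    # Invalid if the content has more than 15 characters
    return len(s) > 15
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # apply() calls check() on each value; no need to wrap it in a lambda
    # or to add a helper column to the input DataFrame
    mask = tweets['content'].apply(check)
    return tweets.loc[mask, ['tweet_id']]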
MySQL implementation:
SELECT
tweet_id
FROM
Tweets
WHERE
CHAR_LENGTH( content ) > 15
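The query filters on the length of content. In MySQL, LENGTH() returns the length in bytes while CHAR_LENGTH() returns the number of characters; for plain ASCII tweets the two agree, but for multi-byte text they do not. The same distinction in Python, as a small illustration (the non-ASCII strings are hypothetical):

```python
# len() on a str counts characters (like CHAR_LENGTH() or pandas .str.len());
# len() on the UTF-8 bytes mirrors what MySQL's LENGTH() reports for utf8mb4 text.
def char_length(s: str) -> int:
    return len(s)

def byte_length(s: str) -> int:
    return len(s.encode('utf-8'))

for text in ['Vote for Biden', 'Let us make America great again!', 'café', '你好']:
    print(repr(text), char_length(text), byte_length(text))
```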
Problem link: 1873. Calculate Special Bonus
Topics: apply with lambda, conditional filtering
Table: Employees
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| employee_id | int |
| name | varchar |
| salary | int |
+-------------+---------+
In SQL, employee_id is the primary key for this table.
Each row of this table indicates the employee ID, employee name, and salary.
Calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee name does not start with the character 'M'. The bonus of an employee is 0 otherwise.
Return the result table ordered by employee_id.
The result format is in the following example.
Example 1:
Input:
Employees table:
+-------------+---------+--------+
| employee_id | name | salary |
+-------------+---------+--------+
| 2 | Meir | 3000 |
| 3 | Michael | 3800 |
| 7 | Addilyn | 7400 |
| 8 | Juan | 6100 |
| 9 | Kannon | 7700 |
+-------------+---------+--------+
Output:
+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2 | 0 |
| 3 | 0 |
| 7 | 7400 |
| 8 | 0 |
| 9 | 7700 |
+-------------+-------+
Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.
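Summary:
Compute each employee's bonus: employees who meet both conditions get a bonus equal to their salary, and everyone else gets 0.
pandas approach 1:
First set everyone's bonus to their salary, then find everyone who fails either condition and set their bonus to 0. Initializing every bonus to 0 first and then filling in the eligible salaries works just as well.
pandas implementation 1: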
import pandas as pd
def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees['salary']
    # No bonus for an even id OR a name starting with 'M'.
    # (% 2 == 0 states "even" explicitly instead of relying on ~ applied to integers.)
    no_bonus = (employees['employee_id'] % 2 == 0) | employees['name'].str.startswith('M')
    employees.loc[no_bonus, 'bonus'] = 0
    employees.sort_values(by='employee_id', inplace=True)
    return employees[['employee_id', 'bonus']]
pandas approach 2:
Evaluate the condition row by row with apply() and a lambda.
pandas implementation 2:
import pandas as pd
def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees.apply(
        lambda x: x['salary'] if x['employee_id'] % 2 and not x['name'].startswith('M') else 0,
        axis=1
    )
    df = employees[['employee_id', 'bonus']].sort_values('employee_id')
    return df
MySQL approach:
Use IF().
MySQL implementation:
SELECT
employee_id,
IF(employee_id % 2 = 1 AND name NOT REGEXP '^M', salary, 0) AS bonus
FROM
employees
ORDER BY
employee_id
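A vectorised alternative to apply() is numpy.where, which evaluates the whole condition at once and picks the salary or 0 element-wise. A minimal sketch on the example rows (numpy is a pandas dependency, so it is assumed to be installed):

```python
import numpy as np
import pandas as pd

# Sample rows copied from the example Employees table above.
employees = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    # Eligible: odd id AND name does not start with 'M'.
    eligible = (employees['employee_id'] % 2 == 1) & ~employees['name'].str.startswith('M')
    result = employees[['employee_id']].copy()
    # np.where(condition, value_if_true, value_if_false), applied element by element.
    result['bonus'] = np.where(eligible, employees['salary'], 0)
    return result.sort_values('employee_id')

print(calculate_special_bonus(employees))
```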
Problem link: 1667. Fix Names in a Table
Topics: string processing
Table: Users
+----------------+---------+
| Column Name | Type |
+----------------+---------+
| user_id | int |
| name | varchar |
+----------------+---------+
In SQL, user_id is the primary key for this table.
This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.
Fix the names so that only the first character is uppercase and the rest are lowercase.
Return the result table ordered by user_id.
The result format is in the following example.
Example 1:
Input:
Users table:
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | aLice |
| 2 | bOB |
+---------+-------+
Output:
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | Alice |
| 2 | Bob |
+---------+-------+
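Summary:
Convert the name column so that the first letter is uppercase and the rest are lowercase.
pandas approach 1:
Python strings (and the pandas .str accessor) happen to have capitalize(), which does exactly this.
pandas implementation 1: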
import pandas as pd
def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str.capitalize()
    users.sort_values(by='user_id', inplace=True)
    return users
pandas approach 2:
If you don't know about capitalize(), simulate it: uppercase the first letter, lowercase the rest, and concatenate the two parts.
pandas implementation 2:
import pandas as pd
def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    # First character upper-cased + remaining characters lower-cased
    users['name'] = users['name'].str[0].str.upper() + users['name'].str[1:].str.lower()
    users.sort_values(by='user_id', inplace=True)
    return users
MySQL approach 1:
Uppercase the first letter, lowercase the rest, then concatenate the two parts.
MySQL implementation 1:
-- Using SUBSTRING
SELECT user_id,
CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM Users
ORDER BY user_id
-- Using LEFT and RIGHT
SELECT user_id,
CONCAT(UPPER(LEFT(name, 1)), LOWER(RIGHT(name, length(name) - 1))) AS name
FROM Users
ORDER BY user_id
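Aside: case conversion of Python strings
s = "I love YOU"
print(s.upper())       # every lowercase letter to uppercase: I LOVE YOU
print(s.lower())       # every uppercase letter to lowercase: i love you
print(s.capitalize())  # first character uppercase, the rest lowercase: I love you
print(s.title())       # first letter of each word uppercase, the rest lowercase: I Love You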
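capitalize() and title() agree on single-word, letters-only names such as those in this problem, but they diverge once a name contains a non-letter, because title() treats any non-letter as a word boundary. A small check; the hyphenated and apostrophe names are hypothetical:

```python
# capitalize(): only the very first character is upper-cased.
# title(): every run of letters after a non-letter starts upper-case.
names = ['aLice', 'bOB', 'mary-jane', "o'neil"]
capitalized = [n.capitalize() for n in names]
titled = [n.title() for n in names]
print(capitalized)
print(titled)
```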
Problem link: 1517. Find Users With Valid E-Mails
Topics: regular expressions
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
| mail | varchar |
+---------------+---------+
In SQL, user_id is the primary key for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.
Find the users who have valid emails.
A valid e-mail has a prefix name and a domain where:
The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.
The domain is '@leetcode.com'.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Users table:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
| 5 | Marwan | quarz#2020@leetcode.com |
| 6 | David | david69@gmail.com |
| 7 | Shapiro | .shapo@leetcode.com |
+---------+-----------+-------------------------+
Output:
+---------+-----------+-------------------------+
| user_id | name | mail |
+---------+-----------+-------------------------+
| 1 | Winston | winston@leetcode.com |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
+---------+-----------+-------------------------+
Explanation:
The mail of user 2 does not have a domain.
The mail of user 5 has the # sign which is not allowed.
The mail of user 6 does not have the leetcode domain.
The mail of user 7 starts with a period.
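Summary:
Return the users whose email is valid.
Approach:
A good opportunity to learn regular expressions; it was also the first time I learned that SQL supports regular expressions too.
pandas implementation: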
import pandas as pd
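# LeetCode official solution
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # Note the raw string (the r prefix), which avoids having to escape backslashes.
    # Also note that `@` is escaped, because it has a special meaning in some regex flavors.
    return users[users["mail"].str.match(r"^[a-zA-Z][a-zA-Z0-9_.-]*\@leetcode\.com$")]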
MySQL implementation:
-- LeetCode official solution
SELECT user_id, name, mail
FROM Users
-- Note that the `@` character is also escaped, because it has a special meaning in some regex flavors
WHERE mail REGEXP '^[a-zA-Z][a-zA-Z0-9_.-]*\\@leetcode\\.com$';
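To check the pattern outside the database, Python's re module is enough. This sketch uses re.fullmatch, which anchors at both ends. By contrast, pandas' .str.match() is anchored only at the start (like re.match), and with re.match a trailing $ also matches just before a final newline; pandas also offers .str.fullmatch(). Escaping @ is optional in Python's re, so it is omitted here. The sample addresses are hypothetical:

```python
import re

# Same pattern as the solutions above, minus the optional escape before '@'.
PATTERN = r"[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com"

def is_valid(mail: str) -> bool:
    # fullmatch() must consume the whole string, so ^ and $ are unnecessary.
    return re.fullmatch(PATTERN, mail) is not None

samples = {
    'alice_01@leetcode.com': True,
    'bob-@leetcode.com': True,
    'carol.smith@leetcode.com': True,
    '9lives@leetcode.com': False,    # must start with a letter
    'dan@gmail.com': False,          # wrong domain
    '.eve@leetcode.com': False,      # starts with a period
    'frank#1@leetcode.com': False,   # '#' is not allowed
    'grace@leetcode.com\n': False,   # trailing newline
}
for mail, expected in samples.items():
    print(repr(mail), is_valid(mail), expected)

# With re.match, '$' also matches right before a final newline.
loose = re.match(r"^" + PATTERN + r"$", 'grace@leetcode.com\n')
print(loose is not None)
```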
Problem link: 1527. Patients With a Condition
Topics: string search
Table: Patients
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| patient_id | int |
| patient_name | varchar |
| conditions | varchar |
+--------------+---------+
In SQL, patient_id is the primary key for this table.
'conditions' contains 0 or more codes separated by spaces.
This table contains information of the patients in the hospital.
Find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 1 | Daniel | YFEV COUGH |
| 2 | Alice | |
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
| 5 | Alain | DIAB201 |
+------------+--------------+--------------+
Output:
+------------+--------------+--------------+
| patient_id | patient_name | conditions |
+------------+--------------+--------------+
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
+------------+--------------+--------------+
Explanation: Bob and George both have a condition that starts with DIAB1.
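Summary:
Return the records of patients who have Type I Diabetes.
Approach:
We need every row whose conditions contains a code that starts with DIAB1. conditions holds several space-separated codes. My first thought was to split it into a list and check each code in a loop, but it is enough to check two cases:
conditions starts with DIAB1, or
conditions contains ' DIAB1' (note the space before the D).
pandas implementation: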
import pandas as pd
def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # For the second check, patients['conditions'].str.find(' DIAB1') != -1 also works
    return patients[patients['conditions'].str.startswith('DIAB1') | patients['conditions'].str.contains(' DIAB1', regex=False)]
MySQL implementation:
SELECT
*
FROM
Patients
WHERE
conditions LIKE 'DIAB1%'
OR conditions LIKE '% DIAB1%'
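The two checks can also be folded into one regular expression: (?:^| )DIAB1 means "DIAB1 at the start of the string or right after a space". A minimal sketch with pandas' .str.contains(); the sixth row is a hypothetical code that contains DIAB1 without starting with it, so it must not match:

```python
import pandas as pd

patients = pd.DataFrame({
    'patient_id': [1, 2, 3, 4, 5, 6],
    'patient_name': ['Daniel', 'Alice', 'Bob', 'George', 'Alain', 'Zoe'],
    # Rows 1-5 follow the example; row 6 is hypothetical.
    'conditions': ['YFEV COUGH', '', 'DIAB100 MYOP', 'ACNE DIAB100', 'DIAB201', 'SADIAB100'],
})

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # (?:...) is a non-capturing group: start of string OR a space, then DIAB1.
    return patients[patients['conditions'].str.contains(r'(?:^| )DIAB1', regex=True)]

print(find_patients(patients))
```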
Problem link: 177. Nth Highest Salary
Topics: deduplication, sorting, picking the nth value, building a new DataFrame
Table: Employee
+-------------+------+
| Column Name | Type |
+-------------+------+
| id | int |
| salary | int |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.
Write a solution to find the nth highest salary from the Employee table. If there is no nth highest salary, return null.
The result format is in the following example.
Example 1:
Input:
Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
n = 2
Output:
+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200 |
+------------------------+
Example 2:
Input:
Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1 | 100 |
+----+--------+
n = 2
Output:
+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| null |
+------------------------+
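Summary:
Return the nth highest salary.
pandas approach:
The problem combines deduplication, sorting, and selection.
In pandas, use drop_duplicates() to deduplicate, sort_values() to sort, and head(N) followed by tail(1) to pick the Nth row.
Note that the returned DataFrame has to be built from scratch.
pandas implementation: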
import pandas as pd
# My version
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    employee = employee.drop_duplicates(subset='salary')  # deduplicate
    employee = employee.sort_values(by='salary', ascending=False)  # sort descending
    if N <= 0 or employee.shape[0] < N:  # no Nth highest salary
        ans = None
    else:
        # .iloc[0] instead of int(Series), which is deprecated in recent pandas
        ans = employee.head(N).tail(1)['salary'].iloc[0]
    return pd.DataFrame({f'getNthHighestSalary({N})': [ans]})
# Official solution (arguably cleaner); the column label now uses N instead of a hard-coded 2
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    df = employee[["salary"]].drop_duplicates()
    if N <= 0 or len(df) < N:
        return pd.DataFrame({f'getNthHighestSalary({N})': [None]})
    return df.sort_values("salary", ascending=False).head(N).tail(1)
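MySQL approach:
The MySQL version mainly exercises LIMIT, which picks out the Nth row. Because LIMIT does not accept an expression such as N-1, the offset is first stored in the local variable M.
MySQL implementation: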
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
DECLARE M INT;
SET M = N-1;
RETURN (
# Write your MySQL query statement below.
SELECT DISTINCT salary
FROM employee
ORDER BY salary DESC
LIMIT M, 1
);
END
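The LIMIT/OFFSET idea, plus wrapping the query in an outer SELECT so that "no row" turns into NULL (the same trick used for 176), can be tried out with Python's built-in sqlite3 module. This is a sketch in SQLite, not MySQL: SQLite has no CREATE FUNCTION, so the hypothetical helper nth_highest binds N-1 as an OFFSET parameter instead:

```python
import sqlite3

def nth_highest(salaries, n):
    """Return the n-th highest distinct salary, or None if it does not exist."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)')
    conn.executemany('INSERT INTO Employee VALUES (?, ?)', list(enumerate(salaries, start=1)))
    # Inner query: skip n-1 distinct salaries, take one.
    # Outer SELECT: an empty inner result becomes a single NULL.
    row = conn.execute(
        'SELECT (SELECT DISTINCT salary FROM Employee '
        'ORDER BY salary DESC LIMIT 1 OFFSET ?)',
        (n - 1,),
    ).fetchone()
    conn.close()
    return row[0]

print(nth_highest([100, 200, 300], 2))
print(nth_highest([100], 2))
```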
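Aside: the logical order in which the clauses of an SQL query are executed is FROM (including JOINs), WHERE, GROUP BY, HAVING, SELECT, DISTINCT, ORDER BY, and finally LIMIT.
Note: your DBMS may execute a query in an equivalent but different order.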
Problem link: 176. Second Highest Salary
Topics: deduplication, sorting, picking the nth value, handling the empty case
Table: Employee
+-------------+------+
| Column Name | Type |
+-------------+------+
| id | int |
| salary | int |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.
Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).
The result format is in the following example.
Example 1:
Input:
Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
Output:
+---------------------+
| SecondHighestSalary |
+---------------------+
| 200 |
+---------------------+
Example 2:
Input:
Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1 | 100 |
+----+--------+
Output:
+---------------------+
| SecondHighestSalary |
+---------------------+
| null |
+---------------------+
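Summary:
Return the second highest salary, or null if it does not exist.
pandas approach:
Deduplicate, sort, take the second highest value, and return null when there is no result.
pandas implementation: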
import pandas as pd
def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    employee = employee.drop_duplicates(subset='salary')  # deduplicate without modifying the input
    employee = employee.sort_values(by='salary', ascending=False)  # sort descending
    if employee.shape[0] < 2:
        ans = None
    else:
        # .iloc[0] instead of int(Series), which is deprecated in recent pandas
        ans = employee.head(2).tail(1)['salary'].iloc[0]
    return pd.DataFrame({'SecondHighestSalary': [ans]})
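A shorter variant: Series.nlargest(2) on the de-duplicated salaries returns at most the two highest values in descending order, so a second highest salary exists only if two values come back. A minimal sketch:

```python
import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    # Distinct salaries, then (up to) the two largest, highest first.
    top_two = employee['salary'].drop_duplicates().nlargest(2)
    ans = top_two.iloc[1] if len(top_two) == 2 else None
    return pd.DataFrame({'SecondHighestSalary': [ans]})

example1 = second_highest_salary(pd.DataFrame({'id': [1, 2, 3], 'salary': [100, 200, 300]}))
example2 = second_highest_salary(pd.DataFrame({'id': [1], 'salary': [100]}))
print(example1)
print(example2)
```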
MySQL approach 1:
Use a subquery with a LIMIT clause and wrap it in an outer SELECT, so that the empty case correctly shows null.
MySQL implementation 1:
SELECT
(SELECT DISTINCT
Salary
FROM
Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1) AS SecondHighestSalary
MySQL approach 2:
Use IFNULL to handle the case where no second highest salary exists.
MySQL implementation 2:
select ifnull(
(select distinct Salary
from Employee
order by Salary desc
limit 1,1
),null
) as SecondHighestSalary;
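Notes:
LIMIT n, m: skip to offset n, then take up to m rows from there; if fewer than m rows remain, return however many there are.
IFNULL(expr1, expr2): return expr1 if it is not NULL, otherwise return expr2.
Problem link: 184. Department Highest Salary
Topics: groupby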
Pandas Schema:
data = [[1, 'Joe', 70000, 1], [2, 'Jim', 90000, 1], [3, 'Henry', 80000, 2], [4, 'Sam', 60000, 2], [5, 'Max', 90000, 1]]
Employee = pd.DataFrame(data, columns=['id', 'name', 'salary', 'departmentId']).astype({'id':'Int64', 'name':'object', 'salary':'Int64', 'departmentId':'Int64'})
data = [[1, 'IT'], [2, 'Sales']]
Department = pd.DataFrame(data, columns=['id', 'name']).astype({'id':'Int64', 'name':'object'})
Table: Employee
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference columns) of the ID from the Department table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.
Table: Department
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| name | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table. It is guaranteed that department name is not NULL.
Each row of this table indicates the ID of a department and its name.
Write a solution to find employees who have the highest salary in each of the departments.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Employee table:
+----+-------+--------+--------------+
| id | name | salary | departmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
+----+-------+--------+--------------+
Department table:
+----+-------+
| id | name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
Output:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
| IT | Max | 90000 |
+------------+----------+--------+
Explanation: Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.
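Summary:
Return the employees who earn the highest salary in each department.
pandas approach:
First join the two tables; a left join is the natural choice.
Then note that when several people share a department's top salary, all of them must be kept, so simply sorting and dropping duplicates won't work.
pandas implementation: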
import pandas as pd
def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    ans = pd.merge(employee, department, how='left', left_on='departmentId', right_on='id')  # join the two tables
    ans.rename(columns={'name_x': 'Employee', 'name_y': 'Department', 'salary': 'Salary'}, inplace=True)  # rename
    # Keep the employees whose salary equals their department's maximum
    max_salary = ans.groupby('Department')['Salary'].transform('max')
    ans = ans[ans['Salary'] == max_salary]
    return ans[['Department', 'Employee', 'Salary']]
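Why sorting and de-duplicating (or idxmax) is not enough: both keep exactly one row per department, so one of the tied IT employees disappears, whereas transform('max') compares every row against its own department maximum and keeps all ties. A minimal sketch on the example data, grouping by departmentId before the join for brevity:

```python
import pandas as pd

# Sample rows copied from the example Employee table above.
employee = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'name': ['Joe', 'Jim', 'Henry', 'Sam', 'Max'],
    'salary': [70000, 90000, 80000, 60000, 90000],
    'departmentId': [1, 1, 2, 2, 1],
})

# idxmax returns ONE index label per group (the first maximum), so Max is lost.
one_per_dept = employee.loc[employee.groupby('departmentId')['salary'].idxmax(), 'name'].tolist()

# transform('max') broadcasts each group's maximum back to every row, so all ties survive.
dept_max = employee.groupby('departmentId')['salary'].transform('max')
all_ties = employee.loc[employee['salary'] == dept_max, 'name'].tolist()

print(one_per_dept)
print(all_ties)
```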
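MySQL approach:
First use a subquery to find each department's maximum salary, then use IN to keep the rows whose (departmentId, salary) pair appears in that subquery's result.
MySQL implementation: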
SELECT
b.name AS Department,
a.name AS Employee,
a.salary AS Salary
FROM
Employee a
JOIN Department b ON a.departmentId = b.id
WHERE
( a.departmentId, a.salary ) IN ( SELECT departmentId, max( salary ) FROM Employee GROUP BY departmentId )
Problem link: 178. Rank Scores
Topics: sorting, window functions
Table: Scores
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| score | decimal |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains the score of a game. Score is a floating point value with two decimal places.
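Write a solution to find the rank of the scores. The ranking should be calculated according to the following rules:
The scores should be ranked from the highest to the lowest.
If there is a tie between two scores, both should have the same ranking.
After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.
Return the result table ordered by score in descending order.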
The result format is in the following example.
Example 1:
Input:
Scores table:
+----+-------+
| id | score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
Output:
+-------+------+
| score | rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
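Summary:
Return each score with its rank, ordered by rank in ascending order (that is, by score from highest to lowest).
pandas approach:
This is classic dense ranking: equal scores share the same rank, with no gaps. In pandas, the lazy route is to call rank() directly.
pandas implementation: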
import pandas as pd
def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
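    # method='dense': equal scores share a rank and no ranks are skipped;
    # rank() returns floats, so convert to int
    scores['rank'] = scores['score'].rank(method='dense', ascending=False).astype(int)
    scores.sort_values(by='rank', ascending=True, inplace=True)
    return scores[['score', 'rank']]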
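MySQL approach 1:
A window function performs an aggregate-like operation over a set of query rows. But whereas an aggregate operation groups the query rows into a single result row, a window function produces a result for each query row. The dense_rank() window function fits this problem exactly.
MySQL implementation 1: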
SELECT
score,
dense_rank( ) over ( ORDER BY score DESC ) AS 'rank'
FROM
scores
ORDER BY
score DESC
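MySQL approach 2:
Similar to counting sort: a score's rank equals the number of distinct scores greater than or equal to it.
This is implemented with a correlated subquery:
MySQL implementation 2: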
SELECT
S1.score,
( SELECT COUNT( DISTINCT S2.score ) FROM Scores S2 WHERE S2.score >= S1.score ) AS 'rank'
FROM
Scores S1
ORDER BY
S1.score DESC
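Aside: three common ranking methods, shown for the scores 4.00, 4.00, 3.85, 3.65:
ROW_NUMBER(): 1, 2, 3, 4 (every row gets a distinct number; ties are numbered in an unspecified order)
RANK(): 1, 1, 3, 4 (ties share a rank, and the following rank skips ahead)
DENSE_RANK(): 1, 1, 2, 3 (ties share a rank, with no gaps)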
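pandas' rank() exposes roughly the same three behaviours through its method argument: method='first' is close to ROW_NUMBER() (ties are numbered by order of appearance), method='min' matches RANK(), and method='dense' matches DENSE_RANK(). A minimal sketch on the example scores:

```python
import pandas as pd

# Scores copied from the example Scores table above.
scores = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                       'score': [3.50, 3.65, 4.00, 3.85, 4.00, 3.65]})

for method in ['first', 'min', 'dense']:
    # ascending=False gives the highest score rank 1; rank() returns floats.
    scores[method] = scores['score'].rank(method=method, ascending=False).astype(int)

print(scores.sort_values(['score', 'id'], ascending=[False, True]))
```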
Problem link: 196. Delete Duplicate Emails
Topics: groupby
Table: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.
For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.
For Pandas users, please note that you are supposed to modify Person in place.
After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Output:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
Explanation: john@example.com is repeated two times. We keep the row with the smallest Id = 1.
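Summary:
Delete duplicate emails, keeping only the row with the smallest id for each email.
pandas approach:
groupby('email')['id'].transform('min') groups the DataFrame by email and returns a Series holding each row's group minimum id, aligned with the original rows. Rows whose id differs from that minimum are then deleted by index with drop(). The problem requires the DataFrame to be modified in place, so pass inplace=True.
pandas implementation: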
import pandas as pd
def delete_duplicate_emails(person: pd.DataFrame) -> None:
    min_id = person.groupby('email')['id'].transform('min')  # smallest id within each email group
    removed_person = person[person['id'] != min_id]
    person.drop(removed_person.index, inplace=True)  # delete those rows by index
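An alternative in-place sketch: sort by id so that the smallest id comes first within each email, then let drop_duplicates(keep='first') remove the rest. The sample data below is hypothetical, with ids deliberately out of order:

```python
import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Smallest id first, so keep='first' keeps the smallest id per email.
    person.sort_values(by='id', inplace=True)
    person.drop_duplicates(subset='email', keep='first', inplace=True)

person = pd.DataFrame({'id': [3, 1, 2],
                       'email': ['a@example.com', 'a@example.com', 'b@example.com']})
delete_duplicate_emails(person)  # modifies `person` itself, returns None
print(person)
```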
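MySQL approach:
The official solution self-joins the table, comparing each row with every other row that has the same email, and deletes the row when its id is not the smallest.
MySQL implementation: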
DELETE p1 FROM Person p1,
Person p2
WHERE
p1.Email = p2.Email AND p1.Id > p2.Id
Problem link: 1795. Rearrange Products Table
Topics: combining tables
Table: Products
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| product_id | int |
| store1 | int |
| store2 | int |
| store3 | int |
+-------------+---------+
product_id is the primary key (column with unique values) for this table.
Each row in this table indicates the product's price in 3 different stores: store1, store2, and store3.
If the product is not available in a store, the price will be null in that store's column.
Write a solution to rearrange the Products table so that each row has (product_id, store, price). If a product is not available in a store, do not include a row with that product_id and store combination in the result table.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Products table:
+------------+--------+--------+--------+
| product_id | store1 | store2 | store3 |
+------------+--------+--------+--------+
| 0 | 95 | 100 | 105 |
| 1 | 70 | null | 80 |
+------------+--------+--------+--------+
Output:
+------------+--------+-------+
| product_id | store | price |
+------------+--------+-------+
| 0 | store1 | 95 |
| 0 | store2 | 100 |
| 0 | store3 | 105 |
| 1 | store1 | 70 |
| 1 | store3 | 80 |
+------------+--------+-------+
Explanation:
Product 0 is available in all three stores with prices 95, 100, and 105 respectively.
Product 1 is available in store1 with price 70 and store3 with price 80. The product is not available in store2.
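Summary:
Reshape the table from its original structure into the target structure.
pandas approach:
Turn columns into rows: the original column names become values of the new store column. So loop over the three store columns, handle each one separately, and finally concat() the pieces together.
pandas implementation: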
import pandas as pd
def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    store_list = ['store1', 'store2', 'store3']
    pieces = []  # one DataFrame per store; concatenated once at the end
    # Loop over the three stores
    for store in store_list:
        tmp = products.loc[products[store].notnull(), ['product_id', store]]
        tmp = tmp.rename(columns={store: 'price'})  # rename() returns a new DataFrame
        tmp['store'] = store
        pieces.append(tmp[['product_id', 'store', 'price']])
    # A single concat avoids repeatedly concatenating onto an empty DataFrame
    return pd.concat(pieces, ignore_index=True)
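MySQL approach:
Query each store separately and combine the three results into one with UNION. (UNION ALL would also work and skips the duplicate-removal step, since rows from different stores can never be identical.)
MySQL implementation: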
SELECT product_id, 'store1' AS store, store1 AS price
FROM Products
WHERE store1 IS NOT NULL
UNION
SELECT product_id, 'store2' AS store, store2 AS price
FROM Products
WHERE store2 IS NOT NULL
UNION
SELECT product_id, 'store3' AS store, store3 AS price
FROM Products
WHERE store3 IS NOT NULL
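The reshaping in this problem is the standard "wide to long" transformation, which pandas provides directly as DataFrame.melt(). A minimal sketch as an alternative to the loop-and-concat approach; products not sold in a store come out with NaN prices and are dropped afterwards:

```python
import pandas as pd

# Sample rows copied from the example Products table above (None = not available).
products = pd.DataFrame({
    'product_id': [0, 1],
    'store1': [95, 70],
    'store2': [100, None],
    'store3': [105, 80],
})

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    # Every non-id column becomes a (store, price) row for each product.
    long_df = products.melt(id_vars='product_id', var_name='store', value_name='price')
    # Drop the product/store combinations where the product is not sold.
    return long_df.dropna(subset=['price'])

result = rearrange_products_table(products).sort_values(['product_id', 'store'])
print(result)
```

Because store2 contains a missing value, the price column ends up as float (95.0 rather than 95).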