[LeetCode 30 Days of Pandas Challenge] Study Notes, Part 1

Problem list:

  • Conditional filtering:
    • 595. Big Countries
    • 1757. Recyclable and Low Fat Products
    • 183. Customers Who Never Order
    • 1148. Article Views I
  • String functions:
    • 1683. Invalid Tweets
    • 1873. Calculate Special Bonus (good problem)
    • 1667. Fix Names in a Table (good problem)
    • 1517. Find Users With Valid E-Mails (good problem)
    • 1527. Patients With a Condition (good problem)
  • Data manipulation:
    • 177. Nth Highest Salary (good problem)
    • 176. Second Highest Salary
    • 184. Department Highest Salary (good problem)
    • 178. Rank Scores (good problem)
    • 196. Delete Duplicate Emails (good problem)
    • 1795. Rearrange Products Table

Conditional filtering:

595. Big Countries

Original problem: 595. Big Countries
Topics: row filtering with or

Table: World

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+

In SQL, name is the primary key column for this table.
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.

A country is big if:

  • it has an area of at least three million (i.e., 3000000 km2), or
  • it has a population of at least twenty-five million (i.e., 25000000).

Find the name, population, and area of the big countries.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+

Output:

+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+

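Problem summary:
Find all big countries; a country is big if it satisfies condition A or condition B.

pandas approach 1:
Filter rows with two conditions combined with or (|). Each comparison needs its own parentheses because | binds more tightly than >=.

pandas implementation 1: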

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    res = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
    return res[['name', 'population', 'area']]

pandas approach 2:
The same two-condition filter, but using loc[] so that rows and columns are selected in one step.

pandas implementation 2:

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world.loc[(world['area'] >= 3000000) | (world['population'] >= 25000000), ['name', 'population', 'area']]
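
To make the boolean mask visible, here is a small sketch that rebuilds the Example 1 table by hand and applies the same filter. The variable names is_big and big are only illustrative.

```python
import pandas as pd

# Sample rows copied from Example 1 above
world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

# Each comparison yields a boolean Series; | combines them element-wise
is_big = (world['area'] >= 3000000) | (world['population'] >= 25000000)

# Rows where the mask is True, restricted to the three requested columns
big = world.loc[is_big, ['name', 'population', 'area']]
print(big)
```

Only Afghanistan and Algeria pass the filter, both through the population condition.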

MySQL approach:
Filter with WHERE; the two conditions are alternatives, so join them with OR.

MySQL implementation:

SELECT
	name,
	population,
	area 
FROM
	World 
WHERE
	area >= 3000000 
	OR population >= 25000000



1757. Recyclable and Low Fat Products

Original problem: 1757. Recyclable and Low Fat Products
Topics: row filtering with and

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |
+-------------+---------+
In SQL, product_id is the primary key for this table.
low_fats is an ENUM of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.
recyclable is an ENUM of types ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.

Find the ids of products that are both low fat and recyclable.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+

Output:

+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+

Explanation: Only products 1 and 3 are both low fat and recyclable.

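Problem summary:
Return the ids of products that satisfy both condition A and condition B.

pandas approach:
Filter on two conditions; the only difference from the previous problem is and (&) instead of or (|). loc works here too.

pandas implementation: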

import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    res = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
    return res[['product_id']]

MySQL approach:
Filter with WHERE; both conditions must hold, so join them with AND.

MySQL implementation:

SELECT
	product_id 
FROM
	Products 
WHERE
	low_fats = 'Y' 
	AND recyclable = 'Y'



183. Customers Who Never Order

Original problem: 183. Customers Who Never Order
Topics: merging, selecting nulls, exclusion conditions

Table: Customers

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.

Table: Orders

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
customerId is a foreign key (reference columns) of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.

Write a solution to find all customers who never order anything.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Customers table:
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+

Output:

+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

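Problem summary:
Given a customers table and an orders table, return the names of the customers who never placed an order.

pandas approach 1:
Left-merge the two tables; customers who never ordered end up with a null customerId, so a row filter picks them out.

pandas implementation 1: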

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    tmp = pd.merge(customers, orders, how='left', left_on='id', right_on='customerId')
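    # Customers without a matching order have a null customerId after the left join.
    # rename() returns a new DataFrame, so the filtered slice is never modified in place.
    tmp2 = tmp[tmp['customerId'].isna()].rename(columns={'name': 'Customers'})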
    
    return tmp2[['Customers']]

pandas approach 2:
Select the rows of customers whose id never appears in the customerId column of orders.

pandas implementation 2:

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Keep customers whose id never appears in orders
    df = customers[~customers['id'].isin(orders['customerId'])]

    # Keep only the name column and rename it
    df = df[['name']].rename(columns={'name': 'Customers'})
    return df
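
A third way to express the same anti-join, sketched here as an assumption rather than as part of the original notes, is to let merge() label each row with indicator=True and keep the rows marked left_only. The function name find_customers_antijoin is hypothetical.

```python
import pandas as pd

# Sample tables copied from Example 1 above
customers = pd.DataFrame({'id': [1, 2, 3, 4], 'name': ['Joe', 'Henry', 'Sam', 'Max']})
orders = pd.DataFrame({'id': [1, 2], 'customerId': [3, 1]})

def find_customers_antijoin(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # De-duplicate the ordering customer ids so the merge cannot multiply rows
    ordered_ids = orders[['customerId']].drop_duplicates()
    merged = customers.merge(ordered_ids, how='left', left_on='id',
                             right_on='customerId', indicator=True)
    # '_merge' == 'left_only' marks customers with no matching order
    never = merged.loc[merged['_merge'] == 'left_only', ['name']]
    return never.rename(columns={'name': 'Customers'})

result = find_customers_antijoin(customers, orders)
print(result)
```

The indicator column makes the intent explicit, at the cost of one extra column compared with the isin() version.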

MySQL approach 1:
Left-join the two tables and use WHERE to keep the rows where customerId is null.

MySQL implementation 1:

SELECT
	name AS Customers 
FROM
	Customers a
	LEFT JOIN Orders b ON a.id = b.customerId 
WHERE
	b.customerId IS NULL

MySQL approach 2:
Use a subquery to collect the customer ids in orders, then filter with NOT IN.

MySQL implementation 2:

SELECT
	customers.name AS Customers 
FROM
	customers 
WHERE
	customers.id NOT IN ( SELECT customerid FROM orders )



1148. Article Views I

Original problem: 1148. Article Views I
Topics: deduplication, sorting

Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date. 
Note that equal author_id and viewer_id indicate the same person.

Write a solution to find all the authors that viewed at least one of their own articles.

Return the result table sorted by id in ascending order.

The result format is in the following example.

Example 1:

Input:

Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

Output:

+------+
| id   |
+------+
| 4    |
| 7    |
+------+

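Problem summary:
The table records the article id, author id, viewer id, and view date. Return the ids of authors who viewed their own articles, sorted by id in ascending order.

pandas approach:
After filtering the rows, deduplicate with drop_duplicates() and then sort with sort_values().

pandas implementation: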

import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
    # Keep rows where the author viewed their own article
    tmp = views[views['author_id'] == views['viewer_id']]
    # Chain non-mutating calls so the filtered slice is never modified in place
    tmp = (
        tmp.rename(columns={'author_id': 'id'})
        .drop_duplicates(subset='id')
        .sort_values(by='id')
    )

    return tmp[['id']]

MySQL approach:
Filter with WHERE, deduplicate with DISTINCT, and sort with ORDER BY.

MySQL implementation:

SELECT DISTINCT
	author_id AS id 
FROM
	Views 
WHERE
	author_id = viewer_id 
ORDER BY
	author_id



String functions:

1683. Invalid Tweets

Original problem: 1683. Invalid Tweets
Topics: strings, row filtering, loc

Table: Tweets

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
In SQL, tweet_id is the primary key for this table.
This table contains all the tweets in a social media app.

Find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+

Output:

+----------+
| tweet_id |
+----------+
| 2        |
+----------+

Explanation:
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.

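Problem summary:
Return the ids of invalid tweets (content longer than 15 characters).

pandas approach 1:
Test the content column directly with a vectorized string-length comparison.

pandas implementation 1: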

import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    invalid = tweets['content'].str.len() > 15  # the result is a boolean Series
    return tweets[invalid][['tweet_id']]

# One-step version with loc (recommended)
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    return tweets.loc[tweets['content'].str.len() > 15, ['tweet_id']]    

pandas approach 2:
Write a helper predicate and apply it; this is not really necessary for this problem.

pandas implementation 2:

import pandas as pd

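def check(text: str) -> bool:
    # Named text rather than str to avoid shadowing the built-in type
    return len(text) > 15

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Build the mask as a separate Series instead of adding a column to the input
    flag = tweets['content'].apply(check)
    return tweets[flag][['tweet_id']]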

MySQL implementation:

SELECT
	tweet_id 
FROM
	Tweets 
WHERE
	-- CHAR_LENGTH counts characters; LENGTH counts bytes, which differs for multi-byte text
	CHAR_LENGTH( content ) > 15



1873. Calculate Special Bonus (good problem)

Original problem: 1873. Calculate Special Bonus
Topics: apply with lambda, conditional filtering

Table: Employees

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+
In SQL, employee_id is the primary key for this table.
Each row of this table indicates the employee ID, employee name, and salary.

Calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee name does not start with the character 'M'. The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id.

The result format is in the following example.

Example 1:

Input:

Employees table:
+-------------+---------+--------+
| employee_id | name    | salary |
+-------------+---------+--------+
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |
+-------------+---------+--------+

Output:

+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |
+-------------+-------+

Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with 'M'.
The rest of the employees get a 100% bonus.

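Problem summary:
Compute each employee's bonus: employees who satisfy both conditions get their salary as the bonus, and everyone else gets 0.

pandas approach 1:
First set everyone's bonus to their salary, then find everyone who fails the conditions and set their bonus to 0. Starting from 0 and filling in the qualifying rows works equally well.

pandas implementation 1: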

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees['salary']
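    # Rows that get no bonus: even employee_id, or a name starting with 'M'.
    # An explicit "% 2 == 0" yields a real boolean mask; "~x % 2" parses as "(~x) % 2" and returns integers.
    no_bonus = (employees['employee_id'] % 2 == 0) | employees['name'].str.startswith('M')
    employees.loc[no_bonus, 'bonus'] = 0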
    employees.sort_values(by='employee_id', inplace=True)

    return employees[['employee_id', 'bonus']]

pandas approach 2:
Evaluate the condition row by row, combining apply with a lambda.

pandas implementation 2:

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees.apply(
        lambda x: x['salary'] if x['employee_id'] % 2 and not x['name'].startswith('M') else 0, 
        axis=1
    )

    df = employees[['employee_id', 'bonus']].sort_values('employee_id')
    return df
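
As a possible third variant (not from the original notes), the same rule can be written without a row-wise apply by using numpy.where on a boolean mask. The function name calculate_special_bonus_vectorized is hypothetical, and the sample data is copied from Example 1.

```python
import numpy as np
import pandas as pd

# Sample rows copied from Example 1 above
employees = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})

def calculate_special_bonus_vectorized(employees: pd.DataFrame) -> pd.DataFrame:
    # True for odd ids whose name does not start with 'M'
    qualifies = (employees['employee_id'] % 2 == 1) & ~employees['name'].str.startswith('M')
    out = employees[['employee_id']].copy()
    # np.where picks salary where the mask is True and 0 elsewhere
    out['bonus'] = np.where(qualifies, employees['salary'], 0)
    return out.sort_values('employee_id')

bonus = calculate_special_bonus_vectorized(employees)
print(bonus)
```

This version leaves the input DataFrame untouched, because the bonus column is written to a copy.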

MySQL approach:
Use IF().

MySQL implementation:

SELECT 
    employee_id,
    IF(employee_id % 2 = 1 AND name NOT REGEXP '^M', salary, 0) AS bonus 
FROM 
    employees 
ORDER BY 
    employee_id



1667. Fix Names in a Table (good problem)

Original problem: 1667. Fix Names in a Table
Topics: string processing

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| name           | varchar |
+----------------+---------+
In SQL, user_id is the primary key for this table.
This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.

Fix the names so that only the first character is uppercase and the rest are lowercase.

Return the result table ordered by user_id.

The result format is in the following example.

Example 1:

Input:

Users table:
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | aLice |
| 2       | bOB   |
+---------+-------+

Output:

+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | Alice |
| 2       | Bob   |
+---------+-------+

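Problem summary:
Rewrite the name column so that the first letter is uppercase and the remaining letters are lowercase.

pandas approach 1:
Python happens to have a capitalize() method that does exactly this, and pandas exposes it as str.capitalize().

pandas implementation 1: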

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str.capitalize()
    users.sort_values(by='user_id', inplace=True)
    return users

pandas approach 2:
If you do not know about capitalize(), do it by hand: uppercase the first letter and lowercase the rest.

pandas implementation 2:

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str[0].str.upper() + users['name'].str[1:].str.lower()
    users.sort_values(by='user_id', inplace=True)
    return users

MySQL approach 1:
Uppercase the first letter, lowercase the remaining letters, and concatenate the two parts.

MySQL implementation 1:

-- Using SUBSTRING
SELECT user_id, 
  CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM Users
ORDER BY user_id

-- Using LEFT and RIGHT
SELECT user_id, 
  CONCAT(UPPER(LEFT(name, 1)), LOWER(RIGHT(name, length(name) - 1))) AS name
FROM Users
ORDER BY user_id

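Supplement: converting string case in Python

s = "I love YOU"            # named s to avoid shadowing the built-in str
print(s.upper())            # convert every lowercase letter to uppercase
print(s.lower())            # convert every uppercase letter to lowercase
print(s.capitalize())       # uppercase the first character, lowercase the rest
print(s.title())            # uppercase the first letter of each word, lowercase the rest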



1517. Find Users With Valid E-Mails (good problem)

Original problem: 1517. Find Users With Valid E-Mails
Topics: regular expressions

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
| mail          | varchar |
+---------------+---------+
In SQL, user_id is the primary key for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.

Find the users who have valid emails.

A valid e-mail has a prefix name and a domain where:

  • The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.
  • The domain is '@leetcode.com'.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Users table:
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
| 5       | Marwan    | quarz#[email protected] |
| 6       | David     | [email protected]       |
| 7       | Shapiro   | [email protected]     |
+---------+-----------+-------------------------+

Output:

+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
+---------+-----------+-------------------------+

Explanation:
The mail of user 2 does not have a domain.
The mail of user 5 has the # sign which is not allowed.
The mail of user 6 does not have the leetcode domain.
The mail of user 7 starts with a period.

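Problem summary:
Return the records of users whose email address is valid.

Approach:
This is a good opportunity to learn regular expressions; it was also the first time I learned that SQL supports regex too.

pandas implementation: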

import pandas as pd

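# Official LeetCode solution
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # Note the raw string (the r prefix), which avoids having to escape backslashes.
    # Also note that the @ character is escaped, because it has special meaning in some regex flavors.
    return users[users["mail"].str.match(r"^[a-zA-Z][a-zA-Z0-9_.-]*\@leetcode\.com$")]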
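
To see how each part of the pattern behaves, the following sketch runs the same regular expression through Python's re module. The addresses are hypothetical and made up for illustration; they are not the ones from the example table.

```python
import re

# Same pattern as in the solution above
PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_.-]*\@leetcode\.com$")

# Hypothetical addresses mapped to the verdict the rules imply
samples = {
    'alice_01@leetcode.com': True,    # letters, digit, underscore in the prefix
    'a.b-c@leetcode.com': True,       # period and dash are allowed after the first letter
    '.dot@leetcode.com': False,       # prefix must start with a letter
    'has#hash@leetcode.com': False,   # '#' is not an allowed prefix character
    'bob@gmail.com': False,           # wrong domain
    'carol@leetcodeXcom': False,      # '\.' requires a literal dot
    'nodomain': False,                # no domain at all
}

results = {mail: bool(PATTERN.match(mail)) for mail in samples}
for mail, ok in results.items():
    print(f"{mail:28} {ok}")
```

The escaped dot matters: without the backslash, "." would match any character and the sixth sample would be accepted.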

MySQL implementation:

-- Official LeetCode solution
SELECT user_id, name, mail
FROM Users
-- Note that the @ character is escaped here too, because it has special meaning in some regex flavors
WHERE mail REGEXP '^[a-zA-Z][a-zA-Z0-9_.-]*\\@leetcode\\.com$';



1527. Patients With a Condition (good problem)

Original problem: 1527. Patients With a Condition
Topics: substring search

Table: Patients

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+
In SQL, patient_id is the primary key for this table.
'conditions' contains 0 or more code separated by spaces. 
This table contains information of the patients in the hospital.

Find the patient_id, patient_name and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |
+------------+--------------+--------------+

Output:

+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 | 
+------------+--------------+--------------+

Explanation: Bob and George both have a condition that starts with DIAB1.

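Problem summary:
Return the records of patients who have Type I diabetes.

Approach:
The task is to find every row whose conditions value contains a code starting with DIAB1.
conditions holds several space-separated codes. My first idea was to split it into a list and loop over the codes, but it is enough to check:

  • whether the string starts with DIAB1
  • whether it contains " DIAB1" (with a space before the D)

If either condition holds, the row qualifies.

pandas implementation: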

import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # patients['conditions'].str.find(' DIAB1') != -1 would also work for the second check
    return patients[patients['conditions'].str.startswith('DIAB1') | patients['conditions'].str.contains(' DIAB1', regex=False)]
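
The two checks can also be folded into one regular expression. This is a sketch under the assumption that codes are separated by single spaces, as the problem statement says; the variable names mask and diabetic are illustrative.

```python
import pandas as pd

# Sample rows copied from Example 1 above
patients = pd.DataFrame({
    'patient_id': [1, 2, 3, 4, 5],
    'patient_name': ['Daniel', 'Alice', 'Bob', 'George', 'Alain'],
    'conditions': ['YFEV COUGH', '', 'DIAB100 MYOP', 'ACNE DIAB100', 'DIAB201'],
})

# (?:^| ) matches either the start of the string or a single space.
# The group is non-capturing, which avoids the warning pandas emits for capture groups.
mask = patients['conditions'].str.contains(r'(?:^| )DIAB1', regex=True)
diabetic = patients[mask]
print(diabetic)
```

DIAB201 is rejected because the character after DIAB is 2, not 1.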

MySQL implementation:

SELECT
	* 
FROM
	Patients 
WHERE
	conditions LIKE 'DIAB1%'  
	OR conditions LIKE '% DIAB1%'



Data manipulation:

177. Nth Highest Salary (good problem)

Original problem: 177. Nth Highest Salary
Topics: deduplication, sorting, selecting the nth row, building a new DataFrame

Table: Employee

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.

Write a solution to find the nth highest salary from the Employee table. If there is no nth highest salary, return null.

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
n = 2

Output:

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200                    |
+------------------------+

Example 2:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
+----+--------+
n = 2

Output:

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| null                   |
+------------------------+

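Problem summary:
Return the nth highest salary.

pandas approach:
The problem combines deduplication, sorting, and selection.
In pandas, drop_duplicates() deduplicates, sort_values() sorts, and head(N) followed by tail(1) picks the Nth row.
Note that the returned DataFrame has to be built from scratch.

pandas implementation: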

import pandas as pd

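# My version
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    employee = employee.drop_duplicates(subset='salary')  # deduplicate
    employee = employee.sort_values(by='salary', ascending=False)  # sort descending

    # No answer when N is not positive or exceeds the number of distinct salaries
    if N <= 0 or employee.shape[0] < N:
        ans = None
    else:
        # .iloc[0] extracts the scalar; calling int() on a Series is deprecated
        ans = int(employee.head(N).tail(1)['salary'].iloc[0])
    return pd.DataFrame({f'getNthHighestSalary({N})': [ans]})

# Official version, which seems cleaner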
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    df = employee[["salary"]].drop_duplicates()
    if len(df) < N:
        return pd.DataFrame({'getNthHighestSalary(2)': [None]})
    return df.sort_values("salary", ascending=False).head(N).tail(1)

MySQL approach:
The MySQL version mainly tests LIMIT, which is used here to return the nth row.

MySQL implementation:

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
DECLARE M INT; 
    SET M = N-1; 

  RETURN (
      # Write your MySQL query statement below.
      SELECT DISTINCT salary
      FROM employee
      ORDER BY salary DESC
      LIMIT M, 1
  );
END

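Supplement: the logical order in which the clauses of a SQL query are evaluated

  1. FROM: specifies the tables to retrieve data from.
  2. WHERE: filters rows by the given conditions.
  3. GROUP BY: groups rows by the given columns or expressions.
  4. HAVING: filters the groups by a condition.
  5. SELECT: chooses the columns or expressions returned in the result set.
  6. ORDER BY: sorts the result set by the given columns or expressions.
  7. LIMIT/OFFSET: restricts the number of rows returned.

Note: your DBMS may execute a query in a different but equivalent order.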
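
One way to remember this order is that a pandas method chain is naturally written in it. The sketch below uses a hypothetical table and thresholds, chosen only for illustration, and annotates each step with the SQL clause it roughly corresponds to.

```python
import pandas as pd

# Hypothetical data, made up for illustration
emp = pd.DataFrame({
    'dept': ['IT', 'IT', 'Sales', 'Sales', 'HR'],
    'salary': [70000, 90000, 80000, 60000, 50000],
})

result = (
    emp                                              # FROM emp
    .loc[lambda d: d['salary'] >= 60000]             # WHERE salary >= 60000
    .groupby('dept', as_index=False)                 # GROUP BY dept
    .agg(avg_salary=('salary', 'mean'))              # aggregate computed per group
    .loc[lambda d: d['avg_salary'] > 65000]          # HAVING AVG(salary) > 65000
    [['dept', 'avg_salary']]                         # SELECT dept, avg_salary
    .sort_values('avg_salary', ascending=False)      # ORDER BY avg_salary DESC
    .head(1)                                         # LIMIT 1
)
print(result)
```

The mapping is approximate: SQL is declarative and the engine may reorder the work, whereas the pandas chain really does run top to bottom.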




176. Second Highest Salary

Original problem: 176. Second Highest Salary
Topics: deduplication, sorting, selecting the nth row, handling the empty case

Table: Employee

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.

Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

Output:

+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+

Example 2:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
+----+--------+

Output:

+---------------------+
| SecondHighestSalary |
+---------------------+
| null                |
+---------------------+

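Problem summary:
Return the second highest salary, or null if it does not exist.

pandas approach:
Deduplicate, sort, take the second highest value, and return null when there is no such value.

pandas implementation: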

import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
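    employee = employee.drop_duplicates(subset='salary')  # deduplicate without mutating the input
    employee = employee.sort_values(by='salary', ascending=False)  # sort descending
    if employee.shape[0] < 2:
        ans = None
    else:
        # .iloc[0] extracts the scalar; calling int() on a Series is deprecated
        ans = int(employee.head(2).tail(1)['salary'].iloc[0])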
    return pd.DataFrame({'SecondHighestSalary':[ans]})

MySQL approach 1:
Use a subquery with a LIMIT clause and wrap it in an outer SELECT, so that an empty result is shown as null.

MySQL implementation 1:

SELECT
    (SELECT DISTINCT
            Salary
        FROM
            Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1) AS SecondHighestSalary

MySQL approach 2:
Use IFNULL to handle the case where no second highest salary exists.

MySQL implementation 2:

select ifnull(
    (select distinct Salary 
        from Employee 
        order by Salary desc 
        limit 1,1
    ),null
) as SecondHighestSalary;

Supplement:

  • LIMIT n, m: skip the first n rows, then return the next m rows; if fewer than m remain, return as many as there are
  • IFNULL(expr1, expr2): returns expr1 if it is not null, otherwise returns expr2
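
In pandas, a positional slice plays a similar role to LIMIT with an offset. The sketch below is an illustration with a hypothetical helper named second_highest: iloc[1:2] corresponds to LIMIT 1 OFFSET 1, and the empty-slice check stands in for the outer SELECT or IFNULL.

```python
import pandas as pd

def second_highest(salaries: pd.Series):
    distinct = salaries.drop_duplicates().sort_values(ascending=False)
    # Like LIMIT 1 OFFSET 1: an out-of-range slice is simply empty, not an error
    window = distinct.iloc[1:2]
    # Map "no row" to None, as the outer SELECT or IFNULL does in SQL
    return None if window.empty else int(window.iloc[0])

a = second_highest(pd.Series([100, 200, 300]))  # three distinct salaries
b = second_highest(pd.Series([100]))            # only one salary
c = second_highest(pd.Series([100, 100]))       # duplicates collapse to one value
print(a, b, c)
```

The third call shows why deduplication has to happen before the slice: two equal salaries do not make a second highest one.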



184. Department Highest Salary (good problem)

Original problem: 184. Department Highest Salary
Topics: groupby

Pandas Schema:

data = [[1, 'Joe', 70000, 1], [2, 'Jim', 90000, 1], [3, 'Henry', 80000, 2], [4, 'Sam', 60000, 2], [5, 'Max', 90000, 1]]
Employee = pd.DataFrame(data, columns=['id', 'name', 'salary', 'departmentId']).astype({'id':'Int64', 'name':'object', 'salary':'Int64', 'departmentId':'Int64'})
data = [[1, 'IT'], [2, 'Sales']]
Department = pd.DataFrame(data, columns=['id', 'name']).astype({'id':'Int64', 'name':'object'})

Table: Employee

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| id           | int     |
| name         | varchar |
| salary       | int     |
| departmentId | int     |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference columns) of the ID from the Department table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.

Table: Department

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table. It is guaranteed that department name is not NULL.
Each row of this table indicates the ID of a department and its name.

Write a solution to find employees who have the highest salary in each of the departments.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

Department table:
+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+

Output:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
| IT         | Max      | 90000  |
+------------+----------+--------+

Explanation: Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.

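Problem summary:
Return the highest-paid employees in each department.

pandas approach:
First join the two tables; a left join is the natural choice.
Then make sure that when several employees share the top salary, all of them are kept, so simply sorting and deduplicating is not enough.

pandas implementation: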

import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    ans = pd.merge(employee, department, how='left', left_on='departmentId', right_on='id')  # join the two tables
    ans.rename(columns={'name_x': 'Employee', 'name_y': 'Department', 'salary': 'Salary'}, inplace=True)  # rename

    # Keep employees whose salary equals their department's maximum
    max_salary = ans.groupby('Department')['Salary'].transform('max')
    ans = ans[ans['Salary'] == max_salary]

    return ans[['Department', 'Employee', 'Salary']]
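
The key step is transform('max'), which differs from a plain max() aggregation in the shape of its result. The sketch below rebuilds the already-joined table by hand (an assumption made to keep the example short) and compares the two.

```python
import pandas as pd

# Joined and renamed table, written out by hand from Example 1
df = pd.DataFrame({
    'Employee': ['Joe', 'Jim', 'Henry', 'Sam', 'Max'],
    'Salary': [70000, 90000, 80000, 60000, 90000],
    'Department': ['IT', 'IT', 'Sales', 'Sales', 'IT'],
})

# Aggregation: one value per department, indexed by department name
per_group = df.groupby('Department')['Salary'].max()

# Transform: one value per original row, aligned with df's index
per_row = df.groupby('Department')['Salary'].transform('max')

# Because per_row is aligned with df, it can be compared element-wise
top = df[df['Salary'] == per_row]
print(top)
```

Since the comparison keeps every row that equals its group maximum, both Jim and Max survive for the IT department.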

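MySQL approach:
First use a subquery to get the highest salary of each department, then use IN to keep the rows whose (departmentId, salary) pair appears in that temporary result.

MySQL implementation: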

SELECT
	b.name AS Department,
	a.name AS Employee,
	a.salary AS Salary 
FROM
	Employee a
	JOIN Department b ON a.departmentId = b.id 
WHERE
	( a.departmentId, a.salary ) IN ( SELECT departmentId, max( salary ) FROM Employee GROUP BY departmentId )



178. Rank Scores (good problem)

Original problem: 178. Rank Scores
Topics: ranking, window functions

Table: Scores

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| score       | decimal |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains the score of a game. Score is a floating point value with two decimal places.

Write a solution to find the rank of the scores. The ranking should be calculated according to the following rules:

  • The scores should be ranked from the highest to the lowest.
  • If there is a tie between two scores, both should have the same ranking.
  • After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.

Return the result table ordered by score in descending order.

The result format is in the following example.

Example 1:

Input:

Scores table:
+----+-------+
| id | score |
+----+-------+
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |
| 6  | 3.65  |
+----+-------+

Output:

+-------+------+
| score | rank |
+-------+------+
| 4.00  | 1    |
| 4.00  | 1    |
| 3.85  | 2    |
| 3.65  | 3    |
| 3.65  | 3    |
| 3.50  | 4    |
+-------+------+

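Problem summary:
Rank the scores and return them ordered from highest to lowest.

pandas approach:
This is classic dense ranking (equal scores share one rank, and no rank numbers are skipped afterwards). In pandas the shortcut is to call rank() directly.

pandas implementation: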

import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    scores['rank'] = scores['score'].rank(method='dense', ascending=False)  # dense ranking: equal scores share a rank
    scores.sort_values(by='rank', ascending=True, inplace=True)
    
    return scores[['score', 'rank']]

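MySQL approach 1:
Use a window function. (A window function performs an aggregate-like operation over a set of query rows. An aggregate collapses the rows into a single result row, whereas a window function produces one result per query row.) The dense_rank() window function does exactly what is needed.

MySQL implementation 1: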

SELECT
	score,
	dense_rank( ) over ( ORDER BY score DESC ) AS 'rank' 
FROM
	scores

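MySQL approach 2:
Similar in spirit to counting sort: the rank of a score is the number of distinct scores greater than or equal to it.
Implement it with a correlated subquery:

  • for each score, count the distinct scores in the table that are greater than or equal to it
  • sort the result by score

MySQL implementation 2: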

SELECT
	S1.score,
	( SELECT COUNT( DISTINCT S2.score ) FROM Scores S2 WHERE S2.score >= S1.score ) AS 'rank' 
FROM
	Scores S1 
ORDER BY
	S1.score DESC

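Supplement: three common ranking methods, shown for five scores where the second and third are tied:

  • First (ordinal ranking): 1, 2, 3, 4, 5
  • Min (competition ranking, leaves a gap after ties): 1, 2, 2, 4, 5
  • Dense (dense ranking, no gaps): 1, 2, 2, 3, 4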
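
These three methods map directly onto the method argument of pandas' rank(). The sketch below uses five hypothetical scores with a tie in second place; astype(int) is applied only because rank() returns floats.

```python
import pandas as pd

# Hypothetical scores: the second and third values are tied
scores = pd.Series([100, 90, 90, 80, 70])

ranks = pd.DataFrame({
    'score': scores,
    # ties broken by order of appearance
    'first': scores.rank(method='first', ascending=False).astype(int),
    # tied values share the lowest rank; the next rank is skipped
    'min': scores.rank(method='min', ascending=False).astype(int),
    # tied values share a rank; the next rank follows immediately
    'dense': scores.rank(method='dense', ascending=False).astype(int),
})
print(ranks)
```

Rank Scores asks for the dense column, which is why the solution passes method='dense'.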



196. Delete Duplicate Emails (good problem)

Original problem: 196. Delete Duplicate Emails
Topics: groupby

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.

For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.

For Pandas users, please note that you are supposed to modify Person in place.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.

The result format is in the following example.

Example 1:

Input:

Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
| 3  | [email protected] |
+----+------------------+

Output:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
+----+------------------+

Explanation: [email protected] is repeated two times. We keep the row with the smallest Id = 1.

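Problem summary:
Delete the duplicate emails, keeping only the row with the smallest id for each email.

pandas approach:
groupby('email')['id'].transform('min') groups the DataFrame by email and returns a Series holding, for every row, the smallest id in its group. Rows whose id differs from that minimum are duplicates, and their index is passed to drop(). The problem requires modifying the table in place, so set inplace=True.

pandas implementation: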

import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    min_id = person.groupby('email')['id'].transform('min')  # smallest id per email, aligned with the original rows
    removed_person = person[person['id'] != min_id] 
    person.drop(removed_person.index, inplace=True)  # drop those rows by index
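
A shorter alternative, offered as a sketch rather than as the method from the notes, sorts by id and then lets drop_duplicates() keep the first row of each email. The function name and the addresses are hypothetical.

```python
import pandas as pd

# Hypothetical addresses, made up for illustration
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['john@example.com', 'bob@example.com', 'john@example.com'],
})

def delete_duplicate_emails_sorted(person: pd.DataFrame) -> None:
    # Sort by id so the first row of each email is the one with the smallest id
    person.sort_values(by='id', inplace=True)
    # keep='first' retains that row; inplace=True modifies the caller's DataFrame
    person.drop_duplicates(subset='email', keep='first', inplace=True)

delete_duplicate_emails_sorted(person)
print(person)
```

Both calls use inplace=True because the function returns None and the caller inspects the original object.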

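MySQL approach:
The official solution uses a self-join (an inner join of the table with itself) to compare each record with the other records that share its email, and deletes a record whenever its id is not the smallest.

MySQL implementation: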

DELETE p1 FROM Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id



1795. Rearrange Products Table

Original problem: 1795. Rearrange Products Table
Topics: concatenating tables

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| store1      | int     |
| store2      | int     |
| store3      | int     |
+-------------+---------+
product_id is the primary key (column with unique values) for this table.
Each row in this table indicates the product's price in 3 different stores: store1, store2, and store3.
If the product is not available in a store, the price will be null in that store's column.

Write a solution to rearrange the Products table so that each row has (product_id, store, price). If a product is not available in a store, do not include a row with that product_id and store combination in the result table.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Products table:
+------------+--------+--------+--------+
| product_id | store1 | store2 | store3 |
+------------+--------+--------+--------+
| 0          | 95     | 100    | 105    |
| 1          | 70     | null   | 80     |
+------------+--------+--------+--------+

Output:

+------------+--------+-------+
| product_id | store  | price |
+------------+--------+-------+
| 0          | store1 | 95    |
| 0          | store2 | 100   |
| 0          | store3 | 105   |
| 1          | store1 | 70    |
| 1          | store3 | 80    |
+------------+--------+-------+

Explanation:
Product 0 is available in all three stores with prices 95, 100, and 105 respectively.
Product 1 is available in store1 with price 70 and store3 with price 80. The product is not available in store2.

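Problem summary:
Reshape the table from its original structure into the target structure.

pandas approach:
This is a columns-to-rows (wide-to-long) reshape: the original column names become the values of the store column. Loop over the three stores, handle each one separately, and concat() the pieces together at the end.

pandas implementation: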

import pandas as pd

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    store_list = ['store1', 'store2', 'store3']
    parts = []  # one reshaped DataFrame per store

    # Loop over the three stores
    for store in store_list:
        # Keep only the products that are sold in this store
        tmp = products.loc[products[store].notnull(), ['product_id', store]]
        tmp = tmp.rename(columns={store: 'price'})
        tmp['store'] = store
        parts.append(tmp[['product_id', 'store', 'price']])

    # Concatenate once at the end; starting from an empty DataFrame can degrade the column dtypes
    return pd.concat(parts)
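
pandas also has a built-in wide-to-long reshape, melt(), which can replace the loop. The following is a sketch of that alternative, not the approach taken in the notes; the function name rearrange_with_melt is hypothetical.

```python
import pandas as pd

# Sample rows copied from Example 1 above (None stands for null)
products = pd.DataFrame({
    'product_id': [0, 1],
    'store1': [95, 70],
    'store2': [100, None],
    'store3': [105, 80],
})

def rearrange_with_melt(products: pd.DataFrame) -> pd.DataFrame:
    # Every store column becomes a (store, price) pair per product
    long = products.melt(id_vars='product_id', var_name='store', value_name='price')
    # Drop the combinations where the product is not sold in that store
    return long.dropna(subset=['price'])

out = rearrange_with_melt(products)
print(out)
```

melt() picks up every non-id column automatically, so the store list does not need to be hard-coded.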

MySQL approach:
Rearrange the table by querying each of the three stores separately and combining the results with UNION.

MySQL implementation:

SELECT product_id, 'store1' AS store, store1 AS price 
FROM Products 
WHERE store1 IS NOT NULL

UNION 

SELECT product_id, 'store2' AS store, store2 AS price 
FROM Products 
WHERE store2 IS NOT NULL

UNION 

SELECT product_id, 'store3' AS store, store3 AS price 
FROM Products 
WHERE store3 IS NOT NULL


