cwtnice

[LeetCode 30 Days of Pandas] Study Notes, Part 1

Problem list:

Conditional filtering:
- 595. Big Countries
- 1757. Recyclable and Low Fat Products
- 183. Customers Who Never Order
- 1148. Article Views I
String functions:
- 1683. Invalid Tweets
- 1873. Calculate Special Bonus (good problem)
- 1667. Fix Names in a Table (good problem)
- 1517. Find Users With Valid E-Mails (good problem)
- 1527. Patients With a Condition (good problem)
Data manipulation:
- 177. Nth Highest Salary (good problem)
- 176. Second Highest Salary
- 184. Department Highest Salary (good problem)
- 178. Rank Scores (good problem)
- 196. Delete Duplicate Emails (good problem)
- 1795. Rearrange Products Table

Conditional filtering:

595. Big Countries

Original problem: 595. Big Countries
Topics tested: row filtering with or

Table: World

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |
+-------------+---------+

In SQL, name is the primary key column for this table.
Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.

A country is big if:

it has an area of at least three million (i.e., 3000000 km2), or
it has a population of at least twenty-five million (i.e., 25000000).
Find the name, population, and area of the big countries.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

World table:
+-------------+-----------+---------+------------+--------------+
| name        | continent | area    | population | gdp          |
+-------------+-----------+---------+------------+--------------+
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |
+-------------+-----------+---------+------------+--------------+

Output:

+-------------+------------+---------+
| name        | population | area    |
+-------------+------------+---------+
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |
+-------------+------------+---------+

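Problem summary:
Find all big countries; a country is big if it satisfies condition A or condition B.

pandas approach 1:
Filter rows with two conditions; note that they are combined with "or".

pandas implementation 1: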

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    res = world[(world['area'] >= 3000000) | (world['population'] >= 25000000)]
    return res[['name', 'population', 'area']]

pandas approach 2:
The same two-condition filter, but written with loc[].

pandas implementation 2:

import pandas as pd

def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world.loc[(world['area'] >= 3000000) | (world['population'] >= 25000000), ['name', 'population', 'area']]
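
The same filter can also be written with DataFrame.query, which takes the condition as a string and therefore does not need the parentheses around each comparison. This is only a sketch of an alternative, not part of the original notes; the function name big_countries_query is made up here, and the data is the example table from the problem statement.

```python
import pandas as pd

def big_countries_query(world: pd.DataFrame) -> pd.DataFrame:
    # query() parses the string, so "or" can be used instead of "|"
    big = world.query('area >= 3000000 or population >= 25000000')
    return big[['name', 'population', 'area']]

world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})
result = big_countries_query(world)
print(result)
```

In the boolean-mask versions above, the parentheses are required because | binds more tightly than >= in Python.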

MySQL approach:
Filter with WHERE; either condition is enough, so combine them with OR.

MySQL implementation:

SELECT
	name,
	population,
	area 
FROM
	World 
WHERE
	area >= 3000000 
	OR population >= 25000000

1757. Recyclable and Low Fat Products

Original problem: 1757. Recyclable and Low Fat Products
Topics tested: row filtering with and

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |
+-------------+---------+
In SQL, product_id is the primary key for this table.
low_fats is an ENUM of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.
recyclable is an ENUM of types ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.

Find the ids of products that are both low fat and recyclable.

Return the result table in any order.

The result format is in the following example.

Example 1:

input:

Products table:
+-------------+----------+------------+
| product_id  | low_fats | recyclable |
+-------------+----------+------------+
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |
+-------------+----------+------------+

Output:

+-------------+
| product_id  |
+-------------+
| 1           |
| 3           |
+-------------+

Explanation: Only products 1 and 3 are both low fat and recyclable.

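Problem summary:
Return the ids of products that satisfy both condition A and condition B.

pandas approach:
Filter with two conditions; the only difference from the previous problem is and versus or. loc works here too.

pandas implementation: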

import pandas as pd

def find_products(products: pd.DataFrame) -> pd.DataFrame:
    res = products[(products['low_fats'] == 'Y') & (products['recyclable'] == 'Y')]
    return res[['product_id']]

MySQL approach:
Filter with WHERE; both conditions must hold, so combine them with AND.

MySQL implementation:

SELECT
	product_id 
FROM
	Products 
WHERE
	low_fats = 'Y' 
	AND recyclable = 'Y'

183. Customers Who Never Order

Original problem: 183. Customers Who Never Order
Topics tested: merging, null checks, exclusion conditions

Table: Customers

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.

Table: Orders

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| customerId  | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
customerId is a foreign key (reference columns) of the ID from the Customers table.
Each row of this table indicates the ID of an order and the ID of the customer who ordered it.

Write a solution to find all customers who never order anything.

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Customers table:
+----+-------+
| id | name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

Orders table:
+----+------------+
| id | customerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+

Output:

+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

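Problem summary:
Two tables are given, a customers table and an orders table; return the names of customers who have never placed an order.

pandas approach 1:
Merge the tables. Customers who never ordered end up with an empty customerId, so a row filter is enough.

pandas implementation 1: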

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    tmp = pd.merge(customers, orders, how='left', left_on='id', right_on='customerId')
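    # Customers without any order have NaN in customerId after the left join
    tmp2 = tmp[tmp['customerId'].isna()]
    # Return a renamed copy instead of renaming a slice in place
    return tmp2[['name']].rename(columns={'name': 'Customers'})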

pandas approach 2:
From the customers table, select the rows whose id never appears in the customerId column of orders.

pandas implementation 2:

import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Keep customers whose id never appears in orders
    df = customers[~customers['id'].isin(orders['customerId'])]

    # Rename the column
    df = df[['name']].rename(columns={'name': 'Customers'})
    return df
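
A third way to express "in customers but not in orders" is an anti-join built from merge with indicator=True. The sketch below is an alternative added for illustration, not the author's solution; the name find_customers_anti_join is made up, and the sample frames reproduce the example from the problem statement.

```python
import pandas as pd

def find_customers_anti_join(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Deduplicate first so a customer with many orders still yields one row
    ordered_ids = orders[['customerId']].drop_duplicates()
    merged = customers.merge(
        ordered_ids, how='left', left_on='id', right_on='customerId', indicator=True
    )
    # '_merge' == 'left_only' marks customers that found no matching order
    never_ordered = merged.loc[merged['_merge'] == 'left_only', ['name']]
    return never_ordered.rename(columns={'name': 'Customers'})

customers = pd.DataFrame({'id': [1, 2, 3, 4], 'name': ['Joe', 'Henry', 'Sam', 'Max']})
orders = pd.DataFrame({'id': [1, 2], 'customerId': [3, 1]})
result = find_customers_anti_join(customers, orders)
print(result)
```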

MySQL approach 1:
Left join the two tables, then use WHERE to keep the rows whose customerId is null.

MySQL implementation 1:

SELECT
	name AS Customers 
FROM
	Customers a
	LEFT JOIN Orders b ON a.id = b.customerId 
WHERE
	b.customerId IS NULL

MySQL approach 2:
Select the customerId values from orders in a subquery, then use NOT IN.

MySQL implementation 2:

SELECT
	customers.name AS Customers 
FROM
	customers 
WHERE
	customers.id NOT IN ( SELECT customerid FROM orders )

1148. Article Views I

Original problem: 1148. Article Views I
Topics tested: deduplication, sorting

Table: Views

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |
+---------------+---------+
There is no primary key (column with unique values) for this table, the table may have duplicate rows.
Each row of this table indicates that some viewer viewed an article (written by some author) on some date. 
Note that equal author_id and viewer_id indicate the same person.

Write a solution to find all the authors that viewed at least one of their own articles.

Return the result table sorted by id in ascending order.

The result format is in the following example.

Example 1:

Input:

Views table:
+------------+-----------+-----------+------------+
| article_id | author_id | viewer_id | view_date  |
+------------+-----------+-----------+------------+
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |
+------------+-----------+-----------+------------+

Output:

+------+
| id   |
+------+
| 4    |
| 7    |
+------+

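Problem summary:
The table records the article id, author id, viewer id, and view date. Return the ids of authors who viewed their own articles, sorted by id in ascending order.

pandas approach:
After filtering the rows, deduplicate with drop_duplicates(), then sort with sort_values().

pandas implementation: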

import pandas as pd

def article_views(views: pd.DataFrame) -> pd.DataFrame:
  # Rows where the author viewed their own article
  tmp = views[views['author_id'] == views['viewer_id']]
  # Build a new frame rather than modifying a slice in place
  tmp = tmp[['author_id']].rename(columns={'author_id': 'id'})
  return tmp.drop_duplicates().sort_values(by='id')

MySQL approach:
Use WHERE and ORDER BY, with DISTINCT for deduplication.

MySQL implementation:

SELECT DISTINCT
	author_id AS id 
FROM
	Views 
WHERE
	author_id = viewer_id 
ORDER BY
	author_id

String functions:

1683. Invalid Tweets

Original problem: 1683. Invalid Tweets
Topics tested: strings, row filtering, loc

Table: Tweets

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| tweet_id       | int     |
| content        | varchar |
+----------------+---------+
In SQL, tweet_id is the primary key for this table.
This table contains all the tweets in a social media app.

Find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15 .

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Tweets table:
+----------+----------------------------------+
| tweet_id | content                          |
+----------+----------------------------------+
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |
+----------+----------------------------------+

Output:

+----------+
| tweet_id |
+----------+
| 2        |
+----------+

Explanation:
Tweet 1 has length = 14. It is a valid tweet.
Tweet 2 has length = 32. It is an invalid tweet.

Problem summary:
Return the ids of invalid tweets (content longer than 15 characters).

pandas approach 1:
Test the column directly.

pandas implementation 1:

import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    invalid = tweets['content'].str.len() > 15  # the result is a boolean Series
    return tweets[invalid][['tweet_id']]

# One step with loc (recommended)
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    return tweets.loc[tweets['content'].str.len() > 15, ['tweet_id']]

pandas approach 2:
Write a checking function; for this problem it is not really necessary.

pandas implementation 2:

import pandas as pd

def check(content: str) -> bool:
    return len(content) > 15

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Keep the mask separate so the input frame does not gain an extra column
    flag = tweets['content'].apply(check)
    return tweets[flag][['tweet_id']]

MySQL implementation:

SELECT
	tweet_id 
FROM
	Tweets 
WHERE
	LENGTH( content ) > 15
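
One caveat about the MySQL version: LENGTH() returns the length in bytes, while CHAR_LENGTH() returns the number of characters, which is what Series.str.len() counts. For pure ASCII content the two agree. The short Python sketch below illustrates the difference with a made-up string; it is not taken from the problem data.

```python
# Hypothetical tweet text containing one non-ASCII character (e with acute accent)
content = "caf\u00e9"

char_count = len(content)                  # characters, like CHAR_LENGTH() or Series.str.len()
byte_count = len(content.encode("utf-8"))  # bytes, like LENGTH() on UTF-8 encoded text

print(char_count, byte_count)
```

If the tweets may contain multi-byte characters, CHAR_LENGTH( content ) > 15 is the closer match to the problem statement.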

1873. Calculate Special Bonus (good problem)

Original problem: 1873. Calculate Special Bonus
Topics tested: apply and lambda, conditional filtering

Table: Employees

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| employee_id | int     |
| name        | varchar |
| salary      | int     |
+-------------+---------+
In SQL, employee_id is the primary key for this table.
Each row of this table indicates the employee ID, employee name, and salary.

Calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee name does not start with the character 'M' . The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id .

The result format is in the following example.

Example 1:

Input:

Employees table:
+-------------+---------+--------+
| employee_id | name    | salary |
+-------------+---------+--------+
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |
+-------------+---------+--------+

Output:

+-------------+-------+
| employee_id | bonus |
+-------------+-------+
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |
+-------------+-------+

Explanation:
The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.
The employee with ID 3 gets 0 bonus because their name starts with ‘M’.
The rest of the employees get a 100% bonus.

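Problem summary:
Compute each employee's bonus: employees who satisfy both conditions get their salary as the bonus, everyone else gets 0.

pandas approach 1:
First set everyone's bonus to their salary, then find everyone who does not satisfy the conditions and set their bonus to 0. Starting with 0 for everyone and filling in the qualifying rows works just as well.

pandas implementation 1: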

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees['salary']
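    # Rows that do NOT qualify: even id, or name starting with 'M'.
    # The comparison is spelled out because ~ binds tighter than %, so ~id % 2 is easy to misread.
    no_bonus = (employees['employee_id'] % 2 == 0) | employees['name'].str.startswith('M')
    employees.loc[no_bonus, 'bonus'] = 0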
    employees.sort_values(by='employee_id', inplace=True)

    return employees[['employee_id', 'bonus']]

pandas approach 2:
Conditional logic, using a combination of apply and lambda.

pandas implementation 2:

import pandas as pd

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = employees.apply(
        lambda x: x['salary'] if x['employee_id'] % 2 and not x['name'].startswith('M') else 0, 
        axis=1
    )

    df = employees[['employee_id', 'bonus']].sort_values('employee_id')
    return df
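
The row-wise apply above can also be replaced by a vectorized expression. The following sketch uses numpy.where for that; it is an alternative added here, not one of the author's solutions, and the function name calculate_special_bonus_vectorized is made up. The sample data is the example table from the problem statement.

```python
import numpy as np
import pandas as pd

def calculate_special_bonus_vectorized(employees: pd.DataFrame) -> pd.DataFrame:
    odd_id = employees['employee_id'] % 2 == 1
    not_m = ~employees['name'].str.startswith('M')
    result = employees[['employee_id']].copy()
    # np.where picks the salary where both conditions hold and 0 elsewhere
    result['bonus'] = np.where(odd_id & not_m, employees['salary'], 0)
    return result.sort_values('employee_id')

employees = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})
result = calculate_special_bonus_vectorized(employees)
print(result)
```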

MySQL approach:
Use IF.

MySQL implementation:

SELECT 
    employee_id,
    IF(employee_id % 2 = 1 AND name NOT REGEXP '^M', salary, 0) AS bonus 
FROM 
    employees 
ORDER BY 
    employee_id

1667. Fix Names in a Table (good problem)

Original problem: 1667. Fix Names in a Table
Topics tested: string processing

Table: Users

+----------------+---------+
| Column Name    | Type    |
+----------------+---------+
| user_id        | int     |
| name           | varchar |
+----------------+---------+
In SQL, user_id is the primary key for this table.
This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.

Fix the names so that only the first character is uppercase and the rest are lowercase.

Return the result table ordered by user_id .

The result format is in the following example.

Example 1:

Input:

Users table:
+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | aLice |
| 2       | bOB   |
+---------+-------+

Output:

+---------+-------+
| user_id | name  |
+---------+-------+
| 1       | Alice |
| 2       | Bob   |
+---------+-------+

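Problem summary:
Convert the name column so that the first letter is uppercase and the remaining letters are lowercase.

pandas approach 1:
Python happens to have a capitalize() function that does exactly this.

pandas implementation 1: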

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str.capitalize()
    users.sort_values(by='user_id', inplace=True)
    return users

pandas approach 2:
If you do not know capitalize(), simulate it: uppercase the first letter and lowercase the rest.

pandas implementation 2:

import pandas as pd

def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str[0].str.upper() + users['name'].str[1:].str.lower()
    users.sort_values(by='user_id', inplace=True)
    return users

MySQL approach 1:
Uppercase the first letter, lowercase the remaining letters, then concatenate the two parts.

MySQL implementation 1:

-- Using SUBSTRING
SELECT user_id, 
  CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM Users
ORDER BY user_id

-- Using LEFT and RIGHT
SELECT user_id, 
  CONCAT(UPPER(LEFT(name, 1)), LOWER(RIGHT(name, length(name) - 1))) AS name
FROM Users
ORDER BY user_id

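Supplement: converting string case in Python

s = "I love YOU"
print(s.upper())          # convert every lowercase letter to uppercase
print(s.lower())          # convert every uppercase letter to lowercase
print(s.capitalize())     # uppercase the first character, lowercase the rest
print(s.title())          # uppercase the first letter of each word, lowercase the rest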
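
The same methods are available on a pandas Series through the .str accessor, which is what the solutions above rely on. A small sketch with made-up sample names (the third value is hypothetical and not from the problem) shows where capitalize() and title() differ:

```python
import pandas as pd

# Hypothetical sample values; 'mary ann' is included to show the multi-word case
names = pd.Series(['aLice', 'bOB', 'mary ann'])

capitalized = names.str.capitalize()  # first character upper, everything else lower
titled = names.str.title()            # first character of every word upper

print(capitalized.tolist())
print(titled.tolist())
```

The problem statement says names consist only of letters, so both methods give the same result there; they only diverge once a name contains several words.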

1517. Find Users With Valid E-Mails (good problem)

Original problem: 1517. Find Users With Valid E-Mails
Topics tested: regular expressions

Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
| mail          | varchar |
+---------------+---------+
In SQL, user_id is the primary key for this table.
This table contains information of the users signed up in a website. Some e-mails are invalid.

Find the users who have valid emails .

A valid e-mail has a prefix name and a domain where:

The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_' , period '.' , and/or dash '-' . The prefix name must start with a letter.
The domain is '@leetcode.com' .

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Users table:
+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
| 5       | Marwan    | quarz#[email protected] |
| 6       | David     | [email protected]       |
| 7       | Shapiro   | [email protected]     |
+---------+-----------+-------------------------+

Output:

+---------+-----------+-------------------------+
| user_id | name      | mail                    |
+---------+-----------+-------------------------+
| 1       | Winston   | [email protected]    |
| 3       | Annabelle | [email protected]     |
| 4       | Sally     | [email protected] |
+---------+-----------+-------------------------+

Explanation:
The mail of user 2 does not have a domain.
The mail of user 5 has the # sign which is not allowed.
The mail of user 6 does not have the leetcode domain.
The mail of user 7 starts with a period.

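Problem summary:
Return the information of the users whose email is valid.

Approach:
A good opportunity to learn regular expressions; this was also the first time I learned that regex can be used in SQL too.

pandas implementation: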

import pandas as pd

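# Official LeetCode solution (parameter renamed to users so it matches the function body)
def valid_emails(users: pd.DataFrame) -> pd.DataFrame:
    # Note the raw string (the leading 'r'), which avoids having to escape the backslashes.
    # The `@` character is escaped as well, because it has a special meaning in some regex flavors.
    return users[users["mail"].str.match(r"^[a-zA-Z][a-zA-Z0-9_.-]*\@leetcode\.com$")]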
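
To see what each part of the pattern does, it can be tried with Python's re module outside pandas. The addresses below are invented for this sketch and are not the ones from the problem; each one is paired with the result the pattern is expected to give. The @ is left unescaped here, which is equivalent in Python's regex syntax.

```python
import re

# ^[a-zA-Z]        first character must be a letter
# [a-zA-Z0-9_.-]*  then any number of letters, digits, '_', '.', '-'
# @leetcode\.com$  then exactly this domain, with a literal dot, up to the end
PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com$")

# Hypothetical addresses mapped to the expected verdict
samples = {
    "alice_01@leetcode.com": True,
    "bob.smith-x@leetcode.com": True,
    "9lives@leetcode.com": False,    # starts with a digit
    ".dot@leetcode.com": False,      # starts with a period
    "carol#1@leetcode.com": False,   # '#' is not an allowed character
    "dave@gmail.com": False,         # wrong domain
    "erin@leetcode.comm": False,     # extra character after the domain
    "erin@leetcodexcom": False,      # the dot must be a literal dot
}
results = {mail: bool(PATTERN.match(mail)) for mail in samples}
print(results == samples)
```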

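MySQL implementation:

-- Official LeetCode solution
SELECT user_id, name, mail
FROM Users
-- Note that the `@` character is escaped as well, because it has a special meaning in some regex flavors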
WHERE mail REGEXP '^[a-zA-Z][a-zA-Z0-9_.-]*\\@leetcode\\.com$';

1527. Patients With a Condition (good problem)

Original problem: 1527. Patients With a Condition
Topics tested: string search

Table: Patients

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| patient_id   | int     |
| patient_name | varchar |
| conditions   | varchar |
+--------------+---------+
In SQL, patient_id is the primary key for this table.
'conditions' contains 0 or more code separated by spaces. 
This table contains information of the patients in the hospital.

Find the patient_id, patient_name and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix.

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Patients table:
+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 1          | Daniel       | YFEV COUGH   |
| 2          | Alice        |              |
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 |
| 5          | Alain        | DIAB201      |
+------------+--------------+--------------+

Output:

+------------+--------------+--------------+
| patient_id | patient_name | conditions   |
+------------+--------------+--------------+
| 3          | Bob          | DIAB100 MYOP |
| 4          | George       | ACNE DIAB100 | 
+------------+--------------+--------------+

Explanation: Bob and George both have a condition that starts with DIAB1.

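Problem summary:
Return the information of the patients who have Type I Diabetes.

Approach:
The task is to find every row whose conditions field contains a code starting with DIAB1.
conditions consists of several codes separated by spaces. My first idea was to split conditions into a list and loop over it, but it is enough to check:

whether the string starts with DIAB1
whether the string contains " DIAB1" (with a space before the D)
If either of the two conditions holds, the row qualifies.

pandas implementation: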

import pandas as pd

def find_patients(patients: pd.DataFrame) -> pd.DataFrame:
    # patients['conditions'].str.find(' DIAB1') != -1 also works for the second check
    return patients[patients['conditions'].str.startswith('DIAB1') | patients['conditions'].str.contains(' DIAB1', regex=False)]
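
The two checks can also be folded into a single regular expression. The sketch below is an alternative added for illustration, not the author's solution; the function name find_patients_regex and the sixth row (Eve, XDIAB100) are made up to show that a code merely containing DIAB1 in the middle is not matched.

```python
import pandas as pd

def find_patients_regex(patients: pd.DataFrame) -> pd.DataFrame:
    # (?:^|\s) means: DIAB1 must be at the start of the string or right after whitespace
    mask = patients['conditions'].str.contains(r'(?:^|\s)DIAB1', regex=True)
    return patients[mask]

patients = pd.DataFrame({
    'patient_id': [1, 2, 3, 4, 5, 6],
    'patient_name': ['Daniel', 'Alice', 'Bob', 'George', 'Alain', 'Eve'],
    'conditions': ['YFEV COUGH', '', 'DIAB100 MYOP', 'ACNE DIAB100', 'DIAB201', 'XDIAB100'],
})
result = find_patients_regex(patients)
print(result)
```

The group is non-capturing on purpose, since pandas warns when a pattern passed to str.contains has capture groups.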

MySQL implementation:

SELECT
	* 
FROM
	Patients 
WHERE
	conditions LIKE 'DIAB1%'  
	OR conditions LIKE '% DIAB1%'

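Data manipulation:

177. Nth Highest Salary (good problem)

Original problem: 177. Nth Highest Salary
Topics tested: deduplication, sorting, returning the n-th row, building a new DataFrame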

Table: Employee

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.

Write a solution to find the n^th highest salary from the Employee table. If there is no n^th highest salary, return null.

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
n = 2

Output:

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200                    |
+------------------------+

Example 2:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
+----+--------+
n = 2

Output:

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| null                   |
+------------------------+

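Problem summary:
Return the n-th highest salary.

pandas approach:
The problem involves deduplication, sorting, and selection.
In pandas, deduplicate with drop_duplicates(), sort with sort_values(), and pick the n-th row, for example by combining head(N) with tail(1).
Note that the returned DataFrame has to be built from scratch.

pandas implementation: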

import pandas as pd

# My version
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    employee = employee.drop_duplicates(subset='salary')  # deduplicate
    employee = employee.sort_values(by='salary', ascending=False)  # sort in descending order

    # N <= 0 or too few distinct salaries: there is no N-th highest salary
    if N <= 0 or employee.shape[0] < N:
        ans = None
    else:
        ans = int(employee['salary'].iloc[N - 1])  # positional access instead of int() on a Series
    return pd.DataFrame({f'getNthHighestSalary({N})': [ans]})

# Official version, which feels cleaner
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    df = employee[["salary"]].drop_duplicates()
    if len(df) < N:
        return pd.DataFrame({'getNthHighestSalary(2)': [None]})
    return df.sort_values("salary", ascending=False).head(N).tail(1)
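
Another way to pick the n-th value is Series.nlargest, which avoids sorting the whole column explicitly. This is a sketch of an alternative, not the author's or the official solution; the function name nth_highest_salary_sketch and the sample data (with a duplicated top salary) are made up here.

```python
import pandas as pd

def nth_highest_salary_sketch(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    distinct = employee['salary'].drop_duplicates()
    column = f'getNthHighestSalary({N})'
    # No N-th highest salary for non-positive N or too few distinct values
    if N <= 0 or len(distinct) < N:
        return pd.DataFrame({column: [None]})
    # nlargest(N) returns the N largest values in descending order; the last one is the N-th
    return pd.DataFrame({column: [distinct.nlargest(N).iloc[-1]]})

employee = pd.DataFrame({'id': [1, 2, 3, 4], 'salary': [100, 200, 300, 300]})
second = nth_highest_salary_sketch(employee, 2)
missing = nth_highest_salary_sketch(employee, 5)
print(second)
print(missing)
```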

MySQL approach:
The MySQL version mainly tests the use of LIMIT, which is used to output the n-th row.

MySQL implementation:

CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
DECLARE M INT; 
    SET M = N-1; 

  RETURN (
      # Write your MySQL query statement below.
      SELECT DISTINCT salary
      FROM employee
      ORDER BY salary DESC
      LIMIT M, 1
  );
END

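Supplement: the execution order of the clauses in a SQL query

FROM clause: specifies the tables to retrieve data from.
WHERE clause: filters rows by the given conditions.
GROUP BY clause: groups rows by the given columns or expressions.
HAVING clause: filters the grouped rows by a condition.
SELECT clause: chooses the columns or expressions returned in the result set.
ORDER BY clause: sorts the result set by the given columns or expressions.
LIMIT/OFFSET clause: limits the number of rows returned in the result set.

Note: your DBMS may execute a query in an equivalent but different order.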

176. Second Highest Salary

Original problem: 176. Second Highest Salary
Topics tested: deduplication, sorting, taking the n-th row, handling the empty case

Table: Employee

+-------------+------+
| Column Name | Type |
+-------------+------+
| id          | int  |
| salary      | int  |
+-------------+------+
id is the primary key (column with unique values) for this table.
Each row of this table contains information about the salary of an employee.

Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

Output:

+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+

Example 2:

Input:

Employee table:
+----+--------+
| id | salary |
+----+--------+
| 1  | 100    |
+----+--------+

Output:

+---------------------+
| SecondHighestSalary |
+---------------------+
| null                |
+---------------------+

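Problem summary:
Return the second highest salary; if it does not exist, return null.

pandas approach:
Deduplicate, sort, take the second highest value, and return null when there is no result.

pandas implementation: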

import pandas as pd

def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    employee = employee.drop_duplicates(subset='salary')  # deduplicate without modifying the input
    employee = employee.sort_values(by='salary', ascending=False)  # sort in descending order
    if employee.shape[0] < 2:
        ans = None
    else:
        ans = int(employee['salary'].iloc[1])  # second row is the second highest
    return pd.DataFrame({'SecondHighestSalary':[ans]})

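MySQL approach 1:
Use a subquery with a LIMIT clause and wrap it in an outer SELECT, so that the empty case is shown correctly as null.

MySQL implementation 1: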

SELECT
    (SELECT DISTINCT
            Salary
        FROM
            Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1) AS SecondHighestSalary

MySQL approach 2:
Use IFNULL to handle the case where the value does not exist.

MySQL implementation 2:

select ifnull(
    (select distinct Salary 
        from Employee 
        order by Salary desc 
        limit 1,1
    ),null
) as SecondHighestSalary;

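Supplement:

limit n, m: skip the first n rows, then return up to m rows starting from that position; if fewer than m rows remain, only the available rows are returned
IFNULL(expr1, expr2): returns expr1 if it is not null, otherwise returns expr2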

184. Department Highest Salary (good problem)

Original problem: 184. Department Highest Salary
Topics tested: groupby

Pandas Schema:

data = [[1, 'Joe', 70000, 1], [2, 'Jim', 90000, 1], [3, 'Henry', 80000, 2], [4, 'Sam', 60000, 2], [5, 'Max', 90000, 1]]
Employee = pd.DataFrame(data, columns=['id', 'name', 'salary', 'departmentId']).astype({'id':'Int64', 'name':'object', 'salary':'Int64', 'departmentId':'Int64'})
data = [[1, 'IT'], [2, 'Sales']]
Department = pd.DataFrame(data, columns=['id', 'name']).astype({'id':'Int64', 'name':'object'})

Table: Employee

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| id           | int     |
| name         | varchar |
| salary       | int     |
| departmentId | int     |
+--------------+---------+
id is the primary key (column with unique values) for this table.
departmentId is a foreign key (reference columns) of the ID from the Department table.
Each row of this table indicates the ID, name, and salary of an employee. It also contains the ID of their department.

Table: Department

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| name        | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table. It is guaranteed that department name is not NULL.
Each row of this table indicates the ID of a department and its name.

Write a solution to find employees who have the highest salary in each of the departments.

Return the result table in any order .

The result format is in the following example.

Example 1:

Input:

Employee table:
+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

Department table:
+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+

Output:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
| IT         | Max      | 90000  |
+------------+----------+--------+

Explanation: Max and Jim both have the highest salary in the IT department and Henry has the highest salary in the Sales department.

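Problem summary:
Return the information of the highest-paid employees in each department.

pandas approach:
First the two tables need to be joined; a left join comes to mind.
Second, when several people share the highest salary they must all be kept, so simply sorting and deduplicating is not enough.

pandas implementation: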

import pandas as pd

def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    ans = pd.merge(employee, department, how='left', left_on='departmentId', right_on='id')  # merge the two tables
    ans.rename(columns={'name_x': 'Employee', 'name_y': 'Department', 'salary': 'Salary'}, inplace=True)  # rename

    # Keep the employees whose salary equals the maximum salary of their department
    max_salary = ans.groupby('Department')['Salary'].transform('max')
    ans = ans[ans['Salary'] == max_salary]

    return ans[['Department', 'Employee', 'Salary']]
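
The key step is transform('max'), which returns one value per original row rather than one value per group, so it can be compared with the Salary column directly. The sketch below contrasts it with a plain max() on the already merged example data; the frame df is built by hand here for illustration.

```python
import pandas as pd

# The merged and renamed table from the example, written out by hand
df = pd.DataFrame({
    'Department': ['IT', 'IT', 'Sales', 'Sales', 'IT'],
    'Employee': ['Joe', 'Jim', 'Henry', 'Sam', 'Max'],
    'Salary': [70000, 90000, 80000, 60000, 90000],
})

# Aggregation: one row per department
per_department = df.groupby('Department')['Salary'].max()
# transform: one value per original row, aligned on the original index
per_row = df.groupby('Department')['Salary'].transform('max')

# Ties are kept because every row equal to its group maximum passes the filter
top_earners = df[df['Salary'] == per_row]
print(top_earners)
```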

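MySQL approach:
First use a subquery to find the highest salary of each department, then use IN to keep the rows whose (departmentId, salary) pair appears in that temporary result.

MySQL implementation: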

SELECT
	b.name AS Department,
	a.name AS Employee,
	a.salary AS Salary 
FROM
	Employee a
	JOIN Department b ON a.departmentId = b.id 
WHERE
	( a.departmentId, a.salary ) IN ( SELECT departmentId, max( salary ) FROM Employee GROUP BY departmentId )

178. Rank Scores (good problem)

Original problem: 178. Rank Scores
Topics tested: sorting, window functions

Table: Scores

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| score       | decimal |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains the score of a game. Score is a floating point value with two decimal places.

Write a solution to find the rank of the scores. The ranking should be calculated according to the following rules:

The scores should be ranked from the highest to the lowest.
If there is a tie between two scores, both should have the same ranking.
After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.

Return the result table ordered by score in descending order.

The result format is in the following example.

Example 1:

Input:

Scores table:
+----+-------+
| id | score |
+----+-------+
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |
| 6  | 3.65  |
+----+-------+

Output:

+-------+------+
| score | rank |
+-------+------+
| 4.00  | 1    |
| 4.00  | 1    |
| 3.85  | 2    |
| 3.65  | 3    |
| 3.65  | 3    |
| 3.50  | 4    |
+-------+------+

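Problem summary:
Return the scores together with their ranks, sorted by rank in ascending order.

pandas approach:
This is classic dense ranking (equal scores share the same rank); the lazy way in pandas is to call rank() directly.

pandas implementation: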

import pandas as pd

def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    # rank() with method='dense': equal scores share a rank and no rank is skipped
    scores['rank'] = scores['score'].rank(method='dense', ascending=False)
    scores.sort_values(by='rank', ascending=True, inplace=True)
    
    return scores[['score', 'rank']]

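MySQL approach 1:
Use a window function. (A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row.) The dense_rank() window function does exactly what is needed.

MySQL implementation 1: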

SELECT
	score,
	dense_rank( ) over ( ORDER BY score DESC ) AS 'rank' 
FROM
	scores

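MySQL approach 2:
The idea resembles counting sort: the rank of a score is the number of distinct scores greater than or equal to it.
Implement it with a correlated subquery:

For each score, count the distinct scores in the table that are greater than or equal to it
Sort the result by score

MySQL implementation 2: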

SELECT
	S1.score,
	( SELECT COUNT( DISTINCT S2.score ) FROM Scores S2 WHERE S2.score >= S1.score ) AS 'rank' 
FROM
	Scores S1 
ORDER BY
	S1.score DESC

Supplement: three common ranking methods:

First (ordinal ranking): 1, 2, 3, 4, 5
Min (skip ranking): 1, 2, 2, 4, 5
Dense (dense ranking): 1, 2, 2, 3, 4
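
These three sequences correspond to the method argument of Series.rank. The sketch below reproduces them on a made-up list of scores in which the second and third values tie; the scores are hypothetical and only chosen to match the pattern above.

```python
import pandas as pd

# Hypothetical scores, highest first, with a tie between the second and third value
scores = pd.Series([50, 40, 40, 30, 20])

# rank() returns floats, so cast to int for readability
first = scores.rank(method='first', ascending=False).astype(int).tolist()
minimum = scores.rank(method='min', ascending=False).astype(int).tolist()
dense = scores.rank(method='dense', ascending=False).astype(int).tolist()

print(first)
print(minimum)
print(dense)
```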

196. Delete Duplicate Emails (good problem)

Original problem: 196. Delete Duplicate Emails
Topics tested: groupby

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.

Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id .

For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.

For Pandas users, please note that you are supposed to modify Person in place.

After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter .

The result format is in the following example.

Example 1:

Input:

Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
| 3  | [email protected] |
+----+------------------+

Output:

+----+------------------+
| id | email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
+----+------------------+

Explanation: [email protected] is repeated two times. We keep the row with the smallest Id = 1.

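Problem summary:
Delete the duplicate emails, keeping only the row with the smallest id for each email.

pandas approach:
groupby('email')['id'].transform('min') groups the whole DataFrame by email and returns a Series holding the smallest id of each row's group. The rows whose id differs from that minimum are then removed by index with drop(). The problem requires the table to be modified in place, so pass inplace=True.

pandas implementation: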

import pandas as pd

def delete_duplicate_emails(person: pd.DataFrame) -> None:
    min_id = person.groupby('email')['id'].transform('min')  # smallest id per email, one value per row
    removed_person = person[person['id'] != min_id]  # rows that are not the smallest id in their group
    person.drop(removed_person.index, inplace=True)  # drop those rows by index, in place
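
To see what `transform('min')` returns and why it lines up with the original rows, here is a small sketch. The emails are placeholders invented for this example.

```python
import pandas as pd

# Placeholder emails, invented for this sketch
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'a@example.com'],
})

# One value per original row: the smallest id within that row's email group
min_id = person.groupby('email')['id'].transform('min')
# min_id is [1, 2, 1], aligned with person's index

# Rows whose id is not the group minimum are the duplicates
duplicate_index = person.index[person['id'] != min_id]
person.drop(duplicate_index, inplace=True)  # mutates person itself, returns None
```

Because `transform` keeps the original index, the comparison `person['id'] != min_id` is row by row and needs no merge.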

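MySQL approach:
The official solution uses a self join: each record is compared with every other record that has the same email, and a record is deleted whenever its id is not the smallest one.

MySQL implementation: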

DELETE p1 FROM Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id
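
The self join can be traced step by step in pandas. This is only an illustration of which rows the DELETE matches, using placeholder emails; it is not a replacement for the SQL.

```python
import pandas as pd

# Placeholder emails, invented for this sketch
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'a@example.com'],
})

# Pair every row with every row that shares its email (p1.Email = p2.Email)
pairs = person.merge(person, on='email', suffixes=('_p1', '_p2'))

# p1.Id > p2.Id: some other row with the same email has a smaller id,
# so p1 is not the row to keep
ids_to_delete = pairs.loc[pairs['id_p1'] > pairs['id_p2'], 'id_p1'].unique()
```

Here only id 3 is selected, because it shares an email with id 1 and is larger.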

1795. Rearrange Products Table

Problem link: 1795. Rearrange Products Table
Tests: combining tables

Table: Products

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| product_id  | int     |
| store1      | int     |
| store2      | int     |
| store3      | int     |
+-------------+---------+
product_id is the primary key (column with unique values) for this table.
Each row in this table indicates the product's price in 3 different stores: store1, store2, and store3.
If the product is not available in a store, the price will be null in that store's column.

Write a solution to rearrange the Products table so that each row has (product_id, store, price). If a product is not available in a store, do not include a row with that product_id and store combination in the result table.

Return the result table in any order.

The result format is in the following example.

Example 1:

Input:

Products table:
+------------+--------+--------+--------+
| product_id | store1 | store2 | store3 |
+------------+--------+--------+--------+
| 0          | 95     | 100    | 105    |
| 1          | 70     | null   | 80     |
+------------+--------+--------+--------+

Output:

+------------+--------+-------+
| product_id | store  | price |
+------------+--------+-------+
| 0          | store1 | 95    |
| 0          | store2 | 100   |
| 0          | store3 | 105   |
| 1          | store1 | 70    |
| 1          | store3 | 80    |
+------------+--------+-------+

Explanation:
Product 0 is available in all three stores with prices 95, 100, and 105 respectively.
Product 1 is available in store1 with price 70 and store3 with price 80. The product is not available in store2.

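Problem summary:
Reshape the table from its original wide layout into the target long layout.

pandas approach:
Turn columns into rows: the original column names (store1, store2, store3) become the values of the store column. Iterate over the three stores, process each one separately, then concat() the pieces together.

pandas implementation: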

import pandas as pd

def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    store_list = ['store1', 'store2', 'store3']
    parts = []  # one long-format frame per store

    # Process each store column separately
    for store in store_list:
        # Keep only the products that are available in this store
        tmp = products.loc[products[store].notnull(), ['product_id', store]]
        tmp = tmp.rename(columns={store: 'price'})
        tmp['store'] = store
        parts.append(tmp[['product_id', 'store', 'price']])

    # Concatenate once at the end instead of growing a DataFrame inside the loop
    return pd.concat(parts, ignore_index=True)
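
The same wide-to-long reshape can also be sketched with `DataFrame.melt`. This is an alternative to the loop, not the approach used in the original solution, and `rearrange_with_melt` is a hypothetical name.

```python
import pandas as pd

def rearrange_with_melt(products: pd.DataFrame) -> pd.DataFrame:
    # Every store column becomes a (store, price) pair per product
    long = products.melt(id_vars='product_id', var_name='store', value_name='price')
    # A null price means the product is not sold in that store
    return long.dropna(subset=['price']).reset_index(drop=True)

# The example table from the problem statement
products = pd.DataFrame({
    'product_id': [0, 1],
    'store1': [95, 70],
    'store2': [100, None],
    'store3': [105, 80],
})
result = rearrange_with_melt(products)
```

Note that the price column ends up as float here, because the null in store2 forces a float dtype before the rows are dropped.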

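MySQL approach:
Rearrange the table by querying each of the three stores separately and combining the results with UNION into one complete result.

MySQL implementation: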

SELECT product_id, 'store1' AS store, store1 AS price 
FROM Products 
WHERE store1 IS NOT NULL

UNION 

SELECT product_id, 'store2' AS store, store2 AS price 
FROM Products 
WHERE store2 IS NOT NULL

UNION 

SELECT product_id, 'store3' AS store, store3 AS price 
FROM Products 
WHERE store3 IS NOT NULL
