最近做题的时候要写一些题解,在把CodeForces
的题目复制下来的时候,数学公式的处理比较麻烦,所以我用Python
的urllib.request
和BeautifulSoup4
库对题目信息进行了爬取,写题解的时候时间节约了很多。
考虑到大家可能也会遇到同样的问题,写一篇笔记分享给大家。
安装urllib
和BeautifulSoup
库。
pip3 install urllib
pip3 install beautifulsoup4
以 CodeForces 1353 B. Two Arrays And Swaps 为例。
# 导入库
import urllib.request
import bs4
from bs4 import BeautifulSoup
# 题目属性
problemSet = "1353"
problemId = "B"
# 题目链接
url = f"https://codeforces.com/problemset/problem/{problemSet}/{problemId}"
# 获取网页内容
html = urllib.request.urlopen(url).read()
# 格式化
soup = BeautifulSoup(html,'lxml')
# 存储
data_dict = {}
# 找到主体内容
mainContent = soup.find_all(name="div", attrs={"class" :"problem-statement"})[0]
先从比较简单的信息入手,找到题目标题、时间、和内存限制。
# Limit
# 找到题目标题、时间、和内存限制
# Title
data_dict['Title'] = f"CodeForces {problemSet} " + mainContent.find_all(name="div", attrs={"class":"title"})[0].contents[-1]
# Time Limit
data_dict['Time Limit'] = mainContent.find_all(name="div", attrs={"class":"time-limit"})[0].contents[-1]
# Memory Limit
data_dict['Memory Limit'] = mainContent.find_all(name="div", attrs={"class":"memory-limit"})[0].contents[-1]
定义函数,处理主体内容中诡异的空格和公式的三个美元符号$$$
。
def divTextProcess(div):
"""
处理标签中的文本内容
"""
strBuffer = ''
# 遍历处理每个标签
for each in div.find_all("p"):
for content in each.contents:
# 如果不是第一个,加换行符
if (strBuffer != ''):
strBuffer += '\n\n'
# 处理
if (type(content) != bs4.element.Tag):
# 如果是文本,添加至字符串buffer中
strBuffer += content.replace(" ", " ").replace("$$$", "$")
else:
# 如果是html元素,如span等,加上粗体
strBuffer += "**" + content.contents[0].replace(" ", " ").replace("$$$", "$") + "**"
# 返回结果
return strBuffer
4.2. Problem Description
获取题目描述,由于题目描述的标签没有id
和class
属性,这里通过找列表中第10
个div
的方式来获取。
# 处理题目描述
data_dict['Problem Description'] = divTextProcess(mainContent.find_all("div")[10])
4.3. Input
输入描述
div = mainContent.find_all(name="div", attrs={"class":"input-specification"})[0]
data_dict['Input'] = divTextProcess(div)
4.4. Output
输出描述
div = mainContent.find_all(name="div", attrs={"class":"output-specification"})[0]
data_dict['Output'] = divTextProcess(div)
4.5. Sample Input & Output
输入样例,用代码框环境包围。
# Input
div = mainContent.find_all(name="div", attrs={"class":"input"})[0]
data_dict['Sample Input'] = "```cpp" + div.find_all("pre")[0].contents[0] + '```'
# Onput
div = mainContent.find_all(name="div", attrs={"class":"output"})[0]
data_dict['Sample Output'] = "```cpp" + div.find_all("pre")[0].contents[0] + '```'
4.6. Note
样例说明
# 若有样例说明
if(len(mainContent.find_all(name="div", attrs={"class":"note"})) > 0):
div = mainContent.find_all(name="div", attrs={"class":"note"})[0]
data_dict['Note'] = divTextProcess(div)
4.7. Source
题目链接
data_dict['Source'] = '[' + data_dict['Title'] + ']' + '(' + url + ')'
5. 输出
for each in data_dict.keys():
print('### ' + each + '\n')
print(data_dict[each].replace("\n\n**", "**").replace("**\n\n", "**") + '\n')
下面是最后的输出结果
### Title
CodeForces 1353 B. Two Arrays And Swaps
### Time Limit
1 second
### Memory Limit
256 megabytes
### Problem Description
You are given two arrays $a$ and $b$ both consisting of $n$ positive (greater than zero) integers. You are also given an integer $k$.
In one move, you can choose two indices $i$ and $j$ ($1 \le i, j \le n$) and swap $a_i$ and $b_j$ (i.e. $a_i$ becomes $b_j$ and vice versa). Note that $i$ and $j$ can be equal or different (in particular, swap $a_2$ with $b_2$ or swap $a_3$ and $b_9$ both are acceptable moves).
Your task is to find the **maximum** possible sum you can obtain in the array $a$ if you can do no more than (i.e. at most) $k$ such moves (swaps).
You have to answer $t$ independent test cases.
### Input
The first line of the input contains one integer $t$ ($1 \le t \le 200$) — the number of test cases. Then $t$ test cases follow.
The first line of the test case contains two integers $n$ and $k$ ($1 \le n \le 30; 0 \le k \le n$) — the number of elements in $a$ and $b$ and the maximum number of moves you can do. The second line of the test case contains $n$ integers $a_1, a_2, \dots, a_n$ ($1 \le a_i \le 30$), where $a_i$ is the $i$-th element of $a$. The third line of the test case contains $n$ integers $b_1, b_2, \dots, b_n$ ($1 \le b_i \le 30$), where $b_i$ is the $i$-th element of $b$.
### Output
For each test case, print the answer — the **maximum** possible sum you can obtain in the array $a$ if you can do no more than (i.e. at most) $k$ swaps.
### Sample Input
// 这里会有```cpp代码环境,在这里为了展示方便去掉了
5
2 1
1 2
3 4
5 5
5 5 6 6 5
1 2 5 4 3
5 3
1 2 3 4 5
10 9 10 10 9
4 0
2 2 4 3
2 4 2 3
4 4
1 2 2 1
4 4 5 4
### Sample Output
6
27
39
11
17
### Note
In the first test case of the example, you can swap $a_1 = 1$ and $b_2 = 4$, so $a=[4, 2]$ and $b=[3, 1]$.
In the second test case of the example, you don't need to swap anything.
In the third test case of the example, you can swap $a_1 = 1$ and $b_1 = 10$, $a_3 = 3$ and $b_3 = 10$ and $a_2 = 2$ and $b_4 = 10$, so $a=[10, 10, 10, 4, 5]$ and $b=[1, 9, 3, 2, 9]$.
In the fourth test case of the example, you cannot swap anything.
In the fifth test case of the example, you can swap arrays $a$ and $b$, so $a=[4, 4, 5, 4]$ and $b=[1, 2, 2, 1]$.
### Source
[CodeForces 1353 B. Two Arrays And Swaps](https://codeforces.com/problemset/problem/1353/B)
联系邮箱:[email protected]
Github:https://github.com/CurrenWong
欢迎转载/Star/Fork,有问题欢迎通过邮箱交流。