CVPR 2023 and ICCV 2023: Collecting Paper Titles and Counting Word Frequencies

CVPR 2023 papers: CVPR 2023 Open Access Repository (thecvf.com)

ICCV 2023 papers: ICCV 2023 Open Access Repository (thecvf.com)

ECCV 2022, 2020, and 2018 papers: ECVA | European Computer Vision Association

First, a look at the results

[Images 1-3: screenshots of the word-frequency results]

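With these words in hand you can estimate the current research hot spots, either by thinking it through yourself or by handing the list straight to ChatGPT.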

[Image 4]

The code is as follows:

# -*- coding: utf-8 -*-
import urllib.request
import re
from collections import Counter
import os

def get_html(url):
    with urllib.request.urlopen(url) as page:
        html = page.read().decode('utf-8')
    return html

def download_file(download_url, file_name):
    with urllib.request.urlopen(download_url) as response, open(file_name, 'wb') as file:
        file.write(response.read())
    print("Completed")

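url = 'https://openaccess.thecvf.com/CVPR2023?day=all'  # replace CVPR2023 with ICCV2023 as needed
html = get_html(url)
# Non-greedy match, so that each PDF link is captured separately
url_list = re.findall(r'\bcontent/CVPR2023.*?paper\.pdf\b', html)  # replace CVPR2023 with ICCV2023 as needed
namelist = [url.split('/')[-1] for url in url_list]

md_file_path = r"G:\paper\CVPR2023.md"  # change to your own path; the raw string keeps the backslashes literal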

# Generate the MD file only if it does not already exist
if not os.path.exists(md_file_path):
    with open(md_file_path, "w", encoding='utf-8') as f:
        f.writelines(name + "\n" for name in namelist)

# Whether or not the file was just generated, read its contents back
with open(md_file_path, 'r', encoding='utf-8') as file:
    text = file.read()

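# Split the text into words on underscores and line breaks, so that the last
# token of one file name is not fused with the first token of the next
words = [word for word in re.split(r'[_\s]+', text) if word]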

# Count word frequencies with Counter
word_counts = Counter(words)

# Print the results in descending order of frequency
for word, count in word_counts.most_common():
    print(f"{word}: {count}")
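The raw counts tend to be dominated by tokens that say nothing about topics, such as the conference name, year, and `paper.pdf` suffix that the file names share, plus common function words. A minimal sketch of filtering these out before ranking is shown here. The `NOISE` set and the `keyword_counts` helper are illustrative assumptions, not part of the original script, and the sample file names are made up for the demonstration.

```python
import re
from collections import Counter

# Hypothetical noise list: tokens assumed to be shared by every file name,
# plus a few common function words. Extend it to suit your own data.
NOISE = {"cvpr", "iccv", "2023", "paper.pdf", "a", "an", "the", "of",
         "for", "and", "in", "with", "via", "to", "on", "from", "by"}

def keyword_counts(text, noise=NOISE):
    """Count lowercase tokens in underscore-separated file names, skipping noise."""
    tokens = re.split(r"[_\s]+", text.lower())
    return Counter(t for t in tokens if t and t not in noise)

# Made-up file names in the same shape as the lines of the .md file.
sample = (
    "Doe_Learning_3D_Pose_With_Diffusion_CVPR_2023_paper.pdf\n"
    "Roe_Diffusion_Models_for_Segmentation_CVPR_2023_paper.pdf\n"
)
counts = keyword_counts(sample)
for word, count in counts.most_common(3):
    print(f"{word}: {count}")
```

Lowercasing merges variants such as `Diffusion` and `diffusion` into one entry. Author surnames still appear in the counts; removing them requires knowing where the surname sits in each file name.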

Part of the resulting .md file looks like this:

CVPR:

[Image 5: excerpt of the CVPR2023 .md file]

ICCV:

[Image 6: excerpt of the ICCV2023 .md file]
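Each line of the .md file is a PDF file name. On the CVF site these names appear to follow the layout `<Surname>_<Title_Words>_<CONF>_<YEAR>_paper.pdf`, with the first author's surname in front. Assuming that layout holds, a small helper can separate the pieces. `parse_name`, `NAME_RE`, and the sample name below are hypothetical and only illustrate the idea.

```python
import re

# Assumed layout: <Surname>_<Title_Words>_<CONF>_<YEAR>_paper.pdf
NAME_RE = re.compile(
    r"^(?P<author>[^_]+)_(?P<title>.+)_(?P<conf>[A-Z]+)_(?P<year>\d{4})_paper\.pdf$"
)

def parse_name(file_name):
    """Split a file name into (author, title_words, conf, year); None if it does not fit."""
    m = NAME_RE.match(file_name.strip())
    if m is None:
        return None
    return m["author"], m["title"].split("_"), m["conf"], int(m["year"])

# Made-up file name used only for the demonstration.
parsed = parse_name("Doe_Learning_3D_Pose_With_Diffusion_CVPR_2023_paper.pdf")
print(parsed)
```

Counting only the `title_words` part keeps surnames and the fixed suffix out of the statistics. Lines that do not fit the assumed layout return `None` and can be inspected by hand.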

To count CVPR and ICCV together, use the following code:

# -*- coding: utf-8 -*-
import urllib.request
import re
from collections import Counter
import os

def get_html(url):
    with urllib.request.urlopen(url) as page:
        html = page.read().decode('utf-8')
    return html

def download_file(download_url, file_name):
    with urllib.request.urlopen(download_url) as response, open(file_name, 'wb') as file:
        file.write(response.read())
    print("Completed")

url_1 = 'https://openaccess.thecvf.com/CVPR2023?day=all'
html_1 = get_html(url_1)
# Non-greedy match, so that each PDF link is captured separately
url_list_1 = re.findall(r'\bcontent/CVPR2023.*?paper\.pdf\b', html_1)
namelist_1 = [url.split('/')[-1] for url in url_list_1]
md_file_path_1 = r"G:\paper\CVPR2023.md"  # change to your own path
# Generate the MD file only if it does not already exist
if not os.path.exists(md_file_path_1):
    with open(md_file_path_1, "w", encoding='utf-8') as f:
        f.writelines(name + "\n" for name in namelist_1)
# Whether or not the file was just generated, read its contents back
with open(md_file_path_1, 'r', encoding='utf-8') as file:
    text_1 = file.read()

url_2 = 'https://openaccess.thecvf.com/ICCV2023?day=all'
html_2 = get_html(url_2)
# Non-greedy match, so that each PDF link is captured separately
url_list_2 = re.findall(r'\bcontent/ICCV2023.*?paper\.pdf\b', html_2)
namelist_2 = [url.split('/')[-1] for url in url_list_2]
md_file_path_2 = r"G:\paper\ICCV2023.md"  # change to your own path
# Generate the MD file only if it does not already exist
if not os.path.exists(md_file_path_2):
    with open(md_file_path_2, "w", encoding='utf-8') as f:
        f.writelines(name + "\n" for name in namelist_2)
# Whether or not the file was just generated, read its contents back
with open(md_file_path_2, 'r', encoding='utf-8') as file:
    text_2 = file.read()

text = text_1 + text_2
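# Split the text into words on underscores and line breaks, so that the last
# token of one file name is not fused with the first token of the next
words = [word for word in re.split(r'[_\s]+', text) if word]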

# Count word frequencies with Counter
word_counts = Counter(words)

# Print the results in descending order of frequency
for word, count in word_counts.most_common():
    print(f"{word}: {count}")
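The combined script repeats the same steps once per conference. One way to avoid the duplication, sketched here under the assumption that both listing pages use the same link pattern, is to factor the steps into functions and loop over the conference names. `extract_names`, `count_words`, and the inline HTML snippets are illustrative. The snippets stand in for the pages that `get_html` would return, so the example runs without network access.

```python
import re
from collections import Counter

def extract_names(html, conf):
    """Return the PDF file names linked under content/<conf>/ in a listing page."""
    pattern = rf"content/{re.escape(conf)}\S*?paper\.pdf"
    return [link.split("/")[-1] for link in re.findall(pattern, html)]

def count_words(names):
    """Count underscore-separated tokens across all file names."""
    return Counter(tok for name in names for tok in name.split("_") if tok)

# Stand-in pages keyed by conference name. Real use would fetch
# https://openaccess.thecvf.com/<conf>?day=all with get_html instead.
pages = {
    "CVPR2023": '<a href="/content/CVPR2023/papers/Doe_Neural_Fields_CVPR_2023_paper.pdf">pdf</a>',
    "ICCV2023": '<a href="/content/ICCV2023/papers/Roe_Neural_Rendering_ICCV_2023_paper.pdf">pdf</a>',
}

total = Counter()
for conf, html in pages.items():
    total += count_words(extract_names(html, conf))

for word, count in total.most_common(3):
    print(f"{word}: {count}")
```

Adding another conference then only requires one more entry in `pages`, and keeping a separate `Counter` per conference makes it possible to compare the two lists rather than only merge them.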

Part of the output looks like this:

[Image 7: partial output of the combined word-frequency count]
