First I tried 八爪鱼 (Octoparse) and a number of Chrome scraping extensions, such as Simple Scraper (which works really well).
import requests
from bs4 import BeautifulSoup
url: str = 'https://loriballen.com/ai-chatgpt-prompts/'
# Scrape the text content of https://loriballen.com/ai-chatgpt-prompts/ and save it to text.txt
response: requests.models.Response = requests.get(url)
soup: BeautifulSoup = BeautifulSoup(response.text, 'html.parser')
text: str = soup.get_text()
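# Write text to text.txt.
# Opening the file without an explicit encoding and calling f.write(text) failed with:
# UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 4253: illegal multibyte sequence
# so the file is opened as UTF-8 instead.
with open('text.txt', 'w', encoding='utf-8') as f:
    f.write(text)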
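The error quoted in the comment above presumably appears because, on a Chinese-locale Windows machine, `open()` falls back to GBK when no encoding is given, and GBK cannot encode the non-breaking space `\xa0`. A minimal sketch that reproduces this without any network access (the sample string and the `can_encode` helper are made up for illustration):

```python
# Hypothetical sample containing a non-breaking space (U+00A0),
# the character named in the UnicodeEncodeError above.
sample = "ChatGPT\xa0prompts"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(sample, "gbk"))    # GBK cannot encode U+00A0
print(can_encode(sample, "utf-8"))  # UTF-8 can
```

Passing `encoding='utf-8'` to `open()` makes the script behave the same way regardless of the system locale.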
This version was revised with the help of GitHub Copilot Labs (I tried its "readable" option). Note the colon syntax in lines such as:
url: str = 'https://loriballen.com/ai-chatgpt-prompts/'
Source: Conversation with Bing, 2023/3/23
(1) python - Use of colon in variable declaration - Stack Overflow. https://stackoverflow.com/questions/51639332/use-of-colon-in-variable-declaration Accessed 2023/3/23.
(2) python - Function parameter with colon - Stack Overflow. https://stackoverflow.com/questions/54962869/function-parameter-with-colon Accessed 2023/3/23.
(3) Python Print Type of Variable – How to Get Var Type - freeCodeCamp.org. https://www.freecodecamp.org/news/python-print-type-of-variable-how-to-get-var-type/ Accessed 2023/3/23.
(4) When To Use Colon (:) in Python? - AskPython. https://www.askpython.com/python/examples/when-to-use-colon-python Accessed 2023/3/23.
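The sources above are about the colon in declarations such as `url: str = ...`, which is a type annotation. A small sketch of what annotations do and do not do; the `describe` function is hypothetical and exists only for this illustration:

```python
def describe(value: int) -> str:
    """Annotated as taking an int, but Python does not enforce this at runtime."""
    return f"{value!r} is a {type(value).__name__}"


# A variable annotation, as in the scraping script.
url: str = 'https://loriballen.com/ai-chatgpt-prompts/'

# Annotations are stored as metadata on the function...
print(describe.__annotations__)

# ...but passing a str where an int is annotated still runs without error.
print(describe("text"))
```

Annotations are hints for readers and for tools such as type checkers and editors; the interpreter itself does not check them.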
# Read text.txt and remove the empty lines
# write your code here
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# text.strip() removes leading and trailing whitespace
text = text.strip()
# text.split('\n') splits on newlines
text = text.split('\n')
# Keep only the non-empty lines
text = [i for i in text if i != '']
# '\n'.join(text) joins the lines back together with newlines
text = '\n'.join(text)
# Write the result to textNew.txt
with open('textNew.txt', 'w', encoding='utf-8') as f:
    f.write(text)
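The cleanup above only drops lines that are exactly empty, so a line holding just spaces or a `\xa0` would survive. A possible variant, written as a function over a string so it can be tried without any files (the function name and the sample text are assumptions for illustration):

```python
def remove_blank_lines(text: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    # str.strip() with no arguments also removes '\xa0', so a line made up
    # only of non-breaking spaces counts as blank here.
    return "\n".join(line for line in text.split("\n") if line.strip())


sample = "Business Strategy Prompts\n\n \xa0\nWhat are some tips for [X]\n\n"
print(remove_blank_lines(sample))
```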
Business Strategy Prompts
What are some tips for [X] What are some tips for getting more real estate listings?
What are the best resources for [X] What are the best resources for a new Youtuber?
How can you [achieve goal] in [time frame]? How can you monetize a Youtube channel within 1 year?
Blogging Prompts
What is the most popular [type of content] in [industry/niche] What is the most popular type of content in the digital marketing space?
Email Prompts
Write a marketing email promoting this product or service: [insert product or service]It helps to add text to guide ChatGPT to the facts. Bullet points work well. After entering the product or service, add a colon, and then add the bullet points or text.
Marketing Prompts
Check the grammar and punctuation in this document (you will need to provide the paragraph, paragraphs, or full text)Check the grammar and spelling in an article.
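I gave the tools the prompt below. Unfortunately, none of the code produced by Copilot, Phind, or Cursor would run (Cursor even took my # to mean that each line in the file literally starts with the # character).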
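# Read textNew.txt # Here is an example to show what I mean # The format of textNew.txt is roughly as follows # category1 # prompt1 # prompt2 # prompt3 # blank line # category2 # prompt1 # prompt2 # Your task is to read the contents of this file and turn it into a dictionary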
# Read textNew.txt
# Here is an example to show what I mean
# The format of textNew.txt is roughly as follows
# blank line
# category1
# prompt1
# prompt2
# ...
# promptk
# blank line
# category2
# prompt1
# prompt2
# ...
# promptk
# category3
# prompt1
# prompt2
# ...
# promptk
# blank line
# categoryn
# prompt1
# prompt2
# ...
# promptk
# Your task is to read the contents of this file and turn it into a dictionary
# Then write this dictionary to a file called data.json
# write your code here
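# First, import the json module
import json
# Then read textNew.txt
with open('textNew.txt', 'r', encoding='utf-8') as f:
    # write your code here
    # Split into paragraphs on blank lines
    # The first line of each paragraph is the category
    # The remaining lines of each paragraph are the prompts
    # strip() drops the leading blank line so the first category is not ''
    text = f.read().strip()  # read textNew.txt
text = text.split('\n\n')  # split into paragraphs on blank lines
# Extract the categories
# i.split('\n') splits a paragraph on newlines
# text is a list in which every element is a string
# i is one of those strings, i.e. one paragraph, so i.split('\n')[0] is its first line
category = [i.split('\n')[0] for i in text]
# Extract the prompts
prompt = [i.split('\n')[1:] for i in text]
# Store them in a dictionary
# zip(category, prompt) pairs each category with its list of prompts
# dict(...) turns those pairs into a dictionary
data = dict(zip(category, prompt))
# Write data.json
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)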
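The same idea can be wrapped in a function that tolerates a leading blank line, runs of several blank lines, and repeated category names. This is only a sketch under those assumptions; `parse_categories` and the sample text are made up for illustration and work on a string rather than on textNew.txt:

```python
import json
import re
from typing import Dict, List


def parse_categories(text: str) -> Dict[str, List[str]]:
    """Map each paragraph's first line (category) to its remaining lines (prompts)."""
    data: Dict[str, List[str]] = {}
    # Split wherever one or more blank (or whitespace-only) lines occur.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        category, *prompts = lines
        # setdefault + extend merges prompts if a category name repeats,
        # whereas dict(zip(...)) would keep only the last occurrence.
        data.setdefault(category, []).extend(prompts)
    return data


sample = (
    "\n"
    "Business Strategy Prompts\n"
    "What are some tips for [X]\n"
    "What are the best resources for [X]\n"
    "\n"
    "Email Prompts\n"
    "Write a marketing email\n"
)
print(json.dumps(parse_categories(sample), ensure_ascii=False, indent=4))
```

Note that this approach, like the script above, relies on blank lines separating the categories, so it has to be run on text where those blank lines are still present.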