本节主要介绍inferring(文本推理)任务,可以视为对输入文本的理解和分析任务,包含文本分类、理解情感等。
给定一段关于灯的评论
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast. The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together. I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""
response = get_completion(prompt)
print(response)
现在我们要模型给出该段评论的情感,我们可以使用下述prompt输入模型。
`prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?
Review text: '''{lamp_review}'''
"""
模型会得到"The sentiment of the product review is positive"。如果我们只想得到一个情感分类,方便后续处理,我们可以直接将希望得到的分类标签规定在prompt中:
`prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?
Give your answer as a single word, either "positive" \
or "negative".
Review text: '''{lamp_review}'''
"""
这样最后得到的情感分类就是positive。
仍考虑上述评论数据,我们想得到多个情感(如果指定了list则可以认为是多分类任务),则可以输入下述prompt给chatgpt。
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.
Review text: '''{lamp_review}'''
"""
下述prompt可以令模型同时执行包括情感分类,实体提取在哪多个任务等,并输出结构化结果。
prompt = f"""
Identify the following items from the review text:
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item
The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.
Review text: '''{lamp_review}'''
"""
该prompt用到了本系列课程的几个知识点
给定一篇长文本
story = """
In a recent survey conducted by the government,
public sector employees were asked to rate their level
of satisfaction with the department they work at.
The results revealed that NASA was the most popular
department with a satisfaction rating of 95%.
One NASA employee, John Smith, commented on the findings,
stating, "I'm not surprised that NASA came out on top.
It's a great place to work with amazing people and
incredible opportunities. I'm proud to be a part of
such an innovative organization."
The results were also welcomed by NASA's management team,
with Director Tom Johnson stating, "We are thrilled to
hear that our employees are satisfied with their work at NASA.
We have a talented and dedicated team who work tirelessly
to achieve our goals, and it's fantastic to see that their
hard work is paying off."
The survey also revealed that the
Social Security Administration had the lowest satisfaction
rating, with only 45% of employees indicating they were
satisfied with their job. The government has pledged to
address the concerns raised by employees in the survey and
work towards improving job satisfaction across all departments.
"""
,可通过下述prompt得到文本的关键词/主题,并指定关键词之间通过逗号分隔,方便后续使用。
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.
Make each item one or two words long.
Format your response as a list of items separated by commas.
Text sample: '''{story}'''
"""
得到的结果为“government survey, job satisfaction, NASA, Social Security Administration, employee concerns”。
如果我们已知主题列表topic_list = [ "nasa", "local government", "engineering", "employee satisfaction", "federal government" ]
,希望判断上述文本和列表中的哪些主题有关,则可通过下述prompt返回一个表示上述文章是否包含每个主题的列表
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.
Give your answer as list with 0 or 1 for each topic.\
List of topics: {", ".join(topic_list)}
Text sample: '''{story}'''
"""
,得到结果如下,即文章于nasa、employee satisfaction、federal government
nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1
上述方法被称为为zero-shot learning,即不需要标记样本,仅通过一个prompt给出输出。