【Java-LangChain:面向开发者的提示工程-4】文本概括

第四章 文本概括

当今世界上文本信息浩如烟海,我们很难拥有足够的时间去阅读所有想了解的东西。但欣喜的是,目前LLM在文本概括任务上展现了强大的水准,也已经有不少团队将概括功能实现在多种应用中。
本章节将介绍如何使用编程的方式,调用API接口来实现“文本概括”功能。

环境配置

参考第二章的 环境配置小节内容即可。

单一文本概括

以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。

输入文本的示例:

//评论示例
private String review = "这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n" +
        "公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n" +
        "它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n" +
        "快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。";

限制输出文本长度

我们尝试限制文本长度为最多30词。

        String prompt = "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n" +
                "请对三个反引号之间的评论文本进行概括,最多30个词汇。\n" +
                "评论: ```{" + review + "}```";

        String message = this.getCompletion(prompt);

        log.info("iterative1:\n{}", message);

熊猫公仔是女儿生日礼物,软可爱,面部表情友善。价钱稍贵,大小有点小。快递提前一天到货。

设置关键角度侧重

有时,针对不同的业务,我们对文本的侧重会有所不同。例如对于商品评论文本,物流会更关心运输时效,商家更加关心价格与商品质量,平台更关心整体服务体验。
我们可以通过增加Prompt提示,来体现对于某个特定角度的侧重。


        String prompt = "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n" +
                "请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品运输上。\n" +
                "评论: ```{" + review + "}```";

        String message = this.getCompletion(prompt);

        log.info("iterative2:\n{}", message);
熊猫公仔很可爱,女儿喜欢,但有点小。快递提前一天到货。

可以看到,输出结果以“快递提前一天到货”开头,体现了对于快递效率的侧重。

侧重于价格与质量

        String prompt = "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n" +
                "请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品价格和质量上。\n" +
                "评论: ```{" + review + "}```\n";

        String message = this.getCompletion(prompt);

        log.info("iterative3:\n{}", message);
可爱的熊猫公仔,质量好,但有点小。价格稍高,但快递提前到货。

可以看到,输出结果以“质量好、价格小贵、尺寸小”开头,体现了对于产品价格与质量的侧重。

关键信息提取

在上节中,虽然我们通过添加关键角度侧重的 Prompt ,使得文本摘要更侧重于某一特定方面,但是可以发现,结果中也会保留一些其他信息,如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。
如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“文本提取( Extract )”而非“概括( Summarize )”

        String prompt = "您的任务是从电子商务网站上的产品评论中提取相关信息。\n" +
                "请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n" +
                "评论: ```{" + review + "}```";

        String message = this.getCompletion(prompt);

        log.info("iterative4:\n{}", message);

产品运输相关的信息:快递提前一天到货。

同时概括多条文本

在实际的工作流中,我们往往有许许多多的评论文本,以下示例将多条用户评价放进列表,并利用 for 循环,使用文本概括(Summarize)提示词,将评价概括至小于 20 词,并按顺序打印。
当然,在实际生产中,对于不同规模的评论文本,除了使用 for 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,来方便您或他人快速浏览,还可以点击查看原评论。这样您能高效掌握顾客的所有想法。


        String review = "Needed a nice lamp for my bedroom, and this one had additional storage and not too high of a price point. Got it fast - arrived in 2 days. " +
                "The string to the lamp broke during the transit and the company happily sent over a new one. Came within a few days as well. " +
                "It was easy to put together. Then I had a missing part, so I contacted their support and they very quickly got me the missing piece! Seems to me to be a great company that cares about their customers and products. ";

        String review2 = "My dental hygienist recommended an electric toothbrush, which is why I got this. The battery life seems to be pretty impressive so far. After initial charging and leaving the charger plugged in for the first week to condition the battery, I've unplugged the charger and been using it for twice daily brushing for the last 3 weeks all on the same charge. " +
                "But the toothbrush head is too small. I’ve seen baby toothbrushes bigger than this one. I wish the head was bigger with different length bristles to get between teeth better because this one doesn’t.  " +
                "Overall if you can get this one around the $50 mark, it's a good deal. The manufactuer's replacements heads are pretty expensive, but you can get generic ones that're more reasonably priced. This toothbrush makes me feel like I've been to the dentist every day. My teeth feel sparkly clean!";

        String review3 = "So, they still had the 17 piece system on seasonal sale for around $49 in the month of November, about half off, but for some reason (call it price gouging) around the second week of December the prices all went up to about anywhere from between $70-$89 for the same system. " +
                "And the 11 piece system went up around $10 or so in price also from the earlier sale price of $29. So it looks okay, but if you look at the base, the part where the blade locks into place doesn’t look as good as in previous editions from a few years ago, " +
                "but I plan to be very gentle with it (example, I crush very hard items like beans, ice, rice, etc. in the blender first then pulverize them in the serving size I want in the blender then switch to the whipping blade for a finer flour, and use the cross cutting blade first when making smoothies, then use the flat blade if I need them finer/less pulpy). Special tip when making smoothies, finely cut and freeze the fruits and vegetables (if using spinach-lightly stew soften the spinach then freeze until ready for use-and if making sorbet, use a small to medium sized food processor) that you plan to use that way you can avoid adding so much ice if at all-when making your smoothie. After about a year, the motor was making a funny noise. I called customer service but the warranty expired already, so I had to buy another one. FYI: The overall quality has gone done in these types of products, so they are kind of counting on brand recognition and consumer loyalty to maintain sales. Got it in about two days. ";

        List<String> reviews = Arrays.asList(review, review2, review3);

        for (int i = 0; i < reviews.size(); i++) {

            String prompt = "你的任务是从电子商务网站上的产品评论中提取相关信息。\n" +
                    "    请对三个反引号之间的评论文本进行概括,最多20个词汇。\n" +
                    "    评论文本: ```{" + reviews.get(i) + "}```";

            String message = this.getCompletion(prompt);

            log.info("review {}: {} \n", i, message);
        }


review 0: 这个评论概括为:价格适中,有额外的储物空间,快速送达,公司提供了良好的客户支持。
review 1: 电动牙刷电池寿命长,但刷头太小,需要更长的刷毛。价格合理,使用后牙齿感觉干净。
review 2: 评论概括:产品价格在12月份上涨,底座质量较前几年差,但功能多样,交付速度快。

你可能感兴趣的:(Java-LangChain,java,langchain,python,gpt,chatgpt)