wshzd

LLM RAG in Practice (9) | Advanced RAG 03: Multi-Document RAG Architecture

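Managing and processing multiple documents within a framework such as RAG (retrieval-augmented generation) is challenging. The key is not only extracting relevant content, but also selecting the right documents: the ones that contain the information the user's query is looking for. Because user queries vary in granularity, documents have to be selected dynamically. This article introduces structured hierarchical retrieval as a way to handle multi-document RAG.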

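1. Introduction to Structured Retrieval in LlamaIndex

LlamaIndex supports multi-level information retrieval. Rather than only sifting through documents, it uses metadata filtering to streamline document selection. With its auto-retrieval mechanism, these filters are inferred from the user query so that the most relevant documents are retrieved. The process infers a semantic query string together with the best set of filters for the vector database, effectively combining the capabilities of text-to-SQL and semantic search.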
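To make "a semantic query plus a set of metadata filters" more concrete, here is a minimal sketch. It is not LlamaIndex's implementation: LlamaIndex relies on an LLM to infer the filters, whereas the hypothetical `infer_query_spec` function below only recognizes `MM/DD` dates with a regular expression, and the `QuerySpec` type is invented for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QuerySpec:
    """Hypothetical container: a semantic query string plus metadata filters."""
    query: str
    filters: list = field(default_factory=list)


def infer_query_spec(user_query: str) -> QuerySpec:
    """Toy stand-in for LLM-based inference: pull an MM/DD date out of the query."""
    filters = []
    match = re.search(r"\b(\d{1,2})/(\d{1,2})\b", user_query)
    if match:
        filters.append(("month", "==", int(match.group(1))))
        filters.append(("day", "==", int(match.group(2))))
        # Whatever remains is treated as the semantic part of the query.
        user_query = (user_query[: match.start()] + user_query[match.end():]).strip()
    return QuerySpec(query=user_query, filters=filters)


spec = infer_query_spec("Tell me about some issues on 12/11")
print(spec.filters)  # [('month', '==', 12), ('day', '==', 11)]
```

The filter triples use the same `(key, operator, value)` shape that the auto-retriever prints in its verbose output later in this article.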

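2. Benefits of Structured Hierarchical Retrieval

Structured hierarchical retrieval in LlamaIndex offers the following benefits:

Enhanced relevance: metadata-driven filters identify and retrieve the documents that match the finer requirements of a user query, which makes content selection more relevant and more accurate;
Dynamic document selection: unlike traditional static document retrieval, LlamaIndex selects documents dynamically. It adapts to different user queries by choosing documents according to their attributes and structured metadata;
Efficient information retrieval: documents are preprocessed into metadata dictionaries and stored in a vector database, which streamlines retrieval, reduces computational overhead, and improves search efficiency;
Semantic query optimization: combining text-to-SQL with semantic search lets the system better understand user intent. LlamaIndex's auto-retrieval mechanism refines the user query into a semantic structure, so information can be retrieved from the document repository precisely.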

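3. Implementing Structured Hierarchical Retrieval in Code

The Python code below illustrates the basic concepts and builds a structured hierarchical retrieval system. Conceptually, an index class is initialized to manage document metadata in a vector database, and it plays three roles. The method names used here describe a simplified illustration; they are not LlamaIndex API methods.

Adding documents: an add_document method adds a document to the index by creating a metadata dictionary that holds key information such as a summary and keywords;
Retrieval logic: a retrieve_documents method handles a user query by matching it against the metadata filters in the vector database. For demonstration purposes, a basic mock matching logic is used;
Matching mechanism: a match_metadata method simulates the matching between the user query and the document metadata. This is simplified demonstration logic; a real system would normally use more advanced NLP or semantic analysis techniques.

The example is meant to illustrate the core concepts: how document metadata is stored and how relevant documents are retrieved for a user query.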
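The three roles described above can be sketched as a small in-memory class. This is a hypothetical illustration only: `ToyMetadataIndex` and its sample documents are invented here, the matching is plain keyword overlap rather than semantic search, and none of it is part of the LlamaIndex API.

```python
class ToyMetadataIndex:
    """Hypothetical in-memory stand-in for a metadata store; not a LlamaIndex class."""

    def __init__(self):
        self.documents = {}  # doc_id -> metadata dictionary

    def add_document(self, doc_id, summary, keywords):
        """Store a metadata dictionary (summary + keywords) for one document."""
        self.documents[doc_id] = {
            "summary": summary,
            "keywords": [k.lower() for k in keywords],
        }

    def match_metadata(self, query, metadata):
        """Mock matching: true if any keyword occurs as a word in the query."""
        query_words = set(query.lower().split())
        return any(k in query_words for k in metadata["keywords"])

    def retrieve_documents(self, query):
        """Return the ids of all documents whose metadata matches the query."""
        return [
            doc_id
            for doc_id, metadata in self.documents.items()
            if self.match_metadata(query, metadata)
        ]


toy_index = ToyMetadataIndex()
toy_index.add_document("9439", "Metadata filter bug with Elasticsearch", ["metadata", "filter"])
toy_index.add_document("9426", "Slack loader hits rate limits", ["slack", "loader"])
print(toy_index.retrieve_documents("Why does the Slack loader sleep?"))  # ['9426']
```

The steps that follow replace this mock logic with real LlamaIndex components: LLM-generated summaries, a vector store, and inferred metadata filters.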

Step 1: Install the libraries

!pip install llama-index wandb llama_hub weaviate-client --quiet

Step 2: Import the libraries

import os
import openai
import logging
import sys
from IPython.display import Markdown, display
from llama_index.llms import OpenAI
from llama_index.callbacks import CallbackManager, WandbCallbackHandler
from llama_index import load_index_from_storage
import pandas as pd
from llama_index.query_engine import PandasQueryEngine
from pprint import pprint
from llama_index import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    StorageContext,
    ServiceContext,
)
import nest_asyncio

# Allow nested event loops so that `await` works inside a notebook
nest_asyncio.apply()

# Set up the OpenAI API key
os.environ["OPENAI_API_KEY"] = ""
# openai_key = ""  # <--- your API key
# openai.api_key = openai_key

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Initialise WandbCallbackHandler and pass any wandb.init args
wandb_args = {"project": "llama-index-report"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)

# Pass wandb_callback to the service context
callback_manager = CallbackManager([wandb_callback])
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-0613", temperature=0),
    chunk_size=1024,
    callback_manager=callback_manager,
)

Step 3: Download the GitHub issues

os.environ["GITHUB_TOKEN"] = ""

from llama_hub.github_repo_issues import (
    GitHubRepositoryIssuesReader,
    GitHubIssuesClient,
)

github_client = GitHubIssuesClient()
loader = GitHubRepositoryIssuesReader(
    github_client,
    owner="run-llama",
    repo="llama_index",
    verbose=True,
)

orig_docs = loader.load_data()

# Keep only the first `limit` issues and record each issue id in its metadata
limit = 100
docs = []
for idx, doc in enumerate(orig_docs):
    doc.metadata["index_id"] = doc.id_
    if idx >= limit:
        break
    docs.append(doc)

# Output
Found 100 issues in the repo page 1
Resulted in 100 documents
Found 100 issues in the repo page 2
Resulted in 200 documents
Found 100 issues in the repo page 3
Resulted in 300 documents
Found 8 issues in the repo page 4
Resulted in 308 documents
No more issues found, stopping

from copy import deepcopy
import asyncio
from tqdm.asyncio import tqdm_asyncio
from llama_index import SummaryIndex, Document, ServiceContext
from llama_index.llms import OpenAI
from llama_index.async_utils import run_jobs


async def aprocess_doc(doc, include_summary: bool = True):
    """Process doc: derive filterable metadata and an LLM-generated summary."""
    print(f"Processing {doc.id_}")
    metadata = doc.metadata

    # "2023-12-21T20:18:03Z" -> year, month, day
    date_tokens = metadata["created_at"].split("T")[0].split("-")
    year = int(date_tokens[0])
    month = int(date_tokens[1])
    day = int(date_tokens[2])

    assignee = (
        "" if "assignee" not in doc.metadata else doc.metadata["assignee"]
    )

    # Labels such as "size:L" carry the size after the colon
    size = ""
    if len(doc.metadata["labels"]) > 0:
        size_arr = [label for label in doc.metadata["labels"] if "size:" in label]
        size = size_arr[0].split(":")[1] if len(size_arr) > 0 else ""

    new_metadata = {
        "state": metadata["state"],
        "year": year,
        "month": month,
        "day": day,
        "assignee": assignee,
        "size": size,
        "index_id": doc.id_,
    }

    # Now extract the summary
    summary_index = SummaryIndex.from_documents([doc])
    query_str = "Give a one-sentence concise summary of this issue."
    query_engine = summary_index.as_query_engine(
        service_context=ServiceContext.from_defaults(
            llm=OpenAI(model="gpt-3.5-turbo")
        )
    )
    # Use the async query so that the jobs can actually run concurrently
    summary_txt = str(await query_engine.aquery(query_str))

    new_doc = Document(text=summary_txt, metadata=new_metadata)
    return new_doc


async def aprocess_docs(docs):
    """Process metadata on docs."""
    new_docs = []
    tasks = []
    for doc in docs:
        task = aprocess_doc(doc)
        tasks.append(task)

    new_docs = await run_jobs(tasks, show_progress=True, workers=5)
    # new_docs = await tqdm_asyncio.gather(*tasks)
    return new_docs


new_docs = await aprocess_docs(docs)

# Output
Processing 9398
Processing 9427
Processing 9613
Processing 9417
Processing 9612
Processing 8832
Processing 9609
Processing 9353
Processing 9431
Processing 9426
Processing 9425
Processing 9435
Processing 9419
Processing 9571
Processing 9373
Processing 9383
Processing 9408
Processing 9405
Processing 9372
Processing 9546
Processing 9565
Processing 9664
Processing 9560
Processing 9470
Processing 9343
Processing 9518
Processing 9358
Processing 8536
Processing 9385
Processing 9380
Processing 9510
Processing 9352
Processing 9368
Processing 7457
Processing 8893
Processing 9583
Processing 9312
Processing 7720
Processing 9219
Processing 9481
Processing 9469
Processing 9655
Processing 9477
Processing 9670
Processing 9475
Processing 9667
Processing 9665
Processing 9348
Processing 9471
Processing 9342
Processing 9488
Processing 9338
Processing 9523
Processing 9416
Processing 7726
Processing 9522
Processing 9652
Processing 9520
Processing 9651
Processing 7244
Processing 9650
Processing 9519
Processing 9649
Processing 9492
Processing 9603
Processing 9509
Processing 9269
Processing 9491
Processing 8802
Processing 9525
Processing 9611
Processing 9543
Processing 8551
Processing 9627
Processing 9450
Processing 9658
Processing 9421
Processing 9394
Processing 9653
Processing 9439
Processing 9604
Processing 9413
Processing 9507
Processing 9625
Processing 9490
Processing 9626
Processing 9483
Processing 9638
Processing 7744
Processing 9472
Processing 8475
Processing 9244
Processing 9618
100%|██████████| 100/100 [02:07<00:00,  1.27s/it]
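The metadata-derivation part of this step can be tried in isolation, without any LLM call. The sketch below is a standalone approximation: `extract_filter_metadata` is a hypothetical helper name, and the sample record mirrors the shape of the GitHub issue metadata used in this article.

```python
def extract_filter_metadata(metadata: dict) -> dict:
    """Derive filterable fields from raw GitHub issue metadata."""
    # "2023-12-21T20:18:03Z" -> 2023, 12, 21
    year, month, day = (
        int(token) for token in metadata["created_at"].split("T")[0].split("-")
    )
    # Labels such as "size:L" carry the size after the colon.
    sizes = [
        label.split(":")[1]
        for label in metadata.get("labels", [])
        if "size:" in label
    ]
    return {
        "state": metadata["state"],
        "year": year,
        "month": month,
        "day": day,
        "assignee": metadata.get("assignee", ""),
        "size": sizes[0] if sizes else "",
        "index_id": metadata["index_id"],
    }


raw = {
    "state": "open",
    "created_at": "2023-12-21T20:18:03Z",
    "labels": ["size:L"],
    "index_id": "9655",
}
print(extract_filter_metadata(raw))
```

Splitting the date into separate integer fields is what later allows a query such as "issues on 12/11" to be expressed as exact-match filters on `month` and `day`.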

Step 4: Load the data into the Weaviate vector store

from llama_index.vector_stores import WeaviateVectorStore
from llama_index.storage import StorageContext
from llama_index import VectorStoreIndex
import weaviate

# cloud
auth_config = weaviate.AuthApiKey(api_key="")
client = weaviate.Client(
    "https://.weaviate.network",  # cluster name elided in the source; use your own cluster URL
    auth_client_secret=auth_config,
)

class_name = "LlamaIndex_auto"
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name=class_name
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Since "new_docs" are concise summaries, we can directly feed them as nodes into VectorStoreIndex
index = VectorStoreIndex(new_docs, storage_context=storage_context)

docs[0].metadata

# Output
{'state': 'open',
 'created_at': '2023-12-21T20:18:03Z',
 'url': 'https://api.github.com/repos/run-llama/llama_index/issues/9655',
 'source': 'https://github.com/run-llama/llama_index/pull/9655',
 'labels': ['size:L'],
 'index_id': '9655'}

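Step 5: Build a Weaviate index over the original documents

# `doc_class_name` is not defined in the original code; the value below is an assumed name
doc_class_name = "LlamaIndex_docs"

vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name=doc_class_name
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
doc_index = VectorStoreIndex.from_documents(
    docs, storage_context=storage_context
)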

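Step 6: Set up the auto-retrieval mechanism

Setting up the auto-retriever involves the following key steps:

Define the schema: define the vector database schema, including the metadata fields;
Initialize VectorIndexAutoRetriever: instantiating this class creates a retriever over the condensed (summary) metadata index. It takes the schema defined in the previous step as input;
Create a wrapper retriever: this step post-processes each retrieved node into an IndexNode. The IndexNode carries an index ID that links back to the source document. This link enables the recursive retrieval in a later section, which relies on IndexNode objects to connect to downstream retrievers, query engines, or other nodes.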

6(a) Define the schema

from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

vector_store_info = VectorStoreInfo(
    content_info="Github Issues",
    metadata_info=[
        MetadataInfo(
            name="state",
            description="Whether the issue is `open` or `closed`",
            type="string",
        ),
        MetadataInfo(
            name="year",
            description="The year issue was created",
            type="integer",
        ),
        MetadataInfo(
            name="month",
            description="The month issue was created",
            type="integer",
        ),
        MetadataInfo(
            name="day",
            description="The day issue was created",
            type="integer",
        ),
        MetadataInfo(
            name="assignee",
            description="The assignee of the ticket",
            type="string",
        ),
        MetadataInfo(
            name="size",
            description="How big the issue is (XS, S, M, L, XL, XXL)",
            type="string",
        ),
    ],
)
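Because the filters are inferred by an LLM, it can be useful to check them against the schema before trusting them. The sketch below is one hypothetical way to do that in plain Python; `ALLOWED_FIELDS` and `validate_filters` are names invented here, and this article does not establish whether LlamaIndex performs a comparable check internally.

```python
# Field names and Python types mirroring the schema defined in this step.
ALLOWED_FIELDS = {
    "state": str,
    "year": int,
    "month": int,
    "day": int,
    "assignee": str,
    "size": str,
}


def validate_filters(filters):
    """Keep only filters whose key is in the schema and whose value has the declared type."""
    valid = []
    for key, op, value in filters:
        expected_type = ALLOWED_FIELDS.get(key)
        if expected_type is not None and isinstance(value, expected_type):
            valid.append((key, op, value))
    return valid


inferred = [("month", "==", 12), ("author", "==", "someone"), ("day", "==", "11")]
print(validate_filters(inferred))  # [('month', '==', 12)]
```

Here the unknown key `author` and the string-typed `day` are dropped, so only filters that the vector store can actually evaluate remain.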

6(b) Instantiate VectorIndexAutoRetriever

from llama_index.retrievers import VectorIndexAutoRetriever

retriever = VectorIndexAutoRetriever(
    index,
    vector_store_info=vector_store_info,
    similarity_top_k=2,
    empty_query_top_k=10,  # if only metadata filters are specified, this is the limit
    verbose=True,
)

nodes = retriever.retrieve("Tell me about some issues on 12/11")
print(f"Number retrieved: {len(nodes)}")
print(nodes[0].metadata)

# Output
Using query str: 
Using filters: [('month', '==', 12), ('day', '==', 11)]
Number retrieved: 6
{'state': 'open', 'year': 2023, 'month': 12, 'day': 11, 'assignee': '', 'size': 'XL', 'index_id': '9431'}
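The effect of the inferred filters can be illustrated without a vector database. In the sketch below, `matches` is a hypothetical helper that applies `(key, operator, value)` triples to plain metadata dictionaries with AND semantics; in the real pipeline the vector store evaluates the filters, and the two sample records are made up to mirror the metadata shape shown above.

```python
import operator

# Map the operator strings seen in the verbose output to Python functions.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def matches(metadata, filters):
    """True if the metadata satisfies every (key, op, value) filter (AND semantics)."""
    return all(
        key in metadata and OPERATORS[op](metadata[key], value)
        for key, op, value in filters
    )


summaries = [
    {"index_id": "9431", "year": 2023, "month": 12, "day": 11, "state": "open"},
    {"index_id": "9655", "year": 2023, "month": 12, "day": 21, "state": "open"},
]
filters = [("month", "==", 12), ("day", "==", 11)]
print([m["index_id"] for m in summaries if matches(m, filters)])  # ['9431']
```

Note that the query string in the output above is empty: the whole request was expressed as filters, which is the case that `empty_query_top_k` limits.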

6(c) Define the wrapper retriever

from llama_index.retrievers import BaseRetriever
from llama_index.indices.query.schema import QueryBundle
from llama_index.schema import IndexNode, NodeWithScore


class IndexAutoRetriever(BaseRetriever):
    """Index auto-retriever."""

    def __init__(self, retriever: VectorIndexAutoRetriever):
        """Init params."""
        self.retriever = retriever
        # Let the base class set up its own state (e.g. the callback manager)
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle):
        """Convert nodes to index nodes."""
        retrieved_nodes = self.retriever.retrieve(query_bundle)
        new_retrieved_nodes = []
        for retrieved_node in retrieved_nodes:
            # The index_id links the summary node back to its source document
            index_id = retrieved_node.metadata["index_id"]
            index_node = IndexNode.from_text_node(
                retrieved_node.node, index_id=index_id
            )
            new_retrieved_nodes.append(
                NodeWithScore(node=index_node, score=retrieved_node.score)
            )
        return new_retrieved_nodes


index_retriever = IndexAutoRetriever(retriever=retriever)

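Step 7: Set up the recursive retrieval mechanism

A recursive retriever links each node returned by a retriever to another retriever, a query engine, or a node. In this setup, each summarized metadata node is linked to a retriever for the RAG pipeline of its corresponding document.

The configuration proceeds as follows:

Define one retriever per document and add them to a dictionary;
Define the recursive retriever: its arguments include the root retriever (the summary-metadata retriever) and the per-document retrievers.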
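The two configuration steps above can be sketched in plain Python to show the control flow. Everything here is a simplified stand-in: `root_retriever`, `make_doc_retriever`, `recursive_retrieve`, and `toy_retriever_dict` are hypothetical names, the root result is hard-coded, and the real RecursiveRetriever handles scoring and node types that this sketch ignores.

```python
def root_retriever(query):
    """Stand-in for the summary-metadata retriever: returns ids of matching documents."""
    return ["9431", "9435"]  # hard-coded result, for illustration only


def make_doc_retriever(doc_id, corpus):
    """Build a retriever that is restricted to a single document id."""
    def retrieve(query):
        return [chunk for chunk in corpus if chunk["index_id"] == doc_id]
    return retrieve


corpus = [
    {"index_id": "9431", "text": "Try to use clickhouse as vectorDB."},
    {"index_id": "9435", "text": "[Bug]: [nltk_data] Error loading punkt"},
    {"index_id": "9426", "text": "Slack Loader with large Slack channels"},
]

# One retriever per document, keyed by document id.
toy_retriever_dict = {
    doc["index_id"]: make_doc_retriever(doc["index_id"], corpus) for doc in corpus
}


def recursive_retrieve(query):
    """Query the root, then follow each returned id into its own retriever."""
    results = []
    for doc_id in root_retriever(query):
        results.extend(toy_retriever_dict[doc_id](query))
    return results


toy_nodes = recursive_retrieve("Tell me about some issues on 12/11")
print([n["index_id"] for n in toy_nodes])  # ['9431', '9435']
```

The design point is that the root retriever only decides which documents are relevant, while the per-document retrievers return the actual source text that is passed on for answer synthesis.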

from llama_index.vector_stores.types import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

retriever_dict = {}
query_engine_dict = {}
for doc in docs:
    index_id = doc.metadata["index_id"]
    # filter for the specific doc id
    filters = MetadataFilters(
        filters=[
            MetadataFilter(
                key="index_id", operator=FilterOperator.EQ, value=index_id
            ),
        ]
    )
    retriever = doc_index.as_retriever(filters=filters)
    query_engine = doc_index.as_query_engine(filters=filters)

    retriever_dict[index_id] = retriever
    query_engine_dict[index_id] = query_engine

from llama_index.retrievers import RecursiveRetriever

# note: can pass `agents` dict as `query_engine_dict` since every agent can be used as a query engine
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index_retriever, **retriever_dict},
    # query_engine_dict=query_engine_dict,
    verbose=True,
)

nodes = recursive_retriever.retrieve("Tell me about some issues on 12/11")
print(f"Number of source nodes: {len(nodes)}")
nodes[0].node.metadata

# Output
Retrieving with query id None: Tell me about some issues on 12/11
Using query str: 
Using filters: [('month', '==', 12), ('day', '==', 11)]
Retrieved node with id, entering: 9431
Retrieving with query id 9431: Tell me about some issues on 12/11
Retrieving text node: Dev awiss
# Description

Try to use clickhouse as vectorDB.
Try to chunk docs with independent parser service.
Special designed schema and tricks for better query and retriever.

Fixes # (issue)

## Type of Change

Please delete options that are not relevant.

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

# How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

- [ ] Added new unit/integration tests
- [ ] Added new notebook (that tests end-to-end)
- [ ] I stared at the code and made sure it makes sense

# Suggested Checklist:

- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added Google Colab support for the newly added notebooks.
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] I ran `make format; make lint` to appease the lint gods
Retrieved node with id, entering: 9435
Retrieving with query id 9435: Tell me about some issues on 12/11
Retrieving text node: [Bug]: [nltk_data] Error loading punkt: 
### Bug Description

I am using a vector Index which connects to a chromaDB client as my database. I have initialized the index as a chat engine. When the query the chat engine, two things happen:
1. The response time is nearly 2-3mins.
2. It throws the below warning

```
[nltk_data] Error loading punkt: 
[nltk_data]     connection attempt failed because the connected party
[nltk_data]     did not properly respond after a period of time, or
[nltk_data]     established connection failed because connected host
[nltk_data]     has failed to respond>
```

### Version

0.9.8.post1

### Steps to Reproduce

Clone, setup and run the below repository: (Follow readme for instructions)
https://github.com/umang299/document-gpt

### Relevant Logs/Tracbacks

_No response_
Retrieved node with id, entering: 9426
Retrieving with query id 9426: Tell me about some issues on 12/11
Retrieving text node: Slack Loader with large lack channels
### Question Validation

- [X] I have searched both the documentation and discord for an answer.

### Question

Hi team,

I am using the [Slack Loader ](https://llamahub.ai/l/slack)from Llama Hub. For smaller Slack channels it works fine. However, for larger channels with lots of messages created over months, I keep seeing this message:

`Rate limit error reached, sleeping for: 10 seconds`

Is there a recommended / idiomatic way to load larger Slack channels to avoid this issue?
Retrieved node with id, entering: 9425
Retrieving with query id 9425: Tell me about some issues on 12/11
Retrieving text node: [Feature Request]: Make llama-index compartible with models finetuned and hosted on modal.com
### Feature Description

Modal.com is a cloud computing service that allows you to finetune and host models on their workers. They provide inference points for any models finetuned on their platform.

### Reason

I have not tried implementing the feature. I just read about the capabilities on modal.com and thought it would be a good integration feature for llama-index to allow for more configuration.

### Value of Feature

An integration feature to allow users who host their models on modal to use llama-index for their RAG and prompt engineering pipelines.
Retrieved node with id, entering: 9439
Retrieving with query id 9439: Tell me about some issues on 12/11
Retrieving text node: [Bug]: Metadata filter not working with Elastic search indexing 
### Bug Description

While retrieving from ES with multiple metadatafilter condition(OR/AND) its not taking it into account. It always performs an AND operation even if its explicitly mentioned OR.
Example below code should filter and retrieve only 'mafia' or "Stephen King" bit its not doing as expected.

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Mafia"),
        MetadataFilter(key="author", value="Stephen King"),
    ],
    condition=FilterCondition.OR,
)
retriever = index.as_retriever(filters=filters)

### Version

0.9.13

### Steps to Reproduce

nodes = [
TextNode(
text="The Shawshank Redemption",
metadata={
"author": "Stephen King",
"theme": "Friendship",
},
),
TextNode(
text="The Godfather",
metadata={
"director": "Francis Ford Coppola",
"theme": "Mafia",
},
),
TextNode(
text="Inception",
metadata={
"director": "Christopher Nolan",
},
),
]

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", value="Mafia"),
        MetadataFilter(key="author", value="Stephen King"),
    ],
    condition=FilterCondition.OR,
)
retriever = index.as_retriever(filters=filters)

### Relevant Logs/Tracbacks

_No response_
Retrieved node with id, entering: 9427
Retrieving with query id 9427: Tell me about some issues on 12/11
Retrieving text node: [Feature Request]: Postgres BM25 support
### Feature Description

Feature: add a variation of PGVectorStore which uses ParadeDB's BM25 extension.

BM25 is now possible in Postgres with a Rust extension [pg_bm25): https://github.com/paradedb/paradedb/tree/dev/pg_bm25

Unsure if it might be better to use [pg_search](https://github.com/paradedb/paradedb/tree/dev/pg_search) and get HNSW at the same time..

I'm interested in contributing on this myself, but am just starting to look into it. Interested to hear others' thoughts.

### Reason

Although the code comments for the PGVectorStore class currently suggest BM25 search is present in Postgres - it is not.

### Value of Feature

BM25 retrieval hit rate and MRR is measurable better than Postgres full text search with tsvector and tsquery. Indexing is also supposed to be faster with pg_bm25.
Number of source nodes: 6
{'state': 'open',
 'created_at': '2023-12-11T10:17:52Z',
 'url': 'https://api.github.com/repos/run-llama/llama_index/issues/9431',
 'source': 'https://github.com/run-llama/llama_index/pull/9431',
 'labels': ['size:XL'],
 'index_id': '9431'}

 
Step 8: Plug into RetrieverQueryEngine

from llama_index.query_engine import RetrieverQueryEngine
from llama_index import ServiceContext

llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

# Pass the LLM through the service context defined above
query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, service_context=service_context
)

response = query_engine.query("Tell me about some issues on 12/11")
print(str(response))

# Output
There were several issues created on 12/11. One of them is a bug where the metadata filter is not working correctly with Elastic search indexing. Another bug involves an error loading the 'punkt' module in the NLTK library. There are also a couple of feature requests, one for adding Postgres BM25 support and another for making llama-index compatible with models finetuned and hosted on modal.com. Additionally, there is a question about using the Slack Loader with large Slack channels.

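4. Conclusion

In summary, integrating LlamaIndex into a multi-document RAG architecture changes how information is retrieved from large document repositories. Selecting documents dynamically on the basis of structured metadata, combined with semantic query optimization, improves the efficiency, relevance, and accuracy of the retrieval process.

References:

[1] https://ai.gopubby.com/structured-hierarchical-retrieval-revolutionizing-multi-document-rag-architectures-f101463db689
[2] https://weaviate.io/developers/wcs/quickstart
[3] https://docs.llamaindex.ai/en/stable/examples/query_engine/multi_doc_auto_retrieval/multi_doc_auto_retrieval.html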
