QASystem_onEconomyKG(1)

Knowledge graph domains and data sources

Specific domain: Finance

At present, finance is one of the most active application areas for knowledge graphs, covering key aspects such as risk control, marketing, and forecasting.

Within the overall technology chain, the knowledge graph sits at the core and can be regarded as another qualitative leap since the digitization of financial statements. It is a necessary step for financial data analysis to move from simple quantitative models to more complex value judgment and risk assessment, gradually transforming human experience and relationships into reusable, evolving, verifiable, and transferable knowledge models.

 

Data sources:

  • Unstructured data
  • Structured databases
  • Semi-structured data
  • Multimedia data
  • Multimodal data

Data websites:

Gold Toutiao: https://goldtoutiao.com/

Sina Finance: http://finance.sina.com.cn/mac/

 

Application scenarios of knowledge graphs

  1. Auxiliary search -> precise answers
     • Improve search accuracy
     • Semantic search
     • Search intent understanding
     • Multimodal search

 

  2. Auxiliary question answering -> human-computer interaction
     • Improve QA accuracy
     • Improve the QA experience
     • Guide the dialogue
     • Multi-turn dialogue

 

  3. Auxiliary data integration -> intelligent data integration
     • Mechanisms for integrating large-scale heterogeneous data
     • Establish and mine intrinsic associations in the data
     • High versatility, scalability, and flexibility
     • Knowledge reuse without interfering with the data sources

 

  4. Auxiliary decision-making -> intelligent decision-making
     • Collect and organize data
     • Establish data associations
     • Knowledge mining, discovery, and reasoning
     • Knowledge graphs assist NLP
     • Widely used in national defense, finance, manufacturing, business, and government

 

Techniques involved

The knowledge graph is constructed from raw data: a series of automatic or semi-automatic techniques are used to extract knowledge elements from the raw data and store them in the data layer and schema layer of the knowledge base.

This is an iterative update process; each round of iteration consists of three phases: knowledge extraction, knowledge fusion, and knowledge processing.

  1. Knowledge extraction

Knowledge extraction mainly targets open linked data; the usual input is natural language text or multimedia documents. Automated or semi-automated techniques then extract the available knowledge units. A knowledge unit comprises three kinds of knowledge elements: entities, relationships, and attributes. On this basis, a series of high-quality factual expressions is formed, laying the foundation for constructing the schema layer.

 

  (1) Entity extraction

Entity extraction, also known as named entity recognition, refers to automatically identifying named entities in the raw corpus. Since entities are the most basic elements of a knowledge graph, the completeness, accuracy, and recall of the extraction directly affect the quality of the constructed knowledge graph.

 

Entity extraction methods:

  1. Extraction based on encyclopedia or vertical-domain sites
  2. Rule-based and dictionary-based methods (a minimal sketch follows this list)
  3. Methods based on statistical machine learning
  4. Open-domain extraction methods
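As a minimal sketch of the rule- and dictionary-based approach (method 2 above), the code below matches a small hand-built dictionary of financial entities against raw text. The dictionary contents and the `extract_entities` helper are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical dictionary of known financial entities (surface form -> entity type).
ENTITY_DICT = {
    "Federal Reserve": "Institution",
    "Goldman Sachs": "Company",
    "S&P 500": "Index",
}

def extract_entities(text):
    """Dictionary-based entity extraction: return (mention, type, offset) tuples."""
    matches = []
    for name, etype in ENTITY_DICT.items():
        for m in re.finditer(re.escape(name), text):
            matches.append((name, etype, m.start()))
    return matches

sample = "The Federal Reserve decision lifted the S&P 500, and Goldman Sachs raised its forecast."
print(extract_entities(sample))
```

In practice, such a dictionary is usually combined with statistical models (method 3) to cover entities that never appear in the dictionary.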

 

  (2) Attribute extraction

The goal of attribute extraction is to collect attribute information of a specific entity from different information sources, such as nickname, birthday, nationality, etc.

 

  (3) Relationship extraction

The basic problem of relationship extraction techniques is how to extract inter-entity relationships from text corpora.
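A very small pattern-based sketch of relation extraction is shown below. The surface patterns, the crude capitalized-word entity pattern, and the `extract_relations` helper are hypothetical; they only illustrate turning sentences into (subject, relation, object) triples.

```python
import re

# A capitalized-word sequence used as a crude entity pattern (illustration only).
CAP_SEQ = r"(?:[A-Z][\w&]*(?: [A-Z][\w&]*)*)"

# Hypothetical surface patterns mapping phrasings to relation types.
RELATION_PATTERNS = [
    (rf"(?P<subj>{CAP_SEQ}) acquired (?P<obj>{CAP_SEQ})", "ACQUIRED"),
    (rf"(?P<subj>{CAP_SEQ}) is a subsidiary of (?P<obj>{CAP_SEQ})", "SUBSIDIARY_OF"),
]

def extract_relations(sentence):
    """Return (subject, relation, object) triples matched by the surface patterns."""
    triples = []
    for pattern, relation in RELATION_PATTERNS:
        for m in re.finditer(pattern, sentence):
            triples.append((m.group("subj"), relation, m.group("obj")))
    return triples

print(extract_relations("Alpha Bank acquired Beta Capital in 2019."))
```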

 

  2. Knowledge fusion

Knowledge fusion consists of two parts: entity linking and knowledge merging.

The purpose of knowledge fusion is to resolve ambiguity between concepts and to remove redundant or incorrect concepts, ensuring the quality of the knowledge.
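As a rough sketch of the entity-linking half of knowledge fusion, the code below maps different surface mentions to one canonical entity via a normalization table; the alias table itself is hypothetical.

```python
# Hypothetical alias table linking surface mentions to canonical entity names.
ALIASES = {
    "icbc": "Industrial and Commercial Bank of China",
    "industrial and commercial bank of china": "Industrial and Commercial Bank of China",
    "the fed": "Federal Reserve",
    "federal reserve": "Federal Reserve",
}

def link_entity(mention):
    """Map a raw mention to its canonical entity, falling back to the mention itself."""
    return ALIASES.get(mention.strip().lower(), mention.strip())

mentions = ["ICBC", "Industrial and Commercial Bank of China", "The Fed"]
print({m: link_entity(m) for m in mentions})
```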

 

  3. Knowledge processing

The facts themselves are not equal to knowledge. To finally obtain a structured and networked knowledge system, we also need to go through the process of knowledge processing. Knowledge processing mainly includes three aspects: ontology construction, knowledge reasoning and quality assessment.

 

  (1) Ontology construction

The ontology can be built by hand through manual editing, or generated automatically in a data-driven, computer-aided manner and then revised and confirmed through a combination of algorithmic evaluation and manual review. For specific domains, domain experts can construct the ontology manually.

 

  (2) Knowledge reasoning

Knowledge reasoning starts from the entity-relationship data already in the knowledge base and, through automated inference, establishes new associations between entities, thereby expanding and enriching the knowledge network.

Methods of knowledge reasoning:

  1. Logic-based reasoning
  2. Graph-based reasoning (a toy sketch follows this list)
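The toy illustration of graph-based reasoning below infers an indirect CONTROLS relation whenever two SUBSIDIARY_OF edges chain together. Both the triples and the inference rule are hypothetical examples, not part of the project's actual schema.

```python
# Existing (head, relation, tail) triples in the knowledge base; hypothetical data.
triples = {
    ("Beta Capital", "SUBSIDIARY_OF", "Alpha Bank"),
    ("Gamma Fund", "SUBSIDIARY_OF", "Beta Capital"),
}

def infer_indirect_control(kb):
    """If A is a subsidiary of B and B is a subsidiary of C, infer (C, CONTROLS, A)."""
    parent_of = {(h, t) for h, r, t in kb if r == "SUBSIDIARY_OF"}
    inferred = set()
    for a, b in parent_of:
        for b2, c in parent_of:
            if b == b2:
                inferred.add((c, "CONTROLS", a))
    return inferred

print(infer_indirect_control(triples))  # {('Alpha Bank', 'CONTROLS', 'Gamma Fund')}
```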

 

  (3) Quality assessment

Quality assessment is also an important part of knowledge base construction. Because of the limitations of current techniques, elements obtained through open-domain information extraction may contain errors (such as entity recognition errors or relation extraction errors), so a quality assessment step is required.

  4. Knowledge update

The amount of knowledge possessed by human beings is a monotonically increasing function of time. Therefore, the content of the knowledge graph needs to keep pace with the times. The construction process is a process of continuous iterative updating.

Logically, the update of the knowledge base includes the update of the concept layer and the update of the data layer.

There are two ways to update the content of the knowledge graph:

  • Data-driven full updates
  • Incremental updates (a small contrast sketch follows this list)
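A hedged, in-memory sketch of the difference between the two update modes, using hypothetical triples as stand-ins for the data layer:

```python
# In-memory triple sets standing in for the data layer; all facts are hypothetical.
current_kb = {("Alpha Bank", "ACQUIRED", "Beta Capital")}
new_snapshot = {
    ("Alpha Bank", "ACQUIRED", "Beta Capital"),        # already known
    ("Gamma Fund", "SUBSIDIARY_OF", "Beta Capital"),   # newly extracted fact
}

def full_update(snapshot):
    """Full update: rebuild the data layer from the latest snapshot."""
    return set(snapshot)

def incremental_update(kb, snapshot):
    """Incremental update: keep the existing data layer and add only unseen facts."""
    return kb | (snapshot - kb)

print(incremental_update(current_kb, new_snapshot))
```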

 

Overall design

  1. Task one

Challenges

One of the challenging problems is how to build a suitable data model. To do that, we need to collect large amounts of financial data in the chosen direction. The next step is to choose a way of representing the data that both humans and computers can understand. In this project, we use RDF (Resource Description Framework) to represent the data. For the database, we use Neo4j, one of the most popular graph databases, and we use the py2neo module to construct the knowledge graph.
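A minimal py2neo sketch of how one construction step might look, assuming a local Neo4j instance and a hypothetical (company, relation, company) triple and schema:

```python
from py2neo import Graph, Node, Relationship

# Hypothetical connection details for a local Neo4j instance.
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

def add_triple(subject, relation, obj):
    """Create (or reuse) two Company nodes and connect them with the given relation."""
    s = Node("Company", name=subject)
    o = Node("Company", name=obj)
    # merge() keys on the 'name' property, so re-running does not create duplicates.
    graph.merge(s, "Company", "name")
    graph.merge(o, "Company", "name")
    graph.merge(Relationship(s, relation, o), "Company", "name")

add_triple("Alpha Bank", "ACQUIRED", "Beta Capital")
```

Using `merge` rather than `create` keeps the construction step idempotent, which also fits the incremental-update mode described earlier.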

 

 

  2. Task two

Preliminary design

The question answering system we aim to build targets English text questions; it does not analyze pictures. First, we need to construct a knowledge graph that stores information as entities and the relationships between them. To store the knowledge graph, we will use Neo4j, one of the most popular graph databases.

 

- System architecture

As shown in Figure one, the system consists of four modules: the answer data set construction module, the question analysis module, the graph retrieval & analysis module, and the answer analysis module. First, we construct an answer data set based on the knowledge graph as the source of answers. We then process each question to extract the target information, query the database, and retrieve the relevant subgraph. Based on the retrieved subgraph, we construct a suitable answer to the question.


Figure one: System architecture

 

- Key modules and their functions

Construct answer data set

This module constructs the knowledge graph data set used for answers; it is the foundation of the whole system.

Question analysis

To answer questions, we need to analyze each question and extract its key information. Extracting the entities in the question and the relations between them is the key step of this module.
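A simplified sketch of what this module might do, assuming a small set of question templates; the template patterns and relation names are hypothetical:

```python
import re

# Hypothetical question templates mapping a phrasing to the relation being asked about.
QUESTION_TEMPLATES = [
    (r"who acquired (?P<entity>.+?)\??$", "ACQUIRED"),
    (r"which companies does (?P<entity>.+?) control\??$", "CONTROLS"),
]

def analyze_question(question):
    """Return (target entity, relation) extracted from the question, or None."""
    q = question.strip().lower()
    for pattern, relation in QUESTION_TEMPLATES:
        m = re.match(pattern, q)
        if m:
            return m.group("entity"), relation
    return None

print(analyze_question("Who acquired Beta Capital?"))  # ('beta capital', 'ACQUIRED')
```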

 

Graph Retrieval & Analysis

After extracting the key information from the question, we need to analyze the query and retrieve the targeted subgraph from the data set.
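A hedged sketch of how this retrieval step might query Neo4j, reusing the hypothetical connection details and Company schema from the construction sketch above:

```python
from py2neo import Graph

# Hypothetical connection details; the schema matches the construction sketch above.
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

def retrieve(entity, relation):
    """Fetch all nodes connected to `entity` by `relation`.

    The relation type is interpolated into the query string because Cypher does
    not allow relationship types to be passed as parameters.
    """
    query = (
        f"MATCH (a:Company)-[:{relation}]->(b:Company {{name: $name}}) "
        "RETURN a.name AS answer"
    )
    return [record["answer"] for record in graph.run(query, name=entity)]

print(retrieve("Beta Capital", "ACQUIRED"))  # e.g. ['Alpha Bank']
```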

Answer Analysis

From the retrieved subgraph, we need to construct a suitable and grammatically correct answer.
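A minimal sketch of template-based answer construction, assuming hypothetical per-relation answer templates; more sophisticated NLP generation could replace this later:

```python
# Hypothetical answer templates keyed by the relation the question asked about.
ANSWER_TEMPLATES = {
    "ACQUIRED": "{entity} was acquired by {answers}.",
    "CONTROLS": "{entity} controls {answers}.",
}

def build_answer(entity, relation, results):
    """Turn retrieved node names into a grammatical English sentence."""
    if not results:
        return f"Sorry, no information about {entity} was found."
    template = ANSWER_TEMPLATES.get(relation, "{entity}: {answers}")
    return template.format(entity=entity, answers=", ".join(results))

print(build_answer("Beta Capital", "ACQUIRED", ["Alpha Bank"]))
```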

 

- Key techniques and challenges

Natural language processing

To extract the key information from a question and construct a suitable answer, NLP techniques are required.

Graph Search

Unlike the SQL query techniques used in relational databases, graph databases rely on graph search techniques. For us this is a completely new field, so much effort will be required.

 

 
