Specific domain: Finance
Finance is currently one of the most active application areas for knowledge graphs, covering key tasks such as financial risk control, marketing, and forecasting.
Within the overall technology chain, the knowledge graph occupies a core position; it can be regarded as another qualitative leap since the digitisation of financial statements. Knowledge graphs are a necessary part of moving financial data analysis from simple quantitative models toward more complex value judgment and risk assessment. They can gradually transform human experience and personal networks into reusable, evolvable, verifiable, and transferable knowledge models.
Data sources:
Data Website:
Gold headline: https://link.zhihu.com/?target=https%3A//goldtoutiao.com/
Sina Finance: http://finance.sina.com.cn/mac/
The process of constructing the knowledge graph starts from raw data and applies a series of automatic or semi-automatic techniques to extract knowledge elements and store them in the data layer and schema layer of the knowledge base.
This is an iterative updating process; each round of iteration consists of three phases: knowledge extraction, knowledge fusion, and knowledge processing.
Knowledge extraction mainly targets open linked data; the usual input is natural-language text or multimedia documents. Automated or semi-automated techniques then extract the available knowledge units. A knowledge unit comprises three elements: entities, relationships, and attributes. On this basis, a series of high-quality factual expressions is formed, laying the foundation for constructing the schema layer.
Entity extraction, also known as named entity recognition, refers to the automatic identification of named entities from the raw data corpus. Since entities are the most basic elements of the knowledge graph, the completeness, accuracy, and recall of the extraction directly affect the quality of the constructed knowledge graph.
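As a concrete illustration, entity extraction over financial text can be sketched with simple rules. This is a minimal, rule-based sketch only; the patterns, entity labels, and sample sentence below are illustrative assumptions, and a real system would use statistical or neural NER models.

```python
import re

# Illustrative patterns for three entity types; a production NER system
# would not rely on hand-written regexes like these.
PATTERNS = {
    "MONEY":   re.compile(r"\$\d+(?:\.\d+)?(?:\s*(?:million|billion))?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "ORG":     re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc|Corp|Bank|Group)\b"),
}

def extract_entities(text):
    """Return (label, surface form) pairs found in the text."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group()))
    return entities

print(extract_entities("Acme Corp raised $2.5 million, up 12% this year."))
```

Even this toy version shows why recall matters: any entity the patterns miss can never appear in the graph.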
Entity extraction methods:
The goal of attribute extraction is to collect attribute information of a specific entity from different information sources, such as nickname, birthday, nationality, etc.
The basic problem of relation extraction is how to extract relationships between entities from text corpora.
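A simple way to see the problem is trigger-phrase matching: look for a known verb phrase between two capitalised entity mentions. The trigger verbs, relation labels, and example sentence here are illustrative assumptions; this is a sketch, not a real relation extractor.

```python
import re

# Illustrative trigger phrases mapped to relation labels.
TRIGGERS = {"acquired": "ACQUIRED", "owns": "OWNS", "invested in": "INVESTED_IN"}

def extract_relations(sentence):
    """Return (subject, relation, object) triples matched by trigger phrases."""
    triples = []
    for phrase, label in TRIGGERS.items():
        # Capture capitalised, possibly multi-word names on either side.
        pattern = rf"([A-Z]\w*(?: [A-Z]\w*)*) {phrase} ([A-Z]\w*(?: [A-Z]\w*)*)"
        for subj, obj in re.findall(pattern, sentence):
            triples.append((subj, label, obj))
    return triples

print(extract_relations("Alpha Bank acquired Beta Fund in 2020."))
```

The extracted triples are exactly the factual expressions that feed the data layer of the knowledge base.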
Knowledge fusion consists of two parts: entity linking and knowledge merging.
The purpose of knowledge fusion is to eliminate ambiguity between concepts and remove redundant or erroneous concepts, ensuring the quality of the knowledge.
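One simple fusion strategy is to link each new mention to an existing canonical entity when their surface strings are similar enough, and otherwise register it as a new entity. The similarity threshold and the sample names below are illustrative assumptions; real entity linking also uses context, not just strings.

```python
from difflib import SequenceMatcher

def merge_entities(mentions, canonical, threshold=0.85):
    """Map each mention to an existing canonical entity or add it as new."""
    links = {}
    for mention in mentions:
        best, score = None, 0.0
        for name in canonical:
            s = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
            if s > score:
                best, score = name, s
        if score >= threshold:
            links[mention] = best          # duplicate: link to canonical form
        else:
            canonical.append(mention)      # genuinely new entity
            links[mention] = mention
    return links

canon = ["Sina Finance"]
print(merge_entities(["Sina finance", "Goldman Sachs"], canon))
```

Here the case-variant "Sina finance" is merged into the canonical entry, while an unseen name is added as a new entity.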
Facts by themselves are not knowledge. To finally obtain a structured, networked knowledge system, we also need the process of knowledge processing, which mainly includes three aspects: ontology construction, knowledge reasoning, and quality assessment.
The ontology can be built by manual editing, or created automatically in a data-driven manner with computer assistance and then revised and confirmed through a combination of algorithmic evaluation and manual review. For specific domains, domain experts can construct the ontology by hand.
Knowledge reasoning starts from the existing entity-relationship data in the knowledge base and, through computer inference, establishes new associations between entities, thereby expanding and enriching the knowledge network.
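The simplest form of such reasoning is rule-based inference, e.g. applying a transitivity rule: if A is a subsidiary of B and B is a subsidiary of C, infer that A is a subsidiary of C. The relation name and the sample facts below are illustrative assumptions.

```python
def infer_transitive(triples, relation):
    """Repeatedly apply the transitivity rule until no new facts appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (b2, r2, c) in list(facts):
                if r1 == r2 == relation and b == b2:
                    new = (a, relation, c)
                    if new not in facts:
                        facts.add(new)
                        changed = True
    return facts

facts = {("FundA", "SUBSIDIARY_OF", "BankB"),
         ("BankB", "SUBSIDIARY_OF", "GroupC")}
print(infer_transitive(facts, "SUBSIDIARY_OF"))
```

The inferred edge (FundA, SUBSIDIARY_OF, GroupC) did not exist in the input; this is exactly how reasoning enriches the knowledge network.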
Methods of knowledge reasoning:
Quality assessment is also an important part of knowledge base construction. Due to the limitations of current techniques, elements obtained with open-domain information extraction may contain errors (such as entity recognition errors or relation extraction errors), so a quality assessment process is required.
The amount of knowledge possessed by humanity is a monotonically increasing function of time, so the content of the knowledge graph must keep pace with the times; construction is a process of continuous iterative updating.
Logically, updating the knowledge base involves updating both the concept layer and the data layer.
There are two ways to update the content of the knowledge graph:
Challenges
One of the challenging problems is how to build a suitable data model. To do that, we need to collect large amounts of financial data in the chosen direction. The next step is to choose a way of representing the data that both humans and computers can understand. In this project, we use RDF (Resource Description Framework) as the data representation. For storage, we use Neo4j, one of the most popular graph databases, and to construct the graph from Python we use the py2neo module.
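Before loading anything into Neo4j via py2neo, the RDF-style data model can be prototyped as plain (subject, predicate, object) triples. The class below is a minimal in-memory stand-in for that model, not real py2neo or RDF library code; the class name and sample triples are illustrative assumptions.

```python
class TripleStore:
    """A tiny in-memory sketch of an RDF-style triple store, used to
    prototype the data model before loading it into Neo4j via py2neo."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given pattern; None is a wildcard."""
        return [
            (s, p, o) for (s, p, o) in self.triples
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        ]

store = TripleStore()
store.add("AcmeBank", "locatedIn", "Shanghai")
store.add("AcmeBank", "owns", "AcmeFund")
print(store.query("AcmeBank", None, None))
```

The wildcard query mirrors the basic triple-pattern matching that graph queries against the real database will perform.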
Preliminary design
The question answering system we aim to build answers questions posed as English text; it does not provide any image-analysis function. First, we need to construct knowledge graphs that store information as entities and the relationships between them. To store the knowledge graph, we will use Neo4j, one of the most popular graph databases.
- System architecture
As shown in Figure one, the system consists of four modules: the answer data set construction module, the question analysis module, the graph retrieval & analysis module, and the answer analysis module. First, we construct an answer data set based on the knowledge graph as the source of answers. We then process the question and extract the target information, query the database, and retrieve the relevant graph after analysis. Based on the retrieved graph, we construct a suitable answer to the question.
Figure one: System Architecture
Construct answer data set
This module constructs the knowledge graph data set for answers; it is the foundation of the whole system.
Question analysis
To answer questions, we need to analyze the question and extract its key information. Extracting the entities in the question and the relations between them is the key step of this module.
Graph Retrieval & Analysis
After extracting the key information from the question, we need to analyze the data set and retrieve the relevant subgraph from it.
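One common way to do this retrieval is a bounded breadth-first search around the entities found in the question, collecting every edge within a few hops. This is a sketch under that assumption; the adjacency data and node names below are made up for illustration, and the real system would issue the equivalent query against Neo4j.

```python
from collections import deque

def retrieve_subgraph(edges, seeds, max_depth=2):
    """Return all edges reachable within max_depth hops of the seed nodes."""
    adjacency = {}
    for (src, rel, dst) in edges:
        adjacency.setdefault(src, []).append((rel, dst))
    found, visited = [], set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for rel, dst in adjacency.get(node, []):
            found.append((node, rel, dst))
            if dst not in visited:
                visited.add(dst)
                queue.append((dst, depth + 1))
    return found

edges = [("Q1", "CEO_OF", "Q2"), ("Q2", "OWNS", "Q3"), ("Q3", "BASED_IN", "Q4")]
print(retrieve_subgraph(edges, ["Q1"], max_depth=2))
```

Limiting the depth keeps the retrieved subgraph small enough for the answer analysis module to work with.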
Answer Analysis
From the retrieved graph, we need to construct a suitable, grammatically correct answer.
Natural language processing
To extract the key information from the question and construct a suitable answer, NLP techniques are required.
Graph Search
Unlike the SQL query techniques used in relational databases, graph databases rely on graph search techniques. For us this is a completely unfamiliar field, so considerable effort will be required.