2018-06-06

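Application of Data Mining Technology in Medical Data
Chinese Abstract
With the development of big data and artificial intelligence technology, data mining has been applied in a growing number of fields, including finance, education, and healthcare. Within healthcare, its applications extend to frontier areas such as precision medicine, genetic engineering, and gene sequencing. This article discusses the role that data mining models and algorithms play in clinical medical data and hospital information system data.
The purpose of applying data mining to medical data is to uncover latent, disease-related factors from large volumes of medical data and, in the process, to obtain further information, models, and association rules. Applying these findings in clinical practice can help doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, Chapter 2 describes in detail the characteristics of medical data and the theoretical foundations and method structure of commonly used data mining algorithms. It also gives a brief explanation of the various data mining models.
Second, using a breast cancer data set, this article explores how logistic regression prediction and random forest (decision tree) classification perform as classifiers on medical data, and the classification results achieve fairly good accuracy. The approach could later serve as a diagnostic aid for doctors: patients predicted to have a high probability of breast cancer can be singled out for closer observation and diagnosis.
Finally, this article interprets the classification and prediction results obtained on the two data sets and proposes corresponding countermeasures and suggestions for improvement. It closes by discussing the article's shortcomings and directions for future improvement.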

Keywords: data mining; regression analysis; decision tree; breast cancer

The Application of Data Mining Technology in Medical Data
Abstract in English
With the development of big data and artificial intelligence technology, the application of data mining has become a hot topic, and it has been applied in a great many fields, such as finance, education, and healthcare. In healthcare, its applications cover precision medicine, genetic engineering, gene sequencing, and other frontier fields. This article discusses the role of data mining models and algorithms in clinical medical data and hospital information system data.
The purpose of applying data mining technology to medical data is to dig out potential disease-related factors from large amounts of medical data, and to obtain more information, models, association rules, and so on in the process. These findings can then be used in clinical practice to help doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, the second chapter of this article elaborates on the characteristics of medical data and on the theoretical basis and method structure of common data mining algorithms. It also gives a brief explanation of various data mining models.
Second, using a breast cancer data set, this article explores the classification capability of logistic regression analysis and random forest (decision tree) methods from data mining. The classification results achieved fairly good accuracy. The approach can serve as a diagnostic aid that helps doctors concentrate on observing and diagnosing patients with a higher predicted probability of breast cancer.
Finally, this article explains the classification and prediction results for the two data sets and puts forward relevant countermeasures and suggestions for improvement. At the end of the article, the author discusses its shortcomings and directions for future improvement.

Keywords: data mining; regression analysis; decision tree; breast cancer
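
The abstract describes using logistic regression to estimate each patient's probability of breast cancer and then singling out high-probability patients for closer observation. The following is only a minimal sketch of that idea, not the author's implementation: it fits a logistic regression by plain gradient descent on synthetic two-feature data and flags samples whose predicted probability reaches a cutoff. The toy data, the function names, the learning rate, and the 0.5 threshold are all assumptions made for illustration; the real breast cancer data set and the random forest model are not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in data (NOT real medical data): two Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
w, b = fit_logistic(X, y)

THRESHOLD = 0.5  # hypothetical cutoff; a clinical setting would tune this
probs = [predict_proba(w, b, xi) for xi in X]

# Indices of samples that would be flagged for closer observation.
flagged = [i for i, p in enumerate(probs) if p >= THRESHOLD]

# Accuracy on the training data itself, so an optimistic estimate.
accuracy = sum((p >= THRESHOLD) == bool(yi) for p, yi in zip(probs, y)) / len(y)
print(f"training accuracy: {accuracy:.3f}, flagged: {len(flagged)} of {len(y)}")
```

In practice the threshold would likely be set below 0.5 if missing a true case is costlier than a false alarm, and accuracy would be measured on held-out data instead of the training set.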
