Introduction to Data Mining-Introduction

Introduction

Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data. It has also opened up exciting opportunities for exploring and analyzing new types of data and for analyzing old types of data in new ways.

1. What is data mining?

Data mining is the process of automatically discovering useful information in large data repositories.

Data mining aims to (1) find novel and useful patterns and (2) provide predictive capability.

Data mining is an integral part of knowledge discovery in databases (KDD), which consists of the following steps:

Input Data → Data Preprocessing → Data Mining → Postprocessing → Information

Preprocessing transforms the input data into an appropriate format for subsequent analysis. Postprocessing filters the mining results so that only useful information is kept; this step may involve statistical measures or hypothesis tests.
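The steps above can be sketched in a few lines of Python. This is only an illustrative toy, not a method from the text: the basket data, the function names (`preprocess`, `mine_pairs`, `postprocess`) and the `min_support` threshold are all assumptions made up for the example. Preprocessing cleans raw strings into item sets, mining counts co-occurring item pairs, and postprocessing keeps only the pairs that appear often enough.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw input: shopping baskets with inconsistent casing,
# stray whitespace, and one empty record.
raw_records = [
    "Bread, Milk",
    "bread, diapers, beer, eggs",
    "milk, diapers, beer, cola",
    "",
    "bread, milk, diapers, beer",
    "bread, milk, diapers, cola ",
]


def preprocess(records):
    """Turn raw strings into clean item sets, dropping empty records."""
    baskets = []
    for line in records:
        items = {item.strip().lower() for item in line.split(",") if item.strip()}
        if items:
            baskets.append(items)
    return baskets


def mine_pairs(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def postprocess(pair_counts, n_baskets, min_support=0.5):
    """Keep only pairs whose support (fraction of baskets) meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


baskets = preprocess(raw_records)
patterns = postprocess(mine_pairs(baskets), len(baskets))
for pair, support in sorted(patterns.items()):
    print(pair, support)
```

In this toy, the support threshold plays the role of the statistical measure mentioned above; a real postprocessing step would typically use a more careful test.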

2. Motivating Challenges

a. Scalability: algorithms must be efficient enough to handle large data sets.

b. High dimensionality

c. Heterogeneous and Complex Data: the data may contain more than one type of attribute, or may even be unstructured.

d. Data Ownership and Distribution

e. Non-traditional analysis: the traditional analysis paradigm is based on the hypothesize-and-test method, which is too labour-intensive.

3. The origin of Data Mining

Data mining draws ideas from statistics, artificial intelligence, pattern recognition and machine learning.

4. Data Mining Tasks

Data mining includes two kinds of tasks: (1) predictive tasks and (2) descriptive tasks. Descriptive data mining tasks are often exploratory in nature and frequently require postprocessing techniques to validate and explain the results.

Predictive tasks comprise classification and regression, while descriptive tasks comprise association analysis, cluster analysis, and anomaly detection.
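To make the distinction concrete, here is a minimal sketch with one example of each kind of task. Everything in it is assumed for illustration: the data, the helper names `predict_1nn` and `find_anomalies`, and the choice of a nearest-neighbour rule and a z-score threshold, which are simple stand-ins rather than the techniques the text prescribes. The predictive example uses labeled data to assign a label to a new value; the descriptive example has no labels and only summarizes what is unusual in the data.

```python
from statistics import mean, stdev

# Hypothetical labeled data: (hours_studied, outcome).
labeled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (7.0, "pass")]


def predict_1nn(x, training):
    """Predictive task (classification): return the label of the nearest training point."""
    nearest = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Hypothetical unlabeled measurements.
values = [10.0, 11.0, 9.0, 10.5, 9.5, 30.0]


def find_anomalies(data, z_threshold=1.5):
    """Descriptive task (anomaly detection): flag values far from the mean."""
    mu, sigma = mean(data), stdev(data)
    return [v for v in data if abs(v - mu) / sigma > z_threshold]


print(predict_1nn(5.5, labeled))
print(find_anomalies(values))
```

Note that the anomaly detector still needs a human to judge whether the flagged value is an error or a genuinely interesting observation, which is the kind of postprocessing that descriptive tasks tend to require.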
