Database Analysis & Decision Support
Market analysis & management
Target marketing, customer relationship management, market basket analysis, cross selling, market segmentation
Risk analysis and management
Forecasting, customer retention, improved underwriting, quality control, competitive analysis
Fraud detection and management
Other applications
Text mining and web analysis
Intelligent query answering
Market Analysis & Management
Data sources?
credit card transactions, loyalty cards, discount coupons, customer complaint calls, social media, plus (public) lifestyle studies
Target marketing
find clusters of 'model' customers who share the same characteristics: interests, income level, spending habits, etc.
Determine customer purchasing patterns over time
conversion of a single to a joint bank account: marriage ...
Cross-market analysis
associations / correlations between product sales
prediction based on the association information
Customer profiling
data analytics can tell you what types of customers buy what products (clustering or classification)
Identifying customer requirements
identify the best products for different customers
use prediction to find which factors will attract new customers
Provide summary information
Various multidimensional summary reports
Statistical summary information (mean and variance ...)
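The association measures behind cross-market analysis can be sketched in a few lines. This is a minimal illustration, not a full market basket algorithm: the baskets are made-up data and `pair_rules` is a hypothetical helper that only considers pairs of products. Support is the share of baskets containing both items, confidence is the share of baskets with the first item that also contain the second, and lift compares that confidence with the second item's overall frequency.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is the products in one transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for rules lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = count / n                       # P(lhs and rhs)
            confidence = count / item_counts[lhs]     # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

rules = pair_rules(baskets)
support, confidence, lift = rules[("bread", "butter")]
print(f"bread -> butter: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests the two products sell together more often than chance would predict; a lift below 1 suggests the opposite.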
Corporate Analysis & Risk Management
Finance planning and asset evaluation
Cash flow analysis and prediction
Contingent claim analysis to evaluate assets
Cross-sectional and time series analysis (financial-ratio, trend analysis, ...)
Resource planning
summarise and compare the resources and spending
Competition
Monitor (predict) competitors and market directions
group customers into classes and apply a class-based pricing procedure
set pricing strategy in a highly competitive market
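As one small example of the trend analysis mentioned under finance planning, a straight line can be fitted to a cash flow series by ordinary least squares and extended one period ahead. This is a sketch under simplifying assumptions: the figures are invented, `linear_trend` is a hypothetical helper, and real cash flow prediction would normally account for seasonality and uncertainty.

```python
def linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope

# Hypothetical quarterly net cash flow figures (illustrative only).
cash_flow = [100.0, 110.0, 120.0, 130.0]
intercept, slope = linear_trend(cash_flow)

# Extrapolate the fitted line to the next quarter (t = len(cash_flow)).
next_quarter = intercept + slope * len(cash_flow)
print(f"trend: {slope:+.1f} per quarter, next quarter estimate: {next_quarter:.1f}")
```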
Fraud Detection & Management
Applications
health care, retail, credit card services, telecommunications (phone card fraud), ...
Approach
use historical data to build models of fraudulent behaviour and use data mining to help identify similar instances.
Examples
Auto insurance: detect groups of people who stage accidents to collect on insurance
Money laundering: detect suspicious money transactions
Medical insurance: detect professional patients, rings of doctors and rings of references
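The approach above (model historical fraud, then look for similar instances) can be sketched with a k-nearest-neighbour vote. Everything here is an assumption for illustration: the two features, the labelled history and the `knn_label` helper are made up, and a real fraud model would use far more data and a properly validated technique.

```python
import math
from collections import Counter

# Hypothetical labelled history: (amount, transactions in the last hour) -> label.
history = [
    ((20.0, 1), "legit"),
    ((35.0, 2), "legit"),
    ((15.0, 1), "legit"),
    ((900.0, 8), "fraud"),
    ((750.0, 10), "fraud"),
    ((820.0, 9), "fraud"),
]

def fit_scaler(rows):
    """Return per-feature minimums and maximums for min-max scaling."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def scale(row, mins, maxs):
    """Scale each feature to [0, 1] so that no single feature dominates distance."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, mins, maxs)
    ]

def knn_label(query, history, k=3):
    """Label a new case by majority vote among its k most similar past cases."""
    mins, maxs = fit_scaler([features for features, _ in history])
    q = scale(query, mins, maxs)
    nearest = sorted(
        history, key=lambda item: math.dist(q, scale(item[0], mins, maxs))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_label((800.0, 9), history))  # prints "fraud"
print(knn_label((25.0, 1), history))   # prints "legit"
```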
Other applications
Sports
Moneyball
Astronomy
JPL and the Palomar Observatory discovered 22 quasars using data analytics
KDD process: Knowledge Discovery in Databases
Learn the application domain (prior knowledge & goals)
Create target data set: data selection
Data cleaning and preprocessing
Data reduction and transformation
Find useful features, dimensionality/variable reduction, invariant representation
Choose functions of data mining: the 'data mining problem'
Summarisation, classification, regression, association, clustering
Choose the data mining algorithms
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
Visualisation, transformation, removal of redundant patterns, ...
Use of discovered knowledge
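The KDD steps above can be read as a chain of small functions, each feeding the next. The sketch below is only a toy: the records, the field names and the helpers (`select`, `clean`, `transform`, `mine`, `evaluate`) are all hypothetical, and the mining step is a simple summarisation rather than a real algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "a", "region": "north", "spend": 120.0},
    {"customer": "b", "region": "north", "spend": None},
    {"customer": "c", "region": "south", "spend": 40.0},
    {"customer": "d", "region": "south", "spend": 60.0},
    {"customer": "e", "region": "north", "spend": 80.0},
]

def select(records, fields):
    """Data selection: keep only the fields relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Cleaning: drop records with missing values (one simple policy among many)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: add spend rescaled to [0, 1], kept alongside the raw value."""
    spends = [r["spend"] for r in records]
    lo, hi = min(spends), max(spends)
    span = (hi - lo) or 1.0
    return [{**r, "spend_scaled": (r["spend"] - lo) / span} for r in records]

def mine(records):
    """Mining step, here a summarisation: mean raw spend per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["spend"])
    return {region: mean(values) for region, values in by_region.items()}

def evaluate(patterns, min_gap=10.0):
    """Pattern evaluation: keep the summary only if regions differ by min_gap or more."""
    gap = max(patterns.values()) - min(patterns.values())
    return patterns if gap >= min_gap else {}

patterns = evaluate(mine(transform(clean(select(raw, ["region", "spend"])))))
print(patterns)  # prints {'north': 100.0, 'south': 50.0}
```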
CRISP-DM methodology: CRoss-Industry Standard Process for Data Mining
Business Understanding
Determine business objectives
Assess situation
Determine data mining goals
Produce project plan
Data Understanding
Collect initial data
Describe data
Data description report
Explore data
What is immediately obvious?
Verify data quality
What problems are there with the data? Sometimes called a data audit
Data Preparation
Select data
What pieces of data are needed and why?
Clean data
Deal with the data quality problems found earlier. Possibly 60% or more of the effort
Construct data
May need to create new instances and / or attributes.
Integrate data
May need to combine data from different tables or records into one table or record
Format data
May need to change the format of the data, e.g. reformat dates, remove illegal characters, ...
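The two formatting examples named above, dates and illegal characters, might look like this in practice. The list of accepted date formats and the definition of "illegal" (ASCII control characters) are assumptions for the sketch; real data would dictate both. `normalise_date` and `strip_illegal` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumed input date formats; real data may need a different list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def normalise_date(text):
    """Return the date in ISO format (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def strip_illegal(text):
    """Replace ASCII control characters with spaces, then collapse whitespace."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalise_date("03/02/2021"))               # prints 2021-02-03
print(normalise_date("20210203"))                 # prints 2021-02-03
print(normalise_date("not a date"))               # prints None
print(strip_illegal("Acme\x00  Pty\tLtd\n"))      # prints Acme Pty Ltd
```

Returning None for unparseable dates, rather than guessing, keeps the quality problem visible so it can be handled in the cleaning step.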
Modelling
Select the modelling techniques
Considering the assumptions each technique makes
Generate test design
Work out how you're going to test the model quality and validity
Build the model
Run the modelling tool on the prepared data to create a model
Assess the model
Judge the success of the model based on its accuracy, generality, the test design and the success criteria, possibly with assistance from domain experts
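One common test design is a holdout split: set aside part of the data before building the model, then measure accuracy only on the part the model never saw. The sketch below assumes a made-up one-feature data set and a deliberately trivial threshold model; `train_test_split`, `fit_threshold` and `accuracy` are hypothetical helpers written for this example, not a library API.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy model: the midpoint between the two class means of a single feature."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'feature above threshold' agrees with the label."""
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return correct / len(rows)

# Hypothetical data: one numeric feature, label 1 when the feature is 10 or more.
data = [(float(x), int(x >= 10)) for x in range(20)]
train, test = train_test_split(data)

threshold = fit_threshold(train)          # build the model on training rows only
held_out = accuracy(threshold, test)      # assess it on rows it has not seen
print(f"threshold={threshold:.2f}, held-out accuracy={held_out:.2f}")
```

Fixing the random seed makes the split repeatable, which helps when comparing several modelling techniques against the same test design.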
Evaluation
Evaluate results
Based on the original business objectives (as opposed to accuracy and generality in the modelling phase)
Review process
Quality assurance: did the project miss any important factor or task in the business problem?
Determine next steps
Do you need to do something else, or can you move to deployment?
Deployment
Plan deployment
Develop a strategy for getting the insights (and possibly model) into the business
Plan monitoring and maintenance
How do you maintain the deployed model?
Produce final report
Describing all the previous steps and possibly a presentation to the customer
Review project
Reflect on the entire project. What worked? What didn't? Hints for the future?
Feature Types & their Operations