http://blog.csdn.net/pipisorry/article/details/43089121
Machine Learning - Andrew Ng course study notes
Introduction to Machine Learning
Contents:
What is Machine Learning
Supervised Learning
Unsupervised Learning
Origins and use cases of machine learning:
Machine Learning
- Grew out of work in AI
- New capability for computers
Examples:
- Database mining
Large datasets from growth of automation/web.
E.g., Web click data, medical records, biology, engineering
- Applications we can't program by hand.
E.g., Autonomous helicopter, handwriting recognition, most of
Natural Language Processing (NLP), Computer Vision.
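Typical machine learning use cases in business operations
Lead scoring, market segmentation, personalized recommendation, customer churn prevention, pricing support, product roadmap planning, credit risk scoring, fraud detection, fraud discovery, etc.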
[Machine Learning – 9 Most Common Usecases for Higher Business Growth]
Definition of Machine Learning
Arthur Samuel (1959). Machine Learning:
Field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998) Well-posed Learning Problem:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
In other words: a program learns from experience E, relative to some task T and some performance measure P, if its performance on T, as measured by P, gets better as it gains experience E.
Example: Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
Classifying emails as spam or not spam. (Task T)
Watching you label emails as spam or not spam. (Experience E)
The number (or fraction) of emails correctly classified as spam/not spam. (Performance measure P)
None of the above; this is not a machine learning problem.
In this example, the program learns from the spam labels you assign (E), carries out the task of classifying email as spam or not (T), and keeps improving its performance (P) as it learns.
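The T/E/P roles in the spam example can be made concrete with a toy sketch. It is only an illustration under stated assumptions: the word-count scoring rule, the function names, and every email below are hypothetical and do not come from the course.

```python
from collections import Counter

def train(labeled_emails):
    """Experience E: count how often each word appears in spam vs. non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_words if is_spam else ham_words).update(text.lower().split())
    return spam_words, ham_words

def classify(model, text):
    """Task T: return True if the email looks like spam."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.lower().split())
    return score > 0

def accuracy(model, labeled_emails):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(classify(model, text) == is_spam for text, is_spam in labeled_emails)
    return correct / len(labeled_emails)

# Hypothetical emails the user has labeled (True = spam).
labeled = [
    ("win free money now", True),
    ("meeting agenda for monday", False),
    ("free prize claim now", True),
    ("lunch on monday", False),
]
# Hypothetical held-out emails, used only to measure P.
held_out = [
    ("claim your free money", True),
    ("agenda for lunch meeting", False),
    ("prize claim inside", True),
    ("monday meeting notes", False),
]

for n in (0, 2, 4):  # growing experience E
    model = train(labeled[:n])
    print(n, "labeled emails -> accuracy", accuracy(model, held_out))
```

On this made-up data the accuracy goes 0.5, 0.75, 1.0 as the number of labeled emails grows, which is the sense in which performance on T, measured by P, improves with E.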
Machine learning algorithms
- Supervised learning
- Unsupervised learning
Others: Reinforcement learning, recommender systems.
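Supervised Learning
Supervised Learning: "right answers" given. Each example in the training data, e.g. a data set of (size in feet^2, price in $1000s) pairs, comes with its correct value (here, the price). You can think of it as labeled training data.
Regression: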
Predict continuous valued output (price)
Regression example:
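A minimal sketch of regression on (size, price) pairs, assuming a straight-line model fitted by ordinary least squares. The function name and the four data points are hypothetical; the notes do not say which fitting method the course figure used.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training set: size in feet^2 and price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 1750 + intercept  # continuous-valued output
print(predicted)
```

The prediction is a real number (about 350, i.e. $350,000, for this made-up data) rather than a class label, which is what makes this a regression problem.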
Classification:
Discrete valued output (0 or 1)
Classification example 1 (one feature):
Classification example 2 (two features; the right side shows an example with more features):
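A minimal sketch of the one-feature case, assuming a simple rule that places the decision threshold halfway between the two class means and that class 1 has the larger values. The rule, the function names, and the data are hypothetical and are not taken from the course figures.

```python
def fit_threshold(values, labels):
    """Put the decision threshold halfway between the two class means."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, value):
    """Discrete-valued output: 0 or 1."""
    return 1 if value > threshold else 0

# Hypothetical one-feature training set with 0/1 labels.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(values, labels)
print(threshold, predict(threshold, 2.5), predict(threshold, 6.5))
```

With two or more features the same idea carries over by comparing distances to the class means instead of comparing against a single threshold.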
An example that distinguishes classification from regression:
You’re running a company, and you want to develop learning algorithms to address each of two problems.
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Question: Should you treat these as classification or as regression problems?
1 Treat both as classification problems.
2 Treat problem 1 as a classification problem, problem 2 as a regression problem.
3 Treat problem 1 as a regression problem, problem 2 as a classification problem.
4 Treat both as regression problems.
Answer: 3 is right.
For problem 1, I would treat this as a regression problem: if I have thousands of items, I would probably just treat the number of items I sell as a real, continuous value.
For problem 2, I would treat it as a classification problem: I can set the value I want to predict to 0 to denote an account that has not been hacked, and to 1 to denote an account that has been hacked.
Unsupervised Learning
Unsupervised Learning is a learning setting where you give the algorithm a ton of data and just ask it to find structure in the data for us.
Difference from Supervised Learning: we are not giving the algorithm the right answers for the examples in the data set.
Clustering (one type of Unsupervised Learning)
Example:
{Grouping different people by clustering their genes} So this is Unsupervised Learning because we're not telling the algorithm in advance that these are type 1 people, those are type 2 people, those are type 3 people and so on; instead, what we're saying is: here's a bunch of data.
Example 1 of clustering:
Organizing large computer clusters: trying to figure out which machines tend to work together, so that if you put those machines together, you can make your data center work more efficiently.
Social network analysis. Given knowledge about which friends you email the most, or given your Facebook friends or your Google+ circles, can we automatically identify which are cohesive groups of friends, and which are groups of people that all know each other?
Market segmentation. Many companies have huge databases of customer information. Can you look at this customer data set, automatically discover market segments, and automatically group your customers into those segments, so that you can sell or market to each segment more efficiently?
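A minimal sketch of how such groups could be discovered without labels, using k-means. The notes do not name a specific clustering algorithm, so k-means is a swapped-in choice here; the customer features, the data points, and the starting centroids are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend). No labels are given.
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]

# Start from two of the data points as initial centroids (an arbitrary choice).
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(clusters)
```

The algorithm is never told which segment a customer belongs to; the two groups emerge only from the structure of the data.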
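Example 2 of unsupervised learning: Cocktail party problem
The "cocktail party problem" is a problem in computer speech recognition. Current speech recognition technology can already recognize what a single speaker says with fairly high accuracy, but when two or more people are speaking, the recognition rate drops sharply.
Cocktail party problem algorithm: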
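% Weight each column of x by its squared norm, multiply by x' to get a size(x,1)-by-size(x,1) matrix, and take its SVD.
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');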
An example of the difference between Supervised Learning and Unsupervised Learning:
Question: Of the following examples, which would you address using an unsupervised learning algorithm?
1 Given email labeled as spam/not spam, learn a spam filter.
2 Given a set of news articles found on the web, group them into sets of articles about the same story.
3 Given a database of customer data, automatically discover market segments and group customers into different market segments.
4 Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
Answer:
1 and 4 are supervised learning problems, while 2 and 3 are unsupervised learning problems.
Explanation: If you have labeled data, such as spam and non-spam e-mail, we'd treat this as a Supervised Learning problem.
The news story example is exactly the Google News example: you can use a clustering algorithm to cluster these articles together, so that's Unsupervised Learning.
About Octave, the machine learning development environment
Why? If you use Octave as your learning tool and as your prototyping tool, it'll let you learn and prototype learning algorithms much more quickly. Use a tool like Octave to first prototype the learning algorithm, and only after you've gotten it to work, migrate it to C++ or Java or whatever.
Octave installation tutorial: Octave/Matlab Tutorial
Octave documentation
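ref: [机器学习系列(4)_机器学习算法一览,应用建议与解决思路] (Machine Learning Series (4): an overview of machine learning algorithms, application advice, and problem-solving approaches)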