LIQING LIN

01_the_machine_learning_landscape

1. How would you define Machine Learning?

Ans: Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

2. Can you name four types of problems where it shines?

Ans: Machine Learning is great for complex problems for which we have no algorithmic solution (the best Machine Learning techniques can find a solution), to replace long lists of hand-tuned rules (one Machine Learning algorithm can often simplify code and perform better), to build systems that adapt to fluctuating environments (a Machine Learning system can adapt to new data), and finally to help humans learn (e.g., data mining: getting insights about complex problems and large amounts of data).

3. What is a labeled training set?

Ans: A labeled training set is a training set that contains the desired solution (called a label) for each instance.

4. What are the two most common supervised tasks?

Ans: The two most common supervised tasks are regression (predicting a target numeric value, given a set of features) and classification (predicting a class, such as spam or not spam).

Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class.
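
As a minimal sketch of how Logistic Regression turns a numeric score into a class decision, the snippet below applies the logistic (sigmoid) function to a linear score. The weight, bias, and 0.5 threshold are made-up values for illustration; a real model would learn the weight and bias from labeled data.

```python
import math

def sigmoid(score):
    """Map any real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_proba(x, weight=1.5, bias=-3.0):
    # weight and bias are hypothetical, not learned from data.
    return sigmoid(weight * x + bias)

def predict_class(x, threshold=0.5):
    # Regression-style output (a probability) turned into a class label.
    return 1 if predict_proba(x) >= threshold else 0

for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x), 3), predict_class(x))
```

The model itself outputs a number, as a regression model would; the classification only happens when that number is compared with the threshold.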

5. Can you name four common unsupervised tasks? (The training data is unlabeled.)

Ans: Common unsupervised tasks include clustering, visualization, dimensionality reduction, and association rule learning.

Clustering: run a clustering algorithm to try to detect groups of similar visitors.

{k-Means, Hierarchical Cluster Analysis, Expectation Maximization}

Visualization: visualization algorithms are also good examples of unsupervised learning algorithms. You feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted. These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization), so you can understand how the data is organized and perhaps identify unsuspected patterns.

{Principal Component Analysis, Kernel PCA, Locally-Linear Embedding, t-distributed Stochastic Neighbor Embedding}

Dimensionality reduction: simplify the data, for example by merging several correlated features into one. A learning algorithm trained on the reduced data will run much faster, the data will take up less disk and memory space, and in some cases the algorithm may even perform better.

Association rule learning: the goal is to dig into large amounts of data and discover interesting relations between attributes.

{Apriori, Eclat}
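
To make the clustering task concrete, here is a tiny one-dimensional k-Means sketch in plain Python. The `visits_per_week` numbers and the starting centroids are invented for illustration, and the function is a simplified teaching version, not a replacement for a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Tiny 1-D k-Means: alternate between assignment and centroid update."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

visits_per_week = [1, 2, 3, 20, 21, 22]   # hypothetical, unlabeled data
centroids, clusters = k_means_1d(visits_per_week, centroids=[0.0, 10.0])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups (occasional and frequent visitors) emerge from the data alone.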

6. What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

Ans: Reinforcement Learning is likely to perform best if we want a robot to learn to walk in various unknown terrains, since this is typically the type of problem that Reinforcement Learning tackles. It might be possible to express the problem as a supervised or semisupervised learning problem, but it would be less natural. (The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return, or penalties in the form of negative rewards. It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.)
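
The agent, reward, and policy vocabulary can be illustrated with a toy example. The sketch below uses tabular Q-learning updates on an invented five-cell corridor whose goal is the rightmost cell. To keep it deterministic, random exploration is replaced by a systematic sweep over every state and action; a real agent would explore by trial and error. All constants are illustrative assumptions.

```python
N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)

def step(state, action):
    """Environment: return the next state and the reward for this move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    # Reaching the goal is rewarded; every other move costs a small penalty.
    reward = 1.0 if next_state == N_STATES - 1 else -0.01
    return next_state, reward

# q[state][action_index] estimates the long-term reward of each action.
q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(200):
    for state in range(N_STATES - 1):          # the goal state is terminal
        for a_idx, action in enumerate(ACTIONS):
            next_state, reward = step(state, action)
            future = 0.0 if next_state == N_STATES - 1 else max(q[next_state])
            q[state][a_idx] += ALPHA * (reward + GAMMA * future - q[state][a_idx])

# The policy picks, in each state, the action with the highest estimate.
policy = [ACTIONS[max((0, 1), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]
print(policy)
```

The learned policy is simply "always step right", which the agent derives from rewards alone rather than from labeled examples of correct moves.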

7. What type of algorithm would you use to segment your customers into multiple groups?

Ans: If you don't know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.

8. Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

Ans: Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).

9. What is an online learning system?

Ans: An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.

10. What is out-of-core learning?

Ans: Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer's main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
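
The mini-batch idea can be sketched as follows. A generator stands in for data streamed from disk, so only one mini-batch is in memory at a time, and a linear model is updated incrementally with one gradient step per batch. The synthetic data (y = 2x + 1), the function names, and the learning rate are assumptions made for this illustration.

```python
def mini_batches(n_samples, batch_size):
    """Yield (x, y) mini-batches one at a time, as if streamed from disk."""
    batch = []
    for i in range(n_samples):
        x = (i % 10) / 10.0          # synthetic feature in [0, 0.9]
        y = 2.0 * x + 1.0            # synthetic target: y = 2x + 1
        batch.append((x, y))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def partial_fit(params, batch, learning_rate=0.1):
    """One gradient step on the mean squared error of a single mini-batch."""
    w, b = params
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

params = (0.0, 0.0)
for batch in mini_batches(n_samples=20000, batch_size=10):
    params = partial_fit(params, batch)   # the full dataset is never held in memory

w, b = params
print(round(w, 3), round(b, 3))
```

The same loop works for a true online setting: each new batch of data nudges the parameters, so the model can keep adapting as data arrives.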

11. What type of learning algorithm relies on a similarity measure to make predictions?

Ans: An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.

For example: A (very basic) similarity measure between two emails could be to count
the number of words they have in common. The system would flag an email as spam if it has many words
in common with a known spam email.
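
The word-count similarity described above can be sketched in a few lines. The example emails and the threshold of 3 shared words are invented for illustration.

```python
def words(email):
    """Distinct lowercase words in an email."""
    return set(email.lower().split())

def similarity(email_a, email_b):
    """Number of distinct words the two emails have in common."""
    return len(words(email_a) & words(email_b))

# The "learned by heart" instances: known spam emails (hypothetical).
known_spam = ["win a free prize now", "cheap loans approved now"]

def looks_like_spam(email, threshold=3):
    # threshold is an arbitrary illustrative cut-off for "many words in common".
    return any(similarity(email, spam) >= threshold for spam in known_spam)

print(looks_like_spam("claim your free prize now"))
print(looks_like_spam("meeting moved to monday"))
```

There are no model parameters to fit here: the system stores the examples and compares each new instance against them.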

12. What is the difference between a model parameter and a learning algorithm’s
hyperparameter?

Ans: A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model). A learning algorithm tries to find optimal values for these parameters such that the model generalizes well to new instances. A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply during learning). It must be set prior to training and remains constant during training.

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.
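
A small sketch may help separate the two notions. Below, the slope `w` is a model parameter found by the learning algorithm, while `alpha` is a hyperparameter that sets how strongly the slope is pulled toward zero (an L2 penalty, as in ridge regression). The one-feature, no-intercept setup and the data are simplifying assumptions.

```python
def fit_slope(xs, ys, alpha=0.0):
    """Fit y ~ w * x by minimizing sum((y - w*x)**2) + alpha * w**2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# alpha is chosen before training; w is what training produces.
print(fit_slope(xs, ys, alpha=0.0))    # unregularized slope
print(fit_slope(xs, ys, alpha=14.0))   # heavily regularized, smaller slope
```

Increasing `alpha` constrains the model more, trading a worse fit on the training data for a simpler model.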

13. What do model-based learning algorithms search for? What is the most common
strategy they use to succeed? How do they make predictions?

Ans: Model-based learning algorithms search for an optimal value for the model parameters such that
the model will generalize well to new instances. We usually train such systems by minimizing a cost function
that measures how bad the system is at making predictions on the training data, plus a penalty for model
complexity if the model is regularized. To make predictions, we feed the new instance's features into the model's prediction function, using the parameter values found by the learning algorithm.
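
The three pieces named in this answer (a cost function, a search for the parameters that minimize it, and a prediction function) can be sketched for a one-feature linear model. The data points are invented, and the closed-form least-squares fit is just one way to minimize the cost.

```python
def predict(params, x):
    """Prediction function: apply the learned parameters to a new instance."""
    intercept, slope = params
    return intercept + slope * x

def cost(params, xs, ys):
    """Mean squared error of the model on the training data."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys):
    """Closed-form least squares: the parameters that minimize the cost."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

xs = [1.0, 2.0, 3.0, 4.0]      # hypothetical training features
ys = [3.1, 4.9, 7.2, 8.8]      # hypothetical training targets

params = fit(xs, ys)
print(params)
print(cost(params, xs, ys), cost((1.0, 2.0), xs, ys))  # fitted vs. hand-picked
print(predict(params, 5.0))                            # prediction for a new instance
```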

14. Can you name four of the main challenges in Machine Learning?

Ans: Some of the main challenges in Machine Learning are the lack of data, poor data quality, nonrepresentative data,
uninformative features, excessively simple models that underfit the training data, and excessively complex models
that overfit the data.

15. If your model performs great on the training data but generalizes poorly to new
instances, what is happening? Can you name three possible solutions?

Ans: If a model performs great on the training data but generalizes poorly to new instances, the model is likely
overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are
getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or
features used, or regularizing the model), or reducing the noise in the training data.

16. What is a test set and why would you want to use it?

Ans: A test set is used to estimate the generalization error that a model will make on new instances, before the model
is launched in production.

17. What is the purpose of a validation set?

Ans: A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.

You train multiple models with various hyperparameters using the training set, you select the model and
hyperparameters that perform best on the validation set, and when you’re happy with your model you run
a single final test against the test set to get an estimate of the generalization error.
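
The workflow above can be sketched with a deliberately small example: a one-feature ridge-style model whose regularization strength `alpha` is chosen on the validation set, followed by a single evaluation on the test set. The synthetic data, the candidate values, and the unshuffled 6/2/2 split are assumptions for illustration; real projects usually shuffle before splitting.

```python
def fit_slope(xs, ys, alpha):
    """Fit y ~ w * x by least squares with an L2 penalty alpha * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: y = 2x plus a small alternating disturbance.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

train_x, val_x, test_x = xs[:6], xs[6:8], xs[8:]
train_y, val_y, test_y = ys[:6], ys[6:8], ys[8:]

# Train one model per candidate hyperparameter; compare on the validation set.
candidates = [0.0, 1.0, 100.0]
best_alpha = min(
    candidates,
    key=lambda a: mse(fit_slope(train_x, train_y, a), val_x, val_y),
)

# The test set is used exactly once, after the choice has been made.
final_w = fit_slope(train_x, train_y, best_alpha)
test_error = mse(final_w, test_x, test_y)
print(best_alpha, test_error)
```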

18. What can go wrong if you tune hyperparameters using the test set?

Ans: If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error
you measure will be optimistic (you may launch a model that performs worse than you expect).

19. What is cross-validation and why would you prefer it to a validation set?

Ans: Cross-validation is a technique that makes it possible to compare models (for model selection and hyperparameter tuning) without the need for a separate validation set. This saves precious training data.

To avoid "wasting" too much training data in validation sets, a common technique is to use cross-validation:
the training set is split into complementary subsets, and each model is trained against a
different combination of these subsets and validated against the remaining parts. Once the model type and
hyperparameters have been selected, a final model is trained using these hyperparameters on the full
training set, and the generalization error is measured on the test set.
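
As a rough sketch of the splitting scheme, the code below builds k complementary folds, trains on k-1 of them, validates on the remaining one, and averages the validation errors. The one-feature ridge-style model, the synthetic data, and the helper names are assumptions for illustration.

```python
def fit_slope(xs, ys, alpha):
    """Fit y ~ w * x by least squares with an L2 penalty alpha * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k complementary folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]

def cross_val_mse(xs, ys, alpha, k=5):
    """Average validation error over k train/validate rounds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_slope([xs[i] for i in train], [ys[i] for i in train], alpha)
        errors.append(
            sum((w * xs[i] - ys[i]) ** 2 for i in held_out) / len(held_out)
        )
    return sum(errors) / len(errors)

# Synthetic training set: y = 2x plus a small alternating disturbance.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]

folds = k_fold_indices(len(xs), 5)
print(folds)
for alpha in (0.0, 100.0):
    print(alpha, cross_val_mse(xs, ys, alpha))
```

Every instance is used for validation exactly once and for training in the other rounds, so no data has to be set aside permanently.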

20. How is machine learning distinct from traditional programming?

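Ans: Machine learning algorithms learn from the data, whereas in traditional programming the programmer writes explicit rules to accomplish the task.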

21. Machine learning is the practice of using algorithms to analyze _data_, learn from this, and then make a determination or prediction about new data.

22. With machine learning, programmers typically write explicit code to accomplish tasks. False

23. Deep learning is a type of machine learning. True

24. Machine learning can be used to solve classification problems. True
