http://blog.csdn.net/pipisorry/article/details/46490177
Study notes for the Practical Machine Learning course
Practical Machine Learning
1.1 Prediction motivation
About this course
This course covers the basic ideas behind machine learning/prediction:
· Study design: training vs. test sets
· Conceptual issues: out of sample error, ROC curves
· Practical implementation: the caret package
What this course depends on:
· The Data Scientist's Toolbox
· R Programming
What would be useful:
· Exploratory analysis
· Reporting Data and Reproducible Research
· Regression models
Uses of machine learning
Local governments -> pension payments
Google -> whether you will click on an ad
Amazon -> what movies you will watch
Insurance companies -> what your risk of death is
Johns Hopkins -> who will succeed in their programs
Recommended books and resources
The elements of statistical learning
Machine learning (more advanced material)
List of machine learning resources on Quora
List of machine learning resources from Science
Advanced notes from MIT open courseware
Advanced notes from CMU
Kaggle machine learning competitions
1.2 What is prediction
The central dogma of prediction
Predict for these dots whether they're red or blue:
Choosing the right dataset and knowing what the specific question is are again paramount.
Potential problems
An example: the Google Flu Trends algorithm didn't anticipate that the search terms people use would change over time. People might use different terms when searching, and that would affect the algorithm's performance. Also, the way those terms were actually being used in the algorithm wasn't very well understood, so when the function of a particular search term changed, it caused problems in the algorithm.
Components of a predictor
question -> input data -> features -> algorithm -> parameters -> evaluation
Note: question: What are you trying to predict and what are you trying to predict it with?
An example of prediction: SPAM email
question -> input data -> features -> algorithm -> parameters -> evaluation
Start with a general question
Can I automatically detect emails that are SPAM or not?
Make it concrete
Can I use quantitative characteristics of the emails to classify them as SPAM/HAM?
Note: try to make it as concrete as possible
question -> input data -> features -> algorithm -> parameters -> evaluation
rss.acs.unt.edu/Rdoc/library/kernlab/html/spam.html
question -> input data -> features -> algorithm -> parameters -> evaluation
library(kernlab)   # kernlab ships the spam dataset used in this example
data(spam)         # one row per email: word/character frequencies, capital-letter run statistics, and a spam/nonspam type label
head(spam)         # look at the first few rows
question -> input data -> features -> algorithm -> parameters -> evaluation
Our simple algorithm
Note: the best cutoff is 0.5; if the value is above 0.5 we say it's SPAM, and if it's below 0.5 we say it's HAM.
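A minimal R sketch of this simple algorithm, continuing from the spam data loaded above. The note only mentions the 0.5 cutoff; the specific feature used here (spam$your, the frequency of the word "your" in each email) is an assumption for illustration.

library(kernlab)
data(spam)

# cutoff rule from the note: above 0.5 we call it SPAM, below 0.5 we call it HAM
# (the feature spam$your is an illustrative choice, not stated in the note)
cutoff <- 0.5
prediction <- ifelse(spam$your > cutoff, "spam", "nonspam")

# in sample accuracy: fraction of emails classified correctly
mean(prediction == spam$type)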
question -> input data -> features -> algorithm -> parameters -> evaluation
1.3 Relative importance of steps
{about the tradeoffs and the different components of building a machine learning algorithm}
Relative order of importance: question > data > features > algorithms
...
Then creating features is an important component, in that if you don't compress the data in the right way you might lose all of the relevant and valuable information.
And finally, in my experience the algorithm is often the least important part of building a machine learning algorithm. It can be very important depending on the exact modality of the data you're using; for example, image data and voice data can require certain kinds of prediction algorithms that might not be as useful for other kinds of data.
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.--John Tukey
In other words, an important component of knowing how to do prediction is to know when to give up,when the data that you have is just not sufficient to answer the question that you're trying to answer.
Properties of good features
Note: there's a debate in the community about whether it's better to create features automatically or to use expert domain knowledge. In general, expert domain knowledge can help quite a bit in many, many applications, and so it should be consulted when building features for a machine learning algorithm.
Common mistakes
Note:
1. A common mistake is trying to automate feature selection in a way that doesn't allow you to understand how those features are actually being applied to make good predictions. Black box predictions can be very useful and very accurate, but they can also change on a dime if we're not paying attention to how those features actually predict the outcome.
2. A particular data set may contain outliers or weird behaviors of specific features, and not understanding those can cause problems.
Based on a bunch of features they collected in an unsupervised way: in other words, they filtered through the data to identify features that might be useful for later predictive algorithms. But even when they did this, they went back, looked at those features, and tried to figure out why they would be predictive; for example, this feature makes it very clear why it would be a good predictor for a cat.
Issues to consider when building a machine learning algorithm
Note: scalable means it's easy to apply to a large data set, whether because it's very, very fast or because it's parallelizable across multiple samples, for example.
[Gaining access to the best machine-learning methods]
Prediction is about accuracy tradeoffs
Note: for example, in medicine doctors find models in decision-tree form easier to interpret.
http://www.cs.cornell.edu/~chenhao/pub/mldg-0815.pdf
Scalability matters
http://www.techdirt.com/blog/innovation/articles/20120409/03412518422/
http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
Note: Netflix never deployed the competition-winning algorithm in practice because of scalability problems; it couldn't be applied to large-scale data.
1.4 In and out of sample errors
In sample error: the error rate you get on the same data set you used to build your predictor. Sometimes called resubstitution error.
Out of sample error: the error rate you get on a new data set. Sometimes called generalization error.
Key ideas
Note:
1. In sample error is always going to be a little bit optimistic compared with the error you would get on a new sample. The reason is that, in your specific sample, your prediction algorithm will sometimes tune itself a little bit to the noise you collected in that particular data set.
2. So sometimes you want to give up a little bit of accuracy on the sample you have in order to gain accuracy on new data sets. In other words, when the noise is a little bit different, your algorithm will still be robust.
Overfitting
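To make the in sample versus out of sample idea concrete, here is a small R sketch on the same spam data. The pattern described above is that a rule tuned to the noise in a small training sample gives an optimistic in sample error and typically a worse out of sample error than a simpler rule. The sample size and cutoffs below are illustrative assumptions, not values from the course.

library(kernlab)
data(spam)
set.seed(333)

# tiny training sample; the remaining emails are held out
idx <- sample(nrow(spam), size = 10)
training <- spam[idx, ]
testing  <- spam[-idx, ]

# rule1: a convoluted rule with extra cutoffs on capitalAve, the kind you get by chasing training noise
rule1 <- function(x) ifelse(x > 2.7 | (x > 2.40 & x < 2.45), "spam", "nonspam")
# rule2: a single simple cutoff
rule2 <- function(x) ifelse(x > 2.8, "spam", "nonspam")

# in sample (resubstitution) error: error rate on the training emails themselves
mean(rule1(training$capitalAve) != training$type)
mean(rule2(training$capitalAve) != training$type)

# out of sample (generalization) error: error rate on the held-out emails
mean(rule1(testing$capitalAve) != testing$type)
mean(rule2(testing$capitalAve) != testing$type)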
1.5 Prediction study design
{about how to minimize the problems that can be caused by in sample versus out of sample errors}
1. Step 5: why do we apply the model to the test set only one time? If we applied multiple models to our test set and picked the best one, then we would be using the test set, in some sense, to train the model. In other words, we would still be getting an optimistic view of what the error would be on a completely new dataset.
2. Step 6: apply your best prediction models to your test set and refine them a little bit. You might find that some features don't work so well for out of sample prediction, and you might refine and adjust your model a little. But now, as before, your test set error is going to be a slightly optimistic estimate of your actual out of sample error. So we apply only the best model, exactly one time, to the validation set to get our prediction estimate.
3. The idea is that there is one dataset that's held out from the very start, to which you apply exactly one model exactly one time, on which you never do any training or tuning, and that will give you a good estimate of your out of sample error rate.
http://www2.research.att.com/~volinsky/papers/ASAStatComp.pdf
Rules of thumb for prediction study design
1. Rule 2: you don't get to refine your models on a test set and then apply them to a validation set, but it does ensure that your test set is of sufficient size.
2. Rule 3: first of all, you might reconsider whether you have enough samples to build a prediction algorithm at all. But if you're dead set on building a prediction or machine learning algorithm, the idea is to do cross validation and to report the caveats of the small sample size and the fact that you never got to evaluate the prediction on an out of sample (test) data set.
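A minimal sketch of a training/test/validation split in R, using the caret package mentioned in the course overview and the spam data from earlier. The 60/20/20 proportions are an assumption (a common rule of thumb for large sample sizes), since the exact figures aren't reproduced in these notes.

library(caret)
library(kernlab)
data(spam)
set.seed(32323)

# hold out 20% as a validation set that is touched exactly once, at the very end
inBuild    <- createDataPartition(y = spam$type, p = 0.80, list = FALSE)
validation <- spam[-inBuild, ]
buildData  <- spam[inBuild, ]

# split the rest 75/25, giving roughly 60% training and 20% testing overall
inTrain  <- createDataPartition(y = buildData$type, p = 0.75, list = FALSE)
training <- buildData[inTrain, ]
testing  <- buildData[-inTrain, ]

c(training = nrow(training), testing = nrow(testing), validation = nrow(validation))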
Some principles to remember
1. The test set or the validation set should be set aside and never looked at when building your model. In other words, you need to have one data set to which you apply only one model, only one time, and that data set should be completely independent of anything you used to build the prediction model.
2. So, for example, if you have time series data, in other words data collected over time, you might want to build your training set in chunks of time, but again, randomly chosen chunks of time, and build the predictions on those chunks.
3. In other words, if you want to sample any data set that might have sources of dependence over time or across space, you need to sample your data in chunks. This is called backtesting in finance, and it's basically the idea that you want to be able to use chunks of data that consist of observations over time.
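For data with dependence over time, here is a small sketch of sampling in chunks using caret's createTimeSlices; the window sizes are illustrative assumptions.

library(caret)

# stand-in for 1000 time-ordered observations
tme <- 1:1000

# each fold is a contiguous chunk: 200 observations to train on, the following 50 to test on
folds <- createTimeSlices(y = tme, initialWindow = 200, horizon = 50)
names(folds)        # "train" and "test"
folds$train[[1]]    # observations 1..200
folds$test[[1]]     # observations 201..250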
1.6 Types of errors
{about the types of errors and the ways to evaluate prediction functions}
Basic terms
For a binary classification problem:
In general, Positive = identified and negative = rejected. Therefore:
True positive = correctly identified
False positive = incorrectly identified
True negative = correctly rejected
False negative = incorrectly rejected
Medical testing example:
True positive = Sick people correctly diagnosed as sick
False positive = Healthy people incorrectly identified as sick
True negative = Healthy people correctly identified as healthy
False negative = Sick people incorrectly identified as healthy.
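In R, these four counts come from cross-tabulating predicted labels against true labels. For instance, using the hypothetical "your" cutoff rule from the earlier spam sketch:

library(kernlab)
data(spam)

# the four cells of this table are the true/false positive/negative counts
prediction <- ifelse(spam$your > 0.5, "spam", "nonspam")
table(predicted = prediction, truth = spam$type)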
Key quantities
Sensitivity (recall), specificity, positive predictive value (precision), negative predictive value, and accuracy.
[http://en.wikipedia.org/wiki/Sensitivity_and_specificity]
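Writing TP, FP, TN, and FN for the counts of true positives, false positives, true negatives, and false negatives defined above, the standard definitions are:

sensitivity (recall) = TP / (TP + FN)
specificity = TN / (TN + FP)
positive predictive value (precision) = TP / (TP + FP)
negative predictive value = TN / (TN + FN)
accuracy = (TP + TN) / (TP + TN + FP + FN)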