An Introduction to Machine Learning
Say you are practising basketball on your own, trying to shoot the ball into the hoop. If you miss on the first try, your instinct will probably be to move forward or backwards, jump higher or lower, or extend your arms more fully. Whatever you change, the goal stays the same: get the ball into the basket. If one adjustment does not work, you try another until you eventually reach your goal. This is the idea behind machine learning.
Machine learning is an application of artificial intelligence that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on developing computer programs that can access data and use statistical analysis to predict an output, updating their predictions as new data becomes available (i.e., learning).
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” - Tom Mitchell
Classification of Machine Learning
Machine learning falls into three broad categories:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Supervised Learning: Here, the system is supplied with previously labelled data, so it can apply what it has learned from those labelled examples to new data and predict future events. It is like someone memorizing new facts while checking them against a set of notes. The learning algorithm compares its output with the correct, intended output and uses the errors it finds to modify the model accordingly. A typical example is spam filtering: you already have some emails labelled “spam”, and you classify new emails as spam or not depending on whether they share the qualities of those spam emails. Classification, as in this example, and regression are the two main types of supervised learning.
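The loop described above (predict, compare with the correct label, adjust the model on errors) can be sketched with a perceptron, one of the simplest supervised learners. This is a minimal illustration and not a method the article prescribes; the emails, the word-flag features and the function names are all invented for the example.

```python
# Toy labelled data (invented): 1 = spam, 0 = not spam.
TRAINING_EMAILS = [
    ("win free money now", 1),
    ("free prize claim now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]

def featurize(text, vocabulary):
    """Turn an email into a list of 0/1 flags, one per vocabulary word."""
    words = set(text.split())
    return [1 if word in words else 0 for word in vocabulary]

def predict(weights, bias, features):
    """Return 1 (spam) if the weighted sum is positive, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0

def train_perceptron(emails, epochs=10):
    """Learn one weight per word by correcting the model after each mistake."""
    vocabulary = sorted({word for text, _ in emails for word in text.split()})
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for text, label in emails:
            features = featurize(text, vocabulary)
            # Compare the model's output with the correct, intended output.
            error = label - predict(weights, bias, features)
            # error is 0 when the prediction was right, so nothing changes then.
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return vocabulary, weights, bias

vocabulary, weights, bias = train_perceptron(TRAINING_EMAILS)
new_email = "claim your free money"
is_spam = predict(weights, bias, featurize(new_email, vocabulary))
```

With this toy data the new email is flagged as spam because it shares the words “free” and “money” with the labelled spam examples. Words the model has never seen, such as “your”, are simply ignored.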
Unsupervised Learning: Here, the system is presented with unlabelled, uncategorized data, and it is left to the algorithm to determine the patterns in the data on its own. The system is not told the right output; instead it explores the data and draws inferences that describe hidden structure in it. Many of the recommendation systems used for marketing automation on the web are based on this type of learning. Clustering and association are types of unsupervised learning.
Reinforcement Learning: Here, as in unsupervised learning, you present the system with examples that lack labels, but this time you follow each solution the algorithm proposes with positive or negative feedback (a reward system). It is related to dynamic programming and trains algorithms using a system of reward and punishment. This method allows the algorithm, or agent, to determine the ideal behaviour within a specific context automatically, in order to maximize its performance. The agent learns by interacting with its environment; the typical example is a computer learning to play a game, optimizing its score and even outperforming human players.
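To make the reward idea concrete, here is a small sketch of tabular Q-learning, one common reinforcement learning algorithm. The article does not name a specific algorithm, so this is a swapped-in example. The corridor environment, the constants and the purely random exploration are assumptions chosen to keep the code short.

```python
import random

N_STATES = 5          # positions 0..4 on a line; position 4 is the goal (invented setup)
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)

def step(state, action):
    """Environment: move, stay inside the corridor, reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, seed=0):
    """Learn a value for every (state, action) pair from rewards alone."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted value of the next state.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
# The learned behaviour: in each non-goal position, pick the action with the higher value.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
```

Nobody tells the agent that “right” is correct. It only ever sees a reward on reaching the goal, and the learned values should end up favouring a step to the right from every position.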
Choosing the Right Machine Learning Problem
Say you have collected a bunch of data and want to use machine learning techniques to analyse it. How do you choose the right machine learning problem for your use case? The problem categories we will cover in this article are:
- Classification
- Regression
- Clustering
- Dimensionality reduction
Classification: Use classification when you need to sort your input data into categories or classes. Predicting categories turns out to be a very common use case, and the categories could be virtually anything. As in the email example above, is this email “spam” or “not spam”? Should it go to the “inbox” or the “spam” folder? If you are a financial trader constantly monitoring the stock markets, given past information on the market, company performance and stock performance, should you “buy”, “sell” or “hold”? Or say you are working with image data and want to do object recognition: is this a “cat”, a “mouse” or a “dog”? The list is endless, but in every case the output of a classification model is a single category or class.
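As a rough sketch of a model whose output is one class, here is a k-nearest-neighbours classifier, picked only because it is short. Real object recognition works on pixels; the two measurements per animal, the data values and the `classify` helper are invented for illustration.

```python
from collections import Counter
import math

# Invented measurements: (body weight in kg, body length in cm) -> label.
LABELLED_ANIMALS = [
    ((0.02, 8.0), "mouse"),
    ((0.03, 9.0), "mouse"),
    ((0.025, 7.5), "mouse"),
    ((4.0, 46.0), "cat"),
    ((4.5, 50.0), "cat"),
    ((3.8, 44.0), "cat"),
    ((25.0, 90.0), "dog"),
    ((30.0, 100.0), "dog"),
    ((22.0, 85.0), "dog"),
]

def classify(sample, labelled, k=3):
    """Return the majority label among the k closest labelled examples."""
    by_distance = sorted(labelled, key=lambda item: math.dist(sample, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

prediction = classify((4.2, 48.0), LABELLED_ANIMALS)
```

Here `prediction` is the single label “cat”, since the three nearest labelled examples are all cats. However many classes there are, the model returns exactly one of them.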
Regression: When you want your model to predict continuous numeric values, use a regression model. If you are a financial trader who needs to predict tomorrow’s stock price from current market sentiment and the company’s previous earnings, a regression model is what you need. You might instead want to predict a car’s mileage from its attributes, or the price of a house from its location and condition. Once you can identify the nature of the problem, it is easier to know what to use.
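To show what predicting a continuous value looks like, here is a sketch of the simplest regression model, a straight line fitted by ordinary least squares. The house sizes and prices are invented, and a real house-price model would use many more inputs than size alone.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: house size in square metres and price in thousands.
# The prices are exactly 3 * size so the fit is easy to check by hand.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept   # estimate for a 100 square metre house
```

Unlike a classifier, the output is a number on a continuous scale. For these made-up figures the fitted slope is 3 and the estimate for a 100 square metre house is 300.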
Clustering: When you have a really large dataset and no idea what is in it, clustering is a good way to start making sense of it. In social media ad targeting, finding groups of users who are interested in a particular field, so you can show them specific ads, is an application of clustering. Another is document discovery: you could gather all the documents related to armed robbery and see whether the cases share any patterns. Clustering lets you discover patterns in the data yourself, without having labels for them.
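The ad-targeting example can be sketched with k-means, one widely used clustering algorithm; the article does not say which algorithm to use, so this is one possible choice. The user data is invented, and taking the first k points as starting centroids is a simplification that practical implementations avoid.

```python
import math

def k_means(points, k, iterations=10):
    """Lloyd's algorithm with a naive start: the first k points are the initial centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Invented data: (hours per week on sports posts, hours per week on cooking posts).
users = [(9, 1), (1, 8), (8, 2), (2, 9), (10, 1), (1, 10)]
centroids, clusters = k_means(users, k=2)
```

No labels are supplied, yet the users separate into a sports-leaning group and a cooking-leaning group, each of which could then be shown different ads.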
Dimensionality Reduction: This is a preprocessing technique used to select or extract the most informative features in your data. Let’s say you have 500 different variables. Which of them are most significant? Which features should you pay more attention to? This is where dimensionality reduction comes into play. It is used to preprocess your data so you can build more robust machine learning models with better performance, whether they are classification, regression or any other kind. It also helps you find latent factors when you have a large dataset and no target values.
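One common way to do this is principal component analysis (PCA), which finds the direction along which the data varies most. The sketch below finds only the first component, using power iteration to stay within the standard library. The three-variable dataset is invented, and the starting vector of all ones is an assumption that would fail if it were exactly perpendicular to the answer.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the column means and the unit direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    direction = [1.0] * d
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalise.
        product = [sum(cov[i][j] * direction[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        direction = [v / norm for v in product]
    return means, direction

def project(row, means, direction):
    """Collapse one row to a single number: its coordinate along the component."""
    return sum((x - m) * w for x, m, w in zip(row, means, direction))

# Invented data: the second variable is twice the first and the third never changes.
rows = [(1.0, 2.0, 5.0), (2.0, 4.0, 5.0), (3.0, 6.0, 5.0), (4.0, 8.0, 5.0)]
means, direction = first_principal_component(rows)
reduced = [project(row, means, direction) for row in rows]
```

For this data the component puts no weight on the third variable, which carries no information, and the three original columns collapse into the single column `reduced` without losing the differences between rows.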
Conclusion
Machine learning comes into the picture when problems cannot be solved by typical approaches. It enables the analysis of large amounts of data and can deliver faster, more accurate results, helping you identify profitable opportunities or dangerous risks.
This article is intended only as an introduction to the concept of machine learning. There is a lot more to learn, and you can do it by wanting to learn, making time and finding the right resources online. I hope I have made you want to learn more.
Translated from: https://medium.com/@amarachi.anyim00/an-introduction-to-machine-learning-493d16017d9b