Machine Learning: Non-Parametric Learning
If you have ever tried to read articles about machine learning on the internet, you most likely came across two types: cryptic algebraic texts written for a secret society of statisticians and data scientists, and fluffy fairytales about how robots will take over the world by 2025.
Machine learning has been an invisible part of our lives for many years now. In fact, it dates back to the 1950s. Today, many businesses rely on this technology. If you use the internet, you have, in some small way, helped companies like Facebook, Google, Netflix, and Amazon train their machine learning models.
Since this technology is so widely adopted and promises to solve some of our world's most complex problems, now is as good a time as any to catch up on the state of this emerging phenomenon.
To level with you, I am still learning about how machines learn. I began this journey three years ago and had to sit through classes, read hundreds of articles, and attend countless webinars that could have been an email. So, I’ve decided to write the post I wished had existed all along.
This post will be your 15-minute crash course on everything machine learning, from theory to real-world applications. It is guaranteed to equip you with the knowledge you need to start a conversation with an ML engineer if you ever have to sit next to one on a plane.
Why teach the machines?
Why should we even care to teach machines in the first place? Well, it is because machines are better than humans at finding hidden patterns when there are many variables.
Okay, multiply 13 by 87 right now in your mind. Did it feel uncomfortable? Now try to multiply 2 by 2. Better?
The human mind finds (13 x 87) much harder to compute than (2 x 2), whereas for a computer the two problems are of similar complexity. If we assign (read: punish) a human to solve simple multiplication problems all day long, they will eventually get bored and abandon the task. Machines, in comparison, are particularly good at it.
What can the machines learn?
Stripped of all the glamour and fluff, machines can be taught to predict results based on incoming data. We need three components to teach a machine: data, features, and algorithms.
And of course, some physical hardware.
How does a machine learn?
When I was a kid, I learned how to draw a simple picture of a house. Over the years, I learned how to describe a house in Urdu, English, and French without drawing one. As I grew older, I saw homes of all shapes and kinds in my neighborhood, my country, and the world. I am so good at it now that I can identify a house in a split second without even having to think much.
Interestingly, this is also how a machine learns to recognize images.
First, you teach the machine what a house looks like by showing it many images of homes from your 1TB external hard drive (the dataset). Next, you help the machine understand the various features, such as doors, windows, and chimneys. Finally, you design an algorithm that teaches the computer to say “that is a house” when it sees one.
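The three ingredients can be sketched in a few lines of Python. This is only a toy, under loud assumptions: the dataset is made up, the features are hand-labeled flags rather than real image pixels, and the algorithm is the classic perceptron rule, chosen here for brevity rather than because any real image recognizer works this way.

```python
# Data: hypothetical examples, each described by three hand-picked features
# (has_door, has_windows, has_chimney) and labeled 1 for "house", 0 otherwise.
DATA = [
    ((1, 1, 1), 1),
    ((1, 1, 0), 1),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
]


def predict(weights, bias, features):
    """Say "house" (1) when the weighted sum of the features is positive."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(data, max_epochs=100):
    """Algorithm: the perceptron rule, nudging the weights after each mistake."""
    weights, bias = [0, 0, 0], 0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train(DATA)
for features, label in DATA:
    verdict = "house" if predict(weights, bias, features) else "not a house"
    print(features, "->", verdict)
```

Nobody told the program which features matter; it worked out the weights from the labeled examples.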
So does that mean that your programmer-bro is a machine learning expert? Likely not. Let’s take the example of self-driving vehicles. If we were to approach this as a classical computer programming problem, we would need to teach the car every possible movement on this planet's roads. With machine learning, the car doesn't need to memorize every move; instead, it tries to generalize from the situations it has seen and act rationally.
This is what makes machine learning such a powerful tool!
What have the machines been up to?
In the past decade, we have seen machine learning applications ranging from saving human lives to recommending songs to groove to on your drive back home.
There are as many ways to approach a machine learning problem as there are applications. Experts may disagree on their preferred methods, algorithms, and features. But they all agree on one thing: getting the right data is the real challenge of any machine learning project.
Many companies have found smart ways to solve this problem. Take Facebook, for example. It has been incentivizing users to tag themselves and their friends in pictures for years now. Over time, it has captured plenty of data to train its face-recognition model. Today, when you upload a new photo to Facebook, it already recognizes your friend Hasan, who just came back from his summer holiday in Hunza.
Have you binge-watched something on YouTube? The other day I was watching a travel vlog. When I opened the app, I intended to watch a quick video before going to bed. I ended up watching four other videos until 2:00 AM. That is not a coincidence, and it is highly likely that you have also had a similar experience. YouTube uses a machine learning-powered recommender engine that curates your own ‘personal playlist’ based on your viewing history, making the viewing experience ‘addictive.’
That, right there, is the beauty of technology. When done right, it becomes a transparent part of our lives. So much so that we forget it ever existed.
AI = ML?
You might have noticed me using the terms AI and ML interchangeably. Are these the same? No. Let me explain:
Artificial Intelligence (AI) is the name of a whole field of knowledge, similar to physics or mathematics. Any technique that enables a machine to mimic human behavior can be labeled as AI.
Machine Learning (ML) is an important part of artificial intelligence, but not the only one. There are two parts to ML: teaching the machine, and the machine becoming smarter over time.
Deep learning is a branch of machine learning that builds and trains neural networks with many layers. Many people also use the term just to sound ‘cool.’
What languages do the machines speak?
There is no such thing as the ‘best language for machine learning.’ It depends on your background and the requirements of the project. Most developers prefer Python over other languages because it is easy and quick to write.
To see why, let us try to say ‘hello world’ in three different languages:
C:
Java:
Python:
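As a rough stand-in for the three programs, the sketch below keeps a standard textbook ‘hello world’ for each language as a plain string and counts the lines of code each one needs. The C and Java sources are assumed, typical versions and may differ from the listings originally shown; the helper name `code_lines` is made up for this illustration.

```python
# Standard textbook "hello world" programs, stored as plain strings so that
# one Python script can compare them. They are assumed stand-ins, not
# necessarily the exact listings originally shown.
C_SOURCE = """\
#include <stdio.h>

int main(void) {
    printf("hello world\\n");
    return 0;
}
"""

JAVA_SOURCE = """\
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("hello world");
    }
}
"""

PYTHON_SOURCE = 'print("hello world")\n'


def code_lines(source):
    """Count the non-blank lines in a program."""
    return sum(1 for line in source.splitlines() if line.strip())


for name, source in [("C", C_SOURCE), ("Java", JAVA_SOURCE), ("Python", PYTHON_SOURCE)]:
    print(f"{name}: {code_lines(source)} line(s)")
    print(source)
```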
I will leave it at that and let you decide for yourself.
R is another language commonly used by data scientists and statisticians. Some call it “statistics on steroids.” However, this amount of steroids would be sufficient to kill a young Hulk.
My favorite machine learning methods
Machine learning has evolved significantly since the 1950s. And in the process, it has branched out. I will share a few essential and commonly used machine learning methods you have most likely used or even helped train. For a comprehensive list, head on over here.
1. Classification
A few years ago, I decided to change the landscape of my garden. I began reading about different kinds of trees, shrubs, vines, and flowering plants. I was particularly fascinated by the variety of flowering plants. I wanted to grow orchids and jasmine. While researching, I found a nursery in my neighborhood that specialized in flowering plants and had jasmine in stock, so I decided to make a trip. At the nursery, I found a new flowering plant: lavender. I added it to my shopping cart and my knowledge base.
In machine-learning terms, this is classification. The method predicts the class of an item based on incoming data points. Classification is an excellent method for answering “yes-or-no” and “this-or-that” questions.
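One of the simplest classifiers, nearest neighbor, gives a feel for how this works: a new item gets the label of the most similar item already in the knowledge base. The sketch below is a minimal illustration; the two features (petal length and scent strength) and every number in `KNOWN_PLANTS` are invented for the example.

```python
import math

# Hypothetical knowledge base: (petal_length_cm, scent_strength_0_to_10) -> plant.
KNOWN_PLANTS = [
    ((6.0, 2.0), "orchid"),
    ((6.5, 1.5), "orchid"),
    ((2.0, 9.0), "jasmine"),
    ((2.5, 8.5), "jasmine"),
    ((0.8, 6.0), "lavender"),
    ((1.0, 6.5), "lavender"),
]


def classify(sample):
    """Return the label of the known plant closest to the sample."""
    nearest_features, nearest_label = min(
        KNOWN_PLANTS, key=lambda item: math.dist(sample, item[0])
    )
    return nearest_label


print(classify((6.2, 1.8)))  # a long-petaled, faintly scented flower
print(classify((2.2, 8.8)))  # a small, strongly scented flower
```

Adding lavender to the knowledge base is just a matter of appending labeled rows; the classifier itself does not change.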
Where is it used?
Email service providers, such as Gmail, use classification algorithms to mark spam.
Google uses classification to decide which search results suit your query.
Spotify uses classification algorithms to create playlists.
Banks use it to determine whether an applicant is eligible for a loan.
2. Regression
Regression is a close cousin of classification. Instead of predicting a category, it predicts a numerical value based on incoming data points.
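The simplest version is fitting a straight line through known points and reading predictions off that line. The sketch below uses ordinary least squares on a made-up dataset of listing sizes and prices; the numbers and the helper name `fit_line` are assumptions for illustration, and real pricing models use many more features.

```python
# Hypothetical listings: size (in arbitrary units) and the price each sold for.
SIZES = [1.0, 2.0, 3.0, 4.0]
PRICES = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(SIZES, PRICES)

# Predict a number (a price), not a category, for a size we have never seen.
new_size = 5.0
print("estimated price:", slope * new_size + intercept)
```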
Where is it used?
Stock market analysts use regression methods to predict future performance.
Companies use regression tools to estimate the sales of a product based on seasonality.
Many property websites use regression tools to estimate a fair price for a listing based on its features.
3. Clustering
My friend Sara owns a famous street food business. She makes the best BBQ in the world. She recently saved enough money to buy herself a new stall. However, she is hesitant because her last investment was not as profitable as expected: footfall at her stall was far lower than she had imagined. So where should she invest?
This is a great problem to solve using clustering methods!
Clustering algorithms divide data points into groups based on natural similarities.
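For Sara's problem, a minimal sketch could group the places where hungry customers gather and put a stall near the center of each group. The example below uses k-means, one common clustering algorithm, on invented map coordinates. To stay short and deterministic, it simply starts from the first `k` points, which real implementations usually do not.

```python
import math

# Hypothetical (x, y) map positions where Sara's customers tend to gather.
CUSTOMER_SPOTS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]


def k_means(points, k, iterations=10):
    """Group points into k clusters around centroids (cluster centers)."""
    centroids = list(points[:k])  # naive start: the first k points
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Step 2: move each centroid to the average of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


centroids, clusters = k_means(CUSTOMER_SPOTS, k=2)
for center, members in zip(centroids, clusters):
    print("candidate stall location:", center, "serving", members)
```

No one labeled the spots in advance; the groups emerge from how close the points are to one another, which is the key difference from classification.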
Where is it used?
Android and Apple use clustering algorithms to create photo albums on your mobile phone.
Google Maps suggests the nearest coffee shop to you based on clustering methods.
Amazon and other e-commerce platforms use clustering algorithms to recommend similar products.
Marketers use clustering methods to segment their target audience.
4. Neural Networks
Neural networks have been in and out of fashion for over 70 years. But in the past ten years, they’ve been trending at an all-time high, to the extent that a few ‘influencers’ have started throwing #deeplearning around in their posts to increase their reach.
Any neural network is basically a collection of neurons and the connections between them. In principle, neural networks can be applied to all of the tasks above. Many call them the rich man’s silver bullet. But you’d have to be really rich to afford a neural net for every AI problem. Why? Because neural nets require a lot of computing power.
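To make "neurons and connections" concrete, here is a tiny network of three neurons that answers the question "is exactly one of these two switches on?". It is a sketch under a big assumption: the connection weights are picked by hand for clarity, whereas a real network learns its weights from data and has far more neurons.

```python
def neuron(inputs, weights, bias):
    """One neuron: weigh the inputs, add a bias, fire (1) if the sum is positive."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def tiny_network(x1, x2):
    """Two hidden neurons feed one output neuron (weights chosen by hand)."""
    at_least_one = neuron([x1, x2], [1, 1], -0.5)  # fires if either input is on
    both = neuron([x1, x2], [1, 1], -1.5)          # fires only if both are on
    # Output fires when at least one input is on, but not both.
    return neuron([at_least_one, both], [1, -1], -0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", tiny_network(x1, x2))
```

No single neuron of this kind can answer that question alone, which is why the connections between neurons matter.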
Where is it used?
Governments and agencies around the world use neural nets to manage crowds and identify people of interest.
Siri and Alexa use them to understand your voice commands.
Google uses them to translate your messages from one language to another.
Deepfakes are another fascinating yet scary application of AI. They are fake video or audio recordings that look and sound just like the real thing, and the fakes are so good that it is tough to tell the difference. See for yourself:
Finally, how far is too far?
The machine morality police (or the AI ethics community) are asking this fundamental question:
How far is too far? And how much further can we go?
To me, machine learning is a tool that extends human cognition, just like calculators, which let us do maths faster, and sticky notes, which help us remember more things.
However, this tool is far more potent than any other we have ever had. It may appear to be a threat to many because ‘intelligence’ has always been the province of the human mind alone. Now machines can learn too and become intelligent.
But is intelligence all that makes us human? What about trust? Compassion? Courage? Resilience? And love?
I believe that the question “When will the machines outsmart humans?” is flawed. The question we should be asking is, “What will differentiate us from the machines, if not our intelligence?” Think about it.
Thanks for reading
If you like my work and want to support me:
The best way to show some love is by following me on Medium here
Join me on LinkedIn here
Translated from: https://medium.com/@msaadashfaq/learning-unlearning-and-machine-learning-bcb55962eaac