Teaching Yourself Machine Learning
The field of machine learning is becoming more mainstream every year. With this growth come many libraries and tools that abstract away some of the concepts that are hardest to implement, which makes things easier for people starting out.
Many people will say you need an advanced degree in ML to work in the industry. If you love working with data and practical math, I would say this is not true. I did not graduate from college with a machine learning or data degree, yet I am working with ML right now at a startup. I want to share what I used to learn and how I got here, in the hope that it will help someone else.
I already knew Python when I started, but if you don’t, I recommend learning basic and intermediate Python first. The language is pretty easy to learn compared to others. Python is also home to the largest data science/ML community, so there are tons of tools to help as you learn.
Learn Python: freeCodeCamp Python Crash Course
With that out of the way, the first thing you should do is download “The Machine Learning Podcast” by OCDevel (overcast.fm, iTunes) into your favorite podcast app. Listen to the first 10–15 episodes. They are very good at giving an overview of the machine learning ecosystem and there are also recommended resources which are linked on the OCDevel site.
Anaconda & Jupyter Notebook — These are a must for ML & data science. Follow the instructions here to install and set them up.
Visual Studio Code with Python Plugin — I never thought I would be recommending a Microsoft product, but I am honestly impressed with their open source commitment lately. This is now my favorite code editor, even for doing some things in Python — like debugging code.
Kaggle.com is the best place to find datasets when you are starting out. Go ahead and sign up for an account and poke around the site. You will notice that there are many competitions for people of all experience levels and even tutorials to go with them (like this beginner-friendly one about the Titanic). These datasets will be very helpful to practice with while you are learning Python libraries.
Next, it’s important to learn the common Python libraries for working with data: Numpy, Matplotlib, Pandas, Scikit-Learn, etc. I recommend starting with this course from DataCamp. It goes over some basics, which you can skip or use for review, and the Numpy section is a good intro.
Pandas is a must-learn, but it also takes a little while to grasp since it does so many things. It’s built on top of Numpy and is used for cleaning, preparing, and analyzing data. It also has built-in tools for things like visualization. I used a lot of resources to learn Pandas and practice with it. Here are a few:
Learn Pandas on Kaggle
Learn Pandas Video Course | Notebook for Course
Jupyter Notebook Extra Examples: Basics | Plotting with Matplotlib & Pandas | And Many More
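To give a feel for what cleaning, preparing, and analyzing data looks like in Pandas, here is a minimal sketch. The passenger table is made up for illustration (it is not the real Kaggle Titanic data), and filling missing ages with the median is just one common option, not something prescribed by the resources above.

```python
import numpy as np
import pandas as pd

# Hypothetical passenger records with two typical problems:
# one missing age and one missing fare.
raw = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cara", "Dan"],
        "age": [22.0, np.nan, 35.0, 58.0],
        "fare": [7.25, 71.28, np.nan, 26.55],
        "survived": [1, 0, 1, 0],
    }
)

# Cleaning: fill the missing age with the median age,
# then drop any row that has no fare at all.
clean = raw.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean = clean.dropna(subset=["fare"]).reset_index(drop=True)

# Analysis: average age for each outcome.
mean_age_by_outcome = clean.groupby("survived")["age"].mean()

# Pandas sits on top of Numpy: a column converts to a Numpy array.
ages = clean["age"].to_numpy()

print(clean)
print(mean_age_by_outcome)
```

The same few calls (`fillna`, `dropna`, `groupby`) cover a surprising share of everyday data preparation, so they are worth practicing on a real Kaggle dataset.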
After Pandas comes Scikit-Learn. This is where what you have learned starts to be applied to actual machine learning algorithms. Scikit-Learn is a scientific Python library for machine learning.
The best resource I have found for this so far is the book “Hands-On Machine Learning with Scikit-Learn and Tensorflow”. I think it does a very good job of teaching you step by step with practical examples. The first half is about Scikit-Learn, so I did that part first and then came back to the Tensorflow portion.
There are many other Python libraries like Keras and PyTorch, but I will get into those later. This is already a lot to learn :)
This is the first step into machine learning. Scikit-Learn has shallow learning algorithms, like linear regression, built into the library. The Scikit-Learn book that I mentioned above teaches many types of common machine learning algorithms and lets you practice with hands-on examples.
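As a small illustration of how little code a built-in model needs, here is a sketch of linear regression in Scikit-Learn. The data is invented so that it follows y = 2x + 1 exactly. Real data will be noisier, and you would normally hold out a test set to check the fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows y = 2x + 1 exactly.
# Scikit-Learn expects X as a 2D array: one row per sample.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit an ordinary least squares model.
model = LinearRegression()
model.fit(X, y)

# Inspect what the model learned and predict an unseen value.
slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[5.0]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for x=5: {prediction:.2f}")
```

Most Scikit-Learn estimators share this `fit` and `predict` pattern, so once it feels familiar with linear regression, trying other algorithms is mostly a matter of swapping the model class.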
While that’s good, I still found it useful to go through Andrew Ng’s Machine Learning course from Stanford as well. It can be audited for free on Coursera (there is a podcast for this course on iTunes, but it’s a little hard to follow and well over a decade old). The quality of instruction is amazing, and it’s one of the most recommended resources online. It’s not the easiest to get through, which is why I recommend it down here.
Start going through the Andrew Ng course slowly and don’t get frustrated if you don’t understand something. I had to put it down and pick it up several times. I also took Matlab in college, which is the language he uses in the course, so I didn’t have trouble with that part. But if you want to use Python instead, you can find the examples translated online.
Yes, math is necessary. However, I don’t feel like an intense, math-first approach is the best way to learn; it’s intimidating for many people. As OCDevel suggests in his podcast (linked above), spend most of your time learning practical machine learning and maybe 15–20% studying the math.
I think the first step here is to learn or brush up on statistics. It can be easier to digest, and it is both a lot of fun and practical. After statistics, you will definitely need to learn a bit of linear algebra and some calculus to really know what’s going on in deep learning. This will take some time, but here are some of the resources that I recommend for this.
Statistics Resources:
I think the statistics courses on Udacity are quite good. You can start with this one and then explore the other ones they offer.
I loved the book, “Naked Statistics”. It’s full of practical examples and enjoyable to read.
It’s also useful to understand Bayesian statistics and how it differs from Frequentist and Classical models. This Coursera course does a great job explaining these concepts. There is also a part 2 of the course here.
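A tiny worked example, using made-up coin flips, may help show the difference between the two views. The frequentist estimate is simply the observed share of heads, while the Bayesian estimate starts from a prior belief and updates it with the data. The uniform Beta(1, 1) prior and the function names are assumptions chosen for illustration, not something taken from the course.

```python
def frequentist_estimate(heads, flips):
    # Maximum likelihood estimate: the observed proportion of heads.
    return heads / flips


def bayesian_posterior_mean(heads, flips, prior_a=1.0, prior_b=1.0):
    # With a Beta(prior_a, prior_b) prior and coin-flip data, the
    # posterior is Beta(prior_a + heads, prior_b + tails).
    tails = flips - heads
    post_a = prior_a + heads
    post_b = prior_b + tails
    # Mean of a Beta(a, b) distribution is a / (a + b).
    return post_a / (post_a + post_b)


heads, flips = 7, 10  # hypothetical experiment
freq = frequentist_estimate(heads, flips)
bayes = bayesian_posterior_mean(heads, flips)

print(f"frequentist: {freq:.3f}")
print(f"bayesian:    {bayes:.3f}")
```

With only ten flips, the prior pulls the Bayesian estimate toward 0.5. As the number of flips grows, the two estimates move closer together.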
Linear Algebra Resources:
The book, “Linear Algebra, Step by Step” is excellent. It’s like a high school/college textbook but well written and easy to follow. There are also plenty of exercises for each chapter with answers in the back.
Essence of Linear Algebra video series — The math explanations by 3blue1brown are amazing. I highly recommend his math content.
Calculus Resources:
I had taken a few years of Calculus before, but I still needed to brush up quite a bit. I picked up a used textbook for Calc. 1 at a local bookstore to start. Here are some online resources that helped me as well.
Essence of Calculus video series
Understanding Calculus from The Great Courses Plus
Other Helpful Math:
Mathematical Decision Making from The Great Courses Plus
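To see why a little calculus pays off later, here is a bare-bones sketch of gradient descent, the basic idea behind how neural networks are trained. The function being minimized, the learning rate, and the number of steps are all arbitrary choices for illustration.

```python
def loss(w):
    # A simple bowl-shaped function with its minimum at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of the loss with respect to w (the calculus part).
    return 2.0 * (w - 3.0)


w = 0.0              # starting guess
learning_rate = 0.1  # how far to move on each step

for step in range(100):
    # Move a small step in the direction that lowers the loss.
    w -= learning_rate * gradient(w)

print(f"w = {w:.4f}, loss = {loss(w):.8f}")
```

In deep learning the single number `w` becomes millions of weights stored in matrices, which is where the linear algebra comes in, but the update rule keeps the same shape.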
After learning some math and the basics of data science and machine learning, it’s time to jump into more algorithms and neural networks.
You probably got a taste of deep learning already with some of the resources I mentioned in part 1, but here are some really good resources to introduce you to neural networks anyhow. At least they will be a good review and fill in some gaps for you.
3blue1brown’s Series Explaining Neural Networks
Deeplizard’s Intro to Deep Learning Playlist
While you are working through the Andrew Ng Stanford course, I recommend checking out fast.ai. They have several high quality, practical video courses that can really help to learn and cement these concepts. The first is Practical Deep Learning for Coders and second — just released — is Cutting Edge Deep Learning For Coders, Part 2. I picked up so many things from watching and re-watching some of these videos. Another amazing feature of fast.ai is the community forum; probably one of the most active AI forums online.
I think it’s a good idea to learn a little bit of all three of these libraries: Keras, Tensorflow, and PyTorch. Keras is a good place to start, as its API is made to be simpler and more intuitive. Right now I use PyTorch almost exclusively, which is my personal favorite, but they all have pros and cons. Thus it’s good to know which one to choose in different situations.
Keras
Deeplizard Keras Playlist: This channel has some seriously good explanations and examples. You can follow along with the videos for free, or get access to the code notebooks as well by subscribing on Patreon at the $3 (USD) tier.
I also found the documentation for Keras to be quite good
Datacamp has many well-written tutorials for ML and Keras like this one
Tensorflow
The Tensorflow section of the book “Hands-On Machine Learning with Scikit-Learn and Tensorflow” (also mentioned above)
Deeplizard Tensorflow Series
PyTorch
Deeplizard Pytorch Series
Udacity Pytorch Bootcamp — I’m currently taking Udacity’s Deep Reinforcement Learning nanodegree and I thought their PyTorch section earlier in the course was very good. They are about to launch it for free to the public! Here are some of their PyTorch notebooks on Github.
Fast.ai is also built with PyTorch, so you will pick up some of this library if you go through their courses.
I have found it very helpful to read current research as I learn. There are plenty of resources that help make complicated concepts, and the math behind them, easier to digest. These papers are also a lot more fun to read than you may realize.
fast.ai blog
Distill.pub: Machine learning research explained clearly
Two Minute Papers — Short video breakdowns of AI and other research papers
Arxiv Sanity: A more intuitive tool to search through, sort, and save research papers
Deep Learning Papers Roadmap
Machine Learning Subreddit — They have ‘what are you reading’ threads discussing research papers
Arxiv Insights — This channel has some great breakdowns of AI research papers
The Data Skeptic — They have a lot of good shorter episodes, called [mini]s where they cover machine learning concepts
Software Engineering Daily Machine Learning
OCDevel Machine Learning Podcast — I already mentioned this one, but I’m listing it again just in case you missed it
Neural Networks and Deep Learning E-book
Machine Learning Yearning (free draft) by Andrew Ng
Please clap if this was helpful :)
Social Media: @gwen_faraday
If you know of any other resources that are good, or see that I am missing something, please leave links in the comments. Thank you.
Translated from: https://www.freecodecamp.org/news/the-best-resources-i-used-to-teach-myself-machine-learning-part-1-292232d167/