Teach Yourself Python for Data Science

Do you want to learn Python for data science, but don’t want to take a slow, expensive course? Most courses are just rehashed versions of the excellent free content out there. Here are resources for self-starters to acquire this valuable skill at their own pace!

At its heart, data science is about problem solving, exploration, and extracting valuable information from data. To do so effectively, you'll need to be able to wrangle data sets, implement statistical models, write programs, and much more.

Therefore, developing sharp programming skills is critical to your success. It's like learning how to ride a bike in a crowded city. Not only will you reach your destinations faster, but you'll also have the freedom to visit areas you could never reach on foot.

Plus, your chosen programming tool will become your trusty sidekick in this journey. For most aspiring data scientists, we strongly recommend starting with Python. Then, you should learn R after you become fluent with Python.

Python is one of the most widespread languages in the world, and it has a passionate community of users:

Figure: Python popularity in 2016, TIOBE Index

Within the data science community, Python is even more popular. Here's why...

Why Learn Python for Data Science?

Some people judge the quality of a programming language by the simplicity of its "hello, world!" program. Python does pretty well by this standard:

print("hello, world!")

For comparison, here's the same output in Java:

public class Main {
        public static void main(String[] args){
                System.out.println("hello, world!");
        }
}

Great, case closed! See you back here after you've mastered Python, sound good?

Okay, okay... but in all seriousness... simplicity is definitely one of Python's biggest strengths. Thanks to its precise and efficient syntax, Python can often accomplish the same tasks with much less code compared to other languages. This makes implementing solutions refreshingly fast.
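
To make the "much less code" point concrete, here is a minimal sketch that counts the most frequent words in a sentence. The function name `top_words` and the sample sentence are made up for illustration; only the standard library is used.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most common lowercase words in text as (word, count) pairs."""
    words = text.lower().split()
    return Counter(words).most_common(n)


# Hypothetical sample sentence, used only for this illustration.
sample = "the quick brown fox jumps over the lazy dog the end"
print(top_words(sample, 1))  # [('the', 3)]
```

The counting logic is two lines. An equivalent program in a more verbose language would typically need an explicit map, a loop, and a sort.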

In addition, Python's vibrant data science community means you'll be able to find plenty of tutorials, code snippets, and fixes to common bugs, along with people to commiserate with. Stack Overflow will be one of your best friends.

Finally, Python has an all-star lineup of libraries (a.k.a. packages) for numeric and scientific computing, all of which will make your life much easier. More on this later.

The Self-Starter Way

We believe in a hyper-practical, action-centric approach to learning Python for data science as quickly as possible, but you must be a self-starter to succeed with this strategy.

The reason is that we're going to completely cut out "classroom" study. You'll learn just enough of the fundamentals to jump into real-world problems, and then gradually build mastery over time by "just doing shit" (not the formal term).

You'll also have a ton of fun using this method because it's the fastest way to gain the essential programming skills required to start doing data science.

However, you must first build a rock-solid foundation of core programming concepts. This is the one place where you cannot take any shortcuts because you'll need to know how to translate solutions in your head into instructions for a computer. Effective programming is not about memorizing syntax, but rather mastering a new way of thinking.
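
As a small illustration of turning a solution in your head into instructions, take the plan "keep the scores that pass, then average them." The sketch below spells that plan out step by step. The function name `average_passing` and the default passing mark of 60 are assumptions chosen for the example.

```python
def average_passing(scores, passing_mark=60):
    """Average of the scores at or above passing_mark, or None if there are none."""
    # Step 1: keep only the scores that pass.
    passing = [s for s in scores if s >= passing_mark]
    # Step 2: decide what to do when nothing passes (avoids dividing by zero).
    if not passing:
        return None
    # Step 3: add them up and divide by how many there are.
    return sum(passing) / float(len(passing))


print(average_passing([45, 70, 90, 58]))  # 80.0
```

The syntax is the easy part. The thinking is in noticing the empty case in step 2 before the computer trips over it.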

We recommend learning Python for data science through the following 3 reliable steps:

1 Core Programming Concepts
Learn how to solve problems using code.

2 Drills and Challenges
Practice to master the core skills.

3 Essential Data Science Libraries
Equip the tools needed for data science.

After completing these 3 steps, you'll be ready to dive into projects and analyses while continuing to learn as you go.

Aside: Installing Python through Anaconda

There are many ways to install Python on your computer, but we recommend installing it through the Anaconda bundle, which includes many of the libraries you'll need for data science. Here's a quick tutorial on installing Python using Anaconda.
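
Once Python is installed, one quick way to confirm the environment is usable is to ask Python which data science packages it can find. The sketch below is not part of Anaconda. It assumes Python 3.4 or later (for `importlib.util.find_spec`), and the helper name `installed_libraries` is made up for this example.

```python
import importlib.util
import sys

# Import names of the core libraries this guide recommends.
LIBRARIES = ["numpy", "pandas", "matplotlib", "IPython", "scipy"]


def installed_libraries(names=LIBRARIES):
    """Map each package name to True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version: {}".format(sys.version.split()[0]))
for name, found in installed_libraries().items():
    print("{:12s} {}".format(name, "ok" if found else "missing"))
```

If any line reports "missing", the package can usually be added from the Anaconda prompt with `conda install`.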

Python 2 or Python 3? Use Python 3, plain and simple. Python 2.7 reached end of life in January 2020, and current releases of the core data science libraries, including those required for machine learning, support only Python 3.

Step 1: Core Programming Concepts

The amount of time you spend at this step depends on how much previous programming experience you have and whether you can work on this full-time or part-time, but it typically ranges from 1 week to 6 weeks.

If you are completely new to programming, be prepared to spend at least 1 month on this step. You'll want the time to absorb these rich concepts. They form the base needed to learn Python for data science quickly.

Among all the courses, tutorials, and guides out there, we've found the following two resources to be the best for self-starters. They are both self-paced, hands-on, and comprehensive (and free).

You're new to programming?

How to Think Like a Computer Scientist is a fantastic interactive online book that takes a whirlwind tour through key programming concepts (with Python). If you're new to programming, we suggest starting here, as it's like a condensed "Computer Science 101" course.

You've programmed before?

Learn Python the Hard Way is an excellent online book for people with some previous exposure to programming concepts. The "hard way" simply refers to learning through instructive exercises. Through 52 short exercises, you'll start with setting up Python and incrementally work your way up to writing multi-file programs.

Step 2: Drills and Challenges

If you want to learn Python for data science well, then don't skip this step.

After you grasp the core programming concepts, spend a week or two solidifying them by completing drills and challenges.

If you try to jump into a real project right away, you'll be overwhelmed by the number of moving parts. It's easy for our brains to trick us into believing we know something after reading about it in a book, but it takes concentrated practice to really learn the skills.

Think about it this way. Professional basketball players cannot just play games all the time if they want to improve. They must also spend hours every day practicing specific shots from different parts of the court.

When you take your newfound programming skills and hone them through short, targeted drills and challenges, you'll improve much faster than you would by jumping into projects immediately.

Here's what we recommend:

Get into fighting shape...

Code Fights is a platform with many short coding challenges that can be completed in 5-minute chunks (although it's so fun that you might find yourself playing through it for hours at a time). You'll gain points along the way and unlock new levels, making it a nice way to track your progression as well.

Solve a mystery...

The Python Challenge is one of the coolest puzzles on the web, so don't be put off by its 1990s graphics. You can complete all 33 levels with the help of Python scripts. One user called it "an addictive way to learn the ins and outs of Python..." We agree!

Consider alternative solutions...

PracticePython.org is a collection of short practice problems in Python. It's updated almost every week with a new problem. What's really nice is that the author includes multiple user-submitted solutions for each problem so you can see alternative ways of solving them.
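
To show what comparing alternative solutions can look like, here are two ways to remove duplicates from a list while keeping the original order. The problem and both function names are chosen for illustration and are not claimed to come from the site. The second version assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
def dedupe_loop(items):
    """Explicit version: remember what has been seen and skip repeats."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_dict(items):
    """Compact version: dict keys are unique and keep insertion order (3.7+)."""
    return list(dict.fromkeys(items))


data = [3, 1, 3, 2, 1]
print(dedupe_loop(data))  # [3, 1, 2]
print(dedupe_dict(data))  # [3, 1, 2]
```

Reading the loop version teaches the mechanics. Reading the one-liner teaches what the language already gives you for free.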

Step 3: Essential Data Science Libraries

Now you're almost ready to dive into real data science projects!

First, we built a strong foundation of core concepts. Then, we practiced pure Python through drills and challenges. Now, we're going to focus on the "for data science" part of "how to learn Python for data science."

As we mentioned earlier, Python has an all-star lineup of libraries that are essential for data science. To begin, we recommend acquiring a working knowledge of NumPy, pandas, SciPy, and matplotlib, while using them in the IPython notebook environment. This is the core stack of tools you'll need for data analysis.

Other important libraries, such as scikit-learn (machine learning) or beautifulsoup4 (web scraping), can be picked up later, when you need them for their specific use cases.

The Big 5 Essential Libraries

NumPy: NumPy is the granddaddy of all data science libraries. It allows easy and efficient numeric computation, and many machine learning libraries are built on top of it.

Pandas: Pandas is a high-performance library for data structures and exploratory analysis.

Matplotlib: Flexible plotting and visualization library.

IPython: Interactive shell for Python that makes it much easier to explore data and debug errors. It also makes it much more enjoyable to learn Python for data science.

SciPy: Extends NumPy with more functionality, such as routines for integration, linear algebra, and statistics.
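
For a first taste of the libraries above, here is a minimal sketch using NumPy and pandas together. It assumes both packages are installed (they ship with Anaconda), and the measurements and column names are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up measurements, purely for illustration.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# NumPy applies arithmetic to the whole array at once (no explicit loop).
heights_m = heights_cm / 100
print(heights_m.mean())

# pandas holds labeled columns in a DataFrame for exploratory analysis.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "height_cm": heights_cm,
})

# Average height per group: split by "group", then take the mean of each part.
group_means = df.groupby("group")["height_cm"].mean()
print(group_means)
```

Running this in an IPython notebook cell displays the results inline, which is a large part of why the notebook environment suits exploration.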

Training Videos

NumPy Beginner (Video, Course Materials): Excellent, thorough introduction to scientific computing with NumPy.

Introduction to Pandas and Exploratory Data Analysis (Video): Pandas, IPython, and matplotlib for exploratory data analysis.
