A very good article on how to learn Python and apply it to data science, data analysis, and machine learning
So, you want to become a data scientist, or maybe you are already one and want to expand your tool repository. You have landed at the right place. The aim of this page is to provide a comprehensive learning path for people new to Python for data analysis. It gives an overview of the steps you need to learn to use Python for data analysis. If you already have some background, or don’t need all the components, feel free to adapt the path to your own needs and let us know what changes you made.
Before starting your journey, the first question to answer is:
Why use Python?
or
How would Python be useful?
Watch the first 30 minutes of this talk from Jeremy, founder of DataRobot, at PyCon 2014, Ukraine, to get an idea of how useful Python can be.
Now that you have made up your mind, it is time to set up your machine. The easiest way to proceed is to just download Anaconda from Continuum.io. It comes packaged with most of the things you will ever need. The major downside of taking this route is that you will need to wait for Continuum to update their packages, even when an update to the underlying libraries is already available. If you are a beginner, that should hardly matter.
If you face any challenges during installation, you can find more detailed instructions for various operating systems here.
You should start by understanding the basics of the language, its libraries and its data structures. The Python track from Codecademy is one of the best places to start your journey. By the end of this course, you should be comfortable writing small scripts in Python and should also understand classes and objects.
Specifically learn: lists, tuples, dictionaries, list comprehensions, dictionary comprehensions
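As a quick, hedged illustration of the structures listed above, here is a minimal sketch. The variable names and values are made up for the example and do not come from any of the linked courses.

```python
# Lists are ordered and mutable.
temperatures = [21.5, 19.0, 23.2, 18.4]
temperatures.append(22.1)

# Tuples are ordered and immutable, handy for fixed records.
city = ("Kyiv", 50.45, 30.52)  # (name, latitude, longitude), illustrative values
name, latitude, longitude = city  # tuple unpacking

# Dictionaries map keys to values.
word_counts = {"python": 3, "data": 5, "science": 2}

# List comprehension: build a new list from an existing one, with a filter.
warm_days = [t for t in temperatures if t > 20]

# Dictionary comprehension: build a new dict from key/value pairs, with a filter.
frequent_words = {word: count for word, count in word_counts.items() if count >= 3}

print(warm_days)
print(frequent_words)
```

Comprehensions are usually preferred over an explicit loop with `append` when the goal is simply to transform or filter a collection.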
Assignment: Solve the Python tutorial questions on HackerRank. These should get your brain thinking about Python scripting.
Alternate resources: If interactive coding is not your style of learning, you can also look at the Google Class for Python. It is a 2-day class series and also covers some of the parts discussed later.
You will need to use regular expressions a lot for data cleansing, especially if you are working with text data. The best way to learn regular expressions is to go through the Google class and keep this cheat sheet handy.
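To give a feel for what regular expressions look like in a cleaning task, here is a small sketch using Python's built-in `re` module. The input string and the patterns are invented for illustration and are deliberately simple; real email or phone validation needs more care.

```python
import re

# Made-up messy input, purely for illustration.
raw = "  Contact: John.Doe@Example.com , phone: (555) 123-4567!!  "

# Collapse runs of whitespace into single spaces and trim the ends.
cleaned = re.sub(r"\s+", " ", raw).strip()

# Find anything that looks like an email address (a simplified pattern).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", cleaned)

# Locate a US-style phone number, then keep only its digits.
phone_match = re.search(r"\(?\d{3}\)?[\s-]?\d{3}-\d{4}", cleaned)
phone_digits = re.sub(r"\D", "", phone_match.group()) if phone_match else None

print(cleaned)
print(emails)
print(phone_digits)
```

`re.sub` replaces matches, `re.findall` returns all matches as a list, and `re.search` returns the first match or `None`, which is why the last step guards against a missing match.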
Assignment: Do the baby names exercise
If you still need more practice, follow this tutorial for text cleaning. It will challenge you on the various steps involved in data wrangling.
This is where the fun begins! Here is a brief introduction to various libraries. Let’s start practicing some common operations.
You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas.
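For a first taste of data munging with Pandas, the sketch below shows a few common operations: handling a missing value, filtering rows and aggregating by group. The data frame is made-up sample data, and filling with the column mean is only one of several possible strategies. It assumes `numpy` and `pandas` are installed, as they are in Anaconda.

```python
import numpy as np
import pandas as pd

# Made-up sample data, purely for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "sales": [250.0, np.nan, 310.0, 120.0, 400.0],
})

# Fill the missing value with the mean of the observed values.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Keep only the rows whose sales exceed a threshold.
big_orders = df[df["sales"] > 200]

# Total sales per city.
sales_by_city = df.groupby("city")["sales"].sum()

print(big_orders)
print(sales_by_city)
```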
Additional Resources:
Assignment: Solve this assignment from the CS109 course from Harvard.
Go through this lecture from CS109. You can ignore the initial 2 minutes, but what follows after that is awesome! Follow this lecture up with this assignment.
Now we come to the meat of this entire process. Scikit-learn is the most useful Python library for machine learning. Here is a brief overview of the library. Go through lecture 10 to lecture 18 of the CS109 course from Harvard. You will go through an overview of machine learning, supervised learning algorithms like regression, decision trees and ensemble modeling, and unsupervised learning algorithms like clustering. Follow individual lectures with the assignments from those lectures.
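To make the supervised versus unsupervised distinction concrete, here is a minimal sketch with scikit-learn. It is not taken from the CS109 lectures: the choice of the bundled iris dataset, the tree depth, the split ratio and the number of clusters are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Supervised learning: fit a decision tree on labelled training data,
# then measure accuracy on held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
accuracy = accuracy_score(y_test, tree.predict(X_test))

# Unsupervised learning: group the same rows into 3 clusters
# without ever looking at the labels y.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"decision tree test accuracy: {accuracy:.2f}")
print(f"first ten cluster assignments: {cluster_labels[:10]}")
```

Most scikit-learn estimators share the same `fit` / `predict` interface, so swapping the decision tree for another classifier usually means changing only the line that constructs the model.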
Additional Resources:
Assignment: Try out this challenge on Kaggle
Congratulations, you made it!
You now have all the technical skills you need. It is a matter of practice, and what better place to practice than competing with fellow data scientists on Kaggle? Go, dive into one of the live competitions currently running on Kaggle and give everything you have learnt a try!
Now that you have learnt most machine learning techniques, it is time to give deep learning a shot. There is a good chance that you already know what deep learning is, but if you still need a brief intro, here it is.
I am new to deep learning myself, so please take these suggestions with a pinch of salt. The most comprehensive resource is deeplearning.net. You will find everything there: lectures, datasets, challenges and tutorials. You can also try the course from Geoff Hinton in a bid to understand the basics of neural networks.
P.S. In case you need to use Big Data libraries, give Pydoop and PyMongo a try. They are not included here because the Big Data learning path is an entire topic in itself.
from: http://blog.csdn.net/pipisorry/article/details/44245575
ref: http://www.analyticsvidhya.com/learning-paths-data-science-business-analytics-business-intelligence-big-data/learning-path-data-science-python/