A Data Science and Machine Learning Knowledge Map and Resource Collection for Programmers

  • DataScience & Machine Learning Reference

  • Introduction & Overview

    • Introduction

      • Machine Learning

      • Deep Learning

      • Statistics

    • News: Industry and News

    • Application: Real-World Use Cases of Data Mining, Machine Learning, and Deep Learning

  • Resources

    • Collections: Resource Roundups

    • Books

    • Video Courses

    • Blogs & Forums

  • Methodology

    • Data Process: Data Processing

    • Machine Learning

    • Natural Language Processing

    • Deep Learning

  • Application

    • Recommender System

  • CrawlerSE: Crawlers and Search Engines

    • Crawler

    • Search Engine

  • Toolkits

    • Language

      • Python

      • Java

      • Matlab

      • R

    • ClusterComputing

  • Data Visual: Data Visualization

    • Books

    • Video Courses

    • Toolkits

  • Data Sets

    • Collections: Resource Roundups

      • Single Databases

      • Cross-Disciplinary Databases and Search Engines

    • Text

    • Social Network

    • Media: Audio, Video, and Images

    • Recognition

    • Driving Data

    • Domain: Domain-Specific Data

      • Sports

      • Medicines

      • Alien

      • Foods

      • Finance

  • Others

    • Competition: Machine Learning Competitions

    • Career

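This article, A Data Science and Machine Learning Knowledge Map and Resource Collection for Programmers, is part of The Programmer's Practical Handbook of Data Science and Machine Learning. Much of its content comes from suggestions and material collected by hitvoice@github, to whom the author is grateful.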

DataScience & Machine Learning Reference

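This article gathers all the resources the author has collected while learning data science. It focuses on introductory material and survey-style resources for each field and does not dig deeply into the state of the art; readers who want more can follow the author's The Programmer's Practical Handbook of Data Science and Machine Learning. The article starts with an introductory overview of data science and machine learning, then provides a set of resources, books, and tutorials, then lists reference articles for each specific field, and finally introduces a set of practical tools. The author's view of the data science and machine learning landscape, which is part of the author's Programming Worldview and Methodology series, is illustrated in the following diagram:

[Figure 1: the author's data science and machine learning worldview diagram]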

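This article will keep improving as the author's perspective and ability grow through study and practice. The author is not a pure machine learning or data mining researcher, and mostly looks, from an engineering angle, for areas that can be applied in combination with engineering work.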

Introduction & Overview

Introduction

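  • An Introduction to Data Science and Machine Learning

  • [Differences among data analytics, data analysis, data mining, data science, machine learning, and big data](https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1)

  • How to explain machine learning and data mining to people outside computer science and technology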

Machine Learning

  • Visual Intro To Machine Learning: an illustrated walkthrough of using a decision tree to classify homes in New York and San Francisco

  • A Gentle Guide to Machine Learning

  • Machine Learning basics for a newbie

  • What is machine learning, and how does it work?
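The decision-tree classification that the Visual Intro To Machine Learning entry illustrates can be sketched in a few lines. What follows is a minimal sketch, not code taken from any of the resources above: it fits a single decision-tree split (a "stump") on one feature by trying each midpoint threshold and keeping the one with the fewest misclassifications. The elevation values, the labels, and the names `fit_stump` and `predict` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Stump = Tuple[float, str, str]  # (threshold, label_at_or_below, label_above)


def fit_stump(xs: List[float], ys: List[str]) -> Stump:
    """Pick the one-feature threshold that misclassifies the fewest samples."""
    labels = sorted(set(ys))
    values = sorted(set(xs))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None  # (errors, threshold, low_label, high_label)
    for t in candidates:
        for low in labels:
            for high in labels:
                errors = sum(
                    1 for x, y in zip(xs, ys)
                    if (low if x <= t else high) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, t, low, high)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict(stump: Stump, x: float) -> str:
    """Classify one sample with a fitted stump."""
    threshold, low, high = stump
    return low if x <= threshold else high


# Hypothetical toy data: home elevation in meters and the city it is in.
elevations = [2.0, 5.0, 10.0, 12.0, 40.0, 55.0, 73.0, 90.0]
cities = ["NY", "NY", "NY", "NY", "SF", "SF", "SF", "SF"]

stump = fit_stump(elevations, cities)
print(stump)
print(predict(stump, 8.0), predict(stump, 60.0))
```

A full decision tree applies the same search recursively to each side of the split and usually scores splits with an impurity measure such as Gini or entropy rather than a raw error count; the error count is used here only to keep the sketch short.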

Deep Learning

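  • A Fun Tour of Machine Learning Concepts: from multivariate fitting and neural networks to deep learning, for anyone who is interested

  • [Translation] An Intuitive Explanation of Neural Networks (http://www.hackcv.com/index.p...): a very accessible explanation of convolutional neural networks.

  • Deep-Learning-Papers-Reading-Roadmap: a paper reading roadmap compiled for anyone interested in deep learning

  • A Programmer's Getting-Started Guide to Deep Learning: from Fei Lianghong's talk at the 2016 QCon global software development conference (Shanghai).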

Statistics

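  • Zhihu: What are some real examples of "data that lies"?

News: Industry and News

  • The deep learning framework war is under way: who will win the honor of being the "industry standard for deep learning"?

Application: Real-World Use Cases of Data Mining, Machine Learning, and Deep Learning

  • The Changes Brought by Deep Learning: ten typical applications of deep learning

  • Quora's 2015 talk on its concrete applications of machine learning

Resources

Collections: Resource Roundups

  • An Incomplete Roundup of Introductory Machine Learning Resources: a special collection from Machine Learning Daily (机器学习日报).

  • Top-down learning path: Machine Learning for Software Engineers: a path for software engineers to advance in machine learning

Books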

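  • 2014 - Data Science From Scratch

  • 2012 - Li Hang: Statistical Learning Methods (统计学习方法)

  • 2015 - Data Mining: The Textbook

  • 2016 - Zhou Zhihua: Machine Learning (机器学习)

  • 2012 - Machine Learning: A Probabilistic Perspective

  • 2012 - 深入浅出机器学习 (Chinese edition)

  • Nanjing University, Department of Computer Science and Technology: Data Mining course

Video Courses

  • University of Illinois at Urbana-Champaign: Text Mining and Analytics

  • National Taiwan University: Machine Learning Techniques

  • Stanford: Machine Learning course

  • CS224d: Deep Learning for Natural Language Processing

  • Unsupervised Feature Learning and Deep Learning: a tutorial series on unsupervised feature learning and deep learning from Stanford

  • Xiaoxiang (小象): Machine Learning video course

  • Xiaoxiang (小象): Deep Learning video course

Blogs & Forums

Methodology

Data Process: Data Processing

Machine Learning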

  • 10 Machine Learning Algorithms Explained to an ‘Army Soldier’

  • Top 10 data mining algorithms in plain English

  • 10 Machine Learning Terms Explained in Simple English

  • A Tour of Machine Learning Algorithms

  • The 10 Algorithms Machine Learning Engineers Need to Know

  • Comparing supervised learning algorithms

Natural Language Processing

Deep Learning

  • Landmark paper: an analysis of 14 design patterns for deep convolutional neural networks

Application

Recommender System

CrawlerSE: Crawlers and Search Engines

Crawler

Search Engine

Toolkits

Language

Python

  • Jupyter: interactive programming and data presentation

  • data-science-ipython-notebooks: a collection of data science code demonstrations based on IPython

  • The Open Source Data Science Masters

Java

Matlab

R

ClusterComputing

    • Mahout

    • MLlib

    DeepLearning: Deep Learning Toolkits

    • Evaluation of Deep Learning Toolkits

    • A code-level analysis of deep learning system programming models: TensorFlow vs. CNTK

    • tensorflow-playground: Play with neural networks!

    • dl-docker: packages commonly used deep learning tools into a single Docker image

    • deep-learning-models: Keras code and weights files for popular deep learning models.

    • Top Deep Learning Projects

    Data Visual: Data Visualization

    Books

    Video Courses

    • John C. Hart Coursera

    Toolkits

    Data Sets

    Collections: Resource Roundups

    • awesome-public-datasets: An awesome list of high-quality open datasets in public domains (on-going).

    • Wikimedia Dumps: packaged downloads of wiki data

    • Reddit Datasets: the Reddit board for discussing datasets

    • Militarized Interstate Disputes: Nearly 200 years of international threats, conflicts, etc. for modelling or prediction. Includes action taken, level of hostility, fatalities, and outcomes. Multiple datasets, e.g., 962KB, 179KB. http://www.correlatesofwar.or...

    Single Databases

    • http://archive.ics.uci.edu/ml/

    • http://crawdad.org/

    • http://data.austintexas.gov

    • http://snap.stanford.edu/data...

    • http://data.cityofchicago.org

    • http://data.govloop.com

    • http://data.gov.uk/data.gov.in

    • http://data.medicare.gov

    • http://www.dados.gov.pt/pt/ca...

    • http://data.sfgov.org

    • http://data.sunlightlabs.com

    • https://datamarket.azure.com/

    • http://econ.worldbank.org/dat...

    • http://gettingpastgo.socrata.com

    • http://public.resource.org/

    • http://timetric.com/public-data/

    • http://www.bls.gov/

    • http://www.crunchbase.com/

    • http://www.dartmouthatlas.org/

    • http://www.data.gov/

    • http://www.datakc.org

    • http://dbpedia.org

    • http://www.factual.com/

    • http://www.freebase.com/

    • http://www.infochimps.com

    • http://build.kiva.org/

    • http://www.imdb.com/interfaces

    • http://knoema.com

    • http://daten.berlin.de/

    • http://www.qunb.com

    • http://databib.org/

    • http://datacite.org/

    • http://data.reegle.info/

    • http://data.wien.gv.at/

    • http://data.gov.bc.ca

    Cross-Disciplinary Databases and Search Engines

    • https://www..com/datasets

    • http://usgovxml.com

    • http://aws.amazon.com/datasets

    • http://databib.org

    • http://datacite.org

    • http://figshare.com

    • http://linkeddata.org

    • http://thewebminer.com/

    • http://thedatahub.org

    • http://ckan.net

    • http://quandl.com

    • Open Data Inception (lists 2500+ open data interfaces)

    Text

    • 20 Newsgroups: The text from 20000 messages taken from 20 Usenet newsgroups for text analysis, classification, etc. 61.6MB

    • Amazon Reviews: Over 142 million product reviews for sentiment analysis, recommender systems, and more. 20GB

    • SMS Spam Collection: A collection of 5,574 SMS (text) messages, some spam, some normal, for spam filtering. 204KB. http://www.dt.fee.unicamp.br/...

    Social Network

    • http://enigma.io

    • http://www.ufindthem.com/

    • http://NetworkRepository.com (a machine learning database with interactive visual analytics)

    • http://MLvis.com

    • Yahoo Instant Messenger Friends Connectivity Graph: Connections between Yahoo users who communicate with each other using Yahoo messenger, can be used to identify key social contacts/influencers. Add dataset to cart to access. 28MB total.

    Media: Audio, Video, and Images

    • Labeled Faces in the Wild: 13,000 named faces for facial recognition. Multiple training and test sets. 173MB total

    • Mushroom Identification: For hypothetically classifying mushrooms as edible or poisonous based on their characteristics. 3 files, 480KB

    • NORB 3D Object Recognition: Binocular images of 50 toy figurines for 3D object recognition from image. Multiple files, over 5GB total

    • One Million Songs: Audio features and metadata for a subset (10,000) of the one million popular songs dataset for recognition/classification. 1.8GB

    • Hate Speech Identification: A sampling of Twitter posts that have been judged based on whether they are offensive or contain hate speech, as a training set for text analysis. 2.66MB

    • Hidden Beauty of Flickr Pictures: 15,000 Flickr photo IDs that have received ratings based on aesthetics, for image analysis. 138KB, use Flickr API to get images

    Recognition

    • Human Activity Recognition with Smartphones: Sensor data for recognizing the human activity - walking, sitting, etc. 25MB. https://www.kaggle.com/uciml/...

    Driving Data

    • 223 GB of historical autonomous driving data open-sourced by Udacity

    Domain: Domain-Specific Data

    Sports

    • Football Strategy: Thousands of scenarios to make the best coaching decisions. 876KB total

    • Horses for Courses: Horse-racing data for predicting race results. 19MB total

    • NBA & MLB Stats: Current and past season stats for teams and players for fantasy sports predictions.

    Medicines

    • National Survey on Drug Use and Health: Predict drug use based on health survey questions. 2GB total

    • Prostate Cancer: Tumor and nontumor samples, used to recognize prostate cancer. 4.8MB total

    • Record of Heart Sound: Recordings of normal and abnormal heartbeats, used to recognize heart murmur, etc. 47.7MB total

    Alien

    • UFO Reports: 80,000 historic reports for classification or regression. This dataset has been standardized from the source data at nuforc.org. 14.6MB total.

    Foods

    • Wine Quality: Chemical properties of red and white wines (separately) and quality, for classification. 3 files, 343KB total.

    Finance

    Others

    Competition: Machine Learning Competitions

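    • Alibaba Tianchi: hands-on competitions for newcomers

    • Kaggle: official getting-started competitions, a good way in for beginners

    • Kaggle Tutorial: a complete tutorial based on the hotel recommendation competition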

    • Driven Data

    • Innocentive

    • Crowdanalytix

    • Tunedit

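    • DataFountain: DF, a professional Chinese data competition platform designated by the CCF

    Career

    • Quora's job postings for machine learning

    • Google's job postings for machine learning and artificial intelligence positions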
