Recommended Python Modules for Data Mining

The best Python modules for machine learning, data mining, natural language processing, network analysis, and web scraping

This list is my summary of the Quora question "What are the best Python 2.7 modules for data mining?"

Basics:

  • numpy - numerical library, numpy.scipy.org/
  • scipy - Advanced math, signal processing, optimization, statistics, www.scipy.org/
  • matplotlib - Python plotting, matplotlib.org
Machine Learning and Data Mining:
  • MDP, a collection of supervised and unsupervised learning algorithms, pypi.python.org/pypi/MDP/2.4
  • mlpy, Machine Learning Python, mlpy.sourceforge.net
  • NetworkX, for graph analysis, networkx.lanl.gov/
  • Orange, Data Mining Fruitful & Fun, biolab.si
  • pandas, Python Data Analysis Library, pandas.pydata.org
  • pybrain, pybrain.org
  • scikit-learn - Classic machine learning algorithms; provides simple and efficient solutions to learning problems, scikit-learn.org/stable/
Natural Language:
  • NLTK, Natural Language Toolkit, nltk.org
For web scraping:
  • Scrapy, an open source web scraping framework for Python, scrapy.org
  • urllib/urllib2
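
As a rough illustration of the scraping step, here is a minimal sketch using only the standard library. It assumes Python 3, where the functionality of urllib2 lives in urllib.request and urllib.parse. The names LinkCollector, extract_links, and sample_html are hypothetical and chosen for this example only. To stay runnable offline it parses an inline HTML string; a real scraper would first fetch the page (for instance with urllib.request.urlopen) and would likely use Scrapy for anything larger.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, used instead of a live download.
sample_html = '<p><a href="/docs">Docs</a> <a href="http://nltk.org">NLTK</a></p>'
print(extract_links(sample_html, "http://example.com/index.html"))
```

The parser is event based, so it never builds a full document tree; that keeps the sketch short, but it also means malformed pages are handled only as far as html.parser tolerates them.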
