Python for big data |
1 Basic stack |
1.1 numpy |
1.2 scipy |
1.3 pandas |
1.3.1 "Python for Data Analysis" by Wes McKinney |
1.4 scikit-image |
1.5 scikit-learn |
1.6 statsmodels |
1.7 nltk |
1.8 matplotlib |
2 Newer packages |
2.1 Numba |
2.2 wiseRF |
2.3 Blaze |
3 Integrated platforms |
3.1 Continuum.io |
3.1.1 Anaconda |
3.1.2 Wakari |
3.2 PiCloud |
3.2.1 Python + AWS |
3.3 wise.io |
3.3.1 MLaaS |
3.3.1.1 RandomForest |
3.4 ipython |
3.4.1 Notebook |
3.5 Orange |
4 Visualization |
4.1 matplotlib |
4.2 Bokeh |
4.2.1 ggplot for python |
4.3 Mayavi |
4.4 Nodebox |
4.5 igraph |
4.6 pandas |
4.6.1 pandas.tools.rplot |
4.7 Google APIs |
4.7.1 googleVis |
5 Data formats |
5.1 Flat text |
5.1.1 xreadlines |
5.1.2 readlines |
5.1.3 pandas |
5.1.3.1 read_csv |
5.1.3.2 read_fwf |
5.1.4 xlrd/xlwt/xlutils |
5.2 HDF5 |
5.2.1 PyTables |
5.2.2 h5py |
5.3 SQL |
5.3.1 SQLAlchemy |
5.3.2 pysqlite3 |
5.3.3 pyodbc |
5.3.3.1 Vertica |
5.3.3.2 Netezza |
5.3.3.3 Teradata |
5.4 NoSQL |
5.4.1 MongoDB |
5.4.1.1 PyMongo |
5.4.2 CouchDB |
5.4.2.1 couchdb-python |
5.4.2.2 couchdbkit |
5.5 JSON |
5.5.1 Standard library |
5.5.1.1 json |
5.5.2 simplejson |
5.6 XML |
5.6.1 Standard library |
5.6.1.1 xml |
5.7 HBase |
5.7.1 HappyBase |
6 MapReduce |
6.1 Hadoop interface |
6.1.1 Hadoop Streaming |
6.1.1.1 Hadoopy |
6.1.1.2 example |
6.1.1.3 dumbo |
6.1.1.4 mrjob (used and developed by Yelp) |
6.1.2 Pydoop |
6.1.2.1 uses Hadoop Pipes |
6.2 disco (used and developed by Nokia) |
7 Glue |
7.1 rpy2 |
7.1.1 R |
7.2 PySpark |
7.2.1 Spark |
7.3 ipython |
7.3.1 magic |
7.3.1.1 R |
7.3.1.2 SQL |
7.3.1.3 MATLAB/Octave |
7.3.1.4 IDL |
7.4 Jython |
7.4.1 Java |
7.5 boto |
7.5.1 Amazon Web Services |
8 GPU |
8.1 NumbaPro |
8.2 PyCUDA |
9 Parallel |
9.1 ipython |
9.1.1 ipcluster |
9.2 pp |
9.3 dispy |
10 Efficiency |
10.1 Cython |
11 Packages |
11.1 PyPI |
11.1.1 30686 packages |