Predictive Modeling in the Cloud with Scikit-learn and IPython
IPython, with its notebook interface, is an interactive programming environment that is particularly well suited to data exploration, modelling and sharing of analysis results, notably via nbviewer.ipython.org.
Scikit-learn is a versatile Machine Learning library for Python that blends well with the NumPy and SciPy ecosystem. It is used by a growing user base of academic researchers as well as data scientists and engineers in the tech industry.
Together, the two projects offer a productive environment for building and evaluating predictive models from data. In particular, IPython's distributed computing capabilities make it possible to offload computationally intensive Machine Learning tasks to clusters of tens or hundreds of nodes without breaking the interactive experience.
The goal of the presentation is to showcase how to set up an ad hoc data modelling environment using a cluster provisioned in a public cloud, and how to use it to perform common predictive modelling operations such as:
cross-validated model assessment and automated search for the best parameters for common feature extraction and machine learning algorithms,
parallel training of out-of-core text classification models for sentiment analysis,
parallel training of large randomized ensembles of decision trees (a.k.a. Random Forests).
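As a rough illustration of the first and last operations listed above, the sketch below runs a cross-validated grid search over a small Random Forest. It is a sketch under stated assumptions, not the material shown in the session: it assumes a recent scikit-learn release (where GridSearchCV lives in sklearn.model_selection), it uses synthetic data, the parameter grid is arbitrary, and it parallelizes over local CPU cores with n_jobs rather than over an IPython cluster.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)

# Hypothetical parameter grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}

# Each candidate is scored with 3-fold cross-validation. n_jobs=-1 spreads
# the fits over all local cores; this is local parallelism, not a cluster.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)  # best combination found on this toy data
print(search.best_score_)   # its mean cross-validated accuracy
```

Every fit in the grid is independent of the others, which is why this kind of search is a natural candidate for distribution across many nodes.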
Olivier Grisel
Software Engineer, INRIA
Olivier Grisel is a software engineer in the Parietal team of INRIA. He works to improve the speed and scalability of the scikit-learn machine learning library for the Python / NumPy / SciPy ecosystem. He also likes to share interesting Machine Learning papers and tricks on Twitter: @ogrisel