Neural Networks in scikit-learn
scikit-learn is my first choice for classical Machine Learning in Python. It has a good set of algorithms, supports sparse datasets, is fast, and has many utility functions such as cross-validation and grid search.
When it comes to advanced modeling, scikit-learn often falls short. If you need Boosting, Neural Networks or t-SNE, it is better to avoid scikit-learn.
scikit-learn has two basic implementations for Neural Nets. There is MLPClassifier for classification and MLPRegressor for regression. While both have a rich set of arguments, there isn’t an option to customize layers of a Neural Network (beyond setting the number of hidden units for each layer). There is also no GPU support.
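To make the limitation concrete, here is a minimal sketch of scikit-learn's own MLPClassifier. The digits dataset, the layer sizes and the other settings are assumptions chosen only for illustration. The architecture is controlled through hidden_layer_sizes, and a single activation is shared by all hidden layers.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small built-in dataset (8x8 digit images), used here only as an example.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0)  # pixel values are 0..16

# Two hidden layers with 64 and 32 units. The layer sizes are the only
# per-layer setting; the activation applies to every hidden layer.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=300, random_state=0)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(f"layers (incl. input and output): {clf.n_layers_}")
print(f"test accuracy: {test_accuracy:.3f}")
```

There is no argument here for mixing layer types, for example a convolution followed by a dense layer. That is the gap the next section addresses.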
Meet scikit-neuralnetwork
scikit-neuralnetwork addresses the issues with scikit-learn mentioned above. While superior libraries such as PyTorch and TensorFlow are already available, scikit-neuralnetwork may be a good choice for those coming from the scikit-learn ecosystem.
From the developers: "scikit-neuralnetwork is a deep neural network implementation without the learning cliff! This library implements multi-layer perceptrons as a wrapper for the powerful pylearn2 library that’s compatible with scikit-learn for a more user-friendly and Pythonic interface."
Install sknn
Installing sknn is as simple as installing any other Python package:
pip install scikit-neuralnetwork
Custom Neural Nets
sknn offers a simple way to make a custom Neural Net. scikit-learn users will feel at home with a familiar API:
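from sknn.mlp import Classifier, Layer

# One hidden Maxout layer followed by a Softmax output layer.
nn = Classifier(
    layers=[
        Layer("Maxout", units=100, pieces=2),
        Layer("Softmax")],
    learning_rate=0.001,
    n_iter=25)
# X_train and y_train are assumed to be defined already.
nn.fit(X_train, y_train)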
The X_train and y_train variables are NumPy arrays, so you can directly replace your scikit-learn model with a Neural Net from sknn. It even supports sparse datasets.
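As a rough illustration of the two input forms mentioned above, the sketch below builds a small dense NumPy array and the same data as a SciPy sparse matrix. The values are made up, and passing the sparse form to sknn relies on the article's statement rather than on anything verified here.

```python
import numpy as np
from scipy import sparse

# Toy data standing in for X_train / y_train (hypothetical values).
rng = np.random.default_rng(0)
X_train = rng.random((6, 4))
X_train[X_train < 0.7] = 0.0           # make most entries zero
y_train = np.array([0, 1, 0, 1, 0, 1])

# The same features in compressed sparse row form: only non-zero
# entries are stored, which saves memory on mostly-empty data.
X_train_sparse = sparse.csr_matrix(X_train)

print(X_train.shape, X_train_sparse.shape, X_train_sparse.nnz)
```

Either X_train or X_train_sparse would then be passed to fit together with y_train.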
If you are interested in how to create features from a textual dataset, check out an Ebook I wrote recently.
Convolutional Neural Nets
sknn supports Convolutional Neural Nets, so you can finally aim for a competitive score on MNIST within the scikit-learn ecosystem.
from sknn.mlp import Classifier, Convolution, Layer

# One convolutional layer with 8 channels and 3x3 kernels,
# followed by a Softmax output layer.
nn = Classifier(
    layers=[
        Convolution("Rectifier", channels=8, kernel_shape=(3,3)),
        Layer("Softmax")],
    learning_rate=0.02,
    n_iter=5)
nn.fit(X_train, y_train)
Recurrent Neural Nets
What about RNNs, like Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU)? RNNs are usually used for modeling sequences like time series or textual data.
Going through the documentation, it seems there isn’t direct support for RNNs. There is support for native and custom layers, which should make it possible to implement an RNN.
From the documentation: "You can use this feature to implement recurrent layers like LSTM or GRU, and any other features not directly supported. Keep in mind that this may affect compatibility in future releases, and also may expose edge cases in the code (e.g. serialization, determinism)."
If you plan to work with RNNs, I would recommend learning PyTorch or TensorFlow. If you need a quick start guide, I wrote an article about it a while ago.
Pipelines
scikit-learn has pipelines, which package feature transformation together with modeling. Because the transformations are fit only on the training data, pipelines reduce the chance of overfitting to leaked validation data and generally reduce the chance of other mistakes. Pipelines are also very useful for cross-validation and grid search.
Many Machine Learning libraries don’t support scikit-learn pipelines, so we need to implement that support ourselves. The great thing about sknn is that it fully supports scikit-learn pipelines.
Below is an example of a pipeline that scales features and trains a simple Neural Net.
from sknn.mlp import Classifier, Layer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Scale every feature to [0, 1], then train a Softmax-only network.
pipeline = Pipeline([
    ('min/max scaler', MinMaxScaler(feature_range=(0.0, 1.0))),
    ('neural network', Classifier(layers=[Layer("Softmax")], n_iter=25))])
pipeline.fit(X_train, y_train)
Check out my Churn prediction article for more advanced usage of a pipeline. There I show how to process categorical and numerical features separately.
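The point about grid search can be sketched as follows. This is an illustration under assumptions: it uses scikit-learn's MLPClassifier as a stand-in estimator, and the step names, dataset and parameter grid are chosen only for the example. Since the article states that sknn's Classifier works in pipelines, it could presumably take the place of the "net" step, but that is not verified here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

# The scaler is re-fit inside every cross-validation fold, so the
# validation part of each fold never influences the scaling.
pipeline = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(0.0, 1.0))),
    ("net", MLPClassifier(max_iter=200, random_state=0)),
])

# Parameters of a step are addressed as <step name>__<parameter>.
param_grid = {"net__hidden_layer_sizes": [(32,), (64,)]}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Searching over the whole pipeline, rather than over the model alone, is what keeps the preprocessing inside the cross-validation loop.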
GPU support
Unlike scikit-learn, sknn has support for GPUs as it is based on the Lasagne library. Note that GPU support requires an NVIDIA GPU with CUDA support. If you have a MacBook, you most probably have a Radeon GPU, which doesn’t support CUDA.
From the Lasagne documentation: "Thanks to Theano, Lasagne transparently supports training your networks on a GPU, which may be 10 to 50 times faster than training them on a CPU. Currently, this requires an NVIDIA GPU with CUDA support, and some additional software for Theano to use it."
To use the GPU backend, you just need to import:
# Use the GPU in 32-bit mode, falling back otherwise.
from sknn.platform import gpu32
From the documentation: "WARNING: This will only work if your program has not yet imported the theano module, due to the way that library is designed. If THEANO_FLAGS are set on the command-line, they are not overridden."
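The two conditions in this warning can be checked before the import. The helper below is a hypothetical sketch and not part of sknn. It only looks at whether theano is already loaded and whether THEANO_FLAGS is set in the environment.

```python
import os
import sys


def gpu_backend_status(loaded_modules=None, environ=None):
    """Report whether importing sknn.platform.gpu32 can still take effect.

    Hypothetical helper (not part of sknn). It inspects only the two
    conditions named in the warning: an earlier theano import and a
    THEANO_FLAGS setting that sknn will not override.
    """
    loaded = sys.modules if loaded_modules is None else loaded_modules
    env = os.environ if environ is None else environ

    if "theano" in loaded:
        return "too late: theano is already imported"
    if env.get("THEANO_FLAGS"):
        return "THEANO_FLAGS is set and will not be overridden"
    return "ok"


print(gpu_backend_status())
```

If the result is "ok", placing the gpu32 import before any other sknn or theano import should satisfy the first condition described in the warning.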
Translated from: https://towardsdatascience.com/deep-learning-with-scikit-learn-1de142d96118