In this chapter, we will learn what CNTK is, its features, the differences between versions 1.0 and 2.0, and the important highlights of version 2.7.
Microsoft Cognitive Toolkit (CNTK), formerly known as Computational Network Toolkit, is a free, easy-to-use, open-source, commercial-grade toolkit that enables us to train deep learning algorithms to learn like the human brain. It lets us build popular kinds of deep learning systems, such as feed-forward neural network time series prediction systems and convolutional neural network (CNN) image classifiers.
For optimal performance, its framework functions are written in C++. Although we can call these functions from C++, the most common approach is to use them from a Python program.
The following are some of the features and capabilities offered in the latest version of Microsoft CNTK:
CNTK has highly optimised built-in components that can handle multi-dimensional dense or sparse data from Python, C++ or BrainScript.
We can implement CNN, FNN, RNN, Batch Normalisation and Sequence-to-Sequence with attention.
It lets us add new user-defined core components on the GPU from Python.
It also provides automatic hyperparameter tuning.
We can implement reinforcement learning, Generative Adversarial Networks (GANs), and supervised as well as unsupervised learning.
For massive datasets, CNTK has built-in optimised readers.
CNTK provides parallelism with high accuracy on multiple GPUs/machines via 1-bit SGD.
To fit the largest models in GPU memory, it provides memory sharing and other built-in methods.
CNTK has full APIs for defining your own networks, learners, readers, training and evaluation from Python, C++ and BrainScript.
Using CNTK, we can easily evaluate models with Python, C++, C# or BrainScript.
It provides both high-level and low-level APIs.
It can automatically infer shapes based on our data.
It has fully optimised symbolic Recurrent Neural Network (RNN) loops.
CNTK provides various components to measure the performance of the neural networks you build.
It generates log data from your model and the associated optimiser, which we can use to monitor the training process.
The following table compares CNTK versions 1.0 and 2.0:
Version 1.0 | Version 2.0 |
---|---|
It was released in 2016. | It is a significant rewrite of version 1.0 and was released in June 2017. |
It used a proprietary scripting language called BrainScript. | Its framework functions can be called from C++ or Python. We can easily load our models in C# or Java. BrainScript is also supported by version 2.0. |
It runs on both Windows and Linux systems but not directly on Mac OS. | It also runs on both Windows (Windows 8.1, Windows 10, Server 2012 R2 and later) and Linux systems but not directly on Mac OS. |
Version 2.7 is the last major release of Microsoft Cognitive Toolkit. The following are some important highlights of this release:
Full support for ONNX 1.4.1.
Support for CUDA 10 on both Windows and Linux systems.
Support for advanced Recurrent Neural Network (RNN) loops in ONNX export.
Export of models larger than 2 GB in ONNX format.
Support for FP16 in the training action of the BrainScript scripting language.
Here, we will look at installing CNTK on Windows and on Linux. The chapter also covers installing the CNTK package, the steps to install Anaconda, CNTK files, directory structure and CNTK library organisation.
In order to install CNTK, we must have Python installed on our computer. You can go to https://www.python.org/downloads/ and select the latest version for your OS (Windows or Linux/Unix). For a basic tutorial on Python, you can refer to https://www.tutorialspoint.com/python3/index.htm.
CNTK is supported on Windows as well as Linux, so we will walk through both of them.
In order to run CNTK on Windows, we will be using the Anaconda version of Python. Anaconda is a redistribution of Python that includes additional packages, such as SciPy and scikit-learn, which CNTK uses to perform various useful calculations.
So, first let us see the steps to install Anaconda on your machine −
Step 1 − First, download the setup files from the public website https://www.anaconda.com/distribution/.
Step 2 − Once you have downloaded the setup files, start the installation and follow the instructions at https://docs.anaconda.com/anaconda/install/.
Step 3 − Along with Python, Anaconda installs some other utilities, including a command prompt that automatically includes all the Anaconda executables in your PATH variable. From this prompt, we can manage our Python environment, install packages and run Python scripts.
Once the Anaconda installation is done, the most common way to install the CNTK package is through the pip executable, using the following command −
pip install cntk
There are various other ways to install Cognitive Toolkit on your machine. Microsoft's documentation explains the other installation methods in detail; see https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine.
Installing CNTK on Linux is a bit different from installing it on Windows. On Linux we are again going to use Anaconda to install CNTK, but instead of a graphical installer for Anaconda, we will use a terminal-based installer. Although the installer works with almost all Linux distributions, we limit the description to Ubuntu.
So, first let us see the steps to install Anaconda on your machine −
Step 1 − Before installing Anaconda, make sure that the system is fully up to date. To do so, execute the following two commands inside a terminal −
sudo apt update
sudo apt upgrade
Step 2 − Once the computer is updated, get the URL of the latest Anaconda installation files from the public website https://www.anaconda.com/distribution/.
Step 3 − Once the URL is copied, open a terminal window and execute the following command −
wget -O anaconda-installer.sh url
Replace the url placeholder with the URL copied from the Anaconda website.
Step 4 − Next, we can install Anaconda with the following command −
sh ./anaconda-installer.sh
By default, the above command installs Anaconda3 inside our home directory.
Once the Anaconda installation is done, the most common way to install the CNTK package is through the pip executable, using the following command −
pip install cntk
Once CNTK is installed as a Python package, we can examine its file and directory structure. It is located at C:\Users\ \Anaconda3\Lib\site-packages\cntk, as shown in the screenshot below.
Next, you should verify that CNTK has been installed correctly. From the Anaconda command shell, start the Python interpreter by entering ipython. Then, import CNTK by entering the following command.
import cntk as c
Once imported, check its version with the following command −
print(c.__version__)
The interpreter will respond with the installed CNTK version. If it does not, there is a problem with the installation.
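The same check can also be scripted so that it does not crash when the package is missing. The following is a minimal sketch using only the Python standard library; the helper name installed_version is our own and is not part of CNTK.

```python
import importlib
import importlib.util


def installed_version(package_name):
    """Return the package's __version__ string, or None if it cannot be imported."""
    # find_spec returns None when no top-level package of that name is installed.
    if importlib.util.find_spec(package_name) is None:
        return None
    try:
        module = importlib.import_module(package_name)
    except Exception:
        # The package is present but fails to load (for example, a missing native library).
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("cntk")
    if version is None:
        print("CNTK could not be imported; check the installation.")
    else:
        print("CNTK version:", version)
```

Running the script inside the Anaconda prompt either prints the version or tells us that the import failed, which is the same information the interactive check gives.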
CNTK, technically a Python package, is organised into 13 high-level sub-packages and 8 smaller sub-packages. The following table lists the 10 most frequently used packages:
Sr.No | Package Name & Description |
---|---|
1 | cntk.io Contains functions for reading data. For example: next_minibatch() |
2 | cntk.layers Contains high-level functions for creating neural networks. For example: Dense() |
3 | cntk.learners Contains functions for training. For example: sgd() |
4 | cntk.losses Contains functions to measure training error. For example: squared_error() |
5 | cntk.metrics Contains functions to measure model error. For example: classification_error |
6 | cntk.ops Contains low-level functions for creating neural networks. For example: tanh() |
7 | cntk.random Contains functions to generate random numbers. For example: normal() |
8 | cntk.train Contains training functions. For example: train_minibatch() |
9 | cntk.initializer Contains model parameter initializers. For example: normal() and uniform() |
10 | cntk.variables Contains low-level constructs. For example: Parameter() and Variable() |
Microsoft Cognitive Toolkit offers two different build versions, namely CPU-only and GPU-only.
The CPU-only build version of CNTK uses the optimised Intel MKLML, where MKLML is a subset of MKL (Math Kernel Library) that is released with Intel MKL-DNN as a terminated version of Intel MKL for MKL-DNN.
On the other hand, the GPU-only build version of CNTK uses highly optimised NVIDIA libraries such as CUB and cuDNN. It supports distributed training across multiple GPUs and multiple machines. For even faster distributed training in CNTK, the GPU build version also includes −
MSR-developed 1-bit quantized SGD.
Block-momentum SGD parallel training algorithms.
In the previous section, we saw how to install the basic version of CNTK for use with the CPU. Now let us discuss how to install CNTK for use with a GPU. Before diving into it, you should first make sure that you have a supported graphics card.
At present, CNTK supports NVIDIA graphics cards with CUDA compute capability 3.0 or higher. You can check at https://developer.nvidia.com/cuda-gpus whether your GPU supports CUDA.
So, let us see the steps to enable GPU with CNTK on Windows OS −
Step 1 − First, you need the latest GeForce or Quadro drivers for the graphics card you are using.
Step 2 − Once you have installed the drivers, download the CUDA toolkit version 9.0 for Windows from the NVIDIA website https://developer.nvidia.com/cuda-90-download-archive?target_os=Windows&target_arch=x86_64. After downloading, run the installer and follow the instructions.
Step 3 − Next, you need to download the cuDNN binaries from the NVIDIA website https://developer.nvidia.com/rdp/form/cudnn-download-survey. cuDNN 7.4.1 works well with CUDA 9.0. cuDNN is a layer on top of CUDA that is used by CNTK.
Step 4 − After downloading the cuDNN binaries, extract the zip file into the root folder of your CUDA toolkit installation.
Step 5 − This is the last step, which enables GPU usage inside CNTK. Execute the following command inside the Anaconda prompt on Windows OS −
pip install cntk-gpu
Let us see how we can enable GPU with CNTK on Linux OS −
First, you need to download the CUDA toolkit from the NVIDIA website https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal.
Once you have the binaries on disk, run the installer by opening a terminal, executing the following command and following the on-screen instructions −
sh cuda_9.0.176_384.81_linux-run
After installing the CUDA toolkit on your Linux machine, you need to modify the Bash profile script. Open the $HOME/.bashrc file in a text editor and add the following lines at the end of the script −
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Finally, we need to install the cuDNN binaries. They can be downloaded from the NVIDIA website https://developer.nvidia.com/rdp/form/cudnn-download-survey. cuDNN 7.4.1 works well with CUDA 9.0. cuDNN is a layer on top of CUDA that is used by CNTK.
Once you have downloaded the version for Linux, extract it to the /usr/local/cuda-9.0 folder by using the following command −
tar -xzvf cudnn-9.0-linux-x64-v7.4.1.5.tgz -C /usr/local/cuda-9.0/
Change the path to the archive file as required.
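After editing .bashrc and opening a new terminal, we may want to confirm that the CUDA directories are actually visible to our programs. The following is a small sketch using only the Python standard library; the function name cuda_paths_configured and its default cuda_root are our own assumptions, based on the /usr/local/cuda-9.0 location used in the steps above.

```python
import os
import shutil


def cuda_paths_configured(cuda_root="/usr/local/cuda-9.0", environ=None):
    """Report whether the PATH and LD_LIBRARY_PATH entries added above are present."""
    env = os.environ if environ is None else environ
    path_dirs = env.get("PATH", "").split(os.pathsep)
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "bin_in_path": os.path.join(cuda_root, "bin") in path_dirs,
        "lib64_in_ld_library_path": os.path.join(cuda_root, "lib64") in lib_dirs,
        # shutil.which returns the full path to nvcc, or None if it is not found.
        "nvcc": shutil.which("nvcc", path=env.get("PATH", "")),
    }


if __name__ == "__main__":
    for key, value in cuda_paths_configured().items():
        print(key, "=", value)
```

If either of the first two entries is False, the export lines were probably not picked up by the current shell session.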
In this chapter, we will learn in detail about sequences in CNTK and their classification.
The central concept in CNTK is the tensor. CNTK inputs, outputs and parameters are all organised as tensors, which can be thought of as generalised matrices. Every tensor has a rank −
A tensor of rank 0 is a scalar.
A tensor of rank 1 is a vector.
A tensor of rank 2 is a matrix.
These different dimensions are referred to as axes. CNTK distinguishes between static axes and dynamic axes.
As the name implies, static axes have the same length throughout the network’s life. The length of dynamic axes, on the other hand, can vary from instance to instance. In fact, their length is typically not known before each minibatch is presented.
Dynamic axes are like static axes in that they also define a meaningful grouping of the numbers contained in the tensor.
To make it clearer, let us see how a minibatch of short video clips is represented in CNTK. Suppose that the video clips all have a resolution of 640 * 480 and are shot in colour, which is typically encoded with three channels. Our minibatch then has the following −
Three static axes of length 640, 480 and 3 respectively.
Two dynamic axes: the length of the video and the minibatch axis.
This means that a minibatch of 16 videos, each 240 frames long, is represented as a 16*240*3*640*480 tensor.
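To get a feeling for the size of such a tensor, we can compute its shape and number of values in plain Python. This is only an illustrative sketch; the variable names are ours, and for simplicity it assumes that every clip in the minibatch has the same number of frames.

```python
import math

# Static axes: fixed for the lifetime of the network (channels, width, height).
static_shape = (3, 640, 480)

# Dynamic axes: known only when a minibatch is presented.
batch_size = 16        # number of clips in this minibatch
frames_per_clip = 240  # sequence length (assumed equal for every clip here)

minibatch_shape = (batch_size, frames_per_clip) + static_shape
num_values = math.prod(minibatch_shape)

print("minibatch shape:", minibatch_shape)
print("number of values:", num_values)
```

Only batch_size and frames_per_clip may change from one minibatch to the next; static_shape stays the same for as long as the network exists.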
Let us understand sequences in CNTK by first learning about the Long Short-Term Memory network.
Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solve the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is shown in the diagram: it has input neurons, memory cells and output neurons. In order to combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates −
Forget gate − As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the forget gate tells it to forget them.
Input gate − As the name implies, it adds new content to the cell.
Output gate − As the name implies, it decides when to pass the vectors from the cell along to the next hidden state.
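To see how the three gates and the memory cell interact, consider a single time step written out in plain Python. This is a simplified sketch of the standard LSTM update for scalar values, not CNTK's implementation; the function lstm_step and the layout of the params dictionary are our own assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a scalar input, hidden state and cell state.

    params maps each gate name to a (w_x, w_h, bias) triple; the names are
    hypothetical and chosen only for this illustration.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))        # how much of the old cell value to keep
    i = sigmoid(pre_activation("input"))         # how much new content to write
    o = sigmoid(pre_activation("output"))        # how much of the cell to expose
    g = math.tanh(pre_activation("candidate"))   # new content proposed for the cell

    c_new = f * c_prev + i * g                   # updated memory cell
    h_new = o * math.tanh(c_new)                 # next hidden state
    return h_new, c_new


if __name__ == "__main__":
    zero = (0.0, 0.0, 0.0)
    params = {"forget": zero, "input": zero, "output": zero, "candidate": zero}
    print(lstm_step(1.0, 0.0, 2.0, params))
```

Because the cell value is carried forward through the multiplication by f rather than through a squashing non-linearity, a forget gate close to 1 lets information survive over many time steps.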
Working with sequences in CNTK is straightforward. Let us see it with the help of the following example −
import sys
import os
from cntk import Trainer, Axis
from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,\
    INFINITELY_REPEAT
from cntk.learners import sgd, learning_parameter_schedule_per_sample
from cntk import input_variable, cross_entropy_with_softmax, \
    classification_error, sequence
from cntk.logging import ProgressPrinter
from cntk.layers import Sequential, Embedding, Recurrence, LSTM, Dense

# Creates the reader for the CTF-formatted data file
def create_reader(path, is_training, input_dim, label_dim):
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features=StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels=StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training,
        max_sweeps=INFINITELY_REPEAT if is_training else 1)

# Defines the network: embedding, LSTM recurrence, last step, dense output
def LSTM_sequence_classifier_net(input, num_output_classes, embedding_dim,
                                 LSTM_dim, cell_dim):
    lstm_classifier = Sequential([Embedding(embedding_dim),
                                  Recurrence(LSTM(LSTM_dim, cell_dim)),
                                  sequence.last,
                                  Dense(num_output_classes)])
    return lstm_classifier(input)

# Creates and trains the LSTM sequence classification model
def train_sequence_classifier():
    input_dim = 2000
    cell_dim = 25
    hidden_dim = 25
    embedding_dim = 50
    num_output_classes = 5

    # Input variables for the features and the labels
    features = sequence.input_variable(shape=input_dim, is_sparse=True)
    label = input_variable(num_output_classes)

    # Instantiate the network and the loss and error functions
    classifier_output = LSTM_sequence_classifier_net(
        features, num_output_classes, embedding_dim, hidden_dim, cell_dim)
    ce = cross_entropy_with_softmax(classifier_output, label)
    pe = classification_error(classifier_output, label)

    rel_path = ("../../../Tests/EndToEndTests/Text/" +
                "SequenceClassification/Data/Train.ctf")
    path = os.path.join(os.path.dirname(os.path.abspath(__file__)), rel_path)
    reader = create_reader(path, True, input_dim, num_output_classes)
    input_map = {
        features: reader.streams.features,
        label: reader.streams.labels
    }

    lr_per_sample = learning_parameter_schedule_per_sample(0.0005)
    progress_printer = ProgressPrinter(0)
    trainer = Trainer(classifier_output, (ce, pe),
                      sgd(classifier_output.parameters, lr=lr_per_sample),
                      progress_printer)

    # Fetch minibatches and train on each of them
    minibatch_size = 200
    for i in range(255):
        mb = reader.next_minibatch(minibatch_size, input_map=input_map)
        trainer.train_minibatch(mb)

    evaluation_average = float(trainer.previous_minibatch_evaluation_average)
    loss_average = float(trainer.previous_minibatch_loss_average)
    return evaluation_average, loss_average

if __name__ == '__main__':
    error, _ = train_sequence_classifier()
    print(" error: %f" % error)
average since average since examples
loss last metric last
------------------------------------------------------
1.61 1.61 0.886 0.886 44
1.61 1.6 0.714 0.629 133
1.6 1.59 0.56 0.448 316
1.57 1.55 0.479 0.41 682
1.53 1.5 0.464 0.449 1379
1.46 1.4 0.453 0.441 2813
1.37 1.28 0.45 0.447 5679
1.3 1.23 0.448 0.447 11365
error: 0.333333
A detailed explanation of the above program will be given in later sections, especially when we construct recurrent neural networks.
This chapter deals with constructing a logistic regression model in CNTK.
Logistic regression, one of the simplest ML techniques, is used mainly for binary classification, that is, for creating a prediction model in situations where the variable to predict can take one of just two categorical values. A simple example of logistic regression is predicting whether a person is male or female based on the person’s age, voice, hair and so on.
Let us understand the concept of logistic regression mathematically with the help of another example −
Suppose we want to predict the creditworthiness of a loan application, where 0 means reject and 1 means approve, based on the applicant’s debt, income and credit rating. We represent debt with X1, income with X2 and credit rating with X3.
In logistic regression, we determine a weight value, represented by w, for every feature, and a single bias value, represented by b.
Now suppose,
X1 = 3.0
X2 = -2.0
X3 = 1.0
And suppose we have determined the weights and bias as follows −
W1 = 0.65, W2 = 1.75, W3 = 2.05 and b = 0.33
Now, to predict the class, we need to apply the following formula −
Z = (X1*W1) + (X2*W2) + (X3*W3) + b
i.e. Z = (3.0)*(0.65) + (-2.0)*(1.75) + (1.0)*(2.05) + 0.33
= 0.83
Next, we need to compute P = 1.0/(1.0 + exp(-Z)). Here, exp(x) is Euler’s number e raised to the power x.
P = 1.0/(1.0 + exp(-0.83))
≈ 0.6964
The P value can be interpreted as the probability that the class is 1. If P < 0.5, the prediction is class = 0; otherwise (P >= 0.5), the prediction is class = 1.
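The calculation above can be reproduced with a few lines of plain Python. This is just a sketch of the arithmetic using the example values given earlier; the variable names are ours.

```python
import math

x = [3.0, -2.0, 1.0]     # debt, income, credit rating (X1, X2, X3)
w = [0.65, 1.75, 2.05]   # W1, W2, W3
b = 0.33                 # bias

# Weighted sum of the inputs plus the bias
z = sum(xi * wi for xi, wi in zip(x, w)) + b

# Logistic sigmoid turns z into a probability for class 1
p = 1.0 / (1.0 + math.exp(-z))
predicted_class = 0 if p < 0.5 else 1

print("z = %.2f" % z)
print("p = %.4f" % p)
print("class =", predicted_class)
```

Since p is greater than 0.5, the loan application in this example falls into class 1 (approve).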
To determine the values of the weights and bias, we must obtain a set of training data with known input predictor values and known correct class labels. We can then use an algorithm, generally gradient descent, to find the values of the weights and bias.
For this LR model, we are going to use the following data set, in which each row holds two feature values followed by the class label −
1.0, 2.0, 0
3.0, 4.0, 0
5.0, 2.0, 0
6.0, 3.0, 0
8.0, 1.0, 0
9.0, 2.0, 0
1.0, 4.0, 1
2.0, 5.0, 1
4.0, 6.0, 1
6.0, 5.0, 1
7.0, 3.0, 1
8.0, 5.0, 1
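Before moving to CNTK, it can help to see what gradient descent does on this data set without any library. The following is a minimal sketch of stochastic gradient descent for logistic regression in plain Python; it is not the CNTK training code, the function names are ours, and the learning rate, iteration count and seed are assumptions chosen for illustration.

```python
import math
import random

# The twelve training items listed above: (x1, x2, label).
DATA = [
    (1.0, 2.0, 0), (3.0, 4.0, 0), (5.0, 2.0, 0), (6.0, 3.0, 0),
    (8.0, 1.0, 0), (9.0, 2.0, 0), (1.0, 4.0, 1), (2.0, 5.0, 1),
    (4.0, 6.0, 1), (6.0, 5.0, 1), (7.0, 3.0, 1), (8.0, 5.0, 1),
]


def predict(w, b, x):
    """Probability that item x belongs to class 1."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))


def mean_cross_entropy(w, b, data):
    """Average binary cross-entropy error over all items."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x1, x2, y in data:
        p = predict(w, b, (x1, x2))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(data)


def train(data, learning_rate=0.01, iterations=4000, seed=4):
    """Stochastic gradient descent, one randomly chosen item per update."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for _ in range(iterations):
        x1, x2, y = rng.choice(data)
        error = predict(w, b, (x1, x2)) - y   # gradient of cross-entropy w.r.t. z
        w[0] -= learning_rate * error * x1
        w[1] -= learning_rate * error * x2
        b -= learning_rate * error
    return w, b


if __name__ == "__main__":
    w, b = train(DATA)
    print("weights:", w, "bias:", b)
    print("mean cross-entropy:", mean_cross_entropy(w, b, DATA))
```

The learned values will not match the CNTK results exactly, because the initialisation and the random number sequence differ, but the update rule is the same idea.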
To start implementing this LR model in CNTK, we first need to import the following packages −
import numpy as np
import cntk as C
The program is structured around a main() function as follows −
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
Now, we need to load the training data into memory as follows −
    data_file = ".\\dataLRmodel.txt"
    print("Loading data from " + data_file + "\n")
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)
Now, we will create a logistic regression model that is compatible with the training data −
    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1)) # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z))
    model = p
Now, we need to create the learner and trainer as follows −
    ce_error = C.binary_cross_entropy(model, y) # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000
Once we have created the LR model, it is time to start the training process −
    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1) # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)
Now, with the help of the following code, we can print the model weights and bias −
    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)
    print("")
if __name__ == "__main__":
    main()
import numpy as np
import cntk as C

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")

    # Load the training data into memory
    data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
    print("Loading data from " + data_file + "\n")
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[0,1])
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",", skiprows=0, usecols=[2], ndmin=2)

    # Define the logistic regression model
    features_dim = 2
    labels_dim = 1
    X = C.ops.input_variable(features_dim, np.float32)
    y = C.input_variable(labels_dim, np.float32)
    W = C.parameter(shape=(features_dim, 1)) # trainable cntk.Parameter
    b = C.parameter(shape=(labels_dim))
    z = C.times(X, W) + b
    p = 1.0 / (1.0 + C.exp(-z))
    model = p

    # Create the learner and the trainer
    ce_error = C.binary_cross_entropy(model, y) # CE a bit more principled for LR
    fixed_lr = 0.010
    learner = C.sgd(model.parameters, fixed_lr)
    trainer = C.Trainer(model, (ce_error), [learner])
    max_iterations = 4000

    # Train on one randomly chosen item per iteration
    np.random.seed(4)
    N = len(features_mat)
    for i in range(0, max_iterations):
        row = np.random.choice(N,1) # pick a random row from training items
        trainer.train_minibatch({ X: features_mat[row], y: labels_mat[row] })
        if i % 1000 == 0 and i > 0:
            mcee = trainer.previous_minibatch_loss_average
            print(str(i) + " Cross-entropy error on curr item = %0.4f " % mcee)

    # Print the trained weights and bias
    np.set_printoptions(precision=4, suppress=True)
    print("Model weights: ")
    print(W.value)
    print("Model bias:")
    print(b.value)

if __name__ == "__main__":
    main()
Using CNTK version = 2.7
1000 cross entropy error on curr item = 0.1941
2000 cross entropy error on curr item = 0.1746
3000 cross entropy error on curr item = 0.0563
Model weights:
[[-0.2049]
 [ 0.9666]]
Model bias:
[-2.2846]
Once the LR model has been trained, we can use it for prediction as follows −
First of all, our evaluation program imports the numpy package and loads the training data into a feature matrix and a class label matrix in the same way as the training program we implemented above −
import numpy as np
def main():
    data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=(0,1))
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=[2], ndmin=2)
Next, it is time to set the values of the weights and the bias that were determined by our training program −
    print("Setting weights and bias values \n")
    weights = np.array([-0.2049, 0.9666], dtype=np.float32) # from the training output above
    bias = np.array([-2.2846], dtype=np.float32)
    N = len(features_mat)
    features_dim = 2
Next, our evaluation program computes the logistic regression probability by walking through each of the training items as follows −
    print("item pred_prob pred_label act_label result")
    for i in range(0, N): # each item
        x = features_mat[i]
        z = 0.0
        for j in range(0, features_dim):
            z += x[j] * weights[j]
        z += bias[0]
        pred_prob = 1.0 / (1.0 + np.exp(-z))
        pred_label = 0 if pred_prob < 0.5 else 1
        act_label = labels_mat[i][0] # scalar class label of item i
        pred_str = 'correct' if np.absolute(pred_label - act_label) < 1.0e-5 \
            else 'WRONG'
        print("%2d %0.4f %0.0f %0.0f %s" % (i, pred_prob, pred_label, act_label, pred_str))
Now let us demonstrate how to make a prediction for a new item −
    x = np.array([9.5, 4.5], dtype=np.float32)
    print("\nPredicting class for age, education = ")
    print(x)
    z = 0.0
    for j in range(0, features_dim):
        z += x[j] * weights[j]
    z += bias[0]
    p = 1.0 / (1.0 + np.exp(-z))
    print("Predicted p = " + str(p))
    if p < 0.5: print("Predicted class = 0")
    else: print("Predicted class = 1")
import numpy as np

def main():
    # Load the same data that was used for training
    data_file = ".\\dataLRmodel.txt" # provide the name and the location of data file
    features_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=(0,1))
    labels_mat = np.loadtxt(data_file, dtype=np.float32, delimiter=",",
        skiprows=0, usecols=[2], ndmin=2)

    print("Setting weights and bias values \n")
    weights = np.array([-0.2049, 0.9666], dtype=np.float32) # from the training output above
    bias = np.array([-2.2846], dtype=np.float32)
    N = len(features_mat)
    features_dim = 2

    # Evaluate the model on every training item
    print("item pred_prob pred_label act_label result")
    for i in range(0, N): # each item
        x = features_mat[i]
        z = 0.0
        for j in range(0, features_dim):
            z += x[j] * weights[j]
        z += bias[0]
        pred_prob = 1.0 / (1.0 + np.exp(-z))
        pred_label = 0 if pred_prob < 0.5 else 1
        act_label = labels_mat[i][0] # scalar class label of item i
        pred_str = 'correct' if np.absolute(pred_label - act_label) < 1.0e-5 \
            else 'WRONG'
        print("%2d %0.4f %0.0f %0.0f %s" % (i, pred_prob, pred_label, act_label, pred_str))

    # Predict the class of a new, unseen item
    x = np.array([9.5, 4.5], dtype=np.float32)
    print("\nPredicting class for age, education = ")
    print(x)
    z = 0.0
    for j in range(0, features_dim):
        z += x[j] * weights[j]
    z += bias[0]
    p = 1.0 / (1.0 + np.exp(-z))
    print("Predicted p = " + str(p))
    if p < 0.5: print("Predicted class = 0")
    else: print("Predicted class = 1")

if __name__ == "__main__":
    main()
Setting weights and bias values.
Item pred_prob pred_label act_label result
0 0.3640 0 0 correct
1 0.7254 1 0 WRONG
2 0.2019 0 0 correct
3 0.3562 0 0 correct
4 0.0493 0 0 correct
5 0.1005 0 0 correct
6 0.7892 1 1 correct
7 0.8564 1 1 correct
8 0.9654 1 1 correct
9 0.7587 1 1 correct
10 0.3040 0 1 WRONG
11 0.7129 1 1 correct
Predicting class for age, education =
[9.5 4.5]
Predicted p = 0.526487952
Predicted class = 1
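The evaluation loop above computes a weighted sum of the features, adds the bias, and passes the result through the logistic sigmoid. As a minimal, dependency-free sketch of that computation (the function name logistic_predict and the toy values are illustrative assumptions, not part of the original program):

```python
import math

def logistic_predict(x, weights, bias):
    """Return (probability, label) for one item under a logistic regression model."""
    # Weighted sum of the features plus the bias term.
    z = sum(xj * wj for xj, wj in zip(x, weights)) + bias
    # The logistic sigmoid maps z to a probability in (0, 1).
    p = 1.0 / (1.0 + math.exp(-z))
    # Same decision rule as the program above: class 0 below 0.5, class 1 otherwise.
    return p, (0 if p < 0.5 else 1)

# Hypothetical toy values, chosen so that z is exactly 0.
p, label = logistic_predict([1.0, 2.0], [2.0, -1.0], 0.0)
print(p, label)  # 0.5 1
```

Wrapping the computation in a function makes it possible to reuse the same code for both the evaluation loop and the single-item prediction.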
This chapter covers the neural network concepts used in CNTK.
A neural network is built from several layers of neurons. In CNTK, we model those layers with the layer functions defined in the layers module.
Working with layers in CNTK has a distinctly functional-programming feel. A layer function looks like a regular function, and calling it produces a mathematical function with a set of predefined parameters. Let's see how to create the most basic layer type, Dense, using a layer function.
We can create the most basic layer type with the following steps −
Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.
from cntk.layers import Dense
Step 2 − Next, we need to import the input_variable function from the CNTK root package.
from cntk import input_variable
Step 3 − Now, we need to create a new input variable using the input_variable function, providing its size.
feature = input_variable(100)
Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons we want.
layer = Dense(40)(feature)
Invoking the configured Dense layer function on the input variable, as in Dense(40)(feature), connects the Dense layer to the input. The complete example is as follows −
from cntk.layers import Dense
from cntk import input_variable
feature= input_variable(100)
layer = Dense(40)(feature)
As we have seen, CNTK provides a good set of defaults for building NNs. The behavior and performance of the NN depend on the activation function and the other settings we choose, so it is worth understanding what we can configure.
Each layer in an NN has its own configuration options. For the Dense layer, the following settings are the most important −
shape − As the name implies, it defines the output shape of the layer, which in turn determines the number of neurons in that layer.
activation − It defines the activation function of the layer, which is applied to the weighted sum of the layer's inputs.
init − It defines the initialisation function of the layer, which sets the initial values of the layer's parameters when we start training the NN.
Let's see the steps to configure a Dense layer −
Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.
from cntk.layers import Dense
Step 2 − Next, we need to import the sigmoid operator from the CNTK ops package. We will use it as the activation function.
from cntk.ops import sigmoid
Step 3 − Now, we need to import the glorot_uniform initializer from the initializer package.
from cntk.initializer import glorot_uniform
Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons as the first argument, the sigmoid operator as the activation function, and glorot_uniform as the init function for the layer.
layer = Dense(50, activation = sigmoid, init = glorot_uniform)
from cntk.layers import Dense
from cntk.ops import sigmoid
from cntk.initializer import glorot_uniform
layer = Dense(50, activation = sigmoid, init = glorot_uniform)
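To make the three settings concrete, here is a rough sketch in plain Python of what a dense layer with 50 neurons, a sigmoid activation and a Glorot-style uniform initialiser computes for a 100-dimensional input. The helper names are hypothetical, and the limit sqrt(6 / (fan_in + fan_out)) is the commonly cited Glorot formula; CNTK's own implementation may differ in detail.

```python
import math
import random

def glorot_uniform_matrix(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: sample from [-limit, limit],
    # with limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense_forward(x, weights, bias):
    # One output per neuron: sigmoid(sum_i x[i] * W[i][j] + b[j]).
    return [sigmoid(sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j])
            for j in range(len(bias))]

rng = random.Random(0)
W = glorot_uniform_matrix(fan_in=100, fan_out=50, rng=rng)  # init
b = [0.0] * 50                                              # shape: 50 neurons
out = dense_forward([0.5] * 100, W, b)                      # activation: sigmoid
print(len(out))  # 50
```

The output length equals the number of neurons, and every output lies between 0 and 1 because of the sigmoid.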
So far, we have seen how to create the structure of an NN and how to configure its settings. Next, we will see how to optimise the parameters of an NN, which is done by a combination of two components: trainers and learners.
The first component is the trainer. It implements the backpropagation process: it passes the data through the NN to obtain a prediction.
After that, it uses another component, called the learner, to obtain new values for the parameters of the NN. It then applies these new values and repeats the process until an exit criterion is met.
The second component is the learner, which is responsible for performing the gradient descent algorithm.
The following are some of the learners included in the CNTK library −
Stochastic Gradient Descent (SGD) − This learner implements basic stochastic gradient descent, without any extras.
Momentum Stochastic Gradient Descent (MomentumSGD) − This learner adds momentum to SGD to help overcome the problem of local minima.
RMSProp − This learner uses decaying learning rates to control the rate of descent.
Adam − This learner uses decaying momentum to decrease the rate of descent over time.
Adagrad − This learner uses different learning rates for frequently and infrequently occurring features.
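The difference between plain gradient descent and its momentum variant can be sketched on a toy one-parameter loss. This is an illustration of the update rules only, not CNTK's implementation: the loss f(w) = (w - 3)^2, the function names and the hyperparameter values are all assumptions, and the gradient here is exact rather than estimated from a minibatch.

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

def run_sgd(w, lr, steps):
    # Plain gradient descent: step against the gradient, scaled by the learning rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_momentum_sgd(w, lr, momentum, steps):
    # Momentum keeps a running velocity, so past gradients keep influencing the step.
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w

w_sgd = run_sgd(0.0, lr=0.1, steps=100)
w_momentum = run_momentum_sgd(0.0, lr=0.1, momentum=0.9, steps=300)
print(w_sgd, w_momentum)  # both end up very close to 3
```

Both variants reach the minimum on this convex toy problem; the momentum term matters more on bumpy loss surfaces, where accumulated velocity can carry the parameters through shallow dips.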
This chapter elaborates on creating a neural network in CNTK.
To apply these CNTK concepts, we will build our first NN: a classifier that predicts the species of an iris flower from four physical properties, namely sepal width, sepal length, petal width and petal length. We will use the iris dataset, which describes these properties for different varieties of iris flowers.
Here, we will build a regular NN called a feedforward NN. Let us see the implementation steps to build the structure of the NN −
Step 1 − First, we import the necessary components from the CNTK library: our layer types, the activation functions, and a function that allows us to define an input variable for our NN.
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu
Step 2 − After that, we create our model using the Sequential function and pass it the layers we want. Here, we create two distinct layers in our NN: one with four neurons and another with three neurons.
model = Sequential([Dense(4, activation=relu), Dense(3, activation=log_softmax)])
Step 3 − Finally, to compile the NN, we bind the network to the input variable. The resulting network takes four input features and has an output layer with three neurons.
feature= input_variable(4)
z = model(feature)
There are many activation functions to choose from, and choosing the right one can make a big difference to how well our deep learning model performs.
The choice of activation function at the output layer depends on the kind of problem we are going to solve with our model.
For a regression problem, we should use a linear activation function on the output layer.
For a binary classification problem, we should use a sigmoid activation function on the output layer.
For a multi-class classification problem, we should use a softmax activation function on the output layer.
Here, we are building a model that predicts one of three classes, so we need a softmax activation function at the output layer.
Choosing an activation function for the hidden layer requires some experimentation: we monitor the performance to see which activation function works well.
In a classification problem, we need to predict the probability that a sample belongs to a specific class, so we need an activation function that gives us probabilistic values. The sigmoid activation function serves this purpose.
One of the major problems with the sigmoid function is the vanishing gradient problem. To overcome it, we can use the ReLU activation function, which converts all negative values to zero and passes positive values through unchanged.
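The three activation functions discussed above are simple enough to write out by hand. The following is a minimal sketch in plain Python, intended only to show the formulas; in a CNTK model we would use the operators from cntk.ops instead.

```python
import math

def sigmoid(v):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-v))

def relu(v):
    # Negative values become zero; positive values pass through unchanged.
    return max(0.0, v)

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    # The results are positive and sum to 1, so they can be read as class probabilities.
    return [e / total for e in exps]

print(sigmoid(0.0))           # 0.5
print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(softmax([2.0, 1.0, 0.1]))
```

The largest input to softmax always receives the largest probability, which is why the predicted class is the index of the highest output.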
Once we have the structure of our NN model, we must optimise it, and for that we need a loss function. Compared with activation functions, there are far fewer loss functions to choose from. Still, the choice of loss function depends on the kind of problem we are going to solve with our model.
For example, in a classification problem, we should use a loss function that measures the difference between a predicted class and an actual class.
For the classification problem we are going to solve with our NN model, the categorical cross-entropy loss function is the best candidate. In CNTK, it is implemented as cross_entropy_with_softmax, which can be imported from the cntk.losses package, as follows −
from cntk.losses import cross_entropy_with_softmax
label = input_variable(3)
loss = cross_entropy_with_softmax(z, label)
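As a worked illustration of categorical cross-entropy, the sketch below applies softmax to the raw outputs and takes the negative log-probability of the true class. It is written in plain Python under the usual textbook definition; the function name is hypothetical and this is not CNTK's implementation.

```python
import math

def cross_entropy_with_softmax_sketch(logits, one_hot_label):
    # Stable log-sum-exp: log(sum_k exp(logits[k])).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    # loss = -sum_k y[k] * log(softmax(logits)[k])
    return -sum(y * (v - log_sum_exp) for v, y in zip(logits, one_hot_label))

# With three equal outputs, every class gets probability 1/3, so the loss is ln 3.
uncertain = cross_entropy_with_softmax_sketch([0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
# A confident, correct prediction gives a much smaller loss.
confident = cross_entropy_with_softmax_sketch([5.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(uncertain, confident)
```

The loss shrinks as the network assigns more probability to the true class, which is exactly the behaviour the optimiser exploits.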
With the structure of our NN model and a loss function in place, we have all the ingredients to start optimising our deep learning model. Before diving into that, we should learn about metrics.
cntk.metrics
CNTK has a package named cntk.metrics from which we can import the metrics we are going to use. As we are building a classification model, we will use the classification_error metric, which produces a number between 0 and 1. This number is the fraction of samples that were predicted incorrectly −
First, we need to import the metric from the cntk.metrics package −
from cntk.metrics import classification_error
error_rate = classification_error(z, label)
The above function takes the output of the NN and the expected label as input.
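The idea behind a classification error metric can be sketched in a few lines of plain Python: compare the index of the largest network output with the index of the 1 in the one-hot label, and report the fraction of mismatches. The function name and the sample values are illustrative assumptions.

```python
def classification_error_sketch(outputs, one_hot_labels):
    wrong = 0
    for out, lab in zip(outputs, one_hot_labels):
        # Index of the largest value, i.e. the predicted and the actual class.
        predicted = max(range(len(out)), key=lambda k: out[k])
        actual = max(range(len(lab)), key=lambda k: lab[k])
        if predicted != actual:
            wrong += 1
    # Fraction of samples that were predicted incorrectly.
    return wrong / len(outputs)

# Hypothetical outputs for four samples; only the last one is misclassified.
outputs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
error = classification_error_sketch(outputs, labels)
print(error)  # 0.25
```

An error of 0.25 corresponds to an accuracy of 0.75, so lower values are better.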
Here, we will learn how to train a neural network in CNTK.
In the previous section, we defined all the components of the deep learning model. Now it is time to train it. As we discussed earlier, we can train an NN model in CNTK using a combination of a learner and a trainer.
In this section, we define the learner. CNTK provides several learners to choose from. For the model defined in the previous sections, we will use the Stochastic Gradient Descent (SGD) learner.
To train the neural network, let us configure the learner and trainer with the following steps −
Step 1 − First, we need to import the sgd function from the cntk.learners package.
from cntk.learners import sgd
Step 2 − Next, we need to import the Trainer class from the cntk.train.trainer package.
from cntk.train.trainer import Trainer
Step 3 − Now, we need to create a learner by invoking the sgd function with the model's parameters and a value for the learning rate.
learner = sgd(z.parameters, 0.01)
Step 4 − Finally, we need to initialize the trainer. It must be given the network, the combination of the loss and the metric, and the learner.
trainer = Trainer(z, (loss, error_rate), [learner])
The learning rate, which controls the speed of optimisation, should be a small number between 0.001 and 0.1. The complete example is as follows −
from cntk.learners import sgd
from cntk.train.trainer import Trainer
learner = sgd(z.parameters, 0.01)
trainer = Trainer(z, (loss, error_rate), [learner])
Once we have chosen and configured the trainer, it is time to load the dataset. We have saved the iris dataset as a .csv file, and we will use the data wrangling package pandas to load it.
Step 1 − First, we need to import the pandas package.
import pandas as pd
Step 2 − Now, we need to invoke the read_csv function to load the .csv file from disk.
df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width',
    'petal_length', 'petal_width', 'species'], index_col=False)
Once we load the dataset, we need to split it into a set of features and a label.
Step 1 − First, we need to select all rows and the first four columns from the dataset. This can be done with the iloc indexer.
x = df_source.iloc[:, :4].values
Step 2 − Next, we need to select the species column from the iris dataset. We use the values property to access the underlying numpy array.
y = df_source['species'].values
Our model is a classifier that works on numeric values. Hence, we need to encode the species column as a numeric vector. Let's see the steps to do it −
Step 1 − First, we define the label_mapping dictionary, which maps each species name to a numeric index. A list comprehension will later iterate over all elements of the array and look up each value in this dictionary.
label_mapping = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
Step 2 − Next, we convert each numeric index to a one-hot encoded vector, using the following one_hot function −
import numpy as np
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
Step 3 − Finally, we need to turn the converted list into a numpy array.
y = np.array([one_hot(label_mapping[v], 3) for v in y])
Overfitting is the situation in which a model memorises the training samples but cannot deduce general rules from them. To detect overfitting, we set aside part of the data as a test set that the model does not see during training, with the following steps −
Step 1 − First, import the train_test_split function from the model_selection module of the sklearn package.
from sklearn.model_selection import train_test_split
Step 2 − Next, we need to invoke the train_test_split function with the features x and the labels y as follows −
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
    stratify=y)
We specified a test_size of 0.2 to set aside 20% of the total data as the test set.
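The stratify argument asks for a split that keeps the class proportions the same in the training and test sets. A rough sketch of that idea in plain Python follows; the helper stratified_split is hypothetical and is not how scikit-learn implements it internally.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed=0):
    """Return (train_indices, test_indices), keeping class proportions in both."""
    rng = random.Random(seed)
    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical dataset with three equally sized classes.
labels = ['a'] * 50 + ['b'] * 50 + ['c'] * 50
train_idx, test_idx = stratified_split(labels, test_size=0.2)
print(len(train_idx), len(test_idx))  # 120 30
```

Without stratification, a random 20% sample could by chance contain very few items of one class, which would make the test metric less reliable.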
Step 1 − To train our model, we invoke the train_minibatch method on the trainer and give it a dictionary that maps the input data to the input variables we used to define the NN and its associated loss function.
trainer.train_minibatch({feature: X_train, label: y_train})
Step 2 − Next, call train_minibatch repeatedly by using the following for loop −
for _epoch in range(10):
    trainer.train_minibatch({feature: X_train, label: y_train})
    print('Loss: {}, Error: {}'.format(
        trainer.previous_minibatch_loss_average,
        trainer.previous_minibatch_evaluation_average))
# feature, label and trainer are the objects defined in the previous sections
import numpy as np
import pandas as pd
df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width',
    'petal_length', 'petal_width', 'species'], index_col=False)
x = df_source.iloc[:, :4].values
y = df_source['species'].values
label_mapping = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
y = np.array([one_hot(label_mapping[v], 3) for v in y])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y)
trainer.train_minibatch({feature: X_train, label: y_train})
for _epoch in range(10):
    trainer.train_minibatch({feature: X_train, label: y_train})
    print('Loss: {}, Error: {}'.format(
        trainer.previous_minibatch_loss_average,
        trainer.previous_minibatch_evaluation_average))
Whenever we pass data through the trainer, it measures the performance of the model using the metric that we configured for the trainer. This measurement is made on the training data. For a full analysis of the model's performance, we need to use test data as well.
To measure the performance of the model on the test data, we can invoke the test_minibatch method on the trainer as follows −
trainer.test_minibatch({feature: X_test, label: y_test})
Once we have trained a deep learning model, the most important thing is to make predictions with it. To make a prediction with the NN trained above, we can follow these steps −
Step 1 − First, pick a random item index from the test set using the np.random.choice function.
Step 2 − Next, select the sample data from the test set by using sample_index.
Step 3 − Now, to convert the numeric output of the NN to an actual label, create an inverted mapping.
Step 4 − Now, use the selected sample data to make a prediction by invoking the NN z as a function.
Step 5 − Once we have the predicted output, take the index of the neuron that has the highest value as the predicted class. This can be done with the np.argmax function from the numpy package.
Step 6 − Finally, convert the index value into the real label by using inverted_mapping.
sample_index = np.random.choice(X_test.shape[0])
sample = X_test[sample_index]
# np.argmax returns 0, 1 or 2, so the keys must start at 0
inverted_mapping = {
    0: 'Iris-setosa',
    1: 'Iris-versicolor',
    2: 'Iris-virginica'
}
prediction = z(sample)
predicted_label = inverted_mapping[np.argmax(prediction)]
print(predicted_label)
After training the above deep learning model and running it, you will get output similar to the following (the sample is picked at random, so the predicted species can differ between runs) −
Iris-versicolor
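Because the output layer of this model uses log_softmax, the values it returns are log-probabilities. The sketch below, in plain Python with made-up output values, shows how such an output could be decoded into a label and a probability; the helper decode_prediction is a hypothetical name, not a CNTK function.

```python
import math

inverted_mapping = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}

def decode_prediction(log_probs):
    # exp() undoes the log in log_softmax and recovers the class probabilities.
    probs = [math.exp(v) for v in log_probs]
    # The predicted class is the index of the largest probability (what np.argmax does).
    best = max(range(len(probs)), key=lambda k: probs[k])
    return inverted_mapping[best], probs[best]

# Hypothetical network output for one sample, given as log-probabilities.
log_probs = [math.log(0.1), math.log(0.7), math.log(0.2)]
predicted_label, probability = decode_prediction(log_probs)
print(predicted_label)  # Iris-versicolor
```

Since the logarithm is monotonic, taking the argmax of the log-probabilities directly gives the same class; the exponentiation is only needed when the probability itself is of interest.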
In this chapter, we will learn how to work with in-memory and large datasets in CNTK.
There are many ways to feed data into a CNTK trainer; which one to use depends on the size of the dataset and the format of the data. A dataset can be small enough to fit in memory, or it can be large.
In this section, we are going to work with in-memory datasets. For this, we will use the following two frameworks: Numpy and Pandas.
Here, we will work with a randomly generated, numpy-based dataset in CNTK. In this example, we simulate data for a binary classification problem. Suppose we have a set of observations with 4 features and want to predict two possible labels with our deep learning model.
For this, we must first generate a set of labels containing a one-hot vector representation of the labels we want to predict. This can be done with the following steps −
Step 1 − Import the numpy package and set the number of samples as follows −
import numpy as np
num_samples = 20000
Step 2 − Next, generate a label mapping by using the np.eye function as follows −
label_mapping = np.eye(2)
Step 3 − Now, by using the np.random.choice function, draw 20000 random labels as follows −
y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)
Step 4 − Finally, by using the np.random.random function, generate an array of random floating-point feature values as follows −
x = np.random.random(size=(num_samples, 4)).astype(np.float32)
Both arrays are converted to 32-bit floating-point numbers with astype(np.float32) so that they match the format expected by CNTK. With the data in place, let's follow the steps below to build the model −
Step 5 − Import the Dense and Sequential layer functions from the cntk.layers module as follows −
from cntk.layers import Dense, Sequential
Step 6 − Now, we need to import the activation function for the layers in the network, together with the input_variable and default_options functions. Let us import sigmoid as the activation function −
from cntk import input_variable, default_options
from cntk.ops import sigmoid
Step 7 − Now, we need to import the loss function to train the network. Let us import binary_cross_entropy as the loss function −
from cntk.losses import binary_cross_entropy
Step 8 − Next, we need to define the default options for the network. Here, we provide the sigmoid activation function as the default setting, and create the model by using the Sequential layer function as follows −
with default_options(activation=sigmoid):
    model = Sequential([Dense(6),Dense(2)])
Step 9 − Next, initialise an input_variable with 4 input features, serving as the input for the network.
features = input_variable(4)
Step 10 − Finally, to complete the network, we need to connect the features variable to the NN.
z = model(features)
Now that we have an NN, let us train it on the in-memory dataset with the following steps −
Step 11 − To train this NN, we first need to import a learner from the cntk.learners module. We will import the sgd learner as follows −
from cntk.learners import sgd
Step 12 − Along with that, import ProgressPrinter from the cntk.logging module and create an instance of it.
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
Step 13 − Next, define a new input variable for the labels as follows −
labels = input_variable(2)
Step 14 − Next, to train the NN model, we need to define a loss using the binary_cross_entropy function, providing the model z and the labels variable.
loss = binary_cross_entropy(z, labels)
Step 15 − Next, initialize the sgd learner as follows −
learner = sgd(z.parameters, lr=0.1)
Step 16 − Finally, call the train method on the loss function, providing it with the input data, the sgd learner and the progress_writer −
training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])
import numpy as np
num_samples = 20000
label_mapping = np.eye(2)
y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)
x = np.random.random(size=(num_samples, 4)).astype(np.float32)
from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid
from cntk.losses import binary_cross_entropy
with default_options(activation=sigmoid):
    model = Sequential([Dense(6),Dense(2)])
features = input_variable(4)
z = model(features)
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(2)
loss = binary_cross_entropy(z, labels)
learner = sgd(z.parameters, lr=0.1)
training_summary=loss.train((x,y),parameter_learners=[learner],callbacks=[progress_writer])
Build info:
Built time: *** ** **** 21:40:10
Last modified date: *** *** ** 21:08:46 2019
Build type: Release
Build target: CPU-only
With ASGD: yes
Math lib: mkl
Build Branch: HEAD
Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
MPI distribution: Microsoft MPI
MPI version: 7.0.12437.6
-------------------------------------------------------------------
average since average since examples
loss last metric last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.52 1.52 0 0 32
1.51 1.51 0 0 96
1.48 1.46 0 0 224
1.45 1.42 0 0 480
1.42 1.4 0 0 992
1.41 1.39 0 0 2016
1.4 1.39 0 0 4064
1.39 1.39 0 0 8160
1.39 1.39 0 0 16352
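The loss in the log above levels off at about 1.39. One plausible reading, assuming that binary_cross_entropy sums the per-output losses over the two output neurons, is that the network has nothing to learn: the labels are random and unrelated to the features, so the best it can do is predict 0.5 for each output. That gives a loss of 2 × ln 2 ≈ 1.386, as the following plain-Python sketch shows (the function name is illustrative).

```python
import math

def binary_cross_entropy_sketch(predictions, targets):
    # Sum over output neurons of -[t * log(p) + (1 - t) * log(1 - p)].
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(predictions, targets))

# Two output neurons that both predict 0.5, with a one-hot target.
plateau_loss = binary_cross_entropy_sketch([0.5, 0.5], [1.0, 0.0])
print(round(plateau_loss, 2))  # 1.39
```

On a dataset where the features actually carry information about the labels, we would expect the loss to keep falling below this value.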
Numpy arrays are one of the most basic ways of storing data, but they are limited in what they can contain: a single n-dimensional array can hold data of only one data type. For many real-world cases, we need a library that can handle more than one data type in a single dataset.
The Python library Pandas makes it easier to work with such datasets. It introduces the concept of a DataFrame (DF) and allows us to load datasets stored on disk in various formats as DFs. For example, we can read datasets stored as CSV, JSON, Excel, etc.
You can learn about the Python Pandas library in more detail at https://www.tutorialspoint.com/python_pandas/index.htm.
In this example, we classify three possible species of iris flowers based on four properties. We created this deep learning model in the previous sections too. The model is as follows −
from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
from cntk.losses import binary_cross_entropy
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)
The above model contains one hidden layer and an output layer with three neurons to match the number of classes we can predict.
Next, we will use the train method and a loss function to train the network. For this, we must first load and preprocess the iris dataset so that it matches the layout and data format expected by the NN. This can be done with the following steps −
Step 1 − Import the numpy and Pandas packages as follows −
import numpy as np
import pandas as pd
Step 2 − Next, use the read_csv function to load the dataset into memory −
df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width',
    'petal_length', 'petal_width', 'species'], index_col=False)
Step 3 − Now, we need to create a dictionary that maps the labels in the dataset to their corresponding numeric representation.
label_mapping = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
Step 4 − Now, by using the iloc indexer on the DataFrame, select the first four columns as follows −
x = df_source.iloc[:, :4].values
Step 5 − Next, we need to select the species column as the labels for the dataset. This can be done as follows −
y = df_source['species'].values
Step 6 − Now, we need to map the labels in the dataset, which can be done by using label_mapping. Also, use one_hot encoding to convert them into one-hot encoding arrays.
步骤6-现在,我们需要在数据集中映射标签,这可以通过使用label_mapping来完成。 另外,使用one_hot编码将它们转换为one-hot编码数组。
y = np.array([one_hot(label_mapping[v], 3) for v in y])
Step 7 − Next, to use the features and the mapped labels with CNTK, we need to convert them both to floats −
步骤7-接下来,要将特征和映射标签与CNTK一起使用,我们需要将它们都转换为浮点数-
x= x.astype(np.float32)
y= y.astype(np.float32)
As we know, the labels are stored in the dataset as strings, and CNTK cannot work with strings. It needs one-hot encoded vectors representing the labels. For this, we can define a function, say one_hot, as follows (it must be defined before Step 6 is executed) −
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
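As an illustration of the encoding, the sketch below shows a one_hot function that sets a single position to 1, next to a vectorised variant that encodes a whole list of labels at once. The helper name one_hot_batch and the three sample labels are assumptions made for this example only; they are not part of CNTK or of the tutorial's dataset handling.

```python
import numpy as np

# Same mapping as in Step 3; the keys are assumed to match the CSV strings.
label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

def one_hot(index, length):
    """Return a vector of zeros with a single 1 at position index."""
    result = np.zeros(length)
    result[index] = 1
    return result

def one_hot_batch(labels, mapping):
    """Hypothetical helper: encode many string labels by indexing an identity matrix."""
    indices = np.array([mapping[v] for v in labels])
    return np.eye(len(mapping), dtype=np.float32)[indices]

sample = ['Iris-Versicolor', 'Iris-Setosa', 'Iris-Virginica']
encoded = one_hot_batch(sample, label_mapping)
print(encoded)
```

Both approaches should produce the same rows; the vectorised form simply avoids the Python loop and returns float32 values directly.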
Now that we have the numpy arrays in the correct format, we can use them to train our model with the help of the following steps −
Step 8 − First, we need to import a loss function to train the network. Because the labels are one-hot vectors over three classes, let us import cross_entropy_with_softmax as the loss function −
from cntk.losses import cross_entropy_with_softmax
Step 9 − To train this NN, we also need to import a learner from the cntk.learners module. We will import the sgd learner as follows −
from cntk.learners import sgd
Step 10 − Along with that, import ProgressPrinter from the cntk.logging module and create an instance of it −
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
Step 11 − Next, define a new input variable for the labels as follows −
labels = input_variable(3)
Step 12 − In order to train the NN model, we next need to define a loss using the cross_entropy_with_softmax function, providing it with the model z and the labels variable −
loss = cross_entropy_with_softmax(z, labels)
Step 13 − Next, initialise the sgd learner as follows −
learner = sgd(z.parameters, 0.1)
Step 14 − At last, call the train method on the loss function, providing it with the input data, the sgd learner and the progress_writer −
training_summary = loss.train((x, y), parameter_learners=[learner],
    callbacks=[progress_writer], minibatch_size=16, max_epochs=5)
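To make the loss from Step 12 less of a black box, here is a minimal NumPy sketch of a softmax followed by cross-entropy against one-hot labels. It is an illustration of the general formula, not CNTK's implementation; the function name softmax_cross_entropy and the sample values are assumptions for this example.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class contributes for one-hot labels.
    return float(-(one_hot_labels * log_probs).sum(axis=1).mean())

logits = np.array([[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
loss_value = softmax_cross_entropy(logits, targets)
print(loss_value)
```

One consequence worth noting: when all three scores are equal, the loss is ln 3, which is about 1.0986. A loss close to 1.1 at the start of training a three-class network is therefore what one would expect from a model that has not learned anything yet.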
from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)
import numpy as np
import pandas as pd
# one_hot must be defined before it is used to encode the labels
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
x = df_source.iloc[:, :4].values
y = df_source['species'].values
y = np.array([one_hot(label_mapping[v], 3) for v in y])
x = x.astype(np.float32)
y = y.astype(np.float32)
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(3)
loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)
training_summary = loss.train((x, y), parameter_learners=[learner], callbacks=[progress_writer], minibatch_size=16, max_epochs=5)
Build info:
Built time: *** ** **** 21:40:10
Last modified date: *** *** ** 21:08:46 2019
Build type: Release
Build target: CPU-only
With ASGD: yes
Math lib: mkl
Build Branch: HEAD
Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
MPI distribution: Microsoft MPI
MPI version: 7.0.12437.6
-------------------------------------------------------------------
 average      since    average      since      examples
    loss       last     metric       last
------------------------------------------------------
Learning rate per minibatch: 0.1
     1.1        1.1          0          0            16
   0.835      0.704          0          0            32
   1.993       1.11          0          0            48
    1.14       1.14          0          0           112
[………]
In the previous section, we worked with small in-memory datasets using NumPy and pandas, but not all datasets are that small. Datasets containing images, videos or sound samples are especially large. To work with such datasets, CNTK provides MinibatchSource, a component that loads data in chunks. Some of the features of MinibatchSource are as follows −
MinibatchSource can help prevent the NN from overfitting by automatically randomizing the samples read from the data source.
It has a built-in transformation pipeline which can be used to augment the data.
It loads the data on a background thread separate from the training process.
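The idea of reading data in chunks and randomizing it can be sketched in plain Python. The generator below is only a conceptual illustration under stated assumptions: the names minibatches, read_chunk and toy_chunk are hypothetical, it does no background loading, and it does not reflect how MinibatchSource is implemented internally.

```python
import random

def minibatches(read_chunk, num_chunks, minibatch_size, seed=0):
    """Yield shuffled minibatches while holding only one chunk in memory.

    read_chunk is a hypothetical callable: read_chunk(i) returns the samples
    stored in chunk i (for example, one slice of a large file).
    """
    rng = random.Random(seed)
    chunk_order = list(range(num_chunks))
    rng.shuffle(chunk_order)              # randomize the order of the chunks
    for chunk_id in chunk_order:
        samples = list(read_chunk(chunk_id))
        rng.shuffle(samples)              # randomize samples within the chunk
        for start in range(0, len(samples), minibatch_size):
            yield samples[start:start + minibatch_size]

# Toy data source: 4 chunks of 5 integers each stand in for data on disk.
def toy_chunk(i):
    return range(i * 5, (i + 1) * 5)

batches = list(minibatches(toy_chunk, num_chunks=4, minibatch_size=2))
print(len(batches), batches[0])
```

Every sample is delivered exactly once per pass, but only one chunk has to be held in memory at any time, which is the property that matters for datasets larger than RAM.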
In the following sections, we are going to explore how to use a minibatch source to work with large datasets that do not fit in memory. We will also explore how to use it to feed data for training a NN.
In the previous section, we used the iris flower example and worked with a small in-memory dataset using pandas DataFrames. Here, we will replace the code that uses data from a pandas DataFrame with MinibatchSource. First, we need to create an instance of MinibatchSource with the help of the following steps −
Step 1 − First, import the components for the minibatch source from the cntk.io module as follows −
from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
Step 2 − Now, by using the StreamDef class, create a stream definition for the labels.
labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
Step 3 − Next, to read the features field from the input file, create another instance of StreamDef as follows.
features_stream = StreamDef(field='features', shape=4, is_sparse=False)
Step 4 − Now, we need to provide the iris.ctf file as input and initialise the deserializer as follows −
deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=labels_stream,
    features=features_stream))
Step 5 − At last, we need to create an instance of MinibatchSource by using the deserializer as follows −
minibatch_source = MinibatchSource(deserializer, randomize=True)
from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
features_stream = StreamDef(field='features', shape=4, is_sparse=False)
deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=labels_stream, features=features_stream))
minibatch_source = MinibatchSource(deserializer, randomize=True)
As you have seen above, we are taking the data from the 'iris.ctf' file. Its file format is called CNTK Text Format (CTF). The MinibatchSource instance we created above reads its data from this file, so we must create it first. Let us see how we can create a CTF file.
Step 1 − First, we need to import the pandas and numpy packages as follows −
import pandas as pd
import numpy as np
Step 2 − Next, we need to load our data file, i.e. iris.csv, into memory and store it in the df_source variable.
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
Step 3 − Now, use the iloc indexer to take the content of the first four columns as the features, and use the data from the species column as the labels, as follows −
features = df_source.iloc[: , :4].values
labels = df_source['species'].values
Step 4 − Next, we need to create a mapping between the label name and its numeric representation. It can be done by creating label_mapping as follows −
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
Step 5 − Now, convert the labels to a set of one-hot encoded vectors as follows −
labels = [one_hot(label_mapping[v], 3) for v in labels]
As we did before, create a utility function called one_hot to encode the labels (it must be defined before Step 5 is executed). It can be done as follows −
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
Now that we have loaded and preprocessed the data, it's time to store it on disk in the CTF file format. Each line holds one sample, and each stream starts with a | character followed by the stream name. We can do it with the help of the following Python code −
with open('iris.ctf', 'w') as output_file:
    for index in range(0, features.shape[0]):
        feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
        label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
        output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))
import pandas as pd
import numpy as np
# one_hot must be defined before it is used to encode the labels
def one_hot(index, length):
    result = np.zeros(length)
    result[index] = 1
    return result
df_source = pd.read_csv('iris.csv', names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
features = df_source.iloc[: , :4].values
labels = df_source['species'].values
label_mapping = {'Iris-Setosa' : 0, 'Iris-Versicolor' : 1, 'Iris-Virginica' : 2}
labels = [one_hot(label_mapping[v], 3) for v in labels]
with open('iris.ctf', 'w') as output_file:
    for index in range(0, features.shape[0]):
        feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
        label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
        output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))
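A quick way to sanity-check a CTF line before handing the file to CNTK is to format one sample and parse it back. The sketch below does this in plain Python; the helper names to_ctf_line and parse_ctf_line are assumptions for this example, and the parser only handles the simple dense layout used in this chapter, not the full CTF specification.

```python
def to_ctf_line(features, labels):
    """Format one sample as a CTF line with |features and |labels streams."""
    feature_values = ' '.join(str(v) for v in features)
    label_values = ' '.join(str(v) for v in labels)
    return '|features {} |labels {}'.format(feature_values, label_values)

def parse_ctf_line(line):
    """Split a simple dense CTF line into a {stream_name: [float, ...]} dict."""
    streams = {}
    for part in line.strip().split('|'):
        fields = part.split()
        if fields:                       # skip the empty text before the first '|'
            streams[fields[0]] = [float(v) for v in fields[1:]]
    return streams

line = to_ctf_line([5.1, 3.5, 1.4, 0.2], [1.0, 0.0, 0.0])
parsed = parse_ctf_line(line)
print(line)
print(parsed)
```

If the parsed dictionary does not contain exactly the stream names given to StreamDef (here features and labels), the deserializer would not be able to find the data either.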
Once we have created the MinibatchSource instance, we can use it to train the model. We can use the same training logic as we used with small in-memory datasets. Here, we will use the MinibatchSource instance as the input for the train method on the loss function as follows −
Step 1 − In order to log the output of the training session, first import ProgressPrinter from the cntk.logging module as follows −
from cntk.logging import ProgressPrinter
Step 2 − Next, to set up the training session, import Trainer and training_session from the cntk.train module as follows −
from cntk.train import Trainer, training_session
Step 3 − Now, we need to define a set of constants, namely minibatch_size, samples_per_epoch and num_epochs, as follows −
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
Step 4 − Next, so that CNTK knows how to read data during training, we need to define a mapping between the input variables of the network and the streams in the minibatch source. Here, features and labels are the input variables of the network defined earlier, not the NumPy arrays used to create the CTF file.
input_map = {
    features: minibatch_source.streams.features,
    labels: minibatch_source.streams.labels
}
Step 5 − Next, to log the output of the training process, initialise the progress_writer variable with a new ProgressPrinter instance as follows −
progress_writer = ProgressPrinter(0)
Step 6 − At last, we need to invoke the train method on the loss as follows −
train_history = loss.train(minibatch_source,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer],
    epoch_size=samples_per_epoch,
    max_epochs=num_epochs)
from cntk.logging import ProgressPrinter
from cntk.train import Trainer, training_session
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
input_map = {
    features: minibatch_source.streams.features,
    labels: minibatch_source.streams.labels
}
progress_writer = ProgressPrinter(0)
train_history = loss.train(minibatch_source,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer],
    epoch_size=samples_per_epoch,
    max_epochs=num_epochs)
-------------------------------------------------------------------
 average      since    average      since      examples
    loss       last     metric       last
------------------------------------------------------
Learning rate per minibatch: 0.1
    1.21       1.21          0          0            32
    1.15       0.12          0          0            96
[………]
This chapter explains how to measure model performance in CNTK.
After building an ML model, we train it using a set of data samples. Through this training, the model learns and derives some general rules. What matters is how the model performs when we feed it new samples, i.e. samples different from those provided during training. The model behaves differently in that case, and it may be worse at making good predictions on those new samples.
But the model must work well on new samples too, because in a production environment it will receive different input than the sample data we used for training. That is why we should validate the ML model using a set of samples different from the ones used for training. Here, we are going to discuss two different techniques for creating a dataset for validating a NN.
The hold-out dataset technique is one of the easiest methods for creating a dataset to validate a NN. As the name implies, in this method we hold back one set of samples from training (say 20%) and use it to test the performance of our ML model. The following diagram shows the ratio between training and validation samples −
The hold-out approach ensures that we have enough data to train our ML model while keeping a reasonable number of samples to get a good measurement of the model's performance.
It is good practice to choose the samples for the training set and the test set at random from the main dataset. This ensures an even distribution between the training and test sets.
Following is an example in which we produce our own hold-out dataset by using the train_test_split function from the scikit-learn library.
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Here above test_size = 0.2 represents that we provided 20% of the data as test data.
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
classifier_knn = KNeighborsClassifier(n_neighbors=3)
classifier_knn.fit(X_train, y_train)
y_pred = classifier_knn.predict(X_test)
# Providing sample data and the model will make prediction out of that data
sample = [[5, 5, 3, 2], [2, 4, 3, 5]]
preds = classifier_knn.predict(sample)
pred_species = [iris.target_names[p] for p in preds]
print("Predictions:", pred_species)
Predictions: ['versicolor', 'virginica']
While using CNTK, we need to randomise the order of our dataset each time we train our model because −
Deep learning algorithms are highly influenced by random-number generators.
The order in which we provide the samples to the NN during training greatly affects its performance.
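When the data is held in NumPy arrays, randomising the order means applying the same permutation to the features and to the labels so that each row keeps its label. The following is a minimal sketch; the function name shuffle_in_unison and the toy arrays are assumptions for this example.

```python
import numpy as np

def shuffle_in_unison(x, y, seed=None):
    """Return x and y reordered by the same random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    return x[order], y[order]

features = np.arange(10, dtype=np.float32).reshape(5, 2)   # rows [0,1], [2,3], ...
labels = np.arange(5)                                        # label i belongs to row i
shuffled_x, shuffled_y = shuffle_in_unison(features, labels, seed=42)
print(shuffled_x)
print(shuffled_y)
```

Passing a fixed seed makes a run repeatable, while leaving it as None gives a different order on every call.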
The major downside of the hold-out dataset technique is that it is unreliable: the measured performance depends on which samples happen to end up in the test set, so sometimes we get very good results and sometimes bad ones.
To measure the performance of our ML model more reliably, there is a technique called K-fold cross validation. In essence, it is the same as the hold-out technique, but it repeats the procedure several times, usually about 5 to 10 times, each time holding out a different part of the dataset. The following diagram represents its concept −
The working of K-fold cross validation can be understood with the help of the following steps −
Step 1 − As in the hold-out dataset technique, in K-fold cross validation we first need to split the dataset into a training set and a test set. A typical ratio is 80-20, i.e. 80% of the data for the training set and 20% for the test set.
Step 2 − Next, we need to train our model using the training set.
Step 3 − At last, we use the test set to measure the performance of our model. The only difference between the hold-out dataset technique and the K-fold cross validation technique is that the above process is repeated, usually 5 to 10 times, and at the end the average is calculated over all the performance metrics. That average is the final performance metric.
Let us see an example with a small dataset −
from numpy import array
from sklearn.model_selection import KFold
data = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
for train, test in kfold.split(data):
    print('train: %s, test: %s' % (data[train], data[test]))
train: [0.1 0.2 0.4 0.5 0.6 0.7 0.8 0.9], test: [0.3 1. ]
train: [0.1 0.2 0.3 0.4 0.6 0.8 0.9 1. ], test: [0.5 0.7]
train: [0.2 0.3 0.5 0.6 0.7 0.8 0.9 1. ], test: [0.1 0.4]
train: [0.1 0.3 0.4 0.5 0.6 0.7 0.9 1. ], test: [0.2 0.8]
train: [0.1 0.2 0.3 0.4 0.5 0.7 0.8 1. ], test: [0.6 0.9]
As we can see, because it uses a more realistic training and test scenario, the K-fold cross validation technique gives us a much more stable performance measurement. On the downside, it takes a lot of time when validating deep learning models, because the model is trained once per fold.
CNTK does not have built-in support for K-fold cross validation, hence we need to write our own script to do so.
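A script of that kind could be organised as in the sketch below. It is written with NumPy only, and the training step is passed in as a callback so that the fold logic stays independent of any framework. The names k_fold_indices, cross_validate and train_and_score are hypothetical, and the majority-class baseline merely stands in for training and evaluating a real network.

```python
import numpy as np

def k_fold_indices(num_samples, k, seed=0):
    """Split shuffled sample indices into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_samples), k)

def cross_validate(x, y, k, train_and_score):
    """Average the score returned by train_and_score over k folds.

    train_and_score is a hypothetical callback: it receives
    (x_train, y_train, x_test, y_test), trains a fresh model and
    returns a single number such as accuracy.
    """
    folds = k_fold_indices(len(x), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_score(x[train_idx], y[train_idx],
                                      x[test_idx], y[test_idx]))
    return float(np.mean(scores)), scores

# Stand-in "model": always predicts the most frequent training label.
def majority_baseline(x_train, y_train, x_test, y_test):
    majority = np.bincount(y_train).argmax()
    return float(np.mean(y_test == majority))

x = np.arange(20, dtype=np.float32).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
mean_score, fold_scores = cross_validate(x, y, k=5, train_and_score=majority_baseline)
print(mean_score, fold_scores)
```

With CNTK, the callback would build a new model and trainer for each fold, so that no weights learned on one fold leak into the next.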
Whether we use the hold-out dataset or the K-fold cross validation technique, we will discover that the metrics differ between the dataset used for training and the dataset used for validation.
Overfitting is a situation where our ML model fits the training data exceptionally well but fails to perform well on the testing data, i.e. it is not able to predict the test data.
It happens when an ML model learns specific patterns and noise from the training data to such an extent that it negatively impacts the model's ability to generalise from the training data to new, i.e. unseen, data. Here, noise is the irrelevant information or randomness in a dataset.
Following are the two ways with the help of which we can detect whether our model is overfit or not −
The overfit model will perform well on the same samples we used for training, but it will perform very badly on new samples, i.e. samples different from the training samples.
The model is overfit if, during validation, a score such as accuracy is noticeably lower on the test set than the same metric on the training set.
Another situation that can arise is underfitting. This is a situation where our ML model does not fit the training data well and fails to predict useful output. When we start training the first epoch, our model will be underfit, but it will become less underfit as training progresses.
One of the ways to detect whether our model is underfit is to look at the metrics for the training set and the test set. The model is likely underfit if the metric is poor on the training set itself, or if the metric on the test set is higher than the metric on the training set.
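The comparisons described above can be turned into a rough check, as in the sketch below. It assumes the metric is an accuracy between 0 and 1, and the function name diagnose_fit as well as the gap and floor thresholds are illustrative assumptions, not established rules; sensible values depend on the problem.

```python
def diagnose_fit(train_accuracy, test_accuracy, gap=0.10, floor=0.70):
    """Rough heuristic; gap and floor are illustrative thresholds only."""
    if train_accuracy < floor and test_accuracy < floor:
        return 'possibly underfit'       # poor on both sets
    if train_accuracy - test_accuracy > gap:
        return 'possibly overfit'        # good on training data, worse on new data
    return 'no obvious problem'

print(diagnose_fit(0.99, 0.80))
print(diagnose_fit(0.55, 0.58))
print(diagnose_fit(0.95, 0.93))
```

Such a check is only a starting point; looking at how both metrics evolve over the epochs usually tells more than a single pair of numbers.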
In this chapter, we will study how to build a neural network classifier by using CNTK.
Classification may be defined as the process of predicting categorical output labels or responses for the given input data. The categorised output, which is based on what the model has learned in the training phase, can take forms such as "Black" or "White", or "spam" or "no spam".
Mathematically, it is the task of approximating a mapping function, say f, from input variables, say X, to output variables, say Y.
A classic example of a classification problem is spam detection in e-mails. There can be only two categories of output, "spam" and "no spam".
To implement such classification, we first need to train the classifier using e-mails labelled "spam" and "no spam" as the training data. Once the classifier has been trained successfully, it can be used to classify an unknown e-mail.
Here, we are going to create a 4-5-3 NN using the iris flower dataset, having the following −
4 input nodes (one for each predictor value).
5 hidden processing nodes.
3 output nodes (because there are three possible species in the iris dataset).
We will be using the iris flower dataset, from which we want to classify species of iris flowers based on the physical properties of sepal width and length, and petal width and length. The dataset describes the following properties of different varieties of iris flowers −
Sepal length
Sepal width
Petal length
Petal width
Class, i.e. iris setosa, iris versicolor or iris virginica
We have the iris.csv file, which we also used in previous chapters. It can be loaded with the help of the pandas library. But before loading it for our classifier, we need to prepare the training and test files so that they can be used easily with CNTK.
The Iris dataset is one of the most popular datasets for ML projects. It has 150 data items, and the raw data looks as follows −
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
…
7.0 3.2 4.7 1.4 versicolor
6.4 3.2 4.5 1.5 versicolor
…
6.3 3.3 6.0 2.5 virginica
5.8 2.7 5.1 1.9 virginica
As mentioned earlier, the first four values on each line describe the physical properties of the flower, i.e. sepal length, sepal width, petal length and petal width; the last value is the species.
But we have to convert the data into a format that can be easily used by CNTK, namely a .ctf file (we also created one, iris.ctf, in the previous section). It looks as follows −
|attribs 5.1 3.5 1.4 0.2|species 1 0 0
|attribs 4.9 3.0 1.4 0.2|species 1 0 0
…
|attribs 7.0 3.2 4.7 1.4|species 0 1 0
|attribs 6.4 3.2 4.5 1.5|species 0 1 0
…
|attribs 6.3 3.3 6.0 2.5|species 0 0 1
|attribs 5.8 2.7 5.1 1.9|species 0 0 1
In the above data, the |attribs tag marks the start of the feature values and the |species tag marks the class label values. We can use any other tag names we wish, and we can even add an item ID as well. Text following |# is treated as a comment. For example, look at the following data −
|ID 001 |attribs 5.1 3.5 1.4 0.2|species 1 0 0 |#setosa
|ID 002 |attribs 4.9 3.0 1.4 0.2|species 1 0 0 |#setosa
…
|ID 051 |attribs 7.0 3.2 4.7 1.4|species 0 1 0 |#versicolor
|ID 052 |attribs 6.4 3.2 4.5 1.5|species 0 1 0 |#versicolor
…
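Converting a raw row into the |attribs and |species layout shown above can be sketched in a few lines. The function name raw_row_to_ctf is an assumption for this example, and the species order in the SPECIES list is taken from the one-hot positions used in the sample data (setosa first, virginica last).

```python
SPECIES = ['setosa', 'versicolor', 'virginica']   # order of the one-hot positions

def raw_row_to_ctf(row):
    """Convert e.g. '5.1 3.5 1.4 0.2 setosa' into an |attribs ... |species ... line."""
    *measurements, species = row.split()
    if species not in SPECIES:
        raise ValueError('unknown species: {}'.format(species))
    one_hot = ['1' if name == species else '0' for name in SPECIES]
    return '|attribs {} |species {}'.format(' '.join(measurements), ' '.join(one_hot))

ctf_line = raw_row_to_ctf('7.0 3.2 4.7 1.4 versicolor')
print(ctf_line)
```

The measurements are copied as text rather than converted to floats, so the values in the CTF file stay exactly as they were in the raw data.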
There are 150 data items in total in the iris dataset. For this example, we will use the 80-20 hold-out rule, i.e. 80% of the data items (120 items) for training and the remaining 20% (30 items) for testing.
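One possible way to produce the two sets is to shuffle the lines of the CTF file and cut the list at the 20% mark, as sketched below. The function name hold_out_split is hypothetical, and placeholder strings are used instead of real file contents so that the example runs on its own.

```python
import random

def hold_out_split(items, test_fraction=0.2, seed=1):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    num_test = int(round(len(shuffled) * test_fraction))
    return shuffled[num_test:], shuffled[:num_test]

# 150 placeholder strings stand in for the 150 lines of the CTF file.
all_lines = ['item {}'.format(i) for i in range(150)]
train_lines, test_lines = hold_out_split(all_lines)
print(len(train_lines), len(test_lines))
```

Writing train_lines and test_lines to two separate files would then give the training and test files that the reader function expects.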
First, we need to read the data files in CNTK format. For that, we are going to use a helper function named create_reader, as follows (C is the cntk module, imported with import cntk as C) −
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='attribs', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='species', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src
Now, we need to set the architecture arguments for our NN and also provide the location of the data files. It can be done with the help of the following Python code −
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 5
    output_dim = 3
    train_file = ".\\...\\" #provide the name of the training file(120 data items)
    test_file = ".\\...\\" #provide the name of the test file(30 data items)
Now, with the help of the following lines of code, our program will create the untrained NN −
X = C.ops.input_variable(input_dim, np.float32)
Y = C.ops.input_variable(output_dim, np.float32)
with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
    hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
    oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
nnet = oLayer
model = C.ops.softmax(nnet)
Now that we have created the untrained model in two forms (nnet, which outputs raw scores, and model, which applies softmax to them), we need to set up a Learner algorithm object and afterwards use it to create a Trainer training object. We are going to use the SGD learner and the cross_entropy_with_softmax loss function −
tr_loss = C.cross_entropy_with_softmax(nnet, Y)
tr_clas = C.classification_error(nnet, Y)
max_iter = 2000
batch_size = 10
learn_rate = 0.01
learner = C.sgd(nnet.parameters, learn_rate)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
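Alternatively, the learning algorithm can be coded with learning-rate and momentum schedules as follows. Note that running this snippet after the previous one replaces the learner and trainer created there −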
max_iter = 2000
batch_size = 10
lr_schedule = C.learning_parameter_schedule_per_sample([(1000, 0.05), (1, 0.01)])
mom_sch = C.momentum_schedule([(100, 0.99), (0, 0.95)], batch_size)
learner = C.fsadagrad(nnet.parameters, lr=lr_schedule, momentum=mom_sch)
trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
Now that we have finished with the Trainer object, we need to create a reader to read the training data −
rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
Now it's time to train our NN model −
for i in range(0, max_iter):
    curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map)
    trainer.train_minibatch(curr_batch)
    if i % 500 == 0:
        mcee = trainer.previous_minibatch_loss_average
        macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
        print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
Once we are done with training, let's evaluate the model using the test data items −
print("\nEvaluating test data \n")
rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
num_test = 30
all_test = rdr.next_minibatch(num_test, input_map=iris_input_map)
acc = (1.0 - trainer.test_minibatch(all_test)) * 100
print("Classification accuracy = %0.2f%%" % acc)
After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data −
np.set_printoptions(precision = 1, suppress=True)
unknown = np.array([[6.4, 3.2, 4.5, 1.5]], dtype=np.float32)
print("\nPredicting Iris species for input features: ")
print(unknown[0])
pred_prob = model.eval(unknown)
np.set_printoptions(precision = 4, suppress=True)
print("Prediction probabilities are: ")
print(pred_prob[0])
import numpy as np
import cntk as C
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='attribs', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='species', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 5
    output_dim = 3
    train_file = ".\\...\\" #provide the name of the training file(120 data items)
    test_file = ".\\...\\" #provide the name of the test file(30 data items)
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 2000
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    # The lines below replace the SGD learner and trainer created above
    max_iter = 2000
    batch_size = 10
    lr_schedule = C.learning_parameter_schedule_per_sample([(1000, 0.05), (1, 0.01)])
    mom_sch = C.momentum_schedule([(100, 0.99), (0, 0.95)], batch_size)
    learner = C.fsadagrad(nnet.parameters, lr=lr_schedule, momentum=mom_sch)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=iris_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 500 == 0:
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    iris_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 30
    all_test = rdr.next_minibatch(num_test, input_map=iris_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[7.0, 3.2, 4.7, 1.4]], dtype=np.float32)
    print("\nPredicting species for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities: ")
    print(pred_prob[0])
if __name__ == "__main__":
    main()
Using CNTK version = 2.7
batch 0: mean loss = 1.0986, mean accuracy = 40.00%
batch 500: mean loss = 0.6677, mean accuracy = 80.00%
batch 1000: mean loss = 0.5332, mean accuracy = 70.00%
batch 1500: mean loss = 0.2408, mean accuracy = 100.00%
Evaluating test data
Classification accuracy = 94.58%
Predicting species for input features:
[7.0 3.2 4.7 1.4]
Prediction probabilities:
[0.0847 0.736 0.113]
The Iris dataset has only 150 data items, so training the NN classifier model takes only a few seconds. Training on a much larger dataset, however, can take hours or even days.
We can save our model so that we won't have to retrain it from scratch. With the help of the following Python code, we can save our trained NN:
nn_classifier = ".\\neuralclassifier.model"  # provide the name of the file
model.save(nn_classifier, format=C.ModelFormat.CNTKv2)
Following are the arguments of the save() function used above:
The file name is the first argument of the save() function. It can also be written along with the path of the file.
The second is the format parameter, which has a default value of C.ModelFormat.CNTKv2.
Once you have saved the trained model, it is very easy to load it. We only need to use the load() function. Let's check this in the following example:
import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\neuralclassifier.model")
np.set_printoptions(precision = 1, suppress=True)
unknown = np.array([[7.0, 3.2, 4.7, 1.4]], dtype=np.float32)
print("\nPredicting species for input features: ")
print(unknown[0])
pred_prob = model.eval(unknown)
np.set_printoptions(precision = 4, suppress=True)
print("Prediction probabilities: ")
print(pred_prob[0])
The benefit of a saved model is that, once loaded, it can be used exactly as if it had just been trained.
In this chapter, let us understand what neural network binary classification using CNTK is.
Binary classification using a NN is like multi-class classification; the only difference is that there are just two classes to predict instead of three or more. Here, we are going to perform binary classification with a neural network using two techniques, namely the one-node and the two-node technique. The one-node technique is more common than the two-node technique.
To implement both techniques, we will be using the banknote dataset. The dataset can be downloaded from the UCI Machine Learning Repository, which is available at https://archive.ics.uci.edu/ml/datasets/banknote+authentication.
For our example, we will be using 50 authentic data items having class forgery = 0, and the first 50 fake items having class forgery = 1.
There are 1372 data items in the full dataset. The raw dataset looks as follows:
3.6216, 8.6661, -2.8073, -0.44699, 0
4.5459, 8.1674, -2.4586, -1.4621, 0
…
-1.3971, 3.3191, -1.3927, -1.9948, 1
0.39012, -0.14279, -0.031994, 0.35084, 1
First, we need to convert this raw data into the two-node CNTK format, which looks as follows:
|stats 3.62160000 8.66610000 -2.80730000 -0.44699000 |forgery 0 1 |# authentic
|stats 4.54590000 8.16740000 -2.45860000 -1.46210000 |forgery 0 1 |# authentic
. . .
|stats -1.39710000 3.31910000 -1.39270000 -1.99480000 |forgery 1 0 |# fake
|stats 0.39012000 -0.14279000 -0.03199400 0.35084000 |forgery 1 0 |# fake
You can use the following Python program to create CNTK-format data from the raw data:
fin = open(".\\...", "r")  # provide the location of the saved dataset text file
for line in fin:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    tokens = [t.strip() for t in line.split(",")]  # tolerate spaces after the commas
    if tokens[4] == "0":
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 0 1 |# authentic" % \
              (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
    else:
        print("|stats %12.8f %12.8f %12.8f %12.8f |forgery 1 0 |# fake" % \
              (float(tokens[0]), float(tokens[1]), float(tokens[2]), float(tokens[3])))
fin.close()
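The same conversion can be sketched as a small function that is easy to check on a single line. This is only an illustration: the function name to_ctf_line and the one_node flag are hypothetical, and the one-node encoding (a single label value, 0 for authentic and 1 for fake) is an assumption based on the class values described above.

```python
def to_ctf_line(raw_line, one_node=False):
    """Convert one raw 'v1,v2,v3,v4,label' banknote line to CNTK text format."""
    tokens = [t.strip() for t in raw_line.strip().split(",")]
    stats = " ".join("%12.8f" % float(t) for t in tokens[:4])
    is_authentic = tokens[4] == "0"
    if one_node:
        # Assumed one-node encoding: a single value, 0 = authentic, 1 = fake
        label = "0" if is_authentic else "1"
    else:
        # Two-node encoding used above: (0, 1) = authentic, (1, 0) = fake
        label = "0 1" if is_authentic else "1 0"
    comment = "authentic" if is_authentic else "fake"
    return "|stats %s |forgery %s |# %s" % (stats, label, comment)


two_node = to_ctf_line("3.6216, 8.6661, -2.8073, -0.44699, 0")
one_node = to_ctf_line("-1.3971, 3.3191, -1.3927, -1.9948, 1", one_node=True)
print(two_node)
print(one_node)
```

Returning the line instead of printing it makes it simple to write the training and test files separately.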
There is very little difference between two-node classification and multi-class classification. First, we need to process the data files in CNTK format, and for that we are going to use the helper function named create_reader as follows:
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src
Now, we need to set the architecture arguments for our NN and also provide the location of the data files. This can be done with the following Python code:
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file
Now, with the help of the following lines of code, our program will create the untrained NN:
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
Once we have created the untrained model, we need to set up a Learner algorithm object and afterwards use it to create a Trainer training object. We are going to use the SGD learner and the cross_entropy_with_softmax loss function:
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 500
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
Once we have finished with the Trainer object, we need to create a reader to read the training data:
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
Now, it is time to train our NN model:
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 50 == 0:  # report progress every 50 batches
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
Once training is completed, let us evaluate the model using the test data items:
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data:
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities are: ")
    print(pred_prob[0])
    # two-node encoding: (0, 1) = authentic, (1, 0) = fake
    if pred_prob[0,0] < pred_prob[0,1]:
        print("Prediction: authentic")
    else:
        print("Prediction: fake")
# Complete two-node banknote classification program
import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 2
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    nnet = oLayer
    model = C.ops.softmax(nnet)
    tr_loss = C.cross_entropy_with_softmax(nnet, Y)
    tr_clas = C.classification_error(nnet, Y)
    max_iter = 500
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(nnet.parameters, learn_rate)
    trainer = C.Trainer(nnet, (tr_loss, tr_clas), [learner])
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 50 == 0:  # report progress every 50 batches
            mcee = trainer.previous_minibatch_loss_average
            macc = (1.0 - trainer.previous_minibatch_evaluation_average) * 100
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, macc))
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = (1.0 - trainer.test_minibatch(all_test)) * 100
    print("Classification accuracy = %0.2f%%" % acc)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval(unknown)
    np.set_printoptions(precision = 4, suppress=True)
    print("Prediction probabilities are: ")
    print(pred_prob[0])
    # two-node encoding: (0, 1) = authentic, (1, 0) = fake
    if pred_prob[0,0] < pred_prob[0,1]:
        print("Prediction: authentic")
    else:
        print("Prediction: fake")

if __name__ == "__main__":
    main()
Using CNTK version = 2.7
batch 0: mean loss = 0.6928, accuracy = 80.00%
batch 50: mean loss = 0.6877, accuracy = 70.00%
batch 100: mean loss = 0.6432, accuracy = 80.00%
batch 150: mean loss = 0.4978, accuracy = 80.00%
batch 200: mean loss = 0.4551, accuracy = 90.00%
batch 250: mean loss = 0.3755, accuracy = 90.00%
batch 300: mean loss = 0.2295, accuracy = 100.00%
batch 350: mean loss = 0.1542, accuracy = 100.00%
batch 400: mean loss = 0.1581, accuracy = 100.00%
batch 450: mean loss = 0.1499, accuracy = 100.00%
Evaluating test data
Classification accuracy = 84.58%
Predicting banknote authenticity for input features:
[0.6 1.9 -3.3 -0.3]
Prediction probabilities are:
[0.7847 0.2536]
Prediction: fake
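To make the last step of the two-node program more concrete, here is a minimal pure-Python sketch of what a softmax output and the prediction rule do. The raw scores are hypothetical values chosen for illustration, and the helper names softmax and predict_label are not part of CNTK.

```python
import math


def softmax(scores):
    """Turn raw output-node scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(probs):
    """Two-node encoding from this chapter: (0, 1) = authentic, (1, 0) = fake."""
    return "authentic" if probs[0] < probs[1] else "fake"


raw_scores = [1.2, -0.4]  # hypothetical output-layer values
probs = softmax(raw_scores)
print(probs)
print("Prediction:", predict_label(probs))
```

Because the first probability is the larger one here, the rule reports the note as fake, in the same way as the program above.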
The implementation is almost the same as the one we have done above for two-node classification. The main change concerns the accuracy metric: with the two-node technique we can use the CNTK built-in classification_error() function, but with the one-node technique classification_error() cannot be used. That is the reason we need to implement a program-defined function, which treats an output below 0.5 as class 0 (authentic) and an output of 0.5 or above as class 1 (fake), as follows:
def class_acc(mb, x_var, y_var, model):
    num_correct = 0; num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        p = model.eval(x_mat[i])  # model output for item i
        y = y_mat[i]              # actual label for item i
        if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)
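The counting logic of class_acc can be tried without CNTK. The following is a minimal sketch on plain Python lists; the function name threshold_accuracy and the sample values are hypothetical.

```python
def threshold_accuracy(probs, labels, threshold=0.5):
    """Percentage of items whose thresholded output matches the 0/1 label."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    num_correct = 0
    for p, y in zip(probs, labels):
        predicted = 1.0 if p >= threshold else 0.0
        if predicted == y:
            num_correct += 1
    return num_correct * 100.0 / len(probs)


# Third item: output 0.6 is read as class 1, but the label is 0, so it is wrong.
print(threshold_accuracy([0.1, 0.8, 0.6, 0.3], [0.0, 1.0, 0.0, 0.0]))  # 75.0
```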
With that change, let's see the complete one-node classification example:
import numpy as np
import cntk as C

# One-node format: each data line carries a single label value,
# |forgery 0 (authentic) or |forgery 1 (fake)
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='stats', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='forgery', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def class_acc(mb, x_var, y_var, model):
    num_correct = 0; num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        p = model.eval(x_mat[i])  # model output for item i
        y = y_mat[i]              # actual label for item i
        if p[0,0] < 0.5 and y[0,0] == 0.0 or p[0,0] >= 0.5 and y[0,0] == 1.0:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 4
    hidden_dim = 10
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file
    test_file = ".\\...\\"  # provide the name of the test file
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    # A single output node needs a sigmoid to yield a value between 0 and 1,
    # and binary cross-entropy as its loss; softmax over one node is always 1.
    model = C.ops.sigmoid(oLayer)
    tr_loss = C.binary_cross_entropy(model, Y)
    max_iter = 1000
    batch_size = 10
    learn_rate = 0.01
    learner = C.sgd(model.parameters, learn_rate)
    trainer = C.Trainer(model, (tr_loss), [learner])
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=banknote_input_map)
        trainer.train_minibatch(curr_batch)
        if i % 100 == 0:
            mcee = trainer.previous_minibatch_loss_average
            ca = class_acc(curr_batch, X, Y, model)
            print("batch %4d: mean loss = %0.4f, accuracy = %0.2f%% " % (i, mcee, ca))
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    banknote_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=banknote_input_map)
    acc = class_acc(all_test, X, Y, model)
    print("Classification accuracy = %0.2f%%" % acc)
    np.set_printoptions(precision = 1, suppress=True)
    unknown = np.array([[0.6, 1.9, -3.3, -0.3]], dtype=np.float32)
    print("\nPredicting Banknote authenticity for input features: ")
    print(unknown[0])
    pred_prob = model.eval({X: unknown})
    print("Prediction probability: ")
    print("%0.4f" % pred_prob[0,0])
    if pred_prob[0,0] < 0.5:
        print("Prediction: authentic")
    else:
        print("Prediction: fake")

if __name__ == "__main__":
    main()
Using CNTK version = 2.7
batch 0: mean loss = 0.6936, accuracy = 10.00%
batch 100: mean loss = 0.6882, accuracy = 70.00%
batch 200: mean loss = 0.6597, accuracy = 50.00%
batch 300: mean loss = 0.5298, accuracy = 70.00%
batch 400: mean loss = 0.4090, accuracy = 100.00%
batch 500: mean loss = 0.3790, accuracy = 90.00%
batch 600: mean loss = 0.1852, accuracy = 100.00%
batch 700: mean loss = 0.1135, accuracy = 100.00%
batch 800: mean loss = 0.1285, accuracy = 100.00%
batch 900: mean loss = 0.1054, accuracy = 100.00%
Evaluating test data
Classification accuracy = 84.00%
Predicting banknote authenticity for input features:
[0.6 1.9 -3.3 -0.3]
Prediction probability:
0.8846
Prediction: fake
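The first loss value reported above, about 0.69, is close to ln 2. That is the value binary cross-entropy gives when an untrained network outputs 0.5 for an item. The sketch below shows this in plain Python, assuming the single output node is passed through a sigmoid and scored with binary cross-entropy; the helper names are illustrative and are not CNTK functions.

```python
import math


def sigmoid(z):
    """Squash a raw output-node value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, y):
    """Loss for one item: p is the predicted probability, y is the 0/1 label."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


p_untrained = sigmoid(0.0)  # near-zero weights give a raw output near 0, so p is 0.5
loss_untrained = binary_cross_entropy(p_untrained, 1.0)
loss_confident = binary_cross_entropy(0.9, 1.0)  # a confident, correct prediction
print(p_untrained, loss_untrained, loss_confident)
```

As the network learns, the outputs move away from 0.5 towards the correct label and the loss falls, which is the trend the training log shows.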
This chapter will help you understand neural network regression with CNTK.
As we know, we use regression to predict a numeric value from one or more predictor variables. Let's take the example of predicting the median value of a house in, say, one of 100 towns. To do so, we have data that includes:
A crime statistic for each town.
The age of the houses in each town.
A measure of the distance from each town to a prime location.
The student-to-teacher ratio in each town.
A racial demographic statistic for each town.
The median house value in each town.
Based on the first five of these, the predictor variables, we would like to predict the median house value. For this, we can create a linear regression model along the lines of:
Y = a0 + a1(crime) + a2(house-age) + a3(distance) + a4(ratio) + a5(racial)
In the above equation:
Y is the predicted median value,
a0 is a constant, and
a1 through a5 are constants associated with the five predictors discussed above.
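The linear model is just a weighted sum, which the following minimal sketch evaluates. The coefficient values are made up purely for illustration (they are not fitted to any data), and the function name predict_median_value is hypothetical.

```python
def predict_median_value(coeffs, features):
    """Evaluate Y = a0 + a1*x1 + ... + an*xn for one town."""
    a0, weights = coeffs[0], coeffs[1:]
    if len(weights) != len(features):
        raise ValueError("need exactly one coefficient per feature, plus a0")
    return a0 + sum(a * x for a, x in zip(weights, features))


# Hypothetical constants a0..a5; a real model would learn these from data.
coeffs = [30.0, -0.5, -0.05, 1.0, -0.4, 0.01]
# crime, house-age, distance, ratio, racial
town = [0.09, 50.0, 4.5, 17.0, 350.0]
print(predict_median_value(coeffs, town))
```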
An alternative approach is to use a neural network, which can produce a more accurate prediction model.
Here, we will create a neural network regression model using CNTK.
To implement neural network regression using CNTK, we will be using the Boston area house values dataset. The dataset can be downloaded from the UCI Machine Learning Repository, which is available at https://archive.ics.uci.edu/ml/machine-learning-databases/housing/. This dataset has a total of 14 variables and 506 instances.
For our implementation, however, we are going to use six of the 14 variables and 100 instances. Of the six variables, five are predictors and one is the value to predict. Of the 100 instances, we will use 80 for training and 20 for testing. The value we want to predict is the median house price in a town. Let's look at the five predictors we will be using:
Crime per capita in the town: we would expect smaller values to be associated with this predictor.
Proportion of owner-occupied units built before 1940: we would expect smaller values to be associated with this predictor, because a larger value means older houses.
Weighted distance from the town to five Boston employment centers.
Area school pupil-to-teacher ratio.
An indirect metric of the proportion of black residents in the town.
As we did before, we first need to convert the raw data into CNTK format. We are going to use the first 80 data items for training, so the tab-delimited CNTK format is as follows:
|predictors 1.612820 96.90 3.76 21.00 248.31 |medval 13.50
|predictors 0.064170 68.20 3.36 19.20 396.90 |medval 18.90
|predictors 0.097440 61.40 3.38 19.20 377.56 |medval 20.00
. . .
The next 20 items, also converted into CNTK format, will be used for testing.
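The conversion itself can be sketched as follows. This is a hedged illustration rather than the program used to produce the files: the function name to_ctf_line is hypothetical, and the column positions assume the usual whitespace-delimited 14-column layout of the housing data file, with the five predictors at positions 0, 6, 7, 10 and 11 and the median value at position 13. In the sample row, the unused columns are filled with zero placeholders.

```python
PREDICTOR_COLUMNS = (0, 6, 7, 10, 11)  # assumed positions of the five predictors
TARGET_COLUMN = 13                     # assumed position of the median value


def to_ctf_line(raw_line):
    """Convert one whitespace-delimited 14-column row to CNTK text format."""
    fields = raw_line.split()
    if len(fields) != 14:
        raise ValueError("expected 14 columns, got %d" % len(fields))
    crime, age, dist, ratio, black = (float(fields[i]) for i in PREDICTOR_COLUMNS)
    medval = float(fields[TARGET_COLUMN])
    return "|predictors %0.6f %0.2f %0.2f %0.2f %0.2f\t|medval %0.2f" % (
        crime, age, dist, ratio, black, medval)


# Used columns come from the first sample line above; the rest are placeholders.
sample_row = "1.61282 0 0 0 0 0 96.90 3.76 0 0 21.00 248.31 0 13.50"
ctf_line = to_ctf_line(sample_row)
print(ctf_line)
```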
First, we need to process the data files in CNTK format. For that, we are going to use the helper function named create_reader as follows:
def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src
Next, we need to create a helper function that accepts a CNTK mini-batch object and computes a custom accuracy metric: the percentage of items whose predicted value lies within delta of the actual value.
def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])  # predicted value for item i
        y = y_mat[i]              # actual value for item i
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)
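The within-delta idea behind mb_accuracy can be checked on plain lists, without CNTK. In this minimal sketch, the function name within_delta_accuracy and the sample values are hypothetical.

```python
def within_delta_accuracy(predicted, actual, delta):
    """Percentage of predictions closer than delta to the actual value."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    hits = sum(1 for p, y in zip(predicted, actual) if abs(p - y) < delta)
    return hits * 100.0 / len(predicted)


predicted = [21.0, 18.5, 30.2, 12.0]
actual = [20.0, 22.0, 29.0, 15.5]
# Errors are 1.0, 3.5, 1.2 and 3.5, so two of the four fall within delta = 3.0.
print(within_delta_accuracy(predicted, actual, delta=3.0))  # 50.0
```

A larger delta makes the metric more forgiving, so the reported accuracy only has meaning together with the delta used.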
Now, we need to set the architecture arguments for our NN and also provide the location of the data files. This can be done with the following Python code:
def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)
Now, with the help of the following lines of code, our program will create the untrained NN:
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    model = C.ops.alias(oLayer)
Once we have created the untrained model, we need to set up a Learner algorithm object. We are going to use the SGD learner and the squared_error loss function. The learning rate starts at base_learn_rate and is halved for the second half of training:
    tr_loss = C.squared_error(model, Y)
    max_iter = 3000
    batch_size = 5
    base_learn_rate = 0.02
    sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
    learner = C.sgd(model.parameters, sch)
    trainer = C.Trainer(model, (tr_loss), [learner])
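The effect of the two-step schedule can be sketched in plain Python. This reflects our reading of how a list of rates combined with epoch_size behaves (each rate applies for epoch_size samples and the last one persists afterwards); the function learning_rate_at is a hypothetical helper, not a CNTK call.

```python
def learning_rate_at(iteration, max_iter=3000, batch_size=5, base_rate=0.02):
    """Learning rate assumed to be in effect at a given training iteration."""
    epoch_size = int((max_iter * batch_size) / 2)  # samples covered by each rate
    rates = [base_rate, base_rate / 2]
    samples_seen = iteration * batch_size
    step = min(samples_seen // epoch_size, len(rates) - 1)  # last rate persists
    return rates[step]


for it in (0, 1499, 1500, 2999):
    print(it, learning_rate_at(it))
```

With the values used in this chapter, the rate is 0.02 for iterations 0 to 1499 and 0.01 from iteration 1500 onwards.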
Once we have finished with the learning algorithm object, we need to create a reader to read the training data:
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
Now, it's time to train our NN model:
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
        trainer.train_minibatch(curr_batch)
        if i % int(max_iter/10) == 0:
            mcee = trainer.previous_minibatch_loss_average
            acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
            print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% " % (i, mcee, acc))
Once we are done with training, let's evaluate the model using the test data items:
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
    acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
    print("Prediction accuracy = %0.2f%%" % acc)
After evaluating the accuracy of our trained NN model, we will use it to make a prediction on unseen data:
    np.set_printoptions(precision = 2, suppress=True)
    unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
    print("\nPredicting median home value for feature/predictor values: ")
    print(unknown[0])
    pred_value = model.eval({X: unknown})
    print("\nPredicted value is: ")
    print("$%0.2f (x1000)" % pred_value[0,0])
# Complete neural network regression program
import numpy as np
import cntk as C

def create_reader(path, input_dim, output_dim, rnd_order, sweeps):
    x_strm = C.io.StreamDef(field='predictors', shape=input_dim, is_sparse=False)
    y_strm = C.io.StreamDef(field='medval', shape=output_dim, is_sparse=False)
    streams = C.io.StreamDefs(x_src=x_strm, y_src=y_strm)
    deserial = C.io.CTFDeserializer(path, streams)
    mb_src = C.io.MinibatchSource(deserial, randomize=rnd_order, max_sweeps=sweeps)
    return mb_src

def mb_accuracy(mb, x_var, y_var, model, delta):
    num_correct = 0
    num_wrong = 0
    x_mat = mb[x_var].asarray()
    y_mat = mb[y_var].asarray()
    for i in range(mb[x_var].shape[0]):
        v = model.eval(x_mat[i])  # predicted value for item i
        y = y_mat[i]              # actual value for item i
        if np.abs(v[0,0] - y[0,0]) < delta:
            num_correct += 1
        else:
            num_wrong += 1
    return (num_correct * 100.0)/(num_correct + num_wrong)

def main():
    print("Using CNTK version = " + str(C.__version__) + "\n")
    input_dim = 5
    hidden_dim = 20
    output_dim = 1
    train_file = ".\\...\\"  # provide the name of the training file (80 data items)
    test_file = ".\\...\\"  # provide the name of the test file (20 data items)
    X = C.ops.input_variable(input_dim, np.float32)
    Y = C.ops.input_variable(output_dim, np.float32)
    with C.layers.default_options(init=C.initializer.uniform(scale=0.01, seed=1)):
        hLayer = C.layers.Dense(hidden_dim, activation=C.ops.tanh, name='hidLayer')(X)
        oLayer = C.layers.Dense(output_dim, activation=None, name='outLayer')(hLayer)
    model = C.ops.alias(oLayer)
    tr_loss = C.squared_error(model, Y)
    max_iter = 3000
    batch_size = 5
    base_learn_rate = 0.02
    sch = C.learning_parameter_schedule([base_learn_rate, base_learn_rate/2], minibatch_size=batch_size, epoch_size=int((max_iter*batch_size)/2))
    learner = C.sgd(model.parameters, sch)
    trainer = C.Trainer(model, (tr_loss), [learner])
    rdr = create_reader(train_file, input_dim, output_dim, rnd_order=True, sweeps=C.io.INFINITELY_REPEAT)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    for i in range(0, max_iter):
        curr_batch = rdr.next_minibatch(batch_size, input_map=boston_input_map)
        trainer.train_minibatch(curr_batch)
        if i % int(max_iter/10) == 0:
            mcee = trainer.previous_minibatch_loss_average
            acc = mb_accuracy(curr_batch, X, Y, model, delta=3.00)
            print("batch %4d: mean squared error = %8.4f, accuracy = %5.2f%% " % (i, mcee, acc))
    print("\nEvaluating test data \n")
    rdr = create_reader(test_file, input_dim, output_dim, rnd_order=False, sweeps=1)
    boston_input_map = { X : rdr.streams.x_src, Y : rdr.streams.y_src }
    num_test = 20
    all_test = rdr.next_minibatch(num_test, input_map=boston_input_map)
    acc = mb_accuracy(all_test, X, Y, model, delta=3.00)
    print("Prediction accuracy = %0.2f%%" % acc)
    np.set_printoptions(precision = 2, suppress=True)
    unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
    print("\nPredicting median home value for feature/predictor values: ")
    print(unknown[0])
    pred_value = model.eval({X: unknown})
    print("\nPredicted value is: ")
    print("$%0.2f (x1000)" % pred_value[0,0])

if __name__ == "__main__":
    main()
Using CNTK version = 2.7
batch 0: mean squared error = 385.6727, accuracy = 0.00%
batch 300: mean squared error = 41.6229, accuracy = 20.00%
batch 600: mean squared error = 28.7667, accuracy = 40.00%
batch 900: mean squared error = 48.6435, accuracy = 40.00%
batch 1200: mean squared error = 77.9562, accuracy = 80.00%
batch 1500: mean squared error = 7.8342, accuracy = 60.00%
batch 1800: mean squared error = 47.7062, accuracy = 60.00%
batch 2100: mean squared error = 40.5068, accuracy = 40.00%
batch 2400: mean squared error = 46.5023, accuracy = 40.00%
batch 2700: mean squared error = 15.6235, accuracy = 60.00%
Evaluating test data
Prediction accuracy = 64.00%
Predicting median home value for feature/predictor values:
[0.09 50. 4.5 17. 350.]
Predicted value is:
$21.02(x1000)
The Boston home value dataset has only 506 data items, of which we used only 100, so training the NN regressor model takes only a few seconds. Training on a much larger dataset, however, can take hours or even days.
We can save our model so that we won't have to retrain it from scratch. With the help of the following Python code, we can save our trained NN:
nn_regressor = ".\\neuralregressor.model"  # provide the name of the file
model.save(nn_regressor, format=C.ModelFormat.CNTKv2)
Following are the arguments of the save() function used above:
The file name is the first argument of the save() function. It can also be written along with the path of the file.
The second is the format parameter, which has a default value of C.ModelFormat.CNTKv2.
Once you have saved the trained model, it is very easy to load it. We only need to use the load() function. Let's check this in the following example:
import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\neuralregressor.model")
np.set_printoptions(precision = 2, suppress=True)
unknown = np.array([[0.09, 50.00, 4.5, 17.00, 350.00]], dtype=np.float32)
print("\nPredicting area median home value for feature/predictor values: ")
print(unknown[0])
# the input variable X is not defined in this script, so pass the array directly
pred_value = model.eval(unknown)
print("\nPredicted value is: ")
print("$%0.2f (x1000)" % pred_value[0,0])
The benefit of a saved model is that, once loaded, it can be used exactly as if it had just been trained.
This chapter will help you understand how to measure the performance of a classification model in CNTK. Let us begin with the confusion matrix.
A confusion matrix, a table of the predicted output versus the expected output, is the easiest way to measure the performance of a classification model whose output can belong to two or more classes.
In order to understand how it works, we are going to create a confusion matrix for a binary classification model that predicts whether a credit card transaction was normal or fraudulent. It is shown as follows:
 | Actual fraud | Actual normal |
---|---|---|
Predicted fraud | True positive | False positive |
Predicted normal | False negative | True negative |
As we can see, the sample confusion matrix above contains two columns, one for the actual class fraud and the other for the actual class normal. In the same way, it has two rows, one for the predicted class fraud and the other for the predicted class normal. Following is an explanation of the terms associated with the confusion matrix, with fraud as the positive class (1) and normal as the negative class (0):
True Positives: both the actual class and the predicted class of the data point are 1.
True Negatives: both the actual class and the predicted class of the data point are 0.
False Positives: the actual class of the data point is 0 and the predicted class is 1.
False Negatives: the actual class of the data point is 1 and the predicted class is 0.
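The four definitions above translate directly into a counting loop. The following is a minimal sketch in plain Python; the function name confusion_counts and the sample labels are hypothetical, with 1 standing for fraud and 0 for normal.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted fraud, actually fraud
        elif p != positive and a != positive:
            tn += 1  # predicted normal, actually normal
        elif p == positive:
            fp += 1  # predicted fraud, actually normal
        else:
            fn += 1  # predicted normal, actually fraud
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```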
Let's see how we can calculate different metrics from the confusion matrix. In the formulas, TP, TN, FP and FN stand for the numbers of true positives, true negatives, false positives and false negatives:
Accuracy: the proportion of all predictions made by our ML classification model that are correct. It can be calculated with the following formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision: it tells us how many of the samples we predicted as positive were actually positive. It can be calculated with the following formula:
Precision = TP / (TP + FP)
Recall or Sensitivity: the proportion of actual positives found by our ML classification model. In other words, it tells us how many of the fraud cases in the dataset were actually detected by the model. It can be calculated with the following formula:
Recall = TP / (TP + FN)
Specificity: the counterpart of recall for the negative class, it gives the proportion of actual negatives that the model identified correctly. It can be calculated with the following formula:
Specificity = TN / (TN + FP)
We can use the F-measure as a single-number summary of the confusion matrix. The main reason behind this is that we can’t maximize recall and precision at the same time. There is a strong trade-off between these metrics, which can be understood with the help of the following example −
Suppose we want to use a DL model to classify cell samples as cancerous or normal. Here, to reach maximum precision, we can reduce the number of positive predictions to 1, keeping only the sample the model is most confident about. This can give us close to 100 percent precision, but recall will become really low.
On the other hand, if we would like to reach maximum recall, we need to predict as many samples as possible as positive. This can give us close to 100 percent recall, but precision will become really low.
In practice, we need to find a balance between precision and recall. The F-measure metric allows us to do so, as it expresses a weighted harmonic mean of precision and recall −
$$F_B = \frac{(1 + B^2) \cdot \text{Precision} \cdot \text{Recall}}{B^2 \cdot \text{Precision} + \text{Recall}}$$
With the factor B set to 1, this formula is called the F1-measure, and it weights precision and recall equally. In order to emphasize recall, we can set the factor B to 2. On the other hand, to emphasize precision, we can set the factor B to 0.5.
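The metrics described above can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python; the helper name classification_metrics and the counts used are hypothetical and are not taken from the tutorial's model.

```python
def classification_metrics(tp, fp, fn, tn, beta=1.0):
    """Derive common metrics from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Weighted harmonic mean of precision and recall (F-measure with factor beta)
    denominator = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_beta": f_beta,
    }


# Hypothetical counts: 80 frauds caught, 20 frauds missed,
# 10 normal transactions flagged by mistake, 890 normal transactions passed.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Calling the same helper with beta=2 or beta=0.5 shifts the score towards recall or precision respectively.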
In the previous section, we created a classification model using the Iris flower dataset. Here, we will measure its performance by using a confusion matrix and the F-measure metric.
We have already created the model, so we can start the validation process, which includes the confusion matrix. First, we are going to create the confusion matrix with the help of the confusion_matrix function from scikit-learn. For this, we need the real labels for our test samples and the predicted labels for the same test samples.
Let’s calculate the confusion matrix by using the following Python code −
import numpy as np
from sklearn.metrics import confusion_matrix
# Convert the one-hot encoded labels and the model outputs to class indices
y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(z(X_test), axis=1)
matrix = confusion_matrix(y_true=y_true, y_pred=y_pred)
print(matrix)
[[10 0 0]
[ 0 1 9]
[ 0 0 10]]
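The printed matrix can be read further, per class. The following is a minimal sketch in plain Python that uses the values shown above and relies on the scikit-learn convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix as printed above: rows are true classes, columns are predictions
matrix = [
    [10, 0, 0],
    [0, 1, 9],
    [0, 0, 10],
]

n_classes = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall per class: diagonal cell divided by the row sum (all true samples of that class)
recall = [matrix[i][i] / sum(matrix[i]) for i in range(n_classes)]

# Precision per class: diagonal cell divided by the column sum (all predictions of that class)
column_sums = [sum(matrix[r][c] for r in range(n_classes)) for c in range(n_classes)]
precision = [matrix[c][c] / column_sums[c] for c in range(n_classes)]

print("accuracy:", accuracy)
print("recall per class:", recall)
print("precision per class:", precision)
```

In this matrix, the second class has a recall of only 0.1, because nine of its ten samples are predicted as the third class.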
We can also use the heatmap function from seaborn to visualise a confusion matrix as follows −
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.heatmap(matrix,
                annot=True,
                xticklabels=label_encoder.classes_.tolist(),
                yticklabels=label_encoder.classes_.tolist(),
                cmap='Blues')
g.set_yticklabels(g.get_yticklabels(), rotation=0)
plt.show()
We should also have a single performance number that we can use to compare models. For this, we need to calculate the classification error by using the classification_error function from the metrics package in CNTK, as we did while creating the classification model.
Now, to calculate the classification error, execute the test method on the loss function with a dataset. CNTK will take the samples we provide as input for this function and make a prediction based on the input features X_test.
loss.test([X_test, y_test])
{'metric': 0.36666666666, 'samples': 30}
For implementing F-measures, CNTK also includes a function called fmeasure. We can use this function while training the NN by replacing the call to cntk.metrics.classification_error with a call to cntk.losses.fmeasure when defining the criterion factory function as follows −
import cntk
@cntk.Function
def criterion_factory(output, target):
    loss = cntk.losses.cross_entropy_with_softmax(output, target)
    metric = cntk.losses.fmeasure(output, target)
    return loss, metric
After using the cntk.losses.fmeasure function, we will get a different output for the loss.test method call, as follows −
loss.test([X_test, y_test])
{'metric': 0.83101488749, 'samples': 30}
Here, we will study how to measure the performance of a regression model.
Regression models differ from classification models in that there is no binary measure of right or wrong for individual samples. In regression models, we want to measure how close the prediction is to the actual value. The closer the prediction is to the expected output, the better the model performs.
Here, we are going to measure the performance of an NN used for regression using different error-rate functions.
As discussed earlier, while validating a regression model, we can’t say whether a prediction is right or wrong. We want our prediction to be as close as possible to the real value, but a small error margin is acceptable here.
The formula for calculating the error margin is as follows −
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
Here,
Predicted value = indicated by y with a hat, $\hat{y}$
Real value = indicated by y
First, we need to calculate the distance between the predicted and the real value and square it. Then, to get an overall error rate, we need to sum these squared distances and calculate the average. This is called the mean squared error function.
But, if we want performance figures that express an error margin in the same units as the target, we need a formula that expresses the absolute error. The formula for the mean absolute error function is as follows −
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$
The above formula takes the absolute distance between the predicted and the real value.
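To make the difference between the two error functions concrete, here is a minimal sketch in plain Python. The function names and the miles-per-gallon values are hypothetical and only serve to illustrate the arithmetic.

```python
def mean_squared_error(predicted, actual):
    """Average of the squared distances between predictions and real values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted, actual):
    """Average of the absolute distances between predictions and real values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical miles-per-gallon predictions and real values
predicted = [21.0, 30.5, 18.0, 25.0]
actual = [20.0, 32.0, 18.5, 24.0]

print("MSE:", mean_squared_error(predicted, actual))
print("MAE:", mean_absolute_error(predicted, actual))
```

The mean absolute error can be read directly as "the prediction is off by about this many miles per gallon on average", while the mean squared error penalises large misses more heavily.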
Here, we will look at how to use the different metrics we discussed in combination with CNTK. We will use a regression model that predicts miles per gallon for cars, using the steps given below.
Step 1 − First, we need to import the required components from the cntk package as follows −
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import relu
Step 2 − Next, we need to define a default activation function using the default_options function. Then, create a new Sequential layer set and provide two Dense layers with 64 neurons each. Then, we add an additional Dense layer (which will act as the output layer) to the Sequential layer set and give it 1 neuron without an activation as follows −
with default_options(activation=relu):
    model = Sequential([Dense(64), Dense(64), Dense(1, activation=None)])
Step 3 − Once the network has been created, we need to create an input feature. We need to make sure that it has the same shape as the features that we are going to use for training. Here, X is the scaled feature matrix that is prepared in Step 7.
features = input_variable(X.shape[1])
Step 4 − Now, we need to create another input_variable with size 1. It will be used to store the expected value for the NN.
target = input_variable(1)
z = model(features)
Now, we need to train the model. In order to do so, we are going to split the dataset and perform preprocessing using the following implementation steps −
Step 5 − First, import StandardScaler from sklearn.preprocessing to rescale each feature to zero mean and unit variance. This will help us against exploding gradient problems in the NN.
from sklearn.preprocessing import StandardScaler
Step 6 − Next, import train_test_split from sklearn.model_selection as follows −
from sklearn.model_selection import train_test_split
Step 7 − Drop the mpg column from the dataset by using the drop method. At last, split the dataset into a training and validation set using the train_test_split function as follows −
import numpy as np
x = df_cars.drop(columns=['mpg']).values.astype(np.float32)
# mpg is assumed to be the first column of df_cars
y = df_cars.iloc[:, 0].values.reshape(-1, 1).astype(np.float32)
scaler = StandardScaler()
X = scaler.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
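Standardisation, as performed by StandardScaler, subtracts the mean of each feature and divides by its standard deviation. The following is a minimal sketch of that computation in plain Python for a single column; the helper name standardize and the horsepower values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    return [(value - mu) / sigma for value in column]


# Hypothetical horsepower values for five cars
horsepower = [130.0, 165.0, 150.0, 95.0, 220.0]
scaled = standardize(horsepower)

print("scaled values:", [round(value, 3) for value in scaled])
print("mean after scaling:", round(mean(scaled), 6))
print("standard deviation after scaling:", round(pstdev(scaled), 6))
```

Note that the scaled values are centred on 0 but are not confined to a fixed range; an unusually large value can still end up well above 1.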
We have split as well as preprocessed the data; now we need to train the NN. As we did in previous sections while creating the regression model, we need to define a combination of a loss and metric function to train the model.
import cntk
from cntk.losses import squared_error
def absolute_error(output, target):
    return cntk.ops.reduce_mean(cntk.ops.abs(output - target))
@cntk.Function
def criterion_factory(output, target):
    loss = squared_error(output, target)
    metric = absolute_error(output, target)
    return loss, metric
Now, let’s have a look at how to train the model. For our model, we will use criterion_factory as the loss and metric combination.
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_printer = ProgressPrinter(0)
loss = criterion_factory(z, target)
learner = sgd(z.parameters, 0.001)
training_summary = loss.train((X_train, y_train),
    parameter_learners=[learner],
    callbacks=[progress_printer],
    minibatch_size=16,
    max_epochs=10)
The complete example, with the steps above combined, is as follows −
import numpy as np
import cntk
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import relu
from cntk.losses import squared_error
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# df_cars is assumed to be the cars DataFrame loaded earlier, with mpg as its first column
x = df_cars.drop(columns=['mpg']).values.astype(np.float32)
y = df_cars.iloc[:, 0].values.reshape(-1, 1).astype(np.float32)
scaler = StandardScaler()
X = scaler.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
with default_options(activation=relu):
    model = Sequential([Dense(64), Dense(64), Dense(1, activation=None)])
features = input_variable(X.shape[1])
target = input_variable(1)
z = model(features)
def absolute_error(output, target):
    return cntk.ops.reduce_mean(cntk.ops.abs(output - target))
@cntk.Function
def criterion_factory(output, target):
    loss = squared_error(output, target)
    metric = absolute_error(output, target)
    return loss, metric
progress_printer = ProgressPrinter(0)
loss = criterion_factory(z, target)
learner = sgd(z.parameters, 0.001)
training_summary = loss.train((X_train, y_train),
    parameter_learners=[learner],
    callbacks=[progress_printer],
    minibatch_size=16,
    max_epochs=10)
-------------------------------------------------------------------
average since average since examples
loss last metric last
------------------------------------------------------
Learning rate per minibatch: 0.001
690 690 24.9 24.9 16
654 636 24.1 23.7 48
[………]
In order to validate our regression model, we need to make sure that the model handles new data just as well as it handles the training data. For this, we need to invoke the test method on the loss and metric combination with the test data as follows −
loss.test([X_test, y_test])
{'metric': 1.89679785619, 'samples': 79}
In this chapter, we will explain how to measure performance with out-of-memory datasets.
In previous sections, we discussed various methods to validate the performance of our NN, but those methods deal with datasets that fit in memory.
Here, the question arises: what about out-of-memory datasets? In a production scenario, we need a lot of data to train an NN. In this section, we are going to discuss how to measure performance when working with minibatch sources and a manual minibatch loop.
While working with out-of-memory datasets, i.e. minibatch sources, we need a slightly different setup for the loss, as well as the metric, than the setup we used while working with small, in-memory datasets. First, we will see how to set up a way to feed data to the trainer of the NN model.
Following are the implementation steps −
Step 1 − First, from the cntk.io module, import the components for creating the minibatch source as follows −
from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
Step 2 − Next, create a new function named, say, create_datasource. This function will have two parameters, namely filename and limit, where limit has a default value of INFINITELY_REPEAT.
def create_datasource(filename, limit=INFINITELY_REPEAT):
Step 3 − Now, within the function, by using the StreamDef class, create a stream definition for the labels that reads from the labels field, which has three values (one per class). We also need to set is_sparse to False as follows −
    labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
Step 4 − Next, to read the features field from the input file, create another instance of StreamDef as follows −
    features_stream = StreamDef(field='features', shape=4, is_sparse=False)
Step 5 − Now, initialise the CTFDeserializer instance. Specify the filename and the streams that we need to deserialize as follows −
    deserializer = CTFDeserializer(filename, StreamDefs(labels=labels_stream, features=features_stream))
Step 6 − Next, we need to create an instance of MinibatchSource by using the deserializer as follows −
    minibatch_source = MinibatchSource(deserializer, randomize=True, max_sweeps=limit)
    return minibatch_source
Step 7 − At last, we need to provide the training and testing sources, which we also created in previous sections. We are using the Iris flower dataset.
training_source = create_datasource('Iris_train.ctf')
test_source = create_datasource('Iris_test.ctf', limit=1)
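The .ctf files read by CTFDeserializer use the CNTK text format, in which each line holds one sample and each stream is written as a |name marker followed by its values. The following is a minimal sketch in plain Python of producing such lines for the features and labels streams defined above; the helper name to_ctf_line and the sample values are hypothetical.

```python
def to_ctf_line(features, one_hot_label):
    """Format one sample as a CNTK text format line with two streams."""
    feature_text = " ".join(str(value) for value in features)
    label_text = " ".join(str(value) for value in one_hot_label)
    return f"|features {feature_text} |labels {label_text}"


# Hypothetical samples: four measurements and a one-hot label over three classes
samples = [
    ([5.1, 3.5, 1.4, 0.2], [1, 0, 0]),
    ([7.0, 3.2, 4.7, 1.4], [0, 1, 0]),
]

lines = [to_ctf_line(features, label) for features, label in samples]
print("\n".join(lines))
```

The stream names in the file have to match the field arguments given to StreamDef, and the number of values per stream has to match the shape arguments.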
Once we create the MinibatchSource instance, we need to train the model with it. We can use the same training logic as when we worked with small in-memory datasets. Here, we will use the MinibatchSource instance as the input for the training session as follows −
Following are the implementation steps −
Step 1 − In order to log the output of the training session, first import ProgressPrinter from the cntk.logging module as follows −
from cntk.logging import ProgressPrinter
Step 2 − Next, to set up the training session, import Trainer and training_session from the cntk.train module as follows −
from cntk.train import Trainer, training_session
Step 3 − Now, we need to define a set of constants like minibatch_size, samples_per_epoch and num_epochs as follows −
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
max_samples = samples_per_epoch * num_epochs
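As a rough check on these constants, the arithmetic below estimates how many samples and minibatches the training session will process. This is only a sketch; the exact number of minibatches CNTK produces can differ slightly, for example at sweep boundaries.

```python
import math

minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30

# Total number of samples the session is allowed to consume
max_samples = samples_per_epoch * num_epochs

# Approximate minibatch counts; the last minibatch may be smaller than 16
minibatches_per_epoch = math.ceil(samples_per_epoch / minibatch_size)
total_minibatches = math.ceil(max_samples / minibatch_size)

print("max_samples:", max_samples)
print("minibatches per epoch (approx.):", minibatches_per_epoch)
print("total minibatches (approx.):", total_minibatches)
```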
Step 4 − Next, in order for CNTK to know how to read data during training, we need to define a mapping between the input variables for the network and the streams in the minibatch source.
input_map = {
    features: training_source.streams.features,
    labels: training_source.streams.labels
}
Step 5 − Next, to log the output of the training process, initialize the progress_writer variable with a new ProgressPrinter instance. Also, initialize the trainer and provide it with the model, the loss and the learner as follows −
progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)
Step 6 − At last, to start the training process, we need to invoke the training_session function as follows. The test_config argument refers to a TestConfig object, which has to be created before this call, as described in the validation steps that follow −
session = training_session(trainer,
    mb_source=training_source,
    mb_size=minibatch_size,
    model_inputs_to_streams=input_map,
    max_samples=max_samples,
    test_config=test_config)
session.train()
We can add validation to this setup by using a TestConfig object and assigning it to the test_config keyword argument of the training_session function. The object has to exist before training_session is called.
Following are the implementation steps −
Step 1 − First, we need to import the TestConfig class from the module cntk.train as follows −
from cntk.train import TestConfig
Step 2 − Now, we need to create a new instance of TestConfig with the test_source as input −
test_config = TestConfig(test_source)
The complete example, with the steps above combined, is as follows −
from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
from cntk.logging import ProgressPrinter
from cntk.train import Trainer, training_session, TestConfig
def create_datasource(filename, limit=INFINITELY_REPEAT):
    labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
    features_stream = StreamDef(field='features', shape=4, is_sparse=False)
    deserializer = CTFDeserializer(filename, StreamDefs(labels=labels_stream, features=features_stream))
    minibatch_source = MinibatchSource(deserializer, randomize=True, max_sweeps=limit)
    return minibatch_source
training_source = create_datasource('Iris_train.ctf')
test_source = create_datasource('Iris_test.ctf', limit=1)
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30
max_samples = samples_per_epoch * num_epochs
# features, labels, z, loss and learner are assumed to be defined as in the previous sections
input_map = {
    features: training_source.streams.features,
    labels: training_source.streams.labels
}
progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)
# The test configuration must exist before the training session is created
test_config = TestConfig(test_source)
session = training_session(trainer,
    mb_source=training_source,
    mb_size=minibatch_size,
    model_inputs_to_streams=input_map,
    max_samples=max_samples,
    test_config=test_config)
session.train()
-------------------------------------------------------------------
average since average since examples
loss last metric last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.57 1.57 0.214 0.214 16
1.38 1.28 0.264 0.289 48
[………]
Finished Evaluation [1]: Minibatch[1-1]:metric = 69.65*30;
As we saw above, it is easy to measure the performance of our NN model during and after training by using metrics when training with the regular APIs in CNTK. But things are not that easy while working with a manual minibatch loop.
Here, we are using the model given below, with 4 inputs and 3 outputs, for the Iris flower dataset, which we also created in previous sections −
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid
from cntk.learners import sgd
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
labels = input_variable(3)
z = model(features)
Next, the loss for the model is defined as the combination of the cross-entropy loss function and the F-measure metric, as used in previous sections. We are going to use the criterion_factory utility to create this as a CNTK function object, as shown below −
import cntk
from cntk.losses import cross_entropy_with_softmax, fmeasure
@cntk.Function
def criterion_factory(outputs, targets):
    loss = cross_entropy_with_softmax(outputs, targets)
    metric = fmeasure(outputs, targets, beta=1)
    return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, 0.1)
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}
Now that we have defined the loss function, we will see how we can use it in the trainer to set up a manual training session.
Following are the implementation steps −
Step 1 − First, we need to import the required packages, numpy and pandas, to load and preprocess the data.
import pandas as pd
import numpy as np
Step 2 − Next, in order to log information during training, import the ProgressPrinter class as follows −
from cntk.logging import ProgressPrinter
Step 3 − Then, we need to import the Trainer class from the cntk.train module as follows −
from cntk.train import Trainer
Step 4 − Next, create a new instance of ProgressPrinter as follows −
progress_writer = ProgressPrinter(0)
Step 5 − Now, we need to initialise the trainer with the model, the loss, the learner and the progress_writer as follows −
trainer = Trainer(z, loss, learner, progress_writer)
Step 6 − Next, in order to train the model, we will create a loop that will iterate over the dataset thirty times. This will be the outer training loop.
for _ in range(0, 30):
Step 7 − Now, inside the outer loop, we need to load the data from disk using pandas. In order to load the dataset in mini-batches, set the chunksize keyword argument to 16.
    input_data = pd.read_csv('iris.csv',
        names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
        index_col=False, chunksize=16)
Step 8 − Now, create an inner for loop to iterate over each of the mini-batches.
    for df_batch in input_data:
Step 9 − Now, inside this loop, read the first four columns using the iloc indexer as the features to train from, and convert them to float32 −
        feature_values = df_batch.iloc[:,:4].values
        feature_values = feature_values.astype(np.float32)
Step 10 − Now, read the last column as the labels to train from, as follows −
        label_values = df_batch.iloc[:,-1]
Step 11 − Next, we will use the label_mapping dictionary to convert the label strings to their numeric representation as follows −
        label_values = label_values.map(lambda x: label_mapping[x])
Step 12 − After that, take the numeric representation of the labels and convert it to a numpy array, so that it is easier to work with, as follows −
        label_values = label_values.values
Step 13 − Now, we need to create a new numpy array of zeros that has one row per label value and one column per class.
        encoded_labels = np.zeros((label_values.shape[0], 3))
Step 14 − Now, in order to create one-hot encoded labels, set the column selected by each numeric label value to 1.
        encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.
Step 15 − At last, we need to invoke the train_minibatch method on the trainer and provide the processed features and labels for the minibatch.
        trainer.train_minibatch({features: feature_values, labels: encoded_labels})
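The label handling in the steps above (mapping each species name to a class index and then setting one column per row to 1) can be sketched without numpy as follows. The helper name one_hot and the sample batch are hypothetical; the mapping is the one used in this chapter.

```python
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2,
}


def one_hot(label_names, mapping):
    """Turn a list of label strings into one-hot encoded rows."""
    encoded = []
    for name in label_names:
        row = [0.0] * len(mapping)   # one column per class, all zeros
        row[mapping[name]] = 1.0     # switch on the column of the true class
        encoded.append(row)
    return encoded


# Hypothetical batch of three labels
batch = ['Iris-versicolor', 'Iris-setosa', 'Iris-virginica']
encoded = one_hot(batch, label_mapping)
for name, row in zip(batch, encoded):
    print(name, row)
```

The numpy version in the steps above does the same thing in one vectorised assignment, using the row indices from np.arange and the column indices from label_values.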
The complete example, with the steps above combined, is as follows −
import numpy as np
import pandas as pd
import cntk
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid
from cntk.learners import sgd
from cntk.losses import cross_entropy_with_softmax, fmeasure
from cntk.logging import ProgressPrinter
from cntk.train import Trainer
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
features = input_variable(4)
labels = input_variable(3)
z = model(features)
@cntk.Function
def criterion_factory(outputs, targets):
    loss = cross_entropy_with_softmax(outputs, targets)
    metric = fmeasure(outputs, targets, beta=1)
    return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, 0.1)
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}
progress_writer = ProgressPrinter(0)
trainer = Trainer(z, loss, learner, progress_writer)
# Outer loop: thirty passes over the dataset
for _ in range(0, 30):
    input_data = pd.read_csv('iris.csv',
        names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
        index_col=False, chunksize=16)
    # Inner loop: one minibatch of up to 16 rows at a time
    for df_batch in input_data:
        feature_values = df_batch.iloc[:,:4].values
        feature_values = feature_values.astype(np.float32)
        label_values = df_batch.iloc[:,-1]
        label_values = label_values.map(lambda x: label_mapping[x])
        label_values = label_values.values
        encoded_labels = np.zeros((label_values.shape[0], 3))
        encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.
        trainer.train_minibatch({features: feature_values, labels: encoded_labels})
-------------------------------------------------------------------
average since average since examples
loss last metric last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.45 1.45 -0.189 -0.189 16
1.24 1.13 -0.0382 0.0371 48
[………]
In the above output, we get both the loss and the metric during training. This is because we combined a metric and a loss in a function object and used a progress printer in the trainer configuration.
Now, in order to evaluate the model performance, we need to perform the same task as when training the model, but this time we need to use an Evaluator instance to test the model. It is shown in the following Python code −
from cntk import Evaluator
evaluator = Evaluator(loss.outputs[1], [progress_writer])
input_data = pd.read_csv('iris.csv',
    names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
    index_col=False, chunksize=16)
for df_batch in input_data:
    feature_values = df_batch.iloc[:,:4].values
    feature_values = feature_values.astype(np.float32)
    label_values = df_batch.iloc[:,-1]
    label_values = label_values.map(lambda x: label_mapping[x])
    label_values = label_values.values
    encoded_labels = np.zeros((label_values.shape[0], 3))
    encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.
    evaluator.test_minibatch({features: feature_values, labels: encoded_labels})
# Print the aggregated metric once all minibatches have been tested
evaluator.summarize_test_progress()
Now, we will get output something like the following −
Finished Evaluation [1]: Minibatch[1-11]:metric = 74.62*143;
In this chapter, we will understand how to monitor a model in CNTK.
In previous sections, we did some validation of our NN models. But is it also necessary and possible to monitor our model during training?
Yes, we have already used the ProgressPrinter class to monitor our model, and there are many more ways to do so. Before getting deep into them, let’s first have a look at how monitoring in CNTK works and how we can use it to detect problems in our NN model.
During training and validation, CNTK allows us to specify callbacks in several spots in the API. First, let’s take a closer look at when CNTK invokes callbacks.
CNTK will invoke the callbacks at the following moments during training and testing −
A minibatch is completed.
A full sweep over the dataset is completed during training.
A minibatch of testing is completed.
A full sweep over the dataset is completed during testing.
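To illustrate the four moments listed above, the following is a framework-free sketch of a loop that notifies callbacks at each of them. It is not CNTK's API: the class RecordingCallback, its notify method and the function run_with_callbacks are hypothetical names, and the loss and metric values are made up.

```python
class RecordingCallback:
    """Hypothetical callback that records every event it receives."""

    def __init__(self):
        self.events = []

    def notify(self, moment, value):
        self.events.append((moment, value))


def run_with_callbacks(callbacks, train_losses, test_metrics, sweeps):
    """Toy loop that fires callbacks at the four moments listed above."""

    def fire(moment, value):
        for callback in callbacks:
            callback.notify(moment, value)

    for sweep in range(sweeps):
        for loss in train_losses:
            fire("training minibatch", loss)   # a training minibatch is completed
        fire("training sweep", sweep)          # a full training sweep is completed
    for metric in test_metrics:
        fire("test minibatch", metric)         # a test minibatch is completed
    fire("test sweep", len(test_metrics))      # a full test sweep is completed


recorder = RecordingCallback()
run_with_callbacks([recorder], train_losses=[1.45, 1.13], test_metrics=[0.3], sweeps=2)
for moment, value in recorder.events:
    print(moment, value)
```

In CNTK itself, the same idea is realised by passing progress writers such as ProgressPrinter to the training and evaluation APIs, which call them at these moments.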
While working with CNTK, we can specify callbacks in several spots in the API. For example −
When we call train on a loss function, we can specify a set of callbacks through the callbacks argument as follows −
training_summary = loss.train((x_train, y_train),
    parameter_learners=[learner],
    callbacks=[progress_writer],
    minibatch_size=16, max_epochs=15)
Alternatively, we can specify callbacks for monitoring purposes while creating the Trainer as follows −
from cntk.logging import ProgressPrinter
callbacks = [
    ProgressPrinter(0)
]
trainer = Trainer(z, (loss, metric), learner, callbacks)
Let us study different monitoring tools.
While reading this tutorial, you will find ProgressPrinter to be the most used monitoring tool. Some of the characteristics of the ProgressPrinter monitoring tool are −
The ProgressPrinter class implements basic console-based logging to monitor our model. It can also log to disk if we want it to.
It is especially useful while working in a distributed training scenario.
It is also very useful in a scenario where we can’t log in on the console to see the output of our Python program.
With the help of the following code, we can create an instance of ProgressPrinter −
ProgressPrinter(0, log_to_file='test.txt')
We will get output like what we have seen in the earlier sections, written to the file test.txt −
CNTKCommandTrainInfo: train : 300
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 300
CNTKCommandTrainBegin: train
-------------------------------------------------------------------
 average      since    average      since      examples
    loss       last     metric       last
 ------------------------------------------------------
Learning rate per minibatch: 0.1
    1.45       1.45     -0.189     -0.189            16
    1.24       1.13    -0.0382     0.0371            48
[………]
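The columns of this log can be sanity-checked with a little arithmetic. The sketch below assumes that the first column of each pair is the running average since the start of training and that "since last" is the average over the samples seen since the previous report; the helper name combined_average is hypothetical and not part of CNTK.

```python
def combined_average(prev_avg, prev_count, since_last_avg, total_count):
    """Merge an earlier running average with the average of the newer samples."""
    new_count = total_count - prev_count
    return (prev_avg * prev_count + since_last_avg * new_count) / total_count

# Values copied from the two log rows above (16 examples, then 48 examples).
avg_loss = combined_average(1.45, 16, 1.13, 48)
avg_metric = combined_average(-0.189, 16, 0.0371, 48)
print(avg_loss, avg_metric)
```

Under that assumption, both results agree with the second row of the log up to the rounding used by the printer.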
One of the disadvantages of using ProgressPrinter is that it is hard to get a good view of how the loss and metric progress over time. TensorBoardProgressWriter is a great alternative to the ProgressPrinter class in CNTK.
Before using it, we need to install TensorBoard with the following command:
pip install tensorboard
Now, in order to use TensorBoard, we need to set up TensorBoardProgressWriter in our training code as follows:
import time
from cntk.logging import TensorBoardProgressWriter
tensorbrd_writer = TensorBoardProgressWriter(log_dir='logs/{}'.format(time.time()), freq=1, model=z)
It is good practice to call the close method on the TensorBoardProgressWriter instance once the training of the NN model is done.
We can visualise the TensorBoard logging data with the following command:
tensorboard --logdir logs
In this chapter, let us study how to construct a Convolutional Neural Network (CNN) in CNTK.
Convolutional neural networks (CNNs) are also made up of neurons that have learnable weights and biases. In this respect, they are like ordinary neural networks (NNs).
If we recall the working of ordinary NNs, every neuron receives one or more inputs, takes a weighted sum and passes it through an activation function to produce the final output. Here, the question arises: if CNNs and ordinary NNs have so many similarities, what makes these two networks different from each other?
What makes them different is the treatment of the input data and the types of layers. An ordinary NN ignores the structure of the input data, and all the data is converted into a 1-D array before it is fed into the network.
A CNN architecture, by contrast, can take the 2D structure of images into account and process them in a way that lets it extract properties specific to images. Moreover, CNNs have one or more convolutional layers and pooling layers, which are the main building blocks of CNNs.
These layers are followed by one or more fully connected layers, as in standard multilayer NNs. So, we can think of a CNN as a special case of fully connected networks.
The architecture of a CNN is basically a list of layers that transforms a 3-dimensional input volume (the width, height and depth of the image) into a 3-dimensional output volume. One important point to note here is that every neuron in the current layer is connected only to a small patch of the output from the previous layer, which is like overlaying an N×N filter on the input image.
It uses M filters, which are basically feature extractors that extract features like edges, corners and so on. Following are the layers [INPUT-CONV-RELU-POOL-FC] that are used to construct CNNs:
INPUT: As the name implies, this layer holds the raw pixel values, that is, the data of the image as it is. For example, INPUT [64×64×3] is a 3-channel RGB image of width 64, height 64 and depth 3.
CONV: This layer is one of the building blocks of CNNs, as most of the computation is done in this layer. For example, if we use 6 filters on the above-mentioned INPUT [64×64×3], this may result in the volume [64×64×6].
RELU: Also called the rectified linear unit layer, it applies an activation function to the output of the previous layer. In other words, RELU adds a non-linearity to the network.
POOL: The pooling layer is another building block of CNNs. Its main task is down-sampling, which means it operates independently on every slice of the input and resizes it spatially.
FC: This is the fully connected layer or, more specifically, the output layer. It computes the output class scores, and the resulting output is a volume of size 1×1×L, where L is the number of classes.
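To make the RELU and POOL descriptions above concrete, here is a minimal sketch in plain Python (no CNTK involved). The functions relu and max_pool and the 4×4 feature map are illustrative assumptions, not CNTK APIs; they only show how negative values are clipped and how 2×2 max pooling halves each spatial dimension of one slice.

```python
def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window of one slice."""
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in cols]
        for r in rows
    ]

# A made-up 4x4 slice of a convolution output.
feature_map = [
    [1, -2, 3, 0],
    [-1, 5, -3, 2],
    [0, 1, -4, -1],
    [2, -6, 7, 3],
]
activated = relu(feature_map)
pooled = max_pool(activated)
print(pooled)
```

The 4×4 slice becomes a 2×2 slice; in a real network the same operation is applied to every channel independently.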
The diagram below represents the typical architecture of CNNs:
We have seen the architecture and the basics of CNNs; now we are going to build a convolutional network using CNTK. Here, we will first see how to put together the structure of the CNN, and then we will look at how to train its parameters.
Finally, we will see how we can improve the neural network by changing its structure with different layer setups. We are going to use the MNIST image dataset.
So, first let's create a CNN structure. Generally, when we build a CNN for recognizing patterns in images, we do the following:
We use a combination of convolution and pooling layers.
We add one or more hidden layers at the end of the network.
Finally, we finish the network with a softmax layer for classification.
With the help of the following steps, we can build the network structure:
Step 1: First, we need to import the required layers for the CNN.
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
Step 2: Next, we need to import the activation functions for the CNN.
from cntk.ops import log_softmax, relu
Step 3: To initialize the convolutional layers later, we need to import the glorot_uniform initializer as follows:
from cntk.initializer import glorot_uniform
Step 4: Next, to create input variables, import the input_variable function. Also import the default_options function, which makes configuring the NN a bit easier.
from cntk import input_variable, default_options
Step 5: Now, to store the input images, create a new input_variable. It will contain three channels (red, green and blue) and have a size of 28 by 28 pixels.
features = input_variable((3,28,28))
Step 6: Next, we need to create another input_variable to store the labels to predict.
labels = input_variable(10)
Step 7: Now, we need to create the default_options for the NN, using glorot_uniform as the initialization function.
with default_options(initialization=glorot_uniform, activation=relu):
Step 8: Next, in order to set the structure of the NN, we need to create a new Sequential layer set.
Step 9: Within the Sequential layer set, add a Convolution2D layer with a filter_shape of 5, a strides setting of 1 and 8 filters. Also enable padding, so that the image is padded to retain its original dimensions.
    model = Sequential([
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
Step 10: Now add a MaxPooling layer with a filter_shape of 2 and a strides setting of 2 to compress the image by half.
        MaxPooling(filter_shape=(2,2), strides=(2,2)),
Step 11: As in step 9, add another Convolution2D layer with a filter_shape of 5 and a strides setting of 1, this time with 16 filters. Also enable padding, so that the size of the image produced by the previous pooling layer is retained.
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
Step 12: As in step 10, add another MaxPooling layer, with a filter_shape of 3 and a strides setting of 3, to reduce the image to roughly a third of its size.
        MaxPooling(filter_shape=(3,3), strides=(3,3)),
Step 13: Finally, add a Dense layer with ten neurons, one for each of the 10 possible classes the network can predict. To turn the network into a classification model, use the log_softmax activation function.
        Dense(10, activation=log_softmax)
    ])
Putting the steps together, the network definition is:
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options
features = input_variable((3,28,28))
labels = input_variable(10)
with default_options(initialization=glorot_uniform, activation=relu):
    model = Sequential([
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
        MaxPooling(filter_shape=(2,2), strides=(2,2)),
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
        MaxPooling(filter_shape=(3,3), strides=(3,3)),
        Dense(10, activation=log_softmax)
    ])
z = model(features)
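It can help to trace how the image volume changes as it passes through these layers. The sketch below is plain Python rather than CNTK, and it rests on two assumptions: a padded convolution keeps the height and width, and an unpadded pooling layer produces floor((n - size) / stride) + 1 values per dimension. The helper names conv_same and max_pool are hypothetical.

```python
def conv_same(shape, num_filters):
    """Padded convolution: spatial size is kept, depth becomes num_filters."""
    _, height, width = shape
    return (num_filters, height, width)

def max_pool(shape, size, stride):
    """Unpadded pooling: each spatial dimension shrinks, depth is kept."""
    channels, height, width = shape
    return (channels, (height - size) // stride + 1, (width - size) // stride + 1)

shape = (3, 28, 28)             # input image: 3 channels, 28 x 28 pixels
shape = conv_same(shape, 8)     # first Convolution2D, 8 filters
shape = max_pool(shape, 2, 2)   # first MaxPooling halves the image
shape = conv_same(shape, 16)    # second Convolution2D, 16 filters
shape = max_pool(shape, 3, 3)   # second MaxPooling, roughly a third
flattened = shape[0] * shape[1] * shape[2]  # values fed into Dense(10)
print(shape, flattened)
```

Under these assumptions, the Dense layer receives a flattened vector of a few hundred values and maps it to the 10 class scores.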
Now that we have created the structure of the network, it's time to train it. But before starting the training, we need to set up minibatch sources, because training a NN that works with images requires more memory than most computers have.
We have already created minibatch sources in previous sections. Here we reuse the create_datasource function, whose code is listed with the complete example later in this chapter, to create two separate data sources (one for training and one for testing) to train the model.
train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)
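The role of max_sweeps can be illustrated without CNTK. The following sketch is a plain-Python stand-in, not the MinibatchSource implementation: the generator minibatches is hypothetical, it does no randomization, and it simply shows the difference between a source that repeats forever (training) and one that stops after a single sweep (testing).

```python
from itertools import count, islice

def minibatches(samples, batch_size, max_sweeps=None):
    """Yield batches; max_sweeps=None repeats the dataset indefinitely."""
    sweeps = count() if max_sweeps is None else range(max_sweeps)
    for _ in sweeps:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]

samples = list(range(10))
# One sweep, as for the test data source: the generator ends by itself.
test_batches = list(minibatches(samples, batch_size=4, max_sweeps=1))
# Endless sweeps, as for the training data source: take only what is needed.
train_batches = list(islice(minibatches(samples, batch_size=4), 4))
print(test_batches, train_batches[3])
```

Because only one batch is held at a time, the whole dataset never has to fit in memory at once.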
Now that we have prepared the images, we can start training our NN. As we did in previous sections, we can use the train method on the loss function to kick off the training. Following is the code for this:
from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
@Function
def criterion_factory(output, targets):
    loss = cross_entropy_with_softmax(output, targets)
    metric = classification_error(output, targets)
    return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)
With the previous code, we have set up the loss and the learner for the NN. The following code will train and validate the NN:
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
    features: train_datasource.streams.features,
    labels: train_datasource.streams.labels
}
loss.train(train_datasource,
    max_epochs=10,
    minibatch_size=64,
    epoch_size=60000,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config])
The complete example, including the create_datasource function, is as follows:
from cntk.layers import Convolution2D, Sequential, Dense, MaxPooling
from cntk.ops import log_softmax, relu
from cntk.initializer import glorot_uniform
from cntk import input_variable, default_options
features = input_variable((3,28,28))
labels = input_variable(10)
with default_options(initialization=glorot_uniform, activation=relu):
    model = Sequential([
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=8, pad=True),
        MaxPooling(filter_shape=(2,2), strides=(2,2)),
        Convolution2D(filter_shape=(5,5), strides=(1,1), num_filters=16, pad=True),
        MaxPooling(filter_shape=(3,3), strides=(3,3)),
        Dense(10, activation=log_softmax)
    ])
z = model(features)
import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms
def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
    mapping_file = os.path.join(folder, 'mapping.bin')
    image_transforms = []
    if train:
        image_transforms += [
            xforms.crop(crop_type='randomside', side_ratio=0.8),
            xforms.scale(width=28, height=28, channels=3, interpolations='linear')
        ]
    stream_definitions = StreamDefs(
        features=StreamDef(field='image', transforms=image_transforms),
        labels=StreamDef(field='label', shape=10)
    )
    deserializer = ImageDeserializer(mapping_file, stream_definitions)
    return MinibatchSource(deserializer, max_sweeps=max_sweeps)
train_datasource = create_datasource('mnist_train')
test_datasource = create_datasource('mnist_test', max_sweeps=1, train=False)
from cntk import Function
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
from cntk.learners import sgd
@Function
def criterion_factory(output, targets):
    loss = cross_entropy_with_softmax(output, targets)
    metric = classification_error(output, targets)
    return loss, metric
loss = criterion_factory(z, labels)
learner = sgd(z.parameters, lr=0.2)
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
    features: train_datasource.streams.features,
    labels: train_datasource.streams.labels
}
loss.train(train_datasource,
    max_epochs=10,
    minibatch_size=64,
    epoch_size=60000,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config])
-------------------------------------------------------------------
 average      since    average      since      examples
    loss       last     metric       last
 ------------------------------------------------------
Learning rate per minibatch: 0.2
     142        142      0.922      0.922            64
1.35e+06   1.51e+07      0.896      0.883           192
[………]
As we have seen, it is difficult to train NNs used for image recognition, and they also require a lot of data to train. One more issue is that they tend to overfit on the images used during training. For example, when we only have photos of faces in an upright position, our model will have a hard time recognizing faces that are rotated in another direction.
To overcome this problem, we can use image augmentation; CNTK supports specific transforms when creating minibatch sources for images. We can use several transformations as follows:
We can randomly crop images used for training with just a few lines of code.
We can also use scale and color transforms.
Let's see, with the help of the following Python code, how we can change the list of transformations by including a cropping transformation within the function used to create the minibatch source earlier.
import os
from cntk.io import MinibatchSource, StreamDef, StreamDefs, ImageDeserializer, INFINITELY_REPEAT
import cntk.io.transforms as xforms
def create_datasource(folder, train=True, max_sweeps=INFINITELY_REPEAT):
    mapping_file = os.path.join(folder, 'mapping.bin')
    image_transforms = []
    if train:
        image_transforms += [
            xforms.crop(crop_type='randomside', side_ratio=0.8),
            xforms.scale(width=28, height=28, channels=3, interpolations='linear')
        ]
    stream_definitions = StreamDefs(
        features=StreamDef(field='image', transforms=image_transforms),
        labels=StreamDef(field='label', shape=10)
    )
    deserializer = ImageDeserializer(mapping_file, stream_definitions)
    return MinibatchSource(deserializer, max_sweeps=max_sweeps)
With the above code, the function now includes a set of image transforms, so that during training the images are randomly cropped and we get more variations of each image.
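To see why random cropping yields more variations, consider this plain-Python sketch. It is not how CNTK implements the crop transform; the function random_side_crop, the 10×10 toy image and the fixed seed are assumptions made for illustration. It cuts a square whose side is a given ratio of the image side at a random position; scaling the result back to the network's input size is left out.

```python
import random

def random_side_crop(image, side_ratio, rng):
    """Cut a square of side_ratio times the shorter side at a random offset."""
    height, width = len(image), len(image[0])
    side = round(min(height, width) * side_ratio)
    top = rng.randint(0, height - side)
    left = rng.randint(0, width - side)
    return [row[left:left + side] for row in image[top:top + side]]

# Toy 10 x 10 "image" whose pixel value encodes its position (row * 10 + col).
image = [[row * 10 + col for col in range(10)] for row in range(10)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
crops = [random_side_crop(image, 0.8, rng) for _ in range(3)]
for crop in crops:
    print(crop[0][0], len(crop), len(crop[0]))
```

Each call returns an 8×8 window from a possibly different position, so the network sees slightly shifted versions of the same picture across epochs.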
Now, let us understand how to construct a Recurrent Neural Network (RNN) in CNTK.
We learned how to classify images with a neural network, which is one of the iconic jobs in deep learning. Another area where neural networks excel, and where a lot of research is happening, is recurrent neural networks (RNNs). Here, we will learn what an RNN is and how it can be used in scenarios where we need to deal with time-series data.
Recurrent neural networks (RNNs) may be defined as a special breed of NNs that are capable of reasoning over time. RNNs are mainly used in scenarios where we need to deal with values that change over time, i.e. time-series data. To understand this better, let's make a small comparison between regular neural networks and recurrent neural networks:
In a regular neural network, we can provide only one input, which limits it to producing only one prediction. This makes regular neural networks a poor fit for a job such as translating text.
In recurrent neural networks, on the other hand, we can provide a sequence of samples that results in a single prediction. More generally, using RNNs we can predict an output sequence based on an input sequence. For example, there have been quite a few successful experiments with RNNs in translation tasks.
RNNs can be used in several ways: to predict a single output based on a sequence, to predict a sequence based on a single input, and to predict a sequence based on a sequence.
Before diving into the steps by which an RNN can predict a single output based on a sequence, let's see what a basic RNN looks like:
As we can see in the above diagram, an RNN contains a loopback connection to the input, and whenever we feed it a sequence of values, it processes each element in the sequence as a time step.
Moreover, because of the loopback connection, an RNN can combine the generated output with the input for the next element in the sequence. In this way, the RNN builds a memory over the whole sequence, which can be used to make a prediction.
To make a prediction with an RNN, we can perform the following steps:
First, to create an initial hidden state, we feed in the first element of the input sequence.
After that, to produce an updated hidden state, we take the initial hidden state and combine it with the second element in the input sequence.
Finally, to produce the final hidden state and predict the output of the RNN, we take the final element in the input sequence.
In this way, with the help of the loopback connection, we can teach an RNN to recognize patterns that happen over time.
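The steps above can be sketched with a toy scalar RNN in plain Python. Everything here is an illustrative assumption: the weights w_h, w_x and w_y are made-up constants rather than trained values, tanh is chosen as the activation, and rnn_step and predict_single are hypothetical names, not CNTK functions.

```python
import math

def rnn_step(hidden, x, w_h=0.5, w_x=1.0):
    """Combine the previous hidden state with the current input element."""
    return math.tanh(w_h * hidden + w_x * x)

def predict_single(sequence, w_y=2.0):
    """Run over the whole sequence and predict once, from the final state."""
    hidden = 0.0
    for x in sequence:
        hidden = rnn_step(hidden, x)
    return w_y * hidden

prediction = predict_single([0.1, 0.2, 0.3])
reversed_prediction = predict_single([0.3, 0.2, 0.1])
print(prediction, reversed_prediction)
```

The two calls use the same values in a different order and give different results, which shows that the hidden state carries information about when each element arrived.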
The basic RNN model discussed above can be extended to other use cases as well. For example, we can use it to predict a sequence of values based on a single input. In this scenario, to make a prediction with an RNN, we can perform the following steps:
First, to create an initial hidden state and predict the first element in the output sequence, we feed an input sample into the neural network.
After that, to produce an updated hidden state and the second element in the output sequence, we combine the initial hidden state with the same sample.
Finally, to update the hidden state one more time and predict the final element in the output sequence, we feed in the sample once more.
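A minimal plain-Python sketch of this single-input, sequence-output case follows. As before, the scalar weights, the tanh activation and the names rnn_step and predict_sequence are assumptions made for illustration only.

```python
import math

def rnn_step(hidden, x, w_h=0.5, w_x=1.0):
    """Combine the previous hidden state with the current input."""
    return math.tanh(w_h * hidden + w_x * x)

def predict_sequence(sample, steps, w_y=2.0):
    """Feed the same sample at every step and emit one output per step."""
    hidden = 0.0
    outputs = []
    for _ in range(steps):
        hidden = rnn_step(hidden, sample)
        outputs.append(w_y * hidden)
    return outputs

outputs = predict_sequence(0.5, steps=3)
print(outputs)
```

Although the input never changes, each output differs because the hidden state is updated at every step.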
We have seen how to predict a single value based on a sequence and how to predict a sequence based on a single value. Now let's see how we can predict sequences from sequences. In this scenario, to make a prediction with an RNN, we can perform the following steps:
First, to create an initial hidden state and predict the first element in the output sequence, we take the first element in the input sequence.
After that, to update the hidden state and predict the second element in the output sequence, we take the initial hidden state and combine it with the second element in the input sequence.
Finally, to predict the final element in the output sequence, we take the updated hidden state and the final element in the input sequence.
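The sequence-to-sequence case differs from the earlier ones only in that an output is produced at every time step. The sketch below again uses made-up scalar weights and hypothetical function names (rnn_step, predict_per_step) in plain Python.

```python
import math

def rnn_step(hidden, x, w_h=0.5, w_x=1.0):
    """Combine the previous hidden state with the current input element."""
    return math.tanh(w_h * hidden + w_x * x)

def predict_per_step(sequence, w_y=2.0):
    """Update the hidden state per element and emit an output each time."""
    hidden = 0.0
    outputs = []
    for x in sequence:
        hidden = rnn_step(hidden, x)
        outputs.append(w_y * hidden)
    return outputs

inputs = [0.1, 0.2, 0.3]
outputs = predict_per_step(inputs)
print(outputs)
```

The output sequence has the same length as the input sequence, and each output depends on all the elements seen so far.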
To understand the working of recurrent neural networks (RNNs), we first need to understand how the recurrent layers in the network work. So let's first discuss how we can predict the output with a standard recurrent layer.
As we discussed earlier, a basic layer in an RNN is quite different from a regular layer in a neural network. In the previous section, we also showed the basic architecture of an RNN in a diagram. To update the hidden state for the first time step in the sequence, we can use the following formula:
In the above equation, we calculate the new hidden state by taking the dot product between the initial hidden state and a set of weights and combining it with the input for the time step, weighted by a second set of weights.
For the next step, the hidden state of the current time step is used as the initial hidden state for the next time step in the sequence. That's why, to update the hidden state for the second time step, we can repeat the calculation performed in the first time step as follows:
Next, we can repeat the process of updating the hidden state for the third and final step in the sequence as below:
And when we have processed all the steps in the sequence, we can calculate the output as follows:
For the above formula, we use a third set of weights and the hidden state from the final time step.
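The formulas themselves are not reproduced in this text. As a hedged reconstruction, the standard form of a basic recurrent layer for a three-step sequence can be written as below. The symbols are assumptions introduced here: $h_0$ is the initial hidden state, $x_t$ the input at time step $t$, $W_h$ and $W_x$ the two sets of weights used for the hidden-state update, $W_y$ the third set of weights used for the output, and $\tanh$ a typical choice of activation; bias terms are omitted.

```latex
\begin{aligned}
h_1 &= \tanh(W_h h_0 + W_x x_1) \\
h_2 &= \tanh(W_h h_1 + W_x x_2) \\
h_3 &= \tanh(W_h h_2 + W_x x_3) \\
y   &= W_y h_3
\end{aligned}
```

The same weights are reused at every time step; only the hidden state changes as the sequence is processed.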
The main issue with the basic recurrent layer is the vanishing gradient problem, because of which it is not very good at learning long-term correlations. In simple words, the basic recurrent layer does not handle long sequences very well. For that reason, there are other recurrent layer types that are much better suited to working with longer sequences, as follows:
Long short-term memory (LSTM) networks were introduced by Hochreiter & Schmidhuber. They solved the problem of getting a basic recurrent layer to remember things for a long time. The architecture of an LSTM is given above in the diagram. As we can see, it has input neurons, memory cells, and output neurons. To combat the vanishing gradient problem, LSTM networks use an explicit memory cell (which stores the previous values) and the following gates:
Forget gate: As the name implies, it tells the memory cell to forget the previous values. The memory cell stores the values until the forget gate tells it to forget them.
Input gate: As the name implies, it adds new information to the cell.
Output gate: As the name implies, it decides when to pass along the vectors from the cell to the next hidden state.
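The interplay of the three gates can be sketched with a single scalar LSTM step in plain Python. This is the commonly used textbook formulation, not CNTK's implementation; bias terms are omitted, and the weight names and their values (all 0.5) are made up for the example.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def lstm_step(x, hidden, cell, w):
    """One scalar LSTM step; w maps hypothetical weight names to values."""
    forget = sigmoid(w["f_x"] * x + w["f_h"] * hidden)      # how much old memory to keep
    input_gate = sigmoid(w["i_x"] * x + w["i_h"] * hidden)  # how much new content to add
    output = sigmoid(w["o_x"] * x + w["o_h"] * hidden)      # how much memory to expose
    candidate = math.tanh(w["c_x"] * x + w["c_h"] * hidden)
    cell = forget * cell + input_gate * candidate
    hidden = output * math.tanh(cell)
    return hidden, cell

names = ["f_x", "f_h", "i_x", "i_h", "o_x", "o_h", "c_x", "c_h"]
weights = {name: 0.5 for name in names}
hidden, cell = 0.0, 0.0
for x in [0.1, 0.2, 0.3]:
    hidden, cell = lstm_step(x, hidden, cell, weights)
print(hidden, cell)
```

Because the cell is updated additively and scaled by the forget gate, information can survive many steps, which is what helps with long sequences.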
Gated recurrent units (GRUs) are a slight variation on LSTM networks. They have one gate fewer and are wired slightly differently from LSTMs. The architecture is shown in the above diagram. It has input neurons, gated memory cells, and output neurons. A GRU network has the following two gates:
Update gate: It determines the following two things:
What amount of information should be kept from the last state?
What amount of information should be let in from the previous layer?
Reset gate: The functionality of the reset gate is much like that of the forget gate of an LSTM network. The only difference is that it is located slightly differently.
In contrast to LSTM networks, GRU networks are slightly faster and easier to run.
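For comparison with the LSTM, a single scalar GRU step can be sketched as follows. Again this is a common textbook formulation in plain Python rather than CNTK's implementation; biases are omitted, the weights are made up, and note that some references swap the roles of update and (1 - update) in the final blend.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

def gru_step(x, hidden, w):
    """One scalar GRU step; w maps hypothetical weight names to values."""
    update = sigmoid(w["z_x"] * x + w["z_h"] * hidden)  # keep old state vs. take new
    reset = sigmoid(w["r_x"] * x + w["r_h"] * hidden)   # how much old state feeds the candidate
    candidate = math.tanh(w["c_x"] * x + w["c_h"] * (reset * hidden))
    return (1.0 - update) * hidden + update * candidate

weights = {name: 0.5 for name in ["z_x", "z_h", "r_x", "r_h", "c_x", "c_h"]}
hidden = 0.0
for x in [0.1, 0.2, 0.3]:
    hidden = gru_step(x, hidden, weights)
print(hidden)
```

There is no separate memory cell and one gate fewer than in the LSTM step, which is why a GRU needs fewer computations per time step.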
Before we can start making predictions from any of our data sources, we first need to construct the RNN. Constructing an RNN is much the same as building the regular neural networks of the previous sections. Following is the code to build one:
from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.learners import adam
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig
BATCH_SIZE = 14 * 10
EPOCH_SIZE = 12434
EPOCHS = 10
We can also stack multiple recurrent layers in CNTK. For example, we can use the following combination of layers:
from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold
features = sequence.input_variable(1)
with default_options(initial_state = 0.1):
    model = Sequential([
        Fold(LSTM(15)),
        Dense(1)
    ])(features)
target = input_variable(1, dynamic_axes=model.dynamic_axes)
As the above code suggests, there are two ways in which we can model an RNN in CNTK:
First, if we only want the final output of a recurrent layer, we can use the Fold layer in combination with a recurrent layer, such as GRU, LSTM, or even RNNStep.
Second, as an alternative, we can use the Recurrence block.
Once we have built the model, let's see how we can train the RNN in CNTK:
from cntk import Function
@Function
def criterion_factory(z, t):
    loss = squared_error(z, t)
    metric = squared_error(z, t)
    return loss, metric
loss = criterion_factory(model, target)
learner = adam(model.parameters, lr=0.005, momentum=0.9)
Now, to load the data into the training process, we have to deserialize sequences from a set of CTF files. The following code contains the create_datasource function, a utility function that creates both the training and the test data source.
def create_datasource(filename, sweeps=INFINITELY_REPEAT):
    target_stream = StreamDef(field='target', shape=1, is_sparse=False)
    features_stream = StreamDef(field='features', shape=1, is_sparse=False)
    deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
    datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)
    return datasource
# Provide the location of the training file created from our dataset.
train_datasource = create_datasource('Training data filename.ctf')
# Provide the location of the test file created from our dataset.
test_datasource = create_datasource('Test filename.ctf', sweeps=1)
Now that we have set up the data sources, the model and the loss function, we can start the training process. It is quite similar to what we did in previous sections with basic neural networks.
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)
input_map = {
    features: train_datasource.streams.features,
    target: train_datasource.streams.target
}
history = loss.train(
    train_datasource,
    epoch_size=EPOCH_SIZE,
    parameter_learners=[learner],
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config],
    minibatch_size=BATCH_SIZE,
    max_epochs=EPOCHS
)
We will get output similar to the following:
 average      since    average      since      examples
    loss       last     metric       last
 ------------------------------------------------------
Learning rate per minibatch: 0.005
     0.4        0.4        0.4        0.4            19
     0.4        0.4        0.4        0.4            59
   0.452      0.495      0.452      0.495           129
[…]
Actually, predicting with an RNN is quite similar to making predictions with any other CNTK model. The only difference is that we need to provide sequences rather than single samples.
Now that our RNN is finally done with training, we can validate the model by testing it with a few sample sequences as follows:
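import pickle
with open('test_samples.pkl', 'rb') as test_file:
    test_samples = pickle.load(test_file)
# NORMALIZE is not defined in this excerpt; presumably it is the factor by which the data was scaled before training.
model(test_samples) * NORMALIZE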
array([[ 8081.7905],
[16597.693 ],
[13335.17 ],
...,
[11275.804 ],
[15621.697 ],
[16875.555 ]], dtype=float32)
Translated from: https://www.tutorialspoint.com/microsoft_cognitive_toolkit/microsoft_cognitive_toolkit_quick_guide.htm