CNTK - In-Memory and Large Datasets

In this chapter, we will learn about how to work with the in-memory and large datasets in CNTK.

Training with small in-memory datasets

When we talk about feeding data into the CNTK trainer, there can be many ways, depending upon the size of the dataset and the format of the data. Datasets can be small enough to hold in memory, or they can be large.

In this section, we are going to work with in-memory datasets. For this, we will use the following two libraries −

  • Numpy

  • Pandas

Using Numpy arrays

Here, we will work with a randomly generated, numpy-based dataset in CNTK. In this example, we are going to simulate data for a binary classification problem. Suppose we have a set of observations with 4 features and want to predict two possible labels with our deep learning model.

Implementation Example

For this, first we must generate a set of labels containing a one-hot vector representation of the labels we want to predict. It can be done with the help of the following steps −

Step 1 − Import the numpy package as follows −


import numpy as np
num_samples = 20000

Step 2 − Next, generate a label mapping by using the np.eye function as follows −


label_mapping = np.eye(2)

Step 3 − Now, by using the np.random.choice function, collect 20000 random samples as follows −


y = label_mapping[np.random.choice(2,num_samples)].astype(np.float32)
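
Because label_mapping is the 2×2 identity matrix, indexing it with the random class choices picks out one row per sample, which is exactly a one-hot vector. A quick sanity check (purely illustrative) −


print(label_mapping)    # [[1. 0.]
                        #  [0. 1.]]
print(y.shape)          # (20000, 2) - one one-hot row per sample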

Step 4 − At last, by using the np.random.random function, generate an array of random floating-point values as follows −


x = np.random.random(size=(num_samples, 4)).astype(np.float32)

Note that the arrays are converted to 32-bit floating point numbers with astype(np.float32), so that they match the format expected by CNTK. With the data ready, let's follow the steps below to build and train the network −
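
A quick check that the arrays are now in the format CNTK expects (purely illustrative) −


print(x.dtype, y.dtype)   # float32 float32
print(x.shape, y.shape)   # (20000, 4) (20000, 2)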

Step 5 − Import the Dense and Sequential layer functions from the cntk.layers module as follows −


from cntk.layers import Dense, Sequential

Step 6 − Now, we need to import the activation function for the layers in the network. Let us import sigmoid as the activation function −


from cntk import input_variable, default_options
from cntk.ops import sigmoid

Step 7 − Now, we need to import the loss function to train the network. Let us import binary_cross_entropy as the loss function −


from cntk.losses import binary_cross_entropy

Step 8 − Next, we need to define the default options for the network. Here, we will provide the sigmoid activation function as the default setting. Also, create the model by using the Sequential layer function as follows −


with default_options(activation=sigmoid):
   model = Sequential([Dense(6), Dense(2)])

Step 9 − Next, initialise an input_variable with 4 input features serving as the input for the network.


features = input_variable(4)

Step 10 − Now, in order to complete the network, we need to connect the features variable to it.


z = model(features)

So, now we have a NN. With the help of the following steps, let us train it using the in-memory dataset −

Step 11 − To train this NN, first we need to import a learner from the cntk.learners module. We will import the sgd learner as follows −


from cntk.learners import sgd

Step 12 − Along with that, import ProgressPrinter from the cntk.logging module as well.


from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)

Step 13 − Next, define a new input variable for the labels as follows −


labels = input_variable(2)

Step 14 − Next, in order to train the NN model, we need to define a loss using the binary_cross_entropy function. Also, provide the model z and the labels variable.


loss = binary_cross_entropy(z, labels)

Step 15 − Next, initialize the sgd learner as follows −


learner = sgd(z.parameters, lr=0.1)

Step 16 − At last, call the train method on the loss function. Also, provide it with the input data, the sgd learner and the progress_writer −


training_summary = loss.train((x, y), parameter_learners=[learner], callbacks=[progress_writer])

Complete implementation example


import numpy as np

# Generate 20000 random samples: 4 features each, with one-hot encoded binary labels
num_samples = 20000
label_mapping = np.eye(2)
y = label_mapping[np.random.choice(2, num_samples)].astype(np.float32)
x = np.random.random(size=(num_samples, 4)).astype(np.float32)

from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid
from cntk.losses import binary_cross_entropy

# Build the model: a hidden layer with 6 neurons and an output layer with 2 neurons
with default_options(activation=sigmoid):
   model = Sequential([Dense(6), Dense(2)])
features = input_variable(4)
z = model(features)

from cntk.learners import sgd
from cntk.logging import ProgressPrinter

# Train the model on the in-memory numpy arrays
progress_writer = ProgressPrinter(0)
labels = input_variable(2)
loss = binary_cross_entropy(z, labels)
learner = sgd(z.parameters, lr=0.1)
training_summary = loss.train((x, y), parameter_learners=[learner], callbacks=[progress_writer])

Output


Build info:
     Built time: *** ** **** 21:40:10
     Last modified date: *** *** ** 21:08:46 2019
     Build type: Release
     Build target: CPU-only
     With ASGD: yes
     Math lib: mkl
     Build Branch: HEAD
     Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
     MPI distribution: Microsoft MPI
     MPI version: 7.0.12437.6
-------------------------------------------------------------------
average   since   average   since examples
loss      last    metric    last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.52      1.52      0         0     32
1.51      1.51      0         0     96
1.48      1.46      0         0    224
1.45      1.42      0         0    480
1.42       1.4      0         0    992
1.41      1.39      0         0   2016
1.4       1.39      0         0   4064
1.39      1.39      0         0   8160
1.39      1.39      0         0  16352

Using Pandas DataFrames

Numpy arrays are one of the most basic ways of storing data, and they are very limited in what they can contain: a single n-dimensional array can only hold data of a single data type. For many real-world cases, however, we need a library that can handle more than one data type in a single dataset.

One of the Python libraries, called Pandas, makes it easier to work with such datasets. It introduces the concept of a DataFrame (DF) and allows us to load datasets stored on disk in various formats as DFs. For example, we can read DFs stored as CSV, JSON, Excel, etc.

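For instance, a single DataFrame can mix numeric and string columns, something a plain numpy array cannot do without falling back to a generic object dtype. A minimal illustration (the values here are made up) −


import pandas as pd
df = pd.DataFrame({'petal_length': [1.4, 4.7], 'species': ['Iris-Setosa', 'Iris-Versicolor']})
print(df.dtypes)   # petal_length: float64, species: object (strings)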

You can learn Python Pandas library in more detail at https://www.tutorialspoint.com/python_pandas/index.htm.

Implementation Example

In this example, we are going to classify three possible species of iris flowers based on four properties. We have created this deep learning model in the previous sections too. The model is as follows −


from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
model = Sequential([
   Dense(4, activation=sigmoid),
   Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)

The above model contains one hidden layer and an output layer with three neurons to match the number of classes we can predict.

Next, we will use the train method and a loss function to train the network. For this, first we must load and preprocess the iris dataset so that it matches the expected layout and data format for the NN. It can be done with the help of the following steps −

Step 1 − Import the numpy and Pandas packages as follows −

步骤1-如下导入numpyPandas包-


import numpy as np
import pandas as pd

Step 2 − Next, use the read_csv function to load the dataset into memory −


df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width',
 'petal_length', 'petal_width', 'species'], index_col=False)

Step 3 − Now, we need to create a dictionary that maps the labels in the dataset to their corresponding numeric representation.


label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

Step 4 − Now, by using the iloc indexer on the DataFrame, select the first four columns as follows −


x = df_source.iloc[:, :4].values

Step 5 − Next, we need to select the species column as the labels for the dataset. It can be done as follows −


y = df_source['species'].values

Step 6 − Now, we need to map the labels in the dataset, which can be done by using label_mapping. Also, use the one_hot helper (defined below) to convert them into one-hot encoded arrays.


y = np.array([one_hot(label_mapping[v], 3) for v in y])

Step 7 − Next, to use the features and the mapped labels with CNTK, we need to convert them both to floats −


x = x.astype(np.float32)
y = y.astype(np.float32)

As we know, the labels are stored in the dataset as strings, and CNTK cannot work with strings directly. That is why it needs one-hot encoded vectors representing the labels. For this, we can define the function one_hot as follows −


def one_hot(index, length):
   result = np.zeros(length)
   result[index] = 1
   return result
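
For example, encoding the second class with this helper yields the expected one-hot vector (a quick illustration) −


print(one_hot(label_mapping['Iris-Versicolor'], 3))   # [0. 1. 0.]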

Now that we have the numpy arrays in the correct format, with the help of the following steps we can use them to train our model −

Step 8 − First, we need to import the loss function to train the network. Since this is a three-class problem, let us import cross_entropy_with_softmax as the loss function −


from cntk.losses import cross_entropy_with_softmax

Step 9 − To train this NN, we also need to import a learner from the cntk.learners module. We will import the sgd learner as follows −


from cntk.learners import sgd

Step 10 − Along with that, import ProgressPrinter from the cntk.logging module as well.


from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)

Step 11 − Next, define a new input variable for the labels as follows −


labels = input_variable(3)

Step 12 − Next, in order to train the NN model, we need to define a loss using the cross_entropy_with_softmax function. Also, provide the model z and the labels variable.


loss = cross_entropy_with_softmax(z, labels)

Step 13 − Next, initialise the sgd learner as follows −


learner = sgd(z.parameters, 0.1)

Step 14 − At last, call the train method on the loss function. Also, provide it with the input data, the sgd learner and the progress_writer.


training_summary = loss.train((x, y), parameter_learners=[learner],
 callbacks=[progress_writer], minibatch_size=16, max_epochs=5)

Complete implementation example


from cntk.layers import Dense, Sequential
from cntk import input_variable, default_options
from cntk.ops import sigmoid, log_softmax
import numpy as np
import pandas as pd

# Build the model: one hidden layer and a three-neuron output layer
model = Sequential([
   Dense(4, activation=sigmoid),
   Dense(3, activation=log_softmax)
])
features = input_variable(4)
z = model(features)

# Utility to one-hot encode a label index; it must be defined before it is used below
def one_hot(index, length):
   result = np.zeros(length)
   result[index] = 1
   return result

# Load and preprocess the iris dataset
df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}
x = df_source.iloc[:, :4].values
y = df_source['species'].values
y = np.array([one_hot(label_mapping[v], 3) for v in y])
x = x.astype(np.float32)
y = y.astype(np.float32)

# Train the network
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd
from cntk.logging import ProgressPrinter
progress_writer = ProgressPrinter(0)
labels = input_variable(3)
loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)
training_summary = loss.train((x, y), parameter_learners=[learner], callbacks=[progress_writer], minibatch_size=16, max_epochs=5)

Output


Build info:
     Built time: *** ** **** 21:40:10
     Last modified date: *** *** ** 21:08:46 2019
     Build type: Release
     Build target: CPU-only
     With ASGD: yes
     Math lib: mkl
     Build Branch: HEAD
     Build SHA1:ae9c9c7c5f9e6072cc9c94c254f816dbdc1c5be6 (modified)
     MPI distribution: Microsoft MPI
     MPI version: 7.0.12437.6
-------------------------------------------------------------------
average    since    average   since   examples
loss        last     metric   last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.1         1.1        0       0      16
0.835     0.704        0       0      32
1.993      1.11        0       0      48
1.14       1.14        0       0     112
[………]

Training with large datasets

In the previous sections, we worked with small in-memory datasets using Numpy and pandas, but not all datasets are that small. Datasets containing images, videos or sound samples, especially, tend to be large. MinibatchSource is a component provided by CNTK that can load data in chunks, in order to work with such large datasets. Some of the features of the MinibatchSource component are as follows −

  • MinibatchSource can prevent the NN from overfitting by automatically randomizing the samples read from the data source.

  • It has a built-in transformation pipeline which can be used to augment the data.

  • It loads the data on a background thread, separate from the training process.

In the following sections, we are going to explore how to use a minibatch source with out-of-memory data to work with large datasets. We will also explore how we can use it to feed data for training a NN.

Creating a MinibatchSource instance

In the previous section, we used the iris flower example and worked with a small in-memory dataset using Pandas DataFrames. Here, we will replace the code that uses data from a pandas DF with MinibatchSource. First, we need to create an instance of MinibatchSource with the help of the following steps −

Implementation Example

Step 1 − First, from the cntk.io module, import the components for the minibatch source as follows −


from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT

Step 2 − Now, by using the StreamDef class, create a stream definition for the labels.


label_stream = StreamDef(field='labels', shape=3, is_sparse=False)

Step 3 − Next, to read the features field from the input file, create another instance of StreamDef as follows.


features_stream = StreamDef(field='features', shape=4, is_sparse=False)

Step 4 − Now, we need to provide the iris.ctf file as input and initialise the deserializer as follows −


deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=label_stream, features=features_stream))

Step 5 − At last, we need to create an instance of MinibatchSource by using the deserializer as follows −


minibatch_source = MinibatchSource(deserializer, randomize=True)
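
Once the source exists, you can also pull a chunk of samples from it manually. This sketch assumes the iris.ctf file described in the next section is present on disk −


mb = minibatch_source.next_minibatch(16)   # dict mapping stream information to MinibatchData
print(mb[minibatch_source.streams.features].num_samples)   # 16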

Creating a MinibatchSource instance - Complete implementation example


from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT
label_stream = StreamDef(field='labels', shape=3, is_sparse=False)
features_stream = StreamDef(field='features', shape=4, is_sparse=False)
deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=label_stream, features=features_stream))
minibatch_source = MinibatchSource(deserializer, randomize=True)

Creating a CTF file

As you have seen above, we are taking the data from the 'iris.ctf' file. This file uses a format called CNTK Text Format (CTF). It is mandatory to create a CTF file to supply the data for the MinibatchSource instance we created above. Let us see how we can create a CTF file.

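Each line of a CTF file holds one sample, and every input stream on the line is prefixed with a pipe and its stream name. For the iris data, a line looks like this (the values shown are illustrative) −


|features 5.1 3.5 1.4 0.2 |labels 1 0 0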

Implementation Example

Step 1 − First, we need to import the pandas and numpy packages as follows −


import pandas as pd
import numpy as np

Step 2 − Next, we need to load our data file, i.e. iris.csv, into memory. Then, store it in the df_source variable.


df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)

Step 3 − Now, take the content of the first four columns as the features by using the iloc indexer. Also, use the data from the species column as the labels, as follows −


features = df_source.iloc[:, :4].values
labels = df_source['species'].values

Step 4 − Next, we need to create a mapping between the label names and their numeric representation. It can be done by creating label_mapping as follows −


label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}

Step 5 − Now, convert the labels to a set of one-hot encoded vectors as follows −


labels = [one_hot(label_mapping[v], 3) for v in labels]

Now, as we did before, we need a utility function called one_hot to encode the labels; note that it must be defined before the list comprehension above is executed. It can be done as follows −


def one_hot(index, length):
   result = np.zeros(length)
   result[index] = 1
   return result

Now that we have loaded and preprocessed the data, it's time to store it on disk in the CTF file format. We can do it with the help of the following Python code −


with open('iris.ctf', 'w') as output_file:
   for index in range(0, features.shape[0]):
      feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
      label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
      output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))

Creating a CTF file - Complete implementation example


import pandas as pd
import numpy as np

# Utility to one-hot encode a label index; it must be defined before it is used below
def one_hot(index, length):
   result = np.zeros(length)
   result[index] = 1
   return result

# Load the dataset and encode the labels
df_source = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], index_col=False)
features = df_source.iloc[:, :4].values
labels = df_source['species'].values
label_mapping = {'Iris-Setosa': 0, 'Iris-Versicolor': 1, 'Iris-Virginica': 2}
labels = [one_hot(label_mapping[v], 3) for v in labels]

# Write one sample per line in CNTK Text Format
with open('iris.ctf', 'w') as output_file:
   for index in range(0, features.shape[0]):
      feature_values = ' '.join([str(x) for x in np.nditer(features[index])])
      label_values = ' '.join([str(x) for x in np.nditer(labels[index])])
      output_file.write('|features {} |labels {}\n'.format(feature_values, label_values))

Feeding the data

Once you have created a MinibatchSource instance, we need to train with it. We can use the same training logic as we used when we worked with small in-memory datasets. Here, we will use the MinibatchSource instance as the input for the train method on the loss function as follows −

Implementation Example

Step 1 − In order to log the output of the training session, first import ProgressPrinter from the cntk.logging module as follows −


from cntk.logging import ProgressPrinter

Step 2 − Next, to set up the training session, import Trainer and training_session from the cntk.train module as follows −


from cntk.train import Trainer, training_session

Step 3 − Now, we need to define a set of constants, namely minibatch_size, samples_per_epoch and num_epochs, as follows −


minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30

Step 4 − Next, in order to tell CNTK how to read data during training, we need to define a mapping between the input variables for the network and the streams in the minibatch source.


input_map = {
   features: minibatch_source.streams.features,
   labels: minibatch_source.streams.labels
}

Step 5 − Next, to log the output of the training process, initialise the progress_writer variable with a new ProgressPrinter instance as follows −


progress_writer = ProgressPrinter(0)

Step 6 − At last, we need to invoke the train method on the loss as follows −


train_history = loss.train(minibatch_source,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer],
   epoch_size=samples_per_epoch,
   max_epochs=num_epochs)

Feeding the data - Complete implementation example


from cntk.logging import ProgressPrinter
from cntk.train import Trainer, training_session

# Constants controlling the training session
minibatch_size = 16
samples_per_epoch = 150
num_epochs = 30

# Map the network's input variables to the streams in the minibatch source
input_map = {
   features: minibatch_source.streams.features,
   labels: minibatch_source.streams.labels
}

progress_writer = ProgressPrinter(0)
train_history = loss.train(minibatch_source,
   parameter_learners=[learner],
   model_inputs_to_streams=input_map,
   callbacks=[progress_writer],
   epoch_size=samples_per_epoch,
   max_epochs=num_epochs)

Output


-------------------------------------------------------------------
average   since   average   since  examples
loss      last     metric   last
------------------------------------------------------
Learning rate per minibatch: 0.1
1.21      1.21      0        0       32
1.15      0.12      0        0       96
[………]

Translated from: https://www.tutorialspoint.com/microsoft_cognitive_toolkit/microsoft_cognitive_toolkit_in_memory_and_large_datasets.htm
