Image Classification in the Browser with JavaScript

by Kevin Scott

Machine Learning has a reputation for demanding lots of data and powerful GPU computations. This leads many people to believe that building custom machine learning models for their specific dataset is impractical without a large investment of time and resources. In fact, you can leverage Transfer Learning on the web to train an accurate image classifier in less than a minute with just a few labeled images.

What’s Image Classification Used For?

Teaching a machine to classify images has a wide range of practical applications. You may have seen image classification at work in your photos app, automatically suggesting friends or locations for tagging. Image Classification can be used to recognize cancer cells, to recognize ships in satellite imagery, or to automatically classify images on Yelp. It can even be used beyond the realm of images, analyzing heat maps of user activity for potential fraud, or Fourier transforms of audio waves.

I recently released an open source tool to quickly train image classification models in your browser. Here’s how it works:

Embedded here is a live demo of the tool you can use. I’ve put together a dataset for testing here (or feel free to build your own). The dataset has 10 images I downloaded from each of the three most popular searches on pexels.com: “Mobile”, “Wood”, and “Notebook”.

Drag the train folder into the drop zone, and once the model is trained, upload the validation folder to see how well your model can classify novel images.

How does this work?

Transfer Learning is the special sauce that makes it possible to train extremely accurate models in your browser in a fraction of the time. Models are trained on large corpuses of data, and saved as pretrained models. Those pretrained models’ final layers can then be tuned to your specific use case.

This works particularly well in the realm of computer vision, because so many features of images are generalizable. Rob Fergus and Matthew Zeiler demonstrate in their paper the features learned at the early stages of their model:

The model is beginning to recognize generic features, including lines, circles, and shapes, that are applicable to any set of images. After a few more layers, it’s able to recognize more complex shapes like edges and words:

The vast majority of images share general features such as lines and circles. Many share higher level features, things like an “eye” or a “nose”. This allows you to reuse the existing training that’s already been done, and tune just the last few layers on your specific dataset, which is faster and requires less data than training from scratch.

How much less data? It depends. How different your data is from your pre-trained model, how complex or variable your data is, and other factors can all play into your accuracy. With the example above, I got to 100% accuracy with 30 images. For something like dogs and cats, just a handful of images is enough to get good results. Adrian G has put together a more rigorous analysis on his blog.

So, it depends on your dataset, but it’s probably less than you think.

Show me the Code!

Next, we’ll look at how to import and tune a pretrained model in JavaScript. We’ll tune MobileNet, a pretrained model produced by Google.

MobileNets are a class of convolutional neural network designed by researchers at Google. They are coined “mobile-first” in that they’re architected from the ground up to be resource-friendly and run quickly, right on your phone. (Matt Harvey)

MobileNet is trained on ImageNet, a huge corpus of over 14 million labeled images; the pretrained weights we use here cover 1,000 different categories. If you download mobilenet_v1_0.25_224, you'll see a structure of files like:

mobilenet_v1_0.25_224.ckpt.data-00000-of-00001
mobilenet_v1_0.25_224.ckpt.index
mobilenet_v1_0.25_224.ckpt.meta
mobilenet_v1_0.25_224.tflite
mobilenet_v1_0.25_224_eval.pbtxt
mobilenet_v1_0.25_224_frozen.pb
mobilenet_v1_0.25_224_info.txt

Within mobilenet_v1_0.25_224_eval.pbtxt, note the shape attribute:

attr {
  key: "shape"
  value {
    shape {
      dim {
        size: -1
      }
      dim {
        size: 224
      }
      dim {
        size: 224
      }
      dim {
        size: 3
      }
    }
  }
}

This tells us that the first layer of this MobileNet expects to receive a Tensor of Rank 4 with dimensions [any, 224, 224, 3]. (If you're wondering what a Tensor is, check out this article first.)
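
To make the idea of a rank 4 Tensor a little more concrete, here is a rough plain-JavaScript sketch that uses nested arrays in place of Tensors. Note that `shapeOf` and `makeImage` are illustrative helpers and not part of Tensorflow.js, and the sketch assumes the nested array is rectangular. The images are tiny (4x4) to keep the example small.

```javascript
// Illustrative helper: read the shape of a rectangular nested array
// by following the first element at each level.
function shapeOf(value) {
  const shape = [];
  let current = value;
  while (Array.isArray(current)) {
    shape.push(current.length);
    current = current[0];
  }
  return shape;
}

// Illustrative helper: build a blank "image" as nested arrays.
function makeImage(height, width, channels) {
  return Array.from({ length: height }, () =>
    Array.from({ length: width }, () => new Array(channels).fill(0)));
}

// A batch of 2 images, each 4 pixels high, 4 pixels wide, with 3 color channels.
const batch = [makeImage(4, 4, 3), makeImage(4, 4, 3)];
console.log(shapeOf(batch)); // [ 2, 4, 4, 3 ]
```

The leading 2 plays the role of "any" in [any, 224, 224, 3]: it is simply the number of images in the batch.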

Importing and Setup

I’ve set up a repo with the necessary packages to get you going. Clone it and follow the readme instructions to install the packages and run it. In index.js, import Tensorflow.js with:

import * as tf from '@tensorflow/tfjs';

Tensorflow.js provides a function to load a pretrained model asynchronously. We’ll use this to load MobileNet:

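// Returns a promise that resolves with the pretrained MobileNet model.
// (tf.loadModel was renamed tf.loadLayersModel in TensorFlow.js 1.0.)
function loadMobilenet() {
  return tf.loadModel('https://storage.googleapis.com/tfjs-models/tfjs/mobilenet_v1_0.25_224/model.json');
}

Data Pipelines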

At the heart of your machine learning model is data. Building a solid pipeline for processing your data is crucial for success. Often, a majority of your time will be spent working with your data pipeline.

It may be surprising to the academic community to know that only a tiny fraction of the code in many machine learning systems is actually doing “machine learning”. When we recognize that a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code, reimplementation rather than reuse of a clumsy API looks like a much better strategy. (D. Sculley et al.)

There are a few common ways you’ll see image data structured:

  1. A list of folders containing images, where the folder name is the label
  2. Images in a single folder, with images named by label (dog-1, dog-2)
  3. Images in a single folder, and a csv or other file with a mapping of label to file

There’s no right way to organize your images. Choose whatever format makes sense for you and your team. This dataset is organized by folder.
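
As a rough illustration of the folder-per-label layout, here is a small plain-JavaScript sketch that derives a label from a file path. The `labelFromPath` helper and the example paths are hypothetical and not taken from the repo; the sketch assumes forward-slash paths where the parent folder name is the label.

```javascript
// Hypothetical helper: derive a label from a path such as "train/dog/1.jpg",
// where the folder that directly contains the file is the label.
function labelFromPath(path) {
  const parts = path.split('/').filter(part => part.length > 0);
  if (parts.length < 2) {
    throw new Error(`Expected a folder in the path, got "${path}"`);
  }
  return parts[parts.length - 2];
}

// Hypothetical file paths, for illustration only.
const examplePaths = [
  'train/dog/1.jpg',
  'train/dog/2.jpg',
  'train/cat/1.jpg',
];
const pathLabels = examplePaths.map(labelFromPath);
console.log(pathLabels); // [ 'dog', 'dog', 'cat' ]
```

The other two layouts would need a different helper (parsing the file name, or reading the mapping file), but the result is the same: one label per image, in the same order as the images.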

Our data processing pipeline will consist of four parts:

  1. Load the image (and turn it into a tensor)
  2. Crop the image
  3. Resize the image
  4. Translate the Tensor into an appropriate input format

1. Loading the Image

Since our machine learning model expects Tensors, the first step is to load the image and translate its pixel data into a Tensor. Browsers provide many convenient tools to load images and read pixels, and Tensorflow.js provides a function to convert an Image object into a Tensor. (If you're in Node, you'll have to handle this yourself.) This function takes the src URL of an image, loads the image, and returns a promise that resolves with a 3D Tensor of shape [height, width, color_channels]:

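function loadImage(src) {
  return new Promise((resolve, reject) => {
    const img = new Image();
    // Register the handlers before setting src so that a cached image cannot load first.
    // (tf.fromPixels moved to tf.browser.fromPixels in TensorFlow.js 1.0.)
    img.onload = () => resolve(tf.fromPixels(img));
    img.onerror = (err) => reject(err);
    img.src = src;
  });
}

2. Cropping the Image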

Many classifiers expect square images. This is not a strict requirement. If you build your own model, you can specify any size resolution you want. However, standard CNN architectures expect that images be of a fixed size. Given this necessity, many pretrained models accept squares, in order to support the widest variety of image ratios. (Squares also provide flexibility for handling a variety of data augmentation techniques).

We determined above that MobileNet expects 224x224 square images, so we’ll need to first crop our images. We do that by chopping off the edges of the longer side:

function cropImage(img) {
  // tf.fromPixels returns a tensor of shape [height, width, channels]
  const height = img.shape[0];
  const width = img.shape[1];
  // use the shorter side as the size to which we will crop
  const shorterSide = Math.min(height, width);
  // calculate the top-left corner of a centered square crop (whole pixels only)
  const startingHeight = Math.floor((height - shorterSide) / 2);
  const startingWidth = Math.floor((width - shorterSide) / 2);
  // slice takes a starting point and a size (not an ending point)
  return img.slice([startingHeight, startingWidth, 0], [shorterSide, shorterSide, 3]);
}
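
To check the arithmetic behind a centered crop without loading an image, here is a small plain-JavaScript sketch. The `centerCropBox` helper is illustrative and not part of the repo; it assumes we want whole-pixel offsets, so it rounds down when the leftover margin is odd.

```javascript
// Illustrative helper: compute a centered square crop for an image of the
// given height and width. Returns the top-left corner and the side length.
function centerCropBox(height, width) {
  const size = Math.min(height, width);
  return {
    top: Math.floor((height - size) / 2),
    left: Math.floor((width - size) / 2),
    size,
  };
}

// A landscape image loses 100 pixels from each side.
console.log(centerCropBox(300, 500)); // { top: 0, left: 100, size: 300 }
// A portrait image with an odd margin rounds the offset down.
console.log(centerCropBox(501, 400)); // { top: 50, left: 0, size: 400 }
```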

3. Resizing the image

Now that our image is square, we can resize it to 224x224. This part is easy: Tensorflow.js provides a resize method out of the box:

function resizeImage(image) {
  return tf.image.resizeBilinear(image, [224, 224]);
}

4. Translate the Tensor

Recall that our model expects an input object of the shape [any, 224, 224, 3]. This is known as a Tensor of Rank 4. The first dimension refers to the number of training examples. If you have 10 training examples, that would be [10, 224, 224, 3].

We also want our pixel data as a floating point number between -1 and 1, instead of integer data between 0 and 255, a process called normalization. While neural networks are generally agnostic to the size of the numbers coming in, using smaller numbers can help the network train faster.

We can build a function that expands our Tensor and translates the integers into floats with:

function batchImage(image) {
  // Expand our tensor to have an additional dimension, whose size is 1
  const batchedImage = image.expandDims(0);
  // Turn pixel data into a float between -1 and 1 (0 maps to -1, 255 maps to 1).
  return batchedImage.toFloat().div(tf.scalar(127.5)).sub(tf.scalar(1));
}
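
The normalization itself is simple arithmetic, and it can help to try it on a few numbers. Here is a plain-JavaScript sketch, assuming the common convention of dividing by 127.5 so that 0 maps exactly to -1 and 255 maps exactly to 1. The helper names are illustrative.

```javascript
// Map an 8-bit pixel value (0 to 255) to a float between -1 and 1.
function normalizePixel(value) {
  return value / 127.5 - 1;
}

// The inverse mapping, handy if you want to display a normalized image again.
function denormalizePixel(value) {
  return Math.round((value + 1) * 127.5);
}

console.log(normalizePixel(0));   // -1
console.log(normalizePixel(255)); // 1

// A round trip should give back the original pixel values.
const roundTrip = [0, 64, 128, 255].map(normalizePixel).map(denormalizePixel);
console.log(roundTrip); // [ 0, 64, 128, 255 ]
```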

The Final Pipeline

Putting all the above functions together into a single function, we get:

function loadAndProcessImage(image) {
  const croppedImage = cropImage(image);
  const resizedImage = resizeImage(croppedImage);
  const batchedImage = batchImage(resizedImage);
  return batchedImage;
}

We can now use this function to test that our data pipeline is set up correctly. We’ll import an image whose label is known (a drum) and see if the prediction matches the expected label:

import drum from './data/pretrained-model-data/drum.jpg';

loadMobilenet().then(pretrainedModel => {
  loadImage(drum).then(img => {
    const processedImage = loadAndProcessImage(img);
    const prediction = pretrainedModel.predict(processedImage);
    // Because of the way Tensorflow.js works, you must call print on a Tensor instead of console.log.
    prediction.print();
  });
});

You should see something like:

[[0.0000273, 5e-7, 4e-7, ..., 0.0001365, 0.0001604, 0.0003134],]

If we inspect the shape of this Tensor, we’ll see that it is [1, 1000]. MobileNet returns a Tensor containing a prediction for every category, and since MobileNet has learned 1000 classes, we receive 1000 predictions, each representing the probability that the given image belongs to a given class.
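
Those probabilities come out of the softmax activation at the very end of the network. As a rough sketch of what softmax does (in plain JavaScript, on three made-up scores rather than MobileNet's real outputs):

```javascript
// Convert raw scores into probabilities that sum to 1.
function softmax(scores) {
  // Subtract the max score first for numerical stability.
  const max = Math.max(...scores);
  const exps = scores.map(score => Math.exp(score - max));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map(value => value / total);
}

// Made-up scores, for illustration only.
const probabilities = softmax([2.0, 1.0, 0.1]);
const probabilitySum = probabilities.reduce((sum, p) => sum + p, 0);

console.log(probabilities.map(p => p.toFixed(3))); // [ '0.659', '0.242', '0.099' ]
console.log(probabilitySum.toFixed(3)); // 1.000
```

Larger scores get larger probabilities, and the outputs always add up to 1, which is why we can read each of the 1000 numbers as a probability.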

In order to get an actual prediction, we need to determine the most likely class. We flatten the tensor to 1 dimension and get the index of the max value, which corresponds to our most confident prediction:

prediction.as1D().argMax().print();

This should produce:

541
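
If argMax is new to you, here is roughly what it does, sketched in plain JavaScript. The three-class prediction and labels below are made up for illustration; the real output has 1000 entries.

```javascript
// Return the index of the largest value in an array.
function argMax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) {
      best = i;
    }
  }
  return best;
}

// Hypothetical three-class output and labels, for illustration only.
const exampleLabels = ['guitar', 'drum', 'piano'];
const examplePrediction = [0.05, 0.9, 0.05];
const bestIndex = argMax(examplePrediction);
console.log(bestIndex, exampleLabels[bestIndex]); // 1 drum
```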

In the repo you’ll find a copy of the ImageNet class definitions in JSON format. You can import that JSON file to translate the numeric prediction into an actual string:

import labels from './imagenet_labels.json';

loadMobilenet().then(pretrainedModel => {
  ...
  const labelPrediction = prediction.as1D().argMax().dataSync()[0];
  console.log(`
    Numeric prediction is ${labelPrediction}
    The predicted label is ${labels[labelPrediction]}
    The actual label is drum, membranophone, tympan
  `);
});

You should see that 541 corresponds to drum, membranophone, tympan, which is the category our image comes from. At this point you have a working pipeline and the ability to leverage MobileNet to predict ImageNet images.

Now let’s look at how to tune MobileNet on your specific dataset.

Training The Model

We want to build a model that successfully predicts novel data — that is, data it hasn’t seen before.

To do this, you first train the model on labeled data — data that has already been identified — and you validate the model’s performance on other labeled data it hasn’t seen before.

Supervised learning reverses this process, solving for m and b, given a set of x’s and y’s. In supervised learning, you start with many particulars — the data — and infer the general equation. And the learning part means you can update the equation as you see more x’s and y’s, changing the slope of the line to better fit the data. The equation almost never identifies the relationship between each x and y with 100% accuracy, but the generalization is powerful because later on you can use it to do algebra on new data. — Kathryn Hume

When you trained the model above by dragging the training folder in, the model produced a training score. This indicates how many images the classifier was able to learn to successfully predict out of the training set. The second number it produced indicated how many images it could predict that it hadn't seen before. This second score is the one you want to optimize for (well, you want to optimize for both, but the latter number is more applicable to novel data).
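
Both scores boil down to the same calculation: the fraction of images whose predicted label matches the expected label. Here is a minimal plain-JavaScript sketch; the `accuracy` helper and the example results are illustrative, and the demo tool may compute its scores differently.

```javascript
// Fraction of predictions that match the expected labels.
function accuracy(predicted, expected) {
  if (predicted.length !== expected.length || expected.length === 0) {
    throw new Error('predicted and expected must be non-empty and the same length');
  }
  const correct = predicted.filter((label, i) => label === expected[i]).length;
  return correct / expected.length;
}

// Hypothetical results, for illustration only.
const trainingScore = accuracy(['blue', 'blue', 'red', 'red'], ['blue', 'blue', 'red', 'red']);
const validationScore = accuracy(['blue', 'blue'], ['blue', 'red']);
console.log(trainingScore, validationScore); // 1 0.5
```

A model that scores well on training data but poorly on validation data has memorized its examples rather than learned something general.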

We’re going to train on the colors dataset. In the repo, you’ll find a folder data/colors that contains:

validation/
  blue/
    blue-3.png
  red/
    red-3.png
training/
  blue/
    blue-1.png
    blue-2.png
  red/
    red-1.png
    red-2.png

Building machine learning models, I’ve found that code-related errors (a missing variable, an inability to compile) are fairly straightforward to fix, whereas training errors (the labels were in an incorrect order, or the images were being cropped incorrectly) are devilish to debug. Testing exhaustively and setting up sanity test cases can help save you a few gray hairs.

The data/colors folder provides a list of solid red and blue colors that are guaranteed to be easy to train with. We'll use these to train our model and ensure that our machine learning code learns correctly, before attempting with a more complicated dataset.

import blue1 from '../data/colors/training/blue/blue-1.png';
import blue2 from '../data/colors/training/blue/blue-2.png';
import blue3 from '../data/colors/validation/blue/blue-3.png';
import red1 from '../data/colors/training/red/red-1.png';
import red2 from '../data/colors/training/red/red-2.png';
import red3 from '../data/colors/validation/red/red-3.png';

const training = [
  blue1,
  blue2,
  red1,
  red2,
];

// labels should match the positions of their associated images
const labels = [
  'blue',
  'blue',
  'red',
  'red',
];

When we previously loaded MobileNet, we used the model without any modifications. When training, we want to use a subset of its layers — specifically, we want to ignore the final layers that produce the one-of-1000 classification. You can inspect the structure of a pretrained model with .summary():

loadMobilenet().then(mobilenet => {
  mobilenet.summary();
});

Your console should show the model summary, and near the end you should see something like:

conv_dw_13_bn (BatchNormaliz [null,7,7,256]            1024
_________________________________________________________________
conv_dw_13_relu (Activation) [null,7,7,256]            0
_________________________________________________________________
conv_pw_13 (Conv2D)          [null,7,7,256]            65536
_________________________________________________________________
conv_pw_13_bn (BatchNormaliz [null,7,7,256]            1024
_________________________________________________________________
conv_pw_13_relu (Activation) [null,7,7,256]            0
_________________________________________________________________
global_average_pooling2d_1 ( [null,256]                0
_________________________________________________________________
reshape_1 (Reshape)          [null,1,1,256]            0
_________________________________________________________________
dropout (Dropout)            [null,1,1,256]            0
_________________________________________________________________
conv_preds (Conv2D)          [null,1,1,1000]           257000
_________________________________________________________________
act_softmax (Activation)     [null,1,1,1000]           0
_________________________________________________________________
reshape_2 (Reshape)          [null,1000]               0
=================================================================
Total params: 475544
Trainable params: 470072
Non-trainable params: 5472
_________________________________________________________________

What we’re looking for is the final Activation layer that is not softmax (softmax is the activation used to boil the predictions down to one of a thousand categories). That layer is conv_pw_13_relu. We return a pretrained model that includes everything up to that activation layer:

function buildPretrainedModel() {
  return loadMobilenet().then(mobilenet => {
    const layer = mobilenet.getLayer('conv_pw_13_relu');
    return tf.model({
      inputs: mobilenet.inputs,
      outputs: layer.output,
    });
  });
}

Let’s write a function to loop through an array of images and return a Promise that resolves when they load.

function loadImages(images, pretrainedModel) {
  let promise = Promise.resolve();
  for (let i = 0; i < images.length; i++) {
    const image = images[i];
    promise = promise.then(data => {
      return loadImage(image).then(loadedImage => {
        // Note the use of `tf.tidy` and `.dispose()`. These are two memory management
        // functions that Tensorflow.js exposes.
        // https://js.tensorflow.org/tutorials/core-concepts.html
        //
        // Handling memory management is crucial for building a performant machine learning
        // model in a browser.
        return tf.tidy(() => {
          // Run the preprocessed image through the truncated MobileNet, so each training
          // example is stored as a [1, 7, 7, 256] activation rather than raw pixels.
          const processedImage = pretrainedModel.predict(loadAndProcessImage(loadedImage));
          if (data) {
            const newData = data.concat(processedImage);
            data.dispose();
            return newData;
          }
          return tf.keep(processedImage);
        });
      });
    });
  }
  return promise;
}

We build a sequential promise that iterates over each image and processes it. Alternatively, you can use Promise.all to load images in parallel, but be aware of UI performance if you do that.
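
To compare the two approaches side by side, here is a plain-JavaScript sketch that swaps the real image loader for a stand-in. `fakeLoad`, `loadSequentially`, and `loadInParallel` are illustrative names and not part of the repo.

```javascript
// Stand-in for an image loader: resolves with the given name after a short delay.
function fakeLoad(name) {
  return new Promise(resolve => setTimeout(() => resolve(`loaded ${name}`), 10));
}

// Load one item at a time, in order. Each load starts when the previous one finishes.
function loadSequentially(names) {
  return names.reduce(
    (promise, name) => promise.then(results =>
      fakeLoad(name).then(result => results.concat(result))),
    Promise.resolve([])
  );
}

// Start every load at once. Promise.all still returns results in input order.
function loadInParallel(names) {
  return Promise.all(names.map(fakeLoad));
}

const imageNames = ['blue-1', 'blue-2', 'red-1'];
loadSequentially(imageNames).then(results => console.log('sequential', results));
loadInParallel(imageNames).then(results => console.log('parallel', results));
```

Both versions produce the same array in the same order; the difference is how much work is in flight at once, which is where the UI performance concern comes from.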

Putting those functions together, we get:

buildPretrainedModel().then(pretrainedModel => {
  loadImages(training, pretrainedModel).then(xs => {
    xs.print();
  });
});

Calling your data “x” and “y” is a convention in the machine learning world, carrying over from its mathematical origins. You can call your variables whatever you want, but I find it useful to stick to the conventions where I can.

Labels

Next, you’ll need to convert your labels into numeric form. However, it’s not as simple as assigning a number to each category. To demonstrate, let’s say you’re classifying three categories of fruit:

raspberry - 0
blueberry - 1
strawberry - 2

Denoting numbers like this can imply a relationship where one does not exist, since these numbers are considered ordinal values. They imply some order in the data. Real world consequences of this might be that the network decides that a blueberry is something that is halfway between a raspberry and a strawberry, or that a strawberry is the “best” of the berries.

To prevent these incorrect assumptions, we use a process called “one hot encoding”, resulting in data that looks like:

raspberry  - [1, 0, 0]
blueberry  - [0, 1, 0]
strawberry - [0, 0, 1]
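
The encoding above can be sketched in a few lines of plain JavaScript. The `oneHotVector` helper is illustrative and works on ordinary arrays rather than Tensors.

```javascript
// Build a one hot vector: all zeros except a 1 at labelIndex.
function oneHotVector(labelIndex, classLength) {
  if (!Number.isInteger(labelIndex) || labelIndex < 0 || labelIndex >= classLength) {
    throw new RangeError(`labelIndex ${labelIndex} is outside 0..${classLength - 1}`);
  }
  const vector = new Array(classLength).fill(0);
  vector[labelIndex] = 1;
  return vector;
}

const fruits = ['raspberry', 'blueberry', 'strawberry'];
fruits.forEach((fruit, i) => {
  console.log(fruit, oneHotVector(i, fruits.length));
});
// raspberry [ 1, 0, 0 ]
// blueberry [ 0, 1, 0 ]
// strawberry [ 0, 0, 1 ]
```

Every vector has the same length and the same magnitude, so no berry ends up "between" or "above" the others.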

(Two great articles that go into more depth on one hot encoding are here and here.) We can leverage Tensorflow.js’s built-in oneHot function to translate our labels:

function oneHot(labelIndex, classLength) {
  return tf.tidy(() => tf.oneHot(tf.tensor1d([labelIndex]).toInt(), classLength));
}

This function takes a particular number (labelIndex, a number that corresponds to a label) and translates it to a one hot encoding, given some number of classes (classLength). We can use the function with the following bit of code, that first builds a mapping of numbers-to-labels off the incoming array of labels, and then builds a Tensor containing those one-hot encoded labels:

function getLabelsAsObject(labels) {
  let labelObject = {};
  for (let i = 0; i < labels.length; i++) {
    const label = labels[i];
    if (labelObject[label] === undefined) {
      // only assign it if we haven't seen it before
      labelObject[label] = Object.keys(labelObject).length;
    }
  }
  return labelObject;
}

function addLabels(labels) {
  return tf.tidy(() => {
    const classes = getLabelsAsObject(labels);
    const classLength = Object.keys(classes).length;
    let ys;
    for (let i = 0; i < labels.length; i++) {
      const label = labels[i];
      const labelIndex = classes[label];
      const y = oneHot(labelIndex, classLength);
      if (i === 0) {
        ys = y;
      } else {
        ys = ys.concat(y, 0);
      }
    }
    return ys;
  });
}

Now that we have our data, we can build our model. You are welcome to innovate at this stage, but I find that building on others’ conventions tends to produce a good enough model in most cases. We’ll look to the Webcam Tensorflow.js example for a well structured transfer learning model we’ll reuse largely verbatim.

Things worth highlighting are that the first layer matches the output shape of our pretrained model, and the final softmax layer corresponds to the number of labels, defined as numberOfClasses. 100 units on the second layer is arbitrary, and you can absolutely experiment with changing this number for your particular use case.

function getModel(numberOfClasses) {
  const model = tf.sequential({
    layers: [
      tf.layers.flatten({inputShape: [7, 7, 256]}),
      tf.layers.dense({
        units: 100,
        activation: 'relu',
        kernelInitializer: 'varianceScaling',
        useBias: true
      }),
      tf.layers.dense({
        units: numberOfClasses,
        kernelInitializer: 'varianceScaling',
        useBias: false,
        activation: 'softmax'
      })
    ],
  });
  model.compile({
    optimizer: tf.train.adam(0.0001),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model;
}

Here are various links if you want to go into a little more depth on the neural networks’ internal parts:

  • tf.sequential
  • tf.layers.flatten
  • tf.layers.dense
  • the activation relu
  • adam optimizer
  • categoricalCrossentropy loss

The final step is to actually train the model, which we do by calling .fit() on the model. We shuffle our training images so the model doesn't learn to rely on the order of the incoming training data, and we train for 20 epochs. (An epoch denotes one cycle through your entire training set.)

function makePrediction(pretrainedModel, model, image, expectedLabel) {
  return loadImage(image).then(loadedImage => {
    const predictedLabel = tf.tidy(() => {
      // Preprocess the image, run it through the truncated MobileNet, then
      // classify the resulting activation with our newly trained model.
      const activation = pretrainedModel.predict(loadAndProcessImage(loadedImage));
      return model.predict(activation).as1D().argMax().dataSync()[0];
    });
    loadedImage.dispose();
    console.log('Expected Label', expectedLabel);
    console.log('Predicted Label', predictedLabel);
  });
}

buildPretrainedModel().then(pretrainedModel => {
  loadImages(training, pretrainedModel).then(xs => {
    const ys = addLabels(labels);
    const model = getModel(2);
    model.fit(xs, ys, {
      epochs: 20,
      shuffle: true,
    }).then(history => {
      // make predictions (blue was assigned index 0 and red index 1)
      makePrediction(pretrainedModel, model, blue3, "0");
      makePrediction(pretrainedModel, model, red3, "1");
    });
  });
});
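
With shuffle: true, model.fit takes care of shuffling for us. To illustrate what shuffling paired data involves, here is a plain-JavaScript sketch of a Fisher-Yates shuffle that keeps each image next to its label. This is not Tensorflow.js's actual implementation, and `shuffleTogether` is a hypothetical helper.

```javascript
// Fisher-Yates shuffle applied to images and labels together, so that each
// image stays paired with its label. Returns new arrays; the inputs are untouched.
function shuffleTogether(images, labels, random = Math.random) {
  if (images.length !== labels.length) {
    throw new Error('images and labels must be the same length');
  }
  // Shuffle a list of indices, then reorder both arrays with it.
  const order = images.map((_, i) => i);
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return {
    images: order.map(i => images[i]),
    labels: order.map(i => labels[i]),
  };
}

const shuffled = shuffleTogether(
  ['blue-1.png', 'blue-2.png', 'red-1.png', 'red-2.png'],
  ['blue', 'blue', 'red', 'red']
);
console.log(shuffled.images, shuffled.labels);
```

The order changes from run to run, but every image keeps its own label, which is exactly the kind of thing worth a sanity test.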

How many epochs should you run for?

Unfortunately, there is no right answer to this question. The answer is different for different datasets but you can say that the numbers of epochs is related to how diverse your data is — Sagar Sharma

Basically, you can run it until it’s good, or until it’s clear it’s not working, or you run out of time.

You should see 100% accuracy in the training above. Try modifying the code to work on the Pexels dataset. I found in my testing that my accuracy numbers fall a little bit with this more complex dataset.

Final thoughts

In summary, it’s cheap and fast to build on top of a pretrained model and get a classifier that is pretty darn accurate.

When coding machine learning, be careful to test your code at each section of the process and validate with data you know works. It pays to set up a stable and reusable data pipeline early in your process, since so much of your time is spent working with your data.

Finally, if you’re interested in learning more about training CNNs from scratch, a great place to start is Fast.ai’s tutorials for hackers. It’s built in Python, but you can translate the ideas to Node.js if you want to stay in JavaScript.

Originally published at https://thekevinscott.com

Source: https://www.freecodecamp.org/news/image-classification-in-the-browser-with-javascript-bec7b5a7a8c3/