Quantum Machine Learning: Learning on Neural Networks

My last articles tackled Bayes nets on quantum computers (read it here!) and k-means clustering, our first steps into the weird and wonderful world of quantum machine learning.

This time, we’re going a little deeper into the rabbit hole and looking at how to build a neural network on a quantum computer.

In case you aren’t up to speed on neural nets, don’t worry — we’re starting with neural nets 101.

What even is a (classical) neural network?

Almost everyone has heard of neural networks — they’re used to run some of the coolest tech we have today: self-driving cars, voice assistants, and even the software that generates super realistic pictures of famous people doing questionable things.

What makes them different from regular algorithms is that instead of having to write down a set of rules, we provide the network with examples of the problem we want it to solve.

We could feed a network some data from the IRIS data set, which contains information about three kinds of flowers, and it would guess which kind of flower each sample is:

Image by Author

So now we know what neural networks do — but how do they do it?

Weights, biases and building blocks

Neural networks are made up of many small units called neurons, which look like this:

Image by Author

Most neurons take multiple numeric inputs (the blue circles) and multiply each one of them by a weight (the wᵢs) that represents how important that input is. The larger the magnitude of a weight, the more important the associated input is.

The bias is treated like another weight, except that the input it multiplies is always 1. When we add up all the weighted inputs, we get the activation value of the neuron, represented by the purple circle in the picture above:

The activation value is then passed through a function (the blue rectangle), and the result is the output of the neuron:

We can change a neuron’s behavior by changing the function it uses to transform its activation value — for example, we could use a super simple transformation, like this one:

In practice, however, we use more complex ones, like the sigmoid function:

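To make this concrete, here is a minimal sketch of a single neuron in plain Python/NumPy. The numbers are made up purely for illustration, and the step function shown is one common choice for the "super simple transformation" mentioned above:

import numpy as np

def step(z):
    # Output 1 if the activation is non-negative, otherwise 0
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Squash the activation into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron(inputs, weights, bias, transform):
    # Weighted sum of the inputs plus the bias gives the activation value
    activation = np.dot(inputs, weights) + bias
    # Passing the activation through the chosen function gives the output
    return transform(activation)

x = np.array([0.5, -1.0, 2.0])   # made-up inputs
w = np.array([0.8, 0.2, -0.5])   # made-up weights
print(neuron(x, w, bias=0.1, transform=step))     # prints 0
print(neuron(x, w, bias=0.1, transform=sigmoid))  # prints roughly 0.33
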
How are neurons useful?

They can make decisions based on the inputs they receive — for example, we could use a neuron to predict whether we’ll eat pizza or pasta the next time we eat out at an Italian place by feeding it the answers to three questions:

  • Do I like the pasta at this restaurant?
  • Does the restaurant have pesto sauce?
  • Does the restaurant have a triple cheese pizza?

Putting aside possible health concerns, let’s see what the neuron might look like — we can encode the inputs using 0 to represent no, 1 to represent yes, and do the same with the outputs by mapping 0 and 1 to pasta and pizza respectively:

Image by Author

Let’s use the step function to transform the neuron’s activation value:

Just using one neuron, we can capture multiple decision-making behaviours (one concrete choice of weights is sketched just after the list below):

  • If we like the pasta at a restaurant, we choose to order pasta unless pesto sauce is out and they serve a triple cheese pizza.
  • If we don’t like the pasta at a restaurant, we order a pizza unless pesto sauce is available and the triple cheese pizza is not.
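
The weights and bias below are one hand-picked choice (my own illustration, not numbers from the article) that reproduces both behaviours listed above:

# Inputs: [like the pasta?, pesto available?, triple cheese pizza available?]
# Output: 0 = pasta, 1 = pizza. Weights and bias are hand-picked for illustration.
weights = [-2, -3, 2]
bias = 1

def order(inputs):
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0   # step function

print(order([1, 0, 1]))  # like the pasta, pesto is out, triple cheese on the menu -> 1 (pizza)
print(order([1, 1, 1]))  # like the pasta and pesto is in stock -> 0 (pasta)
print(order([0, 1, 0]))  # don't like the pasta, pesto available, no triple cheese -> 0 (pasta)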

We can also do things the other way — we can program a neuron so that it corresponds to a specific set of preferences.

If all we wanted to do was predict what we would eat the next time we go out, it would probably be easy to figure out a set of weights and biases for one neuron, but what if we had to do the same with a full-sized network?

It would probably take a while.

Fortunately, instead of guessing the values of the weights we need, we can create algorithms that change the parameters of a network — the weights, biases, and even the structure — so that it can learn a solution for a problem we want to solve.

Going down to get to the top

Ideally, a network’s prediction would be the same as the label associated with the input we feed it — so the smaller the difference between the prediction and actual output, the better the set of weights the network has learned.

We quantify this difference using a loss function, which can take any form we want; here we’ll use one called the quadratic loss function:
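
The equation image from the original isn’t reproduced here; judging from the description below and the cost function implemented later in the code, it has the form

C(θ) = ½ Σₓ (o(x, θ) − y(x))²

with the sum running over the samples x in the data set.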

y(x) is the desired output and o(x, θ) is the network’s output when fed data x with parameters θ. Since the loss is always non-negative, once it takes values close to 0 we know that the network has learned a good parameter set. Of course, there are other problems that can crop up, like over-fitting, but we’ll ignore those for now.

Using the loss function, we can figure out what the best parameter set for our network is:

So instead of guessing weights, all we need to do is minimize C with respect to parameters θ — which we can do using a technique called gradient descent:

All we’re doing here is looking at how the loss changes if we increase the value of θᵢ, and then updating θᵢ so that the loss decreases by a little bit. η is a small number which controls how much we change θᵢ every time we update it.
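
Written out (a reconstruction consistent with the description above, since the equation image isn’t reproduced), the update applied to every parameter is

θᵢ ← θᵢ − η · ∂C/∂θᵢ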

Why do we need η to be small? We could just adjust it so that the loss on the current x is close to zero after just one update — most of the time this is not a great idea, because while it would reduce the loss on the current x, it often leads to much worse performance on all the other data samples we feed to the network.
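
Here is a tiny, purely illustrative NumPy sketch of that effect: one weight, two training samples that pull it towards different values. Zeroing the loss on one sample with a single aggressive update leaves a larger total loss than taking many small steps:

import numpy as np

# Purely illustrative: a single weight w and two samples that each
# "want" w to be a different value.
targets = np.array([1.0, 2.0])
total_loss = lambda w: 0.5 * np.sum((w - targets) ** 2)

# Greedy: one big update that zeroes the loss on the first sample only
w_greedy = targets[0]

# Gentle: small gradient steps, alternating between the two samples
w, eta = 0.0, 0.1
for step in range(200):
    target = targets[step % 2]
    w -= eta * (w - target)      # gradient of 0.5 * (w - target)^2

print(total_loss(w_greedy))      # ~0.5
print(total_loss(w))             # ~0.25, so the small steps do better overall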

Awesome!

Now that we’ve got the basics down, let’s figure out how to build a quantum neural network.

Into the quantumverse

The quantum neural net we’ll be building doesn’t work in exactly the same way as the classical networks we’ve looked at so far — instead of using neurons with weights and biases, we encode the input data into a bunch of qubits, apply a sequence of quantum gates, and change the gate parameters to minimize a loss function:

Image by Author

While that might sound new, the idea is still the same — change the parameter set to minimize the difference between the network predictions and input labels.

To keep things simple, we’ll be building a binary classifier — meaning that every data point fed to the network has to have an associated label of either 0 or 1.

How does it work?

We start by feeding some data x to the network, which is passed through a feature map — a function that transforms the input data into a form we can use to create the input quantum state:

The feature map we use can look like almost anything — here’s one that takes in a two-dimensional vector x and spits out an angle:
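
The original formula isn’t reproduced here. As a purely hypothetical example (not necessarily the one the article pictured), a normalised two-dimensional vector x = (x₀, x₁) with non-negative entries could be mapped to the single angle φ(x) = 2 · arcsin(x₁), so that rotating a qubit by φ(x) recreates x as its amplitudes. The four-dimensional, two-qubit version of that idea is what convertDataToAngles implements later on.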

Once x is encoded as a quantum state, we apply a series of quantum gates:

The network output, which we’ll call π(x, θ), is the probability of measuring the last qubit in the |1〉 state, plus a bias term that is added classically:

Here, Zₙ₋₁ stands for a Z gate applied to the last qubit.

Finally, we take the output and the label associated with x, and use them to compute the loss over the sample — we’ll use the same quadratic loss from above. The cost over the entire data set X we feed the network then becomes:

The prediction of the network, p, can be obtained from the output by thresholding it (the convertToClass function later uses a cutoff of 0.5):

Now all we need to do is figure out how to compute the gradients of the loss function l(x, θ). While we could do that classically, it would be boring — what we need is a way to compute them on a quantum computer.

A new way to compute gradients

Let’s start by differentiating the loss function with respect to a parameter θᵢ:

Let’s expand the last term:

We can quickly get rid of the constant terms — and in the case that θᵢ = b, we know that the gradient is simply 1:

Now, using the product rule, we can expand further:

That probably looks a little painful to read — but thanks to the Hermitian conjugate (the †), this has a concise representation:

Since U(θ) is made up of multiple gates, each controlled by a different parameter (or set of parameters), finding the partial derivative of U only involves differentiating the gate Uᵢ(θᵢ) that depends on θᵢ:

This is where the form we choose for each Uᵢ becomes important. We’ll use the same form for every Uᵢ, which we’ll call a G gate — the choice of form is arbitrary, so you could use any other form you can think of instead:

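The matrix for the G gate isn’t shown here as an image, but the GGate function defined later in the code makes the choice explicit: it is the single-qubit rotation

G(α) = [[cos α, sin α], [−sin α, cos α]]

parametrized by a single angle α.
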
Now that we know what each Uᵢ looks like, we can find its derivative:

Lucky for us, we can express this in terms of the G gate:

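Concretely, differentiating that matrix entry by entry gives

dG(α)/dα = [[−sin α, cos α], [−cos α, −sin α]] = G(α + π/2)

so the derivative of a G gate is just another G gate with its angle shifted by π/2. This is exactly what the gradient code later exploits when it adds np.pi / 2 to a parameter before running the Hadamard test.
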
So all that’s left is to figure out how to create a circuit that gives us the inner product form we need:

The easiest way to get a measurable quantity proportional to this is to use the Hadamard test — first, we prepare the input quantum state and push an ancilla into superposition:

Now apply Zₙ₋₁B onto ψ, conditioned on the ancilla being in the |1〉 state:

Then flip the ancilla, and do the same with A:

Finally, apply another Hadamard gate to the ancilla:

Now the probability of measuring the ancilla as 0 is:
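
The equation image isn’t reproduced here, but working through the circuit above (and consistently with the 2 · (prob − 0.5) rescaling in the computeRealExpectation function later), it comes out to

P(ancilla = 0) = ½ (1 + Re⟨ψ| A† Zₙ₋₁ B |ψ⟩)

so estimating this probability and rescaling gives us the real part of the inner product we need.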

So if we substitute U(θ) for B, and a copy of U(θ) with Uᵢ swapped out for its derivative for A, then the probability of measuring the ancilla as 0 gives us (up to a constant shift and scaling) the gradient of π(x, θ) with respect to θᵢ.

Great!

We figured out a way to analytically compute gradients on a quantum computer — now all that’s left is to build our quantum neural network.

Building a quantum neural network

Let’s import all the modules we need to kick things off:

from qiskit import QuantumRegister, ClassicalRegister
from qiskit import Aer, execute, QuantumCircuit
from qiskit.extensions import UnitaryGate
import numpy as np

Now let’s take a look at some of our data (you can get it right here!) — it’s a processed version of the IRIS data set, with one class removed:

0.803772773,0.5516087658,0.2206435063,0.0315205009,0
0.714141252,0.2664706164,0.6182118301,0.1918588438,1
0.7761140001,0.5497474167,0.3072117917,0.03233808334,0
0.8609385733,0.4400352708,0.2487155878,0.05739590488,0
0.690525124,0.3214513508,0.6071858849,0.2262065061,1

We need to separate the features (the first four columns) from the labels:

data = np.genfromtxt("processedIRISData.csv", delimiter=",")
X = data[:, 0:4]
features = np.array([convertDataToAngles(i) for i in X])
Y = data[:, -1]

Now let’s build a function that will do the feature mapping for us.

Since the input vectors are normalized and 4-dimensional, there is a super simple option for the mapping — use 2 qubits to hold the encoded data, and use a mapping that recreates the input vector as the amplitudes of the two-qubit quantum state.

For this we need two functions — one to extract angles from the vectors:

def convertDataToAngles(data):
    """
    Takes in a normalised 4 dimensional vector and returns 
    three angles such that the encodeData function returns 
    a quantum state with the same amplitudes as the 
    vector passed in. 
    """
    prob1 = data[2] ** 2 + data[3] ** 2
    prob0 = 1 - prob1
    angle1 = 2 * np.arcsin(np.sqrt(prob1))
    prob1 = data[3] ** 2 / prob1
    angle2 = 2 * np.arcsin(np.sqrt(prob1))
    prob1 = data[1] ** 2 / prob0
    angle3 = 2 * np.arcsin(np.sqrt(prob1))
    return np.array([angle1, angle2, angle3])

And another to convert the angles we get into a quantum state:

def encodeData(qc, qreg, angles):
    """
    Given a quantum register belonging to a quantum
    circuit, performs a series of rotations and controlled
    rotations characterized by the angles parameter.
    """    
    qc.ry(angles[0], qreg[1])
    qc.cry(angles[1], qreg[1], qreg[0])
    qc.x(qreg[1])
    qc.cry(angles[2], qreg[1], qreg[0])
    qc.x(qreg[1])
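
As a quick sanity check (this snippet is my addition, not part of the original article, and assumes the same Qiskit version used throughout), we can run the two functions on Aer’s statevector simulator and confirm that the amplitudes of the prepared state match the input vector, given Qiskit’s little-endian qubit ordering:

# Sanity check: the encoded amplitudes should reproduce the normalised input vector
qreg = QuantumRegister(2)
qc = QuantumCircuit(qreg)
encodeData(qc, qreg, convertDataToAngles(X[0]))   # X is the feature matrix loaded earlier

sv = execute(qc, Aer.get_backend('statevector_simulator')).result().get_statevector()
print(np.round(np.real(sv), 4))   # should be very close to X[0]
print(X[0])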

This might seem a little confusing, but understanding how it works isn’t essential to building the QNN — you can read up on it here if you like.

Now we can write the functions we need to implement U(θ), which will take the form of alternating layers of RY-style rotation gates (implemented below as G gates) and CX gates.

Why do we need the CX layers?

If we didn’t include them, we wouldn’t be able to perform any entanglement operations, which would limit the area within the Hilbert space that our network can reach — using CX gates, the network can capture interactions between qubits that it couldn’t capture without them.

We’ll start with the G gates:

def GGate(qc, qreg, params):
    """
    Given a parameter α, return a single
    qubit gate of the form
    [cos(α), sin(α)]
    [-sin(α), cos(α)]
    """    
    u00 = np.cos(params[0])
    u01 = np.sin(params[0])
    gateLabel = "G({})".format(
        params[0]
    )
    gate = UnitaryGate(np.array(
        [[u00, u01], [-u01, u00]]
    ), label=gateLabel)
    return gate


def GLayer(qc, qreg, params):
    """
    Applies a layer of GGates onto the qubits of register
    qreg in circuit qc, parametrized by angles params.
    """    
    for i in range(2):
        qc.append(GGate(qc, qreg, params[i]), [qreg[i]])

Next, we’ll do the CX gates:

def CXLayer(qc, qreg, order):
    """
    Applies a layer of CX gates onto the qubits of register
    qreg in circuit qc, with the order of application
    determined by the value of the order parameter.
    """    
    if order:
        qc.cx(qreg[0], qreg[1])
    else:
        qc.cx(qreg[1], qreg[0])

Now we put them together to get U(θ):

def generateU(qc, qreg, params):
    """
    Applies the unitary U(θ) to qreg by composing multiple 
    G layers and CX layers. The unitary is parametrized by
    the array passed into params.
    """    
    for i in range(params.shape[0]):
        GLayer(qc, qreg, params[i])
        CXLayer(qc, qreg, i % 2)

Next we create a function that allows us to get the output of the network, and another that converts those outputs into class predictions:

def getPrediction(qc, qreg, creg, backend):
    """
    Returns the probability of measuring qubit 0 of register
    qreg (the network's output qubit) in the |1⟩ state.
    """    
    qc.measure(qreg[0], creg[0])
    job = execute(qc, backend=backend, shots=10000)
    results = job.result().get_counts()
    # Estimate the probability by dividing the counts by the number of shots
    if '1' in results.keys():
        return results['1'] / 10000
    else:
        return 0
    
def convertToClass(predictions):
    """
    Given a set of network outputs, returns class predictions
    by thresholding them.
    """    
    return (predictions >= 0.5) * 1

Now we can build a function that performs a forward pass on the network — feeds it some data, processes it, and gives us the network output:

def forwardPass(params, bias, angles, backend):
    """
    Given a parameter set params, input data in the form
    of angles, a bias, and a backend, performs a full 
    forward pass on the network and returns the network
    output.
    """
    qreg = QuantumRegister(2)
    anc = QuantumRegister(1)
    creg = ClassicalRegister(1)
    qc = QuantumCircuit(qreg, anc, creg)
    encodeData(qc, qreg, angles)
    generateU(qc, qreg, params)
    pred = getPrediction(qc, qreg, creg, backend) + bias
    return pred
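
As an illustrative usage example (my own, using an arbitrary random parameter set of the same shape the training code uses later), a single forward pass on the first sample looks like this:

# Illustrative usage: one forward pass on the first sample with random parameters
backend = Aer.get_backend('qasm_simulator')
params = np.random.sample((5, 2, 1))   # 5 G layers, 2 qubits, 1 angle per gate
print(forwardPass(params, 0.01, features[0], backend))   # a value in roughly [0, 1] plus the small bias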

After that, we can write all the functions we need to measure gradients — first, we need to be able to apply controlled versions of U(θ):

def CGLayer(qc, qreg, anc, params):
    """
    Applies a controlled layer of GGates, all conditioned
    on the first qubit of the anc register.
    """
    for i in range(2):
        qc.append(GGate(
            qc, qreg, params[i]
        ).control(1), [anc[0], qreg[i]])
        
def CCXLayer(qc, qreg, anc, order):
    """
    Applies a layer of Toffoli gates with the first
    control qubit always being the first qubit of the anc
    register, and the second depending on the value
    passed into the order parameter.
    """
    if order:
        qc.ccx(anc[0], qreg[0], qreg[1])
    else:
        qc.ccx(anc[0], qreg[1], qreg[0])
        
def generateCU(qc, qreg, anc, params):
    """
    Applies a controlled version of the unitary U(θ),
    conditioned on the first qubit of register anc.
    """
    for i in range(params.shape[0]):
        CGLayer(qc, qreg, anc, params[i])
        CCXLayer(qc, qreg, anc, i % 2)

Using this we can create a function that computes expectation values:

def computeRealExpectation(params1, params2, angles, backend):
    """
    Computes the real part of the inner product of the
    quantum states produced by acting with U(θ)
    characterised by two sets of parameters, params1 and
    params2.
    """
    qreg = QuantumRegister(2)
    anc = QuantumRegister(1)
    creg = ClassicalRegister(1)
    qc = QuantumCircuit(qreg, anc, creg)
    encodeData(qc, qreg, angles)
    qc.h(anc[0])
    generateCU(qc, qreg, anc, params1)
    qc.cz(anc[0], qreg[0])
    qc.x(anc[0])
    generateCU(qc, qreg, anc, params2)
    qc.x(anc[0])
    qc.h(anc[0])
    prob = getPrediction(qc, anc, creg, backend)
    return 2 * (prob - 0.5)

Now we can figure out the gradients of the loss function — the multiplication we do at the end is to account for the π(x, θ) - y(x) term in the gradient:

def computeGradient(params, angles, label, bias, backend):
    """
    Given network parameters params, a bias bias, input data
    angles, and a backend, returns a gradient array holding
    partials with respect to every parameter in the array
    params.
    """
    prob = forwardPass(params, bias, angles, backend)
    gradients = np.zeros_like(params)
    for i in range(params.shape[0]):
        for j in range(params.shape[1]):
            newParams = np.copy(params)
            newParams[i, j, 0] += np.pi / 2
            gradients[i, j, 0] = computeRealExpectation(
                params, newParams, angles, backend
            )
            newParams[i, j, 0] -= np.pi / 2
    # forwardPass already adds the bias, so the (π(x, θ) - y(x)) factor is:
    biasGrad = prob - label
    return gradients * biasGrad, biasGrad

Once we have the gradients, we can update the network parameters using gradient descent, along with a trick called momentum, which helps speed up training times:

def updateParams(params, prevParams, grads, learningRate, momentum):
    """
    Updates the network parameters using gradient descent 
    and momentum.
    """
    delta = params - prevParams
    paramsNew = params - grads * learningRate + momentum * delta
    return paramsNew, params

Now we can build our cost and accuracy functions so we can see how our network is responding to training:

def cost(labels, predictions):
    """
    Returns the sum of quadratic losses over the set
    (labels, predictions).
    """
    loss = 0
    for label, pred in zip(labels, predictions):
        loss += (pred - label) ** 2
    return loss / 2


def accuracy(labels, predictions):
    """
    Returns the percentage of correct predictions in the
    set (labels, predictions).
    """
    acc = 0
    for label, pred in zip(labels, predictions):
        if label == pred:
            acc += 1
    return acc / labels.shape[0]

Finally, we create the function that trains the network, and call it:

def trainNetwork(data, labels, backend):
    """
    Train a quantum neural network on inputs data and
    labels, using backend backend. Returns the parameters
    learned.
    """
    np.random.seed(1)
    numSamples = labels.shape[0]
    numTrain = int(numSamples * 0.75)
    ordering = np.random.permutation(range(numSamples))
    trainingData = data[ordering[:numTrain]]
    validationData = data[ordering[numTrain:]]
    trainingLabels = labels[ordering[:numTrain]]
    validationLabels = labels[ordering[numTrain:]]
    params = np.random.sample((5, 2, 1))
    bias = 0.01
    prevParams = np.copy(params)
    prevBias = bias
    batchSize = 5
    momentum = 0.9
    learningRate = 0.02

    for iteration in range(15):
        samplePos = iteration * batchSize
        batchTrainingData = trainingData[samplePos:samplePos + batchSize]
        batchLabels = trainingLabels[samplePos:samplePos + batchSize]
        batchGrads = np.zeros_like(params)
        batchBiasGrad = 0
        for i in range(batchSize):
            grads, biasGrad = computeGradient(
                params, batchTrainingData[i], batchLabels[i], bias, backend
            )
            batchGrads += grads / batchSize
            batchBiasGrad += biasGrad / batchSize

        params, prevParams = updateParams(
            params, prevParams, batchGrads, learningRate, momentum
        )

        temp = bias
        bias += -learningRate * batchBiasGrad + momentum * (bias - prevBias)
        prevBias = temp

        trainingPreds = np.array([forwardPass(
            params, bias, angles, backend
        ) for angles in trainingData])
        print('Iteration {} | Loss: {}'.format(
            iteration + 1, cost(trainingLabels, trainingPreds)
        ))

    validationProbs = np.array(
        [forwardPass(
            params, bias, angles, backend
        ) for angles in validationData]
    )
    validationClasses = convertToClass(validationProbs)
    validationAcc = accuracy(validationLabels, validationClasses)
    print('Validation accuracy:', validationAcc)

    return params


backend = Aer.get_backend('qasm_simulator')
learnedParams = trainNetwork(features, Y, backend)

The shape we pass into the np.random.sample() method determines the size of our parameter set — the first number (5) is the number of G layers we want, the second (2) is the number of qubits, and the third (1) is the number of angles per G gate.

This was the output I got after training a network with five layers for fifteen iterations:

Iteration 1 | Loss: 17.433085925400004
Iteration 2 | Loss: 16.29878057140824
Iteration 3 | Loss: 14.796300997002378
Iteration 4 | Loss: 13.45048890335602
Iteration 5 | Loss: 12.207399339199581
Iteration 6 | Loss: 11.203202358947257
Iteration 7 | Loss: 9.836832509742251
Iteration 8 | Loss: 8.901883213728054
Iteration 9 | Loss: 8.022787152763158
Iteration 10 | Loss: 7.408032981549452
Iteration 11 | Loss: 6.728295582051598
Iteration 12 | Loss: 6.193162047195093
Iteration 13 | Loss: 5.866241892968018
Iteration 14 | Loss: 5.445387724245562
Iteration 15 | Loss: 5.19377811976361
Validation accuracy: 1.0

Looks pretty good — we’ve achieved 100% accuracy on the validation set, meaning that the network generalised to unseen examples successfully!

Wrapping up

So we built a quantum neural network — awesome!

There are a couple of ways we might bring the loss down further — train the network for a few more iterations, or play with hyper-parameters like the batch size and learning rate.

A cool way to take things forward would be to experiment with different gate selections for U(θ) — you might be able to find one that works a lot better!

You can grab the entire project here. If you have any questions, drop a comment here, or get in touch — I would be happy to help!

Translated from: https://towardsdatascience.com/quantum-machine-learning-learning-on-neural-networks-fdc03681aed3
