My last articles tackled Bayes nets on quantum computers (read it here!), and k-means clustering, our first steps into the weird and wonderful world of quantum machine learning.
This time, we’re going a little deeper into the rabbit hole and looking at how to build a neural network on a quantum computer.
In case you aren’t up to speed on neural nets, don’t worry — we’re starting with neural nets 101.
What even is a (classical) neural network?
Almost everyone has heard of neural networks — they’re used to run some of the coolest tech we have today — self driving cars, voice assistants, and even the software that generates super realistic pictures of famous people doing questionable things.
What makes them different from regular algorithms is that instead of writing down a set of rules, we provide the network with examples of the problem we want it to solve.
We could feed a network with some data from the IRIS data set, which contains information about three kinds of flowers, and it might guess which kind of flower it is:
So now we know what neural networks do — but how do they do it?
Weights, biases and building blocks
Neural networks are made up of many small units called neurons, which look like this:
Most neurons take multiple numeric inputs (the blue circles) and multiply each one by a weight (the wᵢs) that represents how important that input is. The larger the magnitude of a weight, the more important the associated input.
The bias is treated like another weight, only that the input it multiplies always has a value of 1. When we add up all the weighted inputs, we get the activation value of the neuron, represented by the purple circle in the picture above:
The activation value is then passed through a function (the blue rectangle), and the result is the output of the neuron:
We can change a neuron’s behavior by changing the function it uses to transform its activation value — for example, we could use a super simple transformation, like this one:
In practice, however, we use more complex ones, like the sigmoid function:
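To make the pieces so far concrete, here is a minimal sketch of a single neuron in plain Python. The names `step`, `sigmoid` and `neuron`, the example numbers, and the choice to put the step threshold at 0 are all assumptions made for illustration; they are not taken from the figures.

```python
import math


def step(a):
    """Super simple transformation: 1 if the activation is positive, else 0."""
    return 1 if a > 0 else 0


def sigmoid(a):
    """Smooth squashing function that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias, transform):
    """Weighted sum of the inputs plus the bias, passed through `transform`."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transform(activation)


# Illustrative values only.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

print(neuron(inputs, weights, bias, step))
print(neuron(inputs, weights, bias, sigmoid))
```

Swapping `step` for `sigmoid` changes nothing about the weighted sum; only the final transformation differs, which is exactly the knob described above.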
How are neurons useful?
They can make decisions based on the inputs they receive — for example, we could use a neuron to predict whether we’ll eat pizza or pasta the next time we eat out at an Italian place by feeding it the answers to three questions:
- Do I like the pasta at this restaurant?
- Does the restaurant have pesto sauce?
- Does the restaurant have a triple cheese pizza?
Putting aside possible health concerns, let’s see what the neuron might look like — we can encode the inputs using 0 to represent no, 1 to represent yes, and do the same with the outputs by mapping 0 and 1 to pasta and pizza respectively:
Let’s use the step function to transform the neuron’s activation value:
Just using one neuron, we can capture multiple decision making behaviours:
- If we like the pasta at a restaurant, we choose to order pasta unless pesto sauce is out and they serve a triple cheese pizza.
- If we don’t like the pasta at a restaurant, we order a pizza unless pesto sauce is available, and the triple cheese pizza is not.
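The two rules above can be reproduced with a single step-function neuron. The weights and bias below are hypothetical values picked by hand so that the rules hold; they are not necessarily the ones from the original figure, and the step threshold at 0 is also an assumption.

```python
from itertools import product


def step(a):
    """Step function: 1 if the activation is positive, else 0."""
    return 1 if a > 0 else 0


# Hypothetical weights for: likes pasta, pesto available, triple cheese pizza available.
WEIGHTS = (-2, -2, 2)
BIAS = 1


def order(likes_pasta, has_pesto, has_triple_cheese):
    """Return 'pizza' (output 1) or 'pasta' (output 0) for 0/1 answers."""
    inputs = (likes_pasta, has_pesto, has_triple_cheese)
    activation = sum(w * x for w, x in zip(WEIGHTS, inputs)) + BIAS
    return "pizza" if step(activation) == 1 else "pasta"


# Print the full decision table for all eight combinations of answers.
for answers in product((0, 1), repeat=3):
    print(answers, "->", order(*answers))
```

Liking the pasta and having pesto both push the activation down (towards pasta), while the triple cheese pizza pushes it up, and the bias breaks the tie in favour of pizza when nothing else is known.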
We can also do things the other way — we can program a neuron so that it corresponds to a specific set of preferences.
If all we wanted to do was predict what we would eat the next time we go out, it would probably be easy to figure out a set of weights and biases for one neuron, but what if we had to do the same with a full-sized network?
It would probably take a while.
Fortunately, instead of guessing the values of the weights we need, we can create algorithms that change the parameters of a network — the weights, biases, and even the structure — so that it can learn a solution for a problem we want to solve.
Going down to get to the top
Ideally, a network’s prediction would be the same as the label associated with the input we feed it — so the smaller the difference between the prediction and actual output, the better the set of weights the network has learned.
We quantify this difference using a loss function, which can take any form we want, like this one, which is called the quadratic loss function:
y(x) is the desired output and o(x, θ) is the network’s output when fed data x with parameters θ. Since the loss is always non-negative, once it takes on values close to 0, we know that the network has learned a good parameter set. Of course, there are other problems that can crop up, like over-fitting, but we’ll ignore those for now.
Using the loss function, we can figure out what the best parameter set for our network is:
So instead of guessing weights, all we need to do is minimize C with respect to parameters θ — which we can do using a technique called gradient descent:
All we’re doing here is looking at how the loss changes if we increase the value of θᵢ, and then updating θᵢ so that the loss decreases by a little bit. η is a small number which controls by how much we change θᵢ every time we update it.
Why do we need η to be small? We could just adjust it so that the loss on the current x is close to zero after just one update— most times this is not a great idea, because while it would reduce the loss on the current x, it often leads to much worse performance on all the other data samples we feed to the network.
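Here is a small sketch of gradient descent with a quadratic loss, under some loud assumptions: the "network" is just o(x, θ) = θ·x with a single parameter, the data is made up, and η = 0.05 is an arbitrary small value. The function names are hypothetical too.

```python
def quadratic_loss(theta, xs, ys):
    """Sum over samples of 0.5 * (y - theta * x) ** 2."""
    return sum(0.5 * (y - theta * x) ** 2 for x, y in zip(xs, ys))


def loss_gradient(theta, xs, ys):
    """Derivative of quadratic_loss with respect to theta."""
    return sum((theta * x - y) * x for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data generated with theta = 2
theta = 0.0           # starting guess
eta = 0.05            # small learning rate

history = [quadratic_loss(theta, xs, ys)]
for _ in range(50):
    # Move theta a little in the direction that lowers the loss.
    theta -= eta * loss_gradient(theta, xs, ys)
    history.append(quadratic_loss(theta, xs, ys))

print(theta, history[-1])
```

Each update only nudges θ, so the loss over all three samples shrinks step by step instead of being forced to zero on one sample at the expense of the others.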
Now that we’ve got the basics down, let’s figure out how to build a quantum neural network.
Into the quantumverse
The quantum neural net we’ll be building doesn’t work the exact same way as the classical networks we’ve worked on so far—instead of using neurons with weights and biases, we encode the input data into a bunch of qubits, apply a sequence of quantum gates, and change the gate parameters to minimize a loss function:
While that might sound new, the idea is still the same — change the parameter set to minimize the difference between the network predictions and input labels.
To keep things simple, we’ll be building a binary classifier — meaning that every data point fed to the network has to have an associated label of either 0 or 1.
How does it work?
We start by feeding some data x to the network, which is passed through a feature map — a function that transforms the input data into a form we can use to create the input quantum state:
The feature map we use can look like almost anything — here’s one that takes in a two dimensional vector x, and spits out an angle:
Once x is encoded as a quantum state, we apply a series of quantum gates:
The network output, which we’ll call π(x, θ), is the probability of the last qubit being measured in the |1⟩ state, plus a bias term that is added classically:
Zₙ₋₁ stands for a Z gate applied to the last qubit.
Finally, we take the output and the label associated with x, and use them to compute the loss over the sample — we’ll use the same quadratic loss from above. The cost over the entire data set X we feed the network then becomes:
The prediction of the network p can be obtained from the output:
Now all we need to do is to figure out how to compute the gradients of the loss function l(x, θ). While we could do it classically, that would be boring — what we need is a way to compute them on a quantum computer.
A new way to compute gradients
Let’s start by differentiating the loss function with respect to a parameter θᵢ:
Let’s expand the last term:
We can quickly get rid of the constant terms — and in the case that θᵢ = b, we know that the gradient is simply 1:
Now, using the product rule, we can expand further:
That probably looks a little painful to read — but thanks to the Hermitian conjugate (the †), this has a concise representation:
Since U(θ) is made up of multiple gates, each of them controlled by a different parameter (or sets of parameters), finding the partial derivative of U only involves differentiating the gate Uᵢ(θᵢ) that is dependent on θᵢ:
This is where the form we choose for each Uᵢ becomes important. We’ll use the same form for every Uᵢ, which we’ll call a G gate — the choice of form is arbitrary, so you could use any other form you can think of instead:
Now that we know what each Uᵢ looks like, we can find its derivative:
Lucky for us, we can express this in terms of the G gate:
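Assuming the G gate has the rotation form used in the code later in this article (cos α and sin α on the first row, −sin α and cos α on the second), differentiating with respect to α gives the same gate with its angle shifted by π/2. That is why the gradient code adds np.pi / 2 to one parameter at a time. Here is a quick numerical check in plain Python; `g_gate` and `numerical_derivative` are names made up for this sketch.

```python
import math


def g_gate(alpha):
    """The 2x2 G gate as a nested list: [[cos, sin], [-sin, cos]]."""
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    return [[cos_a, sin_a], [-sin_a, cos_a]]


def numerical_derivative(alpha, h=1e-6):
    """Central finite-difference estimate of dG/dalpha, entry by entry."""
    plus, minus = g_gate(alpha + h), g_gate(alpha - h)
    return [[(plus[row][col] - minus[row][col]) / (2 * h) for col in range(2)]
            for row in range(2)]


alpha = 0.7  # arbitrary test angle
shifted = g_gate(alpha + math.pi / 2)
estimate = numerical_derivative(alpha)

# Largest entry-wise gap between G(alpha + pi/2) and the numerical derivative.
max_diff = max(abs(shifted[row][col] - estimate[row][col])
               for row in range(2) for col in range(2))
print(max_diff)
```

The printed gap is at the level of finite-difference noise, so the derivative of a G gate is itself a G gate, which is what lets us run it on a quantum computer.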
So all that’s left is to figure out how to create a circuit that gives us the inner product form we need:
The easiest way to get a measurable that is proportional to this is to use the Hadamard test — first, we prepare the input quantum state and push an ancilla into superposition:
Now apply Zₙ₋₁B onto ψ, conditioned on the ancilla being in the |1⟩ state:
Then flip the ancilla, and do the same with A:
Finally, apply another Hadamard gate onto the ancilla:
Now the probability of measuring the ancilla as 0 is
So if we substitute U(θ) for B, and for A a copy of U(θ) in which Uᵢ is replaced by its derivative, then rescaling the ancilla’s measurement probability gives us the gradient of π(x, θ) with respect to θᵢ.
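The Hadamard test steps can be simulated by hand for a tiny case. The sketch below uses a single data qubit and tracks the two ancilla branches as plain Python lists; all the names are made up for illustration. Working through the steps in this order gives an ancilla-0 probability of ½ + ½·Re⟨ψ|A†ZB|ψ⟩, which you should treat as an assumption to check against the code rather than a quote of the original figure.

```python
import math


def matvec(m, v):
    """Multiply a matrix (nested list) by a vector (list)."""
    return [sum(m[r][k] * v[k] for k in range(len(v))) for r in range(len(m))]


def inner(u, v):
    """<u|v>, taking the complex conjugate of u."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def g_gate(alpha):
    """Rotation-style G gate: [[cos, sin], [-sin, cos]]."""
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)
    return [[cos_a, sin_a], [-sin_a, cos_a]]


Z = [[1, 0], [0, -1]]


def hadamard_test_p0(psi, A, B):
    """Probability of reading the ancilla as 0 after the steps in the text."""
    r = 1 / math.sqrt(2)
    # Hadamard on the ancilla: both branches hold psi / sqrt(2).
    anc0 = [r * a for a in psi]
    anc1 = [r * a for a in psi]
    # Apply Z B to the branch where the ancilla is 1.
    anc1 = matvec(Z, matvec(B, anc1))
    # Flip the ancilla, then apply A to the branch where it is now 1.
    anc0, anc1 = anc1, anc0
    anc1 = matvec(A, anc1)
    # Final Hadamard on the ancilla; keep only the ancilla-0 branch.
    out0 = [r * (a + b) for a, b in zip(anc0, anc1)]
    return inner(out0, out0).real


psi = [math.cos(0.4), math.sin(0.4)]  # arbitrary normalised input state
B = g_gate(0.3)
A = g_gate(0.3 + math.pi / 2)         # G(0.3) with its angle shifted by pi/2

p0 = hadamard_test_p0(psi, A, B)
expected = 0.5 + 0.5 * inner(matvec(A, psi), matvec(Z, matvec(B, psi))).real
print(p0, expected)
```

The two printed numbers agree, so measuring one extra qubit many times is enough to estimate the real part of the inner product we are after.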
We figured out a way to analytically compute gradients on a quantum computer — now all that’s left is to build our quantum neural network.
Building a quantum neural network
Let’s import all the modules we need to kick things off:
from qiskit import QuantumRegister, ClassicalRegister
from qiskit import Aer, execute, QuantumCircuit
from qiskit.extensions import UnitaryGate
import numpy as np
Now let’s take a look at some of our data (you can get it right here!) — it’s a processed version of the IRIS data set, with one class removed:
We need to separate the features (the first four columns) from the labels:
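# Load the processed IRIS data: four feature columns, then the class label.
data = np.genfromtxt("processedIRISData.csv", delimiter=",")
X = data[:, 0:4]
# convertDataToAngles is defined in the next step, so define it before running this line.
features = np.array([convertDataToAngles(i) for i in X])
Y = data[:, -1]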
Now let’s build a function that will do the feature mapping for us.
Since the input vectors are normalized and 4 dimensional, there is a super simple option for the mapping — use 2 qubits to hold the encoded data, and use a mapping that just recreates the input vector as a quantum state.
For this we need two functions — one to extract angles from the vectors:
def convertDataToAngles(data):
    """
    Takes in a normalised 4 dimensional vector and returns
    three angles such that the encodeData function returns
    a quantum state with the same amplitudes as the
    vector passed in.
    """
    prob1 = data[2] ** 2 + data[3] ** 2
    prob0 = 1 - prob1
    angle1 = 2 * np.arcsin(np.sqrt(prob1))
    prob1 = data[3] ** 2 / prob1
    angle2 = 2 * np.arcsin(np.sqrt(prob1))
    prob1 = data[1] ** 2 / prob0
    angle3 = 2 * np.arcsin(np.sqrt(prob1))
    return np.array([angle1, angle2, angle3])
Another to convert the angles we get into a quantum state:
def encodeData(qc, qreg, angles):
    """
    Given a quantum register belonging to a quantum
    circuit, performs a series of rotations and controlled
    rotations characterized by the angles parameter.
    """
    qc.ry(angles[0], qreg[1])
    qc.cry(angles[1], qreg[1], qreg[0])
    # Flip the control so the last rotation acts on the qreg[1] = 0 branch
    qc.x(qreg[1])
    qc.cry(angles[2], qreg[1], qreg[0])
    qc.x(qreg[1])
This might seem a little confusing, but understanding how it works isn’t essential to building the QNN — you can read up on it here if you like.
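If you would like to convince yourself that three angles are enough to rebuild a 4 dimensional vector, here is a small check in plain Python. It assumes the vector is normalised with non-negative entries, that the first rotation sets qubit 1, and that the second and third rotations act on qubit 0 when qubit 1 is 1 and 0 respectively. The function names are made up for this sketch.

```python
import math


def angles_from_vector(v):
    """Same arithmetic as convertDataToAngles, using only the standard library."""
    prob1 = v[2] ** 2 + v[3] ** 2
    prob0 = 1 - prob1
    angle1 = 2 * math.asin(math.sqrt(prob1))
    angle2 = 2 * math.asin(math.sqrt(v[3] ** 2 / prob1))
    angle3 = 2 * math.asin(math.sqrt(v[1] ** 2 / prob0))
    return angle1, angle2, angle3


def amplitudes_from_angles(angle1, angle2, angle3):
    """Amplitudes of |00>, |01>, |10>, |11>, with qubit 1 written first."""
    cos1, sin1 = math.cos(angle1 / 2), math.sin(angle1 / 2)
    return [
        cos1 * math.cos(angle3 / 2),  # qubit 1 = 0, qubit 0 = 0
        cos1 * math.sin(angle3 / 2),  # qubit 1 = 0, qubit 0 = 1
        sin1 * math.cos(angle2 / 2),  # qubit 1 = 1, qubit 0 = 0
        sin1 * math.sin(angle2 / 2),  # qubit 1 = 1, qubit 0 = 1
    ]


# A made-up normalised vector: 0.01 + 0.09 + 0.25 + 0.65 = 1.
vector = [0.1, 0.3, 0.5, math.sqrt(0.65)]
rebuilt = amplitudes_from_angles(*angles_from_vector(vector))
print(rebuilt)
```

The rebuilt amplitudes match the input vector, which is all the feature map needs to do. Note that the divisions break down if the first two or the last two entries are all zero, so a production version would need to guard against that.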
Now we can write the functions we need to implement U(θ), which will take the form of alternating layers of RY and CX gates.
Why do we need the CX layers?
If we didn’t include them, we wouldn’t be able to perform any entanglement operations, which would limit the area within the Hilbert space that our network can reach — using CX gates, the network can capture interactions between qubits that it wouldn't be able to without them.
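To see what the CX gates buy us, here is a tiny two-qubit sketch in plain Python. It uses ordinary RY-style rotations as a stand-in for the G gates and checks the quantity a00·a11 − a01·a10, which is zero exactly when a two-qubit state can be written as a product of single-qubit states. The function names are hypothetical.

```python
import math


def ry_pair(theta0, theta1):
    """State after rotating qubit 0 and qubit 1 of |00> independently.

    Returned as amplitudes [a00, a01, a10, a11], indexed as |q1 q0>.
    """
    q0 = (math.cos(theta0 / 2), math.sin(theta0 / 2))
    q1 = (math.cos(theta1 / 2), math.sin(theta1 / 2))
    return [q1[0] * q0[0], q1[0] * q0[1], q1[1] * q0[0], q1[1] * q0[1]]


def cx_control0_target1(state):
    """CX with qubit 0 as control and qubit 1 as target: swaps |01> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a11, a10, a01]


def entanglement_witness(state):
    """a00*a11 - a01*a10, zero exactly when the state is a product state."""
    a00, a01, a10, a11 = state
    return a00 * a11 - a01 * a10


product_state = ry_pair(math.pi / 2, 0.0)
entangled_state = cx_control0_target1(product_state)

print(entanglement_witness(product_state))
print(entanglement_witness(entangled_state))
```

No matter which angles you pass to `ry_pair`, the witness stays at zero; one CX is enough to move the state out of that restricted set.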
We’ll start with the G gates:
def GGate(qc, qreg, params):
    """
    Given a parameter α, return a single
    qubit gate of the form
    [cos(α), sin(α)]
    [-sin(α), cos(α)]
    """
    u00 = np.cos(params[0])
    u01 = np.sin(params[0])
    gateLabel = "G({})".format(params[0])
    gate = UnitaryGate(np.array(
        [[u00, u01], [-u01, u00]]
    ), label=gateLabel)
    return gate
def GLayer(qc, qreg, params):
    """
    Applies a layer of GGates onto the qubits of register
    qreg in circuit qc, parametrized by angles params.
    """
    for i in range(2):
        qc.append(GGate(qc, qreg, params[i]), [qreg[i]])
Next, we’ll do the CX gates:
def CXLayer(qc, qreg, order):
    """
    Applies a layer of CX gates onto the qubits of register
    qreg in circuit qc, with the order of application
    determined by the value of the order parameter.
    """
    if order:
        qc.cx(qreg[0], qreg[1])
    else:
        qc.cx(qreg[1], qreg[0])
Now we put them together to get U(θ):
def generateU(qc, qreg, params):
    """
    Applies the unitary U(θ) to qreg by composing multiple
    G layers and CX layers. The unitary is parametrized by
    the array passed into params.
    """
    for i in range(params.shape[0]):
        GLayer(qc, qreg, params[i])
        CXLayer(qc, qreg, i % 2)
Next we create a function that allows us to get the output of the network, and another that converts those outputs into class predictions:
def getPrediction(qc, qreg, creg, backend):
    """
    Returns the probability of measuring the last qubit
    in register qreg in the |1⟩ state.
    """
    shots = 10000
    qc.measure(qreg[0], creg[0])
    job = execute(qc, backend=backend, shots=shots)
    results = job.result().get_counts()
    # Normalise by the number of shots actually run
    if '1' in results.keys():
        return results['1'] / shots
    else:
        return 0
def convertToClass(predictions):
    """
    Given a set of network outputs, returns class predictions
    by thresholding them.
    """
    return (predictions >= 0.5) * 1
Now we can build a function that performs a forward pass on the network — feeds it some data, processes it, and gives us the network output:
def forwardPass(params, bias, angles, backend):
    """
    Given a parameter set params, input data in the form
    of angles, a bias, and a backend, performs a full
    forward pass on the network and returns the network
    output.
    """
    qreg = QuantumRegister(2)
    anc = QuantumRegister(1)
    creg = ClassicalRegister(1)
    qc = QuantumCircuit(qreg, anc, creg)
    encodeData(qc, qreg, angles)
    generateU(qc, qreg, params)
    # The bias is added classically to the measured probability
    pred = getPrediction(qc, qreg, creg, backend) + bias
    return pred
After that, we can write all the functions we need to measure gradients — first, we need to be able to apply controlled versions of U(θ):
def CGLayer(qc, qreg, anc, params):
    """
    Applies a controlled layer of GGates, all conditioned
    on the first qubit of the anc register.
    """
    for i in range(2):
        qc.append(GGate(
            qc, qreg, params[i]
        ).control(1), [anc[0], qreg[i]])
def CCXLayer(qc, qreg, anc, order):
    """
    Applies a layer of Toffoli gates with the first
    control qubit always being the first qubit of the anc
    register, and the second depending on the value
    passed into the order parameter.
    """
    if order:
        qc.ccx(anc[0], qreg[0], qreg[1])
    else:
        qc.ccx(anc[0], qreg[1], qreg[0])
def generateCU(qc, qreg, anc, params):
    """
    Applies a controlled version of the unitary U(θ),
    conditioned on the first qubit of register anc.
    """
    for i in range(params.shape[0]):
        CGLayer(qc, qreg, anc, params[i])
        CCXLayer(qc, qreg, anc, i % 2)
Using this we can create a function that computes expectation values:
def computeRealExpectation(params1, params2, angles, backend):
    """
    Computes the real part of the inner product of the
    quantum states produced by acting with U(θ)
    characterised by two sets of parameters, params1 and
    params2.
    """
    qreg = QuantumRegister(2)
    anc = QuantumRegister(1)
    creg = ClassicalRegister(1)
    qc = QuantumCircuit(qreg, anc, creg)
    encodeData(qc, qreg, angles)
    # Hadamard test, following the steps described earlier:
    # first put the ancilla into superposition
    qc.h(anc[0])
    generateCU(qc, qreg, anc, params1)
    qc.cz(anc[0], qreg[0])
    # Flip the ancilla so params2 acts on the other branch
    qc.x(anc[0])
    generateCU(qc, qreg, anc, params2)
    qc.h(anc[0])
    prob = getPrediction(qc, anc, creg, backend)
    return 2 * (prob - 0.5)
Now we can figure out the gradients of the loss function — the multiplication we do at the end is to account for the π(x, θ) - y(x) term in the gradient:
def computeGradient(params, angles, label, bias, backend):
    """
    Given network parameters params, a bias bias, input data
    angles, and a backend, returns a gradient array holding
    partials with respect to every parameter in the array
    params.
    """
    # forwardPass already adds the bias to the measured probability
    pred = forwardPass(params, bias, angles, backend)
    gradients = np.zeros_like(params)
    for i in range(params.shape[0]):
        for j in range(params.shape[1]):
            # Shifting a G gate's angle by π/2 gives its derivative
            newParams = np.copy(params)
            newParams[i, j, 0] += np.pi / 2
            gradients[i, j, 0] = computeRealExpectation(
                params, newParams, angles, backend
            )
    # π(x, θ) - y(x); the gradient with respect to the bias is 1
    biasGrad = pred - label
    return gradients * biasGrad, biasGrad
Once we have the gradients, we can update the network parameters using gradient descent, along with a trick called momentum, which helps speed up training times:
def updateParams(params, prevParams, grads, learningRate, momentum):
    """
    Updates the network parameters using gradient descent
    and momentum.
    """
    # The momentum term reuses a fraction of the previous update
    delta = params - prevParams
    paramsNew = params - grads * learningRate + momentum * delta
    return paramsNew, params
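To get a feel for what the momentum term does, here is a toy comparison in plain Python. It minimises the made-up loss 0.5·θ² (whose gradient is simply θ) with and without momentum, using the same learning rate and momentum values as the training code; `momentum_step` and `minimise` are hypothetical names, and the toy loss is an assumption chosen only because its minimum is known to be at 0.

```python
def momentum_step(theta, prev_theta, grad, learning_rate, momentum):
    """One update: a plain gradient step plus a fraction of the previous move."""
    new_theta = theta - learning_rate * grad + momentum * (theta - prev_theta)
    return new_theta, theta


def minimise(momentum, steps=30, learning_rate=0.02):
    """Minimise the toy loss 0.5 * theta ** 2, whose gradient is theta."""
    theta, prev_theta = 1.0, 1.0
    for _ in range(steps):
        theta, prev_theta = momentum_step(
            theta, prev_theta, theta, learning_rate, momentum
        )
    return theta


plain = minimise(momentum=0.0)
with_momentum = minimise(momentum=0.9)
print(plain, with_momentum)
```

After the same number of steps, the run with momentum ends up closer to the minimum at 0, although it can overshoot and oscillate around it on the way.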
Now we can build our cost and accuracy functions so we can see how our network is responding to training:
def cost(labels, predictions):
    """
    Returns the sum of quadratic losses over the set
    (labels, predictions).
    """
    loss = 0
    for label, pred in zip(labels, predictions):
        loss += (pred - label) ** 2
    return loss / 2
def accuracy(labels, predictions):
    """
    Returns the fraction of correct predictions in the
    set (labels, predictions).
    """
    acc = 0
    for label, pred in zip(labels, predictions):
        if label == pred:
            acc += 1
    return acc / labels.shape[0]
Finally, we create the function that trains the network, and call it:
def trainNetwork(data, labels, backend):
    """
    Train a quantum neural network on inputs data and
    labels, using backend backend. Returns the parameters
    learned.
    """
    # Shuffle, then split 75% / 25% into training and validation sets
    numSamples = labels.shape[0]
    numTrain = int(numSamples * 0.75)
    ordering = np.random.permutation(numSamples)
    trainingData = data[ordering[:numTrain]]
    validationData = data[ordering[numTrain:]]
    trainingLabels = labels[ordering[:numTrain]]
    validationLabels = labels[ordering[numTrain:]]
    params = np.random.sample((5, 2, 1))
    bias = 0.01
    prevParams = np.copy(params)
    prevBias = bias
    batchSize = 5
    momentum = 0.9
    learningRate = 0.02

    for iteration in range(15):
        samplePos = iteration * batchSize
        batchTrainingData = trainingData[samplePos:samplePos + batchSize]
        batchLabels = trainingLabels[samplePos:samplePos + batchSize]
        batchGrads = np.zeros_like(params)
        batchBiasGrad = 0
        # Average the gradients over the batch
        for i in range(batchSize):
            grads, biasGrad = computeGradient(
                params, batchTrainingData[i], batchLabels[i], bias, backend
            )
            batchGrads += grads / batchSize
            batchBiasGrad += biasGrad / batchSize

        params, prevParams = updateParams(
            params, prevParams, batchGrads, learningRate, momentum
        )

        # Same momentum update, applied to the classical bias
        temp = bias
        bias += -learningRate * batchBiasGrad + momentum * (bias - prevBias)
        prevBias = temp

        trainingPreds = np.array([forwardPass(
            params, bias, angles, backend
        ) for angles in trainingData])
        print('Iteration {} | Loss: {}'.format(
            iteration + 1, cost(trainingLabels, trainingPreds)
        ))

    validationProbs = np.array(
        [forwardPass(
            params, bias, angles, backend
        ) for angles in validationData]
    )
    validationClasses = convertToClass(validationProbs)
    validationAcc = accuracy(validationLabels, validationClasses)
    print('Validation accuracy:', validationAcc)

    return params


backend = Aer.get_backend('qasm_simulator')
learnedParams = trainNetwork(features, Y, backend)
The numbers we pass into np.random.sample() determine the size of our parameter set: the first number (5) is the number of G layers we want.
This was the output I got after training a network with five layers for fifteen iterations:
Iteration 1 | Loss: 17.433085925400004
Iteration 2 | Loss: 16.29878057140824
Iteration 3 | Loss: 14.796300997002378
Iteration 4 | Loss: 13.45048890335602
Iteration 5 | Loss: 12.207399339199581
Iteration 6 | Loss: 11.203202358947257
Iteration 7 | Loss: 9.836832509742251
Iteration 8 | Loss: 8.901883213728054
Iteration 9 | Loss: 8.022787152763158
Iteration 10 | Loss: 7.408032981549452
Iteration 11 | Loss: 6.728295582051598
Iteration 12 | Loss: 6.193162047195093
Iteration 13 | Loss: 5.866241892968018
Iteration 14 | Loss: 5.445387724245562
Iteration 15 | Loss: 5.19377811976361
Validation accuracy: 1.0
Looks pretty good — we’ve achieved 100% accuracy on the validation set, meaning that the network generalised to unseen examples successfully!
Wrapping up
So we built a quantum neural network. Awesome!
There are a couple of ways we can maybe bring the loss further down — train the network for a few more iterations, or play with hyper-parameters like batch size and learning rate.
A cool way to take things forward would be to experiment with different gate selections for U(θ) — you might be able to find one that works a lot better!
You can grab the entire project here. If you have any questions, drop a comment here, or get in touch — I would be happy to help!
Translated from: https://towardsdatascience.com/quantum-machine-learning-learning-on-neural-networks-fdc03681aed3