by Emil Wallner
In this article, we’ll explore six snippets of code that made deep learning what it is today. We’ll cover the inventors and the background to their breakthroughs. Each story includes simple code samples on FloydHub and GitHub to play around with.
If this is your first encounter with deep learning, I’d suggest reading my Deep Learning 101 for Developers.
To run the code examples on FloydHub, install the floyd command line tool. Then clone the code examples I’ve provided to your local machine.
Note: If you are new to FloydHub, you might want to first read the getting started with FloydHub section in my earlier post.
Initiate the CLI in the example project folder on your local machine. Now you can spin up the project on FloydHub with the following command:
floyd run --data emilwallner/datasets/mnist/1:mnist --tensorboard --mode jupyter
Deep learning started with a snippet of math.
I’ve translated it into Python:
# y = mx + b
# m is slope, b is y-intercept
def compute_error_for_line_given_points(b, m, coordinates):
    totalError = 0
    for i in range(0, len(coordinates)):
        x = coordinates[i][0]
        y = coordinates[i][1]
        totalError += (y - (m * x + b)) ** 2
    return totalError / float(len(coordinates))

# example
compute_error_for_line_given_points(1, 2, [[3,6],[6,9],[12,18]])
This was first published by Adrien-Marie Legendre in 1805. He was a Parisian mathematician who was also known for measuring the meter.
He had a particular obsession with predicting the future location of comets. He had the locations of a couple of past comets. He was relentless as he used them in his search for a method to calculate their trajectory.
It really was one of those spaghetti-on-the-wall moments. He tried several methods, then one version finally stuck with him.
Legendre’s process started by guessing the future location of a comet. Then he squared the errors he made, and finally remade his guess to reduce the sum of the squared errors. This was the seed for linear regression.
Play with the above code in the Jupyter notebook I’ve provided to get a feel for it. m is the coefficient and b is the constant for your prediction, and the coordinates are the locations of the comet. The goal is to find a combination of m and b where the error is as small as possible.
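To make "finding a combination of m and b" concrete, here is a minimal sketch that tries a coarse grid of candidate values and keeps the pair with the lowest error. The grid range and the helper name best_line_on_grid are assumptions picked for illustration. Legendre worked by hand, not with a grid search.

```python
def compute_error_for_line_given_points(b, m, coordinates):
    # Mean of the squared differences between observed and predicted y
    total_error = 0
    for x, y in coordinates:
        total_error += (y - (m * x + b)) ** 2
    return total_error / float(len(coordinates))

def best_line_on_grid(coordinates, candidates):
    # Hypothetical helper: try every (m, b) pair and keep the best one
    best = None
    for m in candidates:
        for b in candidates:
            error = compute_error_for_line_given_points(b, m, coordinates)
            if best is None or error < best[0]:
                best = (error, m, b)
    return best

coordinates = [[3, 6], [6, 9], [12, 18]]
candidates = [i / 2 for i in range(0, 11)]  # 0.0, 0.5, ..., 5.0

# The example guess from the article: b = 1, m = 2
guess_error = compute_error_for_line_given_points(1, 2, coordinates)
print(guess_error)  # (1 + 16 + 49) / 3 = 22.0

grid_error, grid_m, grid_b = best_line_on_grid(coordinates, candidates)
print(grid_error, grid_m, grid_b)
```

Trying every pair works for two parameters on a small grid, but the number of combinations explodes as you add parameters.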
This is the core of deep learning:
Legendre’s method of manually trying to reduce the error rate was time-consuming. Peter Debye was a Nobel prize winner from The Netherlands. He formalized a solution for this process a century later in 1909.
Let’s imagine that Legendre had one parameter to worry about, which we’ll call X. The Y axis represents the error value for each value of X. Legendre was searching for the value of X that results in the lowest error.
In this graphical representation, we can see that the value of X that minimizes the error Y is X = 1.1.
Peter Debye noticed that the slope to the left of the minimum is negative, while it’s positive on the other side. Thus, if you know the value of the slope at any given X value, you can guide Y towards its minimum.
This led to the method of gradient descent. The principle is used in almost every deep learning model.
To play with this, let’s assume that the error function is Error = x⁵ - 2x³ - 2. To know the slope at any given X value, we take its derivative, which is 5x⁴ - 6x².
Watch Khan Academy’s video if you need to brush up on derivatives.
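If you’d rather check the derivative than trust it, here is a small sketch that compares 5x⁴ - 6x² with a numerical slope estimate. The helper numerical_slope and the step size h are my own assumptions for illustration.

```python
def error(x):
    # The example error function: x^5 - 2x^3 - 2
    return x**5 - 2 * x**3 - 2

def slope_at_given_x_value(x):
    # The analytical derivative: 5x^4 - 6x^2
    return 5 * x**4 - 6 * x**2

def numerical_slope(f, x, h=1e-5):
    # Central difference: the average slope over a tiny interval around x
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [0.5, 1.0, 1.5]:
    print(x, slope_at_given_x_value(x), numerical_slope(error, x))
```

The slope is negative at x = 0.5 and positive at x = 1.5, so the minimum lies somewhere in between.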
Debye’s math translated into Python:
current_x = 0.5 # the algorithm starts at x=0.5
learning_rate = 0.01 # step size multiplier
num_iterations = 60 # the number of times to train the function

# the derivative of the error function (x**4 = the power of 4 or x^4)
def slope_at_given_x_value(x):
    return 5 * x**4 - 6 * x**2

# Move X to the right or left depending on the slope of the error function
for i in range(num_iterations):
    previous_x = current_x
    current_x += -learning_rate * slope_at_given_x_value(previous_x)
    print(previous_x)

print("The local minimum occurs at %f" % current_x)
The trick here is the learning_rate. By going in the opposite direction of the slope, it approaches the minimum. Additionally, the closer it gets to the minimum, the smaller the slope gets. This reduces each step as the slope approaches zero.
num_iterations is your estimate of how many iterations it takes to reach the minimum. Play with the parameters to get an intuition for gradient descent.
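One way to play with the parameters is to run the same descent with a few learning rates and compare where each one ends up. This is a sketch under my own assumptions: the function descend and the three learning rates are illustrative. Setting the derivative 5x⁴ - 6x² to zero puts the minimum at x = √(6/5), roughly 1.095.

```python
def slope_at_given_x_value(x):
    return 5 * x**4 - 6 * x**2

def descend(start_x, learning_rate, num_iterations):
    # Repeatedly step against the slope, starting from start_x
    x = start_x
    for _ in range(num_iterations):
        x -= learning_rate * slope_at_given_x_value(x)
    return x

true_minimum = (6 / 5) ** 0.5

for learning_rate in [0.001, 0.01, 0.1]:
    final_x = descend(0.5, learning_rate, 60)
    print(learning_rate, final_x, abs(final_x - true_minimum))
```

With the smallest learning rate, 60 iterations are not enough to get close to the minimum. With the largest one, the steps are big enough to get there with iterations to spare.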
Combining the method of least squares and gradient descent, you get linear regression. In the 1950s and 1960s, a group of experimental economists implemented versions of these ideas on early computers. The logic was implemented on physical punch cards, which made for truly handmade software programs. It took several days to prepare these punch cards and up to 24 hours to run one regression analysis through the computer.
Here’s a linear regression example translated into Python so that you don’t have to do it on punch cards:
# Price of wheat/kg and the average price of bread
wheat_and_bread = [[0.5,5],[0.6,5.5],[0.8,6],[1.1,6.8],[1.4,7]]

def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i][0]
        y = points[i][1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return [b, m]

gradient_descent_runner(wheat_and_bread, 1, 1, 0.01, 100)
This should not introduce anything new. However, it can be a bit of a mind boggle to merge the error function with gradient descent. Run the code and play around with this linear regression simulator.
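If the merge feels like a leap, it may help to write it out. The two gradient lines in step_gradient are the partial derivatives of the mean squared error with respect to b and m. The symbols below (E for the error, η for the learning rate, N for the number of points) are my own notation, not something from the original notebooks.

```latex
\[
E(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2
\]
\[
\frac{\partial E}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial E}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr)
\]
\[
b \leftarrow b - \eta \, \frac{\partial E}{\partial b},
\qquad
m \leftarrow m - \eta \, \frac{\partial E}{\partial m}
\]
```

The first line is Legendre’s error function, and the last line is Debye’s step against the slope, applied to two parameters at once.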
Enter Frank Rosenblatt, the guy who dissected rat brains during the day and searched for signs of extraterrestrial life at night. In 1958, he hit the front page of the New York Times: “New Navy Device Learns By Doing” with a machine that mimics a neuron.
If you showed Rosenblatt’s machine 50 sets of two images, one with a mark to the left and the other on the right, it could make the distinction without being pre-programmed. The public got carried away with the possibilities of a true learning machine.
For every training cycle, you start with input data to the left. Each input is multiplied by a weight that is initially random. The weighted inputs are then summed up. If the sum is negative, it’s translated into 0, otherwise, it’s mapped into a 1.
If the prediction is correct, then nothing happens to the weights in that cycle. If it’s wrong, you multiply the error with a learning rate. This adjusts the weights accordingly.
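Here is a single training cycle written out step by step, as a minimal sketch. The starting weights are hypothetical values I picked so that the first prediction is wrong. A real run starts from random weights.

```python
from numpy import array, dot

def step(x):
    # Negative sums map to 0, everything else maps to 1
    return 0 if x < 0 else 1

weights = array([0.1, -0.4, 0.1])  # hypothetical starting weights
learning_rate = 0.2

# One row of the OR truth table; the trailing 1 is the bias input
x, truth = array([0, 1, 1]), 1

prediction_before = step(dot(weights, x))  # sum is -0.3, so the prediction is 0
error = truth - prediction_before          # 1 - 0 = 1, the prediction was wrong

# Nudge only the weights whose inputs were active
weights += learning_rate * error * x

prediction_after = step(dot(weights, x))
print(prediction_before, error, prediction_after)
```

When the prediction is right, the error is 0 and the update line leaves the weights untouched.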
Let’s run the perceptron with the classic OR logic.
The perceptron machine translated into Python:
from random import choice
from numpy import array, dot, random

# Heaviside step function: 0 for negative sums, 1 otherwise
one_or_zero = lambda x: 0 if x < 0 else 1

# OR truth table; the third input is always 1 and acts as the bias
training_data = [
    (array([0,0,1]), 0),
    (array([0,1,1]), 1),
    (array([1,0,1]), 1),
    (array([1,1,1]), 1),
]
weights = random.rand(3)
errors = []
learning_rate = 0.2
num_iterations = 100

for i in range(num_iterations):
    x, truth = choice(training_data)
    result = dot(weights, x)
    error = truth - one_or_zero(result)
    errors.append(error)
    weights += learning_rate * error * x

# Print the learned output for every row of the truth table
for x, _ in training_data:
    result = dot(x, weights)
    print("{}: {} -> {}".format(x[:2], result, one_or_zero(result)))
In 1969, Marvin Minsky and Seymour Papert destroyed the idea. At the time, Minsky and Papert ran the AI lab at MIT. They wrote a book proving that the perceptron could only solve linear problems. They also debunked claims about the multi-layer perceptron. Sadly, Frank Rosenblatt died in a boat accident two years later.
In 1970, a Finnish master’s student discovered the theory for solving non-linear problems with multi-layered perceptrons. Because of the mainstream criticism of the perceptron, the funding of AI dried up for more than a decade. This was known as the first AI winter.
The power of Minsky and Papert’s critique was the XOR problem. The logic is the same as the OR logic with one exception — when you have two true statements (1 & 1), you return False (0).
In the OR logic, it’s possible to divide the true combination from the false ones. But as you can see, you can’t divide the XOR logic with one linear function.
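You can also check this by brute force. The sketch below searches a grid of weights and biases for a single line that reproduces each truth table. The function find_separating_line and the grid are assumptions of mine, and an empty search result on a finite grid is a demonstration rather than a proof.

```python
OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def step(x):
    return 0 if x < 0 else 1

def find_separating_line(table, candidates):
    # Return the first (w1, w2, bias) that classifies every row correctly
    for w1 in candidates:
        for w2 in candidates:
            for bias in candidates:
                if all(step(w1 * a + w2 * b + bias) == truth
                       for (a, b), truth in table):
                    return (w1, w2, bias)
    return None

candidates = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5

or_line = find_separating_line(OR_TABLE, candidates)
xor_line = find_separating_line(XOR_TABLE, candidates)
print("OR:", or_line)
print("XOR:", xor_line)
```

The search finds a line for OR and comes back empty for XOR, no matter how fine you make the grid.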
By 1986, several experiments proved that neural networks could solve complex nonlinear problems. At the time, computers were 10,000 times faster compared to when the theory was developed. This is how Rumelhart introduced the legendary paper:
“We describe a new learning procedure, back-propagation, for networks of neuron-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.” Nature 323, 533–536 (09 October 1986)
To understand the core of this paper, we’ll code the implementation by DeepMind’s Andrew Trask. This is not a random snippet of code. It’s been used in Andrej Karpathy’s deep learning course at Stanford and Siraj Raval’s Udacity course. It solves the XOR problem, thawing the first AI winter.
Before we dig into the code, play with this simulator for one to two hours to grasp the core logic. Then read Trask’s blog post.
Note that the added parameters [1] in the X_XOR data are bias neurons.
They have the same behavior as a constant in a linear function:
import numpy as np

X_XOR = np.array([[0,0,1], [0,1,1], [1,0,1],[1,1,1]])
y_truth = np.array([[0],[1],[1],[0]])

np.random.seed(1)
syn_0 = 2*np.random.random((3,4)) - 1
syn_1 = 2*np.random.random((4,1)) - 1

def sigmoid(x):
    output = 1/(1+np.exp(-x))
    return output

def sigmoid_output_to_derivative(output):
    return output*(1-output)

for j in range(60000):
    layer_1 = sigmoid(np.dot(X_XOR, syn_0))
    layer_2 = sigmoid(np.dot(layer_1, syn_1))
    error = layer_2 - y_truth
    layer_2_delta = error * sigmoid_output_to_derivative(layer_2)
    layer_1_error = layer_2_delta.dot(syn_1.T)
    layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)
    syn_1 -= layer_1.T.dot(layer_2_delta)
    syn_0 -= X_XOR.T.dot(layer_1_delta)

print("Output After Training: \n", layer_2)
Backpropagation, matrix multiplication, and gradient descent combined can be hard to wrap your mind around. Visualizations of this process are often simplifications of what’s going on under the hood. Focus on understanding the logic behind it, but don’t worry too much about having a mental picture of it.
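For readers who prefer symbols to code, here is how I read the training loop as equations. This is a sketch that assumes a squared-error loss, which the code does not state but which matches error = layer_2 - y_truth. The notation (W₀ and W₁ for syn_0 and syn_1, a₁ and a₂ for layer_1 and layer_2, ⊙ for element-wise multiplication) is mine.

```latex
\[
a_1 = \sigma(X W_0), \qquad a_2 = \sigma(a_1 W_1), \qquad
L = \tfrac{1}{2} \sum (a_2 - y)^2
\]
\[
\delta_2 = (a_2 - y) \odot a_2 \odot (1 - a_2)
\]
\[
\delta_1 = (\delta_2 W_1^{\top}) \odot a_1 \odot (1 - a_1)
\]
\[
W_1 \leftarrow W_1 - a_1^{\top} \delta_2, \qquad
W_0 \leftarrow W_0 - X^{\top} \delta_1
\]
```

The factor a ⊙ (1 - a) is the derivative of the sigmoid expressed through its own output, which is what sigmoid_output_to_derivative computes. There is no explicit learning rate in the updates, so it is effectively 1.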
Also, look at Andrej Karpathy’s lecture on backpropagation, play with these visualizations, and read Michael Nielsen’s chapter on it.
Deep neural networks are neural networks with more than one layer between the input and output layer. The notion was introduced by Rina Dechter in 1986. But it didn’t gain mainstream attention until 2012. This was soon after IBM Watson’s Jeopardy victory and Google’s cat recognizer.
The core structure of deep neural networks has stayed the same. But they are now applied to several different problems. There has also been a lot of improvement in regularization.
Originally, in 1963, regularization was a set of math functions to simplify noisy earth data. It is now used in neural networks to improve their ability to generalize.
A large share of the innovation is due to computing power. This shortened researchers’ innovation cycles: what took a supercomputer one year to calculate in the mid-eighties takes half a second with today’s GPU technology.
The reduced cost in computing and the development of deep learning libraries have now made it accessible to the general public. Let’s look at an example of a common deep learning stack, starting from the bottom layer:
GPU > Nvidia Tesla K80. The hardware commonly used for graphics processing. Compared to CPUs, they are on average 50–200 times faster for deep learning.
CUDA > low level programming language for the GPUs
CuDNN > Nvidia’s library to optimize CUDA
Tensorflow > Google’s deep learning framework on top of CuDNN
TFlearn > A front-end framework for Tensorflow
Let’s have a look at the MNIST image classification of digits, the “Hello World” of deep learning.
Implemented in TFlearn:
from __future__ import division, print_function, absolute_import
import tflearn
from tflearn.layers.core import dropout, fully_connected
from tensorflow.examples.tutorials.mnist import input_data
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Data loading and preprocessing
mnist = input_data.read_data_sets("/data/", one_hot=True)
X, Y, testX, testY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
X = X.reshape([-1, 28, 28, 1])
testX = testX.reshape([-1, 28, 28, 1])

# Building convolutional network
network = tflearn.input_data(shape=[None, 28, 28, 1], name='input')
network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
network = max_pool_2d(network, 2)
network = local_response_normalization(network)
network = fully_connected(network, 128, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 256, activation='tanh')
network = dropout(network, 0.8)
network = fully_connected(network, 10, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.01,
                     loss='categorical_crossentropy', name='target')

# Training
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit({'input': X}, {'target': Y}, n_epoch=20,
          validation_set=({'input': testX}, {'target': testY}),
          snapshot_step=100, show_metric=True, run_id='convnet_mnist')
There are plenty of great articles explaining the MNIST problem: here and here.
As you see in the TFlearn example, the main logic of deep learning is still similar to Rosenblatt’s perceptron. Instead of using a binary Heaviside step function, today’s networks mostly use ReLU (rectified linear unit) activations.
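To see the difference between the two activations, here is a small NumPy sketch that applies both to the same inputs. The function names are my own. The step function follows the perceptron code above, where negative sums map to 0 and everything else maps to 1.

```python
import numpy as np

def heaviside_step(x):
    # Binary output: 0 for negative inputs, 1 otherwise
    return np.where(x < 0, 0, 1)

def relu(x):
    # Negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(heaviside_step(x).tolist())  # [0, 0, 1, 1, 1]
print(relu(x).tolist())            # [0.0, 0.0, 0.0, 0.5, 2.0]
```

The step function throws away how large the sum was, while ReLU keeps that information for positive inputs.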
In the last layer of the convolutional neural network, loss equals categorical_crossentropy. This is an evolution of Legendre’s least squares, a logistic regression for multiple categories. The optimizer adam originates from Debye’s work on gradient descent.
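Here is a minimal NumPy sketch of what a softmax output followed by a categorical cross-entropy loss computes for one example. The function bodies, logits and labels are illustrative assumptions of mine and are not taken from TFlearn’s source.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials numerically stable
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(one_hot_truth, probabilities):
    # Only the probability assigned to the true class contributes
    return -np.sum(one_hot_truth * np.log(probabilities))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
truth = np.array([1.0, 0.0, 0.0])    # the first class is the correct one

probabilities = softmax(logits)
loss = categorical_crossentropy(truth, probabilities)
print(probabilities, loss)
```

Like the squared error, the loss shrinks as the prediction approaches the truth. Here it is the negative log of the probability given to the correct class.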
Tikhonov’s regularization notion is widely implemented in the form of dropout layers and regularization functions, L1/L2.
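To give a rough idea of what these two mechanisms do, here is a NumPy sketch of an L2 penalty and of dropout. Both functions are simplified assumptions of mine. The dropout variant shown is the common "inverted" formulation, and TFlearn’s internals may differ in detail.

```python
import numpy as np

def l2_penalty(weights, strength):
    # Added to the loss, so large weights make the loss worse
    return strength * np.sum(weights ** 2)

def dropout(activations, keep_prob, rng):
    # Randomly zero out units, then rescale the survivors so the
    # expected sum of the activations stays the same
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)

weights = np.array([0.5, -1.0, 2.0])
print(l2_penalty(weights, 0.01))  # about 0.0525

activations = np.ones(10)
dropped = dropout(activations, keep_prob=0.8, rng=rng)
print(dropped)  # each entry is either 0 or 1.25
```

The keep_prob of 0.8 mirrors the value passed to the dropout layers in the TFlearn example.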
If you want a better intuition for neural networks and how to implement them, read my previous post: Deep Learning 101 for Coders.
Thanks to Ignacio Tonoli de Maussion, Brian Young, Paal Rgd, Tomas Moška, and Charlie Harrington for reading drafts of this. Code sources are included in the Jupyter notebooks.
This is a part of a multi-part blog series as I learn deep learning. I’ve spent a decade exploring human learning. I worked for Oxford’s business school, invested in education startups, and built an education technology business. Last year, I enrolled at Ecole 42 to apply my knowledge of human learning to machine learning.
You can follow along with my learning journey on Twitter. If you have any questions or suggestions, please leave a comment below or ping me on Medium.
This was first published as a community post on Floydhub’s blog.
Translated from: https://www.freecodecamp.org/news/the-history-of-deep-learning-explored-through-6-code-snippets-d0a0e8545202/