Non-data scientists often use the word Model to mistakenly refer to what in Machine Learning we canonically call the Model Architecture. And it couldn't be more wrong!
Wanna hear occasional rants about Tensorflow, Keras, DeepLearning4J, Python and Java?
Join me on twitter @ twitter.com/hudsonmendes!
Taking Machine Learning models to production is a battle. There I share my learnings (and my sorrows), so we can learn together!
The Model Architecture is very important to the Model. Without the variety of architectures we have, we would never be able to fit data for the vast range of different problems currently being solved successfully by Deep Learning.
However, the Weights are a vital part of the model too. Different Weights can describe functions that have absolutely different geometry.
Model = Architecture (a.k.a. algorithm) + Weights (a.k.a. parameters)
Let's have a look at how it works:
The same function "architecture" (or "equation") f(x) = tanh(w × x) has produced very different material functions with different physical shapes, due to their different weights (or "coefficients").
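To see this concretely, here is a minimal sketch (plain Python, with hypothetical weight values) of how the same architecture with different weights yields very different functions:

```python
import math

def f(x, w):
    # Same "architecture" (equation): f(x) = tanh(w * x).
    # Only the weight w changes between the curves below.
    return math.tanh(w * x)

# The same equation, three very different shapes:
gentle  = [f(x / 10, 0.5)  for x in range(-20, 21)]  # shallow slope
steep   = [f(x / 10, 5.0)  for x in range(-20, 21)]  # near-step function
flipped = [f(x / 10, -5.0) for x in range(-20, 21)]  # mirrored step
```

The code (the "architecture") never changes; only the coefficient w does, and that alone reshapes the curve.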
Inappropriate terminology
In the statement above, I have used inappropriate terminology on purpose:
"architecture" was used to describe the way that the variables interact within the function;
"physical shape" was used to describe something as immaterial as a curve in the Cartesian plane, which could not be less physical.
However, this inaccurate terminology sets out some conceptual approximations that I will now use in the Thought Experiment below.
Thought Experiment: A function as a physical wall
Imagine that you have a large group of people in a garden. There are people with red shirts and people with blue shirts.
The emperor of the garden comes to you and issues the orders:
Imagine the garden is a Cartesian plane. We will build a wall in the garden dividing blue shirts from red shirts. Your job is to give me the function that describes exactly the trajectory of that wall.
You then start trying to do your thing. At first, you try this function here:
f(x) = tanh(tanh(-1x - 0.14905)x + 0.2514)
And this function fails miserably to do the job. A wall like that would separate people in a very wrong manner.
But you keep going, work hard on that function, and finally you come to the following equation:
f(x) = tanh(tanh(1x - 0.14905)x + 0.2514)
Yay! This wall now seems to be perfect!
It's almost as if, when the wall is built using that geometry, it becomes a wall that separates exactly what it needs to physically separate.
If you look carefully, all we had to do was change one coefficient of "x", and this is where we invested all of our "Learning Process".
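As a rough sketch (using the two equations above), the walls differ in only that single coefficient, yet they place the far ends of the garden on opposite sides:

```python
import math

def wall(x, w1):
    # f(x) = tanh(tanh(w1*x - 0.14905)*x + 0.2514); only w1 differs
    # between the "failed" wall (w1 = -1) and the "learned" wall (w1 = 1).
    return math.tanh(math.tanh(w1 * x - 0.14905) * x + 0.2514)

failed_wall  = wall(2.0, -1.0)  # dips to the negative side of the garden
learned_wall = wall(2.0, 1.0)   # rises to the positive side of the garden
```

One coefficient flipped from -1 to 1, and the wall's trajectory is entirely different.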
This thought experiment is something I often use to reason about what is going on inside a model, and I find it incredibly handy.
A Deep Learning Model is a Function
Given an input "x", our model can predict something about it; in other words, it gives us some "y".
In other words, our model maps an input element "x", from the set "X", to something else that can be observed as the element "y" from the set "Y".
The definition above closely resembles the definition of a function in Discrete Maths.
Therefore, a model can be represented as f(x) = y.
It is important to notice that W (weights) are not explicit in that function, and the consumers of that API do not know what the programmer has hardcoded inside the function f.
Model Architecture
The model architecture is the code and algorithm that describes the bare bones of the function (in other words, the equation) that will be executed when we try to use the Model to predict anything.
This function is no more than a sequence of steps (therefore, an algorithm) that can often be represented as a DAG (directed acyclic graph), with the goal of mapping "x" to "y" successfully.
However, this algorithm relies on w (weights) that are not explicit in the input of f(x) = y.
This is analogous to saying that W (weights) have been hard-coded inside f.
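One way to picture "W hard-coded inside f" is a closure. This is only a sketch (the helper build_model is hypothetical): the weights are fixed when the model is built and never appear in the function's signature.

```python
def build_model(W):
    # W is fixed ("hard-coded") when the model is built; callers of f
    # only ever pass x, exactly like f(x) = y.
    def f(x):
        return sum(w * xi for w, xi in zip(W, x))
    return f

model = build_model([0.01388, 0.0025])  # weights slotted into the architecture
y = model([100, 70])                    # consumers see only x and y
```

From the caller's point of view, the model really is just f(x) = y; the weights are invisible inside f.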
Model Weights ("hard-coded in the function f")
Really? In a certain way, YES! The weights have been hard-coded inside the function f.
Who has hardcoded the weights?
Time to tweet about how AI is just an over-glorified sequence of if/else statements? NO, because if/else is NOT exactly what's going on inside a Deep Learning Model.
But first of all: Who hardcoded the Weights inside a Deep Learning model? And that is the next question that we must answer.
Who has hardcoded the weights?
This question really unveils what Deep Learning is about: It is about learning the Weights (W) that will compose the Model and be slotted into the Architecture (equation)!
To make it real simple, let me try to picture what I mean by "hard-coded" into the function, with a code example:
# let "example_person" be an array with the following information:
example_person = [
    100,  # wages per week
    70,   # age
]
Now, let's create 2 functions, the first one with hard-coded if/else:
def likelihood_low_income_youth_1(X):
    wages_per_week, age = X
    if age <= 18 and wages_per_week <= 100:
        return 1
    else:
        return 0
And another with a more mathematical approach:
import numpy as np

def likelihood_low_income_youth_2(X):
    X = np.array(X)
    W = np.array([0.01388, 0.0025])
    return 1 - X.dot(W)
The way these two functions work is analogous, and if we set a simple threshold like 0.5, they will provide us with the same conclusion.
At least they would for 100% of the data points we provided, which here is a single data point, described by our example_person.
likely_low_income_youth_1 = likelihood_low_income_youth_1(example_person) > 0.5
likely_low_income_youth_2 = likelihood_low_income_youth_2(example_person) > 0.5
assert likely_low_income_youth_1 == likely_low_income_youth_2
If both are hardcoded (if/else, and weights), why go for the weights? They are harder to read!
That is true, and as a sworn follower of the Clean Code book, as well as of the great design patterns we have at this point, I cannot counter that.
But there is still a reason: the if/else example requires a coder to program it, whereas the second can be learned from data!
One may say that the machine can learn to code explicit if/else instructions, and that is true.
However, it is much easier to learn weights than if/else instructions or any other type of syntax. And even to get to a Seq2Seq model that is capable of generating code, some weights will have to be learned along the way.
I am not covering the learning process itself in this post.
But in a short statement: Weights are learned by differentiating the activation functions and adjusting their values over iterations, using a process called backpropagation.
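As a minimal sketch of that idea (a single weight, squared-error loss, and hypothetical training data), gradient descent nudges w until tanh(w·x) matches the targets:

```python
import math

# Hypothetical training data: targets generated by a "true" weight of 1.0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [math.tanh(1.0 * x) for x in xs]

w = -0.5   # start from a wrong weight
lr = 0.1   # learning rate
for _ in range(500):
    for x, y in zip(xs, ys):
        pred = math.tanh(w * x)
        # derivative of (pred - y)^2 w.r.t. w, using d tanh(u)/du = 1 - tanh(u)^2
        grad = 2 * (pred - y) * (1 - pred ** 2) * x
        w -= lr * grad
```

No coder ever writes the final value of w; the loop "hard-codes" it into the model by repeatedly following the gradient.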
Conclusion
A model is a function composed of its architecture ("equation") and its weights (learned coefficients).
Whereas the "coded part of the model" is really the model architecture, the weights, which are learned from the data, can completely change the results that come out of the model function.
Therefore, they must be seen and talked about as a single thing.
Wanna keep in Touch? Twitter!
I’m Hudson Mendes (@hudsonmendes), coder, 35, husband, father, Principal Research Engineer, Data Science @ AIQUDO, Voice To Action.
I’ve been on the Software Engineering road for 19+ years, and occasionally publish rants about Tensorflow, Keras, DeepLearning4J, Python & Java.
Join me there, and I will keep you in the loop with my daily struggle to get ML Models to Production!
Originally published at https://medium.com/swlh/what-exactly-is-a-deep-learning-model-be8cf39934ec