Evolution of Extreme Learning Machines
Understanding ML
Note! This is just a high-level overview of ELM evolution. It doesn't include every version and tweak made to ELMs over the years.
What is ELM?
ELMs (Extreme Learning Machines) are feedforward neural networks. They were "invented" in 2006 by G. Huang and are based on the idea of inverse matrix approximation: the output weights are computed analytically with a (pseudo)inverse instead of being trained iteratively.
If you're not familiar with ELMs, please check out my article "Introduction to Extreme Learning Machines" first.
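For reference, here is a minimal sketch of what training a basic ELM amounts to, assuming a tanh activation and NumPy (the function names and defaults are illustrative, not from the original paper): the hidden-layer weights and biases are random and never trained, and the output weights come from a single pseudoinverse.

```python
import numpy as np

def elm_fit(X, T, n_hidden=100, rng=None):
    """Basic ELM sketch: random hidden layer, output weights from the Moore-Penrose pseudoinverse."""
    rng = rng or np.random.default_rng()
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random biases (never trained)
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                 # single analytic step, no backpropagation
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```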
When did evolution start?
I-ELM (2006)
After the original publication in 2006, Huang and his associates published another paper on a different type of ELM called I-ELM (Incremental ELM). As the name says, I-ELM is an incremental version of the standard ELM network. The idea of I-ELM is quite simple:
Define the maximum number of hidden nodes L and the expected training accuracy ϵ. Then, starting from l = 0 (l is the current number of hidden nodes):
- Increment l: l_t = l_{t-1} + 1
- Initialize the weights w_l and bias b_l of the newly added hidden neuron randomly (do not reinitialize already existing neurons)
- Calculate the output vector H
- Calculate the weight vector β^
- Calculate the error after adding the node
- Check if E < ϵ
- If not, increase the number of hidden nodes and repeat the process.
There is a chance that at some point in the process l > L while E > ϵ. At that point, we should repeat the whole process of training and initialization.
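A minimal sketch of the steps above could look like this (assuming a tanh activation, NumPy, and the mean squared error as E; recomputing β^ with a pseudoinverse at every step is the naive reading of the algorithm, which is also what makes it expensive, as discussed below):

```python
import numpy as np

def ielm_fit(X, T, L_max, eps, rng=None):
    """I-ELM sketch: grow the hidden layer one random node at a time until E < eps."""
    rng = rng or np.random.default_rng()
    W = np.empty((X.shape[1], 0))                # one column per hidden node
    b = np.empty(0)
    for l in range(1, L_max + 1):
        # add one new random node; existing nodes are left untouched
        W = np.hstack([W, rng.normal(size=(X.shape[1], 1))])
        b = np.append(b, rng.normal())
        H = np.tanh(X @ W + b)                   # hidden-layer output for all l nodes
        beta = np.linalg.pinv(H) @ T             # recompute the output weights
        E = np.mean((H @ beta - T) ** 2)
        if E < eps:
            break
    return W, b, beta, E                         # if E >= eps here, l exceeded L_max: restart training
```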
The idea of incrementally growing the network is not new and usually produces better results than setting the network size "by hand". There is one disadvantage that is especially important for ELMs: computation time. If your network happens to be large (let's say 1000 hidden nodes), in the worst case we have to perform 1000 matrix inversions.
If you’re interested in I-ELM, you should know there are many variations of it:
- II-ELM (improved I-ELM)
- CI-ELM (convex I-ELM)
- EI-ELM (enhanced I-ELM)
I'm not going to explain every one of them, because this article is meant to be a quick summary and a starting point rather than a whole book about every variation of ELMs. Besides, you're probably not here by mistake and know how to find more information about an interesting topic once you know what to look for :P
P-ELM (2008)
After introducing an incremental version of ELM, the next improvement was to use pruning to achieve an optimal network structure. P-ELM (pruned ELM) was introduced in 2008 by Hai-Jun Rong. The algorithm starts with a very large network and removes nodes that are not relevant to predictions. By "not relevant" we mean that a node is not taking part in predicting the output value (i.e. its output is close to 0). This idea produces smaller classifiers and is mostly suitable for pattern classification.
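A rough sketch of that idea, using a simple relevance score based on how far each node's contribution is from zero (the published P-ELM uses proper statistical relevance measures, so this simplification and the parameter names are mine):

```python
import numpy as np

def pelm_prune(X, T, n_hidden=1000, keep_ratio=0.2, rng=None):
    """P-ELM sketch: start with a large network, then drop nodes whose output contribution is near zero."""
    rng = rng or np.random.default_rng()
    T = T.reshape(len(X), -1)                      # ensure 2-D targets
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ T
    # relevance of node j: average magnitude of its contribution H[:, j] * beta[j]
    relevance = np.mean(np.abs(H[:, :, None] * beta[None, :, :]), axis=(0, 2))
    keep = np.argsort(relevance)[-int(keep_ratio * n_hidden):]
    W, b = W[:, keep], b[keep]
    beta = np.linalg.pinv(np.tanh(X @ W + b)) @ T  # refit output weights for the pruned network
    return W, b, beta
```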
EM-ELM (2009)
This version of ELM is not a standalone version but an improvement of I-ELM. EM stands for Error-Minimized, and it allows adding a group of nodes instead of only one. Those nodes are inserted randomly into the network until the error drops below ϵ.
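Compared with the I-ELM sketch above, only the growth step changes: nodes are added in groups. The published EM-ELM also updates the output weights incrementally instead of recomputing them from scratch, which this simplified sketch does not do:

```python
import numpy as np

def emelm_fit(X, T, group_size=10, L_max=1000, eps=1e-3, rng=None):
    """EM-ELM sketch: grow the hidden layer by groups of random nodes until the error drops below eps."""
    rng = rng or np.random.default_rng()
    W, b = np.empty((X.shape[1], 0)), np.empty(0)
    while W.shape[1] < L_max:
        W = np.hstack([W, rng.normal(size=(X.shape[1], group_size))])  # add a whole group at once
        b = np.append(b, rng.normal(size=group_size))
        H = np.tanh(X @ W + b)
        beta = np.linalg.pinv(H) @ T
        if np.mean((H @ beta - T) ** 2) < eps:
            break
    return W, b, beta
```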
Regularized ELM (2009)
Starting in 2009, Zheng studied the stability and generalization performance of ELM. He and his team came up with the idea of adding regularization to the original formula for calculating β^.
With regularization, the formula now looks like:
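β^ = (H^T H + I/C)^(-1) H^T T

where C is the regularization parameter and I is the identity matrix (this is the commonly cited regularized least-squares form; the exact notation may differ from the original paper).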
TS-ELM (2010)
Two-stage ELM (TS-ELM) was a proposal to, once again, minimize the network structure. As the name says, it consists of two stages (a rough sketch follows the list):
- Apply a forward recursive algorithm to choose hidden nodes from candidates generated randomly in each step. Hidden nodes are added until the stopping criterion is met.
- Review the existing structure. Even if we created a network with the minimum number of nodes to match our criterion, some of them might no longer be that useful. In this stage, we remove the unimportant nodes.
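A rough sketch of both stages; the candidate-pool size, the plain MSE stopping criterion, and the review rule are all illustrative choices rather than the exact procedure from the paper:

```python
import numpy as np

def tselm_fit(X, T, eps=1e-3, n_candidates=20, L_max=200, rng=None):
    """TS-ELM sketch: forward selection from random candidates, then a pruning review pass."""
    rng = rng or np.random.default_rng()

    def mse(Wc, bc):
        H = np.tanh(X @ Wc + bc)
        return np.mean((H @ (np.linalg.pinv(H) @ T) - T) ** 2)

    # Stage 1: grow the network, at each step keeping the best of several random candidate nodes.
    W, b = np.empty((X.shape[1], 0)), np.empty(0)
    while W.shape[1] < L_max and (W.shape[1] == 0 or mse(W, b) > eps):
        trials = [(np.hstack([W, rng.normal(size=(X.shape[1], 1))]), np.append(b, rng.normal()))
                  for _ in range(n_candidates)]
        W, b = min(trials, key=lambda t: mse(*t))

    # Stage 2: review the structure, dropping nodes that are no longer needed.
    j = 0
    while j < W.shape[1]:
        Wr, br = np.delete(W, j, axis=1), np.delete(b, j)
        if Wr.shape[1] and mse(Wr, br) <= eps:
            W, b = Wr, br            # node j was unimportant, remove it
        else:
            j += 1
    H = np.tanh(X @ W + b)
    return W, b, np.linalg.pinv(H) @ T
```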
KELM (2010)
Kernel-based ELM (KELM) was introduced and uses a kernel function instead of H^T H. This idea was inspired by SVMs, and the main kernel function used with ELMs is the RBF (Radial Basis Function) kernel. KELMs are also used to design Deep ELMs.
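A sketch of the kernel variant with an RBF kernel (the parameters gamma and C and the function names are illustrative): the random hidden layer disappears entirely, and predictions are made against the kernel matrix computed over the training set.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kelm_fit(X, T, C=1.0, gamma=1.0):
    """KELM sketch: the kernel matrix replaces the random hidden-layer features."""
    Omega = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(Omega + np.eye(len(X)) / C, T)  # regularized solve instead of a pseudoinverse
    return alpha

def kelm_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```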
V-ELM (2012)
Voting-based ELM (V-ELM) was proposed in 2012 to improve performance on classification tasks. The problem was that the standard ELM training process, which adds nodes randomly, might not achieve the optimal boundary for classification. Because of that, some samples near that boundary might be misclassified. In V-ELM we train not just one network but many of them, and then make the final prediction based on the majority voting method.
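A sketch of the voting idea (the ensemble size, hidden-layer size, and tanh activation are illustrative choices): several ELMs are trained independently on the same data, and each of them casts one vote per sample.

```python
import numpy as np

def velm_fit(X, y, n_classes, n_models=7, n_hidden=50, rng=None):
    """V-ELM sketch: train several independent ELM classifiers on the same data."""
    rng = rng or np.random.default_rng()
    T = np.eye(n_classes)[y]                      # one-hot targets
    models = []
    for _ in range(n_models):
        W = rng.normal(size=(X.shape[1], n_hidden))
        b = rng.normal(size=n_hidden)
        H = np.tanh(X @ W + b)
        models.append((W, b, np.linalg.pinv(H) @ T))
    return models

def velm_predict(models, X):
    votes = np.zeros((X.shape[0], models[0][2].shape[1]), dtype=int)
    for W, b, beta in models:
        pred = np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
        votes[np.arange(X.shape[0]), pred] += 1   # each network casts one vote per sample
    return votes.argmax(axis=1)                   # the class with the most votes wins
```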
ELM-AE (2013)
When ideas like RBMs and autoencoders started to get popular in 2013, Kasun published a paper on ELM-AE (ELM Auto-Encoders). The main goal is to be able to reproduce the input vector, just as standard autoencoders do. The structure of ELM-AE looks the same as that of a standard ELM.
There are three types of ELM-AE:
- Compression: higher-dimensional input space mapped to a lower-dimensional hidden layer (fewer hidden nodes than inputs).
- Equal representation: data dimensionality remains the same (same number of hidden and input nodes).
- Sparsing: lower-dimensional input space mapped to a higher-dimensional hidden layer (more hidden nodes than inputs).
There are two main differences between standard ELMs and ELM-AE. The first is that ELM-AE is unsupervised: as the output, we use the same vectors as the input. The second is that the weights in ELM-AE are orthogonal, and the same goes for the bias of the hidden layer. This is important because ELM-AE is used to create a deep version of ELMs.
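A sketch of those two differences, with QR-based orthogonalization and a tanh activation as my own choices; it covers the compression and equal-representation cases, where the hidden layer is not wider than the input:

```python
import numpy as np

def elm_ae_fit(X, n_hidden, rng=None):
    """ELM-AE sketch: orthogonal random hidden parameters, and the target is the input itself."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    W = np.linalg.qr(rng.normal(size=(d, n_hidden)))[0]  # orthogonal random weights (n_hidden <= d assumed)
    b = rng.normal(size=n_hidden)
    b /= np.linalg.norm(b)                                # unit-norm random bias
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ X                          # unsupervised: reconstruct X, not labels
    return W, b, beta
```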
MLELM (2013)
In the same paper (Representational Learning with ELMs for Big Data), Kasun proposed a version of ELM called Multi-Layer ELM (MLELM). The idea is based on stacked autoencoders and consists of multiple ELM-AEs.
You might ask, "Why even bother creating something similar to stacked autoencoders, but with ELMs?" If we look at how MLELM works, we can see that it doesn't require fine-tuning. That makes it a lot faster to construct than standard autoencoder networks. As I've said, MLELM uses ELM-AE to train the parameters of each layer and removes the output layers, so we're left with only the input and hidden layers of the ELM-AEs.
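Reusing the elm_ae_fit sketch from the ELM-AE section above, stacking reduces to one step per layer: the previous layer's features are pushed through the transpose of that layer's ELM-AE output weights, with no fine-tuning afterwards (non-increasing layer sizes are assumed so the compression-case sketch applies):

```python
import numpy as np

def mlelm_encode(X, layer_sizes, rng=None):
    """MLELM sketch: stack ELM-AEs layer by layer; no end-to-end fine-tuning is performed."""
    rng = rng or np.random.default_rng()
    features, betas = X, []
    for size in layer_sizes:
        _, _, beta = elm_ae_fit(features, size, rng)  # per-layer training via the ELM-AE sketch above
        features = np.tanh(features @ beta.T)         # forward pass with the learned weights
        betas.append(beta)
    return features, betas                            # final features feed a regular ELM (or KELM) output layer
```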
DELM (2015)
Deep ELM is one of the newest developments (and, at the time of writing this article, the last major iteration in ELM evolution). DELMs are based on the idea of MLELMs, with a KELM used as the output layer.
Conclusion
ELMs have evolved through the years, clearly borrowing some major ideas from the field of machine learning. Some of those ideas work really well and could be useful when designing real-life models. You should remember that this is just a brief summary of what happened in the field of ELM, not a complete review (not even close). It's highly probable that if you put some prefix in front of ELM, a version of ELM with that prefix already exists :)
Originally published at https://erdem.pl.
Translated from: https://towardsdatascience.com/evolution-of-extreme-learning-machines-2c7caf08e76b