Why Neural Networks Outperform Logistic Regression
Neural networks are often regarded as the holy grail, all-knowing, solution-to-everything of machine learning, primarily because they are complex. Tree-based methods, on the other hand, are not treated with the same awe and hype, primarily because they seem simple. While they seem so different, they are simply two sides of the same coin.
Tree-based methods routinely outperform neural networks. Any Kaggler knows that XGBoost is by far the most popular choice for top-performing competition submissions. In essence, what puts tree-based methods and neural networks in the same category is that they approach problems by deconstructing them piece-by-piece, instead of finding one complex boundary that can separate the entire dataset, like SVM or Logistic Regression.
Very obviously, tree-based methods progressively split the feature space along various features to optimize the information gain. What is less obvious is that neural networks approach the task similarly. Each neuron watches over a specific section of the feature space (with various overlaps). If an input falls into that section, certain neurons are activated.
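To make "splitting to optimize information gain" concrete, here is a minimal sketch for a single numeric feature. The function names and the toy data (`ages`, `bought`) are hypothetical, chosen only for illustration; real tree libraries use more elaborate and more efficient search procedures.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted unique values; return (threshold, gain) with the highest gain.

    Assumes the feature takes at least two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(((t, information_gain(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical toy data: one feature, two perfectly separable classes.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, gain = best_threshold(ages, bought)
print(threshold, gain)
```

A full tree would repeat this search over every feature and then recurse into each side of the chosen split.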
Neural networks take a probabilistic view of this piece-by-piece model fitting, whereas trees take a deterministic one. Regardless, both rely on the depth of their model for their performance, because their components correspond to sections of the feature space.
A model with too many components (nodes in the case of trees, neurons in the case of networks) overfits, and a model with too few cannot give meaningful predictions at all. (With too many components, both begin to memorize data points instead of actually learning generalizations.)
For more intuition on how neural networks partition the feature space, look into the Universal Approximation Theorem.
Although there are many powerful variants of decision trees, like random forests, gradient boosting, adaptive boosting, and deep forests, in general, tree-based methods are essentially simplified versions of neural networks.
Tree-based methods approach the problem piece-by-piece by drawing vertical and horizontal lines through the feature space (axis-aligned splits) to minimize entropy, which plays the role of both optimizer and loss. Neural networks approach the problem piece-by-piece by manipulating the shapes of activation functions (see “How does ReLU work so well?”).
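As a rough illustration of "manipulating shapes of activation functions", the sketch below combines three ReLU units into a bump that is nonzero only on one interval, next to the closest tree analogue: two threshold splits that carve out the same interval. The weights are hand-picked for illustration, not learned, and the function names are hypothetical.

```python
def relu(z):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, z)


def relu_bump(x):
    """Three ReLU units with hand-picked weights: a triangle that is nonzero only on (0, 2)."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)


def tree_indicator(x):
    """Two axis-aligned splits selecting the same interval with a hard 0/1 answer."""
    if x <= 0.0:
        return 0.0
    if x >= 2.0:
        return 0.0
    return 1.0


for x in (-1.0, 0.5, 1.0, 1.5, 3.0):
    print(x, relu_bump(x), tree_indicator(x))
```

Both functions "watch over" the interval between 0 and 2. The tree answers with a flat 0 or 1, while the ReLU combination responds gradually, peaking in the middle of the interval.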
- Tree-based methods are deterministic, as opposed to probabilistic. This leads to some nice simplifications, like automatic feature selection.
- Conditional nodes that are activated in decision trees are analogous to neurons being activated (information flow).
- Neural networks fit parameters to transform the input and indirectly direct the activations of the following neurons. Decision trees explicitly fit parameters to direct the information flow. (This is a result of being deterministic as opposed to probabilistic.)
Of course, this is an abstract and perhaps even controversial claim to make. There are admittedly many mental hurdles to building this connection. Regardless, it’s an important piece in understanding when and why tree-based methods outperform neural networks.
Tabular data, meaning structured data that comes in the form of a table, is a natural fit for a decision tree. Almost everyone agrees that a neural network is overkill for regression and prediction on tabular data, so we make a few simplifications. We choose ones and zeroes as opposed to probabilities, which is the primary root of the differences between the two algorithms. Hence, trees succeed in cases where the nuances of probability are not necessary, like structured data.
Tree-based methods perform well enough on, for example, the MNIST dataset because each digit has several defining features. Probabilities are simply not a necessary calculation. It’s not a very complex problem at all, which is why well-designed tree ensembles perform at the same level as, or even better than, a modern convolutional neural network.
Often, people will be inclined to say that ‘trees simply memorize rules’, which is true. It’s the same with neural networks, which memorize more complex, probability-based rules. Instead of explicitly yielding a True/False to a condition like x>3, neural networks will blow an input up to a high value to yield a sigmoid value of 1 or to produce some continuous expression.
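The contrast between a hard `x > 3` rule and a sigmoid that is "blown up" toward 0 or 1 can be sketched as follows. The `steepness` parameter is a hypothetical stand-in for a large learned weight; the threshold of 3 comes from the example in the text.

```python
import math


def hard_rule(x, threshold=3.0):
    """Tree-style condition: an explicit True/False encoded as 1.0 or 0.0."""
    return 1.0 if x > threshold else 0.0


def soft_rule(x, threshold=3.0, steepness=1.0):
    """Neuron-style condition: a sigmoid centred on the threshold.

    A larger steepness (a larger weight) pushes the output toward 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))


for x in (2.0, 2.9, 3.0, 3.1, 4.0):
    print(x, hard_rule(x), round(soft_rule(x), 3), round(soft_rule(x, steepness=50.0), 3))
```

With a small steepness the neuron gives a graded answer near the threshold. With a large one it behaves almost like the tree's hard rule, except exactly at the threshold, where it still returns 0.5.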
On the other hand, because neural networks are so complex, there’s a lot that can be done with them. Convolutional and recurrent layers are both brilliant adaptations of neural networks that work well because they operate on data that often requires the nuance of probabilistic calculation.
Few images can be modelled with 1s and 0s. A decision tree cannot handle datasets with many intermediate values (e.g. 0.5), which is why it works well on MNIST, in which pixel values are almost all either black or white, but not on other datasets (e.g. ImageNet). Similarly, text has too much information and too many exceptions to express in only deterministic terms.
This is also why neural networks are predominantly used in these areas, and why neural network research stagnated earlier (before the 2000s), when massive quantities of image and text data were not available. Other common use cases of neural networks are limited to predictions at massive scale, like YouTube’s video recommendation algorithm, in which the scale is so large that probabilities must be involved.
Go to any data science team in a company and they are likely using a tree-based model rather than a neural network. Unless they’re building a heavy-duty application, like blurring the background of a video in Zoom, the deterministic nature of trees makes everyday business classification tasks lightweight, while following the same general approach as a neural network.
It’s also arguable that deterministic modelling is, in many real-world cases, more natural than probabilistic modelling. For instance, a tree would be a good choice to predict whether a user purchases an item from an ecommerce site, since users naturally follow a rule-based decision process. It may look something like this:
- Have I had a pleasant experience on this platform before? If yes, proceed.
- Do I need this item now? e.g. Should I be purchasing sunglasses and swimming trunks in the winter? If yes, proceed.
- Based on my demographics, would this be a product I am interested in buying? If yes, proceed.
- Is this item too expensive? If no, proceed.
- Have other customers rated this item above a certain threshold such that I feel comfortable buying it? If yes, proceed.
Humans, in general, follow very rule-based and structured decision-making processes. In these cases, probabilistic modelling is not necessary.
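The checklist above maps directly onto a chain of deterministic conditions, which is exactly the shape of a path through a decision tree. The sketch below writes that path by hand. The `Visit` fields, the `budget` comparison, and the `min_rating` default are all hypothetical; a trained tree would learn its own features and thresholds from data.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical features describing one user looking at one item."""
    pleasant_history: bool   # good past experience on the platform
    needed_now: bool         # item is in season or otherwise needed
    fits_demographics: bool  # item matches the user's profile
    price: float
    budget: float            # assumed stand-in for "too expensive"
    avg_rating: float        # average rating from other customers


def will_purchase(visit: Visit, min_rating: float = 4.0) -> bool:
    """Walk the checklist in order; any failed condition ends in 'no purchase'."""
    if not visit.pleasant_history:
        return False
    if not visit.needed_now:
        return False
    if not visit.fits_demographics:
        return False
    if visit.price > visit.budget:
        return False
    return visit.avg_rating >= min_rating


happy_path = Visit(True, True, True, price=20.0, budget=50.0, avg_rating=4.5)
too_expensive = Visit(True, True, True, price=80.0, budget=50.0, avg_rating=4.5)
print(will_purchase(happy_path), will_purchase(too_expensive))
```

Each `if` corresponds to one conditional node, and every input follows exactly one path to a definite answer, with no probabilities involved.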
In summary,
- Tree-based methods are best thought of as scaled-down versions of neural networks, approaching feature classification, optimization, information flow, etc. in simpler terms.
- The primary difference in usage between tree-based methods and neural networks lies in deterministic (0/1) vs. probabilistic structures of data. Structured (tabular) data is consistently better modelled with deterministic models.
- Don’t underestimate the power of tree-based methods.
Translated from: https://towardsdatascience.com/when-and-why-tree-based-models-often-outperform-neural-networks-ceba9ecd0fd8