It is a common misconception that support vector machines are only useful when solving classification problems.


The purpose of using SVMs for regression problems is to define a hyperplane and fit as many instances as is feasible within the margin around this hyperplane, while at the same time limiting margin violations.


In this way, SVMs used for regression differ from those used for classification, where the objective is to fit the widest possible margin between two separate classes (while also limiting margin violations).
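
To make the regression objective concrete, here is a minimal sketch that fits scikit-learn's `LinearSVR` and counts how many training instances fall inside the margin. The data is synthetic and purely an assumption for illustration; it is not the hotel dataset used later in this article.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic 1-D data (an assumption for illustration, not the hotel dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.uniform(-1.0, 1.0, size=200)

epsilon = 1.5
svm_reg = LinearSVR(epsilon=epsilon, max_iter=10000, random_state=0)
svm_reg.fit(X, y)

# Residuals no larger than epsilon lie inside the margin;
# anything larger counts as a margin violation.
residuals = np.abs(y - svm_reg.predict(X))
inside = int(np.sum(residuals <= epsilon))
violations = int(np.sum(residuals > epsilon))
print(f"inside margin: {inside}, margin violations: {violations}")
```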


As a matter of fact, SVMs can handle regression modelling quite effectively. Let’s take hotel bookings as an example.

Predicting Average Daily Rates Across Hotel Customers

Suppose that we are building a regression model to predict the average daily rate (or rate that a customer pays on average per day) for a hotel booking. A model is constructed with the following features:

  • Cancellation (whether a customer cancels their booking or not)

  • Country of Origin

  • Market Segment

  • Deposit Type

  • Customer Type

  • Required Car Parking Spaces

  • Week of Arrival


Note that the ADR values are also populated for customers who cancelled: the response variable in this case reflects the ADR that would have been paid had the customer proceeded with the booking.
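
Several of these features are categorical, and `LinearSVR` expects numeric input. The article does not state how the features were encoded, so the following is only a sketch of one common option (one-hot encoding). The rows and category labels are hypothetical, and Country of Origin is left out to keep the example short.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: [market segment, deposit type, customer type]
categorical = np.array([
    ["Online TA", "No Deposit", "Transient"],
    ["Direct", "No Deposit", "Contract"],
    ["Online TA", "Non Refund", "Transient"],
])

# Hypothetical numeric columns: [cancelled (0/1), car parking spaces, week of arrival]
numeric = np.array([
    [0, 1, 27],
    [1, 0, 35],
    [0, 0, 52],
], dtype=float)

# One binary column per distinct category value in each categorical column.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(categorical).toarray()

# Numeric and one-hot columns side by side form the feature matrix.
X = np.hstack([numeric, encoded])
print(X.shape)  # (3, 9): 3 numeric columns + 2 + 2 + 2 one-hot columns
```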


The original study by Antonio, Almeida and Nunes (2016) can be accessed from the References section below.


Model Building

Using the features as outlined above, the SVM model is trained and validated on the training set (H1), with the predictions compared to the actual ADR values across the test set (H2).
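
As a rough sketch of the train/validation step, the H1 data could be split with scikit-learn's `train_test_split`. The placeholder arrays and the 80/20 ratio below are assumptions; the article does not state the split that was actually used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the H1 features and ADR values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))
y = rng.uniform(50, 200, size=100)

# Assumed 80/20 split; the article does not state the ratio it used.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_val.shape)
```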


The model is trained as follows:


>>> from sklearn.svm import LinearSVR
>>> svm_reg = LinearSVR(epsilon=1.5)
>>> svm_reg.fit(X_train, y_train)
LinearSVR(C=1.0, dual=True, epsilon=1.5, fit_intercept=True,
          intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
          random_state=None, tol=0.0001, verbose=0)
>>> predictions = svm_reg.predict(X_val)
>>> predictions
array([100.75090575, 109.08222631,  79.81544167, ...,  94.50700112,
        55.65495607,  65.5248653 ])

Now, the same model is used on the features in the test set to generate predicted ADR values:


>>> bpred = svm_reg.predict(atest)

Let’s compare the predicted ADR to actual ADR on a mean absolute error (MAE) and root mean squared error (RMSE) basis.


>>> import math
>>> from sklearn.metrics import mean_absolute_error, mean_squared_error
>>> mean_absolute_error(btest, bpred)
29.50931462735928
>>> print('mse (sklearn): ', mean_squared_error(btest,bpred))
>>> math.sqrt(mean_squared_error(btest, bpred))
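
For readers less familiar with these two metrics, here is a small sketch that computes them by hand with NumPy. The four ADR values are made up for illustration. Because RMSE squares each error before averaging, it penalises large misses more heavily than MAE does.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root mean squared error: square, average, then take the root."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up ADR values, for illustration only.
actual = [100.0, 80.0, 120.0, 60.0]
predicted = [90.0, 85.0, 150.0, 60.0]

print(mae(actual, predicted), rmse(actual, predicted))
```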

Note that the width of the margin is set by the epsilon (ϵ) parameter: errors smaller than ϵ incur no penalty, so training instances that fall within the margin have no impact on the model's predictions. The higher the parameter, the more training instances are ignored in this way.
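
The `loss='epsilon_insensitive'` setting shown in the model output above corresponds to the loss max(0, |y − ŷ| − ϵ). The following is a minimal NumPy sketch of that formula, not scikit-learn's internal implementation, and the sample values are made up.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-instance loss: zero inside the margin, linear outside it."""
    residual = np.abs(
        np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    )
    return np.maximum(0.0, residual - epsilon)

# Made-up values: absolute residuals of 1.0, 1.5 and 4.0.
y_true = [100.0, 100.0, 100.0]
y_pred = [99.0, 101.5, 104.0]

loss_wide = epsilon_insensitive_loss(y_true, y_pred, epsilon=1.5)
loss_narrow = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(loss_wide)    # only the third instance is penalised
print(loss_narrow)  # all three instances are penalised
```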


In this instance, a large margin of 1.5 was used. Here is the model performance when a margin of 0.5 is used.

>>> mean_absolute_error(btest, bpred)
29.622491512816826
>>> print('mse (sklearn): ', mean_squared_error(btest,bpred))
>>> math.sqrt(mean_squared_error(btest, bpred))
44.7963000500928

We can see that modifying the ϵ parameter has led to virtually no change in the MAE or RMSE.


That said, we want to ensure that the SVM model is not overfitting. Specifically, if we find that the best fit is achieved when ϵ = 0, then this might be a sign that the model is overfitting.

Here are the results when we set ϵ = 0.

  • MAE: 31.86

  • RMSE: 47.65

Given that we are not seeing higher accuracy when ϵ = 0 (both error metrics are in fact higher than at ϵ = 1.5 and ϵ = 0.5), there does not seem to be any evidence that overfitting is an issue in our model, at least not from this standpoint.
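
This kind of check can be written as a small loop over candidate ϵ values. The sketch below runs on synthetic data with an assumed 80/20 hold-out split, so its numbers say nothing about the hotel dataset; it only illustrates how the comparison could be set up.

```python
import math

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVR

# Synthetic data standing in for the real features and ADR values.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 50.0 + rng.normal(0, 3.0, size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one model per epsilon and record (MAE, RMSE) on the held-out data.
results = {}
for epsilon in (0.0, 0.5, 1.5):
    model = LinearSVR(epsilon=epsilon, max_iter=10000, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    results[epsilon] = (
        mean_absolute_error(y_val, pred),
        math.sqrt(mean_squared_error(y_val, pred)),
    )

for epsilon, (mae, rmse) in results.items():
    print(f"epsilon={epsilon}: MAE={mae:.2f}, RMSE={rmse:.2f}")
```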

How Does SVM Performance Compare To A Neural Network?

When using the same features, how does the accuracy of the SVM compare to that of a neural network?


Consider the following neural network configuration:


>>> # Assumed imports (not shown in the original): the Keras API bundled with TensorFlow
>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import Dense
>>> model = Sequential()
>>> model.add(Dense(8, input_dim=8, kernel_initializer='normal', activation='elu'))
>>> model.add(Dense(1669, activation='elu'))
>>> model.add(Dense(1, activation='linear'))
>>> model.summary()
Model: "sequential"
Layer (type)          Output Shape       Param #
dense (Dense)         (None, 8)          72
dense_1 (Dense)       (None, 1669)       15021
dense_2 (Dense)       (None, 1)          1670
Total params: 16,763
Trainable params: 16,763
Non-trainable params: 0
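
The parameter counts in this summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for each of the three Dense layers in the summary.
layers = [(8, 8), (8, 1669), (1669, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]

print(counts, sum(counts))  # [72, 15021, 1670] 16763
```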

The model is trained across 30 epochs with a batch size of 150:


>>> model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
>>> history=model.fit(X_train, y_train, epochs=30, batch_size=150, verbose=1, validation_split=0.2)
>>> predictions = model.predict(X_test)

The following MAE and RMSE are obtained on the test set:


  • MAE: 29.89

  • RMSE: 43.91

We observed that when ϵ was set to 1.5 for the SVM model, the MAE and RMSE came in at 29.5 and 44.6 respectively. The SVM is therefore slightly better on MAE and slightly worse on RMSE, and in this regard it has closely matched the neural network in prediction accuracy on the test set.
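
To put the comparison in relative terms, the figures reported above can be plugged into a short calculation. The helper name `relative_gap` is hypothetical. Both metrics differ by less than 2% relative to the neural network.

```python
# Test-set figures reported in this article.
svm = {"MAE": 29.5, "RMSE": 44.6}
nn = {"MAE": 29.89, "RMSE": 43.91}

def relative_gap(value, baseline):
    """Absolute difference as a percentage of the baseline."""
    return abs(value - baseline) / baseline * 100.0

gaps = {metric: relative_gap(svm[metric], nn[metric]) for metric in svm}
for metric, gap in gaps.items():
    print(f"{metric}: {gap:.2f}% difference relative to the neural network")
```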

Conclusion

It is a common misconception that SVMs are only suitable for working with classification data.


However, we have seen in this example that the SVM model has been quite effective at predicting ADR values, with accuracy comparable to that of the neural network.


Many thanks for reading; any questions or feedback are greatly appreciated.


The GitHub repository for this example, as well as other relevant references, is available below.


Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

