

The world of AI is as exciting as it is misunderstood. Buzzwords like “Machine Learning” and “Artificial Intelligence” skew not only the general understanding of what these methods can do but also the key differences between them and other models. In this article, I want to discuss the key differences between a linear regression model and a standard feed-forward neural network. To do this, I will fit each model to the same dataset (which can be found here: https://archive.ics.uci.edu/ml/datasets/Energy+efficiency) and compare the differences in architecture and outcome in Python.

We are looking at the Energy Efficiency dataset from UCI. Each column in the data is defined as follows:

  • X1 — Relative Compactness

  • X2 — Surface Area

  • X3 — Wall Area

  • X4 — Roof Area

  • X5 — Overall Height

  • X6 — Orientation

  • X7 — Glazing Area

  • X8 — Glazing Area Distribution

  • y1 — Heating Load

  • y2 — Cooling Load

Our goal is to predict the heating and cooling loads (columns Y1 and Y2) from the features X1 through X8.


Let’s take a look at our dataset in Python…


     X1     X2     X3      X4   X5  X6   X7  X8     Y1     Y2
0  0.98  514.5  294.0  110.25  7.0   2  0.0   0  15.55  21.33
1  0.98  514.5  294.0  110.25  7.0   3  0.0   0  15.55  21.33
2  0.98  514.5  294.0  110.25  7.0   4  0.0   0  15.55  21.33
3  0.98  514.5  294.0  110.25  7.0   5  0.0   0  15.55  21.33
4  0.90  563.5  318.5  122.50  7.0   2  0.0   0  20.84  28.28

Now, let's plot each of these variables against one another to get a better idea of what's going on within our data…


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_excel('assets/ENB2012_data.xlsx')

# Pairwise scatter plots of every column against every other column
sns.pairplot(data)
plt.show()

There is a lot going on in the plot above, so let’s break it down step by step. When I first plot this data, I am looking for linear relationships and considering dimensionality reduction. The main concern is multicollinearity, which can distort our model’s explainability (the coefficient estimates become unstable) and hurt its overall robustness.

What stands out immediately in the data above is a strong positive linear relationship between the two dependent variables and a strong negative linear relationship between relative compactness and surface area (which makes sense if you think about it).


Dimensionality/feature reduction is beyond the scope of this article; nevertheless, I felt it was worth mentioning.
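One common way to put a number on multicollinearity is the variance inflation factor (VIF): regress each feature on all the others and compute 1 / (1 - R²). The sketch below is only an illustration under stated assumptions. It uses synthetic columns (not the UCI data), and the helper name `variance_inflation_factors` is hypothetical rather than part of any library.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on all remaining columns plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Synthetic data (an assumption for illustration):
# b is almost a negated copy of a, c is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = -a + 0.1 * rng.normal(size=500)
c = rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

The two nearly redundant columns should show a very large VIF while the independent column stays close to 1. A common rule of thumb treats values above roughly 5 to 10 as a warning sign, though the cutoff is a judgment call.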


Next, let’s create a correlation heatmap so we can get some more insight…


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_excel('assets/ENB2012_data.xlsx')
cmap = sns.diverging_palette(0, 255, as_cmap=True)
sns.heatmap(data.corr(), cmap=cmap)

Now, why is this important? The correlation heatmap we plotted gives us immediate insight into whether there are linear relationships between the features in the data. Obviously, as the number of features increases drastically, this process will have to be automated, but again that is outside the scope of this article. By understanding whether there are strong linear relationships within our data, we can take appropriate steps to combine features, reduce dimensionality, and pick an appropriate model. Recall that a linear regression model assumes a linear relationship, whereas a neural network can identify non-linear relationships.
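As a rough idea of what automating that scan could look like, the sketch below lists every pair of columns whose absolute Pearson correlation exceeds a threshold. The function name `strongly_correlated_pairs`, the 0.8 threshold, and the toy columns `a`, `b`, `c` are all assumptions made for illustration; none of them come from the dataset.

```python
import numpy as np
import pandas as pd

def strongly_correlated_pairs(frame, threshold=0.8):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skip the diagonal
            r = corr.iloc[i, j]
            if abs(r) >= threshold:  # NaN (constant column) compares False and is skipped
                pairs.append((cols[i], cols[j], float(r)))
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Synthetic frame (hypothetical columns): b is a noisy negative multiple of a.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
demo = pd.DataFrame({
    "a": a,
    "b": -2.0 * a + 0.1 * rng.normal(size=300),
    "c": rng.normal(size=300),
})

pairs = strongly_correlated_pairs(demo)
print(pairs)
```

On the real data, the same function could be called on the `data` frame loaded above to get a sorted shortlist instead of reading the heatmap by eye.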

What do I mean when I say a model can identify linear relationships (linear regression) or non-linear relationships (a neural network) in data? The graph below gives three examples: a positive linear relationship, a negative linear relationship, and a non-linear relationship.
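The three cases can also be reproduced numerically. The sketch below uses made-up functions of a single variable (an assumption for illustration, unrelated to the energy data) and computes the Pearson correlation for each one.

```python
import numpy as np

x = np.linspace(-3, 3, 61)      # symmetric grid around zero

positive = 2.0 * x + 1.0        # positive linear relationship
negative = -2.0 * x + 1.0       # negative linear relationship
nonlinear = x ** 2              # non-linear (quadratic) relationship

def pearson_r(u, v):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(u, v)[0, 1])

r_pos = pearson_r(x, positive)
r_neg = pearson_r(x, negative)
r_non = pearson_r(x, nonlinear)
print(r_pos, r_neg, r_non)
```

The quadratic case is the interesting one: its correlation with `x` is essentially zero even though `x` determines it completely. A correlation heatmap, and a linear model, would both miss that relationship.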


This is why we conduct our initial data analysis (pairplots, heatmaps, etc.): so we can determine the most appropriate model on a case-by-case basis. If there were a single answer and a universally dominant model, we wouldn’t need data scientists, machine learning engineers, or AI researchers.

In our regression model, we assign a weight to every feature and, for every observation, measure the error of the weighted sum against the observed output. Let’s build a linear regression in Python and look at the results on this particular dataset.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

data = pd.read_excel('assets/ENB2012_data.xlsx')

X = data.drop(axis=1, columns=['Y1', 'Y2'])
y = pd.concat([data['Y1'], data['Y2']], axis=1)

model = LinearRegression()
model.fit(X, y)

r_sq = model.score(X, y)  # R^2 on the same data the model was fit on
print(f'r_sq = {r_sq}')
# Output: r_sq = 0.9028334357025505

Our model explains about 90% of the variance in the training data, which is pretty good considering we’ve done nothing with our dataset.
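To make the "explains about 90%" reading concrete, R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. With two target columns, scikit-learn's `score` (as far as I am aware) averages the per-column R² values uniformly. The sketch below mirrors that calculation in NumPy; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Uniform average of per-column R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)               # unexplained variation
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)  # total variation
    return float(np.mean(1.0 - ss_res / ss_tot))

# Toy targets with two columns, standing in for Y1 and Y2 (made-up values).
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

perfect = r_squared(y_true, y_true.copy())                     # predictions match exactly
baseline = r_squared(y_true, np.tile(y_true.mean(axis=0), (4, 1)))  # always predict the mean
print(perfect, baseline)
```

A perfect model scores 1, and a model that always predicts the column mean scores 0, so a score near 0.90 sits much closer to the former.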


To compare the two models we will be looking at the mean squared error…


import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

data = pd.read_excel('assets/ENB2012_data.xlsx')

X = data.drop(axis=1, columns=['Y1', 'Y2'])
y = pd.concat([data['Y1'], data['Y2']], axis=1)

model = LinearRegression()
model.fit(X, y)
y_hat = model.predict(X)

r_sq = model.score(X, y)

mse = mean_squared_error(y, y_hat)
print(f'r_sq = {r_sq}')
print(f'mse = {mse}')
# Output:
# r_sq = 0.9028334357025505
# mse = 9.331137808925114
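For reference, the mean squared error is the average of the squared differences between observed and predicted values, taken here over both target columns. The sketch below is a minimal NumPy version with made-up numbers. It also takes the square root of the MSE reported above to express the error in the same units as the targets.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared errors over every row and column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up observations and predictions (two rows, two target columns).
y_true = np.array([[15.0, 21.0], [20.0, 28.0]])
y_pred = np.array([[16.0, 21.0], [18.0, 31.0]])

toy_mse = mse(y_true, y_pred)   # squared errors are 1, 0, 4, 9
print(toy_mse)                  # 3.5

# Root of the linear model's reported MSE, in the units of the load columns.
rmse_linear = float(np.sqrt(9.331137808925114))
print(round(rmse_linear, 2))
```

Because the errors are squared, a few large misses weigh more heavily than many small ones, which is worth keeping in mind when comparing the two models on this metric.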

Now let’s do the exact same thing with a simple sequential neural network. A sequential neural network is a sequence of linear combinations (matrix operations), each followed by a non-linear activation function, and it is the activation function that allows the network to identify non-linear relationships. For this example, we will use ReLU as our activation function. Ironically, ReLU is piecewise linear: it passes positive inputs through unchanged and outputs zero otherwise. Because we haven’t normalized or standardized our data, sigmoid and tanh, which saturate for large inputs, won’t be of much use to us. (This, yet again, is another component that must be selected on a case-by-case basis based on our data.)
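Before the Keras model, a small NumPy sketch may help show why the activation function matters. With no activation, two stacked layers collapse algebraically into a single linear layer, so the network could not represent anything a linear regression cannot. The weights below are random placeholders and the layer sizes are arbitrary assumptions, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random placeholder weights: 8 inputs -> 6 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(8, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 8))  # five made-up observations

def relu(z):
    """ReLU: keep positive values, replace negatives with zero."""
    return np.maximum(z, 0.0)

# Two layers with no activation in between...
stacked = (x @ W1 + b1) @ W2 + b2
# ...are exactly one linear layer with merged weights and bias.
collapsed = x @ (W1 @ W2) + (b1 @ W2 + b2)
same_without_activation = bool(np.allclose(stacked, collapsed))

# With ReLU between the layers, the merged linear layer no longer matches.
with_relu = relu(x @ W1 + b1) @ W2 + b2
same_with_relu = bool(np.allclose(with_relu, collapsed))

print(same_without_activation, same_with_relu)
```

The Keras model in this article stacks five Dense layers with ReLU in the same way, only with weights learned from the data.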

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

data = pd.read_excel('assets/ENB2012_data.xlsx')

X = data.drop(axis=1, columns=['Y1', 'Y2'])
y = pd.concat([data['Y1'], data['Y2']], axis=1)

network = Sequential()
network.add(Dense(8, input_shape=(8,), activation='relu'))
network.add(Dense(6, activation='relu'))
network.add(Dense(6, activation='relu'))
network.add(Dense(4, activation='relu'))
network.add(Dense(2, activation='relu'))

network.compile('adam', loss='mse', metrics=['mse'])
network.fit(X, y, epochs=1000)
Epoch 1000/1000
 32/768 [>.............................] - ETA: 0s - loss: 5.8660 - mse: 5.8660
768/768 [==============================] - 0s 58us/step - loss: 6.7354 - mse: 6.7354

The neural network reduces the training MSE by about 28%, from 9.33 to 6.74.
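The size of that reduction follows directly from the two MSE values reported above. A quick check of the arithmetic:

```python
mse_linear = 9.331137808925114   # linear regression, reported above
mse_network = 6.7354             # neural network, final epoch of the training log

# Relative reduction in MSE when moving from the linear model to the network.
reduction = (mse_linear - mse_network) / mse_linear
print(f"{reduction:.1%}")        # 27.8%
```

Both numbers are measured on the same data the models were trained on, so they describe fit rather than performance on unseen buildings.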


After discussing this with a number of professionals, I found that 9 times out of 10 the regression model would be preferred over any other machine learning or artificial intelligence algorithm. Why is this the case even when the ML and AI algorithms achieve higher accuracy? Most of the time you are delivering a model to a client, or you need to act on the model’s output, and you have to be able to explain why the model says what it says. It is relatively easy to explain a linear model, its assumptions, and why the output is what it is. Trying to do that with a neural network would be not only exhausting but also extremely confusing to those not involved in the development process.
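To illustrate what "speaking to the why" can look like for a linear model, the sketch below fits ordinary least squares with NumPy and reads each coefficient as the change in the prediction per one-unit change in that feature. The data is synthetic and the feature names are hypothetical; with the scikit-learn model above, the analogous numbers would live in `model.coef_` and `model.intercept_`.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["feature_a", "feature_b"]  # hypothetical names

# Synthetic data generated from known weights (3.0, -1.5) and intercept 4.0.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 4.0 + 0.01 * rng.normal(size=200)

# Ordinary least squares with an explicit intercept column.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, weights = coef[0], coef[1:]

for name, w in zip(feature_names, weights):
    print(f"A one-unit increase in {name} changes the prediction by {w:+.2f}")
print(f"Prediction when every feature is zero: {intercept:.2f}")
```

Each statement printed here is something you can hand to a client as is. The neural network has no equally direct reading of its weights, which is the trade-off described above.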

