Welcome to your first practice lab! In this lab, you will implement linear regression with one variable to predict profits for a restaurant franchise.
First, let’s run the cell below to import all the packages that you will need during this assignment.
The file utils.py contains helper functions for this assignment. You do not need to modify the code in this file.
import numpy as np
import matplotlib.pyplot as plt
from utils import *
import copy
import math
%matplotlib inline
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet.
Can you use the data to help you identify which cities may potentially give your business higher profits?
You will start by loading the dataset for this task.
The load_data() function shown below loads the data into the variables x_train and y_train:
- x_train is the population of a city
- y_train is the profit of a restaurant in that city. A negative value for profit indicates a loss.
- Both x_train and y_train are numpy arrays.
# load the dataset
x_train, y_train = load_data()
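The actual load_data() implementation lives in utils.py and is not shown here. Purely for intuition, here is a minimal sketch of what such a loader might look like, assuming (hypothetically) that the data is stored as comma-separated population,profit pairs in a text file; the file name is made up for illustration:
import numpy as np

def load_data_sketch(path="data/ex1data1.txt"):  # hypothetical file name
    # Each row holds: population (in 10,000s), profit (in $10,000s)
    data = np.loadtxt(path, delimiter=",")
    x = data[:, 0]  # city populations
    y = data[:, 1]  # restaurant profits (negative values indicate a loss)
    return x, y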
Before starting on any task, it is useful to get more familiar with your dataset.
The code below prints the variable x_train and the type of the variable.
# print x_train
print("Type of x_train:",type(x_train))
print("First five elements of x_train are:\n", x_train[:5])
Type of x_train: <class 'numpy.ndarray'>
First five elements of x_train are:
[6.1101 5.5277 8.5186 7.0032 5.8598]
x_train is a numpy array that contains decimal values that are all greater than zero.
Now, let’s print y_train.
# print y_train
print("Type of y_train:",type(y_train))
print("First five elements of y_train are:\n", y_train[:5])
Type of y_train: <class 'numpy.ndarray'>
First five elements of y_train are:
[17.592 9.1302 13.662 11.854 6.8233]
Similarly, y_train is a numpy array that has decimal values, some negative, some positive.
Another useful way to get familiar with your data is to view its dimensions.
Please print the shape of x_train and y_train and see how many training examples you have in your dataset.
print ('The shape of x_train is:', x_train.shape)
print ('The shape of y_train is: ', y_train.shape)
print ('Number of training examples (m):', len(x_train))
The shape of x_train is: (97,)
The shape of y_train is: (97,)
Number of training examples (m): 97
The city population array has 97 data points, and the monthly average profits array also has 97 data points. These are NumPy 1D arrays.
It is often useful to understand the data by visualizing it.
# Create a scatter plot of the data. To change the markers to red "x",
# we used the 'marker' and 'c' parameters
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Profits vs. Population per city")
# Set the y-axis label
plt.ylabel('Profit in $10,000')
# Set the x-axis label
plt.xlabel('Population of City in 10,000s')
plt.show()
Your goal is to build a linear regression model to fit this data.
In this practice lab, you will fit the linear regression parameters $(w,b)$ to your dataset.
The model function for linear regression, which maps from x (city population) to y (your restaurant’s monthly profit for that city), is represented as
$$f_{w,b}(x) = wx + b$$
To train a linear regression model, you want to find the best $(w,b)$ parameters that fit your dataset.
To compare how one choice of $(w,b)$ is better or worse than another choice, you can evaluate it with a cost function $J(w,b)$.
The choice of $(w,b)$ that fits your data the best is the one that has the smallest cost $J(w,b)$.
To find the values of $(w,b)$ that give the smallest possible cost $J(w,b)$, you can use a method called gradient descent.
The trained linear regression model can then take the input feature $x$ (city population) and output a prediction $f_{w,b}(x)$ (predicted monthly profit for a restaurant in that city).
Gradient descent involves repeated steps to adjust the values of your parameters $(w,b)$ to gradually get a smaller and smaller cost $J(w,b)$.
As you may recall from the lecture, for one variable, the cost function for linear regression $J(w,b)$ is defined as
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$
where the model's prediction for example $i$ is
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
This is the equation for a line, with an intercept $b$ and a slope $w$.
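To make the cost formula concrete, here is a small worked example (toy numbers, not from the lab's dataset): for two training examples $(x^{(0)}, y^{(0)}) = (1, 2)$ and $(x^{(1)}, y^{(1)}) = (2, 2)$ with $w = 1, b = 0$, the predictions are $f_{w,b}(x^{(0)}) = 1$ and $f_{w,b}(x^{(1)}) = 2$, so
$$J(w,b) = \frac{1}{2 \cdot 2}\left((1 - 2)^2 + (2 - 2)^2\right) = 0.25$$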
Please complete the compute_cost() function below to compute the cost $J(w,b)$. Specifically:
- Iterate over the training examples, and for each example, compute:
  - The prediction of the model for that example: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$
  - The cost for that example: $cost^{(i)} = (f_{w,b}(x^{(i)}) - y^{(i)})^2$
- Return the total cost over all examples:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} cost^{(i)}$$
If you get stuck, you can check out the hints presented after the cell below to help you with the implementation.
# UNQ_C1
# GRADED FUNCTION: compute_cost
def compute_cost(x, y, w, b):
"""
Computes the cost function for linear regression.
Args:
x (ndarray): Shape (m,) Input to the model (Population of cities)
y (ndarray): Shape (m,) Label (Actual profits for the cities)
w, b (scalar): Parameters of the model
Returns
total_cost (float): The cost of using w,b as the parameters for linear regression
to fit the data points in x and y
"""
# number of training examples
m = x.shape[0]
# You need to return this variable correctly
total_cost = 0
### START CODE HERE ###
# Vectorized computation: mean of the squared errors over all m examples, halved
total_cost = ((w * x + b - y) ** 2).mean() / 2
### END CODE HERE ###
return total_cost
In this case, you can iterate over all the examples in x using a for loop and add the cost from each iteration to a variable (cost_sum) initialized outside the loop. Then, you can return total_cost as cost_sum divided by 2m.
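For reference, here is a minimal loop-based sketch that follows this hint; it computes the same value as the vectorized solution above (the function name is ours, for comparison only):
def compute_cost_loop(x, y, w, b):
    # Loop-based version of compute_cost, following the hint above
    m = x.shape[0]
    cost_sum = 0
    for i in range(m):
        f_wb = w * x[i] + b             # prediction for example i
        cost_sum += (f_wb - y[i]) ** 2  # squared error for example i
    total_cost = cost_sum / (2 * m)     # J(w,b)
    return total_cost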
You can check if your implementation was correct by running the following test code:
# Compute cost with some initial values for parameters w, b
initial_w = 2
initial_b = 1
cost = compute_cost(x_train, y_train, initial_w, initial_b)
print(type(cost))
print(f'Cost at initial w, b: {cost:.3f}')
# Public tests
from public_tests import *
compute_cost_test(compute_cost)
<class 'numpy.float64'>
Cost at initial w, b: 75.203
All tests passed!
Expected Output:
Cost at initial w, b: 75.203
In this section, you will implement the gradient for parameters $w, b$ for linear regression.
As described in the lecture videos, the gradient descent algorithm is:
$$\begin{align*}& \text{repeat until convergence:} \; \lbrace \\ & \qquad b := b - \alpha \frac{\partial J(w,b)}{\partial b} \\ & \qquad w := w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{1} \\ & \rbrace\end{align*}$$
where parameters $w, b$ are both updated simultaneously, and where
$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) \tag{2}$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)} \tag{3}$$
Here, $m$ is the number of training examples in the dataset, $f_{w,b}(x^{(i)})$ is the model’s prediction, and $y^{(i)}$ is the target value.
You will implement a function called compute_gradient which calculates $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$.
Please complete the compute_gradient function to:
- Iterate over the training examples, and for each example, compute:
  - The prediction of the model for that example: $f_{w,b}(x^{(i)}) = wx^{(i)} + b$
  - The gradient contributions for the parameters $w, b$ from that example:
$$\frac{\partial J(w,b)}{\partial b}^{(i)} = f_{w,b}(x^{(i)}) - y^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w}^{(i)} = \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$
- Return the total gradient, averaged over all the examples:
$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} \frac{\partial J(w,b)}{\partial b}^{(i)}$$
$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} \frac{\partial J(w,b)}{\partial w}^{(i)}$$
If you get stuck, you can check out the hints presented after the cell below to help you with the implementation.
# UNQ_C2
# GRADED FUNCTION: compute_gradient
def compute_gradient(x, y, w, b):
"""
Computes the gradient for linear regression
Args:
x (ndarray): Shape (m,) Input to the model (Population of cities)
y (ndarray): Shape (m,) Label (Actual profits for the cities)
w, b (scalar): Parameters of the model
Returns
dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
dj_db (scalar): The gradient of the cost w.r.t. the parameter b
"""
# Number of training examples
m = x.shape[0]
# You need to return the following variables correctly
dj_dw = 0
dj_db = 0
### START CODE HERE ###
# Vectorized computation: average the per-example gradient terms over all m examples
dj_dw = ((w * x + b - y) * x).mean()
dj_db = (w * x + b - y).mean()
### END CODE HERE ###
return dj_dw, dj_db
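The vectorized solution above is equivalent to the loop-based approach described in the instructions. For comparison, a minimal loop-based sketch (the function name is ours):
def compute_gradient_loop(x, y, w, b):
    # Loop-based version of compute_gradient
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b   # prediction for example i
        err = f_wb - y[i]     # prediction error for example i
        dj_dw += err * x[i]   # per-example gradient w.r.t. w
        dj_db += err          # per-example gradient w.r.t. b
    return dj_dw / m, dj_db / m  # average over all m examples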
Run the cells below to check your implementation of the compute_gradient function with two different initializations of the parameters $w$ and $b$.
# Compute and display gradient with w initialized to zeroes
initial_w = 0
initial_b = 0
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, initial_w, initial_b)
print('Gradient at initial w, b (zeros):', tmp_dj_dw, tmp_dj_db)
compute_gradient_test(compute_gradient)
Gradient at initial w, b (zeros): -65.32884974555671 -5.839135051546393
Using X with shape (4, 1)
All tests passed!
Expected Output:
Gradient at initial w, b (zeros): -65.32884975 -5.83913505154639
# Compute and display cost and gradient with non-zero w
test_w = 0.2
test_b = 0.2
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, test_w, test_b)
print('Gradient at test w, b:', tmp_dj_dw, tmp_dj_db)
Gradient at test w, b: -47.41610118114433 -4.007175051546392
Expected Output:
Gradient at test w, b: -47.41610118 -4.007175051546391
You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall that "batch" means each gradient step uses all of the training examples.
You don’t need to implement anything for this part. Simply run the cells below.
A good way to verify that gradient descent is working correctly is to look at the value of $J(w,b)$ and check that it is decreasing with each step.
Assuming you have implemented the gradient and computed the cost correctly, and you have an appropriate value for the learning rate alpha, $J(w,b)$ should never increase, and should converge to a steady value by the end of the algorithm.
def gradient_descent(x, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
"""
Performs batch gradient descent to learn the parameters w and b. Updates w and b
by taking num_iters gradient steps with learning rate alpha.
Args:
x : (ndarray): Shape (m,)
y : (ndarray): Shape (m,)
w_in, b_in : (scalar) Initial values of parameters of the model
cost_function: function to compute cost
gradient_function: function to compute the gradient
alpha : (float) Learning rate
num_iters : (int) number of iterations to run gradient descent
Returns
w : (scalar) Updated value of parameter of the model after
running gradient descent
b : (scalar) Updated value of parameter of the model after
running gradient descent
"""
# number of training examples
m = len(x)
# An array to store cost J and w's at each iteration — primarily for graphing later
J_history = []
w_history = []
w = copy.deepcopy(w_in) #avoid modifying global w within function
b = b_in
for i in range(num_iters):
# Calculate the gradient and update the parameters
dj_dw, dj_db = gradient_function(x, y, w, b )
# Update Parameters using w, b, alpha and gradient
w = w - alpha * dj_dw
b = b - alpha * dj_db
# Save cost J at each iteration
if i<100000: # prevent resource exhaustion
cost = cost_function(x, y, w, b)
J_history.append(cost)
# Print cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
if i% math.ceil(num_iters/10) == 0:
w_history.append(w)
print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f} ")
return w, b, J_history, w_history #return w and J,w history for graphing
Now let’s run the gradient descent algorithm above to learn the parameters for our dataset.
# initialize fitting parameters. For one variable, w and b are scalars.
initial_w = 0.
initial_b = 0.
# some gradient descent settings
iterations = 1500
alpha = 0.01
w, b, _, _ = gradient_descent(x_train, y_train, initial_w, initial_b,
                              compute_cost, compute_gradient, alpha, iterations)
print("w,b found by gradient descent:", w, b)
Iteration 0: Cost 6.74
Iteration 150: Cost 5.31
Iteration 300: Cost 4.96
Iteration 450: Cost 4.76
Iteration 600: Cost 4.64
Iteration 750: Cost 4.57
Iteration 900: Cost 4.53
Iteration 1050: Cost 4.51
Iteration 1200: Cost 4.50
Iteration 1350: Cost 4.49
w,b found by gradient descent: 1.166362350335582 -3.6302914394043597
Expected Output:
w, b found by gradient descent: 1.16636235 -3.63029143940436
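You can also check the convergence criterion described earlier by plotting the cost history. A minimal sketch, assuming the cells above have already been run; the parameters are discarded (with _) so the w and b learned above are left untouched:
# Re-run gradient descent, this time keeping the cost history
_, _, J_history, _ = gradient_descent(x_train, y_train, initial_w, initial_b,
                                      compute_cost, compute_gradient, alpha, iterations)
plt.plot(J_history)
plt.title("Cost vs. iteration")
plt.xlabel("Iteration")
plt.ylabel("Cost J(w,b)")
plt.show()
If the implementation is correct, the curve should decrease monotonically and flatten out, mirroring the printed costs above.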
We will now use the final parameters from gradient descent to plot the linear fit.
Recall that we can get the prediction for a single example as $f(x^{(i)}) = wx^{(i)} + b$.
To calculate the predictions on the entire dataset, we can loop through all the training examples and calculate the prediction for each example. This is shown in the code block below.
m = x_train.shape[0]
predicted = np.zeros(m)
for i in range(m):
predicted[i] = w * x_train[i] + b
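Since x_train is a numpy array, the loop above can equivalently be replaced by a single vectorized expression:
predicted = w * x_train + b  # predictions for all m examples at once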
We will now plot the predicted values to see the linear fit.
# Plot the linear fit
plt.plot(x_train, predicted, c = "b")
# Create a scatter plot of the data.
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Profits vs. Population per city")
# Set the y-axis label
plt.ylabel('Profit in $10,000')
# Set the x-axis label
plt.xlabel('Population of City in 10,000s')
plt.show()
Your final values of $w, b$ can also be used to make predictions on profits. Let’s predict what the profit would be in areas of 35,000 and 70,000 people.
The model takes in the population of a city in 10,000s as input.
Therefore, 35,000 people can be translated into an input to the model as np.array([3.5])
Similarly, 70,000 people can be translated into an input to the model as np.array([7.])
predict1 = 3.5 * w + b
print('For population = 35,000, we predict a profit of $%.2f' % (predict1*10000))
predict2 = 7.0 * w + b
print('For population = 70,000, we predict a profit of $%.2f' % (predict2*10000))
For population = 35,000, we predict a profit of $4519.77
For population = 70,000, we predict a profit of $45342.45
Expected Output:
For population = 35,000, we predict a profit of $4519.77
For population = 70,000, we predict a profit of $45342.45
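The same array-based idea extends to predicting for several populations at once. A minimal sketch using the w and b learned above (the variable names are ours):
populations = np.array([3.5, 7.0])  # 35,000 and 70,000 people, in units of 10,000s
profits = populations * w + b       # predicted profits, in units of $10,000
for pop, profit in zip(populations, profits):
    print(f'For population = {int(pop * 10000):,}, we predict a profit of ${profit * 10000:.2f}')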