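In statistics, linear regression is a form of regression analysis that models the relationship between one or more independent variables and a dependent variable using a least-squares function called the linear regression equation. This function is a linear combination of one or more model parameters called regression coefficients. The case with a single independent variable is called simple regression; the case with more than one independent variable is called multiple regression. (This in turn should be distinguished from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single scalar variable.)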
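To make the difference between simple and multiple regression concrete, here is a minimal sketch of a least-squares fit with two independent variables. The data are made up for illustration (generated from y = 1 + 2*x1 + 3*x2 with no noise), the names `features`, `targets`, `design`, and `coeffs` are hypothetical, and NumPy's `np.linalg.lstsq` is an assumption of this sketch rather than something used in the original text.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2.
features = np.array([[1.0, 2.0],
                     [2.0, 1.0],
                     [3.0, 4.0],
                     [4.0, 3.0]])
targets = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Prepend a column of ones so the intercept is estimated as a coefficient.
design = np.column_stack([np.ones(len(features)), features])

# Least-squares solution: minimizes the sum of squared residuals.
coeffs, residuals, rank, _ = np.linalg.lstsq(design, targets, rcond=None)

print("intercept and regression coefficients:", coeffs)
```

With a single column in `features` the same code performs simple regression; the prediction is always a linear combination of the entries of `coeffs`.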




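  1. If the goal is prediction or mapping, linear regression can be used to fit a predictive model to an observed data set of y and X values. Once such a model has been fitted, it can predict a value of y for a new value of X that arrives without its paired y.
  2. Given a variable y and a number of variables X1...Xp that may be related to y, linear regression analysis can be used to quantify the strength of the relationship between y and each Xj, to identify which Xj are unrelated to y, and to identify which subsets of the Xj contain redundant information about y.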
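Both uses can be sketched with NumPy on the 17 sample points that appear in the TensorFlow listings in this section. This is only an illustration under assumptions: `np.polyfit` and `np.corrcoef` stand in for a full regression analysis, the query value `new_x = 8.0` is made up, and with a single independent variable there is no subset of redundant predictors to detect.

```python
import numpy as np

train_X = np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                    7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.array([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                    2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])

# Use 1: fit a line, then predict y for an X value that has no paired y.
slope, intercept = np.polyfit(train_X, train_Y, deg=1)
new_x = 8.0  # hypothetical new observation
predicted_y = slope * new_x + intercept
print("predicted y for x = 8.0:", predicted_y)

# Use 2: quantify the strength of the linear relationship between X and y.
r = np.corrcoef(train_X, train_Y)[0, 1]
print("correlation coefficient r:", r, "r^2:", r ** 2)
```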



#!/usr/bin/env python
# filename: linear regression

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")
# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Halved mean squared error: sum of squared errors / (2 * n_samples)
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        #Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))
    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')
    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()


Epoch: 0050 cost= 0.290731698 W= -0.00836047 b= 2.6571696
Epoch: 0100 cost= 0.266084820 W= 0.006981368 b= 2.5468016
Epoch: 0150 cost= 0.244280651 W= 0.02141102 b= 2.4429953
Epoch: 0200 cost= 0.224992946 W= 0.03498184 b= 2.3453684
Epoch: 0250 cost= 0.207929820 W= 0.047745574 b= 2.2535467
Epoch: 0300 cost= 0.192835465 W= 0.05974988 b= 2.167189
Epoch: 0350 cost= 0.179481432 W= 0.07104068 b= 2.085963
Epoch: 0400 cost= 0.167667642 W= 0.081660114 b= 2.0095677
Epoch: 0450 cost= 0.157216385 W= 0.091648005 b= 1.9377158
Epoch: 0500 cost= 0.147970721 W= 0.10104162 b= 1.8701389
Epoch: 0550 cost= 0.139791384 W= 0.109876454 b= 1.8065817
Epoch: 0600 cost= 0.132555142 W= 0.11818599 b= 1.7468032
Epoch: 0650 cost= 0.126153350 W= 0.12600137 b= 1.69058
Epoch: 0700 cost= 0.120489597 W= 0.13335216 b= 1.637699
Epoch: 0750 cost= 0.115478866 W= 0.14026573 b= 1.5879631
Epoch: 0800 cost= 0.111046173 W= 0.1467677 b= 1.5411884
Epoch: 0850 cost= 0.107124634 W= 0.15288284 b= 1.497197
Epoch: 0900 cost= 0.103655063 W= 0.15863435 b= 1.4558206
Epoch: 0950 cost= 0.100585289 W= 0.1640441 b= 1.4169033
Epoch: 1000 cost= 0.097869352 W= 0.16913211 b= 1.3803006
Optimization Finished!
Training cost= 0.09786935 W= 0.16913211 b= 1.3803006 
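
The cost in the log above is still falling at epoch 1000, which suggests the run has not yet converged. One way to check is to compare the final `W` and `b` against the closed-form least-squares line for the same data. The sketch below assumes the training set is the 17-point one from the eager listing; `half_mse` is a hypothetical helper that mirrors the `cost` expression in the listing.

```python
import numpy as np

train_X = np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                    7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.array([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                    2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]

def half_mse(W, b):
    """Same quantity as `cost` in the listing: sum of squared errors / (2 * n)."""
    return np.sum((W * train_X + b - train_Y) ** 2) / (2 * n_samples)

# Closed-form simple regression: slope = covariance(x, y) / variance(x).
x_mean, y_mean = train_X.mean(), train_Y.mean()
W_opt = (np.sum((train_X - x_mean) * (train_Y - y_mean))
         / np.sum((train_X - x_mean) ** 2))
b_opt = y_mean - W_opt * x_mean

# Final values copied from the last line of the training log above.
W_gd, b_gd = 0.16913211, 1.3803006

print("closed form:      W=%.4f b=%.4f cost=%.6f" % (W_opt, b_opt, half_mse(W_opt, b_opt)))
print("gradient descent: W=%.4f b=%.4f cost=%.6f" % (W_gd, b_gd, half_mse(W_gd, b_gd)))
```

If the closed-form cost comes out noticeably lower than the logged one, more epochs or a larger learning rate would move the fitted line closer to the least-squares solution.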

2. Using eager execution:

#!/usr/bin/env python
# filename: linear regression eager api

from __future__ import absolute_import, division, print_function
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Set Eager API (must be enabled at program start, before other TensorFlow calls)
tf.enable_eager_execution()

# Training Data
train_X = [3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
           7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]
train_Y = [1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
           2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3]
n_samples = len(train_X)
# Parameters
learning_rate = 0.01
display_step = 100
num_steps = 1000

# Weight and Bias
W = tfe.Variable(np.random.randn())
b = tfe.Variable(np.random.randn())
# Linear regression (Wx + b)
def linear_regression(inputs):
    return inputs * W + b
# Halved mean squared error: sum of squared errors / (2 * n_samples)
def mean_square_fn(model_fn, inputs, labels):
    return tf.reduce_sum(tf.pow(model_fn(inputs) - labels, 2)) / (2 * n_samples)

# SGD Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# Compute gradients
grad = tfe.implicit_gradients(mean_square_fn)

# Initial cost, before optimizing
print("Initial cost= {:.9f}".format(
    mean_square_fn(linear_regression, train_X, train_Y)),
    "W=", W.numpy(), "b=", b.numpy())
# Training
for step in range(num_steps):
    optimizer.apply_gradients(grad(linear_regression, train_X, train_Y))
    if (step + 1) % display_step == 0 or step == 0:
        print("Epoch:", '%04d' % (step + 1), "cost=",
              "{:.9f}".format(mean_square_fn(linear_regression, train_X, train_Y)),
              "W=", W.numpy(), "b=", b.numpy())
# Graphic display
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.plot(train_X, np.array(W * train_X + b), label='Fitted line')
plt.legend()
plt.show()


Initial cost= 0.351724952 W= 0.120354496 b= 2.2911706
Epoch: 0001 cost= 0.254229784 W= 0.08552867 b= 2.2844243
Epoch: 0100 cost= 0.183475479 W= 0.0665762 b= 2.1107843
Epoch: 0200 cost= 0.160497516 W= 0.08774154 b= 1.9607316
Epoch: 0300 cost= 0.142475009 W= 0.10648614 b= 1.8278408
Epoch: 0400 cost= 0.128339261 W= 0.12308694 b= 1.7101487
Epoch: 0500 cost= 0.117252141 W= 0.13778897 b= 1.605918
Epoch: 0600 cost= 0.108556032 W= 0.15080954 b= 1.513608
Epoch: 0700 cost= 0.101735294 W= 0.16234097 b= 1.4318552
Epoch: 0800 cost= 0.096385524 W= 0.1725536 b= 1.3594522
Epoch: 0900 cost= 0.092189491 W= 0.18159817 b= 1.2953304
Epoch: 1000 cost= 0.088898353 W= 0.18960832 b= 1.2385421
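
To show what the gradient computation and `apply_gradients` call amount to in the loop above, the sketch below writes the same full-batch gradient descent by hand in NumPy, with the gradients of the halved mean squared error derived analytically. It is an illustration, not TensorFlow's implementation: the names `cost_fn` and `gradient_step` are hypothetical, and the starting values are copied from the "Initial cost" line of the log so that the trajectory can be compared against it.

```python
import numpy as np

train_X = np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                    7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.array([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                    2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = len(train_X)
learning_rate = 0.01
num_steps = 1000

def cost_fn(W, b):
    """Halved mean squared error, as in mean_square_fn above."""
    return np.sum((W * train_X + b - train_Y) ** 2) / (2 * n_samples)

def gradient_step(W, b):
    """One full-batch gradient descent update on cost_fn."""
    error = W * train_X + b - train_Y
    grad_W = np.sum(error * train_X) / n_samples  # d(cost)/dW
    grad_b = np.sum(error) / n_samples            # d(cost)/db
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Starting point copied from the "Initial cost" line of the log above.
W, b = 0.120354496, 2.2911706
for step in range(num_steps):
    W, b = gradient_step(W, b)
    if (step + 1) % 100 == 0 or step == 0:
        print("Epoch: %04d cost= %.9f W= %.8f b= %.7f"
              % (step + 1, cost_fn(W, b), W, b))
```

The printed values should track the log above closely, up to float32 versus float64 rounding. Because it needs only NumPy, this version also runs where `tf.contrib.eager`, which was removed in TensorFlow 2.x, is unavailable.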

