[TensorFlow] Training a Multi-Feature Model

Note:

The code in this post comes from a Google Colab exercise; the original link is here.

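Learning objectives:

1. Use multiple features instead of a single feature to improve the model's effectiveness
2. Debug outliers in the input data
3. Use a test set to check whether the model has overfit the validation set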

Set up the environment. If you are unsure how, see my first blog post (click here).

[Image 1]

Feature preprocessing:

[Images 2 and 3]
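As a rough sketch of what this preprocessing step can look like: the code below selects a set of input columns, adds one synthetic feature, and rescales the target. The column names follow the California housing dataset; the rooms_per_person feature, the division by 1000, and the tiny sample frame are assumptions for illustration, not a copy of the images above.

```python
import pandas as pd

# Assumed input columns of the California housing dataset.
FEATURE_COLUMNS = [
    "latitude", "longitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
]


def preprocess_features(dataframe):
    """Return a copy of the input columns plus one synthetic feature."""
    processed = dataframe[FEATURE_COLUMNS].copy()
    # Synthetic feature (an assumption here): rooms per person.
    processed["rooms_per_person"] = (
        dataframe["total_rooms"] / dataframe["population"])
    return processed


def preprocess_targets(dataframe):
    """Return a one-column DataFrame with the target in thousands."""
    targets = pd.DataFrame()
    targets["median_house_value"] = dataframe["median_house_value"] / 1000.0
    return targets


# Made-up sample rows, only to show the shapes involved.
sample = pd.DataFrame({
    "latitude": [34.2, 37.8],
    "longitude": [-118.4, -122.3],
    "housing_median_age": [15.0, 40.0],
    "total_rooms": [1000.0, 2000.0],
    "total_bedrooms": [200.0, 450.0],
    "population": [500.0, 500.0],
    "households": [180.0, 400.0],
    "median_income": [3.5, 5.1],
    "median_house_value": [100000.0, 250000.0],
})

sample_features = preprocess_features(sample)
sample_targets = preprocess_targets(sample)
print(sample_features.columns.tolist())
print(sample_targets)
```

Keeping the targets as a one-column DataFrame matches how train_model later indexes them with ["median_house_value"].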

Checking the data:

[Images 4 to 7]
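One simple check, sketched here as an assumption rather than the code from the images, is to compare summary statistics of the training and validation splits. If a column's mean differs sharply between the two, the split is probably not random. compare_splits is a hypothetical helper and the data is made up.

```python
import pandas as pd


def compare_splits(training, validation, columns):
    """Return per-column means for both splits and their absolute gap."""
    summary = pd.DataFrame({
        "training_mean": training[columns].mean(),
        "validation_mean": validation[columns].mean(),
    })
    summary["abs_gap"] = (
        summary["training_mean"] - summary["validation_mean"]).abs()
    return summary


# Made-up rows sorted by longitude, mimicking a file that was not shuffled.
data = pd.DataFrame({
    "longitude": [-124.0, -123.0, -122.0, -118.0, -117.0, -116.0],
    "median_income": [3.0, 4.0, 5.0, 3.0, 4.0, 5.0],
})
training = data.head(3)
validation = data.tail(3)

report = compare_splits(training, validation, ["longitude", "median_income"])
print(report)
```

In this toy case the longitude means are far apart while median_income matches, which is the kind of signal that points to an ordered, unshuffled source file.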

Code to shuffle the data:

california_housing_dataframe = california_housing_dataframe.reindex(np.random.permutation(california_housing_dataframe.index))

[Images 8 and 9]
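To see what the reindex call does, here is a small self-contained sketch on made-up data. It shuffles the rows with the same reindex and permutation pattern and then splits them with head and tail. The fixed seed, the 7/3 split, and the sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the example is repeatable

# Made-up frame whose rows are sorted by longitude.
df = pd.DataFrame({
    "longitude": np.linspace(-124.0, -114.0, 10),
    "median_house_value": np.arange(10) * 1000.0,
})

# Same pattern as above: reorder the rows by a random permutation of the index.
shuffled = df.reindex(np.random.permutation(df.index))

# Split after shuffling: first rows for training, last rows for validation.
training = shuffled.head(7)
validation = shuffled.tail(3)

print(training.index.tolist())
print(validation.index.tolist())
```

Because the split happens after shuffling, both parts draw from the whole longitude range instead of one end of the sorted file.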

Model code:

# Imports used by train_model. The code targets the TensorFlow 1.x API
# (tf.train, tf.contrib, tf.estimator). construct_feature_columns and
# my_input_fn must already be defined before train_model is called.
import math

import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
from sklearn import metrics


def train_model(learning_rate, steps, batch_size, training_examples,
                training_targets, validation_examples, validation_targets):
    """Trains a linear regression model of multiple features.

    In addition to training, this function also prints training progress
    information, as well as a plot of the training and validation loss over time.

    Args:
        learning_rate: A float, the learning rate.
        steps: A non-zero int, the total number of training steps. A training
            step consists of a forward and backward pass using a single batch.
        batch_size: A non-zero int, the batch size.
        training_examples: A DataFrame containing one or more columns from
            california_housing_dataframe to use as input features for training.
        training_targets: A DataFrame containing exactly one column from
            california_housing_dataframe to use as the target for training.
        validation_examples: A DataFrame containing one or more columns from
            california_housing_dataframe to use as input features for validation.
        validation_targets: A DataFrame containing exactly one column from
            california_housing_dataframe to use as the target for validation.

    Returns:
        The trained LinearRegressor object.
    """

    # Step 1: initialize some data and prepare the input functions.
    periods = 10
    # Integer division keeps the per-period step count an int.
    steps_per_period = steps // periods
    
    # Create a linear regressor object.
    my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
    linear_regressor  = tf.estimator.LinearRegressor(
        feature_columns=construct_feature_columns(training_examples),
        optimizer=my_optimizer
    )
    
    # Create input function 
    training_input_fn = lambda: my_input_fn(features=training_examples, 
                                            targets=training_targets["median_house_value"], 
                                            batch_size=batch_size)
    predict_training_input_fn = lambda: my_input_fn(features=training_examples, 
                                                    targets=training_targets["median_house_value"], 
                                                    batch_size=batch_size,
                                                    num_epochs=1,
                                                    shuffle=False)
    predict_validation_input_fn = lambda: my_input_fn(features=validation_examples, 
                                                      targets=validation_targets["median_house_value"], 
                                                      batch_size=batch_size,
                                                      num_epochs=1,
                                                      shuffle=False)
    
    # Step 2: train the model in a loop so that loss can be assessed
    # after each period.
    print('Training model...')
    print('RMSE (on training data):')
    
    training_rmse = []
    validation_rmse = []
    for period in range(0, periods):
        linear_regressor.train(
            input_fn=training_input_fn,
            steps=steps_per_period,
        )
        
        # 2. Take a break and compute predictions
        training_predictions = linear_regressor.predict(input_fn=predict_training_input_fn)
        training_predictions = np.array([item['predictions'][0] for item in training_predictions])
        
        validation_predictions = linear_regressor.predict(input_fn=predict_validation_input_fn)
        validation_predictions = np.array([item['predictions'][0] for item in validation_predictions])       
        
    
        # Compute the training and validation loss.
        training_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(training_predictions, training_targets))
        validation_root_mean_squared_error = math.sqrt(
            metrics.mean_squared_error(validation_predictions, validation_targets))
        
        # Occasionally print the current loss 
        print('Period %02d : %.02f' % (period, training_root_mean_squared_error))
        
        # Add the loss metrics from this period to our list. 
        training_rmse.append(training_root_mean_squared_error)
        validation_rmse.append(validation_root_mean_squared_error)
        
    print('Model training finished')
    
    # Output a graph of loss metrics over periods.
    plt.ylabel('RMSE')
    plt.xlabel('Periods')
    plt.title('Root Mean Squared Error vs Periods')
    plt.tight_layout()
    plt.plot(training_rmse, label="training")
    plt.plot(validation_rmse, label="validation")
    plt.legend()
    
    return linear_regressor

[Images 10 to 12]
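The loss that train_model prints each period is the root mean squared error. As a minimal sketch of that metric, assuming only NumPy, the hypothetical rmse helper below computes the same quantity as math.sqrt(metrics.mean_squared_error(...)) for a single target column. The numbers are made up.

```python
import math

import numpy as np


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    # Flatten both inputs: a one-column DataFrame has shape (n, 1), and
    # subtracting it from an (n,) array would broadcast to (n, n).
    predictions = np.asarray(predictions, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    return math.sqrt(np.mean((predictions - targets) ** 2))


# Made-up predictions and targets (house values in thousands).
example_predictions = [102.0, 248.0, 180.0]
example_targets = [100.0, 250.0, 180.0]
print(rmse(example_predictions, example_targets))
```

The same helper can be applied to training, validation, and test predictions; a test RMSE that is clearly worse than the validation RMSE suggests the model was tuned too closely to the validation set.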

A few personal takeaways:

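1. Key steps in training on and analyzing data: basic checks -> feature preprocessing -> outlier removal -> writing the input function -> building the tf feature columns -> model training -> model tuning -> evaluation on the test set -> further tuning
2. When building input functions for evaluating the model, note the two arguments num_epochs=1 and shuffle=False: for evaluation, a single ordered pass over the data is enough.
3. When splitting the data into training and validation sets, always shuffle the raw data first.
4. When calling my_input_fn, index targets by column name, i.e. targets['median_house_value'], so that what gets passed in is a pandas Series.
5. The code Google publishes, whether examples or open-source projects, is of very high quality, which is why I took the time to retype it all myself.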
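Points 2 and 4 can be illustrated without TensorFlow. The sketch below is a hypothetical NumPy stand-in for an input function, not the my_input_fn used above: iterate_batches, its seed argument, and the sample data are all assumptions. It shows why num_epochs=1 with shuffle=False yields every example exactly once and in order, and why the targets are passed as a Series.

```python
import numpy as np
import pandas as pd


def iterate_batches(features, targets, batch_size=1, shuffle=True,
                    num_epochs=None, seed=None):
    """Yield (feature_dict, target_array) batches.

    num_epochs=None repeats forever (useful for training);
    num_epochs=1 makes a single pass (useful for prediction).
    """
    feature_arrays = {name: np.array(col) for name, col in features.items()}
    target_array = np.array(targets)
    n = len(target_array)
    rng = np.random.RandomState(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        # Either a fresh random order per epoch or the original row order.
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield ({name: arr[idx] for name, arr in feature_arrays.items()},
                   target_array[idx])
        epoch += 1


# Made-up data.
features = pd.DataFrame({"rooms_per_person": [1.0, 2.0, 3.0, 4.0, 5.0]})
targets = pd.DataFrame({"median_house_value": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Index by column name so a 1-D Series is passed, not a DataFrame.
target_series = targets["median_house_value"]

batches = list(iterate_batches(features, target_series, batch_size=2,
                               shuffle=False, num_epochs=1))
seen_targets = np.concatenate([batch_targets for _, batch_targets in batches])
print(len(batches), seen_targets.tolist())
```

With shuffle=False the predictions come back in the same row order as the targets, which is what makes the element-wise RMSE comparison in train_model valid.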
