Kaggle房价预测

Kaggle房价预测
作为深度学习基础篇章的总结,我们将对本章内容学以致用。下面,让我们动手实战一个Kaggle比赛:房价预测。本节将提供未经调优的数据的预处理、模型的设计和超参数的选择。我们希望读者通过动手操作、仔细观察实验现象、认真分析实验结果并不断调整方法,得到令自己满意的结果。

%matplotlib inline
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import sys
sys.path.append("/home/")
import d2lzh1981 as d2l
print(torch.__version__)
torch.set_default_tensor_type(torch.FloatTensor)

test_data = pd.read_csv("/home/houseprices2807/house-prices-advanced-regression-techniques/test.csv")
train_data = pd.read_csv("/home/kehouseprices2807/house-prices-advanced-regression-techniques/train.csv")
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

#预处理数据,我们对连续数值的特征做标准化(standardization):
#设该特征在整个数据集上的均值为 μ ,标准差为 σ 。那么,我们可以将该特征的每个值先减去 μ 再除以 σ 得到标准化后的每个特征值。对于缺失的特征值,我们将其替换成该特征的均值
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / (x.std()))
# 标准化后,每个数值特征的均值变为0,所以可以直接用0来替换缺失值
all_features[numeric_features] = all_features[numeric_features].fillna(0)

all_features = pd.get_dummies(all_features, dummy_na=True)

n_train = train_data.shape[0]
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float)
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float)
train_labels = torch.tensor(train_data.SalePrice.values, dtype=torch.float).view(-1, 1)

loss = torch.nn.MSELoss()

def get_net(feature_num):
    net = nn.Linear(feature_num, 1)
    for param in net.parameters():
        nn.init.normal_(param, mean=0, std=0.01)
    return net

def log_rmse(net, features, labels):
    with torch.no_grad():
        # 将小于1的值设成1,使得取对数时数值更稳定
        clipped_preds = torch.max(net(features), torch.tensor

你可能感兴趣的:(深度学习pytorch,深度学习,机器学习,python,人工智能,神经网络)