Data Preprocessing

  • Analysing variables
    1. variable names
    2. variable types (numerical/categorical)
    3. variables' segment
    4. expectation (label)
    5. correlation matrix of the variables
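
The checklist above can be sketched with pandas. This is only a minimal illustration: the toy `train` frame, its values, and the helper name `summarize_variables` are assumptions made up for the example, not part of the original notes.

```python
import pandas as pd

# Toy stand-in for the training set; all values are invented for illustration.
train = pd.DataFrame({
    "GrLivArea": [1000, 1200, 1500, 1800, 2000, 2500],
    "OverallQual": [4, 5, 6, 6, 7, 9],
    "Neighborhood": ["A", "B", "A", "C", "B", "C"],
    "SalePrice": [100000, 125000, 150000, 185000, 210000, 300000],
})

def summarize_variables(df, label):
    """Split columns by type and compute the numeric correlation matrix."""
    # Numerical feature columns, excluding the label itself.
    numerical = df.select_dtypes(include="number").columns.drop(label).tolist()
    # Everything that is not numeric is treated as categorical here.
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    # Pairwise Pearson correlations between numerical features and the label.
    corr = df[numerical + [label]].corr()
    return numerical, categorical, corr

numerical, categorical, corr = summarize_variables(train, "SalePrice")
print("numerical:", numerical)
print("categorical:", categorical)
# Features ranked by correlation with the label.
print(corr["SalePrice"].sort_values(ascending=False))
```

Sorting the label's column of the correlation matrix is one quick way to see which numerical variables deserve a closer look first.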
  • Analysing Label
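  1. statistics summary: label.describe()
  2. histogram with a fitted normal curve:
            # needs: import seaborn as sns; from scipy.stats import norm
            # note: distplot is deprecated since seaborn 0.11
            sns.distplot(label, fit=norm)
  3. normal probability plot (here the label is train['SalePrice']):
            # needs: import matplotlib.pyplot as plt; from scipy import stats
            fig = plt.figure()
            res = stats.probplot(train['SalePrice'], plot=plt)
            plt.show()
     (Check kurtosis and skewness to see how far the label deviates from the normal distribution.)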
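
The skewness and kurtosis mentioned above can be read directly off a pandas Series. A minimal sketch, assuming a synthetic right-skewed `label` (log-normal draws, not real sale prices):

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed stand-in for the label (an assumption for this example).
rng = np.random.default_rng(0)
label = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=1000), name="SalePrice")

summary = label.describe()   # count, mean, std, min, quartiles, max
skewness = label.skew()      # about 0 for a symmetric distribution
kurtosis = label.kurt()      # excess kurtosis: about 0 for a normal distribution

print(summary)
print(f"Skewness: {skewness:.3f}")
print(f"Kurtosis: {kurtosis:.3f}")
```

A clearly positive skewness indicates a long right tail, which is what the histogram and the probability plot show visually.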
  • Relationship between the label and the variables
scatter plots   # numerical variables visualization
    fig, ax = plt.subplots()
    ax.scatter(x = train['GrLivArea'], y = train['SalePrice'])
    plt.ylabel('SalePrice', fontsize=13)
    plt.xlabel('GrLivArea', fontsize=13)
    plt.show()
    
box plots       # categorical variables visualization
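
The notes give no snippet for the box plot, so here is one possible sketch using matplotlib. The toy data, the choice of `OverallQual` as the categorical variable, and the helper name `boxplot_by_category` are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the values are invented for illustration.
train = pd.DataFrame({
    "OverallQual": [5, 5, 6, 6, 7, 7, 8, 8],
    "SalePrice": [120000, 135000, 150000, 165000, 190000, 210000, 260000, 300000],
})

def boxplot_by_category(df, category, label):
    """Draw one box of `label` values per level of `category`."""
    grouped = df.groupby(category)[label]
    names = [str(name) for name, _ in grouped]
    values = [group.values for _, group in grouped]
    fig, ax = plt.subplots()
    ax.boxplot(values)
    # Boxes are drawn at positions 1..n; label them with the category levels.
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_xlabel(category)
    ax.set_ylabel(label)
    return fig, ax

fig, ax = boxplot_by_category(train, "OverallQual", "SalePrice")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.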
