R for Beginners: A Regression Analysis Example with Male and Female Height and Weight

Reading data in R

1, Reading a .csv file

data<-read.csv("E:\\necessary\\huba\\R\\table.csv")

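Note that read.csv defaults to header=TRUE, so the first line of the file is used as the column names. Pass header=FALSE if the file has no header row. (read.table, used next, defaults to header=FALSE.)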

2, Reading a .txt file

data<-read.table("E:\\necessary\\huba\\R\\table.txt")

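Note: if the .txt file contains Chinese characters, add encoding = "UTF-8". If the text still comes out garbled, fileEncoding = "UTF-8" is usually the more reliable choice, because encoding only marks the strings as UTF-8 while fileEncoding re-encodes the file as it is read.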
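The difference between the two header defaults can be checked with a small self-contained sketch. The file contents and column names here are hypothetical stand-ins written to temporary files, not the author's table.csv or table.txt.

```r
# Write a tiny hypothetical table to temporary files so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name,height,weight", "A,150,50", "B,160,55"), csv_path)
writeLines(c("A 150 50", "B 160 55"), txt_path)

# read.csv: header = TRUE by default, so the first line becomes the column names.
d_csv <- read.csv(csv_path)

# read.table: header = FALSE by default, so the column names are supplied explicitly.
# fileEncoding re-encodes the file contents while reading (useful for UTF-8 text).
d_txt <- read.table(txt_path,
                    col.names = c("name", "height", "weight"),
                    fileEncoding = "UTF-8")

str(d_csv)
str(d_txt)
```

Both calls should return a data frame with two rows and the same three columns.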

A worked example in four steps (data input -> exploring distributions -> sample testing -> regression and prediction)

1, Data input

> c1<-read.table("E:\\necessary\\huba\\R\\table.txt",col.names=c("name","sex","age","height","weight"),row.names = "name")

2, Exploring the height distribution

a, Height distribution
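The command and figure for this step did not survive in the post. One common way to look at a distribution is a numeric summary plus a histogram, sketched below. The data frame is an assumption: heights, weights and sex are back-calculated from the fitted values, residuals and group means printed later in this post, the row names are romanized, and age is omitted because it cannot be recovered.

```r
# Assumed data, reconstructed from the output shown later in this post (age unknown).
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F"),
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao")
)

# Five-number summary plus mean of height.
height_summary <- summary(c1$height)
print(height_summary)

# Histogram of height; with only five observations it is very coarse.
hist(c1$height, main = "Distribution of height", xlab = "height")
```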





b, Relationship between height and the other variables

> pairs(c1[,c("height","weight","age")])


The scatterplot matrix above does not show the individual relationships clearly, so we look at each variable separately.

> oldpar=par(mfcol=c(1,3))
> boxplot(weight~sex,data=c1,ylab="weight")
> boxplot(height~sex,data=c1,ylab="height")
> boxplot(age~sex,data=c1,ylab="age")
> par(oldpar)


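The boxplots above suggest a visible difference in height between males and females. Step 3 tests whether that difference is statistically significant.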

3, Sample testing

> attach(c1)

> t.test(height~sex,conf.level=0.99)


Welch Two Sample t-test


data:  height by sex
t = -0.79241, df = 2.1575, p-value = 0.5059
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -90.68703  75.68703
sample estimates:
mean in group F mean in group M 
          160.0           167.5 

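p = 0.5059 > 0.01 (the significance level), so the null hypothesis of equal means cannot be rejected: in this sample the difference in height between males and females is not statistically significant. The 99 percent confidence interval (-90.69, 75.69) contains 0, which says the same thing.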
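The decision rule can also be read off the test object instead of the printed output. This is a minimal sketch on assumed data: the values are back-calculated from the output in this post, with romanized row names and no age column.

```r
# Assumed data, reconstructed from the output shown in this post.
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F"),
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao")
)

# Welch two-sample t-test of height by sex, with a 99% confidence interval.
tt <- t.test(height ~ sex, data = c1, conf.level = 0.99)

alpha <- 0.01
significant   <- tt$p.value < alpha                        # reject H0 only if p < alpha
contains_zero <- tt$conf.int[1] < 0 && tt$conf.int[2] > 0  # CI covering 0 agrees with "not significant"

cat("p-value:", round(tt$p.value, 4), "\n")
cat("significant at the 1% level:", significant, "\n")
cat("99% CI contains 0:", contains_zero, "\n")
```

With only two males and three females the test has very little power, so "not significant" here does not show that the two groups have the same mean height.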

4, Regression analysis

> lm.fit1=lm(weight~height,data=c1)
> lm.fit1


Call:
lm(formula = weight ~ height, data = c1)


Coefficients:
(Intercept)       height  
   -85.3553       0.8868

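The fitted model is weight = -85.3553 + 0.8868 × height, i.e. each additional unit of height is associated with about 0.89 more units of weight.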
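As a quick check of the fitted equation, the coefficients can be pulled out with coef() and applied by hand. The data frame below is an assumption, back-calculated from the fitted values and residuals printed in this post, with romanized row names.

```r
# Assumed data, reconstructed from the output shown in this post.
c1 <- data.frame(
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao")
)

fit <- lm(weight ~ height, data = c1)
b <- coef(fit)   # named vector: "(Intercept)" and "height"

# Apply weight = intercept + slope * height by hand for height = 170.
manual <- b[["(Intercept)"]] + b[["height"]] * 170

print(b)
cat("manual prediction at height 170:", manual, "\n")
cat("fitted value for the row with height 170:", fitted(fit)[["Zhang"]], "\n")
```

The hand-computed value and the fitted value for the same height should agree.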

> summary(lm.fit1)


Call:
lm(formula = weight ~ height, data = c1)


Residuals:
     王      李      张      陈      赵 
 2.3289 -1.5395 -5.4079  5.1579 -0.5395 


Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -85.3553    38.6565  -2.208   0.1143  
height        0.8868     0.2368   3.745   0.0332 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Residual standard error: 4.616 on 3 degrees of freedom
Multiple R-squared:  0.8238, Adjusted R-squared:  0.765 
F-statistic: 14.02 on 1 and 3 DF,  p-value: 0.03323

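Multiple R-squared is 0.8238 and the F-test p-value is 0.03323 < 0.05, so the model is significant at the 5% level. With only 5 observations (3 residual degrees of freedom), this result should be read with caution.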
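The two numbers quoted above can be extracted from the summary object rather than read off the printout. A sketch on assumed data, back-calculated from the output in this post:

```r
# Assumed data, reconstructed from the output shown in this post.
c1 <- data.frame(
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56)
)

fit <- lm(weight ~ height, data = c1)
s <- summary(fit)

r2 <- s$r.squared    # multiple R-squared
f  <- s$fstatistic   # named vector: value, numdf, dendf

# p-value of the overall F-test (upper tail of the F distribution).
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# With a single predictor, R-squared equals the squared correlation.
r2_from_cor <- cor(c1$height, c1$weight)^2

cat("R-squared:", r2, "\n")
cat("F-test p-value:", p_model, "\n")
```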

# Inspecting the regression fit

> oldpar=par(mfrow=c(2,2),mar=c(2.5,2,1.5,0.2),mgp=c(1.2,0.2,0))
> plot(lm.fit1)

# Does adding other variables improve the model's predictive ability?

> add1(lm.fit1,~.+age+sex)
Single term additions


Model:
weight ~ height
       Df Sum of Sq    RSS    AIC
<none>              63.934 16.742
age     1    11.624 52.311 17.739
sex     1    13.268 50.667 17.579

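A smaller AIC is better. Adding age (17.739) or sex (17.579) raises the AIC above that of the current model (16.742), so neither variable improves the model.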
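The AIC column in this table comes from extractAIC(), which for a linear model is n * log(RSS / n) + 2 * p, with p the number of estimated coefficients. The sketch below reproduces the numbers for the base model and for adding sex. The data are an assumption, back-calculated from the output in this post. The age column cannot be recovered, so it is left out.

```r
# Assumed data, reconstructed from the output shown in this post (age unknown).
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F"),
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56)
)

fit <- lm(weight ~ height, data = c1)

# AIC by hand, using the formula behind add1() and extractAIC().
n   <- nrow(c1)
rss <- sum(residuals(fit)^2)
p   <- length(coef(fit))
aic_manual <- n * log(rss / n) + 2 * p

# The same quantity from R; element 2 is the AIC, element 1 the degrees of freedom.
aic_base <- extractAIC(fit)[2]

# Candidate model with sex added.
fit_sex <- update(fit, . ~ . + sex)
aic_sex <- extractAIC(fit_sex)[2]

cat("RSS:", rss, " AIC (manual):", aic_manual, " AIC (extractAIC):", aic_base, "\n")
cat("AIC with sex added:", aic_sex, "\n")
```

AIC(fit) uses the full log-likelihood and returns a different number. Only differences between models computed the same way are comparable.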

Prediction

> predict(lm.fit1)
      王       李       张       陈       赵 
47.67105 56.53947 65.40789 69.84211 56.53947 
> plot(weight,ylim=range(weight,predict(lm.fit1)))
> points(predict(lm.fit1),col="blue",pch=8)

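New data can also be supplied for prediction. The model only uses height, so the sex column in new.data is ignored.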

> new.data=data.frame(height=c(150,160),sex=c("M","F"))
> predict(lm.fit1,new.data)
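The output of this call is not shown in the post. The sketch below runs the same kind of prediction on assumed data (back-calculated from the fitted values and residuals printed above) and also asks for prediction intervals, which is an addition for illustration rather than part of the original analysis.

```r
# Assumed data, reconstructed from the output shown in this post.
c1 <- data.frame(
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56)
)

fit <- lm(weight ~ height, data = c1)

# New observations: only the predictors used by the model are needed.
new.data <- data.frame(height = c(150, 160))

# Point predictions plus 95% prediction intervals (columns fit, lwr, upr).
pred <- predict(fit, new.data, interval = "prediction", level = 0.95)
print(pred)
```

With five observations the prediction intervals are wide, so the point predictions are rough estimates at best.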



