Reading data in R
1. Reading a .csv file
data<-read.csv("E:\\necessary\\huba\\R\\table.csv")
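read.csv uses header = TRUE by default, so the first line is taken as column names. (read.table, below, defaults to header = FALSE.)
2. Reading a .txt file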
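The difference in header defaults is easy to check without a file by passing the data as a string through the `text =` argument that both functions accept. The column names `a`, `b` and the values below are made up for illustration.

```r
# read.csv: header = TRUE by default, so the first line becomes column names
csv_df <- read.csv(text = "a,b\n1,2\n3,4")
names(csv_df)   # "a" "b"

# read.table: header = FALSE by default, so the first line is kept as data
txt_df <- read.table(text = "a b\n1 2\n3 4")
names(txt_df)   # "V1" "V2"
nrow(txt_df)    # 3 (the header line counted as a data row)

# Pass header = TRUE explicitly when a .txt file has a header line
txt_df2 <- read.table(text = "a b\n1 2\n3 4", header = TRUE)
names(txt_df2)  # "a" "b"
```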
data<-read.table("E:\\necessary\\huba\\R\\table.txt")
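Note: if the .txt file contains Chinese characters, add encoding = "UTF-8" (or fileEncoding = "UTF-8", which re-encodes the file as it is read).
A worked example in four steps (data input -> examining distributions -> hypothesis testing -> regression and prediction)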
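A minimal sketch of the encoding note, assuming a UTF-8 file. It writes a small file containing two Chinese surnames to a temporary path and reads it back with `encoding = "UTF-8"`. The file contents are hypothetical, and the characters are written as `\u` escapes so the snippet itself stays ASCII.

```r
# Build a tiny UTF-8 file: a header line plus two rows (made-up data)
f <- tempfile(fileext = ".txt")
txt <- c("name age", "\u738b 20", "\u674e 21")   # \u738b and \u674e are two surnames
writeLines(txt, f, useBytes = TRUE)            # write the raw UTF-8 bytes

# encoding = "UTF-8" marks the strings read from the file as UTF-8
c2 <- read.table(f, header = TRUE, encoding = "UTF-8")
print(c2)
str(c2)
```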
1. Data input
> c1<-read.table("E:\\necessary\\huba\\R\\table.txt",col.names=c("name","sex","age","height","weight"),row.names = "name")
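The same call can be tried on inline text instead of the file on drive E:. The rows below are hypothetical (made-up names and values, not the original table.txt). They show how `col.names` supplies names for a header-less file and how `row.names = "name"` turns that column into row names, leaving four data columns.

```r
# Hypothetical contents of a header-less table.txt (made-up values)
txt <- "Wang F 20 152 48
Li M 22 171 63
Zhang F 21 158 52
Chen M 23 176 70
Zhao F 20 163 55
Sun M 21 168 60"

c1 <- read.table(text = txt,
                 col.names = c("name", "sex", "age", "height", "weight"),
                 row.names = "name")   # the "name" column becomes the row names
str(c1)        # 6 obs. of 4 variables: sex, age, height, weight
rownames(c1)
```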
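2. Examining the distribution of height
a. Distribution of height
b. Relationship between height and the other variables
> pairs(c1[,c("height","weight","age")])
The scatterplot matrix does not show the relationships clearly, so the variables are compared one at a time between the sexes.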
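The notes give no commands for step a. A minimal sketch of one way to look at the height distribution, assuming `c1` has a numeric `height` column, uses a numeric summary, a stem-and-leaf display, a histogram and a Shapiro-Wilk normality test. The data frame below is hypothetical (made-up values). With so few observations the normality test has little power, so treat it as a rough check only.

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  age    = c(20, 22, 21, 23, 20, 21),
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao", "Sun")
)

summary(c1$height)                 # min, quartiles, median, mean, max
stem(c1$height)                    # text stem-and-leaf display
h <- hist(c1$height, main = "Distribution of height", xlab = "height")
h$counts                           # observations per histogram bin
sw <- shapiro.test(c1$height)      # H0: height is normally distributed
sw$p.value
```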
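The relationships in the scatterplot matrix can also be put into numbers before looking at the box plots. A sketch using the Pearson correlation matrix and `cor.test()` for a single pair, on hypothetical data (made-up values):

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  age    = c(20, 22, 21, 23, 20, 21),
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao", "Sun")
)

vars <- c1[, c("height", "weight", "age")]
r <- cor(vars)                          # Pearson correlation matrix
round(r, 3)
ct <- cor.test(c1$height, c1$weight)    # H0: the correlation is 0
ct$estimate
ct$p.value
```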
> oldpar=par(mfcol=c(1,3))
> boxplot(weight~sex,data=c1,ylab="weight")
> boxplot(height~sex,data=c1,ylab="height")
> boxplot(age~sex,data=c1,ylab="age")
> par(oldpar)
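The box plots suggest a difference in height between males and females; whether it is significant is tested in the next step.
3. Two-sample test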
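Box plots compare medians and spread visually. A numeric counterpart is to compute the group means and medians directly with `tapply()` and `aggregate()`. This is a sketch on hypothetical data (made-up values):

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  age    = c(20, 22, 21, 23, 20, 21),
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao", "Sun")
)

m <- tapply(c1$height, c1$sex, mean)     # mean height by sex
m
# Median of each variable by sex (the centre line of each box plot)
med <- aggregate(cbind(height, weight, age) ~ sex, data = c1, FUN = median)
med
```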
> attach(c1)
> t.test(height~sex,conf.level=0.99)
Welch Two Sample t-test
data: height by sex
t = -0.79241, df = 2.1575, p-value = 0.5059
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
-90.68703 75.68703
sample estimates:
mean in group F mean in group M
160.0 167.5
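p = 0.5059 > 0.01 (the significance level), so the null hypothesis of equal means is not rejected: the difference in height between males and females is not significant at the 0.01 level, and the 99% confidence interval contains 0. With only five people the test has very little power.
4. Regression analysis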
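The decision rule can be read straight from the object that `t.test()` returns. By default `t.test()` uses Welch's test (`var.equal = FALSE`), which is why the output above says "Welch Two Sample t-test". A sketch on hypothetical data (made-up values), with alpha = 0.01 as in the notes. For a two-sided test, p > alpha exactly when the (1 - alpha) confidence interval contains 0.

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  height = c(152, 171, 158, 176, 163, 168)
)

alpha <- 0.01
res <- t.test(height ~ sex, data = c1, conf.level = 1 - alpha)  # Welch by default
res$p.value
res$conf.int                       # 99% CI for mean(F) - mean(M)

# Reject H0 (equal means) only when p < alpha
decision <- if (res$p.value < alpha) "reject H0: means differ" else "do not reject H0"
decision
```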
> lm.fit1=lm(weight~height,data=c1)
> lm.fit1
Call:
lm(formula = weight ~ height, data = c1)
Coefficients:
(Intercept) height
-85.3553 0.8868
The fitted model is weight = -85.3553 + 0.8868 × height.
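For a single predictor, the least-squares coefficients have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope × mean(x). The sketch below checks this against `coef()` on hypothetical data (made-up values), so the numbers differ from the output above.

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60)
)

fit <- lm(weight ~ height, data = c1)
b1 <- cov(c1$height, c1$weight) / var(c1$height)   # slope
b0 <- mean(c1$weight) - b1 * mean(c1$height)       # intercept
c(intercept = b0, slope = b1)
coef(fit)                                          # same values from lm()
```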
> summary(lm.fit1)
Call:
lm(formula = weight ~ height, data = c1)
Residuals:
王 李 张 陈 赵
2.3289 -1.5395 -5.4079 5.1579 -0.5395
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -85.3553    38.6565  -2.208   0.1143  
height        0.8868     0.2368   3.745   0.0332 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.616 on 3 degrees of freedom
Multiple R-squared: 0.8238, Adjusted R-squared: 0.765
F-statistic: 14.02 on 1 and 3 DF, p-value: 0.03323
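The multiple R-squared is 0.8238 and the overall F-test gives p = 0.03323 < 0.05, so the model is significant at the 0.05 level.
# Check the regression diagnostics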
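The summary statistics above can be recomputed from their definitions: R² = 1 - RSS/TSS, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), and F = (R²/p) / ((1 - R²)/(n - p - 1)) with p predictors. For the output above (n = 5, p = 1), F = 0.8238 / (0.1762/3) ≈ 14.02. A sketch on hypothetical data (made-up values):

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60)
)

fit <- lm(weight ~ height, data = c1)
s <- summary(fit)
n <- nrow(c1)
p <- 1                                         # number of predictors

rss <- sum(residuals(fit)^2)                   # residual sum of squares
tss <- sum((c1$weight - mean(c1$weight))^2)    # total sum of squares
r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)
c(r2 = r2, adj_r2 = adj_r2, F = f_stat, p = p_val)
```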
> oldpar=par(mfrow=c(2,2),mar=c(2.5,2,1.5,0.2),mgp=c(1.2,0.2,0))
> plot(lm.fit1)
> par(oldpar)
# Can adding other variables improve the model's predictive ability?
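`plot()` on an `lm` object draws the diagnostic plots graphically. The quantities behind them can also be inspected as numbers: standardized residuals, Cook's distance (influence of each observation), and a normality test on the residuals. A sketch on hypothetical data (made-up values). With so few points, these checks are only indicative.

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao", "Sun")
)

fit <- lm(weight ~ height, data = c1)
op <- par(mfrow = c(2, 2))
plot(fit)                          # the default diagnostic plots
par(op)                            # restore the previous layout

std_res <- rstandard(fit)          # standardized residuals
cook    <- cooks.distance(fit)     # influence of each observation
round(cbind(std_res, cook), 3)
shapiro.test(residuals(fit))       # H0: residuals are normally distributed
```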
> add1(lm.fit1,~.+age+sex)
Single term additions
Model:
weight ~ height
Df Sum of Sq RSS AIC
<none> 63.934 16.742
age 1 11.624 52.311 17.739
sex 1 13.268 50.667 17.579
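A smaller AIC is better. Adding age (AIC 17.739) or sex (AIC 17.579) gives a larger AIC than the current model (16.742), so neither variable improves it.
Prediction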
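For linear models, the AIC column of `add1()` comes from `extractAIC()`, which computes n × ln(RSS/n) + 2k, with k the number of coefficients. Checking this against the table above: for `<none>`, 5 × ln(63.934/5) + 2 × 2 ≈ 16.742, and for `age`, 5 × ln(52.311/5) + 2 × 3 ≈ 17.739. The sketch below repeats the check on hypothetical data (made-up values).

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  age    = c(20, 22, 21, 23, 20, 21),
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60)
)

fit <- lm(weight ~ height, data = c1)
a <- add1(fit, ~ . + age + sex)    # AIC of the model with each term added
a

n <- nrow(c1)
# deviance() of an lm is its RSS; length(coef()) is the number of coefficients
aic_manual <- n * log(deviance(fit) / n) + 2 * length(coef(fit))
aic_manual
```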
> predict(lm.fit1)
王 李 张 陈 赵
47.67105 56.53947 65.40789 69.84211 56.53947
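Without new data, `predict()` returns the in-sample fitted values, which equal the observed values minus the residuals. In the outputs above, for example, 王's prediction 47.67105 plus residual 2.3289 gives ≈ 50.0. A sketch of the same identity on hypothetical data (made-up values):

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60),
  row.names = c("Wang", "Li", "Zhang", "Chen", "Zhao", "Sun")
)

fit  <- lm(weight ~ height, data = c1)
pred <- predict(fit)               # no newdata: fitted values for the rows of c1
pred
all.equal(pred, fitted(fit))       # predict() and fitted() agree
pred + residuals(fit)              # reconstructs the observed weights
```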
> plot(weight)
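> points(predict(lm.fit1),col="blue",pch=8)   # overlay the fitted values on the observed weights
Predictions can also be made for new data (lm.fit1 uses only height, so the sex column in new.data is ignored):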
> new.data=data.frame(height=c(150,160),sex=c("M","F"))
> predict(lm.fit1,new.data)
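`predict()` for `lm` objects can also return intervals. `interval = "confidence"` gives an interval for the mean weight at a given height. `interval = "prediction"` gives a wider interval for an individual's weight. A sketch on hypothetical data (made-up values), with only the predictor the model uses in `new.data`:

```r
# Hypothetical data (made-up values, not the original table.txt)
c1 <- data.frame(
  height = c(152, 171, 158, 176, 163, 168),
  weight = c(48, 63, 52, 70, 55, 60)
)

fit <- lm(weight ~ height, data = c1)
new.data <- data.frame(height = c(150, 160))

ci <- predict(fit, new.data, interval = "confidence", level = 0.95)  # mean weight
pr <- predict(fit, new.data, interval = "prediction", level = 0.95)  # one person's weight
ci
pr                                  # columns: fit, lwr, upr
```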