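These notes are based mainly on Paul Teetor's R Cookbook (Chinese edition: 《R语言经典实例》).
In R, functions are distributed in packages.
1. install.packages("packagename") # install a package
library(package_name) # load a package; not needed for base, which is always loaded
2. library(help="package_name") # show the package's help, including its version and the functions it contains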
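Data: most of R's built-in data sets live in the datasets package (not utils) and can be used directly:
show(Titanic)
To load data from another installed package:
data(Cars93,package="MASS") # loads the Cars93 data set from the MASS package
To list the data sets that a package provides:
data(package="MASS")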
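As a minimal sketch of the data-loading calls above: the variable names are illustrative only, and the check with requireNamespace() is an addition that guards against MASS not being installed on a given machine.

```r
# Built-in data sets come from the 'datasets' package, which is attached by default.
titanic_dims <- dim(Titanic)               # Titanic is a 4-way contingency table
titanic_vars <- names(dimnames(Titanic))   # names of its classifying factors

# MASS usually ships with R, but check before relying on it.
if (requireNamespace("MASS", quietly = TRUE)) {
  data(Cars93, package = "MASS")           # creates the object Cars93
  cars_rows <- nrow(Cars93)
  # data(package = ...) returns a listing; the "Item" column holds data set names.
  mass_items <- data(package = "MASS")$results[, "Item"]
}
```

Calling data() with only the package argument lists data sets and loads nothing, whereas data(Cars93, package="MASS") loads a single object.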
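3. help(name), or equivalently ?name (for example ?mean) # open the help page
In particular, help(name) is the way to learn more about a data set or a function in a package.
help.start() # open the HTML documentation
args(functionname) # show the function's argument list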
args(lm)
function (formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
NULL
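example(functionname) # run the examples from the function's help page
help.search("name"), or equivalently ??name # search the help pages of the packages installed on this machine; the results show which package each match belongs to
help(name,package="packagename") # open the help page for name in one of the packages reported by help.search()
help(package="tseries") # show information about an installed package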
> help(package="tseries")
Information on package 'tseries'
Description:
Package: tseries
Version: 0.10-34
Title: Time Series Analysis and Computational Finance
Authors@R: c(person("Adrian", "Trapletti", role = "aut", email
= "adrian@trapletti.org"), person("Kurt", "Hornik",
role = c("aut", "cre"), email =
"Kurt.Hornik@R-project.org"), person("Blake",
"LeBaron", role = "ctb", comment = "BDS test
code"))
Description: Time series analysis and computational finance.
Depends: R (>= 2.10.0)
Suggests: its
Imports: graphics, stats, utils, quadprog, zoo
License: GPL-2
Packaged: 2015-02-20 12:43:13 UTC; hornik
Author: Adrian Trapletti [aut], Kurt Hornik [aut, cre],
Blake LeBaron [ctb] (BDS test code)
Maintainer: Kurt Hornik <Kurt.Hornik@R-project.org>
RSiteSearch("key phrase") # search the R site on the web
RSiteSearch("RSS") # opens the search results for "RSS" in a browser
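Most of the help functions above open a viewer or a browser, so they are awkward to use inside a script. The following is a small sketch of non-interactive counterparts; formals(), apropos() and find() are not mentioned in these notes and are brought in here only for illustration.

```r
# formals() returns the argument list that args() prints, as a named list.
lm_args <- names(formals(lm))

# apropos() lists objects on the search path whose names match a pattern.
mean_like <- apropos("^mean")

# find() reports which attached package defines a given name.
where_lm <- find("lm")
```

For example, lm_args starts with "formula" and "data", matching the args(lm) output shown earlier.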
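4. English terms in regression analysis:
SST: sum of squares total = TSS: total sum of squares
SSE: sum of squares for error = RSS: residual sum of squares
SSR: sum of squares for regression = ESS: explained sum of squares
residual: the difference between an observed value and its fitted value;
Residual Standard Error: the estimated standard deviation of the residuals, sqrt(RSS/df), not the standard deviation of the response;
intercept: the constant term of the model;
Multiple R-squared: coefficient of determination, 1 - RSS/TSS; Adjusted R-squared: R-squared adjusted for the degrees of freedom;
df: degrees of freedom; anova: analysis of variance;
AIC: Akaike information criterion; among candidate models, a smaller AIC is preferred;
RSS has two meanings:
1. residual sum of squares: the sum of the squared residuals;
2. root of sum of squares: the square root of a sum of squares;
**which one is meant depends on the context**
Sum of Sq: partial regression sum of squares; for a given variable it is the amount by which the regression sum of squares drops (equivalently, by which RSS rises) when that variable is removed from the model;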
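To make these terms concrete, here is a sketch that computes them by hand for a linear model on the built-in swiss data set and compares them with what summary() reports. The variable names (tss, rss, ess and so on) are chosen for this illustration only.

```r
fit <- lm(Fertility ~ ., data = swiss)
y <- swiss$Fertility
n <- length(y)
p <- length(coef(fit))                  # estimated coefficients, intercept included

tss <- sum((y - mean(y))^2)             # total sum of squares (SST)
rss <- sum(residuals(fit)^2)            # residual sum of squares (SSE)
ess <- sum((fitted(fit) - mean(y))^2)   # explained sum of squares (SSR)

r2     <- 1 - rss / tss                             # Multiple R-squared
adj_r2 <- 1 - (rss / (n - p)) / (tss / (n - 1))     # Adjusted R-squared
rse    <- sqrt(rss / (n - p))                       # Residual standard error
```

For a model with an intercept, tss equals rss + ess, and r2, adj_r2 and rse agree with summary(fit)$r.squared, summary(fit)$adj.r.squared and summary(fit)$sigma.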
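5. Data objects:
Basic (atomic) modes: numeric, character, complex, logical
Compound/recursive modes: list, function, expression
mode() # returns the mode; length() # returns the length
attributes(object) # returns all non-intrinsic attributes of the object (the intrinsic attributes are mode and length)
attr(object,name) # selects one specific attribute
attr(z,"dim")<-c(10,10) # sets the dim attribute of z to 10,10, so that z (which must have 100 elements) is treated as a 10*10 matrix
Array elements are stored with the first subscript varying fastest: z[1,1], z[2,1], z[3,1], ..., z[1,2], ... For a matrix the dimensions mean the same thing: the first is the row and the second is the column.
1:3 # 1,2,3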
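The effect of the dim attribute can be checked with a short sketch; z here is assumed to be a plain vector of 100 elements, which the notes do not state explicitly.

```r
z <- 1:100
plain_attrs <- attributes(z)      # NULL: a plain vector has no non-intrinsic attributes

attr(z, "dim") <- c(10, 10)       # z is now treated as a 10 x 10 matrix
z_is_matrix <- is.matrix(z)

# Column-major storage: the first subscript varies fastest.
down_first_col <- z[1:3, 1]       # elements 1, 2, 3 of the original vector
top_of_second  <- z[1, 2]         # element 11 of the original vector
```

The values themselves are untouched; only the way subscripts map onto them changes.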
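6. Vector assignment: assigning to an index beyond the current length of a vector extends it, and the elements in between that were never assigned are NA;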
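A small illustration of this behaviour, with a made-up vector v:

```r
v <- c(10, 20, 30)
v[6] <- 60           # index 6 is beyond the current length of 3
# v is now: 10 20 30 NA NA 60
new_length <- length(v)
gap <- is.na(v[4:5]) # the positions that were never assigned
```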
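7. Expressions: 2*1:3 # 2,4,6, because : binds more tightly than *
Attributes can be set when an object is created, for example:
x<-array(1:20,dim=c(4,5)) # a 4*5 array whose elements are 1:20
i<-array(c(1:3,3:1),dim=c(3,2)) # i prints as
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
x[i]<-0 # each row of i is a (row, column) pair, so this sets x[1,3], x[2,2] and x[3,1] to 0
Similarly,
matrix(data,nrow,ncol) # creates a matrix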
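The pieces of item 7 can be put together in one runnable sketch. The name m and the comparison with (2*1):3 are added for illustration.

```r
# Operator precedence: ':' is evaluated before '*'.
doubled  <- 2 * 1:3        # 2 4 6
regroup  <- (2 * 1):3      # 2 3, a different result

# Index matrix: each row of i names one (row, column) cell of x.
x <- array(1:20, dim = c(4, 5))
i <- array(c(1:3, 3:1), dim = c(3, 2))
before <- x[i]             # the cells x[1,3], x[2,2], x[3,1]
x[i] <- 0                  # zero exactly those three cells

# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(1:6, nrow = 2, ncol = 3)
```

Note that dim is passed as a named argument with `=`. Writing `dim<-c(4,5)` inside the call happens to work positionally, but it also creates a variable called dim in the workspace as a side effect.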
8. Example of stepwise regression with step()
> summary(lm1 <- lm(Fertility ~ ., data = swiss))
Call:
lm(formula = Fertility ~ ., data = swiss)
Residuals:
Min 1Q Median 3Q Max
-15.2743 -5.2617 0.5032 4.1198 15.3213
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 66.91518 10.70604 6.250 1.91e-07 ***
Agriculture -0.17211 0.07030 -2.448 0.01873 *
Examination -0.25801 0.25388 -1.016 0.31546
Education -0.87094 0.18303 -4.758 2.43e-05 ***
Catholic 0.10412 0.03526 2.953 0.00519 **
Infant.Mortality 1.07705 0.38172 2.822 0.00734 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.165 on 41 degrees of freedom
Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10
> slm1 <- step(lm1)
Start: AIC=190.69
Fertility ~ Agriculture + Examination + Education + Catholic +
Infant.Mortality
Df Sum of Sq RSS AIC
- Examination 1 53.03 2158.1 189.86
<none> 2105.0 190.69
- Agriculture 1 307.72 2412.8 195.10
- Infant.Mortality 1 408.75 2513.8 197.03
- Catholic 1 447.71 2552.8 197.75
- Education 1 1162.56 3267.6 209.36
Step: AIC=189.86
Fertility ~ Agriculture + Education + Catholic + Infant.Mortality
Df Sum of Sq RSS AIC
<none> 2158.1 189.86
- Agriculture 1 264.18 2422.2 193.29
- Infant.Mortality 1 409.81 2567.9 196.03
- Catholic 1 956.57 3114.6 205.10
- Education 1 2249.97 4408.0 221.43
> summary(slm1)
Call:
lm(formula = Fertility ~ Agriculture + Education + Catholic +
Infant.Mortality, data = swiss)
Residuals:
Min 1Q Median 3Q Max
-14.6765 -6.0522 0.7514 3.1664 16.1422
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.10131 9.60489 6.466 8.49e-08 ***
Agriculture -0.15462 0.06819 -2.267 0.02857 *
Education -0.98026 0.14814 -6.617 5.14e-08 ***
Catholic 0.12467 0.02889 4.315 9.50e-05 ***
Infant.Mortality 1.07844 0.38187 2.824 0.00722 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.168 on 42 degrees of freedom
Multiple R-squared: 0.6993, Adjusted R-squared: 0.6707
F-statistic: 24.42 on 4 and 42 DF, p-value: 1.717e-10
> slm1$anova
Step Df Deviance Resid. Df Resid. Dev AIC
1 NA NA 41 2105.043 190.6913
2 - Examination 1 53.02656 42 2158.069 189.8606
>
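The numbers in the step() trace can be reproduced by hand, which helps in reading the table. This sketch assumes the default penalty k = 2; for a linear model the AIC that step() prints is n*log(RSS/n) + 2*edf, where edf is the number of estimated coefficients. The variable names are illustrative.

```r
lm1 <- lm(Fertility ~ ., data = swiss)
n <- nrow(swiss)
rss_full <- deviance(lm1)                 # RSS of the full model, about 2105.0
edf_full <- length(coef(lm1))             # 6 coefficients, intercept included

# AIC as printed by step(); compare with "Start: AIC=190.69".
aic_full <- n * log(rss_full / n) + 2 * edf_full

# "Sum of Sq" for Examination: the increase in RSS when it is dropped.
lm_drop <- lm(Fertility ~ . - Examination, data = swiss)
sum_of_sq <- deviance(lm_drop) - rss_full
aic_drop  <- n * log(deviance(lm_drop) / n) + 2 * length(coef(lm_drop))

# trace = 0 suppresses the printed trace and returns only the final model.
slm1 <- step(lm1, trace = 0)
```

Dropping Examination lowers the AIC, so step() removes it. In the second round every remaining deletion would raise the AIC, so the search stops there.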
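Data type conversion
#1. Conversion between basic (atomic) data types
as.character(x)
as.complex(x)
as.numeric(x)
as.integer(x)
as.logical(x)
#2. Conversion between structured data types
as.data.frame(x)
as.list(x)
as.matrix(x)
as.vector(x)
#For more conversions, see p. 156 of the Chinese edition of R Cookbook (《R语言经典实例》)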
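A few concrete conversions may help show what these functions do at the edges. The inputs are invented for illustration.

```r
# Atomic conversions
num_from_chr <- as.numeric("3.14")                     # text to number
int_from_num <- as.integer(3.9)                        # truncates toward zero, no rounding
lgl_from_chr <- as.logical(c("TRUE", "T", "yes"))      # unrecognised strings give NA
chr_from_int <- as.character(1:3)
bad_number   <- suppressWarnings(as.numeric("abc"))    # NA, normally with a warning

# Structured conversions
m  <- matrix(1:4, nrow = 2)
df <- as.data.frame(m)        # one column per matrix column
l  <- as.list(1:3)            # one list element per vector element
v  <- as.vector(m)            # drops dim, elements in column-major order
```

Conversions that fail do not stop with an error; they return NA and issue a warning, so results are worth checking with is.na().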
Adding a column, assuming d is a list
#This approach works on data frames, so convert d to a data frame first
d0<-as.data.frame(d)
d0$column_name<-column_value
#d0 now has a new column named column_name whose values are the vector column_value
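A runnable version of the column-adding step, with an invented list d and an invented column name score:

```r
# A list whose elements all have the same length converts cleanly to a data frame.
d <- list(id = 1:3, name = c("a", "b", "c"))
d0 <- as.data.frame(d)

# The new column must have one value per row (or a length that recycles evenly).
d0$score <- c(90, 85, 70)
```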
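Deleting a row/column
#Delete the first row (the comma is required; datatest[-1] would drop the first column)
datatest<-datatest[-1,]
#Delete the second row
datatest<-datatest[-2,]
#Delete the first column
datatest<-datatest[,-1]
#Delete the second column
datatest<-datatest[,-2]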
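The difference between row and column deletion comes down to where the negative index sits relative to the comma. A sketch on a made-up data frame:

```r
datatest <- data.frame(a = 1:4,
                       b = c("w", "x", "y", "z"),
                       c = c(TRUE, FALSE, TRUE, FALSE))

no_first_row <- datatest[-1, ]     # index before the comma: rows
no_first_col <- datatest[, -1]     # index after the comma: columns
single_index <- datatest[-1]       # no comma: treated as a column index

# Removing all but one column returns a vector unless drop = FALSE is given.
only_c_vec <- datatest[, -c(1, 2)]
only_c_df  <- datatest[, -c(1, 2), drop = FALSE]
```

Each deletion returns a new object, so when deletions are chained the positions refer to the data frame as it stands after the previous step.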
Sorting by a column
#Sort datatest by its fifth column in ascending order
datatest[order(datatest[,5],decreasing=FALSE),]
#Other orderings follow the same pattern
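Some of the other orderings, sketched on an invented two-column data frame:

```r
datatest <- data.frame(g = c("b", "a", "b", "a"),
                       v = c(3, 1, 2, 4))

asc  <- datatest[order(datatest$v), ]                     # ascending (the default)
desc <- datatest[order(datatest$v, decreasing = TRUE), ]  # descending

# Several keys: sort by g ascending, then by v descending within each g.
# Negating a numeric key reverses its order.
multi <- datatest[order(datatest$g, -datatest$v), ]
```

order() returns row positions, not sorted values, which is why it is used inside the row index.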