Renaming variables
names(), colnames(), rownames()
> score <- data.frame(student=c('A', 'B', 'C', 'D'),gender=c('M', 'M', 'F', 'F'))
> score$math <- c(90, 70, 80, 60)
> score$Eng = c(88, 78, 69, 98)
> score$Chinese = c(66, 5, NA, 88)
> score
  student gender math Eng Chinese
1       A      M   90  88      66
2       B      M   70  78       5
3       C      F   80  69      NA
4       D      F   60  98      88
> colnames(score)
[1] "student" "gender" "math" "Eng" "Chinese"
> names(score)
[1] "student" "gender" "math" "Eng" "Chinese"
> row.names(score)
[1] "1" "2" "3" "4"
> rownames(score)
[1] "1" "2" "3" "4"
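The transcript above only reads the names. As a minimal sketch of actual renaming, the same accessor functions can also be assigned to. The copy `renamed` and the new names `english` and `chinese` are illustrative choices, not part of the original example.

```r
# Rebuild the example data frame so this block runs on its own.
score <- data.frame(student = c("A", "B", "C", "D"),
                    gender  = c("M", "M", "F", "F"),
                    math    = c(90, 70, 80, 60),
                    Eng     = c(88, 78, 69, 98),
                    Chinese = c(66, 5, NA, 88),
                    stringsAsFactors = FALSE)

renamed <- score  # work on a copy; `renamed` is a hypothetical name

# Rename one column by matching its current name.
names(renamed)[names(renamed) == "Eng"] <- "english"

# Rename one column by position (column 5 is Chinese).
colnames(renamed)[5] <- "chinese"

# Use the student IDs as row names instead of "1", "2", "3", "4".
rownames(renamed) <- renamed$student

print(names(renamed))
print(rownames(renamed))
```

Matching by name is usually safer than matching by position, because it keeps working if the column order changes.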
Missing-value analysis
is.na(), anyNA(), na.omit(), complete.cases()
> is.na(score)
     student gender  math   Eng Chinese
[1,]   FALSE  FALSE FALSE FALSE   FALSE
[2,]   FALSE  FALSE FALSE FALSE   FALSE
[3,]   FALSE  FALSE FALSE FALSE    TRUE
[4,]   FALSE  FALSE FALSE FALSE   FALSE
> anyNA(score)
[1] TRUE
> na.omit(score)
  student gender math Eng Chinese
1       A      M   90  88      66
2       B      M   70  78       5
4       D      F   60  98      88
> complete.cases(score)
[1] TRUE TRUE FALSE TRUE
> score[complete.cases(score),]
  student gender math Eng Chinese
1       A      M   90  88      66
2       B      M   70  78       5
4       D      F   60  98      88
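Beyond dropping incomplete rows, it is often useful to see where the missing values are and to compute summaries that skip them. The sketch below assumes the same `score` data frame; the variable names (`na_per_column`, `rows_with_na`, and so on) are made up for illustration.

```r
# Rebuild the example data frame so this block runs on its own.
score <- data.frame(student = c("A", "B", "C", "D"),
                    gender  = c("M", "M", "F", "F"),
                    math    = c(90, 70, 80, 60),
                    Eng     = c(88, 78, 69, 98),
                    Chinese = c(66, 5, NA, 88),
                    stringsAsFactors = FALSE)

# Number of missing values in each column.
na_per_column <- colSums(is.na(score))

# Row indices that contain at least one missing value.
rows_with_na <- which(!complete.cases(score))

# Most summary functions return NA when the input contains NA ...
mean_with_na <- mean(score$Chinese)

# ... unless na.rm = TRUE tells them to skip missing values.
mean_no_na <- mean(score$Chinese, na.rm = TRUE)

print(na_per_column)
print(rows_with_na)
print(mean_no_na)
```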
Sorting data
sort(), rank(), order()
> sort(score$math)
[1] 60 70 80 90
> rank(c(3,4, 2, 5))
[1] 2 3 1 4
> order(score$math)
[1] 4 2 3 1
> score[order(score$math),]
  student gender math Eng Chinese
4       D      F   60  98      88
2       B      M   70  78       5
3       C      F   80  69      NA
1       A      M   90  88      66
> score[order(-score$math),]
  student gender math Eng Chinese
1       A      M   90  88      66
3       C      F   80  69      NA
2       B      M   70  78       5
4       D      F   60  98      88
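The calls above sort on a single key. As a hedged sketch, order() also accepts several keys and controls where NA values land; the result names below (`by_gender_math`, `na_last`, `na_first`) are hypothetical.

```r
# Rebuild the example data frame so this block runs on its own.
score <- data.frame(student = c("A", "B", "C", "D"),
                    gender  = c("M", "M", "F", "F"),
                    math    = c(90, 70, 80, 60),
                    Eng     = c(88, 78, 69, 98),
                    Chinese = c(66, 5, NA, 88),
                    stringsAsFactors = FALSE)

# sort() returns the sorted values themselves.
sorted_desc <- sort(score$math, decreasing = TRUE)

# order() returns row indices, so it can sort a whole data frame.
# Here: gender ascending, then math descending within each gender.
by_gender_math <- score[order(score$gender, -score$math), ]

# By default order() puts NA last; na.last = FALSE puts it first.
na_last  <- order(score$Chinese)
na_first <- order(score$Chinese, na.last = FALSE)

# rank() gives each value's position in the sorted vector.
math_rank <- rank(score$math)

print(by_gender_math)
print(math_rank)
```

Negating a column, as in `-score$math`, only works for numeric keys; for other types, `decreasing = TRUE` is the usual alternative.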
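Random sampling
library(sampling), srswr(), srswor(), sample()
- srswr(): simple random sampling with replacement (from the sampling package)
- srswor(): simple random sampling without replacement (from the sampling package)
- sample(): base R function for simple random sampling with or without replacement; it can also be used to split data into random groups.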
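As a minimal sketch of the three uses of sample() listed above, the block below uses only base R, so it runs without the sampling package. The `ids` vector, the seed, and the group labels `train` and `test` are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the draws are repeatable

ids <- c("A", "B", "C", "D", "E", "F")  # hypothetical population

# Simple random sampling without replacement (the default).
without_repl <- sample(ids, size = 3)

# Simple random sampling with replacement; size may exceed length(ids).
with_repl <- sample(ids, size = 10, replace = TRUE)

# Random grouping: shuffle a balanced vector of labels, then split by it.
groups    <- sample(rep(c("train", "test"), length.out = length(ids)))
split_ids <- split(ids, groups)

print(without_repl)
print(with_repl)
print(split_ids)
```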
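Numeric functions
- Math functions: abs(), sqrt(), ceiling(), floor(), round(), signif()
- Statistical functions: mean(), median(), sd(), var(), quantile(), range(), min(), max(), scale(), diff(), difftime()
- Probability functions: each name is a one-letter prefix followed by a distribution abbreviation (for example rnorm() for the normal distribution). The prefix r generates random numbers, p is the distribution function (CDF), d is the density function, and q is the quantile function.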
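A few of the math and statistical functions listed above, tried on the math scores from the earlier example. This is only a sketch; the variable names are illustrative.

```r
x <- c(90, 70, 80, 60)  # the math scores used earlier

# round() keeps a number of decimal places; signif() keeps significant digits.
rounded <- round(3.14159, 2)   # 3.14
sig     <- signif(3.14159, 2)  # 3.1

# ceiling() rounds up, floor() rounds down (also for negative numbers).
up   <- ceiling(2.1)   # 3
down <- floor(-2.1)    # -3

# Basic summaries.
x_mean   <- mean(x)
x_median <- median(x)
x_sd     <- sd(x)        # sample standard deviation, sqrt(var(x))
x_range  <- range(x)     # c(min(x), max(x))

# diff() returns differences between consecutive elements.
x_diff <- diff(x)

# scale() centers and standardizes; it returns a one-column matrix here.
x_z <- scale(x)

print(c(mean = x_mean, median = x_median, sd = x_sd))
print(x_diff)
```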
Distribution | Abbreviation | Parameters |
---|---|---|
Beta | beta | shape1, shape2 |
Logistic | logis | location=0, scale=1 |
Binomial | binom | size, prob |
Multinomial | multinom | size, prob |
Cauchy | cauchy | location=0, scale=1 |
Negative binomial | nbinom | size, prob |
(Noncentral) chi-squared | chisq | df, ncp=0 |
Normal | norm | mean=0, sd=1 |
Exponential | exp | rate=1 |
Poisson | pois | lambda |
F | f | df1, df2 |
Wilcoxon signed-rank | signrank | n |
Gamma | gamma | shape, scale=1 |
Student's t | t | df |
Geometric | geom | prob |
Uniform | unif | min=0, max=1 |
Hypergeometric | hyper | m, n, k |
Weibull | weibull | shape, scale=1 |
Log-normal | lnorm | meanlog=0, sdlog=1 |
Wilcoxon rank-sum | wilcox | m, n |
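To show how a prefix combines with an abbreviation and its parameters, here is a small sketch using the binomial row of the table (size and prob have no defaults, so they must be supplied). The chosen values, 10 trials with success probability 0.5, and the seed are arbitrary.

```r
# d: probability of exactly 3 successes in 10 trials.
d <- dbinom(3, size = 10, prob = 0.5)    # choose(10, 3) / 2^10

# p: probability of at most 3 successes.
p <- pbinom(3, size = 10, prob = 0.5)    # (1 + 10 + 45 + 120) / 2^10

# q: smallest number of successes whose cumulative probability reaches 0.5.
q <- qbinom(0.5, size = 10, prob = 0.5)

# r: five random draws from the same distribution.
set.seed(123)  # arbitrary seed for repeatability
r <- rbinom(5, size = 10, prob = 0.5)

print(c(d = d, p = p, q = q))
print(r)
```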
> data <- rnorm(20)
> dnorm(data)
 [1] 0.36762205 0.33357599 0.13761063 0.06410838 0.33682181 0.25152871
 [7] 0.30627951 0.04998531 0.38130948 0.39163732 0.39544120 0.38736158
[13] 0.36849002 0.39643249 0.36951560 0.34844062 0.16627181 0.11735244
[19] 0.05490796 0.22007537
> data
 [1]  0.4043795  0.5982409  1.4590329  1.9121933 -0.5818294  0.9604787
 [7] -0.7270744  2.0381794  0.3006840 -0.1922527 -0.1327753 -0.2427269
[13] -0.3985050  0.1123474 -0.3914685  0.5202863 -1.3230214  1.5643753
[19] -1.9915614 -1.0907306
> pnorm(data)
 [1] 0.65703315 0.72516038 0.92772198 0.97207430 0.28034080 0.83159281
 [7] 0.23359018 0.97923400 0.61817226 0.42377215 0.44718556 0.40410848
[13] 0.34512899 0.54472604 0.34772548 0.69856798 0.09291412 0.94113527
[19] 0.02320960 0.13769571
> qnorm(0.9, mean=0, sd=1)
[1] 1.281552
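Two points the transcript above leaves implicit: rnorm() gives different numbers on every run unless a seed is set, and qnorm() is the inverse of pnorm(). The sketch below illustrates both; the seed value and the parameters mean = 100, sd = 15 are arbitrary choices for illustration.

```r
# Setting the same seed before each call reproduces the same draws.
set.seed(2024)
a <- rnorm(5)
set.seed(2024)
b <- rnorm(5)
same_draws <- identical(a, b)

# qnorm() maps probabilities to quantiles; pnorm() maps them back.
probs <- c(0.1, 0.5, 0.9)
z     <- qnorm(probs)
back  <- pnorm(z)  # equals probs up to floating-point error

# With non-default parameters the quantile is shifted and scaled.
q_shifted <- qnorm(0.9, mean = 100, sd = 15)

print(same_draws)
print(z)
print(q_shifted)
```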