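Introduction to R and Data Analysis (1)

① Ctrl+Shift+C comments out the current line; with a selection it comments out every selected line, and pressing it again removes the comment

② Alt+- (Alt and the minus key) inserts the assignment operator <-

③ Ctrl+L clears the console

④ R supports only single-line comments, which start with #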

1. Show the current working directory

> getwd()
[1] "C:/Users/sxl/Documents"

2. Change the working directory

> setwd(dir = "c:/Users/sxl/Desktop/R")

3. List the files in the working directory (the two functions are equivalent)

> list.files()
 [1] "desktop.ini"        "jongde"             "KingsoftData"       "MATLAB"             "My Music"          
 [6] "My Pictures"        "My Videos"          "OneNote 笔记本"     "Tencent Files"      "WeChat Files"      
[11] "手机模拟大师"       "自定义 Office 模板"
> dir()
 [1] "desktop.ini"        "jongde"             "KingsoftData"       "MATLAB"             "My Music"          
 [6] "My Pictures"        "My Videos"          "OneNote 笔记本"     "Tencent Files"      "WeChat Files"      
[11] "手机模拟大师"       "自定义 Office 模板"

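4. Assign a value to a variable (= also works, but it is easily confused with the = used for function arguments and with the equality test ==, so it is not recommended)
Right-hand assignment such as 6 -> x also works but is not recommended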

> x <- 3
> x
[1] 3

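The <<- operator assigns to a variable in an enclosing environment; at the top level that is the global environment, so the result below is the same as with <-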

> x <<- 5
> x
[1] 5
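
The difference between <- and <<- only shows up inside a function. The following is a minimal sketch; the names counter, increment_local and increment_global are hypothetical and not part of the original notes.

```r
counter <- 0                      # variable defined outside the functions

increment_local <- function() {
  counter <- counter + 1          # <- creates a local copy; the outer counter is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1         # <<- modifies counter in the enclosing environment
  counter
}

increment_local()                 # returns 1
counter                           # still 0
increment_global()                # returns 1
counter                           # now 1
```

Because <<- silently changes state outside the function, it is usually reserved for cases such as counters or caches.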

5. A function call can be evaluated directly, or its result can be assigned to a variable

> sum (1,2,3,4,5)
[1] 15
> y <- sum (1,2,3,4,5)
> y
[1] 15

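6. Compute the arithmetic mean
Note: mean() expects the data as a single vector. In mean(1,2,3,4,5) only the first argument (1) is treated as data, so the result is 1; use mean(c(1,2,3,4,5)) to get 3

> z <- mean(1,2,3,4,5)   # pitfall: only the first argument is used as data
> z
[1] 1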

7. List the current variables and their details
str(x) on its own shows the structure of just that one variable

> ls()
[1] "x" "y" "z"
> ls.str()
x :  num 3
y :  num 15
z :  num 1

ls() does not list objects whose names begin with a dot (hidden objects)
To include them:

> ls(all.names = TRUE)
[1] ".Random.seed" "x"            "y"            "z"   

8. Remove variables or functions

> rm(x)  # remove the variable x
> x
Error: object 'x' not found
> rm(y, z)  # remove y and z
> rm(list = ls())  # remove all variables

9. Show the command history

> history(25)  # the 25 most recent commands

10. Save the current workspace (plots are not saved)

> save.image()

11. Quit: q(), or use the menu bar

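12. Install a package from an online repository (note the quotes around the name). Packages can also be installed from source; their dependencies must be installed as well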

> install.packages("vcd")

13. Show where the package library is located

> .libPaths()
[1] "D:/R/R-4.0.2/library"
> library()  # list the packages in the library

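14. View a package's documentation

> help(package = "vcd")

15. List a package's basic contents (e.g. data sets)

> library(help = "vcd")

16. List all functions in a package (the package must be attached)

> ls("package:vcd")

17. List the data sets in a package

> data(package = "vcd")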

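18. require(): require(package) loads the namespace of package and attaches it to the search path, just like library(package). The difference is the return value: if the package is unavailable, require() returns FALSE with a warning instead of raising an error; if loading succeeds it returns TRUE.

> require(vcd)  # run this first, otherwise some of the commands above fail

19. Detach a package (it is removed from the search path but stays installed)

> detach("package:vcd")

20. Delete a package from disk completely

> remove.packages("vcd")

21. Migrating R packages in bulk

> installed.packages()  # list all currently installed packages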
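
One possible way to carry out the bulk migration is sketched below, assuming the package names are saved to a file on the old machine and compared on the new one. The file location pkg_file and the variable names are hypothetical; the install step is left commented out because it needs network access.

```r
# On the old machine: record the names of the installed packages.
old_pkgs <- rownames(installed.packages())
pkg_file <- file.path(tempdir(), "installed_pkgs.rds")   # hypothetical file location
saveRDS(old_pkgs, pkg_file)

# On the new machine: read the list and work out what is still missing.
wanted <- readRDS(pkg_file)
to_install <- setdiff(wanted, rownames(installed.packages()))
to_install                       # empty when run on the same machine

# install.packages(to_install)   # uncomment to install the missing packages
```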

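22. Help documentation

> help.start()
If nothing happens, you should open ‘http://127.0.0.1:12755/doc/html/index.html’ yourself
Making 'packages.html' ... done
> help(sum)
> ?plot   # also shows a function's help page
> args(plot)  # print the function's arguments in the console
> example(mean)  # run the examples for an ordinary function
> example("hist")  # run the examples for a plotting function
> help(package = ggplot2)  # view a package's help
> vignette()  # list package vignettes (tutorials, introductions, etc.)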

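23. Search the help for heat maps
A heat map marks the regions of a plot or page according to how much attention they receive (or how large their values are), usually through colour intensity, point density, or proportional size. Whichever encoding is used, the aim is the same: patterns become visible at a glance.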

> help.search("heatmap")
> ??heatmap

24. List objects whose names contain a given string

> apropos("sum")
 [1] ".colSums"                ".rowSums"                ".rs.callSummary"         ".rs.summarizeDir"        ".rs.tutorial.onResume"  
 [6] ".tryResumeInterrupt"     "colSums"                 "contr.sum"               "cumsum"                  "format.summaryDefault"  
[11] "marginSums"              "print.summary.table"     "print.summary.warnings"  "print.summaryDefault"    "rowsum"                 
[16] "rowsum.data.frame"       "rowsum.default"          "rowSums"                 "sum"                     "summary"                
[21] "Summary"                 "summary.aov"             "summary.connection"      "summary.data.frame"      "Summary.data.frame"     
[26] "summary.Date"            "Summary.Date"            "summary.default"         "Summary.difftime"        "summary.factor"         
[31] "Summary.factor"          "summary.glm"             "summary.lm"              "summary.manova"          "summary.matrix"         
[36] "Summary.numeric_version" "Summary.ordered"         "summary.POSIXct"         "Summary.POSIXct"         "summary.POSIXlt"        
[41] "Summary.POSIXlt"         "summary.proc_time"       "summary.srcfile"         "summary.srcref"          "summary.stepfun"        
[46] "summary.table"           "summary.warnings"        "summaryRprof" 
> apropos("sum", mode = "function")      # search functions only

25. Search online R resources by keyword

> RSiteSearch("matlab")

26. View the data sets available in packages

> data(package = "car")
> data(package = .packages(all.available = TRUE))
> data(Chile,package = "car")
> Chile

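27. Vectors: a collection of one or more elements, created with c(); loosely comparable to a mathematical set, except that elements are ordered and may repeat
A vector is a one-dimensional array that stores numeric, character, or logical data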

> x <- c(1,2,3,4,5)
> x
[1] 1 2 3 4 5
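> print(x)   # equivalent to typing x
[1] 1 2 3 4 5
> y <- c("one","two","three")  # strings need quotes
> z <- c(TRUE,FALSE,T,F)   # logical values must be fully upper case; True or False raises an error
> c(1:100)  # arithmetic sequence from 1 to 100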
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> seq(from = 1, to = 100)
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> seq(from = 1, to = 100, by = 2)
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87
[45] 89 91 93 95 97 99
> seq(from = 1, to = 100, length.out = 10)  # 10 evenly spaced values
 [1]   1  12  23  34  45  56  67  78  89 100
> rep(2,5)   # repeat 2 five times
[1] 2 2 2 2 2
> rep(x,5)  # repeat the whole vector x five times
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(x,each = 5)   # repeat each element five times
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
> rep(x,each = 5,times = 2)  # then repeat the result twice
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
> a <- c(1,2,"one")
> a
[1] "1"   "2"   "one"
> mode(a)   # check the type: the numbers were coerced to character
[1] "character"
> rep(x,c(2,4,6,1,3))   # repeat each element of x the given number of times
 [1] 1 1 2 2 2 2 3 3 3 3 3 3 4 5 5 5
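
The coercion seen with c(1,2,"one") follows a fixed order: logical, then integer, then numeric, then character. A short sketch of that rule follows; the variable name nums is hypothetical.

```r
class(c(TRUE, 2L))      # "integer": TRUE is converted to 1
class(c(1L, 2.5))       # "numeric": the integer is converted to a double
class(c(1, "one"))      # "character": numbers are converted to strings

# Converting back only works for strings that look like numbers;
# anything else becomes NA (R also issues a warning, suppressed here).
nums <- suppressWarnings(as.numeric(c("1", "2", "one")))
nums                    # 1 2 NA
```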

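28. Vector indexing

> x <- 1:100   # x is redefined as 1 to 100 for the examples below
> length(x)   # number of elements
[1] 100
> x[1]   # first element (indices start at 1)
[1] 1
> x[0]   # index 0 returns an empty vector
integer(0)
> x[-19]    # a negative index returns everything except that element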
 [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
[34]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
[67]  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
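> x[c(4:18)]   # elements 4 to 18
 [1]  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
> x[c(1,23,45,67,89)]   # elements at the given positions
[1]  1 23 45 67 89
> x[c(11,11,23,23,5,90,2)]   # the same element can be selected more than once
[1] 11 11 23 23  5 90  2
> y <- 1:10
> y[y > 5]   # elements greater than 5
[1]  6  7  8  9 10
> y[y > 5 & y < 9]   # elements within a range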
[1] 6 7 8

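29. Looking up strings

> z <- c("one","two","three","four","five")
> z
[1] "one"   "two"   "three" "four"  "five" 
> "one" %in% z   # is the string present in the vector?
[1] TRUE
> z[z %in% c("one","two")]   # same as z[c(T,T,F,F,F)]: logical indexing, so the first two values are returned
[1] "one" "two"
> z %in% c("one","two")
[1]  TRUE  TRUE FALSE FALSE FALSE
> "one" %in% z
[1] TRUE
> z["one" %in% z]  # the index is a single TRUE, which is recycled over every element, so all of z is returned
[1] "one"   "two"   "three" "four"  "five" 
> k <- z %in% c("one","two")   # store the logical index
> z(k)   # wrong: parentheses call a function
Error in z(k) : could not find function "z"
> z[k]   # square brackets index the vector
[1] "one" "two"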
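
The recycling of a logical index can be seen more directly with a mask that is shorter than the vector. This is a small illustrative sketch that reuses the vector z from above.

```r
z <- c("one", "two", "three", "four", "five")

z[c(TRUE, FALSE)]                 # mask recycled as T,F,T,F,T: "one" "three" "five"
z[TRUE]                           # a single TRUE selects every element
z[FALSE]                          # a single FALSE selects nothing: character(0)
which(z %in% c("one", "two"))     # positions of the matches: 1 2
```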

30. Use names() to name the elements of a vector

> y <- 1:10
> names(y) <- c("one","two","three","four","five","six","seven","eight","nine","ten")
> y
  one   two three  four  five   six seven eight  nine   ten 
    1     2     3     4     5     6     7     8     9    10 
> names(y)
 [1] "one"   "two"   "three" "four"  "five"  "six"   "seven" "eight" "nine"  "ten"  
> euro
        ATS         BEF         DEM         ESP         FIM         FRF         IEP         ITL         LUF         NLG         PTE 
  13.760300   40.339900    1.955830  166.386000    5.945730    6.559570    0.787564 1936.270000   40.339900    2.203710  200.482000 
> euro("ATS")   # wrong: parentheses call a function
Error in euro("ATS") : could not find function "euro"
> y["one"]
one 
  1 
> euro["ATS"]
    ATS 
13.7603 

31. Modifying vectors

> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> x[101] <- 101  # append an element
> x
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100 101
> v <- 1:3
> v[c(4,5,6)] <- c(4,5,6)   # assign several elements at once
> v
[1] 1 2 3 4 5 6
> v[20] <- 4  # assigning to v[20] extends the vector to 20 elements; unassigned positions are NA
> v
 [1]  1  2  3  4  5  6 NA NA NA NA NA NA NA NA NA NA NA NA NA  4
> append(x = v,values = 99, after = 5)  # insert 99 after the fifth element
 [1]  1  2  3  4  5 99  6 NA NA NA NA NA NA NA NA NA NA NA NA NA  4
> append(x = v,values = 99, after = 0)  # after = 0 inserts at the front
 [1] 99  1  2  3  4  5  6 NA NA NA NA NA NA NA NA NA NA NA NA NA  4
> rm(v)  # remove the whole vector
> v
Error: object 'v' not found
> y[-c(1:3)]  # drop the first three elements (returns a copy; y is unchanged)
 four  five   six seven eight  nine   ten 
    4     5     6     7     8     9    10 
> y <- y[-c(1:3)]
> y
 four  five   six seven eight  nine   ten 
    4     5     6     7     8     9    10 
> y["four"] <- 100   # modify an element by name
> y
 four  five   six seven eight  nine   ten 
  100     5     6     7     8     9    10 

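32. Vector operations
Exponentiation: ^ (** is accepted as a synonym)
Modulo (remainder): %%
Integer division: %/%
Membership operator: %in%
Element-wise equality test: == (a single = is assignment and would overwrite the vector)
Vector functions:
abs(x): absolute value
sqrt(x): square root
log(16,base = 2): logarithm of 16 to base 2
log(16): natural logarithm (base e is the default)
log10(10): logarithm to base 10
exp(x): e raised to the power of each element of x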
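
A few of the operators and functions listed above, evaluated on small numbers as a quick check. The values are chosen only for illustration.

```r
2^10                 # 1024
2**10                # 1024, ** is parsed as ^
17 %% 5              # 2, remainder
17 %/% 5             # 3, integer division
(-7) %% 3            # 2, the result takes the sign of the divisor
c(1, 2, 3) == c(1, 5, 3)   # TRUE FALSE TRUE, compared element by element
log(16, base = 2)    # 4
log(exp(1))          # 1, log() uses base e by default
```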

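> ceiling(c(-2,3,3,3.1415))   # smallest integer not less than x
[1] -2  3  3  4
> floor(c(-2,3,3,3.1415))  # largest integer not greater than x
[1] -2  3  3  3
> trunc(c(-2,3,3.1415))   # drop the fractional part (truncate toward zero)
[1] -2  3  3
> round(c(-2,3,3.1415))   # round to the nearest integer
[1] -2  3  3
> round(c(-2,3,3.1415),digits = 2)  # round to 2 decimal places
[1] -2.00  3.00  3.14
> signif(c(-2,3,3.1415),digits = 2)   # keep 2 significant digits
[1] -2.0  3.0  3.1
> x <- seq(from = 1, to = 100, length.out = 10)   # x as implied by the output below
> sin(x)    # trigonometric functions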
 [1]  0.8414710 -0.5365729 -0.8462204  0.5290827  0.8509035 -0.5215510 -0.8555200  0.5139785  0.8600694 -0.5063656
> cos(x)
 [1]  0.5403023  0.8438540 -0.5328330 -0.8485703  0.5253220  0.8532201 -0.5177698 -0.8578031  0.5101770  0.8623189

Statistical functions

> vec <- 1:100   # sample numeric data
> vec
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> sum(vec)  # total
[1] 5050
> max(vec)   # maximum
[1] 100
> min(vec)  # minimum
[1] 1
> range(vec)  # minimum and maximum
[1]   1 100
> mean(vec)  # mean
[1] 50.5
> var(vec)  # variance
[1] 841.6667
> round(var(vec),digits = 2)
[1] 841.67
> round(sd(vec),digits = 2)  # standard deviation
[1] 29.01
> prod(vec)  # product of all elements
[1] 9.332622e+157
> median(vec)  # median
[1] 50.5
> quantile(vec)  # quantiles
    0%    25%    50%    75%   100% 
  1.00  25.75  50.50  75.25 100.00 

# Finding index positions
> t <- c(1,4,2,5,7,9,6)
> t
[1] 1 4 2 5 7 9 6
> which.max(t)
[1] 6
> which(t ==7)
[1] 5
> which(t > 5)
[1] 5 6 7

> t[which(t > 5)]  # the matching element values
[1] 7 9 6

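33. Matrices and arrays
(1) Build a matrix and choose the fill order

> x <- 1:20
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> m <- matrix(x,mrow = 4,mcol = 5)   # typo: the arguments are nrow and ncol
Error in matrix(x, mrow = 4, mcol = 5) : unused arguments (mrow = 4, mcol = 5)
> m <- matrix(x,nrow = 4,ncol = 5)   # build a 4 x 5 matrix
> m <- matrix(1:20,4,5)    # filled column by column (the default)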
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
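> m <- matrix(x,nrow = 4,ncol = 6)  # the data length must equal nrow * ncol
Warning message:
In matrix(x, nrow = 4, ncol = 6) : data length [20] is not a sub-multiple or multiple of the number of columns [6]
> m <- matrix(x,nrow = 4,ncol = 5,byrow = T)  # fill row by row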
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
> m <- matrix(x,nrow = 4,ncol = 5,byrow = F)  # fill column by column
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

(2) Row and column labels

> rnames <- c("R1","R2","R3","R4")
> rnames
[1] "R1" "R2" "R3" "R4"
> cnames <- c("C1","C2","C3","C4","C5")
> cnames
[1] "C1" "C2" "C3" "C4" "C5"
> dimnames(m) <- list(rnames,cnames)
> m
   C1 C2 C3 C4 C5
R1  1  5  9 13 17
R2  2  6 10 14 18
R3  3  7 11 15 19
R4  4  8 12 16 20

(3) Get or set the dimensions of a matrix

> dim(x)
NULL
> dim(x) <- c(4,5)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

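(4) An array in R generalises a matrix to more than two dimensions; creating an array

> x <- 1:20
> dim(x) <- c(2,2,5)  # can be pictured as a rows x columns x layers block of data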
> x
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 3

     [,1] [,2]
[1,]    9   11
[2,]   10   12

, , 4

     [,1] [,2]
[1,]   13   15
[2,]   14   16

, , 5

     [,1] [,2]
[1,]   17   19
[2,]   18   20

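(5) Create a labelled array with array()

> ?array
> dim1 <- c("A1","A2")
> DIM2 <- C("B1","B2","B3")   # wrong: R is case sensitive, and C() is a different function from c()
Error in C("B1", "B2", "B3") : object not interpretable as a factor
> dim2 <- c("B1","B2","B3")
> dim3 <- c("C1","C2","C3","C4")
> z <- array(1:24,c(2,3,4,dimnames = list(dim1,dim2,dim3))   # wrong: the closing parenthesis of c() is missing
+ z
Error: unexpected symbol in:
"z <- array(1:24,c(2,3,4,dimnames = list(dim1,dim2,dim3))
z"
> z <- array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3,)   # wrong: trailing comma and missing closing parenthesis
+ 
+ z
Error: unexpected symbol in:
"
z"
> z <- array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3))   # correct call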
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24

(6) Accessing matrix elements

> m <- matrix(1:20,4,5,byrow = T)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
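> m[1,2]   # row 1, column 2
[1] 2
> m[1,c(2,3,4)]  # row 1, columns 2 to 4
[1] 2 3 4
> m[c(2:4),c(2,3)]   # rows 2 to 4, columns 2 and 3
     [,1] [,2]
[1,]    7    8
[2,]   12   13
[3,]   17   18
> m[2,]  # row 2
[1]  6  7  8  9 10
> m[,2]    # column 2
[1]  2  7 12 17
> m[2]  # a single index counts down the columns: the second element
[1] 6
> m[-1,2]  # column 2 without row 1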
[1]  7 12 17

# Access elements by row and column names
> dimnames(m) = list (rnames,cnames)
> m
   C1 C2 C3 C4 C5
R1  1  2  3  4  5
R2  6  7  8  9 10
R3 11 12 13 14 15
R4 16 17 18 19 20
> m["R1","R2"]   # wrong: "R2" is a row name, not a column name
Error in m["R1", "R2"] : subscript out of bounds
> m["R1","C2"]
[1] 2

(7) Matrix operations

> m + 1  # add 1 to every element
   C1 C2 C3 C4 C5
R1  2  3  4  5  6
R2  7  8  9 10 11
R3 12 13 14 15 16
R4 17 18 19 20 21
> n <- matrix(1:20,5,4)
> n
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> m\
Error: unexpected input in "m\"
> m
   C1 C2 C3 C4 C5
R1  1  2  3  4  5
R2  6  7  8  9 10
R3 11 12 13 14 15
R4 16 17 18 19 20
> m + n   # addition requires matrices with identical dimensions
Error in m + n : non-conformable arrays
> m[,1]
R1 R2 R3 R4 
 1  6 11 16 
> t <- m[,1]
> sum(t)   # sum of the first column
[1] 34
> colSums(m)   # sum of each column
C1 C2 C3 C4 C5 
34 38 42 46 50 
> rowSum(m)   # typo: the function is rowSums
Error in rowSum(m) : could not find function "rowSum"
> rowSums(m)  # sum of each row
R1 R2 R3 R4 
15 40 65 90 

# Column and row means
> colMeans(m)
  C1   C2   C3   C4   C5 
 8.5  9.5 10.5 11.5 12.5 
> rowMeans(m)
R1 R2 R3 R4 
 3  8 13 18 
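> n <- matrix(1:9,3,3)    # n and t are redefined here; both definitions are inferred from the output below
> t <- matrix(2:10,3,3)
> n*t   # element-wise (Hadamard) product
     [,1] [,2] [,3]
[1,]    2   20   56
[2,]    6   30   72
[3,]   12   42   90
> n %*% t   # matrix product
     [,1] [,2] [,3]
[1,]   42   78  114
[2,]   51   96  141
[3,]   60  114  168

> diag(n)   # diagonal elements of the matrix
[1] 1 5 9

> t(n)   # the function t() transposes a matrix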
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
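
The different products are easy to mix up, so here is a small sketch on 2 x 2 matrices that contrasts them. The names a, b and v are hypothetical.

```r
a <- matrix(1:4, nrow = 2)     # columns (1,2) and (3,4)
b <- matrix(5:8, nrow = 2)     # columns (5,6) and (7,8)

a * b          # element-wise product:  5 21 / 12 32
a %*% b        # matrix product:       23 31 / 34 46

v <- 1:3
sum(v * v)     # inner (dot) product of v with itself: 14
v %o% v        # outer product, a 3 x 3 matrix (same as outer(v, v))
```

The operator * requires both matrices to have the same dimensions, whereas %*% requires the number of columns of the left matrix to equal the number of rows of the right one.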

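34. Lists
Similarity: like a vector, a list is a one-dimensional collection of data
Difference: a vector holds a single data type, whereas a list element can be any R data structure, even another list
(1) Create a list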

> a <- 1:20
> b <- matrix(1:24,4,6)
> c = mtcars
> d <- "This is a test list"
> mlist <- list(a,b,c,d)
> mlist
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24

[[3]]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

[[4]]
[1] "This is a test list"

(2) Name the list components

> mlist <- list(first = a,second = b,third = c,forth = d)
> mlist

(3) Access list components

> mlist[1]   # a single component (returned as a list)
$first
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

> mlist[c(1,4)]   # several components
$first
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

$forth
[1] "This is a test list"

> state.center[c("x","y")]   # access components by name
$x
 [1]  -86.7509 -127.2500 -111.6250  -92.2992 -119.7730 -105.5130  -72.3573  -74.9841  -81.6850  -83.3736 -126.2500 -113.9300  -89.3776
[14]  -86.0808  -93.3714  -98.1156  -84.7674  -92.2724  -68.9801  -76.6459  -71.5800  -84.6870  -94.6043  -89.8065  -92.5137 -109.3200
[27]  -99.5898 -116.8510  -71.3924  -74.2336 -105.9420  -75.1449  -78.4686 -100.0990  -82.5963  -97.1239 -120.0680  -77.4500  -71.1244
[40]  -80.5056  -99.7238  -86.4560  -98.7857 -111.3300  -72.5450  -78.2005 -119.7460  -80.6665  -89.9941 -107.2560

$y
 [1] 32.5901 49.2500 34.2192 34.7336 36.5341 38.6777 41.5928 38.6777 27.8744 32.3329 31.7500 43.5648 40.0495 40.0495 41.9358 38.4204
[17] 37.3915 30.6181 45.6226 39.2778 42.3645 43.1361 46.3943 32.6758 38.3347 46.8230 41.3356 39.1063 43.3934 39.9637 34.4764 43.1361
[33] 35.4195 47.2517 40.2210 35.5053 43.9078 40.9069 41.5928 33.6190 44.3365 35.6767 31.3897 39.1063 44.2508 37.5630 47.4231 38.4204
[49] 44.5937 43.0504

> mlist$first  # typing mlist$ in RStudio lists the component names for completion; pick the one you need
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

(4) Single versus double brackets

> mlist[1]
$first
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

> mlist[[1]]   # [[ ]] extracts the component itself (here an integer vector), whereas [ ] returns a sub-list
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> class (mlist[1])
[1] "list"
> class (mlist[[1]])
[1] "integer"
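
The practical consequence of the [ ] versus [[ ]] distinction is that only the extracted component can be passed to functions that expect a vector. A minimal sketch with a hypothetical list lst:

```r
lst <- list(first = 1:3, second = "text")

class(lst[1])           # "list": a sub-list of length 1
class(lst[[1]])         # "integer": the component itself
class(lst[["second"]])  # "character": [[ ]] also accepts a name
sum(lst[[1]])           # 6

# sum(lst[1]) would stop with an error, because a list is not a numeric vector.
```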

(5) Add a component to a list

> mlist[[5]] <- iris
> mlist
$first
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

$second
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24
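# (output truncated; iris appears as the new fifth component [[5]])

(6) Remove a list component

> mlist[-5]   # returns a copy without the fifth component; mlist is unchanged
> mlist[[5]] <- NULL   # deletes the fifth component in place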

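35. Data frames: a table-like data structure designed to represent data sets
Data set: a rectangular array of data in which rows are observations and columns are variables
All values in a column must have the same type; different columns may have different types
(1) Combine into one data frame (state.name and the others are built-in data sets that list the states in the same order)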

> state <- data.frame(state.name,state.abb,state.region,state.x77)
> state
                   state.name state.abb  state.region Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Alabama               Alabama        AL         South       3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska                 Alaska        AK          West        365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona               Arizona        AZ          West       2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas             Arkansas        AR         South       2110   3378        1.9    70.66   10.1    39.9    65  51945
California         California        CA          West      21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado             Colorado        CO          West       2541   4884        0.7    72.06    6.8    63.9   166 103766
Connecticut       Connecticut        CT     Northeast       3100   5348        1.1    72.48    3.1    56.0   139   4862
Delaware             Delaware        DE         South        579   4809        0.9    70.06    6.2    54.6   103   1982
(output truncated)

(2) Access a data frame

> state[1]   # first column of the data frame
                   state.name
Alabama               Alabama
Alaska                 Alaska
Arizona               Arizona
(output truncated)

> state[c(2,4)]   # columns 2 and 4 of the data frame
               state.abb Population
Alabama               AL       3615
Alaska                AK        365
Arizona               AZ       2212

> state[-c(2,4)]    # everything except columns 2 and 4

> state[,"state.abb"]    # access a column by name
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT"
[27] "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VA" "WA" "WV" "WI" "WY"

> state["Alabama",]  # all details for that row
        state.name state.abb state.region Population Income Illiteracy Life.Exp Murder HS.Grad Frost  Area
Alabama    Alabama        AL        South       3615   3624        2.1    69.05   15.1    41.3    20 50708

> state$state.region  # access a column with $
 [1] South         West          West          South         West          West          Northeast     South         South        
[10] South         West          West          North Central North Central North Central North Central South         South        
[19] Northeast     South         Northeast     North Central North Central South         North Central West          North Central
[28] West          Northeast     Northeast     West          Northeast     South         North Central North Central South        
[37] West          Northeast     Northeast     South         North Central South         South         West          Northeast    
[46] South         West          South         North Central West         
Levels: Northeast South North Central West


> attach(mtcars)   # attach() puts the columns on the search path
> mpg    # columns can now be referenced directly by name
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
[27] 26.0 30.4 15.8 19.7 15.0 21.4
> hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97 150 150 245 175  66  91 113 264 175 335 109
> rownames(mtcars)
 [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"          "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
 [7] "Duster 360"          "Merc 240D"           "Merc 230"            "Merc 280"            "Merc 280C"           "Merc 450SE"         
[13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood"  "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
[19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"       "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
[25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"       "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
[31] "Maserati Bora"       "Volvo 142E"         
> colnames(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
> cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
> hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97 150 150 245 175  66  91 113 264 175 335 109
> detach(mtcars)     # detach the data frame again
> hp
Error: object 'hp' not found

> with(mtcars,{mpg})  # with() also gives direct access: use column names inside the braces
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
[27] 26.0 30.4 15.8 19.7 15.0 21.4
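
with() is often the safer of the two, because the column names are only visible inside the call and nothing needs to be detached afterwards. A short sketch using mtcars; the result names avg_mpg and mpg_by_cyl are hypothetical.

```r
avg_mpg <- with(mtcars, mean(mpg))                    # mean of one column
mpg_by_cyl <- with(mtcars, tapply(mpg, cyl, mean))    # mean mpg for each cylinder count

avg_mpg
mpg_by_cyl            # one value each for 4, 6 and 8 cylinders
```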

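36. Types of variables: nominal, ordinal, and continuous
Factors: nominal and ordinal variables are represented as factors
Uses of factors: frequency counts, tests of independence, correlation tests, analysis of variance, principal component analysis, factor analysis, and so on
(1) Frequency counts

> mtcars$cyl    # the cyl column can be treated as a factor
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
> table(mtcars$cyl)   # frequency count for the three values 4, 6, and 8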

 4  6  8 
11  7 14 

(2) Define a factor

> f <- factor(c("red","red","green","red","blue","green","blue","blue"))
> f
[1] red   red   green red   blue  green blue  blue 
Levels: blue green red

(3) A factor from ordered categories (without explicit levels, the levels are sorted alphabetically)

> week <- factor(c("Mon","Fri","Thu","Wed","Mon","Fri","Sun"))
> week
[1] Mon Fri Thu Wed Mon Fri Sun
Levels: Fri Mon Sun Thu Wed

(4) Specify the levels and their order

> week <- factor(c("Mon","Fri","Thu","Wed","Mon","Fri","Sun"),ordered = TRUE, levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
> week
[1] Mon Fri Thu Wed Mon Fri Sun
Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun
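
Once the levels are ordered, comparisons and sorting follow the level order rather than the alphabet. A brief sketch that rebuilds the same factor; the helper vector days is hypothetical.

```r
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"),
               levels = days, ordered = TRUE)

week < "Thu"          # TRUE FALSE FALSE TRUE TRUE FALSE FALSE
as.integer(week)      # position of each value in the level order: 1 5 4 3 1 5 7
sort(week)            # Mon Mon Wed Thu Fri Fri Sun
table(week)           # counts per level, including 0 for Tue and Sat
```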

(5) Convert a vector to a factor

> fcyl = factor(mtcars$cyl)
> fcyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8

(6) Effect on plots

> plot(mtcars$cyl)  # a numeric vector gives a scatter plot
> plot(factor(mtcars$cyl))   # a factor gives a bar chart

(7) Grouping into intervals

> num <- 1:100
> num
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32
 [33]  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
 [65]  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100
> cut(num, seq(0,100,10))   # bins of width 10; cut() suits grouping into regular intervals
  [1] (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (0,10]   (10,20]  (10,20]  (10,20]  (10,20] 
 [15] (10,20]  (10,20]  (10,20]  (10,20]  (10,20]  (10,20]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30]  (20,30] 
 [29] (20,30]  (20,30]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (30,40]  (40,50]  (40,50] 
 [43] (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (40,50]  (50,60]  (50,60]  (50,60]  (50,60]  (50,60]  (50,60] 
 [57] (50,60]  (50,60]  (50,60]  (50,60]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70]  (60,70] 
 [71] (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (70,80]  (80,90]  (80,90]  (80,90]  (80,90] 
 [85] (80,90]  (80,90]  (80,90]  (80,90]  (80,90]  (80,90]  (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100]
 [99] (90,100] (90,100]
Levels: (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,80] (80,90] (90,100]

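37. Missing values
How the special values differ:
NA: a value that exists but is unknown (missing)
NaN: not a number, the result of an undefined operation such as 0/0
Inf: positive or negative infinity (Inf or -Inf), for example the result of 1/0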
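
The three kinds of special value can be produced and tested directly in the console, as in this short sketch.

```r
0 / 0                            # NaN
1 / 0                            # Inf
-1 / 0                           # -Inf

is.na(NA)                        # TRUE
is.na(NaN)                       # TRUE: NaN also counts as missing
is.nan(NA)                       # FALSE: NA is not NaN
is.finite(c(1, Inf, NA, NaN))    # TRUE FALSE FALSE FALSE
```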

> NA == 0
[1] NA
> a <- c(NA,1:49)
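> sum(a)   # the result is NA if any element is NA
[1] NA
> mean(a)
[1] NA
> sum(a,na,rm = TRUE)   # typo: the argument is na.rm, written with a dot
Error: object 'na' not found
> sum(a,na.rm = TRUE)   # na.rm = TRUE ignores the NA values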
[1] 1225
> mean(a,na.rm = TRUE)
[1] 25
> a
 [1] NA  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
[45] 44 45 46 47 48 49

Alternatively, na.omit() removes the NA values

> c <- c(NA,1:20,NA,NA)
> c
 [1] NA  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 NA NA
> d <- na.omit(c)
> d
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
attr(,"na.action")
[1]  1 22 23
attr(,"class")
[1] "omit"
> is.na(d)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> sum(d)
[1] 210
> mean(d)
[1] 10.5
> na.omit(sleep)  # drops the rows of a data set that contain NA; note that this may bias the analysis

38. Strings
(1) String length

> nchar("Hello World")
[1] 11
> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"      "August"    "September" "October"   "November" 
[12] "December" 
> nchar(month.name)   # number of characters in each string
 [1] 7 8 5 5 3 4 4 6 9 7 8 8
> length(month.name)   # number of strings in the vector
[1] 12
> nchar(c(12,3,345))
[1] 2 1 3

(2) paste(): concatenate several strings into one

> paste(c("Everybody","loves","states"))
[1] "Everybody" "loves"     "states"   
> paste("Everybody","loves","states")
[1] "Everybody loves states"
> paste("Everybody","loves","states",sep = "-")  # set the separator
[1] "Everybody-loves-states"

(3) Paste a string onto each element of a vector

> names <- c("Moe","Larry","Curly")
> paste(names,"love stats")
[1] "Moe love stats"   "Larry love stats" "Curly love stats"

(4) Substrings and case conversion

> substr(month.name,1,3)     # characters 1 to 3 of each string
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> temp <- substr(x - mi=onth.name,start = 1,stop = 3)   # typo in the first argument
Error: unexpected '=' in "temp <- substr(x - mi="
> temp <- substr(x = month.name,start = 1,stop = 3)
> toupper(temp)     # convert to upper case
 [1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
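> tolower(temp)    # convert to lower case
 [1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"

# Restore the capitalised first letter
> gsub("^(\\w)","\\U\\1",tolower(temp))   # gsub() replaces every match, sub() only the first; without perl = TRUE, \\U is not interpreted
 [1] "Ujan" "Ufeb" "Umar" "Uapr" "Umay" "Ujun" "Ujul" "Uaug" "Usep" "Uoct" "Unov" "Udec"
 # ^: start of the string   \\w: any word character (letter, digit, or underscore)
 # \\U: convert what follows to upper case   \\1: back-reference to the first captured group
> gsub("^(\\w)","\\U\\1",tolower(temp),perl = TURE)   # typo: should be TRUE
Error in gsub("^(\\w)", "\\U\\1", tolower(temp), perl = TURE) : 
  object 'TURE' not found
> gsub("^(\\w)","\\U\\1",tolower(temp),perl = TRUE)   # Perl-style regular expressions enable \\U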
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

# Convert the first letter to lower case
> gsub("^(\\w)","\\L\\1",toupper(temp),perl = TRUE)
 [1] "jAN" "fEB" "mAR" "aPR" "mAY" "jUN" "jUL" "aUG" "sEP" "oCT" "nOV" "dEC"
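
The same capitalisation can also be done without regular expressions, by handling the first character and the remainder separately. This is an alternative sketch rather than the approach used above; the input values and the name capitalised are made up for illustration.

```r
temp <- c("JAN", "feb", "mAR")

# Upper-case the first character, lower-case everything after it.
capitalised <- paste0(toupper(substr(temp, 1, 1)), tolower(substring(temp, 2)))
capitalised                                             # "Jan" "Feb" "Mar"

# The regular-expression version gives the same result.
sub("^(\\w)", "\\U\\1", tolower(temp), perl = TRUE)     # "Jan" "Feb" "Mar"
```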

(5) Search within strings

> x <- c("b","A+","AC")
> x
[1] "b"  "A+" "AC"
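> grep("A+",x,fixed = T)    # literal match: only the second string
[1] 2
> grep("A+",x,fixed = F)   # as a regular expression, A+ means one or more A, so AC matches as well
[1] 2 3
> match("AC",x)   # also finds a string, but it matches whole elements exactly and does not support regular expressions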
[1] 3
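
Two related variations may be useful here: escaping the + so that a regular expression matches it literally, and asking for the matching values or a logical mask instead of positions. A small sketch on the same vector:

```r
x <- c("b", "A+", "AC")

grepl("A+", x, fixed = TRUE)    # logical mask for the literal text: FALSE TRUE FALSE
grepl("A\\+", x)                # escaped +, same result with a regular expression
grep("A+", x, value = TRUE)     # return the matching strings: "A+" "AC"
```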

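(6) Split strings

> path <- "/usr/local/bin/R"
> strsplit(path,"/")   # path: the string, "/": the separator; the result is a list, not a vector
[[1]]
[1] ""      "usr"   "local" "bin"   "R"    

> strsplit(c(path,path),"/")   # split two paths at once
[[1]]
[1] ""      "usr"   "local" "bin"   "R"    

[[2]]
[1] ""      "usr"   "local" "bin"   "R"    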

(7) Generate all combinations of two sets of strings (the Cartesian product)

> face <- 1:13
> suit <- c("spades","clubs","hearts","diamonds")
> face
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
> suit
[1] "spades"   "clubs"    "hearts"   "diamonds"
> outer(suit,face,FUN = paste)
     [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]         [,8]         [,9]         [,10]        
[1,] "spades 1"   "spades 2"   "spades 3"   "spades 4"   "spades 5"   "spades 6"   "spades 7"   "spades 8"   "spades 9"   "spades 10"  
[2,] "clubs 1"    "clubs 2"    "clubs 3"    "clubs 4"    "clubs 5"    "clubs 6"    "clubs 7"    "clubs 8"    "clubs 9"    "clubs 10"   
[3,] "hearts 1"   "hearts 2"   "hearts 3"   "hearts 4"   "hearts 5"   "hearts 6"   "hearts 7"   "hearts 8"   "hearts 9"   "hearts 10"  
[4,] "diamonds 1" "diamonds 2" "diamonds 3" "diamonds 4" "diamonds 5" "diamonds 6" "diamonds 7" "diamonds 8" "diamonds 9" "diamonds 10"
     [,11]         [,12]         [,13]        
[1,] "spades 11"   "spades 12"   "spades 13"  
[2,] "clubs 11"    "clubs 12"    "clubs 13"   
[3,] "hearts 11"   "hearts 12"   "hearts 13"  
[4,] "diamonds 11" "diamonds 12" "diamonds 13"
> outer(suit,face,FUN = paste,sep = "-")
     [,1]         [,2]         [,3]         [,4]         [,5]         [,6]         [,7]         [,8]         [,9]         [,10]        
[1,] "spades-1"   "spades-2"   "spades-3"   "spades-4"   "spades-5"   "spades-6"   "spades-7"   "spades-8"   "spades-9"   "spades-10"  
[2,] "clubs-1"    "clubs-2"    "clubs-3"    "clubs-4"    "clubs-5"    "clubs-6"    "clubs-7"    "clubs-8"    "clubs-9"    "clubs-10"   
[3,] "hearts-1"   "hearts-2"   "hearts-3"   "hearts-4"   "hearts-5"   "hearts-6"   "hearts-7"   "hearts-8"   "hearts-9"   "hearts-10"  
[4,] "diamonds-1" "diamonds-2" "diamonds-3" "diamonds-4" "diamonds-5" "diamonds-6" "diamonds-7" "diamonds-8" "diamonds-9" "diamonds-10"
     [,11]         [,12]         [,13]        
[1,] "spades-11"   "spades-12"   "spades-13"  
[2,] "clubs-11"    "clubs-12"    "clubs-13"   
[3,] "hearts-11"   "hearts-12"   "hearts-13"  
[4,] "diamonds-11" "diamonds-12" "diamonds-13"
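
outer() returns the combinations as a matrix. When one row per combination is more convenient, expand.grid() is a possible alternative. The sketch below, with illustrative variable names deck_matrix and deck_df, builds both and checks that they hold the same 52 labels.

```r
face <- 1:13
suit <- c("spades", "clubs", "hearts", "diamonds")

# outer() returns a 4 x 13 character matrix of pasted labels
deck_matrix <- outer(suit, face, FUN = paste, sep = "-")

# expand.grid() returns one row per (suit, face) pair as a data frame
deck_df <- expand.grid(suit = suit, face = face, stringsAsFactors = FALSE)
deck_df$card <- paste(deck_df$suit, deck_df$face, sep = "-")

print(dim(deck_matrix))
print(head(deck_df))
```

Both structures list the suits fastest, so flattening the matrix column by column gives the same order as the data frame rows.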

39. Dates and times

> Sys.Date()   #show the current date
[1] "2020-11-01"
> class(Sys.date())   #R is case-sensitive: the function is Sys.Date
Error in Sys.date() : could not find function "Sys.date"
> class(Sys.Date())
[1] "Date"
> a = "2017-01-01"
> as.Date(a)
[1] "2017-01-01"
> as.Date(a,format = "%Y-%m-%d")   #%Y-%m-%d: four-digit year, month, day
[1] "2017-01-01"
> class(as.Date(a,format = "%Y-%m-%d"))   #the result is of class Date
[1] "Date"

> seq(as.Date("2017-01-01"),as.Date("2017-07-05"),by = 5)  #a sequence of dates at 5-day intervals
 [1] "2017-01-01" "2017-01-06" "2017-01-11" "2017-01-16" "2017-01-21" "2017-01-26" "2017-01-31" "2017-02-05" "2017-02-10" "2017-02-15"
[11] "2017-02-20" "2017-02-25" "2017-03-02" "2017-03-07" "2017-03-12" "2017-03-17" "2017-03-22" "2017-03-27" "2017-04-01" "2017-04-06"
[21] "2017-04-11" "2017-04-16" "2017-04-21" "2017-04-26" "2017-05-01" "2017-05-06" "2017-05-11" "2017-05-16" "2017-05-21" "2017-05-26"
[31] "2017-05-31" "2017-06-05" "2017-06-10" "2017-06-15" "2017-06-20" "2017-06-25" "2017-06-30" "2017-07-05"
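
Dates that are not written as year-month-day need an explicit format string. The sketch below parses month/day/year strings of the kind used for admdate in these notes, subtracts two dates, and builds a monthly sequence. The names adm, stay and monthly are illustrative only.

```r
# Dates written as month/day/year need an explicit format string
admdate <- c("10/15/2009", "11/01/2009", "10/21/2009", "10/28/2009")
adm <- as.Date(admdate, format = "%m/%d/%Y")
print(adm)

# Subtracting Date objects gives a difference in days
stay <- adm[2] - adm[1]
print(as.numeric(stay))

# seq() also accepts calendar units such as "month"
monthly <- seq(as.Date("2017-01-01"), by = "month", length.out = 3)
print(format(monthly, "%Y-%m-%d"))
```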

> sales <- round(runif(48,min = 50,max = 100))  #48 uniform random numbers between 50 and 100; round() rounds them to integers
> sales
 [1] 97 76 57 76 82 64 92 71 64 96 70 57 97 58 56 67 95 60 85 70 84 55 95 66 95 63 78 54 81 60 65 80 82 55 92 83 75 58 86 55 56 73 87 94
[45] 86 79 66 73
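#frequency = 1: yearly   4: quarterly  12: monthly
> ts(sales,start = c(2010,5),end = c(2014,4),frequency = 1)  #convert a vector to a time series; start and end are given as c(year, period). With frequency = 1 the 5 is read as an offset in years, so the series runs from 2014 to 2017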
Time Series:
Start = 2014 
End = 2017 
Frequency = 1 
[1] 97 76 57 76
> ts(sales,start = c(2010,5),end = c(2014,4),frequency = 12)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2010                  97  76  57  76  82  64  92  71
2011  64  96  70  57  97  58  56  67  95  60  85  70
2012  84  55  95  66  95  63  78  54  81  60  65  80
2013  82  55  92  83  75  58  86  55  56  73  87  94
2014  86  79  66  73                                
> ts(sales,start = c(2010,5),end = c(2014,4),frequency = 4)
     Qtr1 Qtr2 Qtr3 Qtr4
2011   97   76   57   76
2012   82   64   92   71
2013   64   96   70   57
2014   97   58   56   67
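
To see how start and frequency interact, here is a minimal sketch with placeholder values 1 to 48 in place of the random sales figures. It queries the series with start() and end() and cuts out one calendar year with window().

```r
# 48 monthly observations starting in May 2010 (placeholder values 1..48)
monthly <- ts(1:48, start = c(2010, 5), frequency = 12)

print(start(monthly))   # year and period of the first observation
print(end(monthly))     # year and period of the last observation

# window() extracts a sub-period, here calendar year 2011
y2011 <- window(monthly, start = c(2011, 1), end = c(2011, 12))
print(y2011)
```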

40. Three ways to get data into R:
(1) Typing data in at the keyboard
① Assign each variable by hand

#define 5 variables
> patientID <- c(1,2,3,4)
> admdate <- c("10/15/2009","11/01/2009","10/21/2009","10/28/2009")
> age <- c(25,34,28,52)
> diabetes <- c("Type1","Type2","Type1","Type1")
> status <- c("Poor","Improved","Excellent","Poor")
#combine the 5 variables into a data frame
> data <- data.frame(patientID,admidate,age,diabetes,status)   #admdate is misspelled here, hence the error
Error in data.frame(patientID, admidate, age, diabetes, status) : 
  object 'admidate' not found
> data <- data.frame(patientID,admdate,age,diabetes,status)
> data    #print the data
  patientID    admdate age diabetes    status
1         1 10/15/2009  25    Type1      Poor
2         2 11/01/2009  34    Type2  Improved
3         3 10/21/2009  28    Type1 Excellent
4         4 10/28/2009  52    Type1      Poor

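② Define the variables, then open the data editor to type or paste in the data

> data2 <- data.frame(patientID = character(0),admdate  = character(0),age = numeric(0),diabetes = character(),status = character())
> date2   #typo for data2
Error: object 'date2' not found
> data2
[1] patientID admdate   age       diabetes  status   
<0 rows> (or 0-length row.names)
> data2 <- edit(data2)  #open the data editor; edit() returns the modified copy, so assign it back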
> data2
  patientID    admdate age diabetes status
1         1 10/15/2009  NA     <NA>   <NA>
2         2       <NA>  NA     <NA>   <NA>
3         3       <NA>  NA     <NA>   <NA>
4         4       <NA>  NA     <NA>   <NA>

> fix(data2)     #edit data2 in place; the changes are saved to data2 directly
> data2
  patientID    admdate age diabetes status
1         1 10/15/2009  NA    Type1   <NA>
2         2       <NA>  NA     <NA>   <NA>
3         3       <NA>  NA     <NA>   <NA>
4         4       <NA>  NA     <NA>   <NA>

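(2) Reading data stored in external files

(3) Getting data from a database system
Databases can be accessed through ODBC (Open Database Connectivity)

41. Reading files
(1) Reading a file, showing its first and last rows, and reading a range of rows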

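> x <- read.table("input.txt")   #read a file from the current working directory; for a file elsewhere give its full path, otherwise an error is raised
> x <- read.table("D:/RWork/RData/input.txt")  #the file must be uncompressed, otherwise this also fails
> head(x)   #first 6 rows by default
> tail(x)   #last 6 rows by default
> head(x,n = 10)  #first 10 rows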

> x <- read.table("D:/RWork/RData/input.csv",sep = ",")  #without the right separator the columns are not split correctly
> x
                    V1   V2  V3    V4  V5   V6    V7    V8 V9 V10  V11  V12
1                       mpg cyl  disp  hp drat    wt  qsec vs  am gear carb
2            Mazda RX4   21   6   160 110  3.9  2.62 16.46  0   1    4    4
3        Mazda RX4 Wag   21   6   160 110  3.9 2.875 17.02  0   1    4    4
4           Datsun 710 22.8   4   108  93 3.85  2.32 18.61  1   1    4    1
5       Hornet 4 Drive 21.4   6   258 110 3.08 3.215 19.44  1   0    3    1
6

> x <- read.table("D:/RWork/RData/input.csv",sep = ",",header = T)  #if the first line holds the variable names, set header = T; compare the column names with the previous output
> x
                     X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1

> x <- read.table("D:/RWork/RData/input 1.txt",sep = ",",header = T,skip = 5)   #skip the first 5 lines (comments) and start reading at line 6
> x
    Ozone.Solar.R.Wind.Temp.Month.Day
1                 1 41 190 7.4 67 5 1
2                   2 36 118 8 72 5 2
3                3 12 149 12.6 74 5 3
4                4 18 313 11.5 62 5 4
5                 5 NA NA 14.3 56 5 5
6                 6 28 NA 14.9 66 5 6

> x <- read.table("D:/RWork/RData/input 1.txt",sep = ",",header = T,skip = 50,nrows = 200)   #skip the first 50 lines, then read at most 200 rows; nrows is the number of rows to read
> x
    X45.NA.332.13.8.80.6.14
1    46 NA 322 11.5 79 6 15
2    47 21 191 14.9 77 6 16
3    48 37 284 20.7 72 6 17
4      49 20 37 9.2 65 6 18
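
The header, skip and nrows arguments can be tried without any file on disk by passing the lines through the text argument of read.table. The sketch below uses made-up contents (two note lines, a header line and three data rows); none of it comes from the input files used above.

```r
# Made-up file contents: two note lines, a header, then three data rows
raw <- c("exported by hand",
         "units: none",
         "id,label",
         "1,x",
         "2,y",
         "3,z")

# skip = 2 drops the note lines, header = TRUE uses "id,label" as names,
# nrows = 2 stops after two data rows
df <- read.table(text = raw, sep = ",", header = TRUE, skip = 2,
                 nrows = 2, stringsAsFactors = FALSE)
print(df)
str(df)
```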

> read.fwf("D:/RWork/RData/fwf.txt",widths = c(3,3))  #fixed-width file: set the field widths yourself
    V1  V2
1  "st ate
2  "1"  "A
3  "2"  "A

> x <- read.table("D:/RWork/RData/input.csv",sep = ",",header = T,skip = 50,nrows = 100,stringsAsFactors = F)   #stringsAsFactors = F keeps strings as character vectors instead of converting them to factors

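(2) Files that are not on the local machine: reading over a protocol, from the clipboard, or from compressed files
Ⅰ. Reading clipboard contents

> x <- read.table("clipboard",header = T, sep = ",")   #read what has been copied to the clipboard but not yet pasted
> x
> readClipboard()    #return the clipboard contents as character strings (Windows only)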

Ⅱ. Opening compressed files

> RSiteSearch("Matlab")   #search the R site from the browser
> x <- read.table(gzfile("D:/RWork/RData/input.txt.gz"))
> x
                     X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1

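Ⅲ. Reading non-standard file formats
Using the scan function

> readLines("D:/RWork/RData/input.csv",n = 15)   #read the file line by line and return each line as a string; n = 15 limits it to 15 lines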
 [1] "\"\",\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\""
 [2] "\"Mazda RX4\",21,6,160,110,3.9,2.62,16.46,0,1,4,4"                                            
 [3] "\"Mazda RX4 Wag\",21,6,160,110,3.9,2.875,17.02,0,1,4,4"                                       
 [4] "\"Datsun 710\",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"   

> world.series <- scan("http://lib.stat.cmu.edu/datasets/wseries",skip=35,nlines = 23,
 what = list(year=integer(0),pattern=character(0)))   #what gives the type of each field, so scan can read strings as well as numbers
> x <- scan("scan.txt",what=list (character(3),numeric(0),numeric(0)))
> x <- scan("scan.txt",what=list (X1=character(3),X2=numeric(0),X3=numeric(0)))   #naming the list elements names the fields

42. Writing files

> write.table(x,file = "C:/Users/Desktop/newfile.csv",sep = ",")
 #suppress the row-name column that is otherwise written every time
> write.table(x,file = "C:/Users/Desktop/newfile.csv",sep = ",",row.names = FALSE)   
#append: TRUE appends to the end of an existing file. Strings are written in double quotes by default; set quote = FALSE to drop them
> write.table(x,file="newfile.csv",sep="\t",quote=FALSE,append=FALSE,na="NA")   
#write a compressed file
> write.table(mtcars,gzfile("newfile.txt.gz"))
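
A round trip is a simple way to check what write.table actually wrote. The sketch below writes to temporary files (the paths csv_path and gz_path are created by tempfile() and are not paths from these notes) and reads the data back, once as plain CSV and once through a gzip connection.

```r
# Write to a temporary file so nothing in the working directory is touched
csv_path <- tempfile(fileext = ".csv")
write.table(head(mtcars), file = csv_path, sep = ",", row.names = FALSE)
back <- read.table(csv_path, sep = ",", header = TRUE)
print(back)

# The same data through a gzip-compressed connection; row names are kept here
gz_path <- tempfile(fileext = ".txt.gz")
write.table(head(mtcars), gzfile(gz_path))
back_gz <- read.table(gzfile(gz_path))
print(rownames(back_gz))
```

With row.names = FALSE the car names are lost, which is why the second call keeps them.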

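43. Reading and writing Excel files
(1) Save the Excel file as CSV
(2) Read it in R through the clipboard
Ⅰ. Copy the cells in Excel
Ⅱ. readClipboard() returns what is on the clipboard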

> readClipboard()
 [1] "\tmpg\tcyl\tdisp\thp\tdrat\twt\tqsec\tvs\tam\tgear\tcarb"            
 [2] "Mazda RX4\t21\t6\t160\t110\t3.9\t2.62\t16.46\t0\t1\t4\t4"            
 [3] "Mazda RX4 Wag\t21\t6\t160\t110\t3.9\t2.875\t17.02\t0\t1\t4\t4"       
 [4] "Datsun 710\t22.8\t4\t108\t93\t3.85\t2.32\t18.61\t1\t1\t4\t1"         
 [5] "Hornet 4 Drive\t21.4\t6\t258\t110\t3.08\t3.215\t19.44\t1\t0\t3\t1"   

Ⅲ. Parse the clipboard text into a data frame

> read.table("clipboard",sep = "\t",header = T )
                     X  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

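(3) Reading in two steps (XLConnect package)

> library(XLConnect)   #loadWorkbook and readWorksheet come from the XLConnect package
> ex <- loadWorkbook ("data.xlsx")
> edata <- readWorksheet(ex,1)   #ex: the workbook object    1: the first worksheet
> head(edata)
> edata <- readWorksheet(ex,1,startRow=0,startCol=0,endRow=50,endCol=3)  #restrict the cell range
> readWorksheetFromFile ("data.xlsx",1,startRow=0,startCol=0,
                       endRow=50,endCol=3,header=TRUE)  #read in one step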

(4) Writing Excel in four steps

> wb <- loadWorkbook("file.xlsx",create=TRUE)
> createSheet(wb,"Sheet 1")
> writeWorksheet(wb,data=mtcars,sheet = "Sheet 1")
> saveWorkbook(wb)

(5) Writing in one step

> writeWorksheetToFile("file.xlsx",data = mtcars,sheet = "Sheet 1")
> vignette("XLConnect")

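44. Reading and saving R data files

> load(file = "C:/Users/wangtong/Desktop/RData/Ch02.R")    #load objects from a file written by save() back into the workspace
> save(iris,iris3,file = "iris.Rdata")   #save the named objects to a file
> save.image()   #save the whole current workspace

45. Data conversion
(1) Converting a matrix to a data frame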

> is.data.frame(cars32)   #test whether it is a data frame
[1] TRUE    #it is a data frame
> is.data.frame(state.x77)
[1] FALSE     #not a data frame; it is a matrix
> dstate.x77 <- as.data.frame(state.x77)   #coerce to a data frame
> is.data.frame(dstate.x77)
[1] TRUE    #conversion succeeded

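(2) Converting a data frame to a matrix

> as.matrix(data.frame(state.region,dstate.x77))   #the columns have mixed types, so the result is a character matrix
> methods(is)   #list the available is.* functions

(3) Converting vectors
Ⅰ. Adding dimensions to a vector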
> x <- state.abb
> x
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT"
[27] "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VA" "WA" "WV" "WI" "WY"
> dim(x) <- c(5,10)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA" 
[2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA" 
[3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV" 
[4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI" 
[5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY" 

Ⅱ. Converting a vector to a factor

> x <- state.abb
> as.factor(x)
 [1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT
[45] VT VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI ... WY

Ⅲ. Converting a vector to a list

> as.list(x)
[[1]]
[1] "AL"

[[2]]
[1] "AK"

[[3]]
[1] "AZ"


Ⅳ. Building a data frame

> state <- data.frame(x,state.region,state.x77)
> state$Income    #access a column
> state["Nevada",]    #access a row; note the trailing comma
        x state.region Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Nevada NV         West        590   5149        0.5    69.03   11.5    65.2   188 109889
> is.data.frame(state["Nevada",])   #check whether a single row is still a data frame
[1] TRUE
> y <- is.data.frame(state["Nevada",])
> y
[1] TRUE
> y <- state['Nevada',]
> y
        x state.region Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
Nevada NV         West        590   5149        0.5    69.03   11.5    65.2   188 109889
> unname(y)  #drop the column names
                                                      
Nevada NV West 590 5149 0.5 69.03 11.5 65.2 188 109889
> unlist(y)   #flatten to a named vector; the mixed types are coerced to character
           x state.region   Population       Income   Illiteracy     Life.Exp       Murder      HS.Grad        Frost         Area 
        "NV"          "4"        "590"       "5149"        "0.5"      "69.03"       "11.5"       "65.2"        "188"     "109889" 

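46. Selecting particular rows and columns of a data table

> who2 <- who[c(1,3,5,8),c(2,14,16,18)]  #rows, columns
> View(who2)   #open in the table viewer
#take subsets
> who3 <- who[which(who$Continent == 7),]
> View(who3)
> who4 <- who[which(who$CountryID > 50 & who$CountryID <= 100),]   #the trailing comma selects rows; without it the indices would pick columns
> View(who4)
 #same result as above
> who4 <- subset(who,who$CountryID > 50 & who$CountryID <= 100)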
> View(who4)
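
Since the who data set is not bundled with R, the same two ways of subsetting can be tried on the built-in mtcars data. This is only a sketch; the column hp and the thresholds are arbitrary choices for illustration.

```r
# which() keeps only TRUE positions; the trailing comma selects rows
by_index <- mtcars[which(mtcars$hp > 100 & mtcars$hp <= 150), ]

# subset() evaluates the condition inside the data frame, so the
# column can be named without the mtcars$ prefix
by_subset <- subset(mtcars, hp > 100 & hp <= 150)

print(isTRUE(all.equal(by_index, by_subset)))

# select = keeps only the listed columns
print(subset(mtcars, hp > 200, select = c(mpg, hp)))
```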

47. Random sampling with sample
(1) Sampling without replacement

> x <- 1:100
> sample(x,30)    #draw 30 values from x without replacement, so each value appears at most once
 [1] 40 88  8 80 26 21 55 72 95 77 20 19 11 73 25 82 84 59 71 10 69 74 16  1 38  5 48 52 12 27

(2) Sampling with replacement: values may repeat

> sample(x,60,replace = T)
 [1]  37  49  51  56  29  48  48  65   1  31  88  61 100  59  93  30  57  98  34  38  50  83  80  99  29  42  70 100  38  89   7  97  86
[34]  22   9  92   9  87  97  23  79  39  64   5  57  75  98  11   4  31   8  84  31  38  14  73  52  69   3  27
> sort (sample(x,60,replace = T))
 [1]  5  8  9 10 13 14 15 15 15 17 18 23 25 26 29 29 30 32 39 41 41 42 44 45 45 48 49 49 54 56 60 60 61 62 63 65 66 67 67 68 68 69 70 73
[45] 73 77 77 81 82 82 82 83 85 90 91 91 94 96 99 99
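
Every call to sample() gives a different result unless the random seed is fixed. The sketch below uses set.seed() with an arbitrary seed value of 42 to make a draw repeatable, and checks the difference between sampling with and without replacement.

```r
x <- 1:100

# Fixing the seed makes the "random" draw repeatable
set.seed(42)
first_draw <- sample(x, 30)
set.seed(42)
second_draw <- sample(x, 30)
print(identical(first_draw, second_draw))

# Without replacement every value is distinct, so anyDuplicated() returns 0
print(anyDuplicated(first_draw))

# With replacement, asking for more values than the population holds is allowed
big_draw <- sample(x, 150, replace = TRUE)
print(length(big_draw))
```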

48. Deleting rows or columns
(1) Deleting columns by index

> mtcars[,-1:-5]
                       wt  qsec vs am gear carb
Mazda RX4           2.620 16.46  0  1    4    4
Mazda RX4 Wag       2.875 17.02  0  1    4    4
Datsun 710          2.320 18.61  1  1    4    1
Hornet 4 Drive      3.215 19.44  1  0    3    1
Hornet Sportabout   3.440 17.02  0  0    3    2

(2) Deleting a column by assigning NULL

> mtcars$mpg <- NULL
> head(mtcars)
                  cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       6  160 110 3.90 2.875 17.02  0  1    4    4

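49. Adding rows or columns to a data frame
(1) Adding columns

> state.division   #US census divisions, comparable to regions such as North China or South China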
 [1] East South Central Pacific            Mountain           West South Central Pacific            Mountain          
 [7] New England        South Atlantic     South Atlantic     South Atlantic     Pacific            Mountain          
> USArrests
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0

> data.frame(state.division,USArrests)   #combine the vector and the data frame into a new data frame
                   state.division Murder Assault UrbanPop Rape
Alabama        East South Central   13.2     236       58 21.2
Alaska                    Pacific   10.0     263       48 44.5
Arizona                  Mountain    8.1     294       80 31.0

> cbind(USArrests,state.division)  #cbind also works; it binds columns
               Murder Assault UrbanPop Rape     state.division
Alabama          13.2     236       58 21.2 East South Central
Alaska           10.0     263       48 44.5            Pacific
Arizona           8.1     294       80 31.0           Mountain

(2) Adding rows (the data frames must have the same columns)

> data1 <- head(USArrests,20)   #first 20 rows
> data2 <- tail(USArrests,20)   #last 20 rows
> rbind(data1,data2)     #bind rows; both have the same columns
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
California        9.0     276       91 40.6

50. Handling duplicate rows

> data1 <- head(USArrests,30)
> data2 <- tail(USArrests,30)
> data4 <- rbind(data1,data2)
> data4
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
> rownames(data4)   #show the row names
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"       "California"     "Colorado"       "Connecticut"   
 [8] "Delaware"       "Florida"        "Georgia"        "Hawaii"         "Idaho"          "Illinois"       "Indiana"       
[15] "Iowa"           "Kansas"         "Kentucky"       "Louisiana"      "Maine"          "Maryland"       "Massachusetts" 
[22] "Michigan"       "Minnesota"      "Mississippi"    "Missouri"       "Montana"        "Nebraska"       "Nevada"        
[29] "New Hampshire"  "New Jersey"     "Massachusetts1" "Michigan1"      "Minnesota1"     "Mississippi1"   "Missouri1"     
[36] "Montana1"       "Nebraska1"      "Nevada1"        "New Hampshire1" "New Jersey1"    "New Mexico"     "New York"      
[43] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"       "Oregon"         "Pennsylvania"   "Rhode Island"  
[50] "South Carolina" "South Dakota"   "Tennessee"      "Texas"          "Utah"           "Vermont"        "Virginia"      
[57] "Washington"     "West Virginia"  "Wisconsin"      "Wyoming"       
> length(rownames(data4))   #60 row names: the rows were simply stacked and the duplicates were not removed
[1] 60

> duplicated(data4)   #flag duplicated rows; a repeated row returns TRUE
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> data4[duplicated(data4),]   #extract the duplicated rows
               Murder Assault UrbanPop Rape
Massachusetts1    4.4     149       85 16.3
Michigan1        12.1     255       74 35.1
Minnesota1        2.7      72       66 14.9
Mississippi1     16.1     259       44 17.1
Missouri1         9.0     178       70 28.2
Montana1          6.0     109       53 16.4
Nebraska1         4.3     102       62 16.5
Nevada1          12.2     252       81 46.0
New Hampshire1    2.1      57       56  9.5
New Jersey1       7.4     159       89 18.8

> data4[!duplicated(data4),]   #extract the rows that are not duplicates
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
California        9.0     276       91 40.6

> length(rownames(data4[!duplicated(data4),]))  #number of rows left once the duplicates are removed
[1] 50

> unique(data4)  #this single call replaces the steps above and removes the duplicates directly
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
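
The whole duplicate-handling sequence fits in a few lines. The sketch below rebuilds the overlapping data set from USArrests under the illustrative name stacked and counts the rows before and after removing duplicates.

```r
# Overlapping head and tail of a 50-row data set: rows 21 to 30 occur twice
stacked <- rbind(head(USArrests, 30), tail(USArrests, 30))

dup_flags <- duplicated(stacked)       # TRUE for second and later copies
print(sum(dup_flags))                  # number of repeated rows

deduped <- stacked[!dup_flags, ]
print(nrow(deduped))
print(nrow(unique(stacked)))
```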

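51. Flipping a data frame
(1) Transposing

> sractm <- t(mtcars)  #transpose: swap rows and columns (the result is a matrix)

(2) Reversing the row order

> letters   #letters is a vector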
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> rev(letters)
 [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j" "i" "h" "g" "f" "e" "d" "c" "b" "a"

> women
   height weight
1      58    115
2      59    117
3      60    120
4      61    123
5      62    126
6      63    129
7      64    132
8      65    135
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
> rownames(women)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
> rev(rownames(women))
 [1] "15" "14" "13" "12" "11" "10" "9"  "8"  "7"  "6"  "5"  "4"  "3"  "2"  "1" 
> women[rev(rownames(women)),]
   height weight
15     72    164
14     71    159
13     70    154
12     69    150
11     68    146
10     67    142
9      66    139
8      65    135
7      64    132
6      63    129
5      62    126
4      61    123
3      60    120
2      59    117
1      58    115
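
Reversing by row name works here because the row names are the default "1" to "15". A possible alternative that does not depend on the row names is to index by position, sketched below.

```r
# Reverse by position; this does not depend on what the row names are
reversed <- women[nrow(women):1, ]
print(head(reversed, 3))

# The values and row names come out in the same order as with rev(rownames(...))
by_name <- women[rev(rownames(women)), ]
print(identical(reversed$height, by_name$height))
print(identical(rownames(reversed), rownames(by_name)))
```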

52. Modifying values in a data frame

> women$height
 [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
> women$height*2.54
 [1] 147.32 149.86 152.40 154.94 157.48 160.02 162.56 165.10 167.64 170.18 172.72 175.26 177.80 180.34 182.88
> data.frame(women$height^2.54,women$height)   #note: ^ raises to a power; the inch-to-cm conversion above uses *
   women.height.2.54 women.height
1           30137.49           58
2           31474.88           59
3           32847.64           60
4           34256.09           61


> transform(women,height = height*2.54)  #overwrite height in one step
   height weight
1  147.32    115
2  149.86    117
3  152.40    120
4  154.94    123
5  157.48    126

> transform(women,cm = height*2.54)   #or add the result as a new column
   height weight     cm
1      58    115 147.32
2      59    117 149.86
3      60    120 152.40
4      61    123 154.94
5      62    126 157.48

53. Sorting (sort() works only on vectors; use order() to sort a data frame)

> sort(rivers)   #returns the sorted values
  [1]  135  202  210  210  215  217  230  230  233  237  246  250  250  250  255  259  260  260  265  268  270  276  280  280  280  281
 [27]  286  290  291  300  300  300  301  306  310  310  314  315  320  325  327  329  330  332  336  338  340  350  350  350  350  352
 [53]  360  360  360  360  375  377  380  380  383  390  390  392  407  410  411  420  420  424  425  430  431  435  444  445  450  460
 [79]  460  465  470  490  500  500  505  524  525  525  529  538  540  545  560  570  600  600  600  605  610  618  620  625  630  652
[105]  671  680  696  710  720  720  730  735  735  760  780  800  840  850  870  890  900  900  906  981 1000 1038 1054 1100 1171 1205
[131] 1243 1270 1306 1450 1459 1770 1885 2315 2348 2533 3710
> order(rivers)  #returns the positions (indices) of the sorted values, which can be used to index a data frame
  [1]   8  17  39 108 129  52  36  42  91 117 133  34  56  87  76  55  41  75  37 127 138 107  13  30  72  53  29  19  49  61 103 124
 [33] 126  46  94 123 116  14   2   3  35  18  11  65  12  81  51  27  60  78 111  54  43 112 119 134  97 105 102 104  96  33  47   4
 [65]  28  73  88  48 110 122 106 139  77  92 125 100   6  74  95   9  57  93  84 136  22   5  31 132 135 113 120  99  62  59  10  21
 [97]  45  86 118  80 128  64  40 130 140  58  85  50  32 137  44   1  90  79  71 109  24  38  15  26  63 131  16  82  20 121  89 114
[129]  67 115  25  98  83  23   7 141 101  69  66  70  68


> order(mtcars$drat)
 [1]  6 22 15 16 12 13 14  4 25  5 23  7 17 31 30  8 21 24 28  3  1  2  9 10 11 18 26 32 20 29 27 19
> mtcars[order(mtcars$drat),]   #index with order() to sort the whole data frame; sort() cannot do this
                    cyl  disp  hp drat    wt  qsec vs am gear carb
Valiant               6 225.0 105 2.76 3.460 20.22  1  0    3    1
Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0  0    3    2

> order(mtcars$drat,mtcars$disp)   #sort by drat, breaking ties by disp (smaller displacement first)
 [1]  6 22 15 16 12 13 14  4 25 23  5  7 17 31 30  8 21 24 28  3  1  2  9 10 11 18 26 32 20 29 27 19
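
order() can also sort in descending order. The sketch below, on a fresh copy of mtcars, negates a numeric key to reverse just that key and uses decreasing = TRUE to reverse the whole ordering. The choice of columns is arbitrary.

```r
# Sort by cyl ascending, then by mpg descending within each cyl group
idx <- order(mtcars$cyl, -mtcars$mpg)
sorted <- mtcars[idx, c("cyl", "mpg")]
print(head(sorted))

# decreasing = TRUE reverses the whole ordering
top_hp <- mtcars[order(mtcars$hp, decreasing = TRUE), ]
print(rownames(top_hp)[1])
```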

54. Calculations on data frames
(1) Step by step with rowSums and colMeans

> WorldPhones
     N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
1951  45939  21574 2876   1815    1646     89      555
1956  60423  29990 4708   2568    2366   1411      733
1957  64721  32510 5230   2695    2526   1546      773
1958  68484  35218 6662   2845    2691   1663      836
1959  71799  37598 6856   3000    2868   1769      911
1960  76036  40341 8220   3145    3054   1905     1008
1961  79831  43173 9053   3338    3224   2005     1076
> worldphones <- as.data.frame(WorldPhones)   #convert the matrix to a data frame
> rs <- rowSums(worldphones)
> rs
  1951   1956   1957   1958   1959   1960   1961 
 74494 102199 110001 118399 124801 133709 141700 
> cm <- colMeans(worldphones)
> cm
    N.Amer     Europe       Asia     S.Amer    Oceania     Africa   Mid.Amer 
66747.5714 34343.4286  6229.2857  2772.2857  2625.0000  1484.0000   841.7143 
> total <- cbind(worldphones,Total = rs)
> rbind (total,cm)   #cm has 7 values but total has 8 columns, so the first mean is recycled into Total in the last row
       N.Amer   Europe     Asia   S.Amer Oceania Africa  Mid.Amer     Total
1951 45939.00 21574.00 2876.000 1815.000    1646     89  555.0000  74494.00
1956 60423.00 29990.00 4708.000 2568.000    2366   1411  733.0000 102199.00
1957 64721.00 32510.00 5230.000 2695.000    2526   1546  773.0000 110001.00
8    66747.57 34343.43 6229.286 2772.286    2625   1484  841.7143  66747.57
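
In the output above the Total entry of the last row repeats the N.Amer mean, because the vector of means is one value short. One way to avoid that, sketched here with illustrative names, is to add the Total column first and only then compute the column means.

```r
phones <- as.data.frame(WorldPhones)

# Add the row totals first, then take column means of the widened table,
# so the Total column gets its own mean instead of a recycled value
with_total <- cbind(phones, Total = rowSums(phones))
means <- colMeans(with_total)

summary_table <- rbind(with_total, Mean = means)
print(summary_table)
```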

(2) In one call with apply

> apply(WorldPhones,MARGIN = 1,FUN = sum)  #row sums; MARGIN = 1 means rows
  1951   1956   1957   1958   1959   1960   1961 
 74494 102199 110001 118399 124801 133709 141700 
> apply(WorldPhones,MARGIN = 2,FUN = mean)   #column means; MARGIN = 2 means columns
    N.Amer     Europe       Asia     S.Amer    Oceania     Africa   Mid.Amer 
66747.5714 34343.4286  6229.2857  2772.2857  2625.0000  1484.0000   841.7143 

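55. lapply and sapply
apply: works on matrices and arrays (a data frame is coerced to a matrix first)
lapply: returns a list
sapply: returns a vector or matrix
tapply: applies a function to groups defined by a factor

> state.center  #a list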
$x
 [1]  -86.7509 -127.2500 -111.6250  -92.2992 -119.7730 -105.5130  -72.3573  -74.9841  -81.6850  -83.3736 -126.2500 -113.9300  -89.3776
[14]  -86.0808  -93.3714  -98.1156  -84.7674  -92.2724  -68.9801  -76.6459  -71.5800  -84.6870  -94.6043  -89.8065  -92.5137 -109.3200
[27]  -99.5898 -116.8510  -71.3924  -74.2336 -105.9420  -75.1449  -78.4686 -100.0990  -82.5963  -97.1239 -120.0680  -77.4500  -71.1244
[40]  -80.5056  -99.7238  -86.4560  -98.7857 -111.3300  -72.5450  -78.2005 -119.7460  -80.6665  -89.9941 -107.2560

$y
 [1] 32.5901 49.2500 34.2192 34.7336 36.5341 38.6777 41.5928 38.6777 27.8744 32.3329 31.7500 43.5648 40.0495 40.0495 41.9358 38.4204
[17] 37.3915 30.6181 45.6226 39.2778 42.3645 43.1361 46.3943 32.6758 38.3347 46.8230 41.3356 39.1063 43.3934 39.9637 34.4764 43.1361
[33] 35.4195 47.2517 40.2210 35.5053 43.9078 40.9069 41.5928 33.6190 44.3365 35.6767 31.3897 39.1063 44.2508 37.5630 47.4231 38.4204
[49] 44.5937 43.0504

> lapply(state.center,FUN = length)  #length of each list element
$x
[1] 50

$y
[1] 50
> sapply(state.center,FUN = length)   #the same result, simplified to a vector
 x  y 
50 50 
> class(sapply(state.center,FUN = length))
[1] "integer"

> tapply(state.name,state.division,FUN = length)  #number of states in each US division
       New England    Middle Atlantic     South Atlantic East South Central West South Central East North Central West North Central 
                 6                  3                  8                  4                  4                  5                  7 
          Mountain            Pacific 
                 8                  5 
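
The three functions can be compared side by side. In this sketch the grouped mean of the Income column by state.region is an added illustration that does not appear in the notes.

```r
# lapply always returns a list, one element per input element
len_list <- lapply(state.center, length)

# sapply simplifies that list to a named vector when it can
len_vec <- sapply(state.center, length)

# tapply groups the first argument by a factor before applying the function
mean_income <- tapply(state.x77[, "Income"], state.region, mean)

print(class(len_list))
print(len_vec)
print(round(mean_income))
```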


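56. Centering and standardizing data
Centering: subtract the mean of the data set from each value.
Standardizing: after centering, divide by the standard deviation of the data set; that is, each value minus the mean, divided by the standard deviation.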

> x <- c(1,2,3,6,3)
> mean(x)
[1] 3
> x - mean(x)   #after centering, the values are still widely spread
[1] -2 -1  0  3  0
> sd(x)   #standard deviation
[1] 1.870829
> (x - mean(x))/sd(x)  #standardize
[1] -1.0690450 -0.5345225  0.0000000  1.6035675  0.0000000


> x <- scale(state.x77,scale = T,center = T)  #center and standardize every column with scale()
> head(x)
           Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost       Area
Alabama    -0.1414316 -1.3211387   1.525758 -1.3621937  2.0918101 -1.4619293 -1.6248292 -0.2347183
Alaska     -0.8693980  3.0582456   0.541398 -1.1685098  1.0624293  1.6828035  0.9145676  5.8093497
Arizona    -0.4556891  0.1533029   1.033578 -0.2447866  0.1143154  0.6180514 -1.7210185  0.5002047
Arkansas   -0.4785360 -1.7214837   1.197638 -0.1628435  0.7373617 -1.6352611 -0.7591257 -0.2202212
California  3.7969790  1.1037155  -0.114842  0.6193415  0.7915396  1.1751891 -1.6248292  1.0034903
Colorado   -0.3819965  0.7294092  -0.771082  0.8800698 -0.1565742  1.3361400  1.1838976  0.3870991
> head(state.x77)
           Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
California      21198   5114        1.1    71.71   10.3    62.6    20 156361
Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
> heatmap(x)
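
As a quick check that scale() performs the calculation described above, this sketch compares it with the manual formula on the small vector from the example. scale() returns a one-column matrix with attributes, so the result is converted to a plain vector before comparing.

```r
x <- c(1, 2, 3, 6, 3)

# Manual version: subtract the mean, divide by the sample standard deviation
manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes, so convert before comparing
scaled <- as.numeric(scale(x, center = TRUE, scale = TRUE))
print(all.equal(manual, scaled))

# After standardizing, the mean is 0 and the standard deviation is 1
print(c(mean = mean(scaled), sd = sd(scaled)))
```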

57. Using the merge function (base R, no reshape2 needed)

> x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5),data = 1:5)
> y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5),data = 1:5)
> x
  k1 k2 data
1 NA  1    1
2 NA NA    2
3  3 NA    3
4  4  4    4
5  5  5    5
> y
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5

> merge(x,y,by = "k1")  #merge on k1 (inner join); NA keys match each other, hence 6 rows
  k1 k2.x data.x k2.y data.y
1  4    4      4    4      4
2  5    5      5    5      5
3 NA    1      1   NA      1
4 NA    1      1    3      3
5 NA   NA      2   NA      1
6 NA   NA      2    3      3

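> merge(x,y,by = "k2",incomparables = T)  #merge on k2; incomparables = T does not exclude NA keys (they still match below), use incomparables = NA for that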
  k2 k1.x data.x k1.y data.y
1  4    4      4    4      4
2  5    5      5    5      5
3 NA   NA      2   NA      1
4 NA   NA      2    2      2
5 NA    3      3   NA      1
6 NA    3      3    2      2

> merge(x,y,by = c("k1","k2"))   #merge on both k1 and k2
  k1 k2 data.x data.y
1  4  4      4      4
2  5  5      5      5
3 NA NA      2      1
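
To keep NA keys from matching each other, the value to mark as incomparable is NA itself. The sketch below repeats the merge on k2 with incomparables = NA, which leaves only the rows with keys 4 and 5.

```r
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# Default behaviour: NA keys match each other
with_na <- merge(x, y, by = "k2")
print(nrow(with_na))

# incomparables = NA stops NA keys from matching
no_na <- merge(x, y, by = "k2", incomparables = NA)
print(no_na)
```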

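58. The gather function: reshapes columns, keeping chosen columns fixed and stacking the others
(turns a table whose column names themselves hold values into a tidy key-value table)

> library(tidyr)   #gather, spread, separate and unite come from the tidyr package
> tdata <- mtcars[1:10,1:3]  #take part of the data set
> tdata <- data.frame(name = rownames(tdata),tdata)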
> tdata
                               name  mpg cyl  disp
Mazda RX4                 Mazda RX4 21.0   6 160.0
Mazda RX4 Wag         Mazda RX4 Wag 21.0   6 160.0
Datsun 710               Datsun 710 22.8   4 108.0
Hornet 4 Drive       Hornet 4 Drive 21.4   6 258.0
Hornet Sportabout Hornet Sportabout 18.7   8 360.0
Valiant                     Valiant 18.1   6 225.0
Duster 360               Duster 360 14.3   8 360.0
Merc 240D                 Merc 240D 24.4   4 146.7
Merc 230                   Merc 230 22.8   4 140.8
Merc 280                   Merc 280 19.2   6 167.6

> gather(tdata,key = "Key",value = "Value",cyl,disp,mpg)
                name  Key Value
1          Mazda RX4  cyl   6.0
2      Mazda RX4 Wag  cyl   6.0
3         Datsun 710  cyl   4.0
4     Hornet 4 Drive  cyl   6.0
5  Hornet Sportabout  cyl   8.0
6            Valiant  cyl   6.0
7         Duster 360  cyl   8.0
8          Merc 240D  cyl   4.0
9           Merc 230  cyl   4.0
10          Merc 280  cyl   6.0
11         Mazda RX4 disp 160.0
12     Mazda RX4 Wag disp 160.0
13        Datsun 710 disp 108.0
14    Hornet 4 Drive disp 258.0
15 Hornet Sportabout disp 360.0
16           Valiant disp 225.0
17        Duster 360 disp 360.0
18         Merc 240D disp 146.7
19          Merc 230 disp 140.8
20          Merc 280 disp 167.6
21         Mazda RX4  mpg  21.0
22     Mazda RX4 Wag  mpg  21.0
23        Datsun 710  mpg  22.8
24    Hornet 4 Drive  mpg  21.4
25 Hornet Sportabout  mpg  18.7
26           Valiant  mpg  18.1
27        Duster 360  mpg  14.3
28         Merc 240D  mpg  24.4
29          Merc 230  mpg  22.8
30          Merc 280  mpg  19.2

> gather(tdata,key = "Key",value = "Value",cyl,-disp)  #gather only cyl; -disp excludes disp, so mpg and disp stay as columns
                name  mpg  disp Key Value
1          Mazda RX4 21.0 160.0 cyl     6
2      Mazda RX4 Wag 21.0 160.0 cyl     6
3         Datsun 710 22.8 108.0 cyl     4
4     Hornet 4 Drive 21.4 258.0 cyl     6
5  Hornet Sportabout 18.7 360.0 cyl     8
6            Valiant 18.1 225.0 cyl     6
7         Duster 360 14.3 360.0 cyl     8
8          Merc 240D 24.4 146.7 cyl     4
9           Merc 230 22.8 140.8 cyl     4
10          Merc 280 19.2 167.6 cyl     6
> gather(tdata,key = "Key",value = "Value",2:4)  #column positions can be used instead of names to avoid typos
                name  Key Value
1          Mazda RX4  mpg  21.0
2      Mazda RX4 Wag  mpg  21.0
3         Datsun 710  mpg  22.8
4     Hornet 4 Drive  mpg  21.4
5  Hornet Sportabout  mpg  18.7
6            Valiant  mpg  18.1
7         Duster 360  mpg  14.3
8          Merc 240D  mpg  24.4
9           Merc 230  mpg  22.8
10          Merc 280  mpg  19.2
11         Mazda RX4  cyl   6.0
12     Mazda RX4 Wag  cyl   6.0
13        Datsun 710  cyl   4.0
14    Hornet 4 Drive  cyl   6.0
15 Hornet Sportabout  cyl   8.0
16           Valiant  cyl   6.0
17        Duster 360  cyl   8.0
18         Merc 240D  cyl   4.0
19          Merc 230  cyl   4.0
20          Merc 280  cyl   6.0
21         Mazda RX4 disp 160.0
22     Mazda RX4 Wag disp 160.0
23        Datsun 710 disp 108.0
24    Hornet 4 Drive disp 258.0
25 Hornet Sportabout disp 360.0
26           Valiant disp 225.0
27        Duster 360 disp 360.0
28         Merc 240D disp 146.7
29          Merc 230 disp 140.8
30          Merc 280 disp 167.6


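59. spread function: widens a table by splitting a key-value pair of columns into multiple columns.
key is the column whose values become the new column names; value is the column whose values fill those new columns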

> tdata <- gather(tdata,key = "Key",value = "Value",2:4)
> tdata
                name  Key Value
1          Mazda RX4  mpg  21.0
2      Mazda RX4 Wag  mpg  21.0
3         Datsun 710  mpg  22.8
4     Hornet 4 Drive  mpg  21.4
5  Hornet Sportabout  mpg  18.7
> spread(tdata,key = "Key",value = "Value")
                name cyl  disp  mpg
1         Datsun 710   4 108.0 22.8
2         Duster 360   8 360.0 14.3
3     Hornet 4 Drive   6 258.0 21.4
4  Hornet Sportabout   8 360.0 18.7
5          Mazda RX4   6 160.0 21.0
6      Mazda RX4 Wag   6 160.0 21.0
7           Merc 230   4 140.8 22.8
8          Merc 240D   4 146.7 24.4
9           Merc 280   6 167.6 19.2
10           Valiant   6 225.0 18.1
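The round trip between the wide and long layouts can be sketched as follows. The way the small table is built here is an assumption for illustration (first 10 cars of mtcars with mpg, cyl and disp), not necessarily how tdata was created earlier. In tidyr 1.0.0 and later, pivot_longer() and pivot_wider() are the newer counterparts of gather() and spread().

```r
library(tidyr)

# Assumed construction of a small table shaped like tdata
tdata <- data.frame(name = rownames(mtcars)[1:10],
                    mtcars[1:10, c("mpg", "cyl", "disp")],
                    row.names = NULL,
                    stringsAsFactors = FALSE)

# wide -> long: the three measure columns collapse into Key/Value pairs
long <- gather(tdata, key = "Key", value = "Value", mpg, cyl, disp)

# long -> wide: each distinct Key becomes a column again
wide <- spread(long, key = "Key", value = "Value")

dim(long)  # 10 cars x 3 measures = 30 rows, 3 columns
dim(wide)  # back to 10 rows, 4 columns
```

Note that spread() orders the new columns by key, so the column order of wide may differ from that of tdata.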

60. separate function: splits data, turning a column whose values each hold two variables into separate columns

> df <- data.frame(x = c(NA,"a.b","a.d","b.c"))
> df
     x
1 <NA>
2  a.b
3  a.d
4  b.c
> separate(df,col = x,into = c("A","B"))  # into names the new columns, i.e. how many columns the original is split into
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c

> df <- data.frame(x = c(NA,"a.b-c","a-d","b-c"))
> separate(df,x,into = c("A","B"),sep = "-")
     A    B
1 <NA> <NA>
2  a.b    c
3    a    d
4    b    c
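The sep argument is treated as a regular expression, and its default matches any run of non-alphanumeric characters, which is why the first example split on "." without being told to. A minimal sketch of splitting on a literal dot (the names by_dot and by_dash are only illustrative):

```r
library(tidyr)

df <- data.frame(x = c(NA, "a.b-c", "a-d", "b-c"), stringsAsFactors = FALSE)

# A literal dot has to be escaped, because "." alone matches any character
by_dot <- separate(df[2, , drop = FALSE], x, into = c("A", "B"), sep = "\\.")

# A dash has no special meaning here, so it can be used as is
by_dash <- separate(df, x, into = c("A", "B"), sep = "-")

by_dot
by_dash
```

With the default sep, the value "a.b-c" would break into three pieces, one more than the two columns named in into.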

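61. unite function: merges columns

> x <- separate(df,x,into = c("A","B"),sep = "-")  # store the split result from item 60 first
> unite(x,col = "AB",A,B,sep = "-")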
     AB
1 NA-NA
2 a.b-c
3   a-d
4   b-c
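unite() can be seen as the inverse of separate(). A small round-trip sketch, using made-up values without NA (as the output above shows, NA entries would come back as the text "NA-NA"):

```r
library(tidyr)

df <- data.frame(x = c("a-d", "b-c"), stringsAsFactors = FALSE)

parts  <- separate(df, x, into = c("A", "B"), sep = "-")   # one column -> two
joined <- unite(parts, col = "x", A, B, sep = "-")         # two columns -> one

identical(joined$x, df$x)
```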

62. dplyr package: transforming data
(1) filter() selects a subset of observations based on their values

> dplyr::filter(iris,Sepal.Length > 7)  # keep only rows with Sepal.Length greater than 7
   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1           7.1         3.0          5.9         2.1 virginica
2           7.6         3.0          6.6         2.1 virginica
3           7.3         2.9          6.3         1.8 virginica
4           7.2         3.6          6.1         2.5 virginica
5           7.7         3.8          6.7         2.2 virginica
6           7.7         2.6          6.9         2.3 virginica
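Several conditions can be combined in one filter() call: conditions separated by commas must all hold, while | means "either". A brief sketch (the variable names are illustrative):

```r
library(dplyr)

# Both conditions must hold (comma acts like &)
long_virginica <- filter(iris, Sepal.Length > 7, Species == "virginica")

# Either condition may hold
long_or_wide <- filter(iris, Sepal.Length > 7 | Sepal.Width > 4)

nrow(long_virginica)
nrow(long_or_wide)
```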

(2) distinct function: removes duplicate rows

> dplyr::distinct(rbind(iris[1:10,],iris[1:15,])) # the 10 repeated rows are dropped, leaving 15
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa
13          4.8         3.0          1.4         0.1  setosa
14          4.3         3.0          1.1         0.1  setosa
15          5.8         4.0          1.2         0.2  setosa
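distinct() can also look at selected columns only. A short sketch on iris:

```r
library(dplyr)

species_only <- distinct(iris, Species)                    # one row per species, Species column only
first_rows   <- distinct(iris, Species, .keep_all = TRUE)  # first full row for each species

species_only
first_rows
```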

(3) slice function: selects rows by position, so any rows can be taken out

> dplyr::slice(iris,10:15)  # rows 10 to 15
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          4.9         3.1          1.5         0.1  setosa
2          5.4         3.7          1.5         0.2  setosa
3          4.8         3.4          1.6         0.2  setosa
4          4.8         3.0          1.4         0.1  setosa
5          4.3         3.0          1.1         0.1  setosa
6          5.8         4.0          1.2         0.2  setosa
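Besides a plain range, slice() accepts negative positions and the helper n(). A minimal sketch:

```r
library(dplyr)

last_five  <- slice(iris, (n() - 4):n())  # n() is the number of rows, so this is rows 146-150
without_10 <- slice(iris, -(1:10))        # negative positions drop rows instead of keeping them

nrow(last_five)
nrow(without_10)
```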

(4) sample_n function: random sampling

> dplyr::sample_n(iris,10)  # draw 10 rows at random
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.0         2.0          3.5         1.0 versicolor
2           6.7         3.1          4.7         1.5 versicolor
3           5.0         3.4          1.5         0.2     setosa
4           7.1         3.0          5.9         2.1  virginica
5           6.8         3.2          5.9         2.3  virginica
6           5.1         3.8          1.5         0.3     setosa
7           4.9         3.6          1.4         0.1     setosa
8           6.5         3.0          5.2         2.0  virginica
9           5.0         3.3          1.4         0.2     setosa
10          5.0         2.3          3.3         1.0 versicolor

(5) sample_frac function: random sampling by proportion (0.1 of 150 rows gives 15 rows)

> dplyr::sample_frac(iris,0.1)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.8         2.6          4.0         1.2 versicolor
2           5.1         3.5          1.4         0.3     setosa
3           5.1         3.8          1.9         0.4     setosa
4           6.7         3.3          5.7         2.5  virginica
5           6.8         2.8          4.8         1.4 versicolor
6           6.0         2.2          5.0         1.5  virginica
7           6.9         3.2          5.7         2.3  virginica
8           5.3         3.7          1.5         0.2     setosa
9           4.9         3.6          1.4         0.1     setosa
10          5.5         2.6          4.4         1.2 versicolor
11          6.9         3.1          5.1         2.3  virginica
12          5.7         3.0          4.2         1.2 versicolor
13          5.7         2.9          4.2         1.3 versicolor
14          6.7         3.1          4.4         1.4 versicolor
15          6.0         2.7          5.1         1.6 versicolor
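Because the rows are drawn at random, the output differs from run to run. Fixing the seed makes a sample reproducible; the seed value 42 below is arbitrary. dplyr 1.0.0 and later also provide slice_sample(n = ...) and slice_sample(prop = ...) as the newer interface.

```r
library(dplyr)

set.seed(42)
s1 <- sample_n(iris, 10)

set.seed(42)
s2 <- sample_n(iris, 10)

identical(s1, s2)             # same seed, same rows
nrow(sample_frac(iris, 0.1))  # 10% of 150 rows
```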

(6) arrange function: sorting

> dplyr::arrange(iris,Sepal.Length)  # sort by sepal length, ascending
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            4.3         3.0          1.1         0.1     setosa
2            4.4         2.9          1.4         0.2     setosa
3            4.4         3.0          1.3         0.2     setosa

> dplyr::arrange(iris,desc(Sepal.Length)) # sort in descending order
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            7.9         3.8          6.4         2.0  virginica
2            7.7         3.8          6.7         2.2  virginica
3            7.7         2.6          6.9         2.3  virginica
...
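arrange() also takes several sort keys; later keys break ties in earlier ones. A sketch that sorts by species first and then by descending sepal length within each species:

```r
library(dplyr)

sorted <- arrange(iris, Species, desc(Sepal.Length))

head(sorted, 3)  # the longest setosa sepals come first
```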

(7) summarise: summary statistics

> summarise(iris,avg = mean(Sepal.Length))  # mean sepal length
       avg
1 5.843333

> summarise(iris,sum = sum(Sepal.Length))
    sum
1 876.5
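Several statistics can be computed in a single summarise() call, each becoming one column of the one-row result. A minimal sketch (the column names avg, total and n are arbitrary):

```r
library(dplyr)

stats <- summarise(iris,
                   avg   = mean(Sepal.Length),
                   total = sum(Sepal.Length),
                   n     = n())  # n() counts the rows

stats
```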

(8) %>%: the pipe operator; it passes the output of one function to the next function as its input (its first argument). RStudio shortcut: Ctrl+Shift+M

> head(mtcars,20) %>% tail(10)  # returns rows 11 to 20
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
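The pipe is only a different way of writing a nested call: x %>% f(y) means f(x, y). A small sketch checking that both spellings give the same result:

```r
library(dplyr)  # dplyr re-exports %>% from magrittr

piped  <- head(mtcars, 20) %>% tail(10)
nested <- tail(head(mtcars, 20), 10)

identical(piped, nested)
```

The piped form reads left to right in the order the steps run, which is its main advantage once more than two steps are chained.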

(9) group_by: grouping

> dplyr::group_by(iris,Species)  # splits the data into 3 groups
# A tibble: 150 x 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ... with 140 more rows

> iris %>% group_by(Species)  # same result as the previous command

> iris %>% group_by(Species) %>% summarise(avg = mean(Sepal.Width))  # mean sepal width per species
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 setosa      3.43
2 versicolor  2.77
3 virginica   2.97
> iris %>% group_by(Species) %>% summarise(avg = mean(Sepal.Width)) %>% arrange(avg)
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 versicolor  2.77
2 virginica   2.97
3 setosa      3.43
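The message printed by summarise() above refers to its .groups argument. Assuming dplyr 1.0.0 or later, passing .groups = "drop" states explicitly that the result should be ungrouped, which also silences the message. A sketch that adds a per-group row count as well:

```r
library(dplyr)

by_species <- iris %>%
  group_by(Species) %>%
  summarise(avg = mean(Sepal.Width),
            n   = n(),            # rows in each group
            .groups = "drop") %>% # return an ungrouped table
  arrange(desc(avg))

by_species
```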

(10) mutate function: adds new columns

> dplyr::mutate(iris,new = Sepal.Length+Petal.Length)  # sum of sepal length and petal length
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  new
1            5.1         3.5          1.4         0.2     setosa  6.5
2            4.9         3.0          1.4         0.2     setosa  6.3
3            4.7         3.2          1.3         0.2     setosa  6.0
4            4.6         3.1          1.5         0.2     setosa  6.1
...
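mutate() can create several columns at once, and a later column may use one defined earlier in the same call. A minimal sketch (the column names total_len and petal_share are made up for illustration):

```r
library(dplyr)

iris2 <- mutate(iris,
                total_len   = Sepal.Length + Petal.Length,
                petal_share = Petal.Length / total_len)  # uses the column created just above

head(iris2, 3)
```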

63. Two-table operations in dplyr

> a <- data.frame(x1 = c("A","B","C"),x2 = c(1,2,3))
> b <- data.frame(x1 = c("A","B","D"),x3 = c(TRUE,FALSE,TRUE))
> a
  x1 x2
1  A  1
2  B  2
3  C  3
> b
  x1    x3
1  A  TRUE
2  B FALSE
3  D  TRUE
> dplyr::left_join(a,b,by="x1")  # left join: keeps every row of a; b has no row with x1 = "C", so x3 is NA there
  x1 x2    x3
1  A  1  TRUE
2  B  2 FALSE
3  C  3    NA
> dplyr::right_join(a,b,by="x1") # right join: keeps every row of b, so the x1 values come from b
  x1 x2    x3
1  A  1  TRUE
2  B  2 FALSE
3  D NA  TRUE

> dplyr::full_join(a,b,by="x1")  # full join: keeps the union of the x1 values
  x1 x2    x3
1  A  1  TRUE
2  B  2 FALSE
3  C  3    NA
4  D NA  TRUE

> dplyr::semi_join(a,b,by="x1")  # semi join: rows of a whose x1 also appears in b (only a's columns are returned)
  x1 x2
1  A  1
2  B  2

> dplyr::anti_join(a,b,by="x1")  # anti join: rows of a whose x1 does not appear in b
  x1 x2
1  C  3
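The list above leaves out inner_join(), which keeps only the keys present in both tables. The sketch below rebuilds the two small tables and compares the row counts of all six joins:

```r
library(dplyr)

a <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3), stringsAsFactors = FALSE)
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE), stringsAsFactors = FALSE)

inner <- inner_join(a, b, by = "x1")  # only A and B appear in both tables

counts <- c(inner = nrow(inner),
            left  = nrow(left_join(a, b, by = "x1")),
            right = nrow(right_join(a, b, by = "x1")),
            full  = nrow(full_join(a, b, by = "x1")),
            semi  = nrow(semi_join(a, b, by = "x1")),
            anti  = nrow(anti_join(a, b, by = "x1")))
counts
```

inner_join() and semi_join() return the same rows here, but the inner join also brings in the x3 column from b.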

64. Combining data sets

> mtcars <- mutate(mtcars,Model = rownames(mtcars))  # add a Model column holding the row names
> mtcars
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb               Model
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive
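> first <- slice(mtcars,1:20)  # rows 1-20
> second <- slice(mtcars,10:30)  # rows 10-30
> intersect(first,second)  # rows present in both
> union_all(first,second)  # all rows from both, duplicates kept (union() drops them)
> setdiff(first,second)   # rows in first but not in second
> setdiff(second,first)  # rows in second but not in first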
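Since the two slices overlap in rows 10 to 20, the sizes of the results are easy to check. The sketch below builds its own copy of the table (named cars here, an illustrative name) instead of overwriting mtcars:

```r
library(dplyr)

# Copy of mtcars with the row names stored in a Model column
cars <- data.frame(Model = rownames(mtcars), mtcars,
                   row.names = NULL, stringsAsFactors = FALSE)

first  <- slice(cars, 1:20)
second <- slice(cars, 10:30)

sizes <- c(intersect   = nrow(dplyr::intersect(first, second)),  # rows 10-20
           union_all   = nrow(union_all(first, second)),         # 20 + 21, duplicates kept
           union       = nrow(dplyr::union(first, second)),      # rows 1-30, duplicates dropped
           only_first  = nrow(dplyr::setdiff(first, second)),    # rows 1-9
           only_second = nrow(dplyr::setdiff(second, first)))    # rows 21-30
sizes
```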
