Introduction to data.table

data.table syntax

This article is mainly about data.table, but before comparing the two in detail, here is a brief look at dplyr.

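dplyr's strength is its elegant syntax, which follows the way people naturally think and is easy to pick up. data.table's strengths are concise syntax and fast execution, which make it very powerful on large data, although its syntax is sometimes hard to understand.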



Commonly used dplyr functions

  • select(): select columns
  • filter(): filter rows
  • mutate(): add new columns, similar to transform()
  • group_by(): group the data
  • summarise(): summarize the data


data.table

The general form of a data.table call is DT[i, j, by]: i selects rows, j operates on columns, and by gives the grouping.
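The sketch below shows how the dplyr verbs listed above roughly correspond to DT[i, j, by]. The mapping is my own illustration, not an official correspondence. To keep the block self-contained, it builds a fresh copy of iris with lowercase column names:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))   # lowercase names, as in the rest of this article

# filter() + group_by() + summarise() in a single call:
# i keeps the rows, j computes the summary, by defines the groups
res <- dt[sepal.length > 5, .(mean_width = mean(sepal.width)), by = species]

# select(): list the wanted columns in j
sel <- dt[, .(sepal.length, species)]

# mutate(): add a column by reference with :=
dt[, ratio := sepal.length / sepal.width]

print(res)
```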

The iris dataset is used for the examples below.

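> library(data.table)
> DT <- data.table(iris)
> set.seed(45L)
> DT[,c("V1","V2"):=list(rep_len(LETTERS[1:3],.N),rep_len(c(1L,2L),.N))] #rep_len() recycles explicitly; recent versions reject an RHS whose length is neither 1 nor the row count
> names(DT) <- tolower(names(DT))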
> head(DT)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2

1. Filtering rows with i

  • By row number

Select rows 3 to 5

> DT[3:5,] #or DT[3:5]
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          4.7         3.2          1.3         0.2  setosa  C  1
2:          4.6         3.1          1.5         0.2  setosa  A  2
3:          5.0         3.6          1.4         0.2  setosa  B  1
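  • By a condition
    A comparison with == is simple and readable. Conceptually, though, it is a vector scan that checks every element, which gets slow on large tables, so setting a key is recommended (covered later). Recent data.table versions may also build an automatic index for == and %in% in i on first use.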
> head(DT[species=='setosa'])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> tail(DT[species=='setosa'])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.8          1.9         0.4  setosa  C  1
2:          4.8         3.0          1.4         0.3  setosa  A  2
3:          5.1         3.8          1.6         0.2  setosa  B  1
4:          4.6         3.2          1.4         0.2  setosa  C  2
5:          5.3         3.7          1.5         0.2  setosa  A  1
6:          5.0         3.3          1.4         0.2  setosa  B  2
> head(DT[species %in% c("setosa","versicolor")]) #species matches either value (logical OR)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> tail(DT[species %in% c("setosa","versicolor")])
   sepal.length sepal.width petal.length petal.width    species v1 v2
1:          5.6         2.7          4.2         1.3 versicolor  B  1
2:          5.7         3.0          4.2         1.2 versicolor  C  2
3:          5.7         2.9          4.2         1.3 versicolor  A  1
4:          6.2         2.9          4.3         1.3 versicolor  B  2
5:          5.1         2.5          3.0         1.1 versicolor  C  1
6:          5.7         2.8          4.1         1.3 versicolor  A  2
> head(DT[sepal.length %between% c(4.5,5)])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          4.9         3.0          1.4         0.2  setosa  B  2
2:          4.7         3.2          1.3         0.2  setosa  C  1
3:          4.6         3.1          1.5         0.2  setosa  A  2
4:          5.0         3.6          1.4         0.2  setosa  B  1
5:          4.6         3.4          1.4         0.3  setosa  A  1
6:          5.0         3.4          1.5         0.2  setosa  B  2
> tail(DT[sepal.length %between% c(4.5,5)])
   sepal.length sepal.width petal.length petal.width    species v1 v2
1:          4.6         3.2          1.4         0.2     setosa  C  2
2:          5.0         3.3          1.4         0.2     setosa  B  2
3:          4.9         2.4          3.3         1.0 versicolor  A  2
4:          5.0         2.0          3.5         1.0 versicolor  A  1
5:          5.0         2.3          3.3         1.0 versicolor  A  2
6:          4.9         2.5          4.5         1.7  virginica  B  1
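The advice above about keys can be sketched as follows. The example uses a fresh copy of iris. On 150 rows there is no measurable speed difference; the point is only that a vector scan, a keyed lookup and an ad hoc on= join return the same rows:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))

# Vector scan: every value of species is compared with "setosa"
scan <- dt[species == "setosa"]

# Keyed lookup: sort once with setkey(), then binary-search the key value
keyed <- copy(dt)
setkey(keyed, species)
hit <- keyed["setosa"]

# Ad hoc join on a column without setting a key
adhoc <- dt["setosa", on = "species"]
```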

2. Working with columns in j

2.1 Selecting columns

  • Select a single column

.() is an alias for list()

> head(DT[,sepal.width]) #returned as a vector
[1] 3.5 3.0 3.2 3.1 3.6 3.9
> head(DT[,.(sepal.width)]) #returned as a data.table
   sepal.width
1:         3.5
2:         3.0
3:         3.2
4:         3.1
5:         3.6
6:         3.9
  • Select several columns
> head(DT[,.(sepal.width,sepal.length)])
   sepal.width sepal.length
1:         3.5          5.1
2:         3.0          4.9
3:         3.2          4.7
4:         3.1          4.6
5:         3.6          5.0
6:         3.9          5.4
  • Select columns by position
> head(DT[,1,with=FALSE]) #first column
   sepal.length
1:          5.1
2:          4.9
3:          4.7
4:          4.6
5:          5.0
6:          5.4
> head(DT[,2,with=FALSE]) #second column
   sepal.width
1:         3.5
2:         3.0
3:         3.2
4:         3.1
5:         3.6
6:         3.9
> head(DT[,3,with=FALSE]) #third column
   petal.length
1:          1.4
2:          1.4
3:          1.3
4:          1.5
5:          1.4
6:          1.7
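When the column names are held in a variable rather than typed literally, a sketch like the following applies. The .. prefix needs a reasonably recent data.table, and cols is a hypothetical variable name:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))

cols <- c("sepal.width", "sepal.length")   # hypothetical list of wanted columns

a <- dt[, ..cols]                 # ".." means: look up cols outside the data.table
b <- dt[, cols, with = FALSE]     # the older with = FALSE form used above

# A single column as a plain vector, like DT[, sepal.width]
v <- dt[["sepal.width"]]
```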

2.2 Using functions in j

> DT[,sum(sepal.width)]
[1] 458.6
> DT[,.(sum(sepal.width))]
      V1
1: 458.6
> DT[,.(SUM=sum(sepal.width))] #the result column can be named
     SUM
1: 458.6
  • Selecting columns and applying functions can be combined
    If the results differ in length, the shorter one is recycled
> head(DT[,.(sepal.width,sd=sd(sepal.width))])
   sepal.width        sd
1:         3.5 0.4358663
2:         3.0 0.4358663
3:         3.2 0.4358663
4:         3.1 0.4358663
5:         3.6 0.4358663
6:         3.9 0.4358663
  • Several expressions can be wrapped in braces
> DT[,{print(head(sepal.width))
+   plot(sepal.width)
+   NULL}]
[1] 3.5 3.0 3.2 3.1 3.6 3.9
#(a scatter plot of sepal.width is drawn here; not shown)
NULL
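Braces in j are mostly useful for computing an intermediate value once and returning several results from it. Here is a small sketch; mean_width and n_above are illustrative names:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))

# Inside { } the expressions run in order and the last value is returned;
# a named list as the last value becomes a one-row data.table
res <- dt[, {
  m <- mean(sepal.width)                        # computed once
  list(mean_width = m, n_above = sum(sepal.width > m))
}]
```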

3. Grouped operations in j

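  • Sum sepal.length for each species. Watch where by= goes: in the call below it was put inside .(), so it becomes an ordinary column named by. No grouping happens, and the overall total is repeated on every row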
> DT[,.(SUM=sum(sepal.length),by=species)]
       SUM        by
  1: 876.5    setosa
  2: 876.5    setosa
  3: 876.5    setosa
  4: 876.5    setosa
  5: 876.5    setosa
 ---                
146: 876.5 virginica
147: 876.5 virginica
148: 876.5 virginica
149: 876.5 virginica
150: 876.5 virginica

#Compare: here by= sits outside .() as an argument of [ ], so it really groups
> DT[,.(SUM=sum(sepal.length)),by=.(species)]
      species   SUM
1:     setosa 250.3
2: versicolor 296.8
3:  virginica 329.4
  • Group by several columns
> DT[,.(SUM=sum(sepal.width)),by=.(species,v1)]
      species v1  SUM
1:     setosa  A 59.0
2:     setosa  B 58.6
3:     setosa  C 53.8
4: versicolor  C 46.5
5: versicolor  A 45.5
6: versicolor  B 46.5
7:  virginica  B 51.4
8:  virginica  C 49.6
9:  virginica  A 47.7
  • Use a function in by
> DT[,.(SUM=sum(sepal.length)),by=sign(v2-1)]
   sign   SUM
1:    0 438.0
2:    1 438.5
  • Group and aggregate only the subset of rows given in i
> DT[1:40,.(SUM=sum(sepal.length)),by=species]
   species   SUM
1:  setosa 201.5
  • Use .N to count the rows in each group
> DT[,.(count=.N),by=species]
      species count
1:     setosa    50
2: versicolor    50
3:  virginica    50
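The by placement pitfall from this section, together with .N combined with a row filter in i, can be written as a self-contained sketch:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))

# Wrong: by = species is inside .(), so it is just another column in j;
# sum() runs over all 150 rows and is recycled next to the species column
wrong <- dt[, .(SUM = sum(sepal.length), by = species)]

# Right: by is an argument of [ ], outside .()
right <- dt[, .(SUM = sum(sepal.length)), by = species]

# i, j and by together: rows with sepal.length > 5, counted per species
cnt <- dt[sepal.length > 5, .N, by = species]
```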

4. Adding, modifying and deleting columns with :=

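Note: := modifies the data in place, so DT <- DT[, ... := ...] is unnecessary; DT[, ... := ...] on its own is enough. This is also why the examples below work on a copy made with copy().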

  • Update a column
> dt <- copy(DT)
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> head(dt[,v1:=round(exp(v2))])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  3  1
2:          4.9         3.0          1.4         0.2  setosa  7  2
3:          4.7         3.2          1.3         0.2  setosa  3  1
4:          4.6         3.1          1.5         0.2  setosa  7  2
5:          5.0         3.6          1.4         0.2  setosa  3  1
6:          5.4         3.9          1.7         0.4  setosa  7  2
  • Add several columns
> dt[,c("h1","h2"):=.(round(exp(v2)),rep_len(LETTERS[4:6],.N))]
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2 h1 h2
1:          5.1         3.5          1.4         0.2  setosa  3  1  3  D
2:          4.9         3.0          1.4         0.2  setosa  7  2  7  E
3:          4.7         3.2          1.3         0.2  setosa  3  1  3  F
4:          4.6         3.1          1.5         0.2  setosa  7  2  7  D
5:          5.0         3.6          1.4         0.2  setosa  3  1  3  E
6:          5.4         3.9          1.7         0.4  setosa  7  2  7  F

# The above can also be written as follows; for easier display, only columns 5 to 9 are shown
> head(dt[,':='(h1=round(exp(v2)),h2=rep_len(LETTERS[4:6],.N))][,5:9])
   species v1 v2 h1 h2
1:  setosa  A  1  3  D
2:  setosa  B  2  7  E
3:  setosa  C  1  3  F
4:  setosa  A  2  7  D
5:  setosa  B  1  3  E
6:  setosa  C  2  7  F
  • Delete columns
> dt[,':='(h1=NULL,h2=NULL)]
> head(dt)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2

It can also be written like this (an alternative to the call above, not an additional step):
> head(dt[,c("h1","h2"):=NULL])
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2

  • Modify values that meet a condition
> dt[sepal.length>4&v1=='A',v2:=3L] #3L keeps the integer type of v2

> head(dt[,.(v2)])
   v2
1:  3
2:  2
3:  1
4:  3
5:  1
6:  2
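Because := works by reference, a plain assignment does not protect the original table, which is why copy() is used above. Here is a minimal sketch on a toy table; x, y, z and grp are hypothetical columns:

```r
library(data.table)

dt <- data.table(x = 1:3)

alias <- dt            # plain assignment: both names refer to the same table
alias[, y := x * 2L]   # := modifies in place, so dt gains column y as well

snapshot <- copy(dt)   # deep copy: changes to snapshot leave dt alone
snapshot[, z := 0L]

# Conditional update: only the rows selected by i change
dt[x > 1L, y := -1L]

# := takes an RHS of length 1 or nrow; use rep_len() to recycle anything else
dt[, grp := rep_len(c("a", "b"), .N)]
```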

5. Setting a key and using it

  • Set the key when creating the data.table
> data <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6),key="a")
> head(data)
   a          b
1: A  0.3407997
2: A -0.7460474
3: A -0.8981073
4: B -0.7033403
5: B -0.3347941
6: C -0.3795377
  • Set the key on an existing data.table
> dt <- data.table(a=c('A','B','C','A','A','B'),b=rnorm(6))
> dt
   a          b
1: A -0.5013782
2: B -0.1745357
3: C  1.8090374
4: A -0.2301050
5: A -1.1304182
6: B  0.2159889

#Compare the row order of dt before and after setkey()

> setkey(dt,a) #sorts the table by the key column automatically
> dt
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B  0.2159889
6: C  1.8090374
  • Check whether the data.table has a key
> key(dt)
[1] "a"
> haskey(dt)
[1] TRUE
> attributes(dt)
$names
[1] "a" "b"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.table" "data.frame"

$.internal.selfref


$sorted
[1] "a"

> attributes(dt)$sorted
[1] "a"
  • With a as the key, select the rows where a is B
> dt['B']
   a          b
1: B -0.1745357
2: B  0.2159889
  • With the key set, select the first row where a is B
> dt['B',mult='first'] #mult defaults to "all"
   a          b
1: B -0.1745357
  • With the key set, select the last row where a is B
> dt['B',mult='last']
   a         b
1: B 0.2159889
  • With a as the key, select the rows where a is A or B
> dt[c('A','B')]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: B -0.1745357
5: B  0.2159889
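  • nomatch controls what is returned for values with no match. The default, NA, returns a row filled with NA; setting it to 0 drops unmatched values from the result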
> dt[c('A','D')]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182
4: D         NA
> dt[c('A','D'),nomatch=0]
   a          b
1: A -0.5013782
2: A -0.2301050
3: A -1.1304182

  • by=.EACHI groups by each value supplied in i. It applies when i is a join, as with the key set on dt here (a join with on= also works)
> dt[c('A','B'),sum(b)]
[1] -1.820448


> dt[c('A','B'),sum(b),by=.EACHI]
   a          V1
1: A -1.86190135
2: B  0.04145319
  • Set several key columns
> head(DT)
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.9         3.0          1.4         0.2  setosa  B  2
3:          4.7         3.2          1.3         0.2  setosa  C  1
4:          4.6         3.1          1.5         0.2  setosa  A  2
5:          5.0         3.6          1.4         0.2  setosa  B  1
6:          5.4         3.9          1.7         0.4  setosa  C  2
> setkey(DT,v1,v2) #sorts by v1 first, then by v2
> head(DT[.('B',1)]) #rows where v1 is B and v2 is 1
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.0         3.6          1.4         0.2  setosa  B  1
2:          5.4         3.7          1.5         0.2  setosa  B  1
3:          5.4         3.9          1.3         0.4  setosa  B  1
4:          4.6         3.6          1.0         0.2  setosa  B  1
5:          5.2         3.4          1.4         0.2  setosa  B  1
6:          4.9         3.1          1.5         0.2  setosa  B  1


> head(DT[.(c('A','B'),1)]) #rows where v1 is A or B and v2 is 1
   sepal.length sepal.width petal.length petal.width species v1 v2
1:          5.1         3.5          1.4         0.2  setosa  A  1
2:          4.6         3.4          1.4         0.3  setosa  A  1
3:          4.8         3.0          1.4         0.1  setosa  A  1
4:          5.7         3.8          1.7         0.3  setosa  A  1
5:          4.8         3.4          1.9         0.2  setosa  A  1
6:          4.8         3.1          1.6         0.2  setosa  A  1
> tail(DT[.(c('A','B'),1)]) #rows where v1 is A or B and v2 is 1
   sepal.length sepal.width petal.length petal.width   species v1 v2
1:          7.7         2.6          6.9         2.3 virginica  B  1
2:          6.7         3.3          5.7         2.1 virginica  B  1
3:          7.4         2.8          6.1         1.9 virginica  B  1
4:          6.3         3.4          5.6         2.4 virginica  B  1
5:          5.8         2.7          5.1         1.9 virginica  B  1
6:          6.2         3.4          5.4         2.3 virginica  B  1
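The keyed lookups above depend on rnorm() output. Here they are again on a toy table with fixed values, chosen only so the results are predictable. The last line repeats the .EACHI aggregation with on= instead of a key:

```r
library(data.table)

# Fixed b values instead of rnorm() so the results can be checked
dt <- data.table(a = c("A", "B", "C", "A", "A", "B"), b = c(1, 2, 3, 4, 5, 6))
setkey(dt, a)                                  # sorted by a: b becomes 1,4,5,2,6,3

all_b   <- dt["B"]$b                           # every row with a == "B"
first_b <- dt["B", mult = "first"]$b
last_b  <- dt["B", mult = "last"]$b
only_a  <- dt[c("A", "D"), nomatch = 0]        # "D" has no match and is dropped
sums    <- dt[c("A", "B"), sum(b), by = .EACHI]

# The same grouped join on an unkeyed table, using on=
plain   <- data.table(a = c("A", "B", "C", "A", "A", "B"), b = c(1, 2, 3, 4, 5, 6))
sums_on <- plain[data.table(a = c("A", "B")), sum(b), on = "a", by = .EACHI]
```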

6. Advanced data.table operations

  • .N holds the number of rows
> DT[.N] #used in i, returns the last row
   sepal.length sepal.width petal.length petal.width   species v1 v2
1:          5.9           3          5.1         1.8 virginica  C  2
> DT[,.N] #used in j, returns the number of rows
[1] 150
  • .SD
    .SD is a data.table holding the subset of data for the current group, with every column except the by columns; it can only be used in j
> DT[,print(.SD),by=v1]
    sepal.length sepal.width petal.length petal.width    species v2
 1:          5.1         3.5          1.4         0.2     setosa  1
 2:          4.6         3.4          1.4         0.3     setosa  1
 3:          4.8         3.0          1.4         0.1     setosa  1
 4:          5.7         3.8          1.7         0.3     setosa  1
 5:          4.8         3.4          1.9         0.2     setosa  1
 6:          4.8         3.1          1.6         0.2     setosa  1
 7:          5.5         3.5          1.3         0.2     setosa  1
 8:          4.4         3.2          1.3         0.2     setosa  1
 9:          5.3         3.7          1.5         0.2     setosa  1
10:          6.5         2.8          4.6         1.5 versicolor  1
11:          5.0         2.0          3.5         1.0 versicolor  1
12:          5.6         3.0          4.5         1.5 versicolor  1
13:          6.3         2.5          4.9         1.5 versicolor  1
14:          6.0         2.9          4.5         1.5 versicolor  1
15:          5.4         3.0          4.5         1.5 versicolor  1
16:          5.5         2.6          4.4         1.2 versicolor  1
17:          5.7         2.9          4.2         1.3 versicolor  1
18:          7.1         3.0          5.9         2.1  virginica  1
19:          6.7         2.5          5.8         1.8  virginica  1
20:          5.8         2.8          5.1         2.4  virginica  1
21:          6.9         3.2          5.7         2.3  virginica  1
22:          6.2         2.8          4.8         1.8  virginica  1
23:          6.4         2.8          5.6         2.2  virginica  1
24:          6.0         3.0          4.8         1.8  virginica  1
25:          6.7         3.3          5.7         2.5  virginica  1
26:          4.6         3.1          1.5         0.2     setosa  2
27:          4.9         3.1          1.5         0.1     setosa  2
28:          5.7         4.4          1.5         0.4     setosa  2
29:          5.1         3.7          1.5         0.4     setosa  2
30:          5.2         3.5          1.5         0.2     setosa  2
31:          5.5         4.2          1.4         0.2     setosa  2
32:          5.1         3.4          1.5         0.2     setosa  2
33:          4.8         3.0          1.4         0.3     setosa  2
34:          6.4         3.2          4.5         1.5 versicolor  2
35:          4.9         2.4          3.3         1.0 versicolor  2
36:          6.1         2.9          4.7         1.4 versicolor  2
37:          5.6         2.5          3.9         1.1 versicolor  2
38:          6.6         3.0          4.4         1.4 versicolor  2
39:          5.5         2.4          3.7         1.0 versicolor  2
40:          6.3         2.3          4.4         1.3 versicolor  2
41:          5.0         2.3          3.3         1.0 versicolor  2
42:          5.7         2.8          4.1         1.3 versicolor  2
43:          7.6         3.0          6.6         2.1  virginica  2
44:          6.4         2.7          5.3         1.9  virginica  2
45:          7.7         3.8          6.7         2.2  virginica  2
46:          6.3         2.7          4.9         1.8  virginica  2
47:          7.2         3.0          5.8         1.6  virginica  2
48:          7.7         3.0          6.1         2.3  virginica  2
49:          6.9         3.1          5.1         2.3  virginica  2
50:          6.5         3.0          5.2         2.0  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
    sepal.length sepal.width petal.length petal.width    species v2
 1:          5.0         3.6          1.4         0.2     setosa  1
 2:          5.4         3.7          1.5         0.2     setosa  1
 3:          5.4         3.9          1.3         0.4     setosa  1
 4:          4.6         3.6          1.0         0.2     setosa  1
 5:          5.2         3.4          1.4         0.2     setosa  1
 6:          4.9         3.1          1.5         0.2     setosa  1
 7:          5.0         3.5          1.3         0.3     setosa  1
 8:          5.1         3.8          1.6         0.2     setosa  1
 9:          6.9         3.1          4.9         1.5 versicolor  1
10:          6.6         2.9          4.6         1.3 versicolor  1
11:          5.6         2.9          3.6         1.3 versicolor  1
12:          5.9         3.2          4.8         1.8 versicolor  1
13:          6.8         2.8          4.8         1.4 versicolor  1
14:          5.8         2.7          3.9         1.2 versicolor  1
15:          5.6         3.0          4.1         1.3 versicolor  1
16:          5.6         2.7          4.2         1.3 versicolor  1
17:          6.3         3.3          6.0         2.5  virginica  1
18:          4.9         2.5          4.5         1.7  virginica  1
19:          6.8         3.0          5.5         2.1  virginica  1
20:          7.7         2.6          6.9         2.3  virginica  1
21:          6.7         3.3          5.7         2.1  virginica  1
22:          7.4         2.8          6.1         1.9  virginica  1
23:          6.3         3.4          5.6         2.4  virginica  1
24:          5.8         2.7          5.1         1.9  virginica  1
25:          6.2         3.4          5.4         2.3  virginica  1
26:          4.9         3.0          1.4         0.2     setosa  2
27:          5.0         3.4          1.5         0.2     setosa  2
28:          4.3         3.0          1.1         0.1     setosa  2
29:          5.1         3.8          1.5         0.3     setosa  2
30:          5.0         3.0          1.6         0.2     setosa  2
31:          5.4         3.4          1.5         0.4     setosa  2
32:          4.9         3.6          1.4         0.1     setosa  2
33:          5.0         3.5          1.6         0.6     setosa  2
34:          5.0         3.3          1.4         0.2     setosa  2
35:          5.7         2.8          4.5         1.3 versicolor  2
36:          5.9         3.0          4.2         1.5 versicolor  2
37:          5.8         2.7          4.1         1.0 versicolor  2
38:          6.1         2.8          4.7         1.2 versicolor  2
39:          5.7         2.6          3.5         1.0 versicolor  2
40:          6.0         3.4          4.5         1.6 versicolor  2
41:          6.1         3.0          4.6         1.4 versicolor  2
42:          6.2         2.9          4.3         1.3 versicolor  2
43:          6.3         2.9          5.6         1.8  virginica  2
44:          7.2         3.6          6.1         2.5  virginica  2
45:          6.4         3.2          5.3         2.3  virginica  2
46:          5.6         2.8          4.9         2.0  virginica  2
47:          6.1         3.0          4.9         1.8  virginica  2
48:          6.3         2.8          5.1         1.5  virginica  2
49:          6.9         3.1          5.4         2.1  virginica  2
50:          6.7         3.0          5.2         2.3  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
    sepal.length sepal.width petal.length petal.width    species v2
 1:          4.7         3.2          1.3         0.2     setosa  1
 2:          4.4         2.9          1.4         0.2     setosa  1
 3:          5.8         4.0          1.2         0.2     setosa  1
 4:          5.4         3.4          1.7         0.2     setosa  1
 5:          5.0         3.4          1.6         0.4     setosa  1
 6:          5.2         4.1          1.5         0.1     setosa  1
 7:          4.4         3.0          1.3         0.2     setosa  1
 8:          5.1         3.8          1.9         0.4     setosa  1
 9:          7.0         3.2          4.7         1.4 versicolor  1
10:          6.3         3.3          4.7         1.6 versicolor  1
11:          6.0         2.2          4.0         1.0 versicolor  1
12:          6.2         2.2          4.5         1.5 versicolor  1
13:          6.4         2.9          4.3         1.3 versicolor  1
14:          5.5         2.4          3.8         1.1 versicolor  1
15:          6.7         3.1          4.7         1.5 versicolor  1
16:          5.8         2.6          4.0         1.2 versicolor  1
17:          5.1         2.5          3.0         1.1 versicolor  1
18:          6.5         3.0          5.8         2.2  virginica  1
19:          6.5         3.2          5.1         2.0  virginica  1
20:          6.5         3.0          5.5         1.8  virginica  1
21:          7.7         2.8          6.7         2.0  virginica  1
22:          6.4         2.8          5.6         2.1  virginica  1
23:          6.1         2.6          5.6         1.4  virginica  1
24:          6.7         3.1          5.6         2.4  virginica  1
25:          6.3         2.5          5.0         1.9  virginica  1
26:          5.4         3.9          1.7         0.4     setosa  2
27:          4.8         3.4          1.6         0.2     setosa  2
28:          5.1         3.5          1.4         0.3     setosa  2
29:          5.1         3.3          1.7         0.5     setosa  2
30:          4.7         3.2          1.6         0.2     setosa  2
31:          5.0         3.2          1.2         0.2     setosa  2
32:          4.5         2.3          1.3         0.3     setosa  2
33:          4.6         3.2          1.4         0.2     setosa  2
34:          5.5         2.3          4.0         1.3 versicolor  2
35:          5.2         2.7          3.9         1.4 versicolor  2
36:          6.7         3.1          4.4         1.4 versicolor  2
37:          6.1         2.8          4.0         1.3 versicolor  2
38:          6.7         3.0          5.0         1.7 versicolor  2
39:          6.0         2.7          5.1         1.6 versicolor  2
40:          5.5         2.5          4.0         1.3 versicolor  2
41:          5.7         3.0          4.2         1.2 versicolor  2
42:          5.8         2.7          5.1         1.9  virginica  2
43:          7.3         2.9          6.3         1.8  virginica  2
44:          5.7         2.5          5.0         2.0  virginica  2
45:          6.0         2.2          5.0         1.5  virginica  2
46:          7.2         3.2          6.0         1.8  virginica  2
47:          7.9         3.8          6.4         2.0  virginica  2
48:          6.4         3.1          5.5         1.8  virginica  2
49:          6.8         3.2          5.9         2.3  virginica  2
50:          5.9         3.0          5.1         1.8  virginica  2
    sepal.length sepal.width petal.length petal.width    species v2
Empty data.table (0 rows) of 1 col: v1
> DT[,.SD,by=v1][]
     v1 sepal.length sepal.width petal.length petal.width   species v2
  1:  A          5.1         3.5          1.4         0.2    setosa  1
  2:  A          4.6         3.4          1.4         0.3    setosa  1
  3:  A          4.8         3.0          1.4         0.1    setosa  1
  4:  A          5.7         3.8          1.7         0.3    setosa  1
  5:  A          4.8         3.4          1.9         0.2    setosa  1
 ---                                                                  
146:  C          7.2         3.2          6.0         1.8 virginica  2
147:  C          7.9         3.8          6.4         2.0 virginica  2
148:  C          6.4         3.1          5.5         1.8 virginica  2
149:  C          6.8         3.2          5.9         2.3 virginica  2
150:  C          5.9         3.0          5.1         1.8 virginica  2
  • Return the first and last row of each v1 group
> DT[,.SD[c(1,.N)],by=v1]
   v1 sepal.length sepal.width petal.length petal.width   species v2
1:  A          5.1         3.5          1.4         0.2    setosa  1
2:  A          6.5         3.0          5.2         2.0 virginica  2
3:  B          5.0         3.6          1.4         0.2    setosa  1
4:  B          6.7         3.0          5.2         2.3 virginica  2
5:  C          4.7         3.2          1.3         0.2    setosa  1
6:  C          5.9         3.0          5.1         1.8 virginica  2
  • Sum all remaining columns, grouped by v1 and species
> DT[,lapply(.SD,sum),by=c("v1","species")]
   v1    species sepal.length sepal.width petal.length petal.width v2
1:  A     setosa         85.9        59.0         25.3         3.9 25
2:  A versicolor         98.1        45.5         71.4        22.0 26
3:  A  virginica        108.1        47.7         89.1        33.1 24
4:  B     setosa         85.2        58.6         24.0         4.2 26
5:  B versicolor         96.3        46.5         69.3        21.4 24
6:  B  virginica        109.6        51.4         93.3        35.5 25
7:  C     setosa         79.2        53.8         23.8         4.2 24
8:  C versicolor        102.4        46.5         72.3        22.9 25
9:  C  virginica        111.7        49.6         95.2        32.7 26
  • .SDcols
    Usually used together with .SD to restrict .SD to certain columns
> DT[,.SD,by=v1,.SDcols=c("species","sepal.length")]
     v1   species sepal.length
  1:  A    setosa          5.1
  2:  A    setosa          4.6
  3:  A    setosa          4.8
  4:  A    setosa          5.7
  5:  A    setosa          4.8
 ---                          
146:  C virginica          7.2
147:  C virginica          7.9
148:  C virginica          6.4
149:  C virginica          6.8
150:  C virginica          5.9
> DT[,.(species,sepal.length),by=v1] #equivalent to the call above
     v1   species sepal.length
  1:  A    setosa          5.1
  2:  A    setosa          4.6
  3:  A    setosa          4.8
  4:  A    setosa          5.7
  5:  A    setosa          4.8
 ---                          
146:  C virginica          7.2
147:  C virginica          7.9
148:  C virginica          6.4
149:  C virginica          6.8
150:  C virginica          5.9

#.SDcols can also be the result of a function call:
> DT[,lapply(.SD,sum),by=v1,.SDcols=paste0("v",2)]
   v1 v2
1:  A 75
2:  B 75
3:  C 75
  • Chaining, which feels a bit like the pipe (%>%)
Without chaining:
> DT2 <- copy(DT)
> DT2 <- DT2[,.(SUM=sum(sepal.length)),by=v1]
> DT2[SUM>291.5]
   v1   SUM
1:  A 292.1
2:  C 293.3
> ##with chaining
> DT2 <- copy(DT)
> DT2[,.(SUM=sum(sepal.length)),by=v1][SUM>291.5] #after grouping, this acts a bit like HAVING in SQL
   v1   SUM
1:  A 292.1
2:  C 293.3
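Here is a sketch that combines .SD, .SDcols and chaining on a fresh copy of iris. Picking the numeric columns with vapply() is one choice among several:

```r
library(data.table)

dt <- as.data.table(iris)
setnames(dt, tolower(names(dt)))

# Select the numeric columns programmatically, then average each one per species
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
means <- dt[, lapply(.SD, mean), by = species, .SDcols = num_cols]

# .SD[which.max(...)]: the row with the longest sepal within each species
longest <- dt[, .SD[which.max(sepal.length)], by = species]

# Chaining: aggregate first, then filter the grouped result (like SQL HAVING)
big <- dt[, .(SUM = sum(sepal.length)), by = species][SUM > 290]
```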

7. melt and dcast in data.table

Usage is much the same as in the reshape2 package; see the article
"Unpivoting and pivoting data with the reshape2 package"
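Here is a minimal round trip with data.table's own melt() and dcast(), on a hypothetical toy table:

```r
library(data.table)

wide <- data.table(id = 1:2, x = c(10, 20), y = c(30, 40))

# melt: wide -> long, one row per (id, measured column)
long <- melt(wide, id.vars = "id", measure.vars = c("x", "y"),
             variable.name = "var", value.name = "val")

# dcast: long -> wide again; the formula reads rows ~ columns
back <- dcast(long, id ~ var, value.var = "val")
```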

