47. About the Filtering joins functions

[Previous: 46. About the Mutating joins functions and the merge function]
[Next: 48. About Set Operations functions]

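    The previous article covered the four Mutating joins functions and the merge function from base R; this article covers Filtering joins.
    The Filtering joins functions are semi_join() and anti_join(). They return filtered observations, that is, rows of x.
    The Set Operations functions are intersect(), union() and setdiff(). They compare two data frames and return the intersection, union and difference of their rows.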


    The Usage of the two Filtering joins functions is much simpler than that of the four Mutating joins functions:

semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)

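    Filtering joins filter the rows of x according to whether a match exists in y. semi_join() returns all rows of x that have a match in y; anti_join() returns all rows of x that have no match in y, which is exactly the opposite of semi_join(). The by argument works the same way as in the Mutating joins functions.
    In summary: the result is a subset of x, and all attributes of x are preserved. Row order follows the order in x as far as possible, and the columns are not modified (no columns from y are added).
    Both functions are simple; an example follows: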
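Since by behaves as it does in the Mutating joins, a named character vector can be used when the key column has different names in the two tables. Below is a minimal sketch; the tibbles x and y and the column name student are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Hypothetical data: the key is called "name" in x but "student" in y.
x <- tibble(name = c("A", "B", "C"), score = c(90, 85, 70))
y <- tibble(student = c("A", "C"), hobby = c("chess", "tennis"))

# The named vector maps x's column (left) to y's column (right).
kept    <- semi_join(x, y, by = c("name" = "student"))
dropped <- anti_join(x, y, by = c("name" = "student"))

print(kept)     # rows of x with a match in y; only x's columns appear
print(dropped)  # rows of x without a match in y
```

Passing by explicitly also suppresses the "Joining, by = ..." message that appears when by is left as NULL.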

library(dplyr)  # provides tibble(), semi_join() and anti_join()

nt1<-tibble(
  name = c("ZhangS","LiS","WangW","ZhaoL","QianQ","SunB","LiJ","ZhouS"),
  score = c("90","89","95","93","88","96","92","85"),
  sex = c("male","male","female","female","male","female","female","female"),
  address = c("Beijing","Tianjin","Guangzhou","Shanghai","Shenzen","Wuhan","Liuyang","Fuzhou")
)

nt2<-tibble(
  name = c("ZhangS","ZhaoL","QianQ","ZhouS"),
  hobbies = c("baseball","basketball","tennis","swim")
)

> semi_join(nt1,nt2)
Joining, by = "name"
# A tibble: 4 x 4
  name   score sex    address 
  <chr>  <chr> <chr>  <chr>   
1 ZhangS 90    male   Beijing 
2 ZhaoL  93    female Shanghai
3 QianQ  88    male   Shenzen 
4 ZhouS  85    female Fuzhou  
> anti_join(nt1,nt2)
Joining, by = "name"
# A tibble: 4 x 4
  name  score sex    address  
  <chr> <chr> <chr>  <chr>    
1 LiS   89    male   Tianjin  
2 WangW 95    female Guangzhou
3 SunB  96    female Wuhan    
4 LiJ   92    female Liuyang 
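The two results are complementary: every row of x lands in exactly one of them. The sketch below checks this on a shortened, hypothetical version of the data (students and hobbies are illustrative names) and compares the joins with a filter() plus %in% formulation, which gives the same rows when there is a single key column.

```r
library(dplyr)

# Shortened illustrative data; not the full nt1/nt2 from the article.
students <- tibble(
  name  = c("ZhangS", "LiS", "WangW", "ZhaoL"),
  score = c("90", "89", "95", "93")
)
hobbies <- tibble(
  name    = c("ZhangS", "ZhaoL"),
  hobbies = c("baseball", "basketball")
)

matched   <- semi_join(students, hobbies, by = "name")
unmatched <- anti_join(students, hobbies, by = "name")

# Single-key equivalents written with filter() and %in%.
matched_alt   <- filter(students, name %in% hobbies$name)
unmatched_alt <- filter(students, !(name %in% hobbies$name))

# Together the two joins account for every row of students.
nrow(matched) + nrow(unmatched) == nrow(students)
```

With several key columns the %in% form no longer applies directly, which is one reason to prefer semi_join() and anti_join().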

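    Duplicate values also need to be considered here, but the situation is much simpler: if nt1 contains duplicate rows, all of them are returned, and it makes no difference whether nt2 contains duplicates or how many times a key is repeated there. Duplicates in nt2 never multiply the rows of nt1.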
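The duplicate behaviour can be illustrated with a small sketch. The tibbles x and y below are hypothetical; inner_join() is included only for contrast, and it is wrapped in suppressWarnings() because recent dplyr versions warn about many-to-many matches.

```r
library(dplyr)

# Hypothetical data with duplicated keys on both sides.
x <- tibble(name = c("ZhangS", "ZhangS", "LiS"), score = c("90", "90", "89"))
y <- tibble(name = c("ZhangS", "ZhangS", "ZhangS"),
            hobbies = c("baseball", "tennis", "swim"))

semi <- semi_join(x, y, by = "name")  # both ZhangS rows of x, each kept once
anti <- anti_join(x, y, by = "name")  # the single LiS row

# A mutating join pairs every match, so 2 rows of x times 3 rows of y.
inner <- suppressWarnings(inner_join(x, y, by = "name"))

nrow(semi)   # 2
nrow(anti)   # 1
nrow(inner)  # 6
```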
