Using warning messages as an example: a round of string tricks

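This started while I was preparing a class on downstream transcriptome analysis: installing a package produced a pile of errors, ending with a neat stack of warnings.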

[Screenshot: the warning messages printed after the installation]

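non-zero exit status is a very common problem. The fix is to go to the library path where packages are installed (each package is one folder there), find and delete the folders of the packages named in the messages, then rerun the installation code.
I wrote an earlier post about this problem:
mac让你找不到路径 (roughly, "When the Mac hides the path from you")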
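
The cleanup described above can be sketched in R. This is only a minimal sketch, not code from the original post: `failed` is a hypothetical vector of names copied from the warnings, and it assumes the packages were being installed into the first entry of `.libPaths()`. The destructive calls are left commented out so that nothing is deleted by accident.

```r
# Names copied from the warnings (hypothetical example input).
failed <- c("BiocParallel", "zip")

# Assumption: the packages were being installed into the first library path.
lib <- .libPaths()[1]

# Each installed package is one folder inside the library directory.
pkg_dirs <- file.path(lib, failed)
print(pkg_dirs)

# Which of those folders actually exist right now?
print(dir.exists(pkg_dirs))

# Uncomment to delete the leftover folders and reinstall.
# unlink(pkg_dirs, recursive = TRUE)
# install.packages(failed)  # Bioconductor packages go through BiocManager::install()
```

Checking `dir.exists()` first makes it easy to confirm that the library path is the right one before removing anything.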

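However, I recently went back over my R basics lecture notes and found that the stringr package from tidyverse has no practice exercises, so today's example comes in handy.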

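The exercise

Simple: extract the names of all the packages from the warning messages in the screenshot above.

The solution

step1: assignment

We take the warning text, copied as-is, as one long string; that part is fine. For strings, single and double quotes are essentially interchangeable, but not when the same kind of quote also appears inside the string. That is the first pitfall.

If you wrap it in double quotes, you get:
a cascade of errors, and the syntax highlighting looks off too.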


[Screenshot: the errors produced when the string is wrapped in double quotes]

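This is because the long string itself contains double quotes. Fortunately it contains no straight single quotes (the ‘ and ’ around the package names are curly quotes, which are different characters), so we can wrap the whole string in single quotes and the assignment succeeds.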

x='3: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘BiocParallel’ had non-zero exit status
4: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘feather’ had non-zero exit status
5: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘geometry’ had non-zero exit status
6: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘ggraph’ had non-zero exit status
7: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘RcppArmadillo’ had non-zero exit status
8: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘vegan’ had non-zero exit status
9: In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of package ‘zip’ had non-zero exit status'
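
The quoting pitfall can be reproduced on a much shorter string. The sketch below is an illustration added here, not part of the original exercise; the variable names are made up. It writes the same text three ways: single quotes outside, escaped double quotes, and a raw string (raw strings need R 4.0.0 or later).

```r
# The same text written three ways.
a  <- 'update[instlib == l, "Package"]'      # single quotes outside, double inside
b  <- "update[instlib == l, \"Package\"]"    # double quotes outside, inner ones escaped
c3 <- r"(update[instlib == l, "Package"])"   # raw string, needs R >= 4.0.0

print(identical(a, b))
print(identical(a, c3))
cat(a, "\n")
```

For a long pasted message, switching the outer quotes or using a raw string avoids escaping every inner quote by hand.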

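step2: splitting the long string

Split the whole long string into words on spaces.

# install stringr if it is missing (the package name must be quoted)
if(!require(stringr))install.packages("stringr")
library(stringr)
# split on single spaces, flatten the resulting list, show the first 50 pieces
str_split(x," ") %>% 
  unlist() %>% 
  head(50)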
# [1] "3:"                              "In"                             
# [3] "install.packages(update[instlib" "=="                             
# [5] "l,"                              "\"Package\"],"                  
# [7] "l,"                              "repos"                          
# [9] "="                               "repos,"                         
# [11] ""                                ":\n"                            
# [13] ""                                "installation"                   
# [15] "of"                              "package"                        
# [17] "‘BiocParallel’"                  "had"                            
# [19] "non-zero"                        "exit"                           
# [21] "status\n4:"                      "In"                             
# [23] "install.packages(update[instlib" "=="                             
# [25] "l,"                              "\"Package\"],"                  
# [27] "l,"                              "repos"                          
# [29] "="                               "repos,"                         
# [31] ""                                ":\n"                            
# [33] ""                                "installation"                   
# [35] "of"                              "package"                        
# [37] "‘feather’"                       "had"                            
# [39] "non-zero"                        "exit"                           
# [41] "status\n5:"                      "In"                             
# [43] "install.packages(update[instlib" "=="                             
# [45] "l,"                              "\"Package\"],"                  
# [47] "l,"                              "repos"                          
# [49] "="                               "repos," 
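
The output above still contains empty strings and pieces such as "status\n4:", because a single literal space does not match newlines and two spaces in a row produce an empty piece. One possible refinement, sketched here on a shortened two-warning sample (`x_demo` is made up for this illustration), is to split on the regular expression `\\s+`, which treats any run of whitespace as one separator.

```r
library(stringr)

# Shortened sample in the same shape as the warnings; \u2018 and \u2019 are
# the left and right curly single quotes.
x_demo <- "3: In install.packages(l, repos = repos,  :
  installation of package \u2018zip\u2019 had non-zero exit status
4: In install.packages(l, repos = repos,  :
  installation of package \u2018vegan\u2019 had non-zero exit status"

# Split on runs of whitespace (spaces and newlines alike).
words <- unlist(str_split(x_demo, "\\s+"))
print(head(words, 12))

# No empty pieces remain in this sample.
print(any(words == ""))

# The quoted package names are still separate tokens.
print(str_subset(words, "^\u2018"))
```

The later steps work the same way on either kind of split, since the quoted names are separate tokens in both cases.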

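step3: extracting the package names

All the package names have one thing in common: each is enclosed in a pair of curly single quotes (‘ and ’). The pattern can be written as ^‘, that is, starting with the opening curly quote.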

str_split(x," ") %>% 
  unlist() %>% 
  str_subset("^‘")
#[1] "‘BiocParallel’"  "‘feather’"       "‘geometry’"      "‘ggraph’"       
#[5] "‘RcppArmadillo’" "‘vegan’"         "‘zip’"     

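For the perfectionists, one more thing remains: removing the quotes.

step4: removing the quotes

The function to use is str_replace_all, because str_replace replaces only the first match. The pattern [’‘] matches either the opening or the closing quote, and replacing it with the empty string "" simply deletes it.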

str_split(x," ") %>% 
  unlist() %>% 
  str_subset("^‘") %>% 
  str_replace_all("[’‘]","")
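
The difference between the two functions is easy to see on a single token. A small sketch (the variable names are only for illustration):

```r
library(stringr)

token  <- "\u2018zip\u2019"     # the token with both curly quotes
quotes <- "[\u2018\u2019]"      # character class: either curly quote

first_only <- str_replace(token, quotes, "")      # removes the first match only
all_gone   <- str_replace_all(token, quotes, "")  # removes every match

print(first_only)  # the closing quote is still there
print(all_gone)    # both quotes are gone
```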

A second approach

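y = x %>%
  str_replace_all("’","‘") %>%   # turn every closing quote into an opening quote
  str_split("‘") %>%             # split on that single delimiter
  unlist()
# keep only pieces shorter than 20 characters: the package names are short, the surrounding text is long
y[str_length(y)<20]  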
#[1] "BiocParallel"  "feather"       "geometry"      "ggraph"        "RcppArmadillo"
#[6] "vegan"         "zip"   

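Here the split uses a single delimiter, so the two kinds of quote are first unified by replacement. Since the pattern of str_split is a regular expression, a character class such as [‘’] would also split on both in one go.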
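
A sketch of that character-class variant, on a shortened sample (`x_demo` is made up for this illustration). It also replaces the length filter with a positional one, assuming the text starts outside a quote and every opening quote has a matching closing quote:

```r
library(stringr)

# \u2018 and \u2019 are the left and right curly single quotes.
x_demo <- "package \u2018zip\u2019 had non-zero exit status\npackage \u2018vegan\u2019 had non-zero exit status"

# Split on either curly quote in one step.
pieces <- unlist(str_split(x_demo, "[\u2018\u2019]"))

# Pieces alternate between text outside and text inside the quotes,
# so the package names sit at the even positions.
pkgs <- pieces[seq(2, length(pieces), by = 2)]
print(pkgs)
```

Picking the even positions does not depend on how long the package names are, which the `str_length(y)<20` filter does.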

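Is there a third solution? My regular expression skills are limited and I don't know how to express "the characters between two quotes", but I feel it can surely be done. If you happen to know how, please tell me.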
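
One way to express "the characters between two quotes" is sketched below, as a possible answer rather than the definitive one. It uses a shortened sample (`x_demo` is made up for this illustration) and two stringr functions: `str_extract_all` with lookbehind and lookahead, and `str_match_all` with a capture group.

```r
library(stringr)

# \u2018 and \u2019 are the left and right curly single quotes.
x_demo <- "package \u2018zip\u2019 had non-zero exit status\npackage \u2018vegan\u2019 had non-zero exit status"

# Lookbehind/lookahead: match what lies between the quotes,
# without including the quotes in the match.
pkgs_lookaround <- str_extract_all(x_demo, "(?<=\u2018)[^\u2019]+(?=\u2019)")[[1]]
print(pkgs_lookaround)

# Capture group: column 2 of the match matrix holds the part in parentheses.
pkgs_group <- str_match_all(x_demo, "\u2018([^\u2019]+)\u2019")[[1]][, 2]
print(pkgs_group)
```

Both patterns read as "an opening quote, then one or more characters that are not a closing quote, then a closing quote", so no splitting or cleanup step is needed afterwards.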