R String Operations (Part 1)

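I can never remember R's string functions, so I have collected them in one place to help them stick.
Every example in this post uses the same vector: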

eXample <- c("colooor_example", "red", "yellow", "Blue")

Counting characters

nchar(x, type = "chars", allowNA = FALSE)
nzchar(x)

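nchar counts the characters in each string; nzchar tests whether each string is non-empty (it returns FALSE only for ""). Both functions are vectorized.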

> nchar(eXample)
[1] 15  3  6  4
> nzchar(eXample)
[1] TRUE TRUE TRUE TRUE
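
The behaviour on an empty string is easier to see with one added to the input. The sketch below appends "" to the example vector (the names with_empty, char_counts, non_empty and byte_counts are illustrative, not from the original post) and also tries type = "bytes", which for pure ASCII input should give the same counts as the default.

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# Append an empty string so that nzchar() has something to reject
with_empty <- c(eXample, "")

char_counts <- nchar(with_empty)   # the empty string has 0 characters
non_empty   <- nzchar(with_empty)  # FALSE only for the empty string

# type = "bytes" counts bytes rather than characters;
# for ASCII-only strings the two counts agree
byte_counts <- nchar(eXample, type = "bytes")

print(char_counts)
print(non_empty)
print(byte_counts)
```

For strings containing multi-byte characters the byte count would generally be larger than the character count.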

Concatenating strings

paste(..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)

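paste joins several strings into one using a separator you specify. paste0 is equivalent to paste with the separator set to the empty string "". paste accepts any number of inputs: sep is the separator placed between corresponding elements of the different inputs, and collapse is the separator used to join the elements of the resulting vector into a single string. paste first converts its inputs with as.character, so the inputs do not have to be character vectors.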

> paste(eXample, collapse = '+')
[1] "colooor_example+red+yellow+Blue"
# multiple inputs
> paste(eXample, "Green", collapse = '+', sep = "...")
[1] "colooor_example...Green+red...Green+yellow...Green+Blue...Green"
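
The difference between sep and collapse is easiest to see by applying them one at a time. This is a minimal sketch; the variable names pairs, joined and no_sep are made up for illustration.

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# sep joins corresponding elements of the inputs;
# the result is still a vector of length 4.
# 1:4 is an integer vector and is converted with as.character
pairs <- paste(eXample, 1:4, sep = "_")

# collapse then joins the elements of that vector into one string
joined <- paste(eXample, 1:4, sep = "_", collapse = ",")

# paste0 behaves like paste with sep = ""
no_sep <- paste0("col", 1:3)

print(pairs)
print(joined)
print(no_sep)
```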

Splitting strings

strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)

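strsplit splits each string at the given separator and is roughly the inverse of paste. The split argument is interpreted as a regular expression by default; use fixed = TRUE if you want a literal match instead. Note that the function returns a list, so call unlist on the result if you need a vector.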

> strsplit(eXample[1], split = "_")
[[1]]
[1] "colooor" "example"

# convert to a vector with unlist
> unlist(strsplit(eXample[1], split = "_"))
[1] "colooor" "example"

> strsplit(eXample, split = "e")
[[1]]
[1] "colooor_" "xampl"   

[[2]]
[1] "r" "d"

[[3]]
[1] "y"    "llow"

[[4]]
[1] "Blu"
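
The regex default matters as soon as the separator is a metacharacter. The sketch below uses a made-up input, version <- "1.2.3" (not part of the original examples), to compare splitting on "." with and without fixed = TRUE, and also shows that an empty split breaks a string into single characters.

```r
# Hypothetical input chosen because "." is a regex metacharacter
version <- "1.2.3"

# As a regex, "." matches every character, so only empty pieces remain
regex_split <- strsplit(version, split = ".")[[1]]

# With fixed = TRUE the dot is taken literally
fixed_split <- strsplit(version, split = ".", fixed = TRUE)[[1]]

# An empty split string breaks the input into single characters
chars <- strsplit("red", split = "")[[1]]

print(regex_split)
print(fixed_split)
print(chars)
```

Escaping the dot as "\\." or writing it as "[.]" would be an alternative to fixed = TRUE when the rest of the pattern still needs regex features.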

Extracting substrings by position

substr(x, start, stop)
substring(text, first, last = 1000000L)

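substr returns the empty string "" when start is greater than the length of the string. Input that is not a character vector is first coerced with as.character. The result of substr always has the same length as x, so with a single input string only the first elements of start and stop are used; substring, in contrast, recycles its arguments and returns one piece per first/last pair.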

> substr(eXample[1], 2, 4)
[1] "olo"
> substr(eXample[1], 20, 40)
[1] ""
> substr(eXample[1], 2:4, 4:6)
[1] "olo"
> substring(eXample[1], 2:4, 4:6)
[1] "olo" "loo" "ooo"
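
A few other common uses of these two functions can be sketched as follows: relying on the default last of substring to take everything from a position onward, applying substr across the whole vector, and using the replacement form substr(x, start, stop) <- value. The variable names are illustrative only.

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# With the default `last`, substring() runs to the end of the string
tail_part <- substring(eXample[1], 9)

# substr() is vectorized over x: first three characters of each element
first_three <- substr(eXample, 1, 3)

# Replacement form: overwrite the first character of a copy
lower_blue <- eXample[4]
substr(lower_blue, 1, 1) <- "b"

print(tail_part)
print(first_three)
print(lower_blue)
```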

Replacing text

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

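The two functions differ in that sub replaces only the first match, while gsub replaces every match. The fixed argument controls whether the pattern is a regular expression (fixed = FALSE, the default) or a literal string (fixed = TRUE).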

> sub("o", "A", eXample[1], fixed = TRUE)
[1] "cAlooor_example"
# replace all matches
> gsub("o", "A", eXample[1], fixed = TRUE)
[1] "cAlAAAr_example"
# regex matching
> gsub("[lo]", "A", eXample[1], fixed = FALSE)
[1] "cAAAAAr_exampAe"
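
Regex mode also allows quantifiers, backreferences and case-insensitive matching. The following is a small sketch of each; the patterns and variable names are my own choices for illustration.

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# "o+" matches a whole run of o's, so each run collapses to a single "o"
squeezed <- gsub("o+", "o", eXample[1])

# Backreferences \\1 and \\2 refer to the parenthesized groups;
# here they swap the parts on either side of the underscore
swapped <- sub("^(.*)_(.*)$", "\\2_\\1", eXample[1])

# ignore.case = TRUE lets "b" match the capital "B" in "Blue"
masked <- gsub("b", "*", eXample, ignore.case = TRUE)

print(squeezed)
print(swapped)
print(masked)
```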

Finding matches

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

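grep returns the positions (vector indices) of the elements that contain a match; grepl returns a logical vector saying whether each element matched.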

> grep("e", eXample, fixed = TRUE)
[1] 1 2 3 4
> grep("ooo", eXample, fixed = TRUE)
[1] 1
> grepl("ooo", eXample, fixed = TRUE)
[1]  TRUE FALSE FALSE FALSE
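
The value and invert arguments listed in the signature above change what grep hands back. This is a brief sketch with illustrative variable names:

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# value = TRUE returns the matching elements instead of their indices
with_l <- grep("l", eXample, value = TRUE, fixed = TRUE)

# invert = TRUE selects the elements that do NOT match
without_l <- grep("l", eXample, value = TRUE, invert = TRUE, fixed = TRUE)

# ignore.case = TRUE makes the (regex) match case-insensitive
blue_index <- grep("blue", eXample, ignore.case = TRUE)

# The logical vector from grepl() can be used directly for subsetting
has_ooo <- eXample[grepl("ooo", eXample, fixed = TRUE)]

print(with_l)
print(without_l)
print(blue_index)
print(has_ooo)
```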

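regexpr returns an integer vector (not a list) giving the position of the first match in each string, with the length of each match stored in the match.length attribute. gregexpr returns a list with one such vector per input string, covering all matches. A value of -1 means there was no match.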

> regexpr("o", eXample, fixed = TRUE)
[1]  2 -1  5 -1
attr(,"match.length")
[1]  1 -1  1 -1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
> gregexpr("o", eXample, fixed = TRUE)
[[1]]
[1] 2 4 5 6
attr(,"match.length")
[1] 1 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

[[2]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

[[3]]
[1] 5
attr(,"match.length")
[1] 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

[[4]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
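
The position and length information is rarely used by hand. Base R's regmatches function takes the result of regexpr or gregexpr and returns the matched text itself. The sketch below uses the regex "o+" (my own choice of pattern) to pull out runs of the letter o.

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# All runs of "o" in each string; the result is a list,
# with character(0) for strings that have no match
runs <- regmatches(eXample, gregexpr("o+", eXample))

# Number of runs per string
run_counts <- lengths(runs)

# With regexpr() only the first match is extracted,
# and strings without a match are dropped from the result
first_run <- regmatches(eXample, regexpr("o+", eXample))

print(runs)
print(run_counts)
print(first_run)
```

Because regmatches with a regexpr result drops non-matching strings, its output can be shorter than the input; the gregexpr form keeps one list element per input string.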

Changing case

tolower(x)
toupper(x)
casefold(x, upper = FALSE)

tolower converts to lower case and toupper converts to upper case; casefold is a wrapper around the two, selected by its upper argument.

> toupper(eXample)
[1] "COLOOOR_EXAMPLE" "RED"             "YELLOW"          "BLUE"
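
As a small sketch of how casefold relates to the other two, and of how case conversion combines with substr and paste0 from the earlier sections (the capitalization idiom is my own addition, not from the original post):

```r
eXample <- c("colooor_example", "red", "yellow", "Blue")

# casefold() dispatches to tolower() or toupper() depending on `upper`
folded_lower <- casefold(eXample)                # default upper = FALSE
folded_upper <- casefold(eXample, upper = TRUE)

# Capitalize only the first letter: upper-case character 1,
# then append the rest of each string unchanged
capitalized <- paste0(toupper(substr(eXample, 1, 1)), substring(eXample, 2))

print(folded_lower)
print(folded_upper)
print(capitalized)
```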

Companion post
R String Operations (Part 2)

You are welcome to follow my WeChat official account, Hello BioInfo.
