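The following are commonly used string-handling functions in R.
nchar(x) | Counts the characters in each string of x; spaces count too |
substr(x,start,stop) | Extracts or replaces a substring of each element of a character vector |
grep(pattern,x,ignore.case=FALSE,fixed=FALSE) | Searches x for a pattern. If fixed=FALSE, pattern is a regular expression; otherwise pattern is a literal text string. Returns the indices of the matching elements |
sub(pattern,replacement,x,ignore.case=FALSE,fixed=FALSE) | Searches x for a pattern and replaces the first match in each element with replacement; the other arguments behave as in grep() |
strsplit(x,split,fixed=FALSE) | Splits the elements of the character vector x at split. If fixed=FALSE, split is a regular expression |
paste(...,sep=' ') | Concatenates strings, separated by sep |
toupper(x) | Converts to upper case |
tolower(x) | Converts to lower case |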
> nchar('hello world')  # counts the characters in the string, spaces included
[1] 11
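Note the difference between length() and nchar():
length() returns the number of elements in an object, whereas nchar() returns the number of characters in each string of the object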
> nchar(month.name)
[1] 7 8 5 5 3 4 4 6 9 7 8 8
> length(month.name)
[1] 12
When nchar() is given a numeric vector, it first coerces the numbers to character strings and then counts their characters
> nchar(c(1,34,123))
[1] 1 2 3
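To make the coercion step visible, here is a minimal sketch; the vector `nums` is made-up example data. It suggests that nchar() on numbers counts the characters of their text form, so a decimal point or minus sign is counted as well.

```r
# Hypothetical example data: a mix of integers, a fraction and a negative number.
nums <- c(1, 34, 123, 12.5, -7)

# The text form that the numbers are coerced to before counting.
as_text <- as.character(nums)   # "1" "34" "123" "12.5" "-7"

# nchar() counts characters of that text form: "." and "-" count too.
counts <- nchar(nums)           # 1 2 3 4 2
```

Very large or very small values may be converted in scientific notation, so the count is not always the number of digits.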
paste() concatenates strings; its default separator is a space
> paste('everybody','loves','stats')
[1] "everybody loves stats"
> paste('everybody','loves','stats',sep = '-')
[1] "everybody-loves-stats"
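When one of the arguments has several elements, paste() handles each element separately
> names<-c('lily','lucy','mary')
> paste(names,'loves','stats')
[1] "lily loves stats" "lucy loves stats" "mary loves stats"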
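The element-wise behaviour can be combined with two related features of base R: the `collapse` argument, which joins the resulting elements into a single string, and `paste0()`, which pastes without a separator. The sketch below uses a made-up vector `people`.

```r
# Hypothetical example data.
people <- c('lily', 'lucy', 'mary')

# Element-wise: one result per element of `people`.
sentences <- paste(people, 'loves', 'stats')

# collapse joins the elements of the result into one string.
one_string <- paste(people, collapse = ', ')   # "lily, lucy, mary"

# paste0() is paste() with sep = '' (no separator).
ids <- paste0('id', 1:3)                       # "id1" "id2" "id3"
```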
substr() extracts substrings
> substr(month.name,start = 1,stop = 3)
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
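The function table also says that substr() can replace a substring. A minimal sketch of the replacement form follows; the variables `s` and `months` are made up for illustration.

```r
# Replacement form: assign into a substring of a single string.
s <- 'hello world'
substr(s, start = 1, stop = 5) <- 'HELLO'
# s is now "HELLO world"

# The same form works element by element on a vector.
months <- month.name[1:3]
substr(months, 1, 1) <- tolower(substr(months, 1, 1))
# months is now "january" "february" "march"
```

The replacement never changes the length of the string: only the characters between start and stop are overwritten.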
Next, change the case of the extracted substrings
> x<-substr(month.name,start = 1,stop = 3)
> toupper(x)
[1] "JAN" "FEB" "MAR" "APR" "MAY" "JUN" "JUL" "AUG" "SEP" "OCT" "NOV" "DEC"
> tolower(x)
[1] "jan" "feb" "mar" "apr" "may" "jun" "jul" "aug" "sep" "oct" "nov" "dec"
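What if we only want the first letter to be upper or lower case?
This is usually done with regular expressions, which are not covered in detail here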
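For readers who want a preview, here are two possible ways to capitalise only the first letter. This is a sketch, not the only approach: the first variant relies on the Perl-style `\\U` replacement that sub() accepts when `perl = TRUE`, the second avoids regular expressions by combining substr(), toupper() and paste0(). The vector `abbr` is example data.

```r
# Example data: lower-case month abbreviations, "jan" "feb" ...
abbr <- tolower(substr(month.name, 1, 3))

# Variant 1: regular expression. \\U upper-cases the captured first letter.
cap_regex <- sub('^([a-z])', '\\U\\1', abbr, perl = TRUE)

# Variant 2: no regular expression. Upper-case the first character
# and glue it back onto the rest of the string.
cap_plain <- paste0(toupper(substr(abbr, 1, 1)),
                    substr(abbr, 2, nchar(abbr)))
```

Both variants give "Jan" "Feb" "Mar" and so on.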
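match() also matches strings. It resembles grep(), but it does not support regular expressions: it compares whole elements exactly and returns the position of the first match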
> x
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul"
[8] "Aug" "Sep" "Oct" "Nov" "Dec"
> match('Jan',x)
[1] 1
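The contrast between exact matching and pattern matching can be sketched as follows, using the same month abbreviations. The search strings are chosen only for illustration.

```r
abbr <- substr(month.name, 1, 3)

# match() needs a whole element to be equal; a partial string gives NA.
exact_miss <- match('Ju', abbr)               # NA

# grep() finds every element that contains the pattern.
partial <- grep('Ju', abbr)                   # 6 7 ("Jun", "Jul")

# With a regular expression: elements that start with "J".
starts_j <- grep('^J', abbr, value = TRUE)    # "Jan" "Jun" "Jul"

# match() is vectorised over its first argument.
positions <- match(c('Dec', 'Jan'), abbr)     # 12 1
```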
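strsplit() splits strings. It is a different function from split(), which groups a vector by a factor and leaves the string itself unsplit, as the first call below shows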
> y<-'i/hate/you'
> split(y,'/')
$`/`
[1] "i/hate/you"
> strsplit(y,'/')
[[1]]
[1] "i" "hate" "you"
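Because strsplit() returns a list with one component per input string, the pieces usually have to be pulled out with `[[1]]` or unlist(). The sketch below also shows `fixed = TRUE`, which treats the separator as literal text rather than a regular expression; the vectors `paths` and `dotted` are made-up examples.

```r
y <- 'i/hate/you'

# One input string: take the first (and only) list component.
parts <- strsplit(y, '/')[[1]]          # "i" "hate" "you"

# Several input strings: one list component per string.
paths <- c('a/b', 'c/d/e')
pieces <- strsplit(paths, '/')
n_pieces <- lengths(pieces)             # 2 3

# "." is a regular-expression metacharacter (any character),
# so split on a literal dot with fixed = TRUE.
dotted <- strsplit('a.b.c', '.', fixed = TRUE)[[1]]   # "a" "b" "c"
```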
Here is a trick for generating all pairwise combinations of strings
> face<-1:13
> suit<-c('spades','clubs','hearts','diamonds')
> outer(face,suit,FUN=paste)
[,1] [,2] [,3] [,4]
[1,] "1 spades" "1 clubs" "1 hearts" "1 diamonds"
[2,] "2 spades" "2 clubs" "2 hearts" "2 diamonds"
[3,] "3 spades" "3 clubs" "3 hearts" "3 diamonds"
[4,] "4 spades" "4 clubs" "4 hearts" "4 diamonds"
[5,] "5 spades" "5 clubs" "5 hearts" "5 diamonds"
[6,] "6 spades" "6 clubs" "6 hearts" "6 diamonds"
[7,] "7 spades" "7 clubs" "7 hearts" "7 diamonds"
[8,] "8 spades" "8 clubs" "8 hearts" "8 diamonds"
[9,] "9 spades" "9 clubs" "9 hearts" "9 diamonds"
[10,] "10 spades" "10 clubs" "10 hearts" "10 diamonds"
[11,] "11 spades" "11 clubs" "11 hearts" "11 diamonds"
[12,] "12 spades" "12 clubs" "12 hearts" "12 diamonds"
[13,] "13 spades" "13 clubs" "13 hearts" "13 diamonds"
> outer(suit,face,FUN=paste)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "spades 1" "spades 2" "spades 3" "spades 4" "spades 5" "spades 6" "spades 7" "spades 8"
[2,] "clubs 1" "clubs 2" "clubs 3" "clubs 4" "clubs 5" "clubs 6" "clubs 7" "clubs 8"
[3,] "hearts 1" "hearts 2" "hearts 3" "hearts 4" "hearts 5" "hearts 6" "hearts 7" "hearts 8"
[4,] "diamonds 1" "diamonds 2" "diamonds 3" "diamonds 4" "diamonds 5" "diamonds 6" "diamonds 7" "diamonds 8"
[,9] [,10] [,11] [,12] [,13]
[1,] "spades 9" "spades 10" "spades 11" "spades 12" "spades 13"
[2,] "clubs 9" "clubs 10" "clubs 11" "clubs 12" "clubs 13"
[3,] "hearts 9" "hearts 10" "hearts 11" "hearts 12" "hearts 13"
[4,] "diamonds 9" "diamonds 10" "diamonds 11" "diamonds 12" "diamonds 13"
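outer() returns a matrix, which is not always the most convenient shape. As a sketch, the matrix can be flattened into a plain vector of 52 card names with as.vector(), and extra arguments such as `sep` are passed through outer() to paste(). The names `grid`, `deck` and `named` are chosen here for illustration.

```r
face <- 1:13
suit <- c('spades', 'clubs', 'hearts', 'diamonds')

# 13 x 4 character matrix: one row per face, one column per suit.
grid <- outer(face, suit, FUN = paste)

# Flatten column by column into a vector of 52 strings.
deck <- as.vector(grid)

# Arguments after FUN are handed on to paste(), here a custom separator.
named <- outer(face, suit, FUN = paste, sep = ' of ')
```

Because the matrix is flattened column by column, `deck` lists all thirteen spades first, then the clubs, and so on.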