gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
regexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
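The last three signatures (regexpr, gregexpr, regexec) return match positions rather than modified strings. A minimal sketch of how they are usually combined with regmatches() from base R; the vector `txt` and the variable names are made up for illustration:

```r
# Hypothetical input: one element with two numbers, one with none
txt <- c("id=42 and id=7", "no ids here")

# regexpr(): position of the FIRST match in each element (-1 if no match)
first <- regexpr("[0-9]+", txt)
first_pos <- as.integer(first)            # 4 -1
first_len <- attr(first, "match.length")  # 2 -1
first_txt <- regmatches(txt, first)       # "42" (elements without a match are dropped)

# gregexpr(): positions of ALL matches, one list entry per element
all_txt <- regmatches(txt, gregexpr("[0-9]+", txt))  # list(c("42", "7"), character(0))

# regexec(): first match plus its parenthesized sub-expressions
groups <- regmatches(txt, regexec("id=([0-9]+)", txt))[[1]]  # "id=42" "42"
```

regmatches() is used here only to turn the position information back into text; the raw return values are integer vectors (or lists of them) carrying a "match.length" attribute.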
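Overview of the commonly used functions:
gsub(): replaces every match it finds and returns the modified text; if nothing matches, the text is returned unchanged
sub(): replaces only the first match it finds
Note: sub() replaces the first match within each string, and x may be a vector, so every element is processed. For example: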
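a <- c(12, 123, 234)   # numeric vector; sub() and gsub() coerce it to character
x <- c(a, a, a)        # the same three values repeated three times (length 9)
sub("2", "r", a)       # "1r" "1r3" "r34"
sub("2", "r", x)       # same replacements, applied to each of the 9 elements
gsub("2", "r", a)      # "1r" "1r3" "r34" (each element holds only one "2")
gsub("2", "r", x)      # same as sub() here, for the same reason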
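Run the code to see the behavior: sub() is applied to every element of the vector and replaces the first match inside each one. Because each value here contains only one "2", sub() and gsub() return the same result; they differ only when an element contains several matches.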
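To make the difference between sub() and gsub() visible, the input needs elements with more than one match. A small sketch with a made-up vector `y`:

```r
# Hypothetical input: the first two elements contain "2" twice
y <- c("2020", "1212", "abc")

first_only <- sub("2", "r", y)   # replaces only the first "2" in each element
all_hits   <- gsub("2", "r", y)  # replaces every "2" in each element

print(first_only)  # "r020" "1r12" "abc"
print(all_hits)    # "r0r0" "1r1r" "abc"
```

The element "abc" has no match, so both functions return it unchanged.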
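grep(): searches for matches; by default it returns the indices of the matching elements, and with value = TRUE it returns the matching elements themselves
grepl(): searches for matches and returns a logical vector, TRUE for each element that matches and FALSE otherwise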
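A short sketch of grep() and grepl() on a made-up vector `fruits`, showing the effect of the value and invert arguments from the signatures above:

```r
# Hypothetical input vector
fruits <- c("apple", "banana", "cherry", "grape")

idx    <- grep("an", fruits)                               # indices of matches: 2
vals   <- grep("ap", fruits, value = TRUE)                 # matching elements: "apple" "grape"
flags  <- grepl("ap", fruits)                              # TRUE FALSE FALSE TRUE
others <- grep("ap", fruits, value = TRUE, invert = TRUE)  # non-matching: "banana" "cherry"
```

grepl() always returns one logical value per input element, which makes it convenient for subsetting, e.g. fruits[flags].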
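Parameters:
pattern: what to search for or replace (a regular expression, i.e. a pattern used to find and replace text that follows a given rule)
replacement: what to replace each match with
x: the character vector to search or replace in
ignore.case: FALSE means matching is case sensitive; TRUE ignores case
perl: whether to use Perl-compatible regular expressions (TRUE/FALSE)
fixed: if TRUE, pattern is a string to be matched literally, not a regular expression. Overrides all conflicting arguments
useBytes: FALSE by default; if TRUE, matching is done byte by byte rather than character by character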
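The effect of fixed, ignore.case and perl can be seen on a small made-up vector `s`. This is only a sketch; useBytes is left out because its effect depends on the encoding of the input:

```r
# Hypothetical input vector
s <- c("a.b", "A+B", "axb")

# As a regular expression, "." matches any character
dot_regex <- gsub(".", "-", s)                 # "---" "---" "---"

# With fixed = TRUE, "." is matched literally
dot_fixed <- gsub(".", "-", s, fixed = TRUE)   # "a-b" "A+B" "axb"

# ignore.case = TRUE lets "a" match "A" as well
any_case <- grepl("a", s, ignore.case = TRUE)  # TRUE TRUE TRUE

# Lookbehind (?<=...) is a Perl-style feature, so perl = TRUE is needed
lookbehind <- gsub("(?<=a)x", "_", s, perl = TRUE)  # "a.b" "A+B" "a_b"
```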
In short: sub() replaces only the first match in each element; gsub() replaces every match.
Syntax | Description |
\\d | Digit, 0,1,2 ... 9 |
\\D | Not Digit |
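\\s | Whitespace character (e.g. space, tab, newline) |
\\S | Non-whitespace character |
\\w | Word character: letter, digit, or underscore |
\\W | Non-word character |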
\\t | Tab |
\\n | New line |
^ | Beginning of the string |
$ | End of the string |
\ | Escape special characters, e.g. \\ is "\", \+ is "+" |
| | Alternation match. e.g. /(e|d)n/ matches "en" and "dn" |
. | Any character, except \n or line terminator |
[ab] | a or b |
[^ab] | Any character except a and b |
[0-9] | All Digit |
[A-Z] | All uppercase A to Z letters |
[a-z] | All lowercase a to z letters |
[A-Za-z] | All uppercase and lowercase letters (avoid [A-z]: in ASCII order that range also covers [ \ ] ^ _ and the backtick) |
i+ | i at least one time |
i* | i zero or more times |
i? | i zero or 1 time |
i{n} | i occurs n times in sequence |
i{n1,n2} | i occurs n1 - n2 times in sequence |
i{n1,n2}? | Non-greedy match: i occurs n1 - n2 times, matching as few repetitions as possible |
i{n,} | i occurs >= n times |
[:alnum:] | Alphanumeric characters: [:alpha:] and [:digit:] |
[:alpha:] | Alphabetic characters: [:lower:] and [:upper:] |
[:blank:] | Blank characters: e.g. space, tab |
[:cntrl:] | Control characters |
[:digit:] | Digits: 0 1 2 3 4 5 6 7 8 9 |
[:graph:] | Graphical characters: [:alnum:] and [:punct:] |
[:lower:] | Lower-case letters in the current locale |
[:print:] | Printable characters: [:alnum:], [:punct:] and space |
[:punct:] | Punctuation character: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ |
[:space:] | Space characters: tab, newline, vertical tab, form feed, carriage return, space |
[:upper:] | Upper-case letters in the current locale |
[:xdigit:] | Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f |
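A few entries from the table, tried on a made-up string `x`. Two points the sketch relies on: inside an R string a backslash must be doubled (hence "\\d"), and the POSIX classes are written inside a bracket expression, e.g. "[[:digit:]]".

```r
# Hypothetical input string
x <- "aaa bbb 123"

digits_masked <- gsub("\\d", "#", x)          # "aaa bbb ###"
spaces_marked <- gsub("\\s", "_", x)          # "aaa_bbb_123"
at_start      <- sub("^a", "X", x)            # "Xaa bbb 123"
at_end        <- sub("3$", "X", x)            # "aaa bbb 12X"
not_ab        <- gsub("[^ab]", "-", x)        # "aaa-bbb----"
a_or_b        <- gsub("a|b", ".", x)          # "... ... 123"
exactly_two   <- sub("a{2}", "X", x)          # "Xa bbb 123"
greedy        <- sub("a+", "X", x)            # "X bbb 123"
non_greedy    <- sub("a+?", "X", x)           # "Xaa bbb 123"
no_digits     <- gsub("[[:digit:]]", "", x)   # "aaa bbb "
no_spaces     <- gsub("[[:space:]]+", "", x)  # "aaabbb123"
```

The greedy and non-greedy lines differ only by the trailing "?": "a+" consumes all three letters, while "a+?" stops after the first one.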
References: http://www.endmemo.com/program/R/gsub.php
https://www.cnblogs.com/wheng/p/6262737.html