man regex

1 Name

regcomp, regexec, regerror, regfree  -- POSIX regex functions


Synopsis

#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *preg, const char *regex, int cflags);

int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);

size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                size_t errbuf_size);

void regfree(regex_t *preg);

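2 Description

(1) POSIX regex compiling

POSIX stands for Portable Operating System Interface.

regcomp() compiles a regular expression into a form that is suitable for subsequent regexec() searches.

regcomp() takes three arguments: preg, a pointer to the storage area for the pattern buffer (the compiled regular expression); regex, a pointer to a null-terminated ('\0') string; and cflags, flags that determine the type of compilation.

All regular expression searching must be done via a compiled pattern buffer, so regexec() must always be supplied with the address of a pattern buffer initialized by regcomp().

cflags may be the bitwise OR of one or more of the following: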

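REG_EXTENDED

Use POSIX Extended Regular Expression syntax when interpreting regex. If this flag is not set, POSIX Basic Regular Expression syntax is used.

REG_ICASE

Do not differentiate case. Subsequent regexec() searches using this pattern buffer will be case insensitive.

REG_NOSUB

Do not report the positions of matches. If the pattern buffer was compiled with this flag, regexec() ignores its nmatch and pmatch arguments.

REG_NEWLINE

Match-any-character operators do not match a newline.

A nonmatching list ([^...]) that does not contain a newline does not match a newline.

The match-beginning-of-line operator (^) matches the empty string immediately after a newline, regardless of whether eflags, the execution flags of regexec(), contains REG_NOTBOL.

The match-end-of-line operator ($) matches the empty string immediately before a newline, regardless of whether eflags contains REG_NOTEOL.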
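
The compile flags are easier to compare side by side in code. The following is a minimal sketch, not part of the man page: the helper name matches() and its return convention are made up for this note, and it always adds REG_NOSUB because only a yes/no answer is needed.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the man page): returns 1 if `text` matches
 * `pattern` compiled with `cflags`, 0 if it does not match, and -1 if the
 * pattern fails to compile. */
int matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is wanted, so no offsets are requested. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    /* nmatch and pmatch are ignored for a REG_NOSUB pattern buffer. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, matches("^hello$", "HELLO", REG_EXTENDED) should return 0, while adding REG_ICASE should make it return 1. Likewise, "^b" should only match "a\nb" when REG_NEWLINE is set, because only then does ^ match right after the newline.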


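(2) POSIX regex matching

regexec() matches a null-terminated ('\0') string against the precompiled pattern buffer preg. nmatch and pmatch are used to provide information about the location of any matches. eflags may be the bitwise OR of one or both of REG_NOTBOL and REG_NOTEOL, which change the matching behavior as described below:

REG_NOTBOL

The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above). This flag may be used when different portions of a string are passed to regexec() and the beginning of the string should not be interpreted as the beginning of a line.

REG_NOTEOL

The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).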
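
REG_NOTBOL matters most when one string is searched piece by piece. The sketch below is my own illustration: count_matches() and its use_notbol parameter are hypothetical names. It calls regexec() repeatedly on the remainder of the string and, when asked, passes REG_NOTBOL on every call after the first, since the remainder no longer starts at a real line beginning.

```c
#include <regex.h>

/* Hypothetical helper: counts non-overlapping matches of `pattern` in `text`
 * by calling regexec() repeatedly on the rest of the string.
 * Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text, int use_notbol)
{
    regex_t re;
    regmatch_t m;
    int count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, text, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo > 0) {
            text += m.rm_eo;      /* continue right after this match */
        } else if (*text != '\0') {
            text++;               /* empty match at offset 0: step one byte */
        } else {
            break;                /* empty match at end of string */
        }
        /* Later calls start in the middle of the original string. */
        if (use_notbol)
            eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

For the pattern "^ab" and the text "ababab", this should count 1 match with use_notbol set and 3 without it, because without the flag every remainder looks like the start of a line.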


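(3) Byte offsets

Unless REG_NOSUB was set when the pattern buffer was compiled, it is possible to obtain the positions of the matched substrings. pmatch must be dimensioned to have at least nmatch elements. regexec() fills these elements with the offsets of the substring matches. Any unused structure elements will contain the value -1.

The regmatch_t structure, which is the type of pmatch, is defined in <regex.h>:

typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;

Each rm_so element that is not -1 gives the start offset of the next largest substring match within the string. The corresponding rm_eo element gives the end offset of that match, which is the offset of the first character after the matching text.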
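
To make the offsets concrete, here is a small sketch under my own assumptions: group_span() and MAX_GROUPS are hypothetical names, and the limit of 8 slots is arbitrary. Element 0 of pmatch describes the whole match and element n describes the n-th parenthesized subexpression.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: returns the offsets of capture group `group`
 * (0 = whole match) for the first match of `pattern` in `text`.
 * Both fields are -1 if nothing matched, the group did not take part in
 * the match, or the pattern failed to compile. */
regmatch_t group_span(const char *pattern, const char *text, size_t group)
{
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    regmatch_t result;

    result.rm_so = -1;
    result.rm_eo = -1;

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return result;

    /* pmatch has MAX_GROUPS elements, so nmatch may be MAX_GROUPS. */
    if (regexec(&re, text, MAX_GROUPS, pmatch, 0) == 0)
        result = pmatch[group];

    regfree(&re);
    return result;
}
```

For the pattern "([a-z]+)=([0-9]+)" and the text "  width=640" (two leading spaces), group 0 should span offsets 2 to 11, group 1 offsets 2 to 7, and group 2 offsets 8 to 11. The matched text has length rm_eo - rm_so and starts at text + rm_so.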


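(4) POSIX error reporting

regerror() turns the error codes that regcomp() and regexec() can return into error message strings.

regerror() is passed the error code errcode, the pattern buffer preg, a pointer to a character string buffer errbuf, and the size of that buffer errbuf_size. It returns the size of the errbuf required to contain the null-terminated error message string. If both errbuf and errbuf_size are nonzero, errbuf is filled in with the first errbuf_size - 1 characters of the error message and a terminating null byte ('\0').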
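
Because regerror() returns the required buffer size, it can be called twice: once with a NULL buffer and size 0 to learn the size, and once to fill a buffer of exactly that size. The sketch below assumes this two-call pattern; compile_error_message() is a hypothetical helper name, and the caller is responsible for freeing the result.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: tries to compile `pattern` as an extended regex.
 * Returns NULL if it compiles (or if malloc fails); otherwise returns a
 * malloc()ed copy of the error message, which the caller must free(). */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    size_t needed;
    char *msg;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: errbuf_size is 0, so nothing is written; the return value
     * is the size needed for the message including the terminating '\0'. */
    needed = regerror(rc, &re, NULL, 0);

    msg = malloc(needed);
    if (msg != NULL)
        regerror(rc, &re, msg, needed);  /* second call fills the buffer */
    return msg;
}
```

The exact wording of the message depends on the C library, so code should not compare the text against a fixed string.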


(5) POSIX pattern buffer freeing

Supplying regfree() with the address of a pattern buffer compiled by regcomp() frees the memory that the compiling process allocated to that pattern buffer.


3 Return value

regcomp() returns zero for a successful compilation or an error code for failure.

regexec() returns zero for a successful match or REG_NOMATCH for failure.
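
A caller usually wants to tell "no match" apart from "bad pattern". The following sketch shows one way to do that with the return values described above; the match_status enum and try_match() are names invented for this note, not part of <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    MATCH_FOUND,        /* regexec() returned 0 */
    MATCH_NOT_FOUND,    /* regexec() returned REG_NOMATCH */
    MATCH_BAD_PATTERN,  /* regcomp() returned a nonzero error code */
    MATCH_FAILED        /* any other regexec() result, if an implementation has one */
} match_status;

match_status try_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return MATCH_BAD_PATTERN;  /* nothing to free after a failed compile */

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return MATCH_FOUND;
    if (rc == REG_NOMATCH)
        return MATCH_NOT_FOUND;
    return MATCH_FAILED;
}
```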


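4 Errors

regcomp() can return the following errors:

REG_BADBR

Invalid use of back reference operator.

REG_BADPAT

Invalid use of pattern operators such as group or list.

REG_BADRPT

Invalid use of repetition operators, such as using '*' as the first character.

REG_EBRACE

Unmatched brace interval operators.

REG_EBRACK

Unmatched bracket list operators.

REG_ECOLLATE

Invalid collating element.

REG_ECTYPE

Unknown character class name.

REG_EEND

Nonspecific error. This is not defined by POSIX.2.

REG_EESCAPE

Trailing backslash.

REG_EPAREN

Unmatched parenthesis group operators.

REG_ERANGE

Invalid use of the range operator, e.g. the ending point of the range occurs prior to the starting point.

REG_ESIZE

The compiled regular expression requires a pattern buffer larger than 64 KB. This is not defined by POSIX.2.

REG_ESPACE

The regex routines ran out of memory.

REG_ESUBREG

Invalid back reference to a subexpression.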
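
For logging it can help to turn the numeric code into its symbolic name. The sketch below is an illustration only: compile_error_name() is a hypothetical helper, it covers just the codes that POSIX defines (REG_EEND and REG_ESIZE are left out because they are not available everywhere), and which code a given bad pattern produces may differ between C libraries.

```c
#include <regex.h>

/* Hypothetical helper: compiles `pattern` as an extended regex and returns
 * "OK" on success, the symbolic name of the regcomp() error code on failure,
 * or "unknown" for a code this sketch does not list. */
const char *compile_error_name(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);
        return "OK";
    }

    switch (rc) {
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    default:           return "unknown";
    }
}
```

When a message for a human reader is needed rather than a name for a log file, regerror() is the better tool.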


Conforming to: POSIX.1-2001


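5 Original man regex text

The translation above is rather rough, and I keep reflecting on my English level. The English original is pasted here as well: man regex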

REGEX(3)                                                              Linux Programmer's Manual                                                              REGEX(3)






NAME
       regcomp, regexec, regerror, regfree - POSIX regex functions


SYNOPSIS
       #include <sys/types.h>
       #include <regex.h>


       int regcomp(regex_t *preg, const char *regex, int cflags);


       int regexec(const regex_t *preg, const char *string, size_t nmatch,
                   regmatch_t pmatch[], int eflags);


       size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                       size_t errbuf_size);


       void regfree(regex_t *preg);


DESCRIPTION
   POSIX Regex Compiling
       regcomp() is used to compile a regular expression into a form that is suitable for subsequent regexec() searches.


       regcomp() is supplied with preg, a pointer to a pattern buffer storage area; regex, a pointer to the null-terminated string and cflags, flags used to determine
       the type of compilation.


       All regular expression searching must be done via a compiled pattern buffer, thus regexec() must always be supplied with the address of a regcomp() initialized
       pattern buffer.


       cflags may be the bitwise-or of one or more of the following:


       REG_EXTENDED
              Use POSIX Extended Regular Expression syntax when interpreting regex.  If not set, POSIX Basic Regular Expression syntax is used.


       REG_ICASE
              Do not differentiate case.  Subsequent regexec() searches using this pattern buffer will be case insensitive.


       REG_NOSUB
              Support  for  substring addressing of matches is not required.  The nmatch and pmatch arguments to regexec() are ignored if the pattern buffer supplied
              was compiled with this flag set.


       REG_NEWLINE
              Match-any-character operators don't match a newline.


              A nonmatching list ([^...])  not containing a newline does not match a newline.


              Match-beginning-of-line operator (^) matches the empty string immediately after a newline,  regardless  of  whether  eflags,  the  execution  flags  of
              regexec(), contains REG_NOTBOL.


              Match-end-of-line operator ($) matches the empty string immediately before a newline, regardless of whether eflags contains REG_NOTEOL.


   POSIX Regex Matching
       regexec() is used to match a null-terminated string against the precompiled pattern buffer, preg.  nmatch and pmatch are used to provide information regarding
       the location of any matches.  eflags may be the bitwise-or of one or both of REG_NOTBOL and REG_NOTEOL which cause  changes  in  matching  behavior  described
       below.


       REG_NOTBOL
              The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).  This flag may be used when different portions
              of a string are passed to regexec() and the beginning of the string should not be interpreted as the beginning of the line.


       REG_NOTEOL
              The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).


   Byte Offsets
       Unless REG_NOSUB was set for the compilation of the pattern buffer, it is possible to obtain substring match addressing information.  pmatch must be dimensioned
       to have at least nmatch elements.  These are filled in by regexec() with substring match addresses.  Any unused structure elements will contain the
       value -1.


       The regmatch_t structure which is the type of pmatch is defined in <regex.h>.


           typedef struct {
               regoff_t rm_so;
               regoff_t rm_eo;
           } regmatch_t;


       Each rm_so element that is not -1 indicates the start offset of the next largest substring match within the string.  The relative rm_eo element indicates  the
       end offset of the match, which is the offset of the first character after the matching text.


   Posix Error Reporting
       regerror() is used to turn the error codes that can be returned by both regcomp() and regexec() into error message strings.


       regerror()  is  passed  the  error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer,
       errbuf_size.  It returns the size of the errbuf required to contain the null-terminated error message string.  If both errbuf  and  errbuf_size  are  nonzero,
       errbuf is filled in with the first errbuf_size - 1 characters of the error message and a terminating null byte ('\0').


   POSIX Pattern Buffer Freeing
       Supplying regfree() with a precompiled pattern buffer, preg will free the memory allocated to the pattern buffer by the compiling process, regcomp().


RETURN VALUE
       regcomp() returns zero for a successful compilation or an error code for failure.


       regexec() returns zero for a successful match or REG_NOMATCH for failure.


ERRORS
       The following errors can be returned by regcomp():


       REG_BADBR
              Invalid use of back reference operator.


       REG_BADPAT
              Invalid use of pattern operators such as group or list.


       REG_BADRPT
              Invalid use of repetition operators such as using '*' as the first character.


       REG_EBRACE
              Un-matched brace interval operators.


       REG_EBRACK
              Un-matched bracket list operators.


       REG_ECOLLATE
              Invalid collating element.


       REG_ECTYPE
              Unknown character class name.


       REG_EEND
              Non specific error.  This is not defined by POSIX.2.


       REG_EESCAPE
              Trailing backslash.


       REG_EPAREN
              Un-matched parenthesis group operators.


       REG_ERANGE
              Invalid use of the range operator, e.g., the ending point of the range occurs prior to the starting point.


       REG_ESIZE
              Compiled regular expression requires a pattern buffer larger than 64Kb.  This is not defined by POSIX.2.


       REG_ESPACE
              The regex routines ran out of memory.


       REG_ESUBREG
              Invalid back reference to a subexpression.


CONFORMING TO
       POSIX.1-2001.


SEE ALSO
       grep(1), regex(7)
       The glibc manual section, Regular Expression Matching


COLOPHON
       This  page  is  part  of  release  3.44  of  the Linux man-pages project.  A description of the project, and information about reporting bugs, can be found at
       http://www.kernel.org/doc/man-pages/.






GNU                                                                           2012-06-11                                                                     REGEX(3)


[2014.8.12 - 9.40 ~14.53]
TNote Over.
