This is cre2.info, produced by makeinfo version 6.3 from cre2.texi.
This document describes version 0.3.4 of CRE2, a C language wrapper for
the C++ library RE2: a fast, safe, thread-friendly alternative to
backtracking regular expression engines like those used in PCRE, Perl,
and Python.
The latest release can be downloaded from:
development takes place at:
and as backup at:
Copyright (C) 2012, 2016 by Marco Maggi http://github.com/marcomaggi
Copyright (C) 2011 by Keegan McAllister http://github.com/kmcallister/
Portions of this document come from the source code of RE2 itself,
see the file ‘LICENSE.re2’ for the license notice.
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with Invariant Sections being "GNU Free Documentation
License" and "GNU General Public License", no Front-Cover Texts,
and no Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".
INFO-DIR-SECTION Development
START-INFO-DIR-ENTRY
* cre2: (cre2). C wrapper for RE2.
END-INFO-DIR-ENTRY
File: cre2.info, Node: Top, Next: overview, Up: (dir)
C wrapper for RE2
Menu:
overview:: Overview of the package.
Appendices
Indexes
File: cre2.info, Node: overview, Next: version, Prev: Top, Up: Top
1 Overview of the package
CRE2 is a C language wrapper for the C++ library RE2: a fast, safe,
thread-friendly alternative to backtracking regular expression engines
like those used in PCRE, Perl, and Python. CRE2 is based on code by
Keegan McAllister for the ‘haskell-re2’ binding:
For the supported regular expression syntax, refer to the original
RE2 documentation:
The C wrapper is meant to make it easier to interface RE2 with other
languages. The exposed API allows searching for substrings of text
matching regular expressions and reporting portions of text matching
parenthetical subexpressions.
CRE2 installs the single header file ‘cre2.h’. All the function
names in the API are prefixed with ‘cre2_’; all the constant names are
prefixed with ‘CRE2_’; all the type names are prefixed with ‘cre2_’ and
suffixed with ‘_t’.
When searching for the installed libraries with the GNU Autotools, we
can use the following macros in ‘configure.ac’:
AC_CHECK_LIB([re2],[main],,
[AC_MSG_FAILURE([test for RE2 library failed])])
AC_CHECK_LIB([cre2],[cre2_version_string],,
[AC_MSG_FAILURE([test for CRE2 library failed])])
AC_CHECK_HEADERS([cre2.h],,
[AC_MSG_ERROR([test for CRE2 header failed])])
notice that there is no need to check for the header file ‘re2/re2.h’.
It is customary for regular expression engines to provide methods to
replace backslash sequences like ‘\1’, ‘\2’, … in a given string with
portions of text that matched the first, second, … parenthetical
subexpression; CRE2 does not provide such methods in its public API,
because they require interacting with the storage mechanism in the
client code. However, it is not difficult to implement such
substitutions given the results of a regular expression matching
operation.
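As an illustration of the previous paragraph, here is a minimal sketch of
such a substitution. It is not part of CRE2: the function name
‘expand_template’ and the type ‘substring_t’ are hypothetical, and
‘substring_t’ only mirrors the two fields documented for ‘cre2_string_t’
so that the block compiles without ‘cre2.h’. The array is indexed exactly
as it is passed in, so with results from ‘cre2_match()’ the sequence ‘\0’
would expand to the full match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the two fields documented for cre2_string_t; it is
   defined here only so that the sketch compiles without cre2.h. */
typedef struct {
  const char *  data;
  int           length;
} substring_t;

/* Hypothetical helper, not part of CRE2.  Expand TMPL into OUT, whose
   capacity is OUT_SIZE bytes including the terminating zero.  A
   backslash followed by a decimal digit N is replaced with the bytes
   referenced by MATCH[N]; two backslashes yield one backslash; every
   other byte is copied verbatim.  Return the number of bytes written,
   terminator excluded, or -1 if N is not less than NMATCH or OUT is
   too small. */
int
expand_template (const char * tmpl, const substring_t * match, int nmatch,
                 char * out, size_t out_size)
{
  size_t  used = 0;

  assert(NULL != tmpl && NULL != out);
  for (const char * p = tmpl; '\0' != *p; ++p) {
    const char *  src = p;      /* default: copy the current byte */
    size_t        len = 1;

    if ('\\' == p[0] && '0' <= p[1] && p[1] <= '9') {
      int  idx = p[1] - '0';
      if (idx >= nmatch)
        return -1;
      src = match[idx].data;
      len = (match[idx].length > 0)? (size_t)match[idx].length : 0;
      ++p;                      /* skip the digit */
    } else if ('\\' == p[0] && '\\' == p[1]) {
      ++p;                      /* skip the first backslash */
      src = p;
    }
    if (used + len >= out_size)
      return -1;                /* no room left for the terminator */
    if (len > 0)
      memcpy(out + used, src, len);
    used += len;
  }
  if (used >= out_size)
    return -1;
  out[used] = '\0';
  return (int)used;
}
```

The caller owns the output buffer, which is one way to sidestep the
storage question mentioned above; a real implementation may prefer to
compute the required size first and allocate accordingly.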
Some functions and methods from RE2 requiring memory allocation
handling are unofficially wrapped by CRE2 with unsafe code (execution
will succeed when no memory allocation errors happen). These
“problematic” functions are documented in the header file ‘cre2.h’ and,
at present, are not considered part of the public API of CRE2.
It is sometimes useful to try a program in the original C++ to verify
if a problem is caused by CRE2 or is in the original RE2 code; we may
want to start by customising this program:
/* compile and run with:
   $ g++ -Wall -o proof proof.cpp -lre2 && ./proof
*/
#include <re2/re2.h>
#include <assert.h>
static void try_match (RE2::Options& opt, const char * text);
int
main (int argc, const char *const argv[])
{
  RE2::Options opt;
  opt.set_never_nl(true);
  try_match(opt, "abcdef");
  return 0;
}
void
try_match (RE2::Options& opt, const char * text)
{
  RE2 re("abcdef", opt);
  assert(re.ok());
  assert(RE2::FullMatch(text, re));
  //assert(RE2::PartialMatch(text, re));
}
File: cre2.info, Node: version, Next: regexps, Prev: overview, Up: Top
2 Version functions
The installed libraries follow version numbering as established by the
GNU Autotools. For an explanation of interface numbers as managed by
GNU Libtool *Note interface: (libtool)Libtool versioning.
– Function: const char * cre2_version_string (void)
Return a pointer to a statically allocated ASCIIZ string
representing the interface version number.
– Function: int cre2_version_interface_current (void)
Return an integer representing the library interface current
number.
– Function: int cre2_version_interface_revision (void)
Return an integer representing the library interface current
revision number.
– Function: int cre2_version_interface_age (void)
Return an integer representing the library interface current age.
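Under the Libtool scheme, a library whose interface numbers are CURRENT
and AGE implements every interface from CURRENT minus AGE up to CURRENT.
The following sketch shows how that rule could be checked; the function
name ‘interface_is_supported’ is hypothetical and the numbers are passed
in explicitly so that the block compiles without ‘cre2.h’.

```c
#include <assert.h>

/* Hypothetical helper, not part of CRE2.  Return 1 if a library whose
   Libtool interface numbers are CURRENT and AGE implements the
   interface number WANTED, 0 otherwise. */
int
interface_is_supported (int current, int age, int wanted)
{
  /* Libtool requires the age to be between zero and current. */
  assert(0 <= age && age <= current);
  return (current - age <= wanted) && (wanted <= current);
}
```

With CRE2 we would presumably pass the values returned by
‘cre2_version_interface_current()’ and ‘cre2_version_interface_age()’,
together with the interface number the program was built against.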
File: cre2.info, Node: regexps, Next: options, Prev: version, Up: Top
3 Precompiled regular expressions construction
Regular expression objects are built and finalised as follows:
cre2_regexp_t * rex;
cre2_options_t * opt;
opt = cre2_opt_new();
if (opt) {
  cre2_opt_set_log_errors(opt, 0);
  rex = cre2_new("ciao", 4, opt);
  if (rex) {
    if (!cre2_error_code(rex)) {
      /* successfully built */
    } else {
      /* an error occurred while compiling rex */
    }
    cre2_delete(rex);
  } else {
    /* rex memory allocation error */
  }
  cre2_opt_delete(opt);
} else {
  /* opt memory allocation error */
}
– Opaque Type: cre2_regexp_t
Opaque type for regular expression objects; it is meant to be used
to declare pointers to objects. Instances of this type can be used
for any number of matching operations and are safe for concurrent
use by multiple threads.
– Struct Typedef: cre2_string_t
Simple data structure used to reference a portion of another
string. It has the following fields:
'const char * data'
Pointer to the first byte in the referenced substring.
'int length'
The number of bytes in the referenced substring.
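Because such a structure references a portion of a larger buffer, the
referenced bytes are in general not followed by a terminating zero, so
they should not be handed to functions expecting ASCIIZ strings. A
minimal sketch of two helpers follows; the names ‘substring_equals’ and
‘substring_dup’ are hypothetical, and the type ‘substring_t’ is a local
stand-in with the same two fields so that the block compiles without
‘cre2.h’.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with the two fields documented for cre2_string_t. */
typedef struct {
  const char *  data;
  int           length;
} substring_t;

/* Hypothetical helper.  Return 1 if the bytes referenced by S are
   exactly the zero-terminated string EXPECTED, 0 otherwise.  Unlike a
   plain strncmp() limited to S->length bytes, this also compares the
   lengths, so a shorter match is not mistaken for an equal one. */
int
substring_equals (const substring_t * s, const char * expected)
{
  size_t  len = strlen(expected);

  assert(NULL != s);
  if (s->length < 0 || (size_t)s->length != len)
    return 0;
  return (0 == len) || (0 == memcmp(s->data, expected, len));
}

/* Hypothetical helper.  Return a newly allocated, zero-terminated copy
   of the bytes referenced by S, or NULL if the length is negative or
   the allocation fails.  The caller releases the copy with free(). */
char *
substring_dup (const substring_t * s)
{
  char *  copy;

  assert(NULL != s);
  if (s->length < 0)
    return NULL;
  copy = malloc((size_t)s->length + 1);
  if (copy) {
    if (s->length > 0)
      memcpy(copy, s->data, (size_t)s->length);
    copy[s->length] = '\0';
  }
  return copy;
}
```

When we only need to print the substring, the standard ‘printf()’
precision syntax avoids the copy: ‘printf("%.*s", s.length, s.data)’.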
– Enumeration Typedef: cre2_error_code_t
Enumeration type for error codes returned by ‘cre2_error_code()’.
It contains the following symbols:
'CRE2_NO_ERROR'
Defined as '0', represents a successful operation.
'CRE2_ERROR_INTERNAL'
Unexpected error.
'CRE2_ERROR_BAD_ESCAPE'
Bad escape sequence.
'CRE2_ERROR_BAD_CHAR_CLASS'
Bad character class.
'CRE2_ERROR_BAD_CHAR_RANGE'
Bad character class range.
'CRE2_ERROR_MISSING_BRACKET'
Missing closing ']'.
'CRE2_ERROR_MISSING_PAREN'
Missing closing ')'.
'CRE2_ERROR_TRAILING_BACKSLASH'
Trailing '\' at end of regexp.
'CRE2_ERROR_REPEAT_ARGUMENT'
Repeat argument missing, e.g. '*'.
'CRE2_ERROR_REPEAT_SIZE'
Bad repetition argument.
'CRE2_ERROR_REPEAT_OP'
Bad repetition operator.
'CRE2_ERROR_BAD_PERL_OP'
Bad Perl operator.
'CRE2_ERROR_BAD_UTF8'
Invalid UTF-8 in regexp.
'CRE2_ERROR_BAD_NAMED_CAPTURE'
Bad named capture group.
'CRE2_ERROR_PATTERN_TOO_LARGE'
Pattern too large (compile failed).
– Function: cre2_regexp_t * cre2_new (const char * PATTERN, int
PATTERN_LEN, const cre2_options_t * OPT)
Build and return a new regular expression object representing the
PATTERN of length PATTERN_LEN bytes; the object is configured with
the options in OPT. If memory allocation fails: the return value
is a ‘NULL’ pointer.
The options object OPT is duplicated in the internal state of the
regular expression instance, so OPT can be safely mutated or
finalised after this call. If OPT is 'NULL': the regular
expression object is built with the default set of options.
– Function: void cre2_delete (cre2_regexp_t * REX)
Finalise a regular expression object releasing all the associated
resources.
– Function: const char * cre2_pattern (const cre2_regexp_t * REX)
Whether REX is a successfully built regular expression object or
not: return a pointer to the pattern string. The returned pointer
is valid only while REX is alive: if ‘cre2_delete()’ is applied to
REX the pointer becomes invalid.
– Function: int cre2_num_capturing_groups (const cre2_regexp_t * REX)
If REX is a successfully built regular expression object: return a
non-negative integer representing the number of capturing groups
(parenthetical subexpressions) in the pattern. If an error
occurred while building REX: return ‘-1’.
– Function: int cre2_find_named_capturing_groups (const cre2_regexp_t
* REX, const char * NAME)
If REX is a successfully built regular expression object: return a
non-negative integer representing the index of the named capturing
group whose name is NAME. If an error occurred while building REX
or the name is invalid: return ‘-1’.
const char * pattern = "from (?P<S>.*) to (?P<D>.*)";
cre2_options_t * opt = cre2_opt_new();
cre2_regexp_t * rex = cre2_new(pattern, strlen(pattern),
opt);
{
if (cre2_error_code(rex))
{ /* handle the error */ }
int nmatch = cre2_num_capturing_groups(rex) + 1;
cre2_string_t strings[nmatch];
int e, SIndex, DIndex;
const char * text = \
"from Montreal, Canada to Lausanne, Switzerland";
int text_len = strlen(text);
e = cre2_match(rex, text, text_len, 0, text_len,
CRE2_UNANCHORED, strings, nmatch);
if (0 == e)
{ /* handle the error */ }
SIndex = cre2_find_named_capturing_groups(rex, "S");
if (0 != strncmp("Montreal, Canada",
strings[SIndex].data, strings[SIndex].length))
{ /* handle the error */ }
DIndex = cre2_find_named_capturing_groups(rex, "D");
if (0 != strncmp("Lausanne, Switzerland",
strings[DIndex].data, strings[DIndex].length))
{ /* handle the error */ }
}
cre2_delete(rex);
cre2_opt_delete(opt);
– Function: int cre2_program_size (const cre2_regexp_t * REX)
If REX is a successfully built regular expression object: return a
non-negative integer representing the program size, a very
approximate measure of a regexp’s “cost”; larger numbers are more
expensive than smaller numbers. If an error occurred while
building REX: return ‘-1’.
– Function: int cre2_error_code (const cre2_regexp_t * REX)
In case an error occurred while building REX: return an integer
representing the associated error code. Return zero if no error
occurred.
– Function: const char * cre2_error_string (const cre2_regexp_t * REX)
If an error occurred while building REX: return a pointer to an
ASCIIZ string representing the associated error message. The
returned pointer is valid only while REX is alive: if
‘cre2_delete()’ is applied to REX the pointer becomes invalid.
If REX is a successfully built regular expression object: return a
pointer to an empty string.
The following code:
cre2_regexp_t * rex;
rex = cre2_new("ci(ao", 5, NULL);
{
printf("error: code=%d, msg=\"%s\"\n",
cre2_error_code(rex),
cre2_error_string(rex));
}
cre2_delete(rex);
prints:
error: code=6, msg="missing ): ci(ao"
– Function: void cre2_error_arg (const cre2_regexp_t * REX,
cre2_string_t * ARG)
If an error occurred while building REX: fill the structure
referenced by ARG with the interval of bytes representing the
offending portion of the pattern.
If REX is a successfully built regular expression object: ARG
references an empty string.
The following code:
cre2_regexp_t * rex;
cre2_string_t S;
rex = cre2_new("ci(ao", 5, NULL);
{
cre2_error_arg(rex, &S);
printf("arg: len=%d, data=\"%s\"\n", S.length, S.data);
}
cre2_delete(rex);
prints:
arg: len=5, data="ci(ao"
File: cre2.info, Node: options, Next: matching, Prev: regexps, Up: Top
4 Matching configuration
Compiled regular expressions can be configured, at construction-time,
with a number of options collected in a ‘cre2_options_t’ object. Notice
that, by default, when attempting to compile an invalid regular
expression pattern, RE2 will print to ‘stderr’ an error message; usually
we want to avoid this logging by disabling the associated option:
cre2_options_t * opt;
opt = cre2_opt_new();
cre2_opt_set_log_errors(opt, 0);
– Opaque Typedef: cre2_options_t
Type of opaque pointers to options objects. Any instance of this
type can be used to configure any number of regular expression
objects.
– Enumeration Typedef: cre2_encoding_t
Enumeration type for constants selecting encoding. It contains the
following values:
CRE2_UNKNOWN
CRE2_UTF8
CRE2_Latin1
The value 'CRE2_UNKNOWN' should never be used: it exists only in
case there is a mismatch between the definitions of RE2 and CRE2.
– Function: cre2_options_t * cre2_opt_new (void)
Allocate and return a new options object. If memory allocation
fails: the return value is a ‘NULL’ pointer.
– Function: void cre2_opt_delete (cre2_options_t * OPT)
Finalise an options object releasing all the associated resources.
Compiled regular expressions configured with this object are not
affected by its destruction.
All the following functions are getters and setters for regular
expression options; the FLAG argument to the setter must be false to
disable the option and true to enable it; unless otherwise specified the
‘int’ return value is true if the option is enabled and false if it is
disabled.
– Function: cre2_encoding_t cre2_opt_encoding (cre2_options_t * OPT)
– Function: void cre2_opt_set_encoding (cre2_options_t * OPT,
cre2_encoding_t ENC)
By default, the regular expression pattern and input text are
interpreted as UTF-8. CRE2_Latin1 encoding causes them to be
interpreted as Latin-1.
The getter returns 'CRE2_UNKNOWN' if the encoding value returned by
RE2 is unknown.
– Function: int cre2_opt_posix_syntax (cre2_options_t * OPT)
– Function: void cre2_opt_set_posix_syntax (cre2_options_t * OPT, int
FLAG)
Restrict regexps to POSIX egrep syntax. Default is disabled.
– Function: int cre2_opt_longest_match (cre2_options_t * OPT)
– Function: void cre2_opt_set_longest_match (cre2_options_t * OPT, int
FLAG)
Search for longest match, not first match. Default is disabled.
– Function: int cre2_opt_log_errors (cre2_options_t * OPT)
– Function: void cre2_opt_set_log_errors (cre2_options_t * OPT, int
FLAG)
Log syntax and execution errors to ‘stderr’. Default is enabled.
– Function: int cre2_opt_literal (cre2_options_t * OPT)
– Function: void cre2_opt_set_literal (cre2_options_t * OPT, int FLAG)
Interpret the pattern string as literal, not as regular expression.
Default is disabled.
Setting this option is equivalent to quoting all the special
characters defining a regular expression pattern:
cre2_regexp_t * rex;
cre2_options_t * opt;
const char * pattern = "(ciao) (hello)";
const char * text = pattern;
int len = strlen(pattern);
opt = cre2_opt_new();
cre2_opt_set_literal(opt, 1);
rex = cre2_new(pattern, len, opt);
{
/* successful match */
cre2_match(rex, text, len, 0, len,
CRE2_UNANCHORED, NULL, 0);
}
cre2_delete(rex);
cre2_opt_delete(opt);
– Function: int cre2_opt_never_nl (cre2_options_t * OPT)
– Function: void cre2_opt_set_never_nl (cre2_options_t * OPT, int
FLAG)
Never match a newline character, even if it is in the regular
expression pattern; default is disabled. Turning on this option
allows us to attempt a partial match, against the beginning of a
multiline text, without using subpatterns to exclude the newline in
the regexp pattern.
* When set to true: matching always fails if the text or the
regexp contains a newline.
* When set to false: matching succeeds or fails taking normal
account of newlines.
* The option does *not* cause newlines to be skipped.
– Function: int cre2_opt_dot_nl (cre2_options_t * OPT)
– Function: void cre2_opt_set_dot_nl (cre2_options_t * OPT, int FLAG)
The dot matches everything, including the newline; default is
disabled.
– Function: int cre2_opt_never_capture (cre2_options_t * OPT)
– Function: void cre2_opt_set_never_capture (cre2_options_t * OPT, int
FLAG)
Parse all the parentheses as non-capturing; default is disabled.
– Function: int cre2_opt_case_sensitive (cre2_options_t * OPT)
– Function: void cre2_opt_set_case_sensitive (cre2_options_t * OPT,
int FLAG)
Match is case-sensitive; the regular expression pattern can
override this setting with ‘(?i)’ unless configured in POSIX syntax
mode. Default is enabled.
– Function: int cre2_opt_max_mem (cre2_options_t * OPT)
– Function: void cre2_opt_set_max_mem (cre2_options_t * OPT, int M)
The max memory option controls how much memory can be used to hold
the compiled form of the regular expression and its cached DFA
graphs. These functions get and set that limit. See the
documentation of RE2 for details.
The following options are only consulted when POSIX syntax is
enabled; when POSIX syntax is disabled: these features are always
enabled and cannot be turned off.
– Function: int cre2_opt_perl_classes (cre2_options_t * OPT)
– Function: void cre2_opt_set_perl_classes (cre2_options_t * OPT, int
FLAG)
Allow Perl’s ‘\d’, ‘\s’, ‘\w’, ‘\D’, ‘\S’, ‘\W’. Default is
disabled.
– Function: int cre2_opt_word_boundary (cre2_options_t * OPT)
– Function: void cre2_opt_set_word_boundary (cre2_options_t * OPT, int
FLAG)
Allow Perl’s ‘\b’, ‘\B’ (word boundary and non-word boundary).
Default is disabled.
– Function: int cre2_opt_one_line (cre2_options_t * OPT)
– Function: void cre2_opt_set_one_line (cre2_options_t * OPT, int
FLAG)
The patterns ‘^’ and ‘$’ only match at the beginning and end of the
text. Default is disabled.
File: cre2.info, Node: matching, Next: other, Prev: options, Up: Top
5 Matching regular expressions
Basic pattern matching goes as follows (with error checking omitted):
cre2_regexp_t * rex;
cre2_options_t * opt;
const char * pattern = "(ciao) (hello)";
opt = cre2_opt_new();
cre2_opt_set_posix_syntax(opt, 1);
rex = cre2_new(pattern, strlen(pattern), opt);
{
const char * text = "ciao hello";
int text_len = strlen(text);
int nmatch = 3;
cre2_string_t match[nmatch];
cre2_match(rex, text, text_len, 0, text_len, CRE2_UNANCHORED,
match, nmatch);
/* prints: full match: ciao hello */
printf("full match: ");
fwrite(match[0].data, match[0].length, 1, stdout);
printf("\n");
/* prints: first group: ciao */
printf("first group: ");
fwrite(match[1].data, match[1].length, 1, stdout);
printf("\n");
/* prints: second group: hello */
printf("second group: ");
fwrite(match[2].data, match[2].length, 1, stdout);
printf("\n");
}
cre2_delete(rex);
cre2_opt_delete(opt);
– Enumeration Typedef: cre2_anchor_t
Enumeration type for the anchor point of matching operations. It
contains the following constants:
CRE2_UNANCHORED
CRE2_ANCHOR_START
CRE2_ANCHOR_BOTH
– Function: int cre2_match (const cre2_regexp_t * REX, const char *
TEXT, int TEXT_LEN, int START_POS, int END_POS, cre2_anchor_t
ANCHOR, cre2_string_t * MATCH, int NMATCH)
Match a substring of the text referenced by TEXT and holding
TEXT_LEN bytes against the regular expression object REX. Return
true if the text matched, false otherwise.
The zero-based indices START_POS (inclusive) and END_POS
(exclusive) select the substring of TEXT to be examined. ANCHOR
selects the anchor point for the matching operation.
Data about the matching groups is stored in the array MATCH, which
must have at least NMATCH entries; the referenced substrings are
portions of the TEXT buffer. If we are only interested in
verifying if the text matches or not (ignoring the matching
portions of text): we can use 'NULL' as MATCH argument and 0 as
NMATCH argument.
The first element of MATCH (index 0) references the full portion of
the substring of TEXT matching the pattern; the second element of
MATCH (index 1) references the portion of text matching the first
parenthetical subexpression, the third element of MATCH (index 2)
references the portion of text matching the second parenthetical
subexpression; and so on.
– Function: int cre2_easy_match (const char * PATTERN, int
PATTERN_LEN, const char * TEXT, int TEXT_LEN, cre2_string_t *
MATCH, int NMATCH)
Like ‘cre2_match()’ but the pattern is specified as string PATTERN
holding PATTERN_LEN bytes. There are no START_POS, END_POS and
ANCHOR arguments: the whole TEXT is examined and the match is not
anchored.
If the text matches the pattern: the return value is 1. If the
text does not match the pattern: the return value is 0. If the
pattern is invalid: the return value is 2.
– Struct Typedef: cre2_range_t
Structure type used to represent a substring of the text to be
matched as starting and ending indices. It has the following
fields:
'long start'
Inclusive start byte index.
'long past'
Exclusive end byte index.
– Function: void cre2_strings_to_ranges (const char * TEXT,
cre2_range_t * RANGES, cre2_string_t * STRINGS, int NMATCH)
Given an array of STRINGS with NMATCH elements being the result of
matching TEXT against a regular expression: fill the array of
RANGES with the index intervals in the TEXT buffer representing the
same results.
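The conversion amounts to pointer arithmetic relative to the start of
TEXT. The following is a sketch of the computation implied by the
description above, not necessarily the code CRE2 uses; the names
‘substring_t’, ‘range_t’ and ‘substrings_to_ranges’ are hypothetical
stand-ins defined locally so that the block compiles without ‘cre2.h’.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins with the fields documented for cre2_string_t and
   cre2_range_t. */
typedef struct {
  const char *  data;
  int           length;
} substring_t;

typedef struct {
  long  start;          /* inclusive start byte index */
  long  past;           /* exclusive end byte index */
} range_t;

/* Hypothetical helper.  For each of the NMATCH entries in STRINGS,
   which must reference portions of the buffer TEXT, store in RANGES
   the corresponding byte indices. */
void
substrings_to_ranges (const char * text, range_t * ranges,
                      const substring_t * strings, int nmatch)
{
  assert(NULL != text);
  for (int i = 0; i < nmatch; ++i) {
    if (NULL == strings[i].data) {
      /* Choice made by this sketch only, not documented CRE2
         behaviour: an entry with no data gets the indices -1. */
      ranges[i].start = -1;
      ranges[i].past  = -1;
    } else {
      ranges[i].start = (long)(strings[i].data - text);
      ranges[i].past  = ranges[i].start + strings[i].length;
    }
  }
}
```

Index pairs of this kind are often easier to hand over to another
language than raw pointers, which fits the stated purpose of the
wrapper.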
File: cre2.info, Node: other, Next: tips, Prev: matching, Up: Top
6 Other matching functions
The following functions match a buffer of text against a regular
expression, allowing the extraction of portions of text matching
parenthetical subexpressions. All of them show the following behaviour:
* If the text matches the pattern: the return value is 1; if the text
does not match the pattern: the return value is 0.
* If the pattern is invalid: the return value is 0; there is no way
to distinguish this case from the case of text not matching other
than looking at what RE2 prints to ‘stderr’.
* It is impossible to turn off logging of error messages to ‘stderr’
when the specification of the regular expression is invalid.
* Data about the matching groups is stored in the array MATCH, which
must have at least NMATCH slots; the referenced substrings are
portions of the TEXT buffer.
* The array MATCH can have a number of slots between zero (included)
and the number of parenthetical subexpressions in PATTERN
(included); if NMATCH is greater than the number of parenthetical
subexpressions: the return value is 0.
* If we are only interested in verifying if the text matches the
pattern or not: we can use ‘NULL’ as MATCH argument and 0 as NMATCH
argument.
* The first slot of MATCH (index 0) references the portion of text
matching the first parenthetical subexpression; the second slot of
MATCH (index 1) references the portion of text matching the second
parenthetical subexpression; and so on.
See the documentation of each function for the differences.
The following example is a successful match:
const char * pattern = "ci.*ut";
const char * text = "ciao salut";
cre2_string_t input = {
.data = text,
.length = strlen(text)
};
int result;
result = cre2_full_match(pattern, &input, NULL, 0);
result => 1
the following example is a successful match in which the parenthetical
subexpression is ignored:
const char * pattern = "(ciao) salut";
const char * text = "ciao salut";
cre2_string_t input = {
.data = text,
.length = strlen(text)
};
int result;
result = cre2_full_match(pattern, &input, NULL, 0);
result => 1
the following example is a successful match in which the portion of text
matching the parenthetical subexpression is reported:
const char * pattern = "(ciao) salut";
const char * text = "ciao salut";
cre2_string_t input = {
.data = text,
.length = strlen(text)
};
int nmatch = 1;
cre2_string_t match[nmatch];
int result;
result = cre2_full_match(pattern, &input, match, nmatch);
result => 1
strncmp(text, input.data, input.length) => 0
strncmp("ciao", match[0].data, match[0].length) => 0
– Function: int cre2_full_match (const char * PATTERN, const
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
– Function: int cre2_full_match_re (cre2_regexp_t * REX, const
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
Match the zero-terminated string PATTERN or the precompiled regular
expression REX against the full buffer TEXT.
For example: the text 'abcdef' matches the pattern 'abcdef'
according to this function, but neither the pattern 'abc' nor the
pattern 'def' will match.
– Function: int cre2_partial_match (const char * PATTERN, const
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
– Function: int cre2_partial_match_re (cre2_regexp_t * REX, const
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
Match the zero-terminated string PATTERN or the precompiled regular
expression REX against the buffer TEXT, resulting in success if a
substring of TEXT matches; these functions behave like the full
match ones, but the matching text does not need to be anchored to
the beginning and end.
For example: the text 'abcDEFghi' matches the pattern 'DEF'
according to this function.
– Function: int cre2_consume (const char * PATTERN, cre2_string_t *
TEXT, cre2_string_t * MATCH, int NMATCH)
– Function: int cre2_consume_re (cre2_regexp_t * REX, cre2_string_t *
TEXT, cre2_string_t * MATCH, int NMATCH)
Match the zero-terminated string PATTERN or the precompiled regular
expression REX against the buffer TEXT, resulting in success if the
prefix of TEXT matches. The data structure referenced by TEXT is
mutated to reference text right after the last byte that matched
the pattern.
For example: the text 'abcDEF' matches the pattern 'abc' according
to this function; after the call TEXT will reference the text
'DEF'.
– Function: int cre2_find_and_consume (const char * PATTERN,
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
– Function: int cre2_find_and_consume_re (cre2_regexp_t * REX,
cre2_string_t * TEXT, cre2_string_t * MATCH, int NMATCH)
Match the zero-terminated string PATTERN or the precompiled regular
expression REX against the buffer TEXT, resulting in success if,
after skipping a non-matching prefix in TEXT, a substring of TEXT
matches. The data structure referenced by TEXT is mutated to
reference text right after the last byte that matched the pattern.
For example: the text 'abcDEFghi' matches the pattern 'DEF'
according to this function; the prefix 'abc' is skipped; after the
call TEXT will reference the text 'ghi'.
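To make the mutation of TEXT concrete, the following sketch reproduces
only the bookkeeping on the ‘data’ and ‘length’ fields. It does not use
RE2: a plain literal comparison stands in for the regular expression
engine, and the names ‘substring_t’, ‘consume_literal’ and
‘find_and_consume_literal’ are hypothetical. Leaving TEXT untouched on
failure is the behaviour of this sketch, not a statement about CRE2.

```c
#include <assert.h>
#include <string.h>

/* Stand-in with the two fields documented for cre2_string_t. */
typedef struct {
  const char *  data;
  int           length;
} substring_t;

/* Literal-only stand-in for a "consume" operation: succeed if TEXT
   begins with LITERAL, then advance TEXT past the matched bytes. */
int
consume_literal (const char * literal, substring_t * text)
{
  int  len = (int)strlen(literal);

  assert(NULL != text && NULL != text->data);
  if (len > text->length || 0 != memcmp(text->data, literal, (size_t)len))
    return 0;
  text->data   += len;
  text->length -= len;
  return 1;
}

/* Literal-only stand-in for a "find and consume" operation: skip the
   shortest non-matching prefix, then advance TEXT past the match. */
int
find_and_consume_literal (const char * literal, substring_t * text)
{
  int  len = (int)strlen(literal);

  assert(NULL != text && NULL != text->data);
  for (int skip = 0; skip + len <= text->length; ++skip) {
    if (0 == memcmp(text->data + skip, literal, (size_t)len)) {
      text->data   += skip + len;
      text->length -= skip + len;
      return 1;
    }
  }
  return 0;
}
```

Calling either function in a loop until it returns 0 walks through the
buffer one match at a time, which is the usual reason for preferring the
consuming functions over the partial match ones.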
File: cre2.info, Node: tips, Next: Package License, Prev: other, Up: Top
7 Tips for using the regexp syntax
Menu:
tips dot:: Matching newlines with the
‘.’ subpattern.
File: cre2.info, Node: tips dot, Up: tips
By default the dot subpattern ‘.’ matches any character but newlines; to
enable newline matching we have to enable the ‘s’ flag using the special
subpattern ‘(?FLAGS)’ or ‘(?FLAGS:RE)’, where ‘FLAGS’ is a
sequence of characters, one character for each flag, and ‘RE’ is a
regexp subpattern. Notice that the parentheses in ‘(?FLAGS:RE)’ are
non-capturing.
So let’s consider the text ‘ciao\nhello’:
* The regexp ‘ciao.hello’ does not match because ‘s’ is disabled.
* The regexp ‘(?s)ciao.hello’ matches because the subpattern ‘(?s)’
has enabled flag ‘s’ for the rest of the pattern, including the
dot.
* The regexp ‘ciao(?s).hello’ matches because the subpattern ‘(?s)’
has enabled flag ‘s’ for the rest of the pattern, including the
dot.
* The regexp ‘ciao(?s:.)hello’ matches because the subpattern
‘(?s:.)’ has enabled flag ‘s’ for the subpattern ‘.’ which is the
dot.
File: cre2.info, Node: Package License, Next: Documentation License, Prev: tips, Up: Top
Appendix A Package license
Copyright (C) 2012, 2016 Marco Maggi http://github.com/marcomaggi
Copyright (C) 2011 Keegan McAllister http://github.com/kmcallister/
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the
distribution.
Neither the name of the author nor the names of his contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
“AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
File: cre2.info, Node: Documentation License, Next: references, Prev: Package License, Up: Top
Appendix B GNU Free Documentation License
Version 1.3, 3 November 2008
Copyright (C) 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
PREAMBLE
The purpose of this License is to make a manual, textbook, or other
functional and useful document “free” in the sense of freedom: to
assure everyone the effective freedom to copy and redistribute it,
with or without modifying it, either commercially or
noncommercially. Secondarily, this License preserves for the
author and publisher a way to get credit for their work, while not
being considered responsible for modifications made by others.
This License is a kind of “copyleft”, which means that derivative
works of the document must themselves be free in the same sense.
It complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for
free software, because free software needs free documentation: a
free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless
of subject matter or whether it is published as a printed book. We
recommend this License principally for works whose purpose is
instruction or reference.
APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium,
that contains a notice placed by the copyright holder saying it can
be distributed under the terms of this License. Such a notice
grants a world-wide, royalty-free license, unlimited in duration,
to use that work under the conditions stated herein. The
“Document”, below, refers to any such manual or work. Any member
of the public is a licensee, and is addressed as “you”. You accept
the license if you copy, modify or distribute the work in a way
requiring permission under copyright law.
A “Modified Version” of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A “Secondary Section” is a named appendix or a front-matter section
of the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document’s overall
subject (or to related matters) and contains nothing that could
fall directly within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary Section may not
explain any mathematics.) The relationship could be a matter of
historical connection with the subject or with related matters, or
of legal, commercial, philosophical, ethical or political position
regarding them.
The “Invariant Sections” are certain Secondary Sections whose
titles are designated, as being those of Invariant Sections, in the
notice that says that the Document is released under this License.
If a section does not fit the above definition of Secondary then it
is not allowed to be designated as Invariant. The Document may
contain zero Invariant Sections. If the Document does not identify
any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
that says that the Document is released under this License. A
Front-Cover Text may be at most 5 words, and a Back-Cover Text may
be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, that is suitable for revising the document
straightforwardly with generic text editors or (for images composed
of pixels) generic paint programs or (for drawings) some widely
available drawing editor, and that is suitable for input to text
formatters or for automatic translation to a variety of formats
suitable for input to text formatters. A copy made in an otherwise
Transparent file format whose markup, or absence of markup, has
been arranged to thwart or discourage subsequent modification by
readers is not Transparent. An image format is not Transparent if
used for any substantial amount of text. A copy that is not
“Transparent” is called “Opaque”.
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format,
SGML or XML using a publicly available DTD, and standard-conforming
simple HTML, PostScript or PDF designed for human modification.
Examples of transparent image formats include PNG, XCF and JPG.
Opaque formats include proprietary formats that can be read and
edited only by proprietary word processors, SGML or XML for which
the DTD and/or processing tools are not generally available, and
the machine-generated HTML, PostScript or PDF produced by some word
processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the
material this License requires to appear in the title page. For
works in formats which do not have any title page as such, “Title
Page” means the text near the most prominent appearance of the
work’s title, preceding the beginning of the body of the text.
The “publisher” means any person or entity that distributes copies
of the Document to the public.
A section “Entitled XYZ” means a named subunit of the Document
whose title either is precisely XYZ or contains XYZ in parentheses
following text that translates XYZ in another language. (Here XYZ
stands for a specific section name mentioned below, such as
“Acknowledgements”, “Dedications”, “Endorsements”, or “History”.)
To “Preserve the Title” of such a section when you modify the
Document means that it remains a section “Entitled XYZ” according
to this definition.
The Document may include Warranty Disclaimers next to the notice
which states that this License applies to the Document. These
Warranty Disclaimers are considered to be included by reference in
this License, but only as regards disclaiming warranties: any other
implication that these Warranty Disclaimers may have is void and
has no effect on the meaning of this License.
VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License
applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However,
you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow the
conditions in section 3.
You may also lend copies, under the same conditions stated above,
and you may publicly display copies.
COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly
have printed covers) of the Document, numbering more than 100, and
the Document’s license notice requires Cover Texts, you must
enclose the copies in covers that carry, clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front cover, and
Back-Cover Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies. The
front cover must present the full title with all words of the title
equally prominent and visible. You may add other material on the
covers in addition. Copying with changes limited to the covers, as
long as they preserve the title of the Document and satisfy these
conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto
adjacent pages.
If you publish or distribute Opaque copies of the Document
numbering more than 100, you must either include a machine-readable
Transparent copy along with each Opaque copy, or state in or with
each Opaque copy a computer-network location from which the general
network-using public has access to download using public-standard
network protocols a complete Transparent copy of the Document, free
of added material. If you use the latter option, you must take
reasonably prudent steps, when you begin distribution of Opaque
copies in quantity, to ensure that this Transparent copy will
remain thus accessible at the stated location until at least one
year after the last time you distribute an Opaque copy (directly or
through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of
the Document well before redistributing any large number of copies,
to give them a chance to provide you with an updated version of the
Document.
MODIFICATIONS
You may copy and distribute a Modified Version of the Document
under the conditions of sections 2 and 3 above, provided that you
release the Modified Version under precisely this License, with the
Modified Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to whoever
possesses a copy of it. In addition, you must do these things in
the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title
distinct from that of the Document, and from those of previous
versions (which should, if there were any, be listed in the
History section of the Document). You may use the same title
as a previous version if the original publisher of that
version gives permission.
B. List on the Title Page, as authors, one or more persons or
entities responsible for authorship of the modifications in
the Modified Version, together with at least five of the
principal authors of the Document (all of its principal
authors, if it has fewer than five), unless they release you
from this requirement.
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license
notice giving the public permission to use the Modified
Version under the terms of this License, in the form shown in
the Addendum below.
G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document’s
license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled “History”, Preserve its Title,
and add to it an item stating at least the title, year, new
authors, and publisher of the Modified Version as given on the
Title Page. If there is no section Entitled “History” in the
Document, create one stating the title, year, authors, and
publisher of the Document as given on its Title Page, then add
an item describing the Modified Version as stated in the
previous sentence.
J. Preserve the network location, if any, given in the Document
for public access to a Transparent copy of the Document, and
likewise the network locations given in the Document for
previous versions it was based on. These may be placed in the
“History” section. You may omit a network location for a work
that was published at least four years before the Document
itself, or if the original publisher of the version it refers
to gives permission.
K. For any section Entitled “Acknowledgements” or “Dedications”,
Preserve the Title of the section, and preserve in the section
all the substance and tone of each of the contributor
acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered
in their text and in their titles. Section numbers or the
equivalent are not considered part of the section titles.
M. Delete any section Entitled “Endorsements”. Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled
“Endorsements” or to conflict in title with any Invariant
Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no
material copied from the Document, you may at your option designate
some or all of these sections as invariant. To do this, add their
titles to the list of Invariant Sections in the Modified Version’s
license notice. These titles must be distinct from any other
section titles.
You may add a section Entitled “Endorsements”, provided it contains
nothing but endorsements of your Modified Version by various
parties–for example, statements of peer review or that the text
has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text,
and a passage of up to 25 words as a Back-Cover Text, to the end of
the list of Cover Texts in the Modified Version. Only one passage
of Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document
already includes a cover text for the same cover, previously added
by you or by arrangement made by the same entity you are acting on
behalf of, you may not add another; but you may replace the old
one, on explicit permission from the previous publisher that added
the old one.
The author(s) and publisher(s) of the Document do not by this
License give permission to use their names for publicity for or to
assert or imply endorsement of any Modified Version.
COMBINING DOCUMENTS
You may combine the Document with other documents released under
this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination all
of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license notice, and that you preserve all
their Warranty Disclaimers.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name
but different contents, make the title of each such section unique
by adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in
the list of Invariant Sections in the license notice of the
combined work.
In the combination, you must combine any sections Entitled
“History” in the various original documents, forming one section
Entitled “History”; likewise combine any sections Entitled
“Acknowledgements”, and any sections Entitled “Dedications”. You
must delete all sections Entitled “Endorsements.”
COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the documents
in all other respects.
You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert
a copy of this License into the extracted document, and follow this
License in all other respects regarding verbatim copying of that
document.
AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other
separate and independent documents or works, in or on a volume of a
storage or distribution medium, is called an “aggregate” if the
copyright resulting from the compilation is not used to limit the
legal rights of the compilation’s users beyond what the individual
works permit. When the Document is included in an aggregate, this
License does not apply to the other works in the aggregate which
are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one half
of the entire aggregate, the Document’s Cover Texts may be placed
on covers that bracket the Document within the aggregate, or the
electronic equivalent of covers if the Document is in electronic
form. Otherwise they must appear on printed covers that bracket
the whole aggregate.
TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in the
Document, and any Warranty Disclaimers, provided that you also
include the original English version of this License and the
original versions of those notices and disclaimers. In case of a
disagreement between the translation and the original version of
this License or a notice or disclaimer, the original version will
prevail.
If a section in the Document is Entitled “Acknowledgements”,
“Dedications”, or “History”, the requirement (section 4) to
Preserve its Title (section 1) will typically require changing the
actual title.
TERMINATION
You may not copy, modify, sublicense, or distribute the Document
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense, or distribute it is void,
and will automatically terminate your rights under this License.
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the
copyright holder fails to notify you of the violation by some
reasonable means prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from
that copyright holder, and you cure the violation prior to 30 days
after your receipt of the notice.
Termination of your rights under this section does not terminate
the licenses of parties who have received copies or rights from you
under this License. If your rights have been terminated and not
permanently reinstated, receipt of a copy of some or all of the
same material does not give you any rights to use it.
FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License “or any later version” applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If the
Document does not specify a version number of this License, you may
choose any version ever published (not as a draft) by the Free
Software Foundation. If the Document specifies that a proxy can
decide which future versions of this License can be used, that
proxy’s public statement of acceptance of a version permanently
authorizes you to choose that version for the Document.
RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any
World Wide Web server that publishes copyrightable works and also
provides prominent facilities for anybody to edit those works. A
public wiki that anybody can edit is an example of such a server.
A “Massive Multiauthor Collaboration” (or “MMC”) contained in the
site means any set of copyrightable works thus published on the MMC
site.
“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0
license published by Creative Commons Corporation, a not-for-profit
corporation with a principal place of business in San Francisco,
California, as well as future copyleft versions of that license
published by that same organization.
“Incorporate” means to publish or republish a Document, in whole or
in part, as part of another Document.
An MMC is “eligible for relicensing” if it is licensed under this
License, and if all works that were first published under this
License somewhere other than this MMC, and subsequently
incorporated in whole or in part into the MMC, (1) had no cover
texts or invariant sections, and (2) were thus incorporated prior
to November 1, 2008.
The operator of an MMC Site may republish an MMC contained in the
site under CC-BY-SA on the same site at any time before August 1,
2009, provided the MMC is eligible for relicensing.
ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:
Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
If you have Invariant Sections, Front-Cover Texts and Back-Cover
Texts, replace the “with…Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with
the Front-Cover Texts being LIST, and with the Back-Cover Texts
being LIST.
If you have Invariant Sections without Cover Texts, or some other
combination of the three, merge those two alternatives to suit the
situation.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of free
software license, such as the GNU General Public License, to permit
their use in free software.
File: cre2.info, Node: references, Next: concept index, Prev: Documentation License, Up: Top
Appendix C Bibliography and references
File: cre2.info, Node: concept index, Next: function index, Prev: references, Up: Top
Appendix D An entry for each concept