UTF-8 aware regular expressions and functions.
The Speect regular expression syntax differs slightly from the POSIX and Perl syntaxes. The supported metacharacters (operators) are mostly the same, that is: . | ( ) [ ] ? + * ^ $
Escaping (literal character inclusion) is supported using the \-character. POSIX character classes are not supported. Unlike the POSIX or Perl variants, the Speect regular expression engine always matches the whole string, and not part of it. That is, the regular expression “a” matches the string “a”, but not the string “blah”, whereas a POSIX or Perl regular expression would match both strings. To get the latter behaviour, add “.*” before and after the expression, i.e. “.*a.*”.
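The difference between whole-string matching and POSIX-style searching can be illustrated without Speect itself. The sketch below uses the standard POSIX `regcomp`/`regexec` calls; the helper names `posix_search` and `whole_match` are hypothetical and exist only for this illustration. `whole_match` emulates whole-string semantics by wrapping the pattern in `^( )$`, which is an assumption about how to approximate the behaviour described above, not a description of the Speect engine.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* POSIX search semantics: returns 1 if the pattern matches anywhere in
 * text, 0 if it does not, and -1 if the pattern fails to compile. */
static int posix_search(const char *pattern, const char *text)
{
    regex_t re;
    int hit;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    hit = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}

/* Whole-string semantics, emulated by anchoring the pattern at both
 * ends: "a" becomes "^(a)$". Returns 1, 0, or -1 like posix_search. */
static int whole_match(const char *pattern, const char *text)
{
    size_t len = strlen(pattern) + 5;   /* "^(" + pattern + ")$" + NUL */
    char *anchored = malloc(len);
    int hit;

    if (anchored == NULL)
        return -1;
    snprintf(anchored, len, "^(%s)$", pattern);
    hit = posix_search(anchored, text);
    free(anchored);
    return hit;
}
```

With these helpers, `posix_search("a", "blah")` reports a match while `whole_match("a", "blah")` does not. `whole_match(".*a.*", "blah")` matches again, which mirrors the “.*a.*” idiom described above.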
The regular expressions are matched using a purely NFA (nondeterministic finite automaton) based approach. No backtracking algorithm is provided.
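To show what matching without backtracking looks like, here is a minimal sketch of NFA simulation by state sets. It is not the Speect implementation: it handles only single-byte literals, “.”, and the quantifiers ?, * and + (no groups, alternation, bracket expressions, escaping, or multi-byte UTF-8 characters), and patterns are limited to 63 items. All names (`item_t`, `parse`, `closure`, `nfa_match`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pattern item: a literal byte (or the '.' wildcard) plus an
 * optional quantifier: 0 (exactly once), '?', '*', or '+'. */
typedef struct {
    char ch;
    char quant;
} item_t;

/* Split a pattern such as "ab*c?" into items; returns the item count.
 * Items beyond max_items are silently ignored in this sketch. */
static int parse(const char *pat, item_t *items, int max_items)
{
    int n = 0;

    while (*pat != '\0' && n < max_items) {
        items[n].ch = *pat++;
        items[n].quant = 0;
        if (*pat == '?' || *pat == '*' || *pat == '+')
            items[n].quant = *pat++;
        n++;
    }
    return n;
}

/* Add every state reachable without consuming input: an item marked
 * '?' or '*' may be skipped, so state i also activates state i + 1. */
static uint64_t closure(uint64_t set, const item_t *items, int n)
{
    for (int i = 0; i < n; i++)
        if (((set >> i) & 1) && (items[i].quant == '?' || items[i].quant == '*'))
            set |= (uint64_t)1 << (i + 1);
    return set;
}

/* Whole-string match: bit i of the state set means "items 0..i-1 have
 * been matched". Every input byte is examined exactly once. */
static int nfa_match(const char *pat, const char *text)
{
    item_t items[63];
    int n;
    uint64_t cur;

    assert(pat != NULL && text != NULL);
    n = parse(pat, items, 63);
    cur = closure(1, items, n);

    for (; *text != '\0'; text++) {
        uint64_t next = 0;

        for (int i = 0; i < n; i++) {
            if (!((cur >> i) & 1))
                continue;                       /* state i is not active */
            if (items[i].ch != '.' && items[i].ch != *text)
                continue;                       /* item i rejects this byte */
            next |= (uint64_t)1 << (i + 1);     /* move past the item */
            if (items[i].quant == '*' || items[i].quant == '+')
                next |= (uint64_t)1 << i;       /* or stay on it */
        }
        cur = closure(next, items, n);
    }
    return (int)((cur >> n) & 1);               /* accept only at the end */
}
```

Because all active states advance together, the running time is bounded by the input length times the number of items, whatever the pattern. Accepting only when the final state is active after the last byte gives the whole-string behaviour: `nfa_match("a", "blah")` is 0, while `nfa_match(".*a.*", "blah")` is 1.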
Also see Regular Expressions Example.
Metacharacter | Description |
---|---|
. | Matches any single character. |
\| | The alternation operator matches either the expression before or the expression after the operator. For example, abc\|def matches “abc” or “def”. |
( ) | Defines a marked group. |
[ ] | A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches “a”, “b”, or “c”. [a-z] specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: [abcx-z0-9] matches “a”, “b”, “c”, “x”, “y”, “z” or single digits “0” to “9”. |
[^ ] | Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than “a”, “b”, or “c”. [^a-z] matches any single character that is not a lowercase letter from “a” to “z”. |
? | Matches the preceding element zero or one time. |
+ | Matches the preceding element one or more times. |
* | Matches the preceding element zero or more times. |
^ | Matches the starting position within the string. |
$ | Matches the ending position of the string. |

Function | Description |
---|---|
s_regex_comp | Compile a UTF-8 regular expression and return a pointer to the generated description. |
s_regex_match | Matches a null-terminated UTF-8 string against the given compiled regular expression in rxcomp. |
s_regexsub_num_groups | Query the number of groups that correspond to the parenthesized sub-expressions of the given matched regular expression. |
s_regexsub_group | Extract the given numbered group of the given matched regular expression sub-expression elements. |

Type definition of the (opaque) compiled regular expression.
Type definition of the (opaque) sub-expression elements of a matched regular expression.
Regular expression flags.
Values:
“.” metacharacter includes newlines.
“.” metacharacter excludes newlines.
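In UTF-8 input, “any single character” may span several bytes, and the two flag values above decide whether a newline counts as such a character. The following sketch illustrates both points. The enum `dot_mode`, its values, and the functions `utf8_len` and `dot_match` are hypothetical stand-ins, not Speect identifiers, and the validity check is deliberately simple (it does not reject overlong encodings or surrogates).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two flag values described above. */
enum dot_mode { DOT_INCLUDE_NL, DOT_EXCLUDE_NL };

/* Number of bytes in the UTF-8 sequence that starts with lead byte c,
 * or 0 if c cannot start a sequence (e.g. a continuation byte). */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80)           return 1;   /* 0xxxxxxx: ASCII       */
    if ((c & 0xE0) == 0xC0) return 2;   /* 110xxxxx: two bytes   */
    if ((c & 0xF0) == 0xE0) return 3;   /* 1110xxxx: three bytes */
    if ((c & 0xF8) == 0xF0) return 4;   /* 11110xxx: four bytes  */
    return 0;
}

/* If "." would match the character at s under the given mode, return
 * the number of bytes that character occupies; otherwise return 0. */
static size_t dot_match(const char *s, enum dot_mode mode)
{
    size_t len, i;

    if (*s == '\0')
        return 0;                       /* nothing left to match */
    if (*s == '\n' && mode == DOT_EXCLUDE_NL)
        return 0;                       /* newline excluded by the flag */

    len = utf8_len((unsigned char)*s);
    for (i = 1; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return 0;                   /* truncated or invalid sequence */
    return len;
}
```

For instance, `dot_match("\xE2\x82\xAC", DOT_INCLUDE_NL)` consumes the three bytes of “€” as one character, and `dot_match("\n", DOT_EXCLUDE_NL)` returns 0.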
Compile a UTF-8 regular expression and return a pointer to the generated description.
Parameters: | |
---|---|
Return: | Compiled regular expression. |
Matches a null-terminated UTF-8 string against the given compiled regular expression in rxcomp.
If it matches, and the sub-expression elements structure rsub is not NULL, rsub will be filled with character pointers to the groups of strings that correspond to the parenthesized sub-expressions of the expression.
Parameters: | |
---|---|
Return Value: | |
Query the number of groups that correspond to the parenthesized sub-expressions of the given matched regular expression.
Parameters: | |
---|---|
Return: | The number of sub-string matches. |
Extract the given numbered group of the given matched regular expression sub-expression elements.
Parameters: | |
---|---|
Return: | Character string of the given numbered group, can be NULL. |
Note: | Caller is responsible for memory of returned string. Group 0 is always the whole match. |
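The group conventions noted above (group 0 is the whole match, a group may be NULL, and the caller owns the returned string) can be tried out with the standard POSIX regex API as an analogue. This is a sketch under that assumption and does not call the Speect functions. The helpers `num_groups`, `group_dup` and `match_group` are hypothetical names, and the 10-group limit is arbitrary.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Number of parenthesized sub-expressions in the pattern, or -1 if the
 * pattern does not compile. */
static int num_groups(const char *pattern)
{
    regex_t re;
    int n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    n = (int)re.re_nsub;
    regfree(&re);
    return n;
}

/* Copy group n out of a POSIX match. Returns NULL if the group did not
 * take part in the match or if allocation fails. Caller must free(). */
static char *group_dup(const char *text, const regmatch_t *m, size_t n)
{
    size_t len;
    char *out;

    if (m[n].rm_so < 0)
        return NULL;
    len = (size_t)(m[n].rm_eo - m[n].rm_so);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, text + m[n].rm_so, len);
    out[len] = '\0';
    return out;
}

/* Match text against pattern and return a fresh copy of group n, or
 * NULL if there is no match, no such group, or the pattern is invalid. */
static char *match_group(const char *pattern, const char *text, size_t n)
{
    regex_t re;
    regmatch_t m[10];
    char *out = NULL;

    if (n >= 10 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, text, 10, m, 0) == 0)
        out = group_dup(text, m, n);
    regfree(&re);
    return out;
}
```

Returning a fresh copy keeps the result valid after the compiled expression is freed, at the cost of making the caller responsible for `free()`. A group that never took part in the match, such as the optional group in `^(a)?(b)$` matched against “b”, comes back as NULL.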