Regular Expressions

UTF-8 aware regular expressions and functions.

The Speect regular expression syntax differs somewhat from the POSIX and Perl syntaxes. The supported metacharacters (operators) are mostly the same: . | ( ) [ ] ? + * ^ $

Escaping (literal character inclusion) is supported using the \ (backslash) character. POSIX character classes are not supported. Unlike the POSIX and Perl variants, the Speect regular expression engine always matches the whole string, not just part of it. That is, the regular expression “a” matches the string “a” but not the string “blah”, whereas a POSIX or Perl regular expression would match both strings. To get the latter behaviour, add “.*” before and after the expression, i.e. “.*a.*”.
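For readers coming from POSIX, the whole-string rule can be approximated with the standard <regex.h> API by anchoring the pattern. The sketch below is plain POSIX code, not part of Speect, and the function names posix_search and whole_string_match are hypothetical. It contrasts ordinary POSIX substring search with a wrapper that encloses the pattern in “^(” and “)$”.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* POSIX semantics: 1 if pattern matches anywhere in str, 0 if not, -1 on error. */
int posix_search(const char *pattern, const char *str)
{
    regex_t rx;
    int rc;

    assert(pattern != NULL && str != NULL);
    if (regcomp(&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, str, 0, NULL, 0);
    regfree(&rx);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}

/* Speect-like semantics: 1 only if the WHOLE of str matches pattern. */
int whole_string_match(const char *pattern, const char *str)
{
    size_t len;
    char *anchored;
    int rc;

    assert(pattern != NULL && str != NULL);
    len = strlen(pattern) + 5;          /* "^(" + pattern + ")$" + NUL */
    anchored = malloc(len);
    if (anchored == NULL)
        return -1;
    /* The parentheses keep alternation inside the anchors: ^(abc|def)$ */
    snprintf(anchored, len, "^(%s)$", pattern);
    rc = posix_search(anchored, str);
    free(anchored);
    return rc;
}
```

With these helpers, posix_search("a", "blah") succeeds while whole_string_match("a", "blah") fails, and whole_string_match(".*a.*", "blah") succeeds, mirroring the Speect examples above. Wrapping in a group rather than prepending “^” and appending “$” directly matters for alternations such as “abc|def”.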

The regular expressions are matched using a purely NFA (nondeterministic finite automaton) based approach; no backtracking is used.
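To illustrate what NFA matching without backtracking means, here is a minimal sketch; it is not Speect's engine. Instead of trying one path and backing up on failure, the matcher keeps the set of all pattern positions the automaton could be in and advances them together on each input character, so the work is bounded by pattern length times string length. It supports only literal characters, “.” and a postfix “*”, assumes a well-formed pattern, and, like Speect, accepts only when the whole string has been consumed. The names nfa_full_match, add_state and MAX_PAT are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PAT 63  /* hypothetical pattern length limit for this sketch */

/* Add state i to set, following epsilon edges: an atom followed by '*'
 * may be skipped without consuming input. State n is the accept state. */
static void add_state(const char *p, size_t n, bool *set, size_t i)
{
    while (!set[i]) {
        set[i] = true;
        if (i < n && p[i + 1] == '*')
            i += 2;             /* skip over "x*" */
        else
            break;
    }
}

/* Return true if the whole of s matches p (literals, '.', postfix '*'). */
bool nfa_full_match(const char *p, const char *s)
{
    size_t n = strlen(p);
    bool cur[MAX_PAT + 1], next[MAX_PAT + 1];

    assert(n <= MAX_PAT);
    memset(cur, 0, sizeof cur);
    add_state(p, n, cur, 0);

    for (; *s != '\0'; s++) {
        memset(next, 0, sizeof next);
        /* Advance every live state in lock-step on the character *s. */
        for (size_t i = 0; i < n; i++) {
            if (!cur[i] || !(p[i] == '.' || p[i] == *s))
                continue;
            if (p[i + 1] == '*')
                add_state(p, n, next, i);      /* starred atom: may repeat */
            else
                add_state(p, n, next, i + 1);  /* plain atom: move past it */
        }
        memcpy(cur, next, sizeof cur);
    }
    return cur[n];  /* accept only if the input is fully consumed */
}
```

Because every possible position is tracked at once, patterns such as “a*a*a*b” cannot trigger the exponential blow-up a naive backtracking matcher can suffer on long runs of “a”.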

See also the Regular Expressions Example.

Syntax Reference

Metacharacter Description
. Matches any single character.
| The alternation operator matches either the expression before or the expression after the operator. For example, abc|def matches “abc” or “def”.
( ) Defines a marked group.
[ ] A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches “a”, “b”, or “c”. [a-z] specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: [abcx-z0-9] matches “a”, “b”, “c”, “x”, “y”, “z” or single digits “0” to “9”.
[^ ] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than “a”, “b”, or “c”. [^a-z] matches any single character that is not a lowercase letter from “a” to “z”.
? Matches the preceding element zero or one time.
+ Matches the preceding element one or more times.
* Matches the preceding element zero or more times.
^ Matches the starting position within the string.
$ Matches the ending position of the string.
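The bracket-expression rules in the table (single characters, “x-z” ranges, a leading “^” for negation) can be sketched as follows. This is a hypothetical, ASCII-only illustration and not Speect's parser: it ignores escapes and multi-byte UTF-8 characters, and the function name bracket_match is invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Test character c against a bracket expression such as "[abcx-z0-9]"
 * or "[^a-z]". Returns 1 on match, 0 on no match, -1 if malformed. */
int bracket_match(const char *expr, char c)
{
    bool negate = false, found = false;
    const char *p;

    assert(expr != NULL);
    if (expr[0] != '[')
        return -1;
    p = expr + 1;
    if (*p == '^') {            /* [^ ...] inverts the result */
        negate = true;
        p++;
    }
    while (*p != '\0' && *p != ']') {
        if (p[1] == '-' && p[2] != '\0' && p[2] != ']') {
            /* Range such as a-z: inclusive on both ends. */
            if (p[0] <= c && c <= p[2])
                found = true;
            p += 3;
        } else {
            /* Single literal character (a trailing '-' is literal). */
            if (*p == c)
                found = true;
            p++;
        }
    }
    if (*p != ']')
        return -1;              /* missing closing bracket */
    return found != negate ? 1 : 0;
}
```

For example, bracket_match("[abcx-z0-9]", 'y') and bracket_match("[^a-z]", 'Q') both return 1, matching the descriptions in the table.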

Summary

s_regex_comp Compile a UTF-8 regular expression and return a pointer to the generated description.
s_regex_match Match a null-terminated UTF-8 string against the given compiled regular expression in rxcomp.
s_regexsub_num_groups Query the number of groups that correspond to the parenthesized sub-expressions of the given matched regular expression.
s_regexsub_group Extract the given numbered group from the sub-expression elements of a matched regular expression.

Definitions

typedef struct s_regex s_regex

Type definition of the (opaque) compiled regular expression.

typedef struct s_regexsub s_regexsub

Type definition of the (opaque) sub-expression elements of a matched regular expression.

typedef enum s_regex_flags

Regular expression flags.

Values:

  • S_DOT_INCLD_NEWLINE = 0

    The “.” metacharacter also matches newline characters.

  • S_DOT_EXCLD_NEWLINE = 1

    The “.” metacharacter does not match newline characters.
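The sketch below illustrates how the two flags could affect “.” in a UTF-8 aware matcher, assuming (as the UTF-8 awareness suggests, though this page does not spell it out) that “.” consumes one whole character, i.e. one code point of one to four bytes, rather than a single byte. It is not Speect code: the enum dot_mode and its values are hypothetical stand-ins for S_DOT_INCLD_NEWLINE and S_DOT_EXCLD_NEWLINE, and the decoder does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for s_regex_flags, for this sketch only. */
enum dot_mode { DOT_INCLD_NEWLINE = 0, DOT_EXCLD_NEWLINE = 1 };

/* Decode one UTF-8 code point from s into *cp and return its byte length,
 * or 0 at end of string or on an invalid lead/continuation byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;

    if (s[0] == 0)
        return 0;
    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else
        return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* also stops safely at a NUL */
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Try to match "." at s: return the bytes consumed, or 0 for no match. */
size_t dot_match(const char *s, enum dot_mode mode)
{
    uint32_t cp;
    size_t len;

    assert(s != NULL);
    len = utf8_decode((const unsigned char *)s, &cp);
    if (len == 0)
        return 0;
    if (mode == DOT_EXCLD_NEWLINE && cp == '\n')
        return 0;               /* newline excluded by the flag */
    return len;
}
```

Under this assumption, “.” consumes both bytes of “é” (0xC3 0xA9) in a single step, and a newline is rejected only in the excluding mode.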

Compile

s_regex *s_regex_comp(const char *string, s_regex_flags flags, s_erc *error)

Compile a UTF-8 regular expression and return a pointer to the generated description.

Parameters:
  • string

    String containing the regular expression to compile.

  • flags

    Regular expression flags.

  • error

    Error code.

Return:

Compiled regular expression.

Match

int s_regex_match(s_regex *rxcomp, const char *string, s_regexsub **rsub, s_erc *error)

Matches a null-terminated UTF-8 string against the given compiled regular expression in rxcomp.

If it matches, and the sub-expression elements structure rsub is not NULL, rsub will be filled with character pointers to the groups of strings that correspond to the parenthesized sub-expressions of the expression.

Parameters:
  • rxcomp

    The compiled regular expression.

  • string

    String to run regular expression on.

  • rsub

    Sub-expression elements, or NULL.

  • error

    Error code.

Return Value:
  • 0

    if no match

  • > 0

    if a match was found

  • < 0

    if the matcher ran out of space

Query

uint8 s_regexsub_num_groups(s_regexsub *rsub, s_erc *error)

Query the number of groups that correspond to the parenthesized sub-expressions of the given matched regular expression.

Parameters:
  • rsub

    Sub-expression elements of a matched regular expression.

  • error

    Error code.

Return:

The number of sub-string matches.

Extract

char *s_regexsub_group(s_regexsub *rsub, uint8 n, s_erc *error)

Extract the given numbered group from the sub-expression elements of a matched regular expression.

Parameters:
  • rsub

    Sub-expression elements of a matched regular expression.

  • n

    The group number to extract.

  • error

    Error code.

Return:

Character string of the given numbered group; may be NULL.

Note:

The caller is responsible for freeing the returned string.

Group 0 is always the whole match.
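For comparison, POSIX <regex.h> reports groups as start/end offsets (regmatch_t) rather than character pointers. The sketch below is POSIX code, not Speect's implementation, written to follow the conventions documented above: group 0 is the whole match, and the returned string is a fresh allocation the caller must free. The function whole_match_group and the limit MAX_GROUPS are hypothetical. The pattern is wrapped in “^(” and “)$” to get Speect-style whole-string matching, which adds one outer group, so user group k is POSIX group k + 1.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 10  /* hypothetical limit on groups for this sketch */

/* Match the whole of str against pattern (POSIX ERE) and return a newly
 * allocated copy of group n, or NULL if there is no match, n is out of
 * range, or group n did not take part in the match. Caller frees. */
char *whole_match_group(const char *pattern, const char *str, size_t n)
{
    regmatch_t m[MAX_GROUPS + 1];
    regex_t rx;
    char *anchored, *out = NULL;
    size_t len, idx;
    int rc;

    assert(pattern != NULL && str != NULL);
    if (n >= MAX_GROUPS)
        return NULL;
    len = strlen(pattern) + 5;                 /* "^(" + pattern + ")$" + NUL */
    anchored = malloc(len);
    if (anchored == NULL)
        return NULL;
    snprintf(anchored, len, "^(%s)$", pattern);
    rc = regcomp(&rx, anchored, REG_EXTENDED);
    free(anchored);
    if (rc != 0)
        return NULL;

    /* Skip the added outer group: user group k (k >= 1) is POSIX group k + 1. */
    idx = (n == 0) ? 0 : n + 1;
    if (idx <= rx.re_nsub
        && regexec(&rx, str, MAX_GROUPS + 1, m, 0) == 0
        && m[idx].rm_so != -1) {
        size_t glen = (size_t)(m[idx].rm_eo - m[idx].rm_so);
        out = malloc(glen + 1);
        if (out != NULL) {
            memcpy(out, str + m[idx].rm_so, glen);
            out[glen] = '\0';
        }
    }
    regfree(&rx);
    return out;
}
```

With the pattern “(.*)(ab)(.*)” and the string “xxabyy”, groups 0 to 3 are “xxabyy”, “xx”, “ab” and “yy”; each result must be released with free().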