
Using Regular Expressions in C++11

Regular Expression Special Characters

"."---Any single character (a "wildcard")

"["---Begin character class

"]"---End character class

"{"---Begin count

"}"---End count

"("---Begin grouping

")"---End grouping

"\"---Next character has a special meaning

"*"---Zero or more

"+"---One or more

"?"---Optional (zero or one)

"|"---Alternative (or)

"^"---Start of line; negation

"$"---End of line


Example:

case 1:

        ^A*B+C?$

explain 1:

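        Zero or more A's at the start of the line, then at least one B, then an optional C, then the end of the line. For example, "B", "AAB" and "ABBC" match; "AC" (no B) and "ABCC" (two C's) do not.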


A pattern can be optional or repeated (the default is exactly once) by adding a suffix:


Repetition

{n}---Exactly n times

{n,}---n or more times

{n,m}---At least n times and at most m times

*---Zero or more, that is, {0,}

+---One or more, that is, {1,}

?---Optional (zero or one), that is, {0,1}


Example:

case 1:

        A{3}B{2,4}C*

explain 1:

        Matches: AAABBC, AAABBB. Not matched: AABBC (too few A's), AAABC (too few B's), AAABBBBBC (too many B's).


A suffix ? after any of the repetition notations makes the pattern matcher "lazy" or "non-greedy".

That is, when looking for a pattern, it will look for the shortest match rather than the longest.

By default, the pattern matcher always looks for the longest match (similar to C++'s Max Munch rule).

Consider:

    ababab

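The pattern (ab)* matches all of "ababab". However, (ab)+? matches only the first "ab", and (ab)*? matches the empty string, because zero repetitions is already the shortest possible match.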

The most common character classifications have names, written inside a bracket expression as [[:name:]] (for example [[:digit:]]):

Character Classes

alnum --- Any alphanumeric character

alpha --- Any alphabetic character

blank --- Any whitespace character that is not a line separator

cntrl --- Any control character

d --- Any decimal digit

digit --- Any decimal digit

graph --- Any graphical character

lower --- Any lowercase character

print --- Any printable character

punct --- Any punctuation character

s --- Any whitespace character

space --- Any whitespace character

upper --- Any uppercase character

w --- Any word character (alphanumeric characters plus the underscore)

xdigit --- Any hexadecimal digit character


Several character classes are supported by shorthand notation:

Character Class Abbreviations
\d --- A decimal digit --- [[:digit:]]

\s --- A whitespace character (space, tab, ...) --- [[:space:]]

\w --- A letter (a-z), digit (0-9), or underscore (_) --- [_[:alnum:]]

\D --- Not \d --- [^[:digit:]]

\S --- Not \s --- [^[:space:]]

\W --- Not \w --- [^_[:alnum:]]

In addition, languages supporting regular expressions often provide:

Nonstandard (but Common) Character Class Abbreviations

\l --- A lowercase character --- [[:lower:]]

\u --- An uppercase character --- [[:upper:]]

\L --- Not \l --- [^[:lower:]]

\U --- Not \u --- [^[:upper:]]


Note that in an ordinary C++ string literal each backslash must be doubled, so the pattern \d is written "\\d". A raw string literal such as R"(\d)" avoids the doubling.

As usual, backslashes can denote special characters:

Special Characters

\n --- Newline

\t --- Tab

\\ --- One backslash

\xhh --- A Unicode character expressed using two hexadecimal digits

\uhhhh --- A Unicode character expressed using four hexadecimal digits


To add to the opportunities for confusion, two further logically different uses of the backslash are provided:

Special Characters

\b --- The first or last character of a word (a "boundary character")

\B --- Not a \b

\i --- The i-th sub_match in this pattern (a back-reference, e.g. \1)


Here are some examples of patterns:

Ax*              // A, Ax, Axxxx

Ax+              // Ax, Axxx    Not A

\d-?\d           // 1-2, 12    Not 1--2

\w{2}-\d{4,5}    // Ab-1234, XX-54321, 22-5432    Digits are in \w

(\d*:)?(\d+)     // 12:3, 1:23, 123, :123    Not 123:

(bs|BS)          // bs, BS    Not bS

[aeiouy]         // a, o, u    An English vowel, not x

[^aeiouy]        // x, k    Not an English vowel, not e

[a^eiouy]        // a, ^, o, u    An English vowel or ^


Below is the test code:

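```cpp
#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main()
{
    // Zero or more 'A', one or more 'B', an optional 'C', anchored at both ends.
    const char* reg_esp = "^A*B+C?$";
    regex rgx(reg_esp);
    cmatch match;  // match results for a const char* target
    const char* target = "AAAAAAAAABBBBBBBBC";
    if (regex_search(target, match, rgx))
    {
        // match[0] is the whole match; match[1], ... would hold capture groups
        // (this pattern has none, so match.size() is 1).
        for (size_t a = 0; a < match.size(); a++)
            cout << match[a].str() << endl;
    }
    else
        cout << "No Match Case !" << endl;
    return 0;
}
```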