Introduction to Regular Expressions
1 Metacharacter Descriptions
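| Metacharacter | Meaning |
| * | Matches the preceding element zero or more times. Example: ab*.c matches a.c, ab.c, abb.c, abbb.c; [A-Za-z0-9 ]*.c matches Xx.c |
| + | Matches the preceding element one or more times. Example: ab+.c matches ab.c, abb.c, abbb.c |
| ? | Matches the preceding element zero or one time. Example: ab?.c matches a.c, ab.c |
| {n} | Matches the preceding element exactly n times. Example: ab{2}.c matches abb.c |
| {m,} | Matches the preceding element at least m times |
| {m,n} | Matches the preceding element at least m and at most n times |
| [] | Matches any single character listed inside the brackets. Example: [a-z0-9]x.c matches bx.c, 5x.c |
| [^] | Matches any single character not listed inside the brackets. Example: [^b-z]x.c matches ax.c, 8x.c |
| () | Groups a subexpression and captures the substring it matches |
Note: the unescaped . in these patterns matches any single character, so ab*.c also matches abxc. Write \. to match only a literal dot.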
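The quantifiers and bracket forms in the table can be checked with a short script. This is a minimal sketch that assumes Python's standard `re` module, which these notes do not name as their regex engine. The helper name `matches` and the sample file names are hypothetical, and `re.fullmatch` is used so that the whole name has to match the pattern.

```python
import re

def matches(pattern, names):
    """Return the names that the pattern matches in full (hypothetical helper)."""
    return [name for name in names if re.fullmatch(pattern, name)]

names = ["a.c", "ab.c", "abb.c", "abbb.c", "abxc"]

# '*': zero or more 'b'. The unescaped '.' matches any character,
# so "abxc" is accepted too.
print(matches(r"ab*.c", names))       # ['a.c', 'ab.c', 'abb.c', 'abbb.c', 'abxc']

# Escaping the dot restricts it to a literal '.'.
print(matches(r"ab*\.c", names))      # ['a.c', 'ab.c', 'abb.c', 'abbb.c']

# '+': one or more 'b'.
print(matches(r"ab+\.c", names))      # ['ab.c', 'abb.c', 'abbb.c']

# '?': zero or one 'b'.
print(matches(r"ab?\.c", names))      # ['a.c', 'ab.c']

# '{n}', '{m,}', '{m,n}': exact, minimum, and bounded repetition.
print(matches(r"ab{2}\.c", names))    # ['abb.c']
print(matches(r"ab{2,}\.c", names))   # ['abb.c', 'abbb.c']
print(matches(r"ab{1,2}\.c", names))  # ['ab.c', 'abb.c']

# '[]' and '[^]': one character from, or not from, the listed set.
print(matches(r"[a-z0-9]x\.c", ["bx.c", "5x.c", "Bx.c"]))  # ['bx.c', '5x.c']
print(matches(r"[^b-z]x\.c", ["ax.c", "8x.c", "bx.c"]))    # ['ax.c', '8x.c']
```

Using `re.search` instead of `re.fullmatch` would accept any name that merely contains a matching substring, which is closer to how grep-style tools behave.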
2 Examples
3 Wikipedia Excerpts
3.1 POSIX basic and extended
- ^
- Matches the starting position within the string. In line-based tools, it
matches the starting position of any line.
- .
- Matches any single character (many applications exclude newlines, and
exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
- [ ]
- A bracket expression. Matches a single character that is contained
within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
- [^ ]
- Matches a single character that is not contained within the brackets.
For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
- $
- Matches the ending position of the string or the position just before a
string-ending newline. In line-based tools, it matches the ending position of any line.
- ( )
- Defines a marked subexpression. The string matched within the
parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group. BRE[1] mode requires \( \).
- \n
- Matches what the nth marked subexpression matched, where n is a digit
from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups.
- *
- Matches the preceding element zero or more times. For example, ab*c
matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
- {m,n}
- Matches the preceding element at least m and not more than n times.
For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}.
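The entries above can be tried out in a few lines. This is a sketch under stated assumptions: it uses Python's standard `re` module rather than a POSIX tool, and the variable names and sample strings are made up for illustration. Python's syntax is closest to the extended form, so ( ) and {m,n} are written without backslashes. Unlike POSIX bracket expressions, Python does allow backslash escapes inside brackets, so that one rule does not carry over.

```python
import re

# ^ and $ match at every line boundary only when re.MULTILINE is set,
# which mirrors the behaviour of line-based tools.
text = "foo bar\nbar foo"
starts_plain = re.findall(r"^bar", text)                      # []
starts_lines = re.findall(r"^bar", text, flags=re.MULTILINE)  # ['bar']
ends_lines = re.findall(r"bar$", text, flags=re.MULTILINE)    # ['bar']

# . matches any single character except a newline (the default in re);
# inside brackets it is a literal dot.
dot_any = bool(re.fullmatch(r"a.c", "abc"))       # True
dot_newline = bool(re.fullmatch(r"a.c", "a\nc"))  # False
dot_literal = re.findall(r"[a.c]", "a.bc")        # ['a', '.', 'c']

# Bracket expressions: mixed literals and ranges, a trailing literal '-',
# a leading literal ']', and a negated set.
mixed = re.findall(r"[abcx-z]", "abcdwxyz")  # ['a', 'b', 'c', 'x', 'y', 'z']
dash = re.findall(r"[abc-]", "a-d")          # ['a', '-']
bracket = re.findall(r"[]abc]", "a]d")       # ['a', ']']
negated = re.findall(r"[^abc]", "abcd1")     # ['d', '1']

# ( ) marks a subexpression; \1 must repeat exactly what group 1 matched.
doubled = [s for s in ("abab", "abba") if re.fullmatch(r"(ab)\1", s)]
# doubled == ['abab']

# * applies to the preceding element, which may be a single character or a group.
star_char = [s for s in ("ac", "abc", "abbbc", "adc") if re.fullmatch(r"ab*c", s)]
# star_char == ['ac', 'abc', 'abbbc']
star_group = [s for s in ("", "ab", "ababab", "aba") if re.fullmatch(r"(ab)*", s)]
# star_group == ['', 'ab', 'ababab']

# {m,n} bounds the repetition count on both sides.
runs = [s for s in ("aa", "aaa", "aaaaa", "aaaaaa") if re.fullmatch(r"a{3,5}", s)]
# runs == ['aaa', 'aaaaa']
```

With a BRE tool such as plain grep, the grouping and interval patterns would instead be written as \(ab\)\1 and a\{3,5\}, as noted in the entries above.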
3.2 Glob
4 References
- Wikipedia: Regular expression
- https://en.wikipedia.org/wiki/Regular_expression
Footnotes:
[1] BRE: Basic Regular Expressions.