Regular Expression
1 General
symbol | meaning |
---|---|
. | any single character (except newline by default) |
\b | word boundary (start or end of a word) |
\B | not \b |
\d | digit |
\D | not \d |
\s | whitespace |
\S | not \s |
\w | letter, digit, or _ |
\W | not \w |
{8} | repeat exactly 8 times |
{5,8} | repeat 5 to 8 times |
{5,} | repeat 5 or more times |
* | 0 or more times, same as {0,} |
+ | 1 or more times, same as {1,} |
? | 0 or 1 time, same as {0,1} |
^ | start of the whole string (start of each line in multiline mode) |
$ | end of the whole string (end of each line in multiline mode) |
[] | any one of the listed characters |
[^] | any one character not listed |
(?=ABC) | positive look ahead |
(?!ABC) | negative look ahead |
(?<=ABC) | positive look behind |
(?<!ABC) | negative look behind |
() | capturing group; use \1, \2 to back-refer |
(?:) | non-capturing group |
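"Ahead" refers to the text AFTER the current position, and "behind" refers to the text BEFORE it. Lookarounds only test the surrounding text; they do not become part of the match.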
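As a quick check of the table, here is a minimal Python sketch that exercises the four lookaround forms and a back-reference. The sample text and the variable names are made up for illustration.

```python
import re

text = "price: $100, cost: 200 USD"

# (?<=\$)\d+ : digits preceded by a dollar sign (positive look behind)
after_dollar = re.findall(r"(?<=\$)\d+", text)

# \d+(?= USD) : digits followed by " USD" (positive look ahead)
before_usd = re.findall(r"\d+(?= USD)", text)

# \b\d+\b(?! USD) : whole numbers NOT followed by " USD" (negative look ahead)
not_usd = re.findall(r"\b\d+\b(?! USD)", text)

# (?<!\$)\b\d+ : whole numbers NOT preceded by a dollar sign (negative look behind)
not_dollar = re.findall(r"(?<!\$)\b\d+", text)

# (\w)\1 : a word character immediately repeated (back-reference to group 1)
doubled = re.findall(r"(\w)\1", "aabcdd")

print(after_dollar, before_usd, not_usd, not_dollar, doubled)
```

In every case the `$` or ` USD` context is checked but left out of the returned strings, which is the practical reason to reach for a lookaround instead of a plain group.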
2 Language Specific
2.1 C++
2.1.1 Usage
    #include <iostream>
    #include <string>
    #include <regex>
2.1.2 construction
std::regex color_regex("([a-f0-9]{2})");
2.1.3 Search
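    std::string line = "xxx";
    // Boolean test only: is there a match anywhere in line?
    bool found = std::regex_search(line, color_regex);
    // Same search, but keep the match details
    std::smatch color_match;
    if (std::regex_search(line, color_match, color_regex)) {
        // color_match[0] is the entire match;
        // color_match[1], [2], ... are the () groups
        for (std::size_t i = 0; i < color_match.size(); ++i) {
            std::string piece = color_match[i].str(); // std::ssub_match -> std::string
        }
    }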
2.1.4 Match
    // regex_match succeeds only if the ENTIRE string matches the regex.
    // fname (std::string), the std::smatch objects, and the std::regex
    // objects below are assumed to be declared earlier.
    std::regex_match(fname, base_match, base_regex);
    // The first sub_match is the whole string; the next
    // sub_match is the first parenthesized expression.
    if (std::regex_match(fname, pieces_match, pieces_regex)) {
        std::cout << fname << '\n';
        for (size_t i = 0; i < pieces_match.size(); ++i) {
            std::ssub_match sub_match = pieces_match[i];
            std::string piece = sub_match.str(); // piece = sub_match also works (implicit conversion)
        }
    }
2.1.5 replace
    // regex_replace returns a new string; s itself is not modified
    std::string result = std::regex_replace(s, reg, "");
2.2 Java
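    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern p = Pattern.compile("a*b");
    Matcher m = p.matcher("aaaaab");
    boolean b = m.matches();  // true only if the whole input matches
    String whole = m.group(); // the entire match; m.group(n) returns capturing group n as a String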
2.3 Python
2.3.1 construction
    import re
    pattern = re.compile(r'\d+.*$')  # raw string, so \d is not treated as a string escape
2.3.2 match
    s = 'this is a test string'
    pattern.match(s)  # returns a match object, or None if the pattern does not match at the start of s
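A short sketch of how `match` behaves, assuming the `\d+.*$` pattern from the construction section; the sample strings are invented for illustration. `match` only tries at the start of the string and returns a match object or `None`, so wrap the result in `bool()` when a plain truth value is wanted.

```python
import re

pattern = re.compile(r'\d+.*$')

no_match = pattern.match('this is a test string')  # no leading digit
at_start = pattern.match('42 is the answer')       # digits at position 0
not_at_start = pattern.match('answer: 42')         # digits exist, but not at the start

# fullmatch requires the whole string to match the pattern
full = re.fullmatch(r'\d+', '2024')
partial = re.fullmatch(r'\d+', '2024-01')

print(bool(no_match), bool(at_start), bool(not_at_start), bool(full), bool(partial))
```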
2.3.3 search
    pattern.search(s)   # first match anywhere in s, or None
    pattern.findall(s)  # list of all non-overlapping matches
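The difference between `search`, `findall`, and `finditer` can be sketched as follows. The `line` string and the `pair` pattern are hypothetical examples, not part of the notes above.

```python
import re

line = "err=3 warn=12 info=105"
pair = re.compile(r'(\w+)=(\d+)')

# search: the first match anywhere in the string (or None)
first = pair.search(line)
first_key, first_value = first.group(1), first.group(2)

# findall: with capturing groups, a list of tuples (one item per group)
pairs = pair.findall(line)

# finditer: match objects, which also expose start/end positions
spans = [(m.group(1), m.start(), m.end()) for m in pair.finditer(line)]

print(first_key, first_value)
print(pairs)
print(spans)
```

`findall` is convenient when only the text is needed; `finditer` is the better fit when positions or named groups matter.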
2.3.4 shorthand
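    # "pattern" and "string" are placeholders
    m = re.match("pattern", "string")   # m is None if there is no match
    m.group()  # the matched string
    m = re.search("pattern", "string")
    m.group()
    re.search("pattern", "string", re.IGNORECASE)
    m = re.findall("pattern", "string")  # list of matched strings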
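A minimal sketch of the module-level shorthand with a flag, plus `re.sub`, which plays the same role in Python as the replace call in the C++ section. The sample strings are assumptions chosen for illustration.

```python
import re

text = "FF00aa 12ab34"

# Without a flag, [a-f] matches lowercase letters only
lower_only = re.findall(r'[a-f0-9]{2}', text)

# re.IGNORECASE lets [a-f] also match A-F
any_case = re.findall(r'[a-f0-9]{2}', text, re.IGNORECASE)

# re.sub returns a new string; the input string is left unchanged
collapsed = re.sub(r'\s+', ' ', "a   b \t c")

# Groups can be referenced in the replacement with \1, \2
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')

print(lower_only, any_case, collapsed, swapped)
```

The module-level functions compile the pattern on each call (with an internal cache), so `re.compile` is still worth using when the same pattern is applied many times.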