Regular Expression

1 General

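symbol     meaning
.          any character except a newline (by default)
\b         word boundary (start or end of a word)
\B         not a word boundary
\d         digit
\D         non-digit
\s         whitespace
\S         non-whitespace
\w         word character: letter, digit, or _
\W         non-word character
{8}        repeat exactly 8 times
{5,8}      repeat 5 to 8 times
{5,}       repeat 5 or more times
*          repeat 0 or more times
+          repeat 1 or more times
?          repeat 0 or 1 time, same as {0,1}
^          start of the string (or of each line in multiline mode)
$          end of the string (or of each line in multiline mode)
[abc]      any one of the listed characters
[^abc]     any one character not listed
(?=ABC)    positive lookahead
(?!ABC)    negative lookahead
(?<=ABC)   positive lookbehind
(?<!ABC)   negative lookbehind
(ABC)      capturing group; \1, \2, ... refer back to it
(?:ABC)    non-capturing group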
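
A few of the symbols above, tried out in Python's re module. This is only a sketch: the sample strings and variable names are made up for illustration, and other regex flavors may differ in details.

```python
import re

# \d{4} is exactly four digits; \b keeps the match on word boundaries,
# so the six-digit id is skipped.
years = re.findall(r"\b\d{4}\b", "built 1999, rebuilt 2024, id 123456")

# [^,]+ is one or more characters other than a comma.
fields = re.findall(r"[^,]+", "a,b c,d")

# (\w+) captures a word and \1 requires the same text again (a doubled word).
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

# (?:ab)+ groups for repetition without capturing, so (\d) is group 1.
grouped = re.fullmatch(r"(?:ab)+(\d)", "ababab7")

print(years)             # ['1999', '2024']
print(fields)            # ['a', 'b c', 'd']
print(doubled.group(0))  # is is
print(grouped.group(1))  # 7
```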

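A lookahead tests the text that comes AFTER the current position, and a lookbehind tests the text that comes BEFORE it. Neither one consumes characters, so the tested text is not part of the match.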
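
The four lookaround forms can be compared on one string. A minimal Python sketch; the sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "price: $42, cost: 17 EUR, $8"

# Positive lookbehind: digits preceded by "$" (the "$" is not in the match).
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers NOT preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

# Positive lookahead: digits followed by " EUR" (" EUR" is not in the match).
before_eur = re.findall(r"\d+(?= EUR)", text)

# Negative lookahead: whole numbers NOT followed by " EUR".
not_before_eur = re.findall(r"\b\d+\b(?! EUR)", text)

print(after_dollar)      # ['42', '8']
print(not_after_dollar)  # ['17']
print(before_eur)        # ['17']
print(not_before_eur)    # ['42', '8']
```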

2 Language Specific

2.1 C++

2.1.1 Usage

#include <iostream>
#include <string>
#include <regex>

2.1.2 Construction

  std::regex color_regex("([a-f0-9]{2})");

2.1.3 Search

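  std::string line = "xxx";
  // Without a match object: only tests whether some part of line matches.
  bool found = std::regex_search(line, color_regex);
  // With a match object: also retrieves what matched.
  std::smatch color_match;
  if (std::regex_search(line, color_match, color_regex)) {
    for (std::size_t i = 0; i < color_match.size(); ++i) {
      // color_match[0] is the entire match;
      // color_match[1], [2], ... are the () groups
      std::string piece = color_match[i].str();
    }
  }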

2.1.4 Match

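// General form: std::regex_match(string, match_results, regex)
// Example inputs (illustrative values):
std::string fname = "foo.txt";
std::regex pieces_regex("([a-z]+)\\.([a-z]+)");
std::smatch pieces_match;
// regex_match succeeds only if the ENTIRE string matches the pattern.
// pieces_match[0] is the whole string; pieces_match[1] is the
// first parenthesized expression, and so on.
if (std::regex_match(fname, pieces_match, pieces_regex)) {
  std::cout << fname << '\n';
  for (std::size_t i = 0; i < pieces_match.size(); ++i) {
    std::ssub_match sub_match = pieces_match[i];
    std::string piece = sub_match.str();
    // piece = sub_match also works, via implicit conversion
  }
}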

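2.1.5 Replace

// Returns a new string with every match of reg removed; s itself is unchanged.
std::string result = std::regex_replace(s, reg, "");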

2.2 Java

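import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches(); // true only if the entire input matches
m.group(0); // => String, the whole match
// m.group(n) needs at least n () groups in the pattern, else it throws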

2.3 Python

2.3.1 Construction

import re
pattern = re.compile(r'\d+.*$')  # raw string, so \d reaches the regex engine unchanged
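
Patterns in Python are normally written as raw strings (r'...'), because in an ordinary string literal some backslash sequences are interpreted by Python before the regex engine sees them. A small sketch of the difference, using \b as the example; the sample text is made up.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08),
# so the regex engine never sees a word-boundary token.
plain = re.search("\bcat\b", "a cat sat")

# In a raw string the backslash is kept, so \b means word boundary.
raw = re.search(r"\bcat\b", "a cat sat")

# Doubling the backslashes is equivalent to the raw form.
escaped = re.search("\\bcat\\b", "a cat sat")

print(plain)            # None
print(raw.group())      # cat
print(escaped.group())  # cat
```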

2.3.2 Match

s = 'this is a test string'
pattern.match(s) # returns a Match object, or None if s does not match at its start
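
Since match() gives back either a Match object or None, the result is usually tested before use. A minimal sketch, assuming the same \d+.*$ pattern; the input strings and the describe helper are hypothetical, for illustration only.

```python
import re

pattern = re.compile(r"\d+.*$")

no_match = pattern.match("this is a test string")  # None: no digits at the start
at_start = pattern.match("42 apples")              # Match object
not_start = pattern.match("buy 42 apples")         # None: match() only tries position 0
anywhere = pattern.search("buy 42 apples")         # Match object: search() scans forward


def describe(m):
    """Return the matched text, or a placeholder when there was no match."""
    return m.group() if m else "<no match>"


print(describe(no_match))   # <no match>
print(describe(at_start))   # 42 apples
print(describe(not_start))  # <no match>
print(describe(anywhere))   # 42 apples
```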

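2.3.3 Search

pattern.search(s)  # first match anywhere in s: a Match object, or None
pattern.findall(s) # all non-overlapping matches, as a list of strings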
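
What findall() returns depends on how many capturing groups the pattern has, which is easy to trip over. A short sketch with a made-up input string; finditer() is shown as the way to get Match objects (and positions) instead of plain strings.

```python
import re

log = "x=10, y=25, z=7"

# No groups: a list of whole matches.
plain = re.findall(r"\d+", log)

# One group: a list of that group's text only.
one_group = re.findall(r"\w=(\d+)", log)

# Several groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", log)

# finditer yields Match objects, so start offsets are available too.
spans = [(m.group(), m.start()) for m in re.finditer(r"\d+", log)]

print(plain)      # ['10', '25', '7']
print(one_group)  # ['10', '25', '7']
print(pairs)      # [('x', '10'), ('y', '25'), ('z', '7')]
print(spans)      # [('10', 2), ('25', 8), ('7', 14)]
```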

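2.3.4 Shorthand

m = re.match("pattern", "string")  # matches only at the start of the string
m.group() # the matched string (m is None when nothing matched)
m = re.search("pattern", "string") # matches anywhere in the string
m.group()
re.search("pattern", "string", re.IGNORECASE)
matches = re.findall("pattern", "string") # list of matched strings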
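
The module-level functions compile the pattern on the fly and otherwise behave like the compiled-pattern methods. A small sketch with concrete (made-up) inputs, including a check for None before calling group():

```python
import re

text = "Hello World"

# Case-sensitive by default, so this finds nothing.
strict = re.search("world", text)

# re.IGNORECASE makes the same pattern match "World".
relaxed = re.search("world", text, re.IGNORECASE)
found = relaxed.group() if relaxed else None

# match() anchors at the start; findall() collects every match.
first_word = re.match(r"\w+", text)
all_words = re.findall(r"\w+", text)

print(strict)              # None
print(found)               # World
print(first_word.group())  # Hello
print(all_words)           # ['Hello', 'World']
```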