Preface
A recent project required filtering strings by pattern, which called for regular expressions. These are my notes on regex, the C++ regular expression library. Corrections are welcome.
Introduction to Regex Library
The regex library was introduced in C++11.
C++ regular expressions provide the following main operations:
- Match: Compares the entire input against a regular expression.
- Search: Checks whether any substring matches a regular expression.
- Tokenize: Splits the input according to a regular expression to extract the desired substrings.
- Replace: Replaces one or more substrings that match the regular expression.
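The rest of these notes only cover matching, so here is a minimal sketch of the other three operations. The function names (contains_number, mask_numbers, split_fields) and the patterns are made up for illustration; only std::regex_search, std::regex_replace and std::sregex_token_iterator come from the library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Search: true if any substring of `text` is a run of digits.
bool contains_number(const std::string &text) {
    static const std::regex digits("[0-9]+");
    return std::regex_search(text, digits);
}

// Replace: every run of digits in `text` becomes a single '#'.
std::string mask_numbers(const std::string &text) {
    static const std::regex digits("[0-9]+");
    return std::regex_replace(text, digits, "#");
}

// Tokenize: split `text` on '_'. Submatch index -1 selects the pieces
// between the separator matches rather than the separators themselves.
std::vector<std::string> split_fields(const std::string &text) {
    static const std::regex separator("_");
    std::sregex_token_iterator first(text.begin(), text.end(), separator, -1);
    std::sregex_token_iterator last;  // default-constructed = end of sequence
    return std::vector<std::string>(first, last);
}
```

With the input used throughout these notes, mask_numbers("t_123_345_456") should give "t_#_#_#" and split_fields("t_123_345_456") should give the four pieces t, 123, 345 and 456.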
Regex Library Use
Match Judgment
Let's start with a simple example:
#include <iostream>
#include <regex>

int main(int argc, char *argv[]) {
    try {
        std::regex pattern("t_[^_]*_[^_]*_.*");
        // true only if the whole input matches the pattern
        bool match = regex_match("t_123_345_456", pattern);
        if (match) {
            std::cout << "match" << std::endl;
        } else {
            std::cout << "not match" << std::endl;
        }
    } catch (const std::regex_error &e) {
        // thrown when the pattern itself is invalid
        std::cout << "regex_error: what(): " << e.what() << std::endl;
    }
    return 0;
}
Code execution results:
match
This shows the basic usage. First, declare a regular expression:
std::regex pattern("t_[^_]*_[^_]*_.*");
Then call the matching method:
regex_match("t_123_345_456", pattern);
regex supports several regular expression grammars: ECMAScript, basic (POSIX BRE), extended (POSIX ERE), awk, grep, and egrep. ECMAScript is the default. You can also select a grammar explicitly, for example:
std::regex pattern("t_[^_]*_[^_]*_.*", std::regex_constants::grep);
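To see that the grammar choice really changes what a pattern means, here is a small sketch. The helper name matches_with is hypothetical; it only wraps the std::regex constructor that takes a flags argument, as in the line above.

```cpp
#include <regex>
#include <string>

// Compile `pattern` with the given grammar/option flags and test whether
// it matches the whole of `text`.
bool matches_with(const std::string &text, const std::string &pattern,
                  std::regex_constants::syntax_option_type flags) {
    std::regex re(pattern, flags);
    return std::regex_match(text, re);
}
```

For example, "a+" should match "aaa" under std::regex_constants::ECMAScript and std::regex_constants::extended, where + is a quantifier, but not under std::regex_constants::basic, where POSIX treats + as an ordinary character. Option flags can be combined with a grammar using |, for instance std::regex_constants::ECMAScript | std::regex_constants::icase for case-insensitive matching.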
Get matches
To retrieve the match results, use std::smatch, as in the following code:
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char *argv[]) {
    try {
        std::regex pattern("t_[^_]*_[^_]*_.*");
        // Keep the input in a named string: std::smatch stores iterators
        // into it, and passing a temporary string is rejected since C++14.
        std::string text("t_123_345_456");
        std::smatch m;
        bool match = regex_match(text, m, pattern);
        std::cout << "m.empty():" << m.empty() << std::endl;
        std::cout << "m.size():" << m.size() << std::endl;
        if (match) {
            std::cout << "match" << std::endl;
            std::cout << "m.str():" << m.str() << std::endl;
            std::cout << "m.position():" << m.position() << std::endl;
            std::cout << "m.length():" << m.length() << std::endl;
        } else {
            std::cout << "not match" << std::endl;
        }
    } catch (const std::regex_error &e) {
        std::cout << "regex_error: what(): " << e.what() << std::endl;
    }
    return 0;
}
Execution results:
m.empty():0
m.size():1
match
m.str():t_123_345_456
m.position():0
m.length():13
regex_match stores the result of the match in a std::match_results object. The header file defines typedefs for the common cases:
typedef match_results<const char*> cmatch;
typedef match_results<string::const_iterator> smatch;
As the program output shows, the match object holds information about the match of the entire string: the matched text, its starting position, its length, and so on.
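Which typedef to use depends on the type of the input: std::cmatch goes with a C string, std::smatch with a std::string. A minimal sketch, where describe_cstr and describe_string are hypothetical helper names and the "pos=/len=" summary format is my own choice:

```cpp
#include <regex>
#include <string>

// Summarize a whole-string match of a C string as "<text> pos=<p> len=<n>".
// Returns an empty string when there is no match.
std::string describe_cstr(const char *text, const std::regex &pattern) {
    std::cmatch m;  // match_results<const char*>
    if (!std::regex_match(text, m, pattern)) {
        return "";
    }
    return m.str() + " pos=" + std::to_string(m.position()) +
           " len=" + std::to_string(m.length());
}

// Same summary for a std::string input. `text` must outlive `m`, because
// the smatch only stores iterators into it.
std::string describe_string(const std::string &text, const std::regex &pattern) {
    std::smatch m;  // match_results<std::string::const_iterator>
    if (!std::regex_match(text, m, pattern)) {
        return "";
    }
    return m.str() + " pos=" + std::to_string(m.position()) +
           " len=" + std::to_string(m.length());
}
```

Both helpers should return "t_123_345_456 pos=0 len=13" for the input and pattern used in the example above.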
Get matching substrings
What if I only want the substrings that matched particular parts of the pattern? Let's start with the following example:
#include <cstddef>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char *argv[]) {
    try {
        // Each pair of parentheses defines one capture group.
        std::regex pattern("t_([^_]*)_([^_]*)_(.*)");
        // Named string: std::smatch keeps iterators into it.
        std::string text("t_123_345_456");
        std::smatch m;
        bool match = regex_match(text, m, pattern);
        std::cout << "m.empty():" << m.empty() << std::endl;
        std::cout << "m.size():" << m.size() << std::endl;
        if (match) {
            std::cout << "match" << std::endl;
            std::cout << "m.str():" << m.str() << std::endl;
            std::cout << "m.position():" << m.position() << std::endl;
            std::cout << "m.length():" << m.length() << std::endl;
            // m[0] is the whole match; m[1]..m[3] are the groups.
            for (std::size_t i = 0; i < m.size(); ++i) {
                std::cout << " m[" << i << "].str():" << m[i].str() << std::endl;
                std::cout << " m[" << i << "].position():" << m.position(i) << std::endl;
                std::cout << " m[" << i << "].length():" << m.length(i) << std::endl;
            }
        } else {
            std::cout << "not match" << std::endl;
        }
    } catch (const std::regex_error &e) {
        std::cout << "regex_error: what(): " << e.what() << " code:" << e.code() << std::endl;
    }
    return 0;
}
Program execution output:
m.empty():0
m.size():4
match
m.str():t_123_345_456
m.position():0
m.length():13
 m[0].str():t_123_345_456
 m[0].position():0
 m[0].length():13
 m[1].str():123
 m[1].position():2
 m[1].length():3
 m[2].str():345
 m[2].position():6
 m[2].length():3
 m[3].str():456
 m[3].position():10
 m[3].length():3
This uses the concept of grouping: wrap in () each part of the pattern whose matched text you want to retrieve.
Regular expressions in the code:
"t_([^_]*)_([^_]*)_(.*)"
In my example, I am interested in three fields, so the regular expression contains three groups, one per substring.
The output shows that the first element of the smatch holds the match information for the entire string, and the elements from the second onward hold the substrings of the corresponding groups.
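In practice you usually want only the groups, not the whole match. A minimal sketch that skips element 0 and collects the rest; capture_groups is a hypothetical helper name, not part of the library:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Return the text of every capture group when `pattern` matches the whole
// of `text`; return an empty vector when it does not match.
std::vector<std::string> capture_groups(const std::string &text,
                                        const std::regex &pattern) {
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_match(text, m, pattern)) {
        // Start at 1: m[0] is the entire match, not a group.
        for (std::size_t i = 1; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

Called with "t_123_345_456" and the grouped pattern above, this should return the three strings 123, 345 and 456.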
A big pitfall
When I compiled the code with gcc-4.8.5 there were no errors, but whenever the regular expression contained [], an exception was thrown at run time:
code: 4 what: regex_error
At first I suspected that my regular expression was wrong. Later I found that, although GCC versions below 4.9.0 ship the regex header file, the library behind it is not actually implemented. The declarations are all there, so compilation succeeds, but the implementation did not keep up, so an exception is thrown at run time.
The examples above only worked correctly after I spent a long time upgrading GCC to 4.9.0. That cost me a lot of time!
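One way to catch this early is a small self-check at program startup, so a broken regex library fails loudly instead of somewhere deep in the code. This is only a sketch of the idea: regex_brackets_work is a hypothetical name, and the probe pattern is simply a shortened version of the one from the examples.

```cpp
#include <regex>

// Returns true if the standard library can compile and run a pattern that
// contains a bracket expression; returns false if doing so throws
// std::regex_error, as described above for GCC below 4.9.0.
bool regex_brackets_work() {
    try {
        std::regex probe("t_[^_]*");
        return std::regex_match("t_123", probe);
    } catch (const std::regex_error &) {
        return false;
    }
}
```

A program could call regex_brackets_work() once in main and exit with a clear message when it returns false.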