A First Look at the C++ Regular Expression Library regex, and the Pitfalls I Hit

Keywords: Programming ECMAScript

Preface

A recent development task required filtering strings by pattern matching, which called for regular expressions. This article collects my notes on regex, the C++ standard regular expression library. Corrections are welcome.

Introduction to Regex Library

The <regex> library was introduced in C++11.
C++ regular expressions provide the following main functions:

  1. Match: Checks whether the entire input matches a regular expression.
  2. Search: Finds whether some substring of the input matches a regular expression.
  3. Tokenize: Splits the input at matches of a regular expression to obtain the target substrings.
  4. Replace: Replaces one or more substrings that match the regular expression.

Regex Library Use

Checking Whether a String Matches

Let's start with a simple example:

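#include <iostream>
#include <regex>

int main()
{
    try {
        // "t_", two runs of non-underscore characters each followed by "_", then anything.
        std::regex pattern("t_[^_]*_[^_]*_.*");
        // regex_match succeeds only if the WHOLE input matches the pattern.
        bool match = std::regex_match("t_123_345_456", pattern);
        if (match) {
            std::cout << "match" << std::endl;
        } else {
            std::cout << "not match" << std::endl;
        }
    }
    catch (const std::regex_error &e) {
        // Thrown for an invalid pattern (or an error while matching).
        std::cout << "regex_error: what(): " << e.what() << std::endl;
    }
    return 0;
}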

Code execution results:
match

That is the basic usage. First, declare a regular expression:

std::regex pattern("t_[^_]*_[^_]*_.*");

Then call the matching method:

 regex_match("t_123_345_456", pattern);
 
If the regular expression is invalid, or an error occurs during matching, a std::regex_error exception is thrown; std::regex_error derives from std::runtime_error.
regex supports several regular expression grammars: ECMAScript, basic (POSIX BRE), extended (POSIX ERE), awk, grep and egrep. ECMAScript is the default. You can also specify the grammar yourself, for example:
std::regex pattern("t_[^_]*_[^_]*_.*", std::regex_constants::grep);

 

Get matches

To retrieve details about the match, pass a std::smatch object to regex_match, as in the following code:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    try {
        std::regex pattern("t_[^_]*_[^_]*_.*");
        // smatch stores iterators into the input, so the input must be a named
        // string that outlives m. Passing a temporary std::string together with
        // an smatch is rejected since C++14 (that overload is deleted).
        const std::string input = "t_123_345_456";
        std::smatch m;
        bool match = std::regex_match(input, m, pattern);
        std::cout << "m.empty():" << m.empty() << std::endl;
        std::cout << "m.size():" << m.size() << std::endl;
        if (match) {
            std::cout << "match" << std::endl;
            std::cout << "m.str():" << m.str() << std::endl;            // matched text
            std::cout << "m.position():" << m.position() << std::endl;  // offset in input
            std::cout << "m.length():" << m.length() << std::endl;      // length of match
        } else {
            std::cout << "not match" << std::endl;
        }
    }
    catch (const std::regex_error &e) {
        std::cout << "regex_error: what(): " << e.what() << std::endl;
    }
    return 0;
}

Execution results:

m.empty():0
m.size():1
match
m.str():t_123_345_456
m.position():0
m.length():13

regex_match stores the result of the match in the match_results object passed to it (m above). The header file defines these typedefs:

typedef match_results<const char*> cmatch; 
typedef match_results<string::const_iterator> smatch;
These are two specializations of match_results: cmatch for C strings (const char*) and smatch for std::string. Because the second example uses smatch, the input passed to regex_match must be a std::string rather than a string literal.
The output shows that m describes the match of the entire string: the matched text, its starting position, its length, and so on.

Get matching substrings

What if you only want particular parts of the match? Consider the following example:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    try {
        // Each pair of parentheses defines a capture group.
        std::regex pattern("t_([^_]*)_([^_]*)_(.*)");
        const std::string input = "t_123_345_456";  // must outlive m
        std::smatch m;
        bool match = std::regex_match(input, m, pattern);
        std::cout << "m.empty():" << m.empty() << std::endl;
        std::cout << "m.size():" << m.size() << std::endl;
        if (match) {
            std::cout << "match" << std::endl;
            std::cout << "m.str():" << m.str() << std::endl;
            std::cout << "m.position():" << m.position() << std::endl;
            std::cout << "m.length():" << m.length() << std::endl;

            // m[0] is the whole match; m[1]..m[size()-1] are the groups.
            for (std::size_t i = 0; i < m.size(); ++i) {
                std::cout << "    m[" << i << "].str():" << m[i].str() << std::endl;
                std::cout << "    m[" << i << "].position():" << m.position(i) << std::endl;
                std::cout << "    m[" << i << "].length():" << m.length(i) << std::endl;
            }

        } else {
            std::cout << "not match" << std::endl;
        }
    }
    catch (const std::regex_error &e) {
        std::cout << "regex_error: what(): " << e.what() << " code:" << e.code() << std::endl;
    }
    return 0;
}

Program execution output:

m.empty():0
m.size():4
match
m.str():t_123_345_456
m.position():0
m.length():13
    m[0].str():t_123_345_456
    m[0].position():0
    m[0].length():13
    m[1].str():123
    m[1].position():2
    m[1].length():3
    m[2].str():345
    m[2].position():6
    m[2].length():3
    m[3].str():456
    m[3].position():10
    m[3].length():3

This uses grouping: wrap each part of the pattern you want to capture in parentheses ().
The regular expression in the code is:

"t_([^_]*)_([^_]*)_(.*)"

In this example I am interested in three fields, so the regular expression contains three groups and three substrings are captured.
The output shows that the first element of the smatch, m[0], describes the match of the entire string, and the elements from m[1] onward hold the substrings captured by the corresponding groups.

A Big Pitfall

When I compiled the code with gcc-4.8.5 there were no errors, but whenever the regular expression contained [], an exception was thrown at run time:

code: 4
what: regex_error

At first I suspected my regular expression was wrong. Later I found that although GCC versions before 4.9.0 ship a <regex> header, the library behind it was essentially unimplemented: the header declares everything, so the code compiles without complaint, but the implementation had not caught up, and exceptions are thrown as soon as the program runs.
The examples above only worked after I spent a lot of time upgrading to GCC 4.9.0. It took a long time to track down!

Posted by geo__ on Fri, 21 Feb 2020 08:14:26 -0800