ES6 Learning, Chapter 5: Regular Expression Extensions

Keywords: Javascript ECMAScript

Preface

This chapter introduces the extensions to regular expressions. For the less commonly used features, a general understanding is enough.
Link to the original text of this chapter: Regular extension

RegExp constructor

Starting with ES6, if the first argument to the RegExp constructor is a regular expression object and a second argument (a flags string) is also given, no TypeError is thrown (ES5 threw one in this case). Instead, a new regular expression is created from these arguments, and the flags of the original regular expression are ignored.

const flag = new RegExp(/[0-9]/ig, 'i').flags; // The original flags ig are replaced by i
console.log(flag); // i
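A small sketch of the constructor rules described above (the variable names are only for illustration): passing a flags string replaces the flags, the original regular expression is left unchanged, and omitting the flags copies them into a new object.

```javascript
const original = /[0-9]/ig;

// With a flags argument: same pattern, new flags.
const replaced = new RegExp(original, 'y');
console.log(replaced.source, replaced.flags); // [0-9] y

// The original regular expression is not modified.
console.log(original.flags); // gi

// Without a flags argument, the original flags are copied into a new object.
const copy = new RegExp(original);
console.log(copy.flags, copy === original); // gi false
```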

String methods and regular expressions

Strings have four methods that can take a regular expression: match, replace, search, and split. In ES6, all four delegate internally to methods on RegExp.prototype, so all regular-expression-related logic is now defined on the RegExp object.

  • String.prototype.match calls RegExp.prototype[Symbol.match]
  • String.prototype.replace calls RegExp.prototype[Symbol.replace]
  • String.prototype.search calls RegExp.prototype[Symbol.search]
  • String.prototype.split calls RegExp.prototype[Symbol.split]
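Because the string methods delegate through these symbols, any object that implements one of them can be passed in place of a regular expression. The sketch below uses a hypothetical commaSplitter object. It also checks that calling match on a real regular expression gives the same result as calling its [Symbol.match] method directly.

```javascript
// A hypothetical non-RegExp object: String.prototype.split sees the
// Symbol.split method and delegates to it.
const commaSplitter = {
  [Symbol.split](str) {
    return str.split(',').map((part) => part.trim());
  },
};
console.log('a, b,  c'.split(commaSplitter)); // [ 'a', 'b', 'c' ]

// For a real RegExp, 'str.match(re)' ends up calling re[Symbol.match](str).
const re = /\d+/g;
console.log('a1b22c333'.match(re));         // [ '1', '22', '333' ]
console.log(re[Symbol.match]('a1b22c333')); // [ '1', '22', '333' ]
```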

flags property

RegExp.prototype.flags is a property added in ES6. It returns the flags (modifiers) of the regular expression as a string.

const SAMPLEREG = /abc/ig;
console.log(SAMPLEREG.flags); // gi
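One detail worth knowing: flags always lists the flags in a fixed order, regardless of how they were written in the literal, and each flag also has its own boolean property. A quick check:

```javascript
// Flags are reported in a fixed order, not the order they were written in.
const re = /abc/yumig;
console.log(re.flags);  // gimuy
console.log(re.source); // abc

// Each flag is also exposed as its own boolean property.
console.log(re.global, re.ignoreCase, re.multiline, re.unicode, re.sticky);
// true true true true true
```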

u modifier

ES6 adds the u modifier, which turns on "Unicode mode" for matching so that Unicode characters with code points greater than \uFFFF are handled correctly.

Note

Once the u modifier is added, the following parts of a regular expression behave differently.

  1. Dot character

The dot (.) cannot match a Unicode character with a code point greater than 0xFFFF unless the u modifier is added.

  2. Unicode character representation

ES6 adds the curly-brace form \u{...} for Unicode characters. In a regular expression, the u modifier is required for the braces to be recognized; without it, they are interpreted as a quantifier.

  3. Quantifiers

With the u modifier, all quantifiers correctly treat a Unicode character with a code point greater than 0xFFFF as a single character.

  4. Predefined character classes

The u modifier also determines whether predefined classes such as \S correctly recognize Unicode characters with code points greater than 0xFFFF.

  5. i modifier

Some Unicode characters have different code points but nearly identical glyphs. For example, \u004B and \u212A (KELVIN SIGN) both look like an uppercase K. Without the u modifier, the i modifier does not treat \u212A as a case variant of k.

  6. Escapes

Without the u modifier, an escape that has no defined meaning in regular expressions (such as \, for a comma) is ignored, so \, simply matches a comma. In u mode, the same escape is a syntax error.
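The six behaviors above can be checked side by side. In this sketch, s holds U+20BB7 (𠮷), a character outside the Basic Multilingual Plane, and each line compares a pattern without and with the u modifier.

```javascript
// U+20BB7 is stored in UTF-16 as a surrogate pair (two code units).
const s = '\u{20BB7}';

// 1. Dot character: without u, '.' matches only one UTF-16 code unit.
console.log(/^.$/.test(s), /^.$/u.test(s)); // false true

// 2. \u{...} escapes: without u, {61} is read as a quantifier on the letter 'u'.
console.log(/\u{61}/.test('a'), /\u{61}/u.test('a')); // false true

// 3. Quantifiers: with u, {2} repeats the whole character, not its last code unit.
const twice = '^' + s + '{2}$';
console.log(new RegExp(twice).test(s + s), new RegExp(twice, 'u').test(s + s)); // false true

// 4. Predefined classes: \S sees one character only in u mode.
console.log(/^\S$/.test(s), /^\S$/u.test(s)); // false true

// 5. i modifier: KELVIN SIGN (U+212A) matches [a-z] case-insensitively only with u.
console.log(/[a-z]/i.test('\u212A'), /[a-z]/iu.test('\u212A')); // false true

// 6. Escapes: \, is tolerated without u, but is a SyntaxError with u.
console.log(/\,/.test(',')); // true
try {
  new RegExp('\\,', 'u');
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}
```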

unicode property

The RegExp.prototype.unicode property indicates whether the regular expression has the u flag. unicode is a read-only property of each individual regular expression instance.

const SAMPLEREG = /abc/u;

console.log(SAMPLEREG.flags); // u
console.log(SAMPLEREG.unicode); // true

Unicode property classes

**Unicode property escapes**
ES2018 introduces a new class syntax, \p{...} and \P{...}, which lets a regular expression match every character that has a given Unicode property. This addresses the lack of a robust, effective way in JavaScript to match characters by script (writing system).

\p{UnicodePropertyName=UnicodePropertyValue}
// For some properties, you can write only the property name or only the property value.
\p{UnicodePropertyValue}
\p{UnicodePropertyName}

// \P is the negation of \p
\P{UnicodePropertyValue}
\P{UnicodePropertyName}

Note:
These two escapes only work in Unicode mode, so you must add the u modifier when using them.
\P{...} is the negation of \p{...}: it matches the characters that do not have the property.

const SAMPLEREG = /\p{Script=Greek}/u;
SAMPLEREG.test('π'); // true
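A few more sketches of the syntax variants listed above (the sample strings are arbitrary). A General_Category value or a binary property can be written alone, while a script such as Han needs the Script= form.

```javascript
// A General_Category value such as Letter can be written on its own.
console.log('abc123λ中'.match(/\p{Letter}/gu).join('')); // abcλ中

// Script needs the name=value form; Han covers Chinese characters.
console.log(/^\p{Script=Han}+$/u.test('刘备')); // true

// Binary properties such as White_Space are written by name alone.
console.log(/\p{White_Space}/u.test('\u3000')); // true

// \P negates: keep everything that is not a decimal digit (Nd).
console.log('a1b2'.match(/\P{Nd}/gu).join('')); // ab
```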

y modifier

What the y modifier does

ES6 adds the y modifier for "sticky" matching: the match must start exactly at the current position in the target string (the regular expression's lastIndex).

The y modifier is similar to the g modifier: both perform a global match, and each new match starts from the position right after the previous successful match.
The difference is that g succeeds as long as there is a match anywhere in the remaining text, while y requires the match to begin at the very first remaining position.

// The difference between y modifier and g modifier
const SAMPLE = 'abcdabcd';
const SAMPLEREG1 = /abcd/g;
const SAMPLEREG2 = /abcda/y;

console.log(SAMPLEREG1.test(SAMPLE)); // true
console.log(SAMPLEREG2.test(SAMPLE)); // true
console.log(SAMPLEREG1.test(SAMPLE)); // true
console.log(SAMPLEREG2.test(SAMPLE)); // false
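The example above uses two different patterns, which somewhat hides the contrast. A clearer sketch runs the same pattern, /a+/, with each modifier over a string in which the second run of a's does not start right after the first:

```javascript
const text = 'aaa_aa_a';
const g = /a+/g;
const y = /a+/y;

// First match: both succeed at index 0 and set lastIndex to 3.
console.log(g.exec(text)[0], y.exec(text)[0]); // aaa aaa

// Second match: the remaining text starts with '_' at index 3.
console.log(g.exec(text)[0]); // aa   (g skips ahead to index 4)
console.log(y.exec(text));    // null (y must match exactly at index 3)

// Sticky matching starts wherever lastIndex points.
y.lastIndex = 4;
console.log(y.exec(text)[0], y.lastIndex); // aa 6
```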

Note

In effect, the y modifier implies the start anchor ^: each match must begin right at the current position.

const SAMPLEREGGY = /ab/gy;
const SAMPLEREGY = /ab/y;

const sample1 = 'ababcabcd'.replace(SAMPLEREGGY, '-');
const sample2 = 'ababcabcd'.replace(SAMPLEREGY, '-');

// With g + y, matching stops at index 4 ('c'), so the last ab (index 5) is not replaced.
console.log(sample1); // --cabcd
// With y alone, only the first match is replaced; combine it with g to replace every consecutive match.
console.log(sample2); // -abcabcd
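A typical use of the y modifier is a tokenizer, where every token must begin exactly where the previous one ended. This is a minimal sketch under that assumption; the function name tokenize and its token set are hypothetical.

```javascript
// Hypothetical tokenizer: integers and the characters + - * / ( ).
function tokenize(input) {
  const TOKEN = /\s*(\d+|[-+*\/()])/y;
  const tokens = [];
  let match;
  // Each exec must start exactly where the previous token ended.
  while ((match = TOKEN.exec(input)) !== null) {
    tokens.push(match[1]);
  }
  return tokens;
}

console.log(tokenize('12 + 3 * (4 - 1)'));
// [ '12', '+', '3', '*', '(', '4', '-', '1', ')' ]

// The sticky regex stops at the first character it cannot tokenize ('x'),
// whereas the same pattern with g silently skips it.
console.log(tokenize('1 + x + 2')); // [ '1', '+' ]
console.log('1 + x + 2'.match(/\s*(\d+|[-+*\/()])/g).map((t) => t.trim()));
// [ '1', '+', '+', '2' ]
```

A real tokenizer would also check whether the input was fully consumed and report an error at the position where matching stopped.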

sticky property

RegExp.prototype.sticky indicates whether the y modifier is set. sticky is a read-only property of a regular expression object.

const SAMPLEREG = /a/gy;
console.log(SAMPLEREG.sticky); // true

s modifier

ES2018 introduces the s modifier, which makes the dot (.) match any single character, including line terminators. Without it, the dot matches any character except a line terminator.

Line terminator

A line terminator is a character that marks the end of a line. The following four characters are line terminators:

  • U+000A line feed (\n)
  • U+000D carriage return (\r)
  • U+2028 line separator
  • U+2029 paragraph separator

const SAMPLEREG = /ab.cd/s;
console.log(SAMPLEREG.test('ab\ncd') ); // true
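To see which characters the s modifier affects, the sketch below checks each of the four line terminators against /^.$/ with and without s. The last line shows a common pre-ES2018 workaround, a character class such as [\s\S] that matches everything.

```javascript
const terminators = ['\n', '\r', '\u2028', '\u2029'];

for (const t of terminators) {
  const code = 'U+' + t.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  // Without s, '.' rejects every line terminator; with s, it accepts them.
  console.log(code, /^.$/.test(t), /^.$/s.test(t));
}
// U+000A false true
// U+000D false true
// U+2028 false true
// U+2029 false true

// Workaround without the s modifier: a class that matches any character.
console.log(/ab[\s\S]cd/.test('ab\ncd')); // true
```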

dotAll

This behavior is called **dotAll** mode: the dot represents any character. Regular expressions also gained a **dotAll** property, which returns a boolean indicating whether the s modifier is set. dotAll is a read-only property of each individual regular expression instance.

const SAMPLEREG = /ab.cd/s;
const sample = SAMPLEREG.test('ab\ncd');
console.log(sample); // true
console.log(SAMPLEREG.flags); // s
console.log(SAMPLEREG.dotAll); // true

Lookbehind assertions

ES2018 introduces lookbehind assertions, which the V8 engine supports in Chrome 62 and later.

  • Lookahead assertion:
    x matches only if it is immediately followed by y, written as /x(?=y)/.
    For example, to match only a number that is followed by a percent sign, write /\d+(?=%)/.
  • Negative lookahead assertion:
    x matches only if it is not followed by y, written as /x(?!y)/.
    For example, to match only a number that is not followed by a percent sign, write /\d+(?!%)/.
  • Lookbehind assertion, the mirror image of lookahead:
    x matches only if it is immediately preceded by y, written as /(?<=y)x/.
    For example, to match only a number that follows a dollar sign, write /(?<=\$)\d+/.
  • Negative lookbehind assertion, the mirror image of negative lookahead:
    x matches only if it is not preceded by y, written as /(?<!y)x/.
    For example, to match only a number that does not follow a dollar sign, write /(?<!\$)\d+/.

To match /(?<=y)x/, a lookbehind assertion effectively matches x first and then goes back to the left to match y. In other words, the lookbehind part is matched from right to left.

// Lookahead assertion
const sample1 = /\d+(?=%)/.exec('100% of US presidents have been male');
// Negative lookahead assertion
const sample2 = /\d+(?!%)/.exec("that's all 44 of them");
console.log(sample1[0]);  // 100
console.log(sample2[0]);  // 44

// Lookbehind assertion
const sample3 = /(?<=\$)\d+/.exec('Benjamin Franklin is on the $100 bill');
// Negative lookbehind assertion
const sample4 = /(?<!\$)\d+/.exec("it's worth about €90");
console.log(sample3[0]);  // 100
console.log(sample4[0]);  // 90
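The right-to-left order mentioned above becomes visible when a lookbehind contains capturing groups. In this sketch (the input '1053' is arbitrary), the same two greedy groups split the digits differently inside and outside a lookbehind:

```javascript
// Ordinary left-to-right matching: the first greedy group takes as much as it can.
console.log(/^(\d+)(\d+)$/.exec('1053').slice(1)); // [ '105', '3' ]

// Inside a lookbehind the groups are matched right to left, so the second
// group is tried first and is the one that takes as much as it can.
console.log(/(?<=(\d+)(\d+))$/.exec('1053').slice(1)); // [ '1', '053' ]
```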

Group matching

Parentheses in a regular expression define a group: the text matched by the pattern inside the parentheses is captured as the content of that group.

ES2018 introduces named capture groups, which let you give each group a name, making the code easier to read and the groups easier to reference.
To name a group, add "question mark + angle brackets + group name", such as (?<year>...), at the start of the pattern inside the parentheses. The group's content can then be read by name from the groups property of the result returned by exec. The numbered indexes (result[1], result[2], ...) still work as well.
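A minimal sketch of the syntax, using a hypothetical date pattern, shows that the named and the numbered access paths refer to the same captures:

```javascript
const DATE = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = DATE.exec('Released on 1999-12-31.');

// Named access through the groups property.
console.log(match.groups.year, match.groups.month, match.groups.day); // 1999 12 31

// The numbered indexes still work: 1 is year, 2 is month, 3 is day.
console.log(match[1], match[2], match[3]); // 1999 12 31
```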

const sampleUsers = `
Surname Liu, name Bei, courtesy name Xuande
Surname Guan, name Yu, courtesy name Yunchang
Surname Zhang, name Fei, courtesy name Yide`;
const SAMPLEREG = /Surname (?<surname>\w+), name (?<name>\w+), courtesy name (?<courtesyName>\w+)/g;
let result;

while ((result = SAMPLEREG.exec(sampleUsers)) !== null) {
  // Destructure the named groups straight into variables
  const { surname, name, courtesyName } = result.groups;
  console.log(`${surname} ${name}, ${surname} ${courtesyName}`);
}

/*
* Liu Bei, Liu Xuande
* Guan Yu, Guan Yunchang
* Zhang Fei, Zhang Yide
*/

In the code above, (?<name>...) assigns a name to a group, and the text matched by each named group can be found on the groups property of the match result. You can also use destructuring assignment to assign values from the match result directly to variables.

Note: to reference a named group from inside the regular expression itself, use the \k<groupName> syntax.
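A short sketch of \k<groupName> (the group name word is arbitrary): the backreference must match exactly the text that the named group captured, and the numbered backreference \1 still refers to the same group.

```javascript
// \k<word> must match the same text that the group named word captured.
const REPEATED = /^(?<word>[a-z]+)!\k<word>$/;
console.log(REPEATED.test('abc!abc')); // true
console.log(REPEATED.test('abc!ab'));  // false

// The numbered backreference \1 refers to the same group.
const MIXED = /^(?<word>[a-z]+)!\k<word>!\1$/;
console.log(MIXED.test('abc!abc!abc')); // true
```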

matchAll()

ES2020 adds the String.prototype.matchAll() method, which retrieves all matches in a single call. However, it returns an iterator rather than an array.

const string = 'sample1sample2sample3';
const regex = /sample/g;

for (const match of string.matchAll(regex)) {
  console.log(match);
}
// Traversal output
/*
['sample', index: 0, input: 'sample1sample2sample3', groups: undefined]
['sample', index: 7, input: 'sample1sample2sample3', groups: undefined]
['sample', index: 14, input: 'sample1sample2sample3', groups: undefined]
*/
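Because matchAll returns an iterator, it is often spread into an array. The sketch below (the capture group (\d) is added only for illustration) also shows that matchAll requires the g modifier:

```javascript
const string = 'sample1sample2sample3';

// The iterator can be turned into an array with spread or Array.from.
const all = [...string.matchAll(/sample(\d)/g)];
console.log(all.map((m) => m[1]));    // [ '1', '2', '3' ]
console.log(all.map((m) => m.index)); // [ 0, 7, 14 ]

// matchAll requires the g modifier; a non-global regex throws a TypeError.
try {
  string.matchAll(/sample/);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```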

Posted by paulmo on Wed, 01 Dec 2021 07:04:10 -0800