Learning regular expressions through examples

Check whether an email address is valid

Email regex

/\w+[\w\.]*@[\w\.]+\.\w+/

Test cases

const regex = /\w+[\w\.]*@[\w\.]+\.\w+/

regex.test('666@email.com')            // true
regex.test('july@e.c')                 // true               
regex.test('_@email.com.cn')           // true
regex.test('july_1234@email.com')      // true

regex.test('@email.com')               // false
regex.test('julyemail.com')            // false
regex.test('july.email.com')           // false
regex.test('july@')                    // false
regex.test('july@email')               // false
regex.test('july@email.')              // false
regex.test('july@.')                   // false
regex.test('july@.com')                // false
regex.test('-~!#$%@email.com')         // false
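
The pattern above has no ^ or $ anchors, so test() returns true as soon as any substring matches. The sketch below compares it with an anchored variant; the anchored pattern and the sample input are assumptions added for illustration and are not part of the original article.

```javascript
// Original pattern from the article: unanchored, so a matching substring is enough.
const loose = /\w+[\w\.]*@[\w\.]+\.\w+/;
// Assumed variant: ^ and $ require the whole string to match.
const strict = /^\w+[\w\.]*@[\w\.]+\.\w+$/;

// Made-up input with extra text around a valid-looking address.
const input = 'contact: july@email.com!!!';

const looseResult = loose.test(input);              // true
const strictResult = strict.test(input);            // false
const strictClean = strict.test('july@email.com');  // true

console.log(looseResult, strictResult, strictClean);
```

Whether the anchors are wanted depends on the goal: searching for an address inside longer text, or validating a whole input field.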

Explanation of the regex

  1. \w. \w is a predefined character class that matches any letter, digit, or underscore.
  2. + and *. Together with ?, these are called quantifiers in regular expressions: + means one or more times, * means zero or more times, and ? means zero or one time.
  3. \. The dot (.) is a metacharacter in regular expressions: it matches any character except carriage return (\r), line feed (\n), line separator (\u2028), and paragraph separator (\u2029). Because metacharacters have special meanings, matching the metacharacter itself requires an escape, that is, a preceding backslash (\), so \. matches a literal dot.
  4. [\w\.]. [] denotes a character set. For example, [July] does not match the whole word; it is a set of the characters J, u, l, and y, and the match succeeds on any single one of them. Inside a character set the dot is already literal, so the backslash in [\w\.] is optional.
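
The points above can be checked one at a time. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// \w matches a single letter, digit, or underscore.
const wordChars = 'a_1-!'.match(/\w/g);            // ['a', '_', '1']

// Quantifiers: + is one or more, * is zero or more, ? is zero or one.
const plus = /^ab+$/.test('a');                    // false, at least one b is required
const star = /^ab*$/.test('a');                    // true, zero b's is allowed
const optional = /^https?$/.test('http');          // true, the s is optional

// An unescaped . matches any character except line terminators; \. matches only a dot.
const anyChar = /^a.c$/.test('abc');               // true
const literalDotMiss = /^a\.c$/.test('abc');       // false
const literalDotHit = /^a\.c$/.test('a.c');        // true

// A character set matches exactly one of the listed characters.
const setMatches = 'july'.match(/[jy]/g);          // ['j', 'y']

console.log(wordChars, plus, star, optional, anyChar, literalDotMiss, literalDotHit, setMatches);
```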

Match a URL

URL regex

/https?:\/\/(\w*:\w*@)?[-\w\.]+(:\d+)?(\/([\w\/\.]*(\?\S+)?)?)?/

Test cases

const regex = /https?:\/\/(\w*:\w*@)?[-\w\.]+(:\d+)?(\/([\w\/\.]*(\?\S+)?)?)?/

regex.test('http://www.forta.com/blog')                    // true
regex.test('https://www.forta.com:80/blog/index.cfm')      // true
regex.test('https://www.forta.com')                        // true
regex.test('http://ben:password@www.forta.com/')           // true
regex.test('http://localhost/index.php?ab=1&c=2')          // true
regex.test('http://localhost:8500/')                       // true

Explanation of the regex

  1. (). Expressions like (\w*:\w*@) are called subexpressions: the parenthesized expression is matched as a whole, whereas a character set [] matches only one character from the set. For example, (:\d+) matches a string like ':8080', while [:\d] matches a single : or a single digit.
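
The difference between a subexpression and a character set can be seen directly, and exec() also shows what each subexpression of the URL regex captured. This is a small sketch; the variable names are made up for illustration.

```javascript
// A subexpression matches the whole sequence: ":" followed by digits.
const group = /^(:\d+)$/;
// A character set matches exactly one character: ":" or a digit.
const set = /^[:\d]$/;

const groupPort = group.test(':8080');   // true
const setPort = set.test(':8080');       // false, more than one character
const setColon = set.test(':');          // true
const setDigit = set.test('8');          // true

// Each pair of parentheses in the URL regex also captures what it matched.
const urlRegex = /https?:\/\/(\w*:\w*@)?[-\w\.]+(:\d+)?(\/([\w\/\.]*(\?\S+)?)?)?/;
const m = urlRegex.exec('https://www.forta.com:80/blog/index.cfm');

const credentials = m[1];   // undefined, there is no user:password@ part
const port = m[2];          // ':80'
const path = m[3];          // '/blog/index.cfm'

console.log(groupPort, setPort, setColon, setDigit, credentials, port, path);
```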

Practice

Remove all comments from an HTML file

HTML file

I wrote a sample HTML file locally that contains three parts: CSS, HTML, and JS. It is a complete web page.

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
  <style>
    /* 
      This is a multi-line comment for css
      This is a multi-line comment for css This is a multi-line comment for css
    */
    * {
      margin: 0;
      padding: 0;
      /* This is a multi-line comment for css */
    }

    html {
      color: #000;
    }
  </style>
</head>

<body>
  <h1>h1 title</h1> <!-- This is html Notes -->

  <h2>h2 title</h2>
  <!--  This is html Notes -->
  <button style="/* This is a comment for inline css */color: red/* This is a comment for inline css */">click me</button>
  <button onclick="/* This is a comment for inline js */alert('july')/* This is a comment for inline js */">click me 2</button>
  <!--  
    //This is an html comment 
  -->
  <button onclick="// This is a comment for inline js
    alert(/* Notes */'july')
    //This is the comment'sdfs'for inline js, "style='color: blue' > Click me 3</button>
  <!--  This is html Notes -->

  <script>
    // This is a js single line comment
    // This is a js single line comment This is a js single line comment 
    function func() {
      console.log('test');  // This is a js single line comment
    }

    /*
     * This is a js multiline comment
     * This is a js multiline comment This is a js multiline comment
     */
    function func2() {
      // This is a js single line comment
      /*
        This is a js multiline comment
      */
      console.log('test');  /* 
      This is a js multiline comment */
    } 
  </script>
</body>

</html>

Matching

const htmlStr = `HTML string`;   // Paste the HTML content above here; it is too long to repeat

// Match /* */
htmlStr.match(/\/\*[^]*?\*\//g); // Returns an array of length 10; each element is one matched /* */ comment (not shown here for brevity)

// Match <!-- -->
htmlStr.match(/<!--[^]*?-->/g);

// Match //
htmlStr.match(/(\/\/.*?(?=(["']\s*\w+\s*=)|(["']\s*>)))|(\/\/.*)/g);
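
To see what the g modifier, [^], and the ? after the quantifier each contribute to the /* */ pattern, the sketch below runs variations of it on a short string. The sample string is made up for illustration.

```javascript
// Two block comments; the second one spans a line break.
const sample = 'a /* one */ b /* two\nlines */ c';

// Pattern from the article: lazy, crosses line breaks, global.
const lazy = sample.match(/\/\*[^]*?\*\//g);
// ['/* one */', '/* two\nlines */']

// Without the ?, the quantifier is greedy and runs to the last */.
const greedy = sample.match(/\/\*[^]*\*\//g);
// ['/* one */ b /* two\nlines */']

// With . instead of [^], the match cannot cross the line break.
const dot = sample.match(/\/\*.*?\*\//g);
// ['/* one */']

// Without g, only the first match is returned.
const first = sample.match(/\/\*[^]*?\*\//);
// first[0] is '/* one */'

console.log(lazy, greedy, dot, first[0]);
```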

Analysis

  1. g, the global modifier. g is a regular expression modifier that requests a global match: all matches are returned rather than only the first. Since the HTML contains multiple comments, the global modifier is required.
  2. [^]. ^ is called the caret. At the start of a character set it negates the set: [^abc] matches any character except a, b, and c. In JavaScript, [^] with nothing after the caret matches any character, including line breaks.
  3. Non-greedy mode. Quantifiers are greedy by default: [^]* above matches zero or more arbitrary characters and consumes as many as possible while the rest of the pattern can still match. Adding a ? after [^]* makes the quantifier non-greedy, so matching stops as soon as the rest of the pattern is satisfied. To see the difference between the two modes, remove the ? from the /* */ regex above and run it again.
  4. Lookahead. A lookahead is a subexpression that begins with ?=. For example, to match the protocol part of the URL https://www.forta.com, use the regex /.+(?=:)/. Here (?=:) is a lookahead: the text before the : is returned, but the : itself is not part of the result.
  5. The first two kinds of comments are easy to match; the third, the // comment, is more complex. In most cases /\/\/.*/ is enough for a // comment, but it fails in two cases where the comment sits inside an inline attribute and the closing quote follows on the same line; see the code below.
<button onclick="
    alert('july')
    //This is the comment'sdfs'for inline js, "style='color: blue' > Click me 3</button>
    
<button onclick="
    alert('july')
    //This is the comment'sdfs'for inline js'> click me 3</button>
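
A small sketch of the lookahead and of why the plain /\/\/.*/ pattern falls short in these two cases. The input lines are shortened stand-ins for the snippets above, and the variable names are made up for illustration.

```javascript
// Lookahead: match everything before the ":" without including it.
const protocol = 'https://www.forta.com'.match(/.+(?=:)/)[0];   // 'https'

const naive = /\/\/.*/;
const improved = /(\/\/.*?(?=(["']\s*\w+\s*=)|(["']\s*>)))|(\/\/.*)/g;

// Case 1: the closing quote is followed by another attribute.
const case1 = `//note 'x' here" style='color: blue'>Click me 3</button>`;
// Case 2: the closing quote is followed by the end of the tag.
const case2 = `//note 'x' here'>Click me 3</button>`;

// The naive pattern swallows the closing quote and the rest of the tag.
const naive1 = case1.match(naive)[0];        // the whole line
// The lookahead stops the match just before the closing quote.
const improved1 = case1.match(improved)[0];  // "//note 'x' here"
const improved2 = case2.match(improved)[0];  // "//note 'x' here"

// An ordinary comment still falls through to the second alternative.
const plain = `console.log('test');  // trailing comment`.match(improved)[0];
// '// trailing comment'

console.log(protocol, naive1, improved1, improved2, plain);
```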

Let's take a closer look at the picture

Final code

For convenience, the final code runs in Node. Because the goal is to remove every comment from the HTML, it uses the string replace method, which takes two parameters: the first is a regular expression and the second is the replacement text.

const fs = require('fs');

// regex.html is the HTML source file placed in the same directory
fs.readFile('./regex.html', 'utf8', (err, data) => {
  if (err) throw err;

  console.log(
    data
      .replace(/\/\*[^]*?\*\//g, '')    // Remove /* */ comments
      .replace(/<!--[^]*?-->/g, '')     // Remove <!-- --> comments
      .replace(/(\/\/.*?(?=(["']\s*\w+\s*=)|(["']\s*>)))|(\/\/.*)/g, '')  // Remove // comments
  );
});
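
For trying the three replacements without a file on disk, the same chain can be wrapped in a function. This is a minimal sketch: the function name stripComments and the sample input are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: applies the three replacements from the article in order.
function stripComments(html) {
  return html
    .replace(/\/\*[^]*?\*\//g, '')    // /* */ comments
    .replace(/<!--[^]*?-->/g, '')     // <!-- --> comments
    .replace(/(\/\/.*?(?=(["']\s*\w+\s*=)|(["']\s*>)))|(\/\/.*)/g, '');  // // comments
}

// Made-up input with one comment of each kind.
const sampleHtml = [
  '<h1>title</h1> <!-- html comment -->',
  '<style>* { margin: 0; /* css comment */ }</style>',
  '<script>',
  "  console.log('test'); // js comment",
  '</script>',
].join('\n');

const cleaned = stripComments(sampleHtml);
console.log(cleaned);
```

The // pattern cannot tell a comment from other text that contains two slashes, such as the // in an https:// URL, so this approach is best suited to files like the sample page, which contain no such text.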

Posted by SFDonovan on Fri, 23 Aug 2019 20:08:20 -0700