Vue source code analysis: the history of template compilation

In the last section, we covered the general process of Vue template compilation. In this section, we look at what template compilation actually involves. It mainly consists of two parts: HTML parsing and text parsing.

// Code location: src/compiler/parser/index.js

/**
 * Convert HTML template string to AST.
 */
export function parse(template, options) {
   // ...
  parseHTML(template, {
    warn,
    expectHTML: options.expectHTML,
    isUnaryTag: options.isUnaryTag,
    canBeLeftOpenTag: options.canBeLeftOpenTag,
    shouldDecodeNewlines: options.shouldDecodeNewlines,
    shouldDecodeNewlinesForHref: options.shouldDecodeNewlinesForHref,
    shouldKeepComment: options.comments,
    // Called when a start tag has been parsed
    start (tag, attrs, unary) {

    },
    // Called when an end tag has been parsed
    end () {

    },
    // Called when text has been parsed
    chars (text) {

    },
    // Called when a comment has been parsed
    comment (text) {

    }
  })
  return root
}

HTML parsing is driven by the parse function, which receives two parameters. The first is the template string, that is, the content between <template> and </template>. The second is options, the options used when parsing the HTML. parse also defines four hook functions (start, end, chars and comment), whose job is to turn each extracted piece of the template string into a node of the AST. The four hooks are passed to parseHTML, which calls the matching hook whenever it parses the corresponding kind of content.
When a start tag is parsed, the start function is called to create an element-type AST node.

// Triggered when a start tag is parsed
start (tag, attrs, unary) {
  let element = createASTElement(tag, attrs, currentParent)
}

export function createASTElement (tag,attrs,parent) {
  return {
    type: 1,
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMap(attrs),
    parent,
    children: []
  }
}

The start hook receives three arguments: the tag name, the attribute list, and whether the tag is self-closing (unary). It calls createASTElement() to create an element node for the AST.
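
To make the shape of an element node concrete, here is a minimal sketch. The body of makeAttrsMap is an assumption (the article does not show it); it is written here as a plain loop that turns the { name, value } attribute list into a lookup object.

```javascript
// Assumed, simplified stand-in for makeAttrsMap:
// turns [{ name, value }, ...] into { name: value, ... }
function makeAttrsMap (attrs) {
  const map = {}
  for (const attr of attrs) {
    map[attr.name] = attr.value
  }
  return map
}

function createASTElement (tag, attrs, parent) {
  return {
    type: 1,
    tag,
    attrsList: attrs,
    attrsMap: makeAttrsMap(attrs),
    parent,
    children: []
  }
}

// The node the start hook would create for <div id="app" class="box">
const el = createASTElement('div', [
  { name: 'id', value: 'app' },
  { name: 'class', value: 'box' }
], undefined)

console.log(el.attrsMap) // { id: 'app', class: 'box' }
```

attrsList keeps the attributes in source order, while attrsMap allows lookup by name.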

When an end tag is parsed, the end function is called.

When parsing to text, the chars function is called:

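chars (text) {
  let element
  // parseText returns a result only when the text contains {{ }} interpolation
  const res = parseText(text, delimiters)
  if (res) {
    // Dynamic text with variables
    element = {
      type: 2,
      expression: res.expression,
      tokens: res.tokens,
      text
    }
  } else {
    // Static text
    element = {
      type: 3,
      text
    }
  }
}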

The function first checks whether the text is dynamic, that is, whether it contains interpolated variables. If so, it creates a dynamic text node (type 2); otherwise it creates a static text node (type 3).
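
The dynamic/static decision can be sketched as follows. parseTextSketch is a hypothetical, much simplified text parser written for this illustration: it only handles the default {{ }} delimiters and omits the tokens array. The _s(...) wrapper stands for a to-string helper and is used here only to show the form of the generated expression.

```javascript
// Matches one {{ ... }} interpolation (default delimiters only)
const interpolation = /\{\{((?:.|\r?\n)+?)\}\}/g

// Hypothetical simplified text parser: returns undefined for static text
function parseTextSketch (text) {
  interpolation.lastIndex = 0
  if (!interpolation.test(text)) return undefined
  const parts = []
  let lastIndex = interpolation.lastIndex = 0
  let match
  while ((match = interpolation.exec(text))) {
    // Literal text in front of the interpolation
    if (match.index > lastIndex) {
      parts.push(JSON.stringify(text.slice(lastIndex, match.index)))
    }
    // The expression inside {{ }}, wrapped in the to-string helper
    parts.push(`_s(${match[1].trim()})`)
    lastIndex = match.index + match[0].length
  }
  // Literal text after the last interpolation
  if (lastIndex < text.length) {
    parts.push(JSON.stringify(text.slice(lastIndex)))
  }
  return { expression: parts.join('+') }
}

function createTextNode (text) {
  const res = parseTextSketch(text)
  return res
    ? { type: 2, expression: res.expression, text }
    : { type: 3, text }
}

console.log(createTextNode('Hello {{ name }}!').expression) // "Hello "+_s(name)+"!"
console.log(createTextNode('Hello world').type)             // 3
```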

When a comment is parsed, the comment function is called:

// Triggered when a comment is parsed
comment (text: string) {
  let element = {
    type: 3,
    text,
    isComment: true
  }
}

The AST node is created while parsing. This is what the HTML parser does.

How are the different kinds of content parsed?

Broadly speaking, the Vue parser has to handle the following kinds of content:

  • Text, e.g. "hello world"
  • HTML comments, e.g. <!-- I'm a comment -->
  • Conditional comments, e.g. <!--[if !IE]>-->I'm a comment<!--<![endif]-->
  • DOCTYPE, e.g. <!DOCTYPE html>
  • Start tag, e.g. <div>
  • End tag, e.g. </div>

(1) Parsing HTML comments: a regular expression checks whether the string starts with <!--. If a closing --> is then found, the content between the two is the comment.

const comment = /^<!\--/
if (comment.test(html)) {
  // It is a comment; look for the closing '-->'
  const commentEnd = html.indexOf('-->')

  if (commentEnd >= 0) {
    // The comment is closed; check whether options asks to keep comments
    if (options.shouldKeepComment) {
      // Keep it: slice out the comment text and pass it to options.comment to create a comment AST node
      options.comment(html.substring(4, commentEnd))
    }
    // In either case, move the cursor past '-->' and continue parsing
    advance(commentEnd + 3)
    continue
  }
}

When a comment is matched and comments are to be kept, the comment hook is called with the comment text, and parsing then continues after the comment. Comments are kept only when options.shouldKeepComment is true, which parse fills from options.comments; in that case a comment AST node is created. The advance function moves the parsing cursor: after each piece has been parsed, the cursor moves forward so that nothing is parsed twice.

function advance (n) {
  index += n   // index is the parse cursor
  html = html.substring(n)
}
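
The comment check and advance can be tried together in a small standalone sketch. stripLeadingComment is a hypothetical wrapper written for this illustration; it collects kept comments in an array instead of calling options.comment.

```javascript
const comment = /^<!\--/

// Hypothetical wrapper: parses one leading comment, if there is one
function stripLeadingComment (template, shouldKeepComment) {
  let html = template
  let index = 0 // the parse cursor
  const comments = []

  function advance (n) {
    index += n
    html = html.substring(n)
  }

  if (comment.test(html)) {
    const commentEnd = html.indexOf('-->')
    if (commentEnd >= 0) {
      if (shouldKeepComment) {
        // 4 is the length of '<!--'
        comments.push(html.substring(4, commentEnd))
      }
      // 3 is the length of '-->'
      advance(commentEnd + 3)
    }
  }
  return { html, index, comments }
}

const result = stripLeadingComment('<!-- note --><p>hi</p>', true)
console.log(result.comments) // [ ' note ' ]
console.log(result.html)     // <p>hi</p>
console.log(result.index)    // 13
```

With shouldKeepComment set to false the cursor still moves past the comment; only the node creation is skipped.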

(2) Parsing conditional comments

First a regular expression checks whether the string starts with the opening marker that is unique to conditional comments, and then the matching closing marker is searched for. If it is found, the string is a conditional comment and can be cut off. Since conditional comments do not exist in the real DOM tree, no hook function needs to be called and no AST node is created.

// Check whether it is a conditional comment
const conditionalComment = /^<!\[/
if (conditionalComment.test(html)) {
  // It is a conditional comment; look for the closing ']>'
  const conditionalEnd = html.indexOf(']>')

  if (conditionalEnd >= 0) {
    // The closing ']>' exists: cut the conditional comment off the html string
    // and continue matching on what remains
    advance(conditionalEnd + 2)
    continue
  }
}

(3) Parsing DOCTYPE

const doctype = /^<!DOCTYPE [^>]+>/i
// Check whether it is a DOCTYPE
const doctypeMatch = html.match(doctype)
if (doctypeMatch) {
  advance(doctypeMatch[0].length)
  continue
}
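
Both conditional comments and DOCTYPE are skipped without creating any node. The sketch below shows that behaviour in isolation; skipIgnored is a hypothetical helper that uses the two regular expressions above and cuts the string directly instead of calling advance.

```javascript
const conditionalComment = /^<!\[/
const doctype = /^<!DOCTYPE [^>]+>/i

// Hypothetical helper: removes leading DOCTYPE declarations and conditional comments
function skipIgnored (template) {
  let html = template
  let skipped = true
  while (skipped) {
    skipped = false
    if (conditionalComment.test(html)) {
      const conditionalEnd = html.indexOf(']>')
      if (conditionalEnd >= 0) {
        // 2 is the length of ']>'
        html = html.substring(conditionalEnd + 2)
        skipped = true
        continue
      }
    }
    const doctypeMatch = html.match(doctype)
    if (doctypeMatch) {
      html = html.substring(doctypeMatch[0].length)
      skipped = true
    }
  }
  return html
}

console.log(skipIgnored('<!DOCTYPE html><![if !IE]><p>hi</p>')) // <p>hi</p>
```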

(4) Parsing the start tag

Parsing tags is a little more complicated. First, the template string is matched against a regular expression to see whether it begins like a start tag:

/**
 * Regular expression that matches the opening of a start tag
 */
const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

const start = html.match(startTagOpen)
if (start) {
  const match = {
    tagName: start[1],
    attrs: [],
    start: index
  }
}

// Template starting with a start tag:
'<div></div>'.match(startTagOpen) // ['<div', 'div', index: 0, input: '<div></div>']
// Template starting with an end tag:
'</div><div></div>'.match(startTagOpen) // null
// Template starting with text:
"I'm text</p>".match(startTagOpen) // null

The code above returns a match array for the div tag. However, the start hook needs three arguments: the tag name, the attributes, and whether the tag is self-closing, so the attributes have to be parsed next. Before attributes are matched, the part of the start tag that has already been matched ('<div') is cut off, leaving the part that begins with the attributes:

//    <div class="a" id="b"></div> ===> class="a" id="b"></div>
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
let html = 'class="a" id="b"></div>'
let attr = html.match(attribute)
console.log(attr)
// ['class="a"', 'class', '=', 'a', undefined, undefined, index: 0, input: 'class="a" id="b"></div>', groups: undefined]

If the tag has several attributes, as the div above does, they are matched in a loop. Each time an attribute is matched, the matched part is cut off and matching continues with the next one, until the end of the start tag is reached or nothing more matches.

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const startTagClose = /^\s*(\/?)>/
const match = {
  tagName: start[1],
  attrs: [],
  start: index
}
let end, attr
// Stop at the end of the start tag ('>' or '/>'), or when no further attribute matches
while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
  advance(attr[0].length)
  match.attrs.push(attr)
}

The last step is to decide whether the tag is self-closing. Once the attributes from the previous step have been consumed, what remains of the start tag is either '>' or '/>', and this is matched with the startTagClose regular expression.

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/


function parseStartTag () {
  const start = html.match(startTagOpen)
  // '<div></div>'.match(startTagOpen)  => ['<div','div',index:0,input:'<div></div>']
  if (start) {
    const match = {
      tagName: start[1],
      attrs: [],
      start: index
    }
    advance(start[0].length)
    let end, attr
    /**
     * Example: <div a=1 b=2 c=3></div>
     * Attributes are matched starting after '<div' until the '>' that ends the start tag.
     * After all attributes have been matched, what is left of the html string is:
     *   for a self-closing tag: '/>'
     *   for a non-self-closing tag: '></div>'
     */
    while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
      advance(attr[0].length)
      match.attrs.push(attr)
    }

    /**
     * Whether the tag is self-closing is decided here.
     * Self-closing tag, for example: <input type="text" />
     * Non-self-closing tag, for example: <div></div>
     * '></div>'.match(startTagClose) => [">", "", index: 0, input: "></div>", groups: undefined]
     * '/><div></div>'.match(startTagClose) => ["/>", "/", index: 0, input: "/><div></div>", groups: undefined]
     * So the tag is self-closing exactly when end[1] is '/'
     */
    if (end) {
      match.unarySlash = end[1]
      advance(end[0].length)
      match.end = index
      return match
    }
  }
}
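
The attribute loop and the self-closing check can be run in isolation. collectAttrs is a hypothetical helper written for this illustration: it takes the part of a start tag that follows the tag name, cuts the string directly instead of calling advance, and reads the value from capture groups 3 to 5 (double-quoted, single-quoted, unquoted).

```javascript
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const startTagClose = /^\s*(\/?)>/

// Hypothetical helper: collects attributes from the part of a start tag after the tag name
function collectAttrs (rest) {
  let html = rest
  const attrs = []
  let end, attr
  while (!(end = html.match(startTagClose)) && (attr = html.match(attribute))) {
    html = html.substring(attr[0].length)
    // Exactly one of groups 3, 4, 5 holds the value; none for a bare attribute
    attrs.push({ name: attr[1], value: attr[3] || attr[4] || attr[5] || '' })
  }
  return { attrs, selfClosing: Boolean(end && end[1]), rest: html }
}

const parsed = collectAttrs(' class="a" id=\'b\' disabled></div>')
console.log(parsed.attrs.map(a => `${a.name}=${a.value}`)) // [ 'class=a', 'id=b', 'disabled=' ]
console.log(parsed.selfClosing)                            // false
console.log(collectAttrs(' type="text" />').selfClosing)   // true
```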

Once the start tag has been fully parsed, the start hook could be called. In the Vue source, however, handleStartTag is called first to process the attribute array, and it then calls the start hook to create the AST node.

function handleStartTag (match) {
    const tagName = match.tagName
    const unarySlash = match.unarySlash

    if (expectHTML) {
      // ...
    }

    const unary = isUnaryTag(tagName) || !!unarySlash

    const l = match.attrs.length
    const attrs = new Array(l)
    for (let i = 0; i < l; i++) {
      const args = match.attrs[i]
      const value = args[3] || args[4] || args[5] || ''
      const shouldDecodeNewlines = tagName === 'a' && args[1] === 'href'
        ? options.shouldDecodeNewlinesForHref
        : options.shouldDecodeNewlines
      attrs[i] = {
        name: args[1],
        value: decodeAttr(value, shouldDecodeNewlines)
      }
    }

    if (!unary) {
      stack.push({ tag: tagName, lowerCasedTag: tagName.toLowerCase(), attrs: attrs })
      lastTag = tagName
    }

    if (options.start) {
      options.start(tagName, attrs, unary, match.start, match.end)
    }
  }

That completes the parsing of start tags.

(5) Parsing the end tag

Parsing the end tag is relatively simple, because an end tag has no attributes. The end tag is matched with a regular expression, then parseEndTag is called, which in turn calls the end hook to do the related processing.

const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const endTagMatch = html.match(endTag)

'</div>'.match(endTag)  // ["</div>", "div", index: 0, input: "</div>", groups: undefined]
'<div>'.match(endTag)  // null

if (endTagMatch) {
    const curIndex = index
    advance(endTagMatch[0].length)
    parseEndTag(endTagMatch[1], curIndex, index)
    continue
}

(6) Parsing text

let text, rest, next
let textEnd = html.indexOf('<')
// '<' is at position 0: the content is one of the other five types
if (textEnd === 0) {
    // ...
}
// Reached when the string starts with text, or when the '<' at position 0 matched none of the five types
if (textEnd >= 0) {
    // Everything in front of textEnd is plain text.
    // Take the content from '<' onward and assign it to rest
    rest = html.slice(textEnd)
    while (
        !endTag.test(rest) &&
        !startTagOpen.test(rest) &&
        !comment.test(rest) &&
        !conditionalComment.test(rest)
    ) {
        // < in plain text, be forgiving and treat it as text
        /**
           * rest (the content from '<' onward) is tested against endTag, startTagOpen,
           * comment and conditionalComment. If none of them match, this '<' is part of the text itself
           */
        // Look for another '<' after this one
        next = rest.indexOf('<', 1)
        // None found: stop scanning
        if (next < 0) break
        // Found one: the current '<' is a character in the text, so move textEnd to the next '<'
        textEnd += next
        // and test again in the next round
        rest = html.slice(textEnd)
    }
    // Everything in front of textEnd is text; cut it off
    text = html.substring(0, textEnd)
    advance(textEnd)
}
// No '<' is found in the entire template string, indicating that the entire template string is text
if (textEnd < 0) {
    text = html
    html = ''
}
// Convert the intercepted text into textAST
if (options.chars && text) {
    options.chars(text)
}

Text is again recognized with the help of regular expressions. Finally the chars function is called to create the text AST node.
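
The handling of a '<' that belongs to the text itself is easiest to see on a concrete input. readText is a hypothetical helper that wraps the scanning loop above and returns only the leading text; the regular expressions are the ones defined earlier.

```javascript
const ncname = '[a-zA-Z_][\\w\\-\\.]*'
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const comment = /^<!\--/
const conditionalComment = /^<!\[/

// Hypothetical helper: returns the leading text of html,
// up to the first '<' that starts real markup
function readText (html) {
  let textEnd = html.indexOf('<')
  if (textEnd < 0) return html // no '<' at all: everything is text
  let rest = html.slice(textEnd)
  while (
    !endTag.test(rest) &&
    !startTagOpen.test(rest) &&
    !comment.test(rest) &&
    !conditionalComment.test(rest)
  ) {
    // This '<' does not start markup, so look for the next one
    const next = rest.indexOf('<', 1)
    if (next < 0) break
    textEnd += next
    rest = html.slice(textEnd)
  }
  return html.substring(0, textEnd)
}

console.log(readText('1 < 2 is true</p>')) // 1 < 2 is true
console.log(readText('hello<b>world</b>')) // hello
```

In the first call the '<' in "1 < 2" is followed by a space, so none of the four patterns match and the scan moves on to the '<' of '</p>'.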

Tags, comments and text have now all been parsed. Tags in particular are nested inside one another, so how is this hierarchy preserved?

Vue creates a stack at the beginning of parsing, whose main purpose is to maintain the hierarchy of the AST. Whenever a start tag is parsed, the start function is called and the tag is pushed onto the stack. Whenever an end tag is parsed, the end function is called and the tag on top of the stack is popped. The element on top of the stack is therefore always the parent of whatever is parsed next, which preserves the hierarchy of the AST.
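
The stack idea can be sketched with a deliberately tiny parser. buildTree and its tokenizing regular expression are hypothetical and far simpler than the real code: they handle only <tag>, </tag> and text, with no attributes, comments or error recovery. What they do share with the description above is the push on start and the pop on end.

```javascript
// Hypothetical mini-parser: handles only <tag>, </tag> and text
function buildTree (template) {
  const stack = []
  let root = null
  let currentParent = null

  const hooks = {
    start (tag) {
      const element = { type: 1, tag, parent: currentParent, children: [] }
      if (!root) root = element
      if (currentParent) currentParent.children.push(element)
      // Push: this element is the parent of whatever comes next
      stack.push(element)
      currentParent = element
    },
    end () {
      // Pop: the previous element on the stack becomes the parent again
      stack.pop()
      currentParent = stack[stack.length - 1] || null
    },
    chars (text) {
      if (currentParent) currentParent.children.push({ type: 3, text })
    }
  }

  // Alternatives: end tag, start tag, run of text
  const token = /<\/([a-zA-Z]\w*)>|<([a-zA-Z]\w*)>|([^<]+)/g
  let match
  while ((match = token.exec(template))) {
    if (match[1]) hooks.end()
    else if (match[2]) hooks.start(match[2])
    else hooks.chars(match[3])
  }
  return root
}

const tree = buildTree('<div><p>hi</p><span>there</span></div>')
console.log(tree.children.map(child => child.tag)) // [ 'p', 'span' ]
console.log(tree.children[0].children[0].text)     // hi
```

Because p is popped before span is pushed, span becomes a sibling of p rather than its child.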

Let's take a look at the source code:

function parseHTML(html, options) {
	var stack = [];
	var expectHTML = options.expectHTML;
	var isUnaryTag$$1 = options.isUnaryTag || no;
	var canBeLeftOpenTag$$1 = options.canBeLeftOpenTag || no;
	var index = 0;
	var last, lastTag;

	// Start a while loop that ends when html is empty, i.e. the whole template has been parsed
	while (html) {
		last = html;
		// Make sure that the content to parse is not in a plain text tag (script,style,textarea)
		if (!lastTag || !isPlainTextElement(lastTag)) {
		   let textEnd = html.indexOf('<')
              /**
               * If the html string starts with '<', there are several possibilities:
               * Start tag: <div>
               * End tag: </div>
               * Comment: <!-- I'm a comment -->
               * Conditional comment: <!--[if !IE]>--> <!--<![endif]-->
               * DOCTYPE: <!DOCTYPE html>
               * Each one is tried in turn
               */
            if (textEnd === 0) {
                // Resolve whether it is a comment or not
        		if (comment.test(html)) {

                }
                // Resolve whether it is a conditional comment
                if (conditionalComment.test(html)) {

                }
                // Resolve whether it is DOCTYPE
                const doctypeMatch = html.match(doctype)
                if (doctypeMatch) {

                }
                // Resolve whether it is an end tag
                const endTagMatch = html.match(endTag)
                if (endTagMatch) {

                }
                // Check whether it is a start tag
                const startTagMatch = parseStartTag()
                if (startTagMatch) {

                }
            }
            // If the html string does not start with '<', the text type is resolved
            let text, rest, next
            if (textEnd >= 0) {

            }
            // If '<' is not found in the html string, it means that the html string is plain text
            if (textEnd < 0) {
                text = html
                html = ''
            }
            // Convert the intercepted text into textAST
            if (options.chars && text) {
                options.chars(text)
            }
		} else {
			// When the parent elements are script, style and textarea, all the contents inside are treated as plain text
		}

		//Treat the entire string as text
		if (html === last) {
			options.chars && options.chars(html);
			if (!stack.length && options.warn) {
				options.warn(("Mal-formatted tag at end of template: \"" + html + "\""));
			}
			break
		}
	}

	// Clean up any remaining tags
	parseEndTag();
	//parse start tag
	function parseStartTag() {

	}
	//Result of processing parseStartTag
	function handleStartTag(match) {

	}
	//parse end tag
	function parseEndTag(tagName, start, end) {

	}
}

HTML parsing is tedious, but conceptually simple. To be continued...

Posted by littlejones on Tue, 30 Jun 2020 02:17:58 -0700