Lexical Analysis and Parsing in Go: Translation, Part Three

Keywords: Go JSON github

Author: Adam Presley | Address: https://adampresley.github.io...

Translator's Preface

Recently I have noticed that my translations are becoming more and more casual. When I first started translating articles I was more literal; now I put more emphasis on readability. For example, words that have no effect on the main idea of the article are simply skipped.

This article focuses on the parser of the INI interpreter, which receives the tokens produced by the lexer in the previous article and finally turns them into a structure that is meaningful to the user. After reading this series you should have a basic understanding of how lexical analysis works, but if you want to really master it there is still a long way to go. If you are interested, try implementing your own JSON parser; the requirements can be simplified a little, for example by parsing only the first level of the JSON.

The translation is as follows:

The first article in this series (English original) introduced some basic concepts of lexical analysis and parsing, looked at the basic structure of INI files, and defined some constants and structures, which laid the groundwork for implementing the INI file parser.

The second article (English original) focused on the implementation of the lexer, which handles converting the input text into tokens.

Today's article is the last in the series, and in it we finally complete our interpreter. The parser is responsible for reading tokens from a channel and ultimately building a struct instance that represents the contents of the INI file. Once parsing is done, we print the result in JSON format.

Structures

The parser is responsible for starting the lexer and reading tokens from the channel. After receiving a token, the parser needs to know its type and parse it into the corresponding structure. The first thing we need to do is define the structures that represent the INI content. There are three main structures involved.

The first structure representing Key/Value, named IniKeyValue, is as follows.

/model/ini/IniKeyValue.go

package ini

type IniKeyValue struct {
    Key   string `json:"key"`
    Value string `json:"value"`
}

The second structure representing Section, named IniSection, is as follows:

/model/ini/IniSection.go

package ini

type IniSection struct {
    Name          string        `json:"name"`
    KeyValuePairs []IniKeyValue `json:"keyValuePairs"`
}

A Section is made up of Key/Value pairs, and KeyValuePairs holds the pairs that belong to that Section. If a Key/Value pair does not belong to any Section, it is placed in a Section whose Name is empty.

The last structure representing the entire file, named IniFile, is as follows:

/model/ini/IniFile.go

package ini

type IniFile struct {
    FileName string       `json:"fileName"`
    Sections []IniSection `json:"sections"`
}

IniFile consists of two fields: FileName, the file name, and Sections, a slice of Section structures.
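To make the relationship between the three structures concrete, here is a small hand-built IniFile value (illustrative only; the names and values are hypothetical and mirror the sample file used later in the test):

example := ini.IniFile{
    FileName: "example.ini",
    Sections: []ini.IniSection{
        {
            // Key/Value pairs that appear before any [Section] header
            // end up in a Section whose Name is empty.
            Name: "",
            KeyValuePairs: []ini.IniKeyValue{
                {Key: "key", Value: "abcdefg"},
            },
        },
        {
            Name: "User",
            KeyValuePairs: []ini.IniKeyValue{
                {Key: "userName", Value: "adampresley"},
            },
        },
    },
}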

Parser

The first thing we need to do when writing the parser is to create a variable to hold the parsed result, that is, a variable of type IniFile. As follows:

output := ini.IniFile{
   FileName: fileName,
   Sections: make([]ini.IniSection, 0),
}

Next, we need some variables to track the current token, the token value, and the parser's state. For example, when we get a Key/Value pair, we need to know which Section we are currently in. Then we can start the lexer.

var token lexertoken.Token
var tokenValue string

/* State variables */
key := ""

log.Println("Starting lexer and parser for file", fileName, "...")

l := lexer.BeginLexing(fileName, input)

At this point the relevant variables have been defined and the lexer has been started. Next, we start receiving tokens from the channel, trimming whitespace from the token value unless the token type is TOKEN_VALUE.

for {
    token = l.NextToken()

    if token.Type != lexertoken.TOKEN_VALUE {
        tokenValue = strings.TrimSpace(token.Value)
    } else {
        tokenValue = token.Value
    }

    ...

We know that when the lexer reaches the end of the file, it returns a token of type EOF. At that point we need to record the current Section with its Key/Value pairs and exit the loop.

if isEOF(token) {
    output.Sections = append(output.Sections, section)
    break
}
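The isEOF helper is not shown in the excerpts above; a minimal sketch of what it might look like, assuming the TOKEN_EOF constant defined in the first article:

/* Possible isEOF helper (not shown in the original excerpt): it simply
   checks whether the token type is the EOF marker from lexertoken. */
func isEOF(token lexertoken.Token) bool {
    return token.Type == lexertoken.TOKEN_EOF
}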

Finally, the parser handles the specific token types. There are three token types we mainly need to pay attention to.

The first is the Section token. When we encounter one, we first check whether the section variable already holds any Key/Value pairs, and if so we record it in output.Sections. We then reset the section variable so it can track the new Section and its upcoming Key/Value pairs.

Next come Key and Value. When a TOKEN_KEY is encountered, the key variable records its value. When a TOKEN_VALUE is encountered, we append a Key/Value pair built from key and the token value to section.KeyValuePairs.

The sample code is as follows:

switch token.Type {
   case lexertoken.TOKEN_SECTION:
      /*
       * Reset tracking variables
       */
      if len(section.KeyValuePairs) > 0 {
         output.Sections = append(output.Sections, section)
      }

      key = ""

      section.Name = tokenValue
      section.KeyValuePairs = make([]ini.IniKeyValue, 0)

   case lexertoken.TOKEN_KEY:
      key = tokenValue

   case lexertoken.TOKEN_VALUE:
      section.KeyValuePairs = append(section.KeyValuePairs, ini.IniKeyValue{
         Key: key,
         Value: tokenValue,
      })
      key = ""
   }
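Putting the fragments together, the complete parser function looks roughly like the sketch below. It assumes the log and strings packages plus the ini, lexer and lexertoken packages from the earlier articles are imported; the declaration of the section variable is also an assumption, since it does not appear in the excerpts above.

func Parse(fileName string, input string) ini.IniFile {
    output := ini.IniFile{
        FileName: fileName,
        Sections: make([]ini.IniSection, 0),
    }

    var token lexertoken.Token
    var tokenValue string

    /* State variables */
    var section ini.IniSection
    key := ""

    log.Println("Starting lexer and parser for file", fileName, "...")

    l := lexer.BeginLexing(fileName, input)

    for {
        token = l.NextToken()

        if token.Type != lexertoken.TOKEN_VALUE {
            tokenValue = strings.TrimSpace(token.Value)
        } else {
            tokenValue = token.Value
        }

        if isEOF(token) {
            output.Sections = append(output.Sections, section)
            break
        }

        switch token.Type {
        case lexertoken.TOKEN_SECTION:
            /* Record the previous section (if any), then reset the
               tracking variables for the new one */
            if len(section.KeyValuePairs) > 0 {
                output.Sections = append(output.Sections, section)
            }

            key = ""
            section.Name = tokenValue
            section.KeyValuePairs = make([]ini.IniKeyValue, 0)

        case lexertoken.TOKEN_KEY:
            key = tokenValue

        case lexertoken.TOKEN_VALUE:
            section.KeyValuePairs = append(section.KeyValuePairs, ini.IniKeyValue{
                Key:   key,
                Value: tokenValue,
            })
            key = ""
        }
    }

    log.Println("Parser has been shutdown")
    return output
}

The "Parser has been shutdown" log line matches the message that appears in the test output below.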

Testing

The development work is done, so the next step is testing. The code is available on GitHub; after downloading it with go get, you will find the test code in sampleIniParser.go under the src/github.com/adampresley/sample-ini-parser directory of your GOPATH.

The code is as follows:

sampleInput := `
key=abcdefg

[User]
userName=adampresley
keyFile=~/path/to/keyfile

[Servers]
server1=localhost:8080
`

parsedINIFile := parser.Parse("sample.ini", sampleInput)
prettyJSON, err := json.MarshalIndent(parsedINIFile, "", " ")

if err != nil {
   log.Println("Error marshalling JSON:", err.Error())
   return
}

log.Println(string(prettyJSON))

It can be run directly with go run sampleIniParser.go. The output is as follows:

2019/08/01 00:06:33 Starting lexer and parser for file sample.ini ...
2019/08/01 00:06:33 Parser has been shutdown
2019/08/01 00:06:33 {
   "fileName": "sample.ini",
   "sections": [
      {
         "name": "",
         "keyValuePairs": [
            {
               "key": "key",
               "value": "abcdefg"
            }
         ]
      },
      {
         "name": "User",
         "keyValuePairs": [
            {
               "key": "userName",
               "value": "adampresley"
            },
            {
               "key": "keyFile",
               "value": "~/path/to/keyfile"
            }
         ]
      },
      {
         "name": "Servers",
         "keyValuePairs": [
            {
               "key": "server1",
               "value": "localhost:8080"
            }
         ]
      }
   ]
}

Summary

This series of articles has been challenging, but also a lot of fun. Lexical analysis and parsing is a complex topic, and there is a great deal to learn. As we have seen, even something as simple as parsing the INI file above takes real effort to complete.
