Quick start to golang [8.2] - Secrets of automatic type inference


Previous article

Quick start to golang [8.1] - variable types, declaration and assignment, scope and lifetime, and variable memory allocation

Preface

In the previous article, we covered the basic concepts of variables and the type system in the Go language.
In this article we will learn:
What automatic type inference is
Why automatic type inference is needed
The characteristics and pitfalls of automatic type inference in Go
How Go performs automatic type inference at compile time

Type inference

Type inference is the ability of a programming language to deduce the data types of expressions automatically at compile time. It is most often associated with functional programming languages such as Haskell. Its main advantage is that type annotations can be omitted, which makes programming easier.
Explicitly declaring variable types is common in programming languages, and how much of that work a compiler can take over varies by language. For example, some compilers can infer the types of variables, function parameters, and return values.
As a statically typed language, Go needs to know the type of every variable at compile time.

Advantages of type inference

There are two main advantages to having the compiler support type inference. First, if used properly, it makes the code easier to read. For example, the following C++ code:

vector<int> v;
vector<int>::iterator itr = v.begin();

can be turned into:

vector<int> v;
auto itr = v.begin();

Although the benefit here seems trivial, the value of type inference becomes obvious when the types are more complex. In many cases it lets us remove redundant information from our code.
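The same effect shows up in Go once a type gets long. The following is only a minimal sketch: buildIndex is a hypothetical helper invented for this illustration and does not come from the original article.

```go
package main

import "fmt"

// buildIndex is a hypothetical helper used only for illustration.
// It maps each word to the positions at which it occurs.
func buildIndex(words []string) map[string][]int {
	index := make(map[string][]int)
	for i, w := range words {
		index[w] = append(index[w], i)
	}
	return index
}

func main() {
	words := []string{"go", "is", "go"}

	// Explicit form: the type is spelled out although the compiler already knows it.
	var explicit map[string][]int = buildIndex(words)

	// Inferred form: the type is taken from buildIndex's result type.
	inferred := buildIndex(words)

	fmt.Printf("%T %v\n", explicit, explicit["go"])
	fmt.Printf("%T %v\n", inferred, inferred["go"])
}
```

Both variables end up with exactly the same static type; the second declaration simply avoids repeating it.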
Second, type inference allows functions to be written more generally. In Haskell, you can write:

succ x = x + 1

In the function above, whatever numeric type x has, the function adds 1 to it and returns the result.
However, explicit type annotations are still allowed and often useful: they let the compiler check that the code does what was intended, which makes errors less likely.

Type inference in go language

As mentioned above, the power of type inference varies from language to language. According to the developers of Go, their goal was to reduce the clutter found in statically typed languages. They felt that the type systems of languages like Java or C++ are too cumbersome.

So when designing Go, they borrowed ideas from these languages. One of these ideas is simple type inference for variables, which gives the feeling of writing dynamically typed code while keeping the benefits of static typing.
As mentioned earlier, type inference in some languages covers things like parameters and return values, but Go does not infer those.
In practice, you trigger type inference in Go by omitting the type when declaring a new variable or constant, or by using the := notation.
In Go, the following three statements are equivalent:

var a int = 10
var a = 10
a := 10
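The three forms can be checked in one small program. This is a sketch for illustration: the variables get different names (a, b, c) only because they cannot share one scope, and typeName and double are helpers made up for this example.

```go
package main

import "fmt"

// typeName reports the dynamic type of v as a string, for example "int".
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

// double must spell out its parameter and result types;
// Go does not infer them from the function body.
func double(n int) int {
	return n * 2
}

func main() {
	var a int = 10 // explicit type
	var b = 10     // type inferred from the initializer
	c := 10        // short variable declaration, also inferred
	const d = 10   // untyped constant, takes its default type int when used here

	fmt.Println(typeName(a), typeName(b), typeName(c), typeName(d)) // int int int int
	fmt.Println(double(c))                                          // 20
}
```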

Go's type inference is only partial when an expression contains identifiers. In essence, the compiler will not implicitly convert a value that is referenced through an identifier, because that identifier already has a fixed type. For example:
The following code works, and the type of a is float64, because both operands are untyped constants:

a := 1 + 1.1

The following code is also correct. a is inferred as float64, and the untyped constant 1 is converted to float64 before it is added to a:

a := 1.1
b := 1 + a

However, the following code produces an error. a has already been inferred as an int, and the compiler will not implicitly convert it to a floating-point number, so the untyped constant 1.1 would have to become an int instead, which fails. Compiler error: constant 1.1 truncated to integer (the exact wording varies between Go versions)

a := 1
b := a + 1.1

The following code fails for a similar reason: a is an int and b is a float64, and neither is converted implicitly. The compiler reports: invalid operation: a + b (mismatched types int and float64)

a := 1
b := 1.1
c := a + b
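The usual way out of both errors is an explicit conversion. A minimal sketch (sum is a helper name chosen for this example):

```go
package main

import "fmt"

// sum adds an int and a float64 by converting the int explicitly.
func sum(a int, b float64) float64 {
	return float64(a) + b
}

func main() {
	a := 1   // inferred as int
	b := 1.1 // inferred as float64

	// c := a + b // does not compile: mismatched types int and float64
	c := float64(a) + b // the explicit conversion makes both operands float64
	fmt.Printf("%T %v\n", c, c)

	// Converting in the other direction is explicit as well and truncates b.
	d := a + int(b)
	fmt.Printf("%T %v\n", d, d)

	fmt.Println(sum(a, b))
}
```

Go deliberately has no implicit numeric conversions between variables, so the programmer always chooses where precision may be lost.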

Detailed implementation description

In the previous article (how Go code is compiled into machine code), we introduced the stages the compiler goes through: lexical analysis => syntax analysis => type checking => intermediate code generation => code optimization => machine code generation
The code for the compilation phase is in the go/src/cmd/compile directory

Lexical analysis stage

Specifically, in the lexical analysis phase, the constant on the right-hand side of the assignment is scanned as an untyped literal of one of the following kinds. The names are self-explanatory; for example, ImagLit is an imaginary (complex-number) literal and IntLit is an integer literal.

//go/src/cmd/compile/internal/syntax
const (
 IntLit LitKind = iota
 FloatLit
 ImagLit
 RuneLit
 StringLit
)
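Each of these literal kinds has a default type that a variable receives when nothing else constrains it. The sketch below prints that default type for one literal of each kind; typeName is a helper made up for this example.

```go
package main

import "fmt"

// typeName reports the dynamic type of v as a string.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	i := 333   // integer literal        -> int
	f := 3.33  // floating-point literal -> float64
	c := 3i    // imaginary literal      -> complex128
	r := 'x'   // rune literal           -> rune, an alias for int32
	s := "333" // string literal         -> string

	// prints: int float64 complex128 int32 string
	fmt.Println(typeName(i), typeName(f), typeName(c), typeName(r), typeName(s))
}
```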

Go source code is encoded in UTF-8. When the lexer encounters a constant during lexical analysis, it reads the UTF-8 characters of the constant one by one. A string literal starts with " (or ` for a raw string), and a number starts with a digit from '0' to '9'. The implementing function is located in:

// go/src/cmd/compile/internal/syntax

func (s *scanner) next() {
...
switch c {
    case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
        s.number(c)
    case '"':
        s.stdString()
    case '`':
        s.rawString()
    ...

Therefore, recognizing constants such as integers and decimals is straightforward. Roughly speaking, an integer literal consists only of the digits '0' to '9', a floating-point literal additionally contains a '.', and a string literal is one whose first character is ".
The function below is the concrete implementation of integer and decimal parsing:

// go/src/cmd/compile/internal/syntax
func (s *scanner) number(c rune) {
    s.startLit()

    base := 10        // number base
    prefix := rune(0) // one of 0 (decimal), '0' (0-octal), 'x', 'o', or 'b'
    digsep := 0       // bit 0: digit present, bit 1: '_' present
    invalid := -1     // index of invalid digit in literal, or < 0

    // integer part
    var ds int
    if c != '.' {
        s.kind = IntLit
        if c == '0' {
            c = s.getr()
            switch lower(c) {
            case 'x':
                c = s.getr()
                base, prefix = 16, 'x'
            case 'o':
                c = s.getr()
                base, prefix = 8, 'o'
            case 'b':
                c = s.getr()
                base, prefix = 2, 'b'
            default:
                base, prefix = 8, '0'
                digsep = 1 // leading 0
            }
        }
        c, ds = s.digits(c, base, &invalid)
        digsep |= ds
    }

    // fractional part
    if c == '.' {
        s.kind = FloatLit
        if prefix == 'o' || prefix == 'b' {
            s.error("invalid radix point in " + litname(prefix))
        }
        c, ds = s.digits(s.getr(), base, &invalid)
        digsep |= ds
    }
...
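The classification can be observed without reading compiler internals. The sketch below uses the standard library package go/scanner, which is a separate implementation from cmd/compile/internal/syntax, so it only illustrates the same idea and is not the compiler's own code. literalKinds is a helper name chosen for this example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// literalKinds scans src and returns the token kind of every
// basic literal (number, rune, string) it contains.
func literalKinds(src string) []token.Token {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler, default mode

	var kinds []token.Token
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		// IsLiteral is also true for identifiers, so exclude them.
		if tok.IsLiteral() && tok != token.IDENT {
			kinds = append(kinds, tok)
		}
	}
	return kinds
}

func main() {
	// prints: [INT FLOAT INT IMAG CHAR STRING]
	fmt.Println(literalKinds(`333 3.33 0x1F 3i 'x' "str"`))
}
```

Note that 0x1F is still an integer literal: the prefix changes the base, not the kind, which matches the base and prefix handling in the number function above.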

Take the assignment a := 333 as an example. Once lexical and syntax analysis are complete, this statement is represented by an AssignStmt.

    AssignStmt struct {
        Op       Operator // 0 means no operation
        Lhs, Rhs Expr     // Rhs == ImplicitOne means Lhs++ (Op == Add) or Lhs-- (Op == Sub)
        simpleStmt
    }

Op is the operator, which here is the := operation. Lhs and Rhs are the left-hand and right-hand expressions: the left is the variable a, and the right is the integer 333. At this point, the kind of the integer on the right is IntLit.
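A similar structure can be inspected with the public go/parser and go/ast packages. These are the standard library's counterparts, not the compiler's internal syntax package, so field names differ (Tok instead of Op, slices instead of single expressions). The sketch assumes the statement is wrapped in a small function so that it forms a complete file; describeAssign is a name made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// describeAssign parses src and describes the first assignment whose
// left side is an identifier and whose right side is a basic literal.
func describeAssign(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return "", err
	}

	var desc string
	ast.Inspect(file, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok || desc != "" {
			return true
		}
		lhs, okL := as.Lhs[0].(*ast.Ident)
		rhs, okR := as.Rhs[0].(*ast.BasicLit)
		if okL && okR {
			// At this stage the literal only has a kind (INT), not a type.
			desc = fmt.Sprintf("%s %s %s (%s)", lhs.Name, as.Tok, rhs.Value, rhs.Kind)
		}
		return true
	})
	return desc, nil
}

func main() {
	desc, err := describeAssign("package p\nfunc f() { a := 333; _ = a }")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(desc) // a := 333 (INT)
}
```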

Abstract syntax tree stage

Next, when the abstract syntax tree (AST) is generated, the AssignStmt produced by the parser is converted into a Node. The Node struct is the abstraction of a node in the abstract syntax tree.

type Node struct {
    // Tree structure.
    // Generic recursive walks should follow these fields.
    Left  *Node
    Right *Node
    Ninit Nodes
    Nbody Nodes
    List  Nodes
    Rlist Nodes
    E   interface{} // Opt or Val, see methods below
    ...

As before, the Left node represents the variable a on the left-hand side, and the Right node represents the integer 333.
The Right node stores the value 333 in its E field as an Mpint. Mpint is the type used to store integer constants.
The specific code is as follows: a literal of kind IntLit is converted to an Mpint. Other kinds are handled similarly.
Note, however, that the node on the left does not have a type yet at this point.

// go/src/cmd/compile/internal/gc
func (p *noder) basicLit(lit *syntax.BasicLit) Val {
    // TODO: Don't try to convert if we had syntax errors (conversions may fail).
    //       Use dummy values so we can continue to compile. Eventually, use a
    //       form of "unknown" literals that are ignored during type-checking so
    //       we can continue type-checking w/o spurious follow-up errors.
    switch s := lit.Value; lit.Kind {
    case syntax.IntLit:
        checkLangCompat(lit)
        x := new(Mpint)
        x.SetString(s)
        return Val{U: x}

    case syntax.FloatLit:
        checkLangCompat(lit)
        x := newMpflt()
        x.SetString(s)
        return Val{U: x}

As the following definition of Mpint shows, integers in the AST phase are stored with high precision using math/big.Int.

// Mpint represents an integer constant.
type Mpint struct {
    Val  big.Int
    Ovf  bool // set if Val overflowed compiler limit (sticky)
    Rune bool // set if syntax indicates default type rune
}
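This high-precision storage is visible from ordinary Go code: an untyped constant may be far larger than any built-in integer type, as long as the value that finally lands in a variable fits. The sketch below also parses a literal with math/big directly; parseInt is a helper made up for this example and is only similar in spirit to the compiler's Mpint.SetString.

```go
package main

import (
	"fmt"
	"math/big"
)

// huge does not fit in any built-in integer type, yet it is a valid
// untyped constant because constant arithmetic is done at high precision.
const huge = 1 << 100

// parseInt converts a decimal literal into a big.Int.
// The second result is false if lit is not a valid decimal integer.
func parseInt(lit string) (*big.Int, bool) {
	return new(big.Int).SetString(lit, 10)
}

func main() {
	// The expression is evaluated exactly at compile time;
	// only the small result has to fit into int.
	small := huge >> 98
	fmt.Printf("%T %v\n", small, small) // int 4

	// 2^128, well beyond the range of uint64.
	x, ok := parseInt("340282366920938463463374607431768211456")
	fmt.Println(x.BitLen(), ok) // 129 true
}
```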

Finally, during the type-checking stage of the abstract syntax tree, the assignment is completed: the type of the constant on the right is assigned to the variable on the left.
The function that does this is typecheckas. It first calls defaultlit to give the untyped constant on the right its default type, and then assigns that type to the left-hand side:

func typecheckas(n *Node) {
    ...
    if n.Left.Name != nil && n.Left.Name.Defn == n && n.Left.Name.Param.Ntype == nil {
        n.Right = defaultlit(n.Right, nil)
        n.Left.Type = n.Right.Type
    }
}
...

An Mpint value corresponds to the constant kind CTINT. As shown below, each representation from the previous stage maps to its own kind. In the end, the type stored for the variable on the left becomes types.Types[TINT]

func (v Val) Ctype() Ctype {
 switch x := v.U.(type) {
 default:
  Fatalf("unexpected Ctype for %T", v.U)
  panic("unreachable")
 case nil:
  return 0
 case *NilVal:
  return CTNIL
 case bool:
  return CTBOOL
 case *Mpint:
  if x.Rune {
   return CTRUNE
  }
  return CTINT
 case *Mpflt:
  return CTFLT
 case *Mpcplx:
  return CTCPLX
 case string:
  return CTSTR
 }
}
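The standard library exposes a comparable mapping in the go/constant package, where every constant value reports a Kind. This is a public analogue of Ctype, not the same code: for instance, go/constant has no separate kind for runes and reports them as Int. kindName and demoKinds are helper names chosen for this sketch.

```go
package main

import (
	"fmt"
	"go/constant"
	"go/token"
)

// kindName maps a go/constant kind to a readable name.
func kindName(k constant.Kind) string {
	switch k {
	case constant.Bool:
		return "Bool"
	case constant.String:
		return "String"
	case constant.Int:
		return "Int"
	case constant.Float:
		return "Float"
	case constant.Complex:
		return "Complex"
	default:
		return "Unknown"
	}
}

// demoKinds returns "literal: kind" for one sample literal per token kind.
func demoKinds() []string {
	samples := []struct {
		lit string
		tok token.Token
	}{
		{"333", token.INT},
		{"3.33", token.FLOAT},
		{"3i", token.IMAG},
		{"'x'", token.CHAR},
		{`"333"`, token.STRING},
	}
	var out []string
	for _, s := range samples {
		v := constant.MakeFromLiteral(s.lit, s.tok, 0)
		out = append(out, s.lit+": "+kindName(v.Kind()))
	}
	return out
}

func main() {
	for _, line := range demoKinds() {
		fmt.Println(line)
	}
}
```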

types.Types is an array that stores the actual Go types corresponding to the different kinds.

var Types [NTYPE]*Type

Type is the structure that represents a type in Go. types.Types[TINT] ultimately represents the type int. Its structure is as follows:

// A Type represents a Go type.
type Type struct {
    Extra interface{}

    // Width is the width of this Type in bytes.
    Width int64 // valid if Align > 0

    methods    Fields
    allMethods Fields

    Nod  *Node // canonical OTYPE node
    Orig *Type // original type (type literal or predefined type)

    // Cache of composite types, with this type being the element type.
    Cache struct {
        ptr   *Type // *T, or nil
        slice *Type // []T, or nil
    }

    Sym    *Sym  // symbol containing name, for named types
    Vargen int32 // unique name for OTYPE/ONAME

    Etype EType // kind of type
    Align uint8 // the required alignment of this type, in bytes (0 means Width and Align have not yet been computed)

    flags bitset8
}

Finally, we can verify the type with the following code, which prints int:

a := 333
fmt.Printf("%T", a)

Summary

In this article, we introduced what automatic type inference is and why it matters, and used examples to point out the characteristics of automatic type inference in Go.
Finally, we used a := 333 as an example to show how Go performs automatic type inference at compile time.
Specifically, the process involves the lexical analysis and abstract syntax tree stages of compilation. Numeric constants are first held at high precision using the math/big package and are then converted to a standard Go type, int or float64. Strings are not covered in detail here and are left for later articles.
See you next time~


Posted by KILOGRAM on Sat, 21 Mar 2020 08:26:47 -0700
