Previous articles in this series
Golang quick start [2.1] - Go development environment configuration - Windows
Golang quick start [2.2] - Go development environment configuration - macOS
Golang quick start [2.3] - Go development environment configuration - Linux
Golang quick start [5.1] - how the Go language works - linker
Golang quick start [5.2] - how the Go language works - memory overview
Golang quick start [5.3] - how the Go language works - memory allocation
Golang quick start [6.1] - integrated development environment - details of GoLand
Golang quick start [6.2] - integrated development environment - Emacs details
Golang quick start [7.1] - project and dependency management - GOPATH
Golang quick start [7.2] - Northern hell magic skill - go module unique skill
Preface
In the previous articles, we covered variables and the type system in Go.
In this article we will learn:
What automatic type inference is
Why automatic type inference is needed
The characteristics and pitfalls of automatic type inference in Go
How Go performs automatic type inference at compile time
Type inference
Type inference is the ability of a programming language to deduce the data type of an expression automatically at compile time. It is most common in functional programming languages (such as Haskell). The main advantage of type inference is that type annotations can be omitted, which makes programming easier.
Declaring variable types explicitly is common in programming languages, and how much of this work the compiler can take over varies by language. For example, some compilers can infer the types of variables, function parameters, and return values.
As a statically typed language, Go needs to know the type of every variable at compile time.
Advantages of type inference
There are two main advantages to having the compiler support type inference. First, if used properly, it makes the code easier to read. For example, you can turn the following C++ code:
vector<int> v;
vector<int>::iterator itr = v.begin();
into:
vector<int> v;
auto itr = v.begin();
Although the benefits here seem trivial, the value of type inference becomes obvious if the types are more complex. In many cases, this will allow us to reduce redundant information in our code.
Second, type inference also applies to function definitions. In Haskell, for example, you can write:
succ x = x + 1
The function above adds 1 to x and returns the result, whatever numeric type x has. The compiler infers the most general type for it.
However, stating the type explicitly is still valuable: it documents what the code is meant to do, and the compiler can report a mismatch between the declared type and the code.
Type inference in Go
As mentioned above, the power of type inference varies from language to language. According to Go's developers, their goal was to reduce the clutter found in statically typed languages. They considered the type systems of languages like Java or C++ too cumbersome.
So when designing Go, they borrowed some ideas from these languages. One of these ideas is to use simple type inference for variables, which gives the feel of writing dynamically typed code while keeping the benefits of static typing.
As mentioned earlier, type inference can cover things like parameters and return values, but in Go it does not.
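A minimal sketch of this limit follows. The function name half is made up for illustration: its parameter and result types have to be written out in the signature, while the variable that receives the result can still be inferred.

```go
package main

import "fmt"

// half is a hypothetical helper. Go does not infer parameter or result
// types, so both must be written explicitly in the signature.
func half(n int) float64 {
	return float64(n) / 2
}

func main() {
	// The type of h is inferred from half's declared result type.
	h := half(3)
	fmt.Printf("%T %v\n", h, h) // float64 1.5
}
```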
In practice, you trigger type inference in Go by omitting the type when declaring a new variable or constant, or by using the := notation.
In Go, the following three statements are equivalent (the := form is only allowed inside a function):
var a int = 10
var a = 10
a := 10
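The equivalence can be checked with a small program. This is only a sketch: typeName is a helper introduced here for illustration, built on fmt's %T verb. It also prints the default types that other kinds of untyped constants receive; note that a rune is reported as int32, because rune is an alias for int32.

```go
package main

import "fmt"

// typeName is a hypothetical helper that reports the dynamic type of v
// using fmt's %T verb.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var a int = 10 // explicit type
	var b = 10     // type omitted, inferred from the constant
	c := 10        // short variable declaration (only inside functions)
	fmt.Println(typeName(a), typeName(b), typeName(c)) // int int int

	// Other untyped constants fall back to their own default types.
	f := 1.5
	r := 'x'
	z := 2i
	s := "go"
	fmt.Println(typeName(f), typeName(r), typeName(z), typeName(s))
	// float64 int32 complex128 string
}
```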
Go's type inference is only partial when an expression contains identifiers. In essence, the compiler will not implicitly convert a value that is referenced through an identifier. For example:
The following code works, and the type of a is float64:
a := 1 + 1.1
The following code is also correct. a is inferred as float64, and the constant 1 is added to the value of a:
a := 1.1
b := 1 + a
However, the following code fails to compile. a has already been inferred as an integer, and 1.1 is a floating-point constant; a is not implicitly converted to a floating-point number, so the addition is rejected. The compiler reports: constant 1.1 truncated to integer
a := 1
b := a + 1.1
The following code fails for a similar reason. The compiler reports: invalid operation: a + b (mismatched types int and float64)
a := 1
b := 1.1
c := a + b
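One way to make such code compile is an explicit conversion, so that both operands have the same type. The sketch below assumes the intended result is a float64; addMixed is a name introduced here for illustration.

```go
package main

import "fmt"

// addMixed is a hypothetical helper: it converts the int operand
// explicitly so that both operands of + have type float64.
func addMixed(a int, b float64) float64 {
	return float64(a) + b
}

func main() {
	a := 1   // inferred as int
	b := 1.1 // inferred as float64
	c := addMixed(a, b)
	fmt.Printf("%T %.1f\n", c, c) // float64 2.1
}
```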
Detailed implementation description
In a previous article (how Go code is compiled into machine code), we introduced the stages of the compiler: lexical analysis => syntax analysis => type checking => intermediate code => code optimization => machine code generation.
The code for these compilation stages lives in the go/src/cmd/compile directory.
Lexical analysis stage
Specifically, in the lexical analysis phase, the constant on the right side of the assignment is classified as an untyped literal of one of the following kinds. The names are self-explanatory: for example, ImagLit represents an imaginary (complex) literal and IntLit represents an integer literal.
// go/src/cmd/compile/internal/syntax
const (
	IntLit LitKind = iota
	FloatLit
	ImagLit
	RuneLit
	StringLit
)
Go source code is encoded in UTF-8. When the scanner reaches the constant in an assignment during lexical analysis, it reads the constant's UTF-8 characters one by one. A string literal starts with the character " and a number starts with a digit from '0' to '9'. The implementation is in:
// go/src/cmd/compile/internal/syntax
func (s *scanner) next() {
	...
	switch c {
	case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
		s.number(c)
	case '"':
		s.stdString()
	case '`':
		s.rawString()
	...
Therefore, recognizing constants such as integers and decimals is straightforward. An integer consists only of the digits '0' to '9'. A floating-point number contains a '.' character. A string is a token whose first character is ".
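The same classification can be observed from ordinary Go code. The sketch below uses the standard library's go/scanner package, which is a separate implementation from the compiler's internal scanner quoted in this article, so it only illustrates the idea. literalKinds is a helper name introduced here.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// literalKinds scans src with the standard library's go/scanner and
// returns "KIND text" for every literal it finds, in source order.
func literalKinds(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("demo.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var kinds []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		// IsLiteral is also true for identifiers, so skip those.
		if tok.IsLiteral() && tok != token.IDENT {
			kinds = append(kinds, tok.String()+" "+lit)
		}
	}
	return kinds
}

func main() {
	src := "a := 333\nb := 1.5\nc := \"hi\"\nd := 'x'\ne := 2i\n"
	for _, k := range literalKinds(src) {
		fmt.Println(k)
	}
}
```

For this input the program prints INT 333, FLOAT 1.5, STRING "hi", CHAR 'x' and IMAG 2i, one per line.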
The functions listed below are specific implementations of decimal and integer parsing:
// go/src/cmd/compile/internal/syntax
func (s *scanner) number(c rune) {
	s.startLit()
	base := 10        // number base
	prefix := rune(0) // one of 0 (decimal), '0' (0-octal), 'x', 'o', or 'b'
	digsep := 0       // bit 0: digit present, bit 1: '_' present
	invalid := -1     // index of invalid digit in literal, or < 0
	// integer part
	var ds int
	if c != '.' {
		s.kind = IntLit
		if c == '0' {
			c = s.getr()
			switch lower(c) {
			case 'x':
				c = s.getr()
				base, prefix = 16, 'x'
			case 'o':
				c = s.getr()
				base, prefix = 8, 'o'
			case 'b':
				c = s.getr()
				base, prefix = 2, 'b'
			default:
				base, prefix = 8, '0'
				digsep = 1 // leading 0
			}
		}
		c, ds = s.digits(c, base, &invalid)
		digsep |= ds
	}
	// fractional part
	if c == '.' {
		s.kind = FloatLit
		if prefix == 'o' || prefix == 'b' {
			s.error("invalid radix point in " + litname(prefix))
		}
		c, ds = s.digits(s.getr(), base, &invalid)
		digsep |= ds
	}
	...
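The prefixes handled by the switch above correspond to the integer literal forms of the language. As a small illustration (the function name prefixedLiterals is made up, and the 0o, 0b and _ forms assume Go 1.13 or later), the following program writes the same value categories as literals and prints their decimal values:

```go
package main

import "fmt"

// prefixedLiterals returns one integer literal for each prefix that the
// scanner's switch distinguishes.
func prefixedLiterals() [5]int {
	return [5]int{
		0x1F,  // 'x' prefix: base 16
		0o17,  // 'o' prefix: base 8
		0b101, // 'b' prefix: base 2
		017,   // leading 0: legacy octal
		1_000, // '_' digit separator, base 10
	}
}

func main() {
	fmt.Println(prefixedLiterals()) // [31 15 5 15 1000]

	// A '.' in the literal turns it into a floating-point literal.
	fmt.Printf("%T %T\n", 1_000, 1_000.5) // int float64
}
```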
We take the assignment a := 333 as an example. Once lexical and syntax analysis are complete, this statement is represented by an AssignStmt.
AssignStmt struct {
	Op       Operator // 0 means no operation
	Lhs, Rhs Expr     // Rhs == ImplicitOne means Lhs++ (Op == Add) or Lhs-- (Op == Sub)
	simpleStmt
}
Op stands for the operator, which here is the assignment (definition) operation. Lhs and Rhs are the left and right expressions respectively: the left is the variable a, and the right is the integer 333. At this point, the kind of the integer on the right is IntLit.
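A similar structure can be inspected with the standard library's go/parser and go/ast packages. These are not the compiler's internal syntax package, so the sketch below is only an analogy: ast.AssignStmt carries Lhs, Tok and Rhs rather than Op, Lhs and Rhs. describeAssign is a helper name introduced here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// describeAssign parses src and describes the first assignment whose
// left side is an identifier and whose right side is a basic literal.
func describeAssign(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}

	var out string
	ast.Inspect(file, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok || out != "" || len(as.Lhs) != 1 || len(as.Rhs) != 1 {
			return true
		}
		lhs, okL := as.Lhs[0].(*ast.Ident)
		rhs, okR := as.Rhs[0].(*ast.BasicLit)
		if okL && okR {
			// Tok is the assignment token, Kind the literal's token kind.
			out = fmt.Sprintf("%s %s %s(%s)", lhs.Name, as.Tok, rhs.Kind, rhs.Value)
		}
		return true
	})
	return out, nil
}

func main() {
	src := "package demo\nfunc f() { a := 333 }\n"
	fmt.Println(describeAssign(src)) // a := INT(333) <nil>
}
```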
Abstract syntax tree stage
Then, when the abstract syntax tree (AST) is generated, the AssignStmt produced by the previous stage is converted into a Node. The Node structure is the abstraction of a node in the abstract syntax tree.
type Node struct {
	// Tree structure.
	// Generic recursive walks should follow these fields.
	Left  *Node
	Right *Node
	Ninit Nodes
	Nbody Nodes
	List  Nodes
	Rlist Nodes
	E     interface{} // Opt or Val, see methods below
	...
As before, the Left node represents the variable a on the left, and the Right node represents the integer 333.
The Right node stores the value 333 in its E field as an Mpint, the type used to hold integer constants.
The code is shown below. A literal of kind IntLit is converted to an Mpint; the other kinds are handled similarly.
Note, however, that the node on the left does not have a type yet at this point.
// go/src/cmd/compile/internal/gc
func (p *noder) basicLit(lit *syntax.BasicLit) Val {
	// TODO: Don't try to convert if we had syntax errors (conversions may fail).
	//       Use dummy values so we can continue to compile. Eventually, use a
	//       form of "unknown" literals that are ignored during type-checking so
	//       we can continue type-checking w/o spurious follow-up errors.
	switch s := lit.Value; lit.Kind {
	case syntax.IntLit:
		checkLangCompat(lit)
		x := new(Mpint)
		x.SetString(s)
		return Val{U: x}
	case syntax.FloatLit:
		checkLangCompat(lit)
		x := newMpflt()
		x.SetString(s)
		return Val{U: x}
	...
As the following definition of Mpint shows, integers in the AST phase are stored with high precision using big.Int from the math/big package.
// Mpint represents an integer constant.
type Mpint struct {
	Val  big.Int
	Ovf  bool // set if Val overflowed compiler limit (sticky)
	Rune bool // set if syntax indicates default type rune
}
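The effect of storing constants in a big.Int can be seen from user code. In the sketch below, parseBig and shifted are names introduced for illustration. It parses a literal that does not fit in any machine integer type, and evaluates a constant expression whose intermediate value is far larger than an int.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseBig parses an integer literal into an arbitrary-precision big.Int.
// Base 0 lets SetString pick the base from a prefix such as 0x or 0b.
func parseBig(lit string) (*big.Int, bool) {
	return new(big.Int).SetString(lit, 0)
}

// huge is an untyped constant. It may exceed every machine integer type
// as long as the value finally stored in a variable fits.
const huge = 1 << 100

// shifted evaluates a constant expression at compile time; only the
// small result is converted to int.
func shifted() int {
	return huge >> 98
}

func main() {
	x, ok := parseBig("18446744073709551616") // 2^64, too large for uint64
	fmt.Println(x, ok, x.IsUint64())          // 18446744073709551616 true false
	fmt.Println(shifted())                    // 4
}
```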
Finally, during type checking of the abstract syntax tree, the inference is completed: the type of the constant on the right is assigned to the variable on the left.
The function that does this is typecheckas. It first gives the right-hand side its default type through defaultlit and then copies that type to the left-hand side:
func typecheckas(n *Node) {
	...
	if n.Left.Name != nil && n.Left.Name.Defn == n && n.Left.Name.Param.Ntype == nil {
		n.Right = defaultlit(n.Right, nil)
		n.Left.Type = n.Right.Type
	}
}
...
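The result of this step can be reproduced with the standard library's go/types package. It is a different type checker from the internal one quoted above, so this is only a sketch of the same rule, with inferredType and defaultOf as helper names introduced here. types.Default plays a role comparable to defaultlit for untyped basic types.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// inferredType type-checks src with go/types and returns the type that
// was recorded for the variable called name.
func inferredType(src, name string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return "", err
	}

	info := &types.Info{Defs: make(map[*ast.Ident]types.Object)}
	var conf types.Config
	if _, err := conf.Check("demo", fset, []*ast.File{file}, info); err != nil {
		return "", err
	}

	for id, obj := range info.Defs {
		if obj != nil && id.Name == name {
			return obj.Type().String(), nil
		}
	}
	return "", fmt.Errorf("no definition of %q found", name)
}

// defaultOf returns the default type of an untyped basic kind.
func defaultOf(kind types.BasicKind) string {
	return types.Default(types.Typ[kind]).String()
}

func main() {
	// "_ = a" keeps the checker from rejecting a as unused.
	src := "package demo\nfunc f() { a := 333; _ = a }\n"
	fmt.Println(inferredType(src, "a")) // int <nil>

	fmt.Println(defaultOf(types.UntypedInt), defaultOf(types.UntypedFloat)) // int float64
}
```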
The Mpint type corresponds to the constant kind CTINT. As shown below, each constant representation from the previous stage maps to its own kind. In the end, the type stored for the variable on the left becomes types.Types[TINT].
func (v Val) Ctype() Ctype {
	switch x := v.U.(type) {
	default:
		Fatalf("unexpected Ctype for %T", v.U)
		panic("unreachable")
	case nil:
		return 0
	case *NilVal:
		return CTNIL
	case bool:
		return CTBOOL
	case *Mpint:
		if x.Rune {
			return CTRUNE
		}
		return CTINT
	case *Mpflt:
		return CTFLT
	case *Mpcplx:
		return CTCPLX
	case string:
		return CTSTR
	}
}
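The standard library offers a comparable mapping in the go/constant package, which can serve as a rough analogy to Val and Ctype. It is not the compiler's internal code, and unlike Ctype it has no separate rune kind: a rune literal is reported as Int. kindName and sampleKinds are names introduced for this sketch.

```go
package main

import (
	"fmt"
	"go/constant"
	"go/token"
)

// kindName returns the go/constant kind of the literal lit, where tok
// is the token kind the scanner assigned to it.
func kindName(lit string, tok token.Token) string {
	switch constant.MakeFromLiteral(lit, tok, 0).Kind() {
	case constant.Bool:
		return "Bool"
	case constant.String:
		return "String"
	case constant.Int:
		return "Int"
	case constant.Float:
		return "Float"
	case constant.Complex:
		return "Complex"
	default:
		return "Unknown"
	}
}

// sampleKinds classifies one literal per token kind.
func sampleKinds() []string {
	return []string{
		kindName("333", token.INT),
		kindName("1.5", token.FLOAT),
		kindName("2i", token.IMAG),
		kindName("'x'", token.CHAR),
		kindName(`"go"`, token.STRING),
	}
}

func main() {
	fmt.Println(sampleKinds()) // [Int Float Complex Int String]
}
```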
types.Types is an array that stores the actual Go types corresponding to the different type kinds.
var Types [NTYPE]*Type
Type is the structure that represents a type in Go. types.Types[TINT] ultimately represents the type int. Its structure is as follows:
// A Type represents a Go type.
type Type struct {
	Extra interface{}
	// Width is the width of this Type in bytes.
	Width      int64 // valid if Align > 0
	methods    Fields
	allMethods Fields
	Nod        *Node // canonical OTYPE node
	Orig       *Type // original type (type literal or predefined type)
	// Cache of composite types, with this type being the element type.
	Cache struct {
		ptr   *Type // *T, or nil
		slice *Type // []T, or nil
	}
	Sym    *Sym  // symbol containing name, for named types
	Vargen int32 // unique name for OTYPE/ONAME
	Etype  EType // kind of type
	Align  uint8 // the required alignment of this type, in bytes (0 means Width and Align have not yet been computed)
	flags  bitset8
}
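The Width and Align fields describe size and alignment in bytes. The compiler's Type values are not accessible from user code, but comparable information about a type is available at run time through the reflect package. The sketch below assumes that analogy only; layout is a helper name introduced here, and the printed numbers depend on the platform.

```go
package main

import (
	"fmt"
	"reflect"
)

// layout reports the name, size and alignment (both in bytes) of v's
// type, as seen by the reflect package at run time.
func layout(v interface{}) (name string, size, align uintptr) {
	t := reflect.TypeOf(v)
	return t.Name(), t.Size(), uintptr(t.Align())
}

func main() {
	a := 333
	name, size, align := layout(a)
	// On a 64-bit platform this typically prints: int 8 8
	fmt.Println(name, size, align)
}
```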
Finally, we can verify the type with the following code, which prints: int
a := 333
fmt.Printf("%T", a)
Summary
In this article, we introduced what automatic type inference is and why it matters, and used examples to point out the characteristics of automatic type inference in Go.
Finally, we used a := 333 as an example to show how Go performs automatic type inference at compile time.
Specifically, the lexical analysis and abstract syntax tree stages of compilation are involved. Numeric constants are first stored with high precision using the math/big package and are then converted to one of Go's standard types, int or float64. Strings are not covered in detail here and are left for later articles.
see you~
Reference material
Type inference
Rob Pike: Less is exponentially more
Type inference for Go