C++ review notes: C++ references are a little unfathomable

Keywords: C++

1. Foreword

An online C++ compiler that is handy for quick experiments: https://www.dooccn.com/cpp/

I plan to pick up C++ again over the coming weeks. One lesson from my internship is that an algorithm engineer's job is to solve practical problems, so we should not confine ourselves to either algorithms or engineering but keep improving our overall ability to solve problems. Along the way I found that C++ matters a lot, and I was given some C++ development tasks during that time, so I want to take this opportunity to learn it again. In the recommendation field, the models I have worked with are mostly written in Python, while the online services are all C++ (on the algorithm side; the business side mostly uses Go). Our models are generally trained, deployed online, and then exposed through an interface. So now I finally understand why being familiar with Python alone is not enough. C++ is yyds.

Like the Python series, this one is a review. It still skips the very basic material and is more about finding and filling gaps. That said, I have not used C++ for five years, so the gaps are large and it is almost like learning it again. I'll review it over the coming period 😉

The main references are the C Language Chinese Network and the C++ tutorial written by brother Guangcheng, supplemented with my own understanding and programming experiments to deepen the impression.

This article is mainly about C++ references, a feature that the C language does not have. I have just returned to school, so this write-up serves as a buffer: I collected less material than usual, which makes it just right for catching up, ha ha.

Main contents:

Introduction to C++ references
The nature of C++ references (what is the difference between references and pointers?)
C++ references cannot be bound to non-addressable data (temporary data of basic types, constant expressions, etc.)
The compiler creates temporary variables for const references (the principle behind this item and the next is similar, little known, and rather magical)
The wonders of C++ const references and type conversion

Ok, let's go!

2. Introduction to C++ references

During a function call, passing parameters is essentially assignment. Assignment means copying memory, that is, copying data from one block of memory to another.

For basic types (those supported by the language itself, such as char, int, and float), the data usually occupies only a few bytes, so the copy is fast. Complex types (arrays, structures, classes, and so on, which are built from basic types) are collections of data with no fixed limit on how much they hold, so copying them frequently can take a lot of time and hurt efficiency.

For this reason, C/C++ does not allow array contents to be passed directly in a function call and forces a pointer to the array to be passed instead. Structures and objects have no such restriction: either their contents or a pointer can be passed. For efficiency, C recommends passing structure pointers (struct struct_name *var_name), while C++ offers a more convenient way than pointers to pass aggregate data: the reference.

References are a major extension of C++ over C. A reference can be regarded as an alias for a piece of data, similar to a Windows shortcut or a person's nickname. The syntax is as follows:

type &name = data;

Here type is the type of the referenced data, name is the name of the reference, and data is the data being referenced. A reference must be initialized when it is defined, and it stays bound to that data from then on. It cannot be made to refer to other data later, which is somewhat like a constant.

Also note that & is required when defining a reference but must not be added when using it. Writing & at the point of use means taking the address.

Take an example:

#include <iostream>
using namespace std;
int main() {
    int a = 99;
    int &r = a;
    cout << a << ", " << r << endl;  // 99 99
    cout << &a << ", " << &r << endl;   // 0x7fff04542d94, 0x7fff04542d94

    r = 40;
    cout << a << "," << r << endl;   // 40,40
    return 0;
}

If you do not want the original data to be modified through the reference, add a const qualifier when defining it:

const type &name = value;

2.1 C++ references as function parameters

When defining or declaring a function, you can declare its formal parameters as references. When the function is called, the arguments and formal parameters are then bound together and refer to the same data. If the function body modifies the data through a formal parameter, the argument's data is modified as well, so changes made inside the function are visible outside it.

For example, the classic swap case commonly used when learning C++:

#include <iostream>
using namespace std;
void swap1(int a, int b);
void swap2(int *p1, int *p2);
void swap3(int &r1, int &r2);
int main() {
    int num1 = 1, num2 = 2;
    swap1(num1, num2);
    cout << num1 << " " << num2 << endl;  // 1 2
    num1 = 1;
    num2 = 2;
    swap2(&num1, &num2);
    cout << num1 << " " << num2 << endl;  // 2 1
    num1 = 1;
    num2 = 2;
    swap3(num1, num2);
    cout << num1 << " " << num2 << endl;  // 2 1
    return 0;
}
//Pass parameter content directly
void swap1(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
}
//Pass pointer
void swap2(int *p1, int *p2) {
    int temp = *p1;
    *p1 = *p2;
    *p2 = temp;
}
//Pass parameters by reference
void swap3(int &r1, int &r2) {
    int temp = r1;
    r1 = r2;
    r2 = temp;
}

Of the three swap functions, only the last two actually swap the values.

swap1() passes the parameter contents directly. a and b are formal parameters with their own independent memory, and their scope is limited to the inside of the function, so swapping them has no effect on num1 and num2.
swap2() passes pointers. When it is called, the addresses of num1 and num2 are passed to p1 and p2, so p1 and p2 point to num1 and num2, and their values are modified indirectly through the pointers.
swap3() passes by reference. When it is called, r1 and r2 are bound to num1 and num2, so r1 and num1, and r2 and num2, denote the same data.

In comparison, the third way is the most intuitive.

2.2 C++ references as function return values

In addition to being a function parameter, a reference can also be a function return value.

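#include <iostream>

// Returns a reference to its own parameter, which refers to the caller's data.
int &plus(int &r){
    r += 10;
    return r;
}

int main(){
    int num1 = 10;
    int num2 = plus(num1);  // num2 receives a copy of the referenced value

    std::cout << num1 << " " << num2 << std::endl;   // 20 20
    return 0;
}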

One thing to watch out for: a function must not return a reference to local data (variables, objects, arrays, etc.), because local data is destroyed when the function returns, so the data may no longer exist when the reference is used. For example:

int &plus10(int &r) {
    int m = r + 10;
    return m;  //Returns a reference to local data
}

// Some compilers reject this or warn about it. Even where it compiles, using the returned reference is undefined behavior: it may appear to work once and fail the next time. Don't do this.

3. The essence of C++ references

A reference is essentially a thin wrapper around a pointer. In typical implementations it is still implemented with a pointer underneath and occupies the same amount of memory as one: 4 bytes in a 32-bit environment and 8 bytes in a 64-bit environment. (The C++ standard itself does not require a reference to occupy storage, and a compiler may optimize it away.) Because of the compiler's internal conversion, however, there is no way to obtain the address of the reference itself, even when it does occupy memory. For example:

int a = 99;
int &r = a;
r = 18;
cout << &r << endl;

Conceptually, the compiler treats this as the following:

int a = 99;
int *r = &a;
*r = 18;
cout << r << endl;

When &r is used to get the address, the compiler implicitly converts the code so that it outputs the content of r (the address of a) rather than the address of r itself, which is why the address of a reference variable cannot be obtained. In other words, it is not that the variable r occupies no memory, but that the compiler does not let you get its address.

The advantage of a reference is that it makes code more concise and easier to use than a pointer, even though a pointer sits behind it. So what is the difference between the two?

A reference must be initialized when it is defined and stays bound afterwards. It cannot be made to refer to other data. A pointer has no such restriction: it need not be assigned when defined, and what it points to can change later. A small experiment:

#include <iostream>
using namespace std;

int main() {
    int num1 = 10, num2 = 15;
    
    //int &a;
    //a = num1;   // error  'a' declared as reference but not initialized
    
    int &a = num1;
    cout << a << endl;
    
    // &a = num2;   //  lvalue required as left operand of assignment
    a = num2;
    cout << a << " " << num1 << endl;    // 15 15
    
    a = a - 10;
    cout << a << " " << num2 << " " << num1 << endl; // 5  15  5
    
    return 0;
}

// As you can see, once reference a is bound to num1, it stays bound to num1: assigning to a changes num1, not the binding.

A pointer can itself be declared const (type * const p), but a reference cannot be declared const in the same way:
```
int a = 20;
int & const r = a;  //  'const' qualifiers cannot be applied to 'int&'
```
Since r can never be rebound anyway, adding const would be pointless.
Pointers can have multiple levels, but references have only one. For example, int **p is legal, but there is no reference to a reference: int &&r does not mean that (since C++11 that syntax declares an rvalue reference, which is a different feature). If you want a reference variable that refers to the same data as another reference variable, a single & is enough.
```
int a = 10;
int &r = a;
int &rr = r;
```
Increment and decrement mean different things for pointers and references. Applying ++ to a pointer makes it point to the next piece of data, while applying ++ to a reference adds 1 to the data it refers to.
```
int a = 10;
int &r = a;
r++;
cout<<r<<endl;   // 11
   
int arr[2] = { 27, 84 };
int *p = arr;
p++;
cout<<*p<<endl;  // 84
```

4. C++ references cannot be bound to non-addressable data

4.1 Temporary data

This concept is best approached from the perspective of pointers. A pointer is the address of data or code in memory, and a pointer variable points to data or code in memory. Note that a pointer can only point into memory, not to registers or the hard disk, because registers and the hard disk cannot be addressed.

Most things in a C++ program are stored in memory, such as defined variables, created objects, string constants, function parameters, function bodies themselves, and memory allocated by new or malloc(), so & can be used to obtain their addresses.
However, some data, such as the result of an expression or the return value of a function, may be in memory or in a register. Once it is placed in a register, its address cannot be obtained with &, and of course no pointer can point to it.

For example:

int n = 100, m = 200;
int *p1 = &(m + n);    //The result of m + n is 300
int *p2 = &(n + 100);  //The result of n + 100 is 200
bool *p4 = &(m < n);   //The result of m < n is false

int func(){
    int n = 100;
    return n;
}
int *p = &(func());

The results of the expressions above and the function's return value are temporary values (rvalues, in language terms) that may live only in a register, so trying to take their address with & is an error.

So which temporary data is put in registers? Let's take stock:

Registers sit close to the CPU and are faster than memory, so temporary data is placed in registers to speed up the program. But registers are small and can only hold small data, such as temporary values of basic types like int, double, bool, and char. The size of objects, structure variables, and other user-defined data is unpredictable, so temporary data of those types is put into memory.

Look at the following code:

typedef struct{
	int a;
	int b;
} S;

// Overload operator+ to add two structures
S operator + (const S &A, const S &B){
	S C;
	C.a = A.a + B.a;
	C.b = A.b + B.b;
	return C;
}
// Define a function
S func(){
	S a;
	a.a = 100;
	a.b = 200;
	return a;
}

// Main function
S s1 = {23, 45};
S s2 = {90, 75};
S *p1 = &(s1 + s2);
S *p2 = &(func());
cout << p1 << " " << p2 << endl;

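Some compilers (Visual C++, for example, as a language extension) accept the code above, which suggests that temporary data of structure type is placed in memory. Note, however, that standard C++ does not allow taking the address of a temporary of any type, and GCC rejects the two lines that use & with an error.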

4.2 Constant expressions

Expressions that contain no variables are called constant expressions, such as 100, 200 + 34, and 1 * 2.

Because constant expressions contain no variables and nothing in them can change, they can be evaluated at compile time. Instead of allocating separate memory to store the value of a constant expression, the compiler merges the value into the code and places it in the code section of the virtual address space.

From the assembly point of view, the value of a constant expression is an immediate operand that is "hard coded" into the instruction and cannot be addressed.

Therefore, although the value of a constant expression is in memory, it cannot be addressed: & cannot be used to obtain its address, and no pointer can point to it.

// error: lvalue required as unary '&' operand
int *p1 = &(100);    
int *p2 = &(23 + 45 * 2);

4.3 References cannot be bound to temporary data

A reference is essentially a pointer, so it cannot be bound to temporary data that cannot be addressed. In fact, C++ is even stricter for references: under some compilers, a reference cannot be bound even to temporary data that is placed in memory.

//The following code is wrong under GCC and Visual C + +
int m = 100, n = 36;
int &r1 = m + n;
int &r2 = m + 28;
int &r3 = 12 * 3;
int &r4 = 50;
   
//The following code is wrong in GCC and correct in Visual C + +
S s1 = {23, 45};
S s2 = {90, 75};
S &r6 = func_s();
S &r7 = s1 + s2;

When a reference is used as a function parameter, it is easy to pass temporary data to it by accident. Watch out for the following usage:

bool isOdd(int &n){
    if(n%2 == 0){
        return false;
    }else{
        return true;
    }
}
int main(){
    int a = 100;
    isOdd(a);  //correct
    isOdd(a + 9);  //error
    isOdd(27);  //error
    isOdd(23 + 55);  //error
    return 0;
}

The parameter of this odd-number check is a reference, so only variables can be passed to it, not constants or expressions. For a small type like int, the more conventional approach is to pass by value instead of by reference.

bool isOdd(int n){  //Change to value passing
    if(n%2 == 0){
        return false;
    }else{
        return true;
    }
}

With this change, all four calls compile.

5. The compiler creates temporary variables for const references

A reference cannot be bound to temporary data, with one exception: if the reference is qualified with the const keyword, it can be bound to temporary data.

#include <iostream>
using namespace std;

typedef struct{
    int a;
    int b;
} S;
int func_int(){
    int n = 100;
    return n;
}
S func_s(){
    S a;
    a.a = 100;
    a.b = 200;
    return a;
}
S operator+(const S &A, const S &B){
    S C;
    C.a = A.a + B.a;
    C.b = A.b + B.b;
    return C;
}
int main(){
    int m = 100, n = 36;
    const int &r1 = m + n;
    const int &r2 = m + 28;
    const int &r3 = 12 * 3;
    const int &r4 = 50;
    const int &r5 = func_int();
    S s1 = {23, 45};
    S s2 = {90, 75};
    const S &r6 = func_s();
    const S &r7 = s1 + s2;
    return 0;
}

With const added, the code compiles without problems. When a const reference is bound to temporary data, the compiler uses a compromise mechanism: it creates a new, unnamed temporary variable, stores the temporary data in it, and binds the reference to that variable. The reference is therefore actually bound to the temporary variable, which is allocated memory like any other variable.

This raises a question: why does the compiler create a temporary variable for a const reference, but not for an ordinary reference?

First, once a reference is bound to a piece of data, the data can be operated on through the reference, both read and written, and writing changes the value of the data. Temporary data is often not addressable and therefore cannot be written. Even if a temporary variable were created for it, a write through the reference would change only the temporary variable and would not affect the original data. The reference would then modify a temporary while leaving the original untouched, which defeats the purpose of a reference. Creating temporary variables for ordinary references therefore makes no sense, and the compiler does not do it.
A const reference is different. Data can only be read through it, never modified, so there is no need to keep two copies in sync and no inconsistency can arise. Creating temporary variables for const references makes them more flexible and general.

In short, a const reference can only read data, not modify it, so no inconsistency can arise. If a temporary variable were created for an ordinary reference, writes through the reference would change only the temporary and not the original value, and the data would become inconsistent. So the compiler is considerate and creates a temporary variable only where it makes sense.

bool isOdd(const int &n){  //Change to constant reference
    if(n%2 == 0){
        return false;
    }else{
        return true;
    }
}

int a = 100;
isOdd(a);  //correct
isOdd(a + 9);  //correct
isOdd(27);  //correct
isOdd(23 + 55);  //correct

This code is correct. So the earlier code can be fixed in two ways: passing by value or passing by const reference.

6. C++ const references and type conversion

6.1 Type conversion: starting with pointers again

Different types of data occupy different amounts of memory and are processed differently, so the type of a pointer must strictly match the type of the data it points to. The lines marked error below do not compile:

int n = 100;
int *p1 = &n;  //correct
float *p2 = &n;  //error

char c = '@';
char *p3 = &c;  //correct
int *p4 = &c;  //error

Although int converts automatically to float and char converts automatically to int, a float * pointer cannot point to an int and an int * pointer cannot point to a char. This is reasonable: float and int both typically occupy 4 bytes of memory, but the program interprets those bytes differently.

For int, the highest bit is the sign bit and the remaining 31 bits are value bits
For float, the highest bit is the sign bit, the lowest 23 bits are the mantissa, and the middle 8 bits are the exponent

If the types are mixed up, or data is truncated along the way, the resulting problems are often hard to find. For example:

#include <cstdio>

int main(){
    int n = 100;
    float *p = (float*)&n;  // forced cast: the bytes of n are now treated as a float
    *p = 19.625;
    printf("%d\n", n);   // 1100808192
    return 0;
}

Here a float * is forced to point to int data, and the result is the strange number above. Shouldn't it be 19? No: the assignment stores the float bit pattern of 19.625 into n, and printing n interprets those same bits as an int. This is why the compiler prohibits such pointing. We should follow the rule too and not force it with a cast.

By analogy, the same applies to references:

int n = 100;
int &r1 = n;  //correct
float &r2 = n;  //error
char c = '@';
char &r3 = c;  //correct
int &r4 = c;  //error

6.2 What about adding const?

For ordinary references the types must match strictly. Once a const qualifier is added, however, the situation changes: the compiler allows a const reference to bind to data of a different type.

int n = 100;
int &r1 = n;  //correct
const float &r2 = n;  //correct

char c = '@';
char &r3 = c;  //correct
const int &r4 = c;  //correct

Why? Once again, temporary variables are at work behind the scenes.

When the type of the reference differs from the type of the data, and the types are close enough that an automatic (implicit) conversion exists, the compiler creates a temporary variable, assigns the data to it (the automatic conversion happens here), and binds the const reference to the temporary variable. This is the same mechanism as binding a const reference to temporary data.

Note that the temporary variable has the same type as the reference, and the automatic type conversion happens when the data is assigned to the temporary variable.

float f = 12.45;
const int &r = f;
printf("%d", r);   // 12

This matches expectations. What actually happens is that an int temporary variable is created for f, holding 12, and the int reference r is then bound to that temporary. However, when no automatic conversion exists between the data type and the reference type, the compiler reports an error.

const char *str = "http://c.biancheng.net";
const int &r = str;   // error

Summary: adding const to a reference lets it bind not only to temporary data but also to data of a similar type, which makes references more flexible and general. The mechanism behind both is the temporary variable.

This explains a habit I have seen in a lot of well-written code: reference-type function parameters are generally qualified with const. Now I more or less know why.

When a reference is used as a function parameter and the function body does not modify the data it is bound to, add a const qualifier to the reference whenever possible.

The following example demonstrates the flexibility of const reference:

// The volume() function computes the volume of a box. It accepts arguments of different types, as well as constants and expressions.
double volume(const double &len, const double &width, const double &hei){
    return len * width * hei;
}
int main(){
    int a = 12, b = 3, c = 20;
    double v1 = volume(a, b, c);
    double v2 = volume(10, 20, 30);
    double v3 = volume(89.4, 32.7, 19);
    double v4 = volume(a+12.5, b+23.4, 16.78);
    double v5 = volume(a+b, a+c, b+c);
    printf("%lf, %lf, %lf, %lf, %lf\n", v1, v2, v3, v4, v5);
    return 0;
}

To sum up, there are three reasons to add a const qualifier to reference-type formal parameters:

Using const prevents programming errors that modify data by accident
Using const lets the function accept both const and non-const arguments; without it, only non-const arguments are accepted
Using a const reference lets the function correctly create and use temporary variables

In large projects I had also noticed that people often add const to function parameters. So this is the reason. Lesson learned, ha ha.

That's everything for this article. Since I have just returned to school, this is only a brief C++ summary to ease back in. Most of the content comes from the first reference mentioned above, from which I extracted only the parts I did not already know. If you want to study systematically, go to that reference.

Next, I plan to finish reviewing the basics of C++ in October, so this series will keep being updated. Other series, such as the recommendation models and the watermelon book review, will be updated too. Because those are knowledge-heavy series explained in plain language, they may be much slower, but I will keep to a regular update schedule. Back at school, it's time to study hard, day by day 😉

Posted by KyleVA on Wed, 06 Oct 2021 12:16:24 -0700

Programmer Group