# Advanced C language: custom types

Keywords: C

## Custom types

C has built-in types such as int, float and double. It also lets programmers define their own types, called custom (or constructed) types: structures, enumerations and unions.

### Structures

A structure is a collection of values called members (member variables). Each member can have a different type, so a structure can describe a complex object whose members are the object's attributes.

#### Structure declaration

```c
struct tag {
    member_list;
} value_list;
```

• struct is the keyword that introduces a structure, and tag is the structure's tag name.
• member_list declares the member variables. value_list is an optional list of variables of this structure type, created together with the declaration; when the declaration is at file scope, they are global variables. It can be omitted.
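```c
struct Book {
    char name[20];
    char author[20];
    double price;
} b1, b2;                    // global variables declared together with the type

struct Book b3 = { 0 };      // global variable, zero-initialized

int main() {
    struct Book b4 = { 0 };  // local variable
    return 0;
}
```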


The example above describes a book. b1, b2 and b3 are all global variables of type struct Book and differ only in where they are declared, while b4, defined inside main, is a local variable.

##### Anonymous structure declaration

```c
struct {
    int a;
    char c;
    double d;
} s1, s2;
```

An anonymous structure type omits the tag name, so variables of that type can only be created in the variable list of the declaration itself.

###### Example
```c
struct {
    int a;
    char c;
    double d;
} s1, s2;

struct {
    int a;
    char c;
    double d;
} *ps;

int main() {
    ps = &s1;  // incompatible pointer types: the two anonymous structures are distinct types
    return 0;
}
```


The code above uses a second anonymous structure with exactly the same members to declare a pointer ps, and then stores the address of s1 in it.

At first glance this looks fine. However, even though the two anonymous structures have identical members, the compiler treats them as two different types. &s1 and ps therefore have different pointer types, and the compiler reports that the pointer types are incompatible.

#### Self-referential structures

Can a structure refer to its own type? In other words, can a structure contain a member of its own type?

##### Example 1
```c
struct Node
{
    int data;
    struct Node next;  // error: struct Node is still incomplete here
};
```


If this were allowed, how large would a struct Node be? It would contain another struct Node, which contains another, and so on without end, so its size could never be computed. The declaration is therefore invalid: the compiler rejects it because struct Node is still an incomplete type at the point where next is declared.

##### Example 2
```c
struct Node
{
    //Data field
    int data;
    //Pointer field
    struct Node* next;
};
```


This mirrors a linked list in data structures: the pointer stores the address of the next node.

data is called the data field, and the structure pointer next is called the pointer field. A pointer has a fixed size, so the size of struct Node is known. This is the correct way for a structure to reference itself.

##### Caution

```c
typedef struct {
    int data;
    Node* next;   // error: Node is not yet a known type name here
} Node;
```


The typedef gives the structure the name Node, but that name only comes into existence once the whole declaration is complete. Inside the structure body, the compiler therefore does not recognize Node as a type name.

```c
typedef struct Node {
    int data;
    struct Node* next;
} Node;
```


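This version is correct. The structure keeps its tag, so its body can refer to itself as struct Node, and after the declaration the shorter name Node can be used everywhere. In other words, a self-referential structure cannot be anonymous, because the typedef name does not help inside the definition.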

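#### Defining and initializing structure variables

```c
#include <stdio.h>

struct Point {
    int x;
    int y;
} p2 = { 3,3 }, p3 = { 4,4 };    // global variables declared with the type
struct Point p4 = { 1,2 };       // global variable defined separately

struct S {
    double d;
    struct Point p;              // nested structure member
    char n[10];
};

int main()
{
    struct Point p1 = { 3,4 };   // local variable
    struct S s = { 3.14, {1,1}, "zhangsan" };
    printf("%lf %d,%d %s\n", s.d, s.p.x, s.p.y, s.n);
    return 0;
}
```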

• p1, p2, p3 and p4 are defined and initialized in different ways.

p1 is a local variable defined inside main. p2 and p3 are global variables declared in the variable list of the structure declaration. p4 is a global variable defined separately with the structure type.

• Structure S contains a nested member p of type struct Point. It is initialized with its own braces, {1,1}, and accessed as s.p.x and s.p.y.

#### Passing structures as parameters

Two ways of passing arguments were introduced earlier: call by value and call by address (passing a pointer). Both work for structures, for example:

```c
#include <stdio.h>

struct S {
    int data[1000];
    int num;
};
// Call by value: tmp is a copy of the whole structure
void Print1(struct S tmp) {
    for (int i = 0; i < 10; i++) {
        printf("%d ", tmp.data[i]);
    }
    printf("\n%d\n", tmp.num);
}
// Call by address: only the pointer is copied
void Print2(const struct S* ps) {
    for (int i = 0; i < 10; i++) {
        printf("%d ", ps->data[i]);
    }
    printf("\n%d\n", ps->num);
}
int main()
{
    struct S s = { {1,2,3,4,5,6,7,8,9,10},100 };
    Print1(s);
    Print2(&s);
    return 0;
}
```


When a structure is passed by value, the parameter is a complete copy: space for the whole structure is allocated again and every byte is copied. For a large structure, pushing that copy onto the stack costs a lot of time and space, which degrades performance. Prefer passing the address of a structure.

#### Structure memory alignment

Having covered the basic use of structures, we now look at how much memory a structure occupies. This raises an unavoidable topic: memory alignment. Alignment is especially visible in structures, which is why it is often called structure memory alignment.

```c
#include <stdio.h>

struct S1 {
    char c1;
    int a;
    char c2;
};
struct S2 {
    char c1;
    char c2;
    int a;
};
int main()
{
    struct S1 s = { 'x',100,'y' };
    printf("%zu\n", sizeof(struct S1));  // 12 on common platforms
    printf("%zu\n", sizeof(struct S2));  // 8 on common platforms
    return 0;
}
```


The two structures have the same members in a different order, yet their sizes differ. The rules below explain why.

##### Memory alignment rules
1. The first member of the structure is always placed at offset 0.

That is, the first member occupies the very start of the memory allocated for the structure.

2. Every later member is placed at an offset that is an integer multiple of its alignment number. A member's alignment number is the smaller of its own size and the compiler's default alignment number.

In the Linux environment (GCC) there is no default alignment number, so a member's alignment number is simply its own size. In the Windows environment (VS) the default is 8. Since few basic types are larger than 8 bytes, the alignment number is usually the member's own size in both environments.

3. The total size of the structure must be an integer multiple of the largest alignment number among all its members.

This guarantees that in an array of such structures, every element, and therefore every member, starts at a correctly aligned address.

4. A nested structure is aligned to an integer multiple of the largest alignment number among its own members. The total size of the outer structure must be an integer multiple of the largest alignment number among all its members, including the nested structure's.

This follows from rule 3. The nested structure is itself a structure, so its alignment number is the largest alignment number of its own members, not its total size.

Offset: the number of bytes from the start of the structure, similar to an array subscript. For example, the first byte has offset 0 and the second byte has offset 1.

Now let's look at the above example:

```c
//1.
struct S1 {
    char c1;
    int a;
    char c2;
};
//2.
struct S2 {
    char c1;
    char c2;
    int a;
};
```

• S1: c1 is at offset 0. a has alignment number 4, so it is placed at offset 4, which wastes the 3 bytes at offsets 1 to 3. c2 has alignment number 1, so it can go at any offset and follows a immediately, at offset 8. That makes 9 bytes. The total size must be a multiple of 4, so 3 more bytes are wasted, for 12 bytes in total.
• S2: c1 and c2 both have alignment number 1 and occupy offsets 0 and 1. a has alignment number 4, so it is placed at offset 4, wasting offsets 2 and 3. It ends at offset 7, for exactly 8 bytes, which is already a multiple of 4.

###### Example

Find the size of each of the following structures.

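```c
//3.
struct S3 {
    double d;
    char c;
    int a;
};
//4.
struct S4 {
    char c1;       // alignment number 1
    struct S3 s;   // alignment number 8 (that of its largest member), size 16
    double d;      // alignment number 8
};
```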


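• S3: d occupies offsets 0 to 7 and c is at offset 8. a has alignment number 4, so it goes at offset 12. That gives 16 bytes, a multiple of 8.
• S4: c1 is at offset 0. The nested S3 occupies 16 bytes, but its maximum alignment number is 8, so it starts at offset 8 and ends at offset 23. d follows at offset 24, for 32 bytes in total. The maximum alignment number of all members of S4 is 8, and 32 is a multiple of 8.
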
##### Reasons for memory alignment

This alignment mechanism seems to waste space and makes size calculations tedious, but it exists for good reasons. There is no single official explanation, but the reasons are commonly summarized in two points:

1. Portability

Not every hardware platform can access any data at any address. Some platforms can only read certain data at certain addresses, for example only at multiples of 4 and 4 bytes at a time. On such platforms, accessing misaligned data causes a hardware exception, so code that relies on it is not portable.

2. Performance

Data should be stored on its natural address boundary whenever possible. A misaligned value may straddle two memory words and need two accesses, whereas an aligned value needs only one. Alignment therefore makes reads faster.

In short, memory alignment trades space for time. When designing a structure, we want to save both where we can.

The same members in a different order can produce a different structure size. Grouping small members together, as S2 does with c1 and c2, saves space.

##### Modifying the default alignment number

```c
//Set the default alignment number to n
#pragma pack(n)

struct Tag {
    member_list;
};

//Restore the default alignment number
#pragma pack()
```


The default alignment number can be changed: set it before the structure declaration and restore it afterwards. When the default alignment is not appropriate for a structure, you can choose your own. The alignment number n is normally a power of two, $n = 2^k$ (1, 2, 4, 8, ...).

###### Example

The offsetof macro, defined in <stddef.h>, computes the offset of a member relative to the start of the structure.

```c
#include <stdio.h>
#include <stddef.h>

struct S1 {
    char c1;
    int a;
    char c2;
};
int main()
{
    printf("%zu\n", offsetof(struct S1, c1));  // 0
    printf("%zu\n", offsetof(struct S1, c2));  // 8 on common platforms
    printf("%zu\n", offsetof(struct S1, a));   // 4 on common platforms
    return 0;
}
```


### Bit fields

#### Definition of bit fields

A bit field is declared much like a structure, with two differences:

1. Type: the members of a bit field must have an integer type, such as int or unsigned int. The C standard guarantees int, signed int, unsigned int and _Bool. Other types such as char are implementation-defined extensions, although common compilers accept them.
2. Syntax: the member name is followed by a colon and a number that specifies how many bits the member occupies. For example:

```c
struct A {
    int _a : 2;
    int _b : 5;
    int _c : 10;
    int _d : 30;
};
```

In VS, sizeof(struct A) is 8 bytes, whereas four ordinary int members would need 16 bytes. Bit fields can therefore save space.

The "bit" in bit field means a binary bit: the number after the colon is the number of bits allocated to the member.

When describing an object, an attribute often does not need all the bits of its type. Bit fields let us specify exactly how many bits each member receives. Of course, a value too large for its field will still overflow.

#### Memory allocation for bit fields

• Space for a bit field is allocated according to the member type, one unit of that type's size at a time.

If the members are int, 4 bytes are allocated at a time, and another 4 bytes when those run out. If they are char, 1 byte is allocated at a time.

• Many details of bit-field layout are left unspecified, so programs that depend on them are not portable. Bit fields are not cross-platform.

Take struct A above. First, 4 bytes (32 bits) are allocated. _a takes 2 bits, _b 5 bits and _c 10 bits, leaving 15 bits, which is not enough for the 30 bits of _d. Another 4 bytes must therefore be allocated, which gives the 8 bytes computed above.

The question is whether _d is then split, with part of it in the 15 leftover bits and the rest in the new space, or placed entirely in the new space.

The C standard does not specify this, so different compilers may behave differently. Here we only examine VS on Windows, using the following example.

```c
struct S {
    char a : 3;
    char b : 4;
    char c : 5;
    char d : 4;
};
struct S s = { 0 };
int main() {
    s.a = 10;
    s.b = 12;
    s.c = 3;
    s.d = 4;
    return 0;
}
```


Assigning to bit-field members raises another question: within a single byte, are the bits used starting from the low-order end or from the high-order end? The standard does not specify this either.

Let us assume two things. First, within a byte, bits are used starting from the low-order end. Second, when the remaining space is not enough for the next member, it is abandoned and a new unit is allocated. If the memory shown in VS matches the prediction, the assumption is correct.

Here is the memory worked out under this assumption:

• a = 10 is 1010 in binary, and only its low 3 bits, 010, fit.
• b = 12 is 1100 and goes above a in the same byte.
• Only 1 bit of the first byte remains, which is not enough for the 5-bit c, so c = 3 (00011) starts a new byte.
• That leaves 3 bits, not enough for the 4-bit d, so d = 4 (0100) starts a third byte.

$$(0\,1100\,010\quad 000\,00011\quad 0000\,0100)_2 = (62\quad 03\quad 04)_{16}$$

The memory window in VS shows exactly 62 03 04, matching the prediction, so the assumption is correct. We can therefore conclude that in the VS environment:

1. The number of bytes allocated each time is determined by the type of the member being stored.
2. Bytes are used from lower addresses to higher addresses, and within a single byte, bits are used starting from the low-order end.
3. When the remaining space is not enough for the next member, it is discarded and a new unit of the member type's size is allocated.

Because the C standard does not define these rules, the conclusions vary from compiler to compiler, which is why bit fields are poorly portable.

#### Cross-platform problems of bit fields

1. Whether a plain int bit field is signed (that is, whether its highest bit is treated as a sign bit) is implementation-defined.
2. The maximum number of bits in a bit field is uncertain, because it depends on the size of the underlying type.

On early 16-bit machines int was 2 bytes, 16 bits in total. A field width larger than 16, such as the 30 bits of _d above, would not compile there.

3. Whether bit-field members are allocated from the high-order or the low-order end of a unit is not specified.
4. When the remaining space is not enough for the next member, it is unspecified whether the leftover bits are discarded and a new unit allocated, or whether the leftover bits are used.

#### Applications of bit fields

Compared with a structure, a bit field can represent the same data while using less space, but it must be used carefully and is poorly portable. Bit fields suit network protocols, where every few bits of a header form a group carrying a different piece of information, so no space is wasted.

### Enumeration type

An enumeration, as the name suggests, lists the possible values one by one. Many kinds of data can be listed this way, such as gender, month or color.

#### Definition of enumerations

```c
enum Tag {
    con1,
    con2,
    ...
    con3
};
```

• enum is the enumeration keyword and Tag is the enumeration's tag name;
• con1, con2, ..., con3 are the enumeration constants.

Enumeration constants have type int, so each one occupies sizeof(int) bytes, 4 on common platforms.

```c
//Week
enum Day {
    Mon,
    Tues,
    Wed,
    Thur,
    Fri,
    Sat,
    Sun
};
//Gender
enum Sex {
    FEMALE,
    MALE,
    SECRET
};
//Color
enum Color {
    RED,
    GREEN,
    BLUE
};
```


enum Day, enum Sex and enum Color above are enumeration types. The values listed in {} are the possible values of each type, called enumeration constants.

By default, enumeration constants start at 0 and increase by 1. They can also be given initial values, either all of them or only some. Constants before an initialized one are unaffected, and constants after it continue incrementing from its value.

Enumeration constants can only be given values in the declaration; they cannot be assigned afterwards.

```c
//1.
enum Color c1 = GREEN;
//2.
enum Color c2 = 1;
```

Both lines create a variable of the enumeration type and give it the value GREEN, which is 1 here.

C is not strict about this kind of type checking, so both 1 and 2 compile. In C++, however, 1 is an int literal while GREEN is an enumeration constant of type Color. An int does not convert implicitly to an enumeration type, so form 2 is an error in C++.

Compared with constants defined with #define, enumerations have several advantages:

1. Better readability and maintainability

Constants defined with #define carry less meaning than enumeration constants. Enumeration constants also have a type, so they are checked more rigorously.

2. Less naming pollution

A constant defined with #define is visible from its definition to the end of the file, regardless of scope, so it easily conflicts with other names. Enumeration constants follow the normal scope rules.

3. Easier debugging

Constants defined with #define are replaced during preprocessing and no longer exist afterwards. Enumeration constants remain in the program with both a value and a type, so a debugger can show them.

4. Easy to use

Several related constants can be defined at once and managed together.

#### Use of enumerations

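```c
#include <stdio.h>

/*
 * Calculator
 * Use enumeration constants as menu options
 * */
enum Option {
    EXIT,//0
    ADD,//1
    SUB,//2
    MUL,//3
    DIV//4
};

int Add(int x, int y) {
    return x + y;
}
int Sub(int x, int y) {
    return x - y;
}
int Mul(int x, int y) {
    return x * y;
}
int Div(int x, int y) {
    return x / y;
}
// Reads two operands and applies the chosen operation through a function pointer
void Calc(int (*pf)(int, int)) {
    int a = 0;
    int b = 0;
    printf("Enter two operands: ");
    if (scanf("%d %d", &a, &b) != 2) {
        return;
    }
    if (pf == Div && b == 0) {
        printf("Division by zero\n");
        return;
    }
    printf("%d\n", pf(a, b));
}
void menu() {
    printf("*************************\n");
    printf("**** 1.ADD *** 2.SUB ****\n");
    printf("**** 3.MUL *** 4.DIV ****\n");
    printf("*******  0.exit  ********\n");
    printf("*************************\n");
}
int main()
{
    int input = 0;

    do {
        menu();
        printf("Please select: ");
        if (scanf("%d", &input) != 1) {
            break;  // stop on invalid input or end of input
        }
        switch (input) {
        case ADD:
            Calc(Add);
            break;
        case SUB:
            Calc(Sub);
            break;
        case MUL:
            Calc(Mul);
            break;
        case DIV:
            Calc(Div);
            break;
        case EXIT:
            break;
        default :
            printf("Selection error\n");
            break;
        }
    } while (input);
    return 0;
}
```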


### Unions

A union is a special custom type that also contains a series of members. What makes it special is that all members share the same memory space.

#### Union definition

```c
#include <stdio.h>

union Un {
    char c;//1
    int i;//4
};
int main()
{
    union Un u = { 0 };
    printf("%zu\n", sizeof(u));
    return 0;
}
```

The program prints 4, yet an int and a char together would need at least 5 bytes. Why?

#### Characteristics of unions

```c
printf("%p\n", (void*)&u);   //00EFF934
printf("%p\n", (void*)&u.c); //00EFF934
printf("%p\n", (void*)&u.i); //00EFF934
```

As the output shows, c and i start at the same address and share the same 4 bytes.

• Changing i changes c, and changing c changes i. Only one member can be used at a time, because writing one member overwrites the others.

• Because the members share one space, a union is at least as large as its largest member.

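##### Example

Use a union to determine the byte order (endianness) of the current machine.

```c
#include <stdio.h>

int check_sys() {
    union U {
        char c;
        int i;
    }u;
    u.i = 1;
    // c overlaps the lowest-addressed byte of i:
    // it is 1 on a little-endian machine and 0 on a big-endian one
    return u.c;
}
int main()
{
    if (check_sys() == 1) {
        printf("Little-endian\n");
    }
    else {
        printf("Big-endian\n");
    }
    return 0;
}
```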


#### Calculating the size of a union

Unions are also subject to memory alignment, but the rules are simpler than for structures.

```c
#include <stdio.h>

//1.
union Un1 {
    char c[5];
    int i;
};
//2.
union Un2 {
    short c[7];
    int i;
};
int main() {
    printf("%zu\n", sizeof(union Un1));  // 8
    printf("%zu\n", sizeof(union Un2));  // 16
    return 0;
}
```

• A union is at least as large as its largest member.
• When the size of the largest member is not an integer multiple of the maximum alignment number, the size is rounded up to the next such multiple.

Because all members share one space, once the size of the largest member is known, only a few bytes of padding need to be added at the end to reach a multiple of the maximum alignment number.

For Un1, the largest member, char c[5], is 5 bytes and the maximum alignment number is 4 (from int), so the size is rounded up to 8 bytes.

For Un2, the largest member, short c[7], is 14 bytes and the maximum alignment number is 4, so the size is rounded up to 16 bytes.

Posted by rupturedtoad on Thu, 23 Sep 2021 02:14:06 -0700