Structure, enumeration, union


This chapter covers the user-defined types in C: structures, enumerations and unions.

Structure

C language itself has built-in types:

char 
short 
int 
long
long long
float
double

But how do we describe complex objects in C?

A structure is a collection of values called member variables. Each member of a structure can have a different type.

Why structures?

Real life is full of complex objects. To describe a person we need a name, age, height, weight and so on; to describe a city we need its name, area, tourist attractions, local snacks and so on. The built-in types of C cannot express such objects on their own, so we use user-defined types to describe them.

Structure type declaration

struct tag
{
	member-list;
}variable-list;

struct - the structure keyword
tag - the structure tag
member-list - the structure members
variable-list - the structure variables

Here, the structure type is struct tag, and variable-list can be omitted.

Let's define a structure type, struct Book:

struct Book
{
	char name[20];
	char author[20];
	float price;
}b1;//Global variable b1

struct Book b2;//Global variable b2
int main()
{
	struct Book b3;//Local variable b3
	return 0;
}

Special declaration of structure type

When declaring a structure, the tag can be omitted, which leaves the structure type anonymous. Variables of an anonymous structure type can only be created as part of the declaration itself.

struct
{
	int a;
	char b;
	float c;
}s1,s2;

Let's look at the following code:

//Anonymous structure type
struct
{
	int a;
	char b;
	float c;
}x;//Structure variable x

struct
{
	int a;
	char b;
	float c;
}a[20], *p;
//Structure array a, structure pointer p

Both declarations above omit the structure tag. Is the following code legal?

p = &x;

No. The compiler treats the two declarations as two completely different types, even though their members are identical, so the assignment is illegal. Use anonymous structure types with caution.

Self-reference of structures

Can a structure contain a member whose type is the structure itself?

Consider a linked list: each node stores its own data and must also be able to reach the next node. Suppose we define the node structure like this:

struct Node
{
	int data;
	struct Node next;
};

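Is that OK?
No. Why not?
With this definition, a struct Node would contain another struct Node, which would contain another, and so on without end, so the size of struct Node cannot be determined.
So how should it be defined?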

struct Node
{
	int data;
	struct Node* next;
};

We only need to store the address of the next node. Because the size of a pointer is fixed, the size of struct Node is fixed as well.

Let's look at the following code:

typedef struct
{
	int data;
	Node* next;
}Node;
//Is it feasible to write code like this?

No. The name Node is only introduced once the typedef is complete. Inside the braces the structure is still anonymous, so the type Node* used for the member next does not exist yet, and the code does not compile.

The correct way to write it is:

typedef struct Node
{
	int data;
	struct Node* next;
}Node;

Here the structure already has the tag Node before the typedef name is introduced, so the type struct Node* of the member next already exists.

Definition and initialization of structure variables

struct Point
{
	int x;
	int y;
}p1; //Define the variable p1 while declaring the type

struct Point p2; //Define structure variable p2

//Initialization: define variables and assign initial values at the same time.
struct Point p3 = { 1, 2 };

struct Stu //Type declaration
{
	char name[15];
	int age; 
};
struct Stu s = { "zhangsan", 20 };//initialization

struct Node
{
	int data;
	struct Point p;
	struct Node* next;
}n1 = { 10, {4,5}, NULL }; //Structure nesting initialization

struct Node n2 = { 20, {5, 6}, NULL };//Structure nesting initialization

Structure memory alignment

How do we calculate the size of a structure?

First, we need to know the alignment rules for structures. (Note that a type by itself occupies no space; memory is allocated only when a variable is created.)

1. The first member of the structure is always placed at offset 0 from the start of the structure variable.

2. Every other member is aligned to an offset that is an integer multiple of a number called its alignment number.

  alignment number = the smaller of the compiler's default alignment number and the size of the member.

  • The default alignment number in VS is 8
  • gcc on Linux has no default alignment number; the alignment number is the member's own alignment (usually its size)

3. The total size of the structure is an integer multiple of the maximum alignment number (each member has its own alignment number).

4. If a structure is nested, the nested structure is aligned to an integer multiple of its own maximum alignment number, and the total size of the outer structure is an integer multiple of the maximum alignment number of all members (including those of the nested structure).

//Exercise 1
struct S1
{
	char c1;//size 1, default 8: alignment number 1
	int i;//size 4, default 8: alignment number 4
	char c2;//size 1, default 8: alignment number 1
};
printf("%zu\n", sizeof(struct S1));

c1 is placed at offset 0.
The alignment number of i is 4, so i is placed at the next offset that is a multiple of 4, which is offset 4 (offsets 1 to 3 are padding).
The alignment number of c2 is 1, so c2 is placed at offset 8.
The members end at offset 9. The maximum alignment number is 4, so the size is rounded up to a multiple of 4.

The result is 12.

//Exercise 2
struct S2
{
	char c1;
	char c2;
	int i;
};
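printf("%zu\n", sizeof(struct S2));

The alignment number of c1 is 1, so c1 is at offset 0.
The alignment number of c2 is 1, so c2 is at offset 1.
The alignment number of i is 4, so i is at offset 4 (offsets 2 and 3 are padding).
The members end at offset 8. The maximum alignment number is 4, and 8 is already a multiple of 4.

The result is 8.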

//Exercise 3
struct S3
{
	double d;
	char c;
	int i;
};
printf("%zu\n", sizeof(struct S3));

The alignment number of d is 8, so d is at offset 0.
The alignment number of c is 1, so c is at offset 8.
The alignment number of i is 4, so i is at offset 12 (offsets 9 to 11 are padding).
The members end at offset 16. The maximum alignment number is 8, and 16 is already a multiple of 8.

The result is 16.

//Exercise 4 - Nested structures
struct S4
{
	char c1;//size 1, default 8: alignment number 1
	struct S3 s3;//maximum alignment number of S3 is 8
	double d;//size 8, default 8: alignment number 8
};
printf("%zu\n", sizeof(struct S4));

The alignment number of c1 is 1, so c1 is at offset 0.
s3 is aligned to an integer multiple of its own maximum alignment number (8), so it is at offset 8 and occupies 16 bytes.
The alignment number of d is 8, so d is at offset 24.
The members end at offset 32. The maximum alignment number is 8, and 32 is already a multiple of 8.

The result is 32.

Why memory alignment?

Most references say this:

  1. Platform reason (portability): not all hardware platforms can access any data at any address. Some platforms can only fetch certain types of data at certain addresses, and raise a hardware exception otherwise.

  2. Performance reason: data structures (especially stacks) should be aligned on natural boundaries as much as possible. To access misaligned memory the processor may need two memory accesses, whereas an aligned access needs only one. For example, a 32-bit processor reads 4 bytes at a time. Consider the following structure type:

struct S
{
	char c;
	int n;
};

Without alignment, c and n are stored contiguously in memory.

When the data is read, the first 4-byte read returns c followed by the first 3 bytes of n, and a second 4-byte read is needed to get the last byte of n. So two reads are required to obtain n.

With aligned storage:

The first 4-byte read returns c (the other 3 bytes are padding), and the second read returns all of n, so n is obtained in a single read.

So overall:
Memory alignment of structures trades space for time.

When designing a structure, how can we satisfy alignment and still save space? Keep the members that occupy little space together as much as possible.

struct S1
{
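	char c1;//size 1, default 8: alignment number 1
	int i;//size 4, default 8: alignment number 4
	char c2;//size 1, default 8: alignment number 1
};
//1+3+4+1+3 = 12

6 of the 12 bytes are padding.

Now look at this version:

struct S2
{
	char c1;//size 1, default 8: alignment number 1
	char c2;//size 1, default 8: alignment number 1
	int i;//size 4, default 8: alignment number 4
};
//1+1+2+4 = 8

S1 and S2 have exactly the same members, but they differ in size: S2 spends only 2 bytes on padding and therefore saves memory.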

Modify the default number of alignments

Use the #pragma pack preprocessing directive to modify the default alignment number.

#pragma pack(8) //set the default alignment number to 8
struct S1
{
	char c1;
	int i;
	char c2;
};
#pragma pack() //cancel the setting and restore the default alignment number

#pragma pack(1) //set the default alignment number to 1
struct S2
{
	char c1;
	int i;
	char c2;
};
#pragma pack() //cancel the setting and restore the default alignment number

int main()
{
	//What is the output?
	printf("%zu\n", sizeof(struct S1));//12
	printf("%zu\n", sizeof(struct S2));//6
	return 0;
}

When the default alignment does not suit a structure, we can change the default alignment number. It is generally set to a power of 2.

The offsetof macro

offsetof calculates the offset, in bytes, of a structure member from the start of the structure.
offsetof is not a function, but a macro.

size_t offsetof( structName, memberName );
The parameter structName is the structure type name, and memberName is the name of the member.

The header file is <stddef.h>.

#include <stddef.h>
struct S1
{
	char c1;
	int i;
	char d;
};


int main()
{
	printf("%zu\n",offsetof(struct S1,c1));//0
	printf("%zu\n",offsetof(struct S1,i));//4
	printf("%zu\n",offsetof(struct S1,d));//8
	return 0;
}

Passing structures as parameters

We covered passing structures to functions in the C basics part. Let's review it.

struct S
{
	int data[1000];
	int num;
};

//Pass the structure by value
void print1(struct S tmp)
{
	int i = 0;
	for (i = 0; i < 10; i++)
	{
		printf("%d ",tmp.data[i]);
	}
	printf("\nnum = %d\n",tmp.num);
}

//Pass a pointer to the structure
void print2(struct S* ps)
{
	int i = 0;
	for (i = 0; i < 10; i++)
	{
		printf("%d ",ps->data[i]);
	}
	printf("\nnum = %d\n",ps->num);
}


int main()
{
	struct S s = { {1,2,3,4,5,6,7,8,9,10},100 };
	print1(s);//Pass by value
	print2(&s);//Pass by address
	return 0;
}

When a function is called, its arguments are pushed onto the stack, which costs both time and space.
If a large structure object is passed by value, the whole structure is copied onto the stack, and this overhead can degrade performance. Passing the address of the structure is more efficient.
When a structure pointer is passed and the function must not modify the structure, the parameter can be declared as const struct S* ps, so that const protects the members of the structure.

Conclusion: when passing a structure to a function, pass its address.

Bit fields

Bit fields (also translated as bit segments) are implemented with structures.
The "bit" in bit field refers to a binary bit.

A bit-field declaration is similar to a structure declaration, with two differences:
1. The members of a bit field must be of type int, unsigned int or signed int (many compilers also accept other integer types such as char).
2. The member name is followed by a colon and a number, which is the width of the member in bits.

struct A
{
	int _a : 2;//2 bit
	int _b : 5;//5 bit
	int _c : 10;//10 bit
	int _d : 30;//30 bit
};

There are 47 bits in total, which fit into 8 bytes, whereas four ordinary int members would need 16 bytes.
Bit fields can save space.

Memory allocation for bit fields

  1. The members of a bit field can be int, unsigned int, signed int or char (types of the integer family).
  2. The space for bit fields is allocated as needed, 4 bytes (int) or 1 byte (char) at a time.
  3. Bit fields involve many implementation-defined details and are not cross-platform. Programs that must be portable should avoid bit fields.

What is the result of the following code?

struct S
{
	char a:3;
	char b:4;
	char c:5;
	char d:4;
};

int main()
{
	struct S s = {0};
	s.a = 10;
	s.b = 12;
	s.c = 3;
	s.d = 4;
	return 0;
}

Here, a occupies 3 bits, b 4 bits, c 5 bits and d 4 bits, 16 bits in total, so in theory two bytes would be exactly enough. Let's analyze it under two assumptions: the bits of a byte are used from right to left (low bits first), and when the remaining space of a byte is not enough for the next member, a new byte is opened and the remaining bits of the previous byte are wasted. Under these assumptions, the values to store are:

s.a = 10;//1010
s.b = 12;//1100
s.c = 3;//0011
s.d = 4;//0100

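a occupies 3 bits, so 1010 is truncated and 010 is stored. The same applies to b, c and d: b stores 1100, c stores 00011 and d stores 0100.

a and b share the first byte (0 1100 010), c does not fit in the remaining bit and takes a second byte, and d does not fit in the remaining 3 bits and takes a third.
According to our hypothesis, the three bytes in memory should be 0x62, 0x03 and 0x04 in hexadecimal. Is that so? We can find out by running the following code under VS2019: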

struct S
{
	char a : 3;
	char b : 4;
	char c : 5;
	char d : 4;
};

int main()
{
	struct S s = { 0 };
	s.a = 10;
	s.b = 12;
	s.c = 3;
	s.d = 4;
	printf("%#x\n",*((char*)&s));
	printf("%#x\n",*((char*)&s+1));
	printf("%#x\n",*((char*)&s+2));
	return 0;
}

Checking the memory shows that the stored data and the printed results match our assumptions.

In other words, in the VS2019 development environment, bit fields use the bits of a byte from right to left, and when the remaining bits of a byte are not enough for the next member, they are discarded and a new byte is opened. But how do other compilers lay out bit fields?

Cross-platform problems of bit fields

  1. Whether an int bit field is treated as signed or unsigned is not specified.
  2. The maximum number of bits in a bit field is not fixed. (It is at most 16 on a 16-bit machine and 32 on a 32-bit machine, so a width of 27 causes problems on a 16-bit machine.)
  3. Whether the members of a bit field are allocated from left to right or from right to left in memory is not defined by the standard.
  4. When a structure contains two bit-field members and the second is too large to fit in the bits remaining after the first, it is not specified whether the remaining bits are discarded or used.

Therefore, bit fields may be laid out differently under different compilers; the details are decided by the compiler.

Summary:
Compared with an ordinary structure, a bit field can achieve the same effect while saving space (advantage), but it has cross-platform problems (disadvantage: different code may have to be written for different platforms).

Applications of bit fields

  Packing the fields of network protocol headers, where many flags and small values must fit into a few bytes.

Enumeration

To enumerate means to list the possible values one by one.

For example, in real life:
The days of the week, Monday to Sunday, are a limited set of seven and can be listed one by one.
Gender (male, female, secret) can also be listed one by one.
The 12 months of the year can also be listed one by one.
Colors can also be listed one by one.
All of these can be described with enumerations.

Definition of enumeration type

enum is a keyword of the C language. It is used to define enumeration data types. An enumeration describes a set of named integer values.

In elementary C we learned that there are four kinds of constants: literal constants, const-qualified variables, identifier constants defined with #define, and enumeration constants.

Enumeration constants are an alternative to constants defined with the preprocessing directive #define. The two are similar: a macro name is replaced by its value in the preprocessing stage, while an enumeration constant is resolved to its value by the compiler. An enumeration can be thought of as a set of named constants handled at compile time.

How to define enumeration types?

enum TypeName
{
	member1,
	member2,
	...
};

enum TypeName is the name of the enumeration type. The braces list all possible values of the type; each is called an enumeration member (or enumeration constant). They are constants rather than variables, so they cannot be assigned to; their values can only be assigned to variables.

Let's look at the following definitions:

enum Day//week
{
	Mon,
	Tues,
	Wed,
	Thur,
	Fri,
	Sat,
	Sun
};


Define enumeration variables

With enumeration types, how to define enumeration variables?

Method 1: first define the enumeration type, and then define the enumeration variable

enum Sex//Gender
{
	MALE,
	FEMALE,
	SECRET
};

int main()
{
	enum Sex s;
}

Method 2: define enumeration variables while defining enumeration types

enum Sex//Gender
{
	MALE,
	FEMALE,
	SECRET
}s;

Special declaration of enumeration type

When an enumeration type and an enumeration variable are defined together, the name of the enumeration type can be omitted, just as in an anonymous structure declaration. Because the type then has no name, variables of it can only be defined as part of the declaration; there is no way to define more of them later.

enum//colour
{
	RED, 
	GREEN,
	BLUE
}c;

Characteristics of enumeration

1. By default, the first enumeration constant has the value 0, and each following constant is 1 greater than the previous one

enum week
{
	Mon,//0
	Tues,//1
	Wed,//2
	Thur,//3
	Fri,//4
	Sat,//5
	Sun//6
};

int main() 
{
	printf("%d\n",Mon);//0
	printf("%d\n",Tues);//1
	printf("%d\n",Wed);//2
	return 0;
}

2. You can specify the value of an enumeration constant. A constant without a specified value is 1 greater than the previous constant

enum Color
{
	RED = 2,
	BLUE,
	GREEN = 5,
	WHITE
};

int main()
{
	printf("%d\n",RED);//2
	printf("%d\n",BLUE);//3
	printf("%d\n",GREEN);//5
	printf("%d\n",WHITE);//6
	return 0;
}

3. The value of an enumeration constant is an integer.

In C, an enumeration constant has type int, and the enumeration type itself is compatible with an integer type chosen by the compiler. With most compilers, sizeof an enumeration type is 4, the same as int. Enumeration members are constants: they can be given a value in the enumeration list, but they cannot be assigned to afterwards.

4. The value of an enumeration variable should be one of the enumeration constants
C itself does not enforce this: an integer value can be assigned to an enumeration variable without a cast. C++ is stricter and rejects such an assignment unless a forced type conversion is used. Writing the cast explicitly keeps the code valid in both languages and makes the intent clear.

enum Grade
{
	A,
	B,
	C,
	D,
	E
};

int main()
{
	enum Grade g = (enum Grade)1;
	printf("%d\n",g);
	return 0;
}

5. Different enumeration members in the same enumeration type can have the same value

enum Sex
{
	MALE,
	FEMALE,
	SECRET = 1
};


int main()
{
	printf("%d\n",FEMALE);//1
	printf("%d\n",SECRET);//1
	return 0;
}

6. Within the same scope, two enumeration types cannot have the same name, and two enumeration members cannot have the same name even if they belong to different enumeration types. That is, each enumeration constant name can exist only once.

Enumeration constants are not scoped by their enumeration type: they are visible in the whole scope in which the enumeration is declared (the rest of the file for an enumeration declared outside any function). Therefore a variable with the same name as an enumeration constant cannot be defined in that scope, otherwise the compiler reports a redefinition.

7. Range of values of an enumeration type
An enumeration constant has type int, so its value must fit in an int. Let's test it.

Testing the upper limit:

#include <limits.h>
enum Color
{
	RED = INT_MAX,
};


int main()
{
	enum Color c = RED;
	printf("%d %d\n",INT_MAX,c);
	return 0;
}



This shows that the maximum value of an enumeration constant is INT_MAX.

Testing the lower limit:

#include <limits.h>
enum Color
{
	RED = INT_MIN,
};


int main()
{
	enum Color c = RED;
	printf("%d %d\n",INT_MIN,c);
	return 0;
}


The minimum value of an enumeration constant is INT_MIN.
Therefore, the range of an enumeration constant is INT_MIN ~ INT_MAX.

Advantages of enumeration

Why use enumeration?

We can use #define to define constants. Why do we have to use enumeration?

Advantages of enumeration:
1. Increases the readability and maintainability of the code

2. Compared with identifiers defined by #define, enumerations carry type information, which is more rigorous.
An identifier constant defined by #define has no type, whereas an enumeration constant belongs to a declared enumeration, so the compiler can check it (for example, many compilers can warn about a switch that does not handle every constant).

3. Prevents naming pollution (encapsulation)
An identifier constant defined by #define is visible everywhere after its definition, whereas an enumeration constant obeys the normal scope rules.

4. Convenient for debugging
An identifier constant defined by #define is completely replaced in the preprocessing stage, so during debugging only the value is visible and the name cannot be found. An enumeration constant is handled by the compiler, so the debugger can show both its name and the enumeration type it belongs to.

Enumeration constants do not occupy memory in the data area; they are compiled directly into the instructions in the code area, so you cannot take their address with &.

5. Easy to use: multiple constants can be defined at once

Use of enumerations

void menu()
{
	printf("*****1.add    2.sub****\n");
	printf("*****3.mul    4.div****\n");
	printf("*****0.exit        ****\n");
}

int add(int a,int b)
{
	int c = 0;
	c = a + b;
	return c;
}

int sub(int a,int b)
{
	int c = 0;
	c = a - b;
	return c;
}

int mul(int a, int b)
{
	int c = 0;
	c = a * b;
	return c;
}

int div(int a,int b)
{
	int c = 0;
	c = a / b;
	return c;
}

enum choice
{
	EXIT,
	ADD,
	SUB,
	MUL,
	DIV
};
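int main()
{
	int choice = 0;
	do
	{
		int a = 0;
		int b = 0;
		int c = 0;
		menu();
		printf("Please select:>");
		if (scanf("%d", &choice) != 1)
		{
			break;//Stop if the input is not a number
		}
		//Only the four operations need operands
		if (choice >= ADD && choice <= DIV)
		{
			printf("Please enter two integers:>");
			if (scanf("%d %d", &a, &b) != 2)
			{
				break;
			}
		}
		switch (choice)
		{
		case ADD:
			c = add(a, b);
			break;
		case SUB:
			c = sub(a, b);
			break;
		case MUL:
			c = mul(a, b);
			break;
		case DIV:
			if (b == 0)
			{
				printf("Cannot divide by zero!\n");
				continue;//Back to the loop condition without printing a result
			}
			c = div(a, b);
			break;
		case EXIT:
			printf("sign out!\n");
			continue;//choice is 0, so the loop ends
		default:
			printf("Incorrect input!\n");
			continue;
		}
		printf("The result is:%d\n",c);
	} while (choice);
	
	return 0;
}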

A case label must be followed by an integer constant expression. The enumeration constants ADD, SUB, MUL, DIV and EXIT are integer constants, so they can be used as case labels.

Union

Definition of union type

A union is also a special user-defined type. A variable of this type also contains a series of members; the distinguishing feature is that the members share the same memory (which is why a union is sometimes also called a common body).

//Declaration of union type
union Un
{
	char c;
	int i;
};
//Definition of a union variable
union Un un;
//Calculate the size of the union variable
printf("%zu\n", sizeof(un));

The result is 4.

union U
{
	char c;
	int i;
};

int main()
{
	union U u = { .i = 1000 };//A plain { 1000 } would initialize the first member, c
	//1000 
	//Binary: 00000000 00000000 00000011 11101000

	printf("%d\n",u.i);//1000
	printf("%d\n",u.c);//-24
	
	//Stored in u.c in memory: 11101000
	//Printing u.c with %d requires integer promotion. Here char is assumed to be a signed type,
	//so the promotion extends the sign bit:
	//11111111 11111111 11111111 11101000 - two's complement
	//11111111 11111111 11111111 11100111 - ones' complement
	//10000000 00000000 00000000 00011000 - sign-magnitude
	//-24

	return 0;
}

This assumes little-endian storage.

Characteristics of unions

The members of a union share the same memory. The size of a union variable is therefore at least the size of its largest member, because the union must be able to hold that member.

union Un
{
	int i;
	char c;
};
int main()
{
	union Un un;
	//Are the two addresses printed below the same?
	printf("%p\n", (void*)&(un.i));
	printf("%p\n", (void*)&(un.c));
	//What is the output below?
	un.i = 0x11223344;
	un.c = 0x55;
	printf("%x\n", un.i);
	return 0;
}

Assuming little-endian storage:


&(un.i) and &(un.c) are the same address

un.i = 0x11223355

Classic union example

Determine the byte order (endianness) of the current machine

Use a union

union U
{
	char c;
	int i;
};

int main()
{
	union U u;
	u.i = 1;
	if (u.c == 1)
	{
		printf("Little-endian storage\n");
	}
	else
	{
		printf("Big-endian storage\n");
	}
	return 0;
}

Endianness: byte order is only meaningful for data larger than 1 byte.

Calculation of union size

1. The size of a union is at least the size of its largest member;
2. When the size of the largest member is not an integer multiple of the maximum alignment number, the size is rounded up to an integer multiple of the maximum alignment number.

Let's look at the following code:

union Un1
{
	char c[5];//element size 1, default 8: alignment number 1
	int i;//size 4, default 8: alignment number 4
};

Array c has 5 elements, which is equivalent to storing 5 char variables, so its alignment number is 1; the alignment number of i is 4. The maximum alignment number of the union is therefore 4 and the largest member occupies 5 bytes. 5 is not an integer multiple of 4, so the size of the union is rounded up to 8.

union Un2
{
	short c[7];//element size 2, default 8: alignment number 2
	int i;//size 4, default 8: alignment number 4
};

Array c is equivalent to seven short variables, so its alignment number is 2; the alignment number of i is 4. The largest member occupies 14 bytes. 14 is not an integer multiple of 4, so the size of the union is rounded up to 16.

Comparison between structures and unions:
1. The members of a structure each have their own memory, while the members of a union share the same memory.

2. Assigning to one member of a structure does not affect the others. Assigning to any member of a union overwrites the other members, so only the most recently assigned member holds a valid value.

3. Both structures and unions are suitable when a group of variables needs to be handled together. A structure is convenient for describing several attributes of one object, but occupies more memory. A union holds only one of its members at a time, which uses less memory but is less convenient to use.

4. Memory alignment applies to both structures and unions.

End of this chapter.

Posted by samohtwerdna on Thu, 07 Oct 2021 22:03:30 -0700