[C language] A detailed explanation of string functions and memory functions (explanation + code demonstration + simulated implementation)

Keywords: C C++ Programming Programmer

This article introduces some commonly used string functions and memory operation functions. If you find a mistake, please point it out in the comments.

Contents

preface
1, strlen: example, function declaration, error demonstration, Simulation Implementation, About const, On the validity of function parameters
2, strcpy: example, function declaration, Simulation Implementation, notes
3, strcat: example, function declaration, incorrect usage, Simulation Implementation
4, strcmp: example code, function prototype, Simulation Implementation
5, strstr: sample code, function prototype, Simulation Implementation
6, strtok: function declaration, code demonstration, specific process of segmentation, how to continue from the last position, disadvantages of strtok
7, memcpy: function declaration, code demonstration, Simulation Implementation
8, memmove: memory space coverage, function declaration, Simulation Implementation
9, memcmp: function declaration, code demonstration, Simulation Implementation
10, memset: function declaration, code demonstration, Simulation Implementation
preface

The C standard library functions have already been written for us and can be used directly, which lowers the barrier to development and improves efficiency.

We should not only know how to use these functions, but also understand how they work internally and be able to implement them ourselves.

When learning a library function, check the official documentation and read its description of the function.

Two C language online documents are recommended:

cplusplus.com - The C++ Resources Network

cppreference.com

If something about a C library function is unclear, look it up in the documentation.

Library function classification:

  1. Standard library functions (shipped with the C language)
  2. Third-party library functions (must be downloaded and installed separately)

1, strlen

This function is used to find the length of the string

First, experience it through the following code

#include <stdio.h>
#include <string.h>
int main() {
	char ch[] = "abcdef";
	size_t len = strlen(ch);      // strlen returns size_t
	printf("len = %zu\n", len);   // %zu is the format specifier for size_t
	return 0;
}

Output:

len = 6

Function declaration

size_t strlen (const char* str);

Header file: <string.h>

A string ends with '\0'. strlen returns the number of characters that appear before the '\0' (the '\0' itself is not counted). The array pointed to by str must contain a terminating '\0'; otherwise the behavior is undefined and the result is unpredictable

Note that the return type is size_t, which is unsigned. Be careful when doing arithmetic with it: for example, strlen(a) - strlen(b) can never be negative, so the condition strlen(a) - strlen(b) > 0 is true whenever the two lengths differ

Error demonstration

#include <stdio.h>
#include <string.h>

int main() {
	char ch[] = {'h','e','l','l','o'};   // no terminating '\0'
	int len = strlen(ch);                // undefined behavior: reads past the end of ch
	printf("len = %d\n", len);
	return 0;
}

Analysis:

This code is wrong. A character array that contains no '\0' is not a string, so strlen must not be used on it; doing so is undefined behavior and the result is unpredictable

strlen starts counting at the 'h' pointed to by ch and keeps going because it never meets a '\0'. It reads past the end of the array and only stops when it happens to hit a byte whose value is 0, so the value it returns is meaningless

To avoid the error, add a '\0' at the end of the array ourselves so that no out-of-bounds access occurs

Simulation Implementation

Next, let's write our own version of strlen

#include <stdio.h>
#include <assert.h>
size_t my_strlen(const char* arr) {
	assert(arr != NULL);   // the pointer must not be NULL
	size_t len = 0;        // same type as the return value
	while (arr[len] != '\0') {
		len++;
	}
	return len;
}
int main() {
	// Simulated implementation of strlen
	char arr[] = "abcdefgh";
	printf("%zu\n", my_strlen(arr));
	return 0;
}

Output:

8

Analysis of the implementation:

The return type is still size_t, because a string length can never be negative

The function only reads the string and never modifies it, so the parameter type is const char *

At the start of the function, assert checks that the pointer is valid. The function then walks the string through arr, counting characters in len until it reaches '\0', and returns len

About const

const can appear in three positions here

  1. const char* str
  2. char const* str
  3. char* const str

The first two are equivalent: the characters pointed to cannot be modified through the pointer. In the third, the pointer variable itself cannot be changed, but the characters it points to can

strlen only computes the length and never changes the characters pointed to, so its parameter str is declared with const

On the validity of function parameters

When implementing a function, always check that its parameters are valid (this is very important). There are two ways:

  1. An if statement
  2. assert

During development, assert is often the more convenient choice: if the asserted condition is false, the program aborts and prints the failed expression with its file and line number; if it is true, execution continues. Note that assert is compiled out when NDEBUG is defined, so checks that must also run in release builds need an if statement.

2, strcpy

This function is used to copy strings

Let's experience it first through the following code

#include <stdio.h>
#include <string.h>
int main() {
	char ch[20] = "abcd";
	char ch2[] = "efg";
	strcpy(ch, ch2);
	printf("%s\n", ch);
	return 0;
}

Output:

efg

This code uses strcpy to copy the contents of ch2 into ch, including the terminating '\0'

Function declaration

char* strcpy (char* destination, const char* source);

The declaration tells us the following

Return value: char*, the destination address

Parameters:

destination: points to the array that the string is copied into

source: points to the string to be copied (it is not modified during copying, so it is const)

strcpy copies the string pointed to by source, including its '\0', into destination

Simulation Implementation

#include <stdio.h>
#include <string.h>
#include <assert.h>
char* my_strcpy(char* dest, const char* src) {
	assert(dest != NULL);
	assert(src != NULL);   // an empty source string is valid, so *src is not checked
	int i = 0;
	while (src[i] != '\0') {
		dest[i] = src[i];
		i++;
	}
	dest[i] = '\0';        // copy the terminator as well
	return dest;
}

int main() {
	char arr1[100] = "hello";
	char arr2[100] = "abc";
	my_strcpy(arr1, arr2);
	printf("%s\n", arr1);
	return 0;
}

Analysis of the implementation:

The parameters are the same as those of the library function

The function first uses assert to check that both pointers are valid

While copying, pay attention to the loop exit condition src[i] != '\0'; after the loop, the '\0' is written separately

Finally, the address of the destination string is returned

Why return the address of the destination string:

To allow chained calls, where the result of strcpy is passed directly to another function

Notes

  1. strcpy copies the terminating '\0' as well
  2. The memory that destination points to must be large enough to hold the string pointed to by source, including its '\0'. If it is too small, strcpy writes out of bounds, which is undefined behavior

3, strcat

This function is used to splice strings

Experience it first through the following code

#include <stdio.h>
#include <string.h>
int main() {
	char ch[100] = "abcdef";
	char ch2[] = "ghi";
	strcat(ch, ch2);     //Append the contents of ch2 to ch
	printf("%s\n", ch);
	return 0;
}

Output:

abcdefghi

Function declaration

char* strcat (char* destination, const char* source);

About strcat

Return value: the destination address, which now holds the concatenated string

Parameters:

destination: points to the destination string

source: points to the string to be appended to destination. It is declared const, so the function does not modify it

Incorrect usage

#include <stdio.h>
#include <string.h>
int main() {
	char ch[] = "abcdef";
	char ch2[] = "hello";
	strcat(ch, ch2);
	printf("%s\n", ch);
	return 0;
}

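ch holds exactly 7 bytes ("abcdef" plus '\0'), so it cannot hold the concatenated string. strcat writes past the end of the array, which is undefined behavior; the program may crash or corrupt neighboring memory

Note:

The memory that destination points to must be large enough to hold the final concatenated result, including the '\0'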

  Simulation Implementation

#include <assert.h>
#include <stdio.h>
#include <string.h>
char* myStrcat(char* dest, const char* src) {
	// dest should be large enough
	assert(dest != NULL);
	assert(src != NULL);    //Add assertions to verify the validity of pointers
	int i = 0;
	while (dest[i] != '\0') {
		i++;
	}   //Found '\ 0' of dest
	int j = 0;
	while (src[j] != '\0') {
		dest[i] = src[j];
		i++;
		j++;
	}
	dest[i] = '\0';
	return dest;
}

int main() {
	char arr1[100] = "abcd";
	char arr2[100] = "efg";
	myStrcat(arr1, arr2);
	printf("%s\n", arr1);
	return 0;
}

Output:

abcdefg

4, strcmp

This function compares two strings in dictionary order (the way words are sorted in an English dictionary)

Comparison starts at the first character of each string. If the characters are equal, the next pair is compared, and so on until a difference or the end of the strings is reached

The result of the comparison is reported through the return value of strcmp

Example code:

#include <stdio.h>
#include <string.h>
int main() {
	char ch[] = "abc";
	char ch2[] = "ahello";
	int ret = strcmp(ch, ch2);
	printf("%d\n", ret);
	return 0;
}

Analysis

Here strcmp compares the two strings ch and ch2

The first characters, a and a, are equal, so the next pair is compared. b is smaller than h, so ch is smaller than ch2 and strcmp returns a negative value (many implementations print -1 here, but the exact number is not guaranteed)

Function prototype

int strcmp (const char* str1, const char* str2);

Return value: int type

str1 > str2: returns a value greater than 0

str1 < str2: returns a value less than 0

str1 == str2: returns 0

The C standard only specifies the sign of the result. Some implementations return 1 and -1, others return the difference between the first pair of differing characters, so test the result against 0 instead of against 1 or -1

Parameters: str1 and str2 point to the two strings to be compared (the function only compares the strings without changing them, so both are const)

Note: the value of '\0' is 0, so '\0' also takes part in the comparison; this is why a string that is a prefix of another compares as smaller

Simulation Implementation

#include <stdio.h>
#include <string.h>
#include <assert.h>
int myStrcmp(const char* str1, const char* str2) {
	assert(str1 != NULL);
	assert(str2 != NULL);
	// Advance while the characters match and the strings have not ended
	while (*str1 != '\0' && *str1 == *str2) {
		str1++;
		str2++;
	}
	// Compare the first differing pair as unsigned char, as the library strcmp does
	unsigned char c1 = (unsigned char)*str1;
	unsigned char c2 = (unsigned char)*str2;
	if (c1 > c2) {
		return 1;
	}
	else if (c1 < c2) {
		return -1;
	}
	else {
		return 0;
	}
}

int main() {
	char arr1[] = "abcd";
	char arr2[] = "abcd";
	printf("%d\n", myStrcmp(arr1, arr2));
	return 0;
}

Output:

0

5, strstr

This function is used for string matching

It searches one string (the main string) for another string (the substring) and returns the position where the substring first appears in the main string

For example, to find "de" in "abcdef", the search starts at the first character of the main string and continues until "de" is found; the address of that "de" inside the main string is then returned

Sample code

#include <stdio.h>
#include <string.h>
int main() {
	char ch[] = "abcdef";
	char ch2[] = "de";
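	printf("%p\n", (void*)ch);               // %p expects a void* argument
	printf("%p\n", (void*)strstr(ch, ch2));
	return 0;
}

Output: two addresses. The second is 3 bytes higher than the first, because "de" starts at index 3 of "abcdef"

The code above shows what strstr does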

Function prototype

char* strstr ( const char* str1, const char* str2 );

Return value: char * type. It is the address where str2 first appears in str1; if str1 does not contain str2, NULL is returned. (A declaration that returns const char* is one of the C++ overloads; in C the function returns char*.)

Parameters:

str1: points to the main string

str2: points to the substring

(the function only searches and does not change either string, so both parameters are const)

Simulation Implementation

Brute-force matching (BF algorithm):

#include <assert.h>
#include <stdio.h>
#include <string.h>
const char* myStrstr(const char* str1, const char* str2) {
	assert(str1 != NULL);
	assert(str2 != NULL);
	// Empty strings are valid input: an empty str2 matches at the start of str1
	size_t i = 0;   // position in str1
	size_t j = 0;   // position in str2
	while (str2[j] != '\0' && str1[i] != '\0') {
		if (str1[i] == str2[j]) {
			i++;
			j++;
		}
		else {
			i = i - j + 1;   // restart one character after the previous start
			j = 0;
		}
	}
	if (str2[j] == '\0') {
		return &str1[i - j];   // the whole substring matched
	}
	return NULL;
}

int main() {
	char arr1[] = "abcdefg";
	char arr2[] = "de";
	printf("%p\n", (void*)arr1);
	printf("%s\n", myStrstr(arr1, arr2));
	return 0;
}

Output: the address of arr1, followed by defg

Notes:

Only NULL pointers are rejected by the assertions. Empty strings are legal arguments: the library strstr returns str1 when str2 is empty, and this implementation does the same

Code idea:

The search starts at the first character of str1. If the current character of str1 equals the current character of str2, both indices advance and the next pair is compared. On a mismatch, the search restarts one character after the previous starting point: i falls back to i - j + 1 and j goes back to 0. When j reaches the end of str2, the whole substring has matched and the match begins at index i - j

6, strtok

This function is used for string segmentation: cutting a string into several parts at a given set of delimiter characters (each call returns the starting position of one part)

For example, splitting the string "a,b,c,d,e" at ',' gives five parts: "a", "b", "c", "d", "e"

Function declaration

char* strtok (char* str, const char* delimiters);

Most library functions do their job in a single call, but strtok is different. As the documentation explains, one call returns only one part. It must be called repeatedly to complete the segmentation, and the first call takes different arguments from the following calls

Code demonstration

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="This is a book";
  char* pch;
  pch = strtok(str," ");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok(NULL, " ");
  }
  return 0;
}

Output:

This
is
a
book

strtok cuts "This is a book" into four parts, which are printed one per line

Code analysis

The first call to strtok passes the starting address of the string as the first argument and the delimiters as the second. The loop then checks whether pch is NULL. If it is not, the segmentation is not finished, so strtok is called again with NULL as the first argument, which means "continue from where the last call stopped"

Specific process of segmentation

The code uses strtok to cut the string as follows.

First call, strtok(str, " "): the function scans forward from the start of the string. When it finds the first ' ', it overwrites it with '\0' and returns a pointer to 'T'. printf in the loop prints This

Second call, strtok(NULL, " "): because the first argument is NULL, scanning continues from the position saved by the previous call. The ' ' after is is overwritten with '\0' and a pointer to 'i' is returned. printf prints is

Third call, strtok(NULL, " "): scanning continues from the saved position. The ' ' after a is overwritten with '\0', a pointer to 'a' is returned, and printf prints a

Fourth call, strtok(NULL, " "): scanning continues from the saved position. No space is found before the '\0' at the end of the string. The function still returns a pointer to the 'b' of book, and printf prints book

Fifth call, strtok(NULL, " "): the previous call already reached the end of the string, so there is nothing left to scan. Segmentation is complete and NULL is returned

Note that strtok modifies the string it is given, so it must not be called on a string literal

How to continue to look back from the last position?

There must be a variable inside the function that records where the last segmentation stopped, so that the next call can continue from that position

However, ordinary local variables are destroyed when the function returns. How can the function remember the position of the last segmentation?

This is why implementations typically keep the position in a static variable inside the function. Declaring a local variable static extends its lifetime to that of the whole program, so its value survives between calls

Disadvantages of strtok

  1. It must be called several times in a row to complete the segmentation
  2. The arguments differ between the first call and the following calls
  3. strtok records internal state (the position of the last segmentation), which makes it not thread-safe: two threads, or two interleaved segmentations in one thread, would overwrite each other's saved position

The string functions above only work on '\0'-terminated strings, which is rather limiting. Next, let's look at the memory operation functions, which work on arbitrary bytes

7, memcpy

This function is used for memory copying

Function declaration

void* memcpy (void* destination, const void* source, size_t num );

Return value: destination is returned after copying (so calls can be chained)

Parameters:

destination: the address to copy to

source: points to the memory to be copied (it is not modified, so it is const)

num: number of bytes to copy (size_t type)

Both destination and source are of type void *, so a pointer to any object type can be passed

Inside the function, the type of the data being pointed to does not matter; exactly num bytes are copied

Code demonstration

#include <stdio.h>
#include <string.h>
int main() {
	int arr[] = { 1,2,3,4 };
	int arr2[] = { 5,6,7,8 };
	memcpy(arr, arr2, sizeof(arr2));
	for (int i = 0; i < 4; i++) {
		printf("%d ", arr[i]);
	}
	return 0;
}

Output:

5 6 7 8

This code copies the contents of arr2 into arr

Simulation Implementation

#include <stdio.h>
#include <string.h>
#include <assert.h>
void* myMemcpy(void* dest, const void* src, size_t num) {
	assert(dest != NULL);
	assert(src != NULL);   // num == 0 is allowed and simply copies nothing
	char* pdest = (char*)dest;
	const char* psrc = (const char*)src;   // keep the const of src
	for (size_t i = 0; i < num; i++) {
		pdest[i] = psrc[i];
	}
	return dest;
}
int main() {
	char arr1[10] = "abcdefg";
	char arr2[10] = "de";
	myMemcpy(arr1, arr2, sizeof(arr2));
	printf("%s\n", arr1);
	return 0;
}

Output:

de

Notes:

1. In the implementation, a void * pointer cannot be dereferenced, so it is converted to char * in order to access one byte at a time

2. The memory that dest points to must be at least num bytes long

3. Copying sizeof(arr2) bytes copies all 10 bytes of arr2, including the '\0' after "de", which is why only de is printed

8, memmove

memmove does basically the same job as memcpy: it copies memory. The difference is that memcpy has undefined behavior when the source and destination regions overlap, while memmove handles overlap correctly, so the copied data is never corrupted

Memory space coverage

Consider the following layout: src and dest are each 4 bytes long, src holds a b c d, and dest begins at the last byte of src, so the two regions overlap by one byte

Now copy all of src to dest. A simple front-to-back copy, like the memcpy implementation above, first copies a to the position dest points to and then continues. But writing that a has already overwritten the last byte of src (the d), so the copy goes wrong: afterwards dest holds abca instead of abcd

memmove detects this kind of overlap. When dest lies inside the source region, it copies from back to front, so every byte is read before it is overwritten

Function declaration

void* memmove (void* destination, const void* source, size_t num);

Return value: destination is returned after copying (so calls can be chained)

Parameters:

destination: the address to copy to

source: points to the memory to be copied (it is not modified, so it is const)

num: number of bytes to copy (size_t type)

Both destination and source are of type void *, so a pointer to any object type can be passed

memmove is called in the same way as memcpy, so the usage is not repeated here

Simulation Implementation

#include <stdio.h>
#include <string.h>
#include <assert.h>
void* my_memmove(void* dest, const void* src, size_t count) {
	assert(dest != NULL);
	assert(src != NULL);   // count == 0 is allowed and copies nothing
	char* pdest = (char*)dest;
	const char* psrc = (const char*)src;
	if (pdest > psrc && pdest < psrc + count)
	{
		// dest starts inside the source region: copy from back to front
		while (count > 0) {
			count--;                       // the last valid index is count - 1
			pdest[count] = psrc[count];
		}
	}
	else {
		// No harmful overlap: copy from front to back
		size_t i = 0;
		while (count > 0) {
			pdest[i] = psrc[i];
			count--;
			i++;
		}
	}
	return dest;
}

int main() {
	char arr[100] = "abc";
	char arr2[100] = "hello";
	my_memmove(arr, arr2, 6);
	printf("%s\n", arr);
	return 0;
}

Output:

hello

9, memcmp

This function compares the contents of two blocks of memory of a specified size

Function declaration

int memcmp (const void* ptr1, const void* ptr2, size_t num);

Compares the first num bytes of the two memory blocks pointed to by ptr1 and ptr2

Return value:

contents of ptr1 > contents of ptr2: returns a value greater than 0

contents of ptr1 == contents of ptr2: returns 0

contents of ptr1 < contents of ptr2: returns a value less than 0

As with strcmp, only the sign of the result is specified by the standard. The bytes are compared as unsigned char

Parameters:

ptr1 and ptr2 point to the two memory blocks to compare

num is the number of bytes to compare

Code demonstration

#include <stdio.h>
#include <string.h>
int main() {
	char ch[] = "abcde";
	char ch2[] = "aaabce";
	int ret = memcmp(ch, ch2, 1);
	printf("ret = %d\n", ret);
	return 0;
}

This code compares only the first byte of ch and ch2. Each character occupies one byte, and the first byte of both is 'a', so memcmp returns 0

Output:

ret = 0

Now change the third argument of memcmp to 2:

The comparison still starts at the first character, but two bytes are compared. The first characters are equal, so the second characters are compared. b is larger than a, so the return value is positive (many implementations return 1)

Simulation Implementation

#include <stdio.h>
#include <assert.h>
#include <string.h>
int my_memcmp(const void* ptr1, const void* ptr2, size_t num) {
	assert(ptr1 != NULL);
	assert(ptr2 != NULL);   // num == 0 is allowed: zero bytes always compare equal
	// Compare as unsigned char, as the library memcmp does
	const unsigned char* pptr1 = (const unsigned char*)ptr1;
	const unsigned char* pptr2 = (const unsigned char*)ptr2;
	size_t i = 0;
	while (num > 0) {
		if (pptr1[i] > pptr2[i]) {
			return 1;
		}
		else if (pptr1[i] < pptr2[i]) {
			return -1;
		}
		else {
			i++;
			num--;
		}
	}
	return 0;
}
int main() {
	char ch[] = "abcdef";
	char ch2[] = "abcde";
	printf("%d\n", my_memcmp(ch, ch2, sizeof(ch)));
	return 0;
}

The simulated implementation of memcmp is relatively easy

Note that a void * pointer cannot be dereferenced. It has to be converted explicitly to a character pointer type before the bytes can be read and compared

10, memset

This function sets each of the first num bytes of the specified memory block to the given value

Function declaration

void* memset (void* ptr, int value, size_t num);

Return value:

ptr is returned after the first num bytes have been set to value

Parameters:

ptr: points to the memory to be modified

value: the value to set (passed as an int, but converted to unsigned char when the function fills the memory)

num: the number of bytes to set

Code demonstration

#include <stdio.h>
#include <string.h>
int main ()
{
    char str[] = "almost every programmer should know memset!";
    memset (str,'-',6);
    printf("%s\n", str);
    return 0;
}

This code sets the first 6 bytes of str to '-'

Output:

------ every programmer should know memset!

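Simulation Implementation

#include <stddef.h>
#include <assert.h>
void* my_memset(void* ptr, int val, size_t num) {
	assert(ptr != NULL);   // num == 0 is allowed and changes nothing
	unsigned char* pptr = (unsigned char*)ptr;
	size_t i = 0;
	while (num) {
		pptr[i] = (unsigned char)val;   // val is converted to unsigned char
		num--;
		i++;
	}
	return ptr;
}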

That covers the common string and memory operation functions. We should not only be able to use them but also be able to implement them ourselves. In particular, the parameter checks in each implementation must not be left out


Posted by dizzy1 on Sun, 31 Oct 2021 17:42:00 -0700