[introduction to Linux] basic IO

Keywords: Python Linux Algorithm leetcode

✔ Review of the C file interface

When learning C, we met a number of library functions for file operations:
fopen, fclose, fputc, fgetc, fputs, fgets, fprintf, fscanf, fread, fwrite, and so on.

Let's briefly review them with a piece of code:

   #include<stdio.h>
   #include<string.h>
   int main()
   {
     FILE* fp=fopen("myfile","a+");
     if(!fp)
     {
       printf("fopen error\n");
       return 1;   //Do not go on with a NULL stream
     }
     int count=5;
     const char *m="hello linux\n";
     while(count--){
       //Write to file
       fwrite(m,strlen(m),1,fp);
     }
     //Close file
     fclose(fp);
     return 0;
   }

To write content to a file, the file must first exist and must then be opened.
In the code above

FILE* fp=fopen("myfile","a+");

The first parameter is the path / file name of the file (without a path, the file is created in the current path).
Current path: the working directory of the running process.
The second parameter is the mode in which the file is opened; "a+" opens the file for reading and appending, and creates it if it does not exist.

When learning Linux, we often hear that "everything is a file".

Are monitors and keyboards files?
In C, we usually use the printf() function to show content on the display.
This time, let's print the content without printf().

     #include<stdio.h>
     #include<string.h>
     int main()
     {
        const char *m="hello linux\n";
        fwrite(m,strlen(m),1,stdout);
       	return 0;
    }


In this way, we can understand that the display can also be regarded as a file and can also be written with the fwrite() function.

Here's the point: when any process starts, three input and output streams are opened for it by default,
namely:
Standard input (keyboard) stdin
Standard output (display) stdout
Standard error (display) stderr
All three streams have the type FILE *, a pointer to FILE.

✔ System file I/O

In addition to the C interface above, the system provides its own interface for file operations.

When we use the C interface to operate on files on Linux, the C library is in fact calling the Linux system interface.
The C library file interface is therefore a wrapper around the system call interface.

The first interface is open. The difference from the C function is that there is no f in front.

   #include <sys/types.h>
   #include <sys/stat.h>
   #include <fcntl.h>

   int open(const char *pathname, int flags);
   int open(const char *pathname, int flags, mode_t mode);

The parameters are path or file name, options, and permissions.

Options:

  • O_RDONLY: open read-only
  • O_WRONLY: open write-only
  • O_RDWR: open for reading and writing
    You must specify one and only one of the three constants above
  • O_CREAT: if the file does not exist, create it. You need to pass the mode argument to specify the access permissions of the new file
  • O_APPEND: append when writing

Note that the return type of this interface is int.
When the file is opened successfully, a small non-negative integer is returned: the file descriptor of the file.
On failure, -1 is returned.

Let's get a feel for it with a piece of code.

     #include<stdio.h>
     #include<sys/stat.h>
     #include<sys/types.h>
     #include<fcntl.h>
     #include<unistd.h>
     #include<string.h>
     int main()
     {
       umask(0);
      int fd=open("myfile",O_WRONLY|O_CREAT,0666);
      if(fd<0)
      {
        perror("open");
        return 1;
      }
      const char *buf="hello linux\n";
      write(fd,buf,strlen(buf));
    
      close(fd);
      return 0;
    }


First: the 0666 in open requests permission 666 for the file when it is created. umask(0) clears the process's file mode mask beforehand, so the file really gets 666.
Second: O_WRONLY|O_CREAT means that if the file exists it is opened write-only, and if it does not exist it is created with permission 666 and then opened write-only.

Why write O_WRONLY|O_CREAT?
These uppercase options are macros, and each option occupies its own bits in the flags value, much like a bitmap.
When we pass O_WRONLY|O_CREAT, the implementation can test each option with a check such as if (flags & O_CREAT).

The first parameter in write is the file descriptor. What is the file descriptor?

✔ File descriptor

Let's use a piece of code to feel the file descriptor:

   #include<sys/types.h>
   #include<sys/stat.h>
   #include<fcntl.h>
   #include<stdio.h> 
   int main()
   {
     umask(0);
    int fd1=open("myfile",O_WRONLY|O_CREAT,0666); 
    int fd2=open("myfile",O_WRONLY|O_CREAT,0666);
    int fd3=open("myfile",O_WRONLY|O_CREAT,0666);  
    int fd4=open("myfile",O_WRONLY|O_CREAT,0666);
  
    printf("fd1:%d\n",fd1);
    printf("fd2:%d\n",fd2);
    printf("fd3:%d\n",fd3);
    printf("fd4:%d\n",fd4);                                                                                        
    return 0;
  }


On success, open returns a small non-negative integer, that is, the file descriptor.
Judging from the output of the code above, it looks a bit like an array index.
In fact, it is an array index: indexes 0, 1 and 2 of the array are already occupied by the keyboard (standard input), the display (standard output) and the display (standard error).
That is why 3, 4, 5 and 6 are allocated.

And why an array?
To answer that, we first need to distinguish between in-memory files and disk files.

The code above creates the file myfile. Its attributes (file size, file name, last modification time, etc.) are kept in memory in a struct file structure.
The contents of the file are on the disk; that is the disk file.
The operating system organizes these struct file objects in a doubly linked list, much as it does with PCBs.

When a process is created, a PCB is created for it. A pointer in the PCB points to a structure called files_struct, part of which is an array of pointers; each element stores the address of a struct file.

The process uses the file descriptor to find the slot that holds the file's address, and then operates on the file.

Now we know that a file descriptor is a small integer starting from 0. When we open a file, the operating system creates a data structure in memory to describe the target file; this is the file structure, which represents an open file object. Because it is the process that executes the open system call, the process must be associated with the file. Each process has a pointer *files that points to a files_struct table. The most important part of that table is an array of pointers, each element of which points to an open file. In essence, therefore, a file descriptor is an index into this array, and whoever holds the file descriptor can find the corresponding file.

Allocation rules for file descriptors

If we close(0)

     #include<sys/stat.h>
     #include<sys/types.h>
     #include<fcntl.h>
     #include<stdio.h>  
     #include<unistd.h>
     int main()
     {
      close(0);
      umask(0);
      int fd1=open("myfile",O_WRONLY|O_CREAT,0666);
    
      printf("fd1:%d\n",fd1);
      return 0;
    }


This time fd1 is 0, the descriptor that was just closed.
Allocation rule for file descriptors: in the files_struct array, the smallest index that is not currently in use becomes the new file descriptor.

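Redirection

If we close(1) before opening the file, myfile receives descriptor 1, so everything written to descriptor 1 lands in myfile instead of on the display: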

	 //Output redirection
     #include<sys/stat.h>
     #include<sys/types.h>
     #include<fcntl.h>
     #include<stdio.h>
     #include<unistd.h>
     #include<string.h>
     
     int main()
     {
      close(1);
      umask(0);
   	  int fd1=open("myfile",O_WRONLY|O_CREAT,0666);    
      const char *buf="hello linux\n";

      //printf("%s",buf);
      //fflush(stdout); //flush the user-space buffer of the stream
      write(1,buf,strlen(buf));
      return 0;
     }

✔ FILE

The C functions for file operations all work with the FILE * type.

FILE *fopen(const char *path, const char *mode);

So what is FILE *?

FILE is a structure, and FILE * is a pointer to that structure.
As we know, the IO functions in the C library are wrappers around system calls.
In the system call IO interface, the return type of the open function is int.

int open(const char *pathname, int flags, mode_t mode);

The open function returns a file descriptor, and the corresponding file can be found through that descriptor.

The FILE structure contains an int member that holds the file descriptor. That is how the C IO functions can also find the corresponding file: it is a form of encapsulation.

Let's look at the code of the FILE structure.

Following the includes from /usr/include/stdio.h, we can find the definition below.
In FILE, int _fileno is the encapsulated file descriptor.
struct _IO_FILE {
 int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags
 //Buffer correlation
 /* The following pointers correspond to the C++ streambuf protocol. */
 /* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
 char* _IO_read_ptr; /* Current read pointer */
 char* _IO_read_end; /* End of get area. */
 char* _IO_read_base; /* Start of putback+get area. */
 char* _IO_write_base; /* Start of put area. */
 char* _IO_write_ptr; /* Current put pointer. */
 char* _IO_write_end; /* End of put area. */
 char* _IO_buf_base; /* Start of reserve area. */
 char* _IO_buf_end; /* End of reserve area. */
 /* The following fields are used to support backing up and undo. */
 char *_IO_save_base; /* Pointer to start of non-current get area. */
 char *_IO_backup_base; /* Pointer to first valid character of backup area */
 char *_IO_save_end; /* Pointer to end of non-current get area. */
 struct _IO_marker *_markers;
 struct _IO_FILE *_chain;

 int _fileno; //Encapsulated file descriptor

//......

Let's look at the following code:

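#include<stdio.h>
#include<sys/stat.h>
#include<sys/types.h>
#include<fcntl.h>
#include<string.h>
#include<unistd.h>

int main()
{
	close(1);
	//myfile gets descriptor 1, the lowest free one
	int fd=open("myfile",O_WRONLY|O_CREAT,0666);
	if(fd<0)
	{
		return 1;
	}

	const char* arr="hello linux\n";

	//Write the string itself, not the literal "arr"
	fwrite(arr,strlen(arr),1,stdout);

	return 0;
}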


The string is not printed on the display, but written to the myfile file.

Why?

First, you should understand that the stdin, stdout and stderr streams in C are of type FILE *, and the file descriptors stored in these three FILE objects are fixed as 0, 1 and 2. This is why stdin, stdout and stderr in C can reach the keyboard, the display and the display.

In the code above, descriptor 1 is closed and myfile then receives descriptor 1. stdout still writes to descriptor 1, so fwrite ends up writing to myfile. This is why nothing is printed on the display and the string goes to the myfile file instead. With this code in mind, we should now understand what FILE * is.

Finally, what does fopen do?
1. It allocates a FILE structure variable for the caller and returns its address (FILE *).
2. Underneath, it opens the file through open, gets fd back, and stores fd in the _fileno member of the FILE variable.

Buffer

Consider two functions:

#include<stdio.h>
#include<unistd.h>
void A()
{
	printf("hello linux\n");
	sleep(3);
}

void B()
{
	printf("hello linux");
	sleep(3);
}

int main()
{
	A();
	B();
	return 0;
}

A displays hello linux first and then waits for 3 seconds.
B waits for 3 seconds before displaying hello linux.

When content is sent to the display, it is first written into a buffer. The display uses line buffering: when \n is encountered, the buffer is flushed and the content is written to the display. The buffer is also flushed when it becomes full or when the program exits, which is why B's output appears only after the wait.

The buffering modes are:

  1. Unbuffered: content is written out immediately
  2. Line buffered: when \n is encountered, the content up to and including the \n is flushed; otherwise it waits until the buffer is full. This balances efficiency and usability.
  3. Fully buffered: content is flushed only when the buffer is full

(The display usually uses line buffering, so that we can see our content sooner.)

Look at the code:

#include<stdio.h>
#include<unistd.h>
#include<string.h>

int main()
{
  printf("hello printf\n");
  fprintf(stdout,"hello fprintf\n"); 
  const char*mag2="hello write\n";
  write(1,mag2,strlen(mag2));
  fork();
  return 0;
}

The output is:

hello printf
hello fprintf
hello write

However, if we redirect the output of the process to a file with ./a.out > myfile, the file contains:

hello write
hello printf
hello fprintf
hello printf
hello fprintf


Because:

When we redirect, file descriptor 1 no longer refers to the display but to our file, and stdout switches to full buffering.
The IO interface of the system call has no user-level buffer, so write puts its data into the file immediately.
"hello printf\n" and "hello fprintf\n" are still in the C buffer when fork() creates the child process, so the child gets its own copy of the buffered data (through copy-on-write). Parent and child each flush their buffer when they exit at return 0, so these strings are written twice.

The buffer is provided by the C library and maintained through the FILE structure.
The buffer lives in memory, at the user level.
Flushing the buffer does not write to the file directly; the data is handed to the kernel, which writes it to the file. The OS has its own flushing mechanism there, which is not discussed here (I haven't learned it yet, ha ha).

fclose and close

fclose: flushes the C buffer before closing descriptor 1, so the content is written to the file.

#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<string.h>
#include<unistd.h>

int main()
{
  close(1);
  int fd=open("myfile",O_WRONLY|O_CREAT,0666);
  const char *arr="hello linux\n";
  fprintf(stdout,"%s",arr);
  fclose(stdout);
  return 0;
}


close: because the stream is fully buffered, the data is still in the C buffer when close(1) is called. The system call cannot see that buffer, so the descriptor is closed without the buffer being flushed and nothing is written to the file.

#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<string.h>
#include<unistd.h>

int main()
{
  close(1);
  int fd=open("myfile",O_WRONLY|O_CREAT,0666);
  const char *arr="hello linux\n";
  fprintf(stdout,"%s",arr);
  //fflush(stdout);// You can flush the buffer before calling close.
  close(fd);  //fd is 1 here; the C buffer has not been flushed yet
  return 0;
}


Calling fclose amounts to calling fflush first and then close.

✔ dup2 system call

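In the redirection above, we had to close(1) first and then open the file, which is cumbersome. There is a simpler way: dup2 makes newfd refer to the same open file as oldfd, closing newfd first if necessary.

#include<unistd.h>
int dup2(int oldfd, int newfd);

#include<stdio.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
#include<string.h>
#include<unistd.h>

int main()
{
  int fd=open("myfile",O_WRONLY|O_CREAT,0666);
  if(fd<0)
  {
    perror("open");
    return 1;
  }
  const char *arr="hello linux\n";
  dup2(fd,1);  //1 now refers to myfile; no close(1) is needed
  close(fd);   //fd itself is no longer needed
  printf("%s",arr);
  return 0;
}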

✔ Understanding file systems

The file system is an important part of Linux. After using Linux for some time, I was still confused about how files are actually created. Through study, I have gradually come to understand a little of it.

File = file attributes + file content. When we look at the size of a file, what is displayed is the size of its content; the attribute information is not included. This shows that file attributes and file content are stored separately on the disk. File attributes are called meta information.

Let's start with a brief overview of disks:
A disk has sectors, tracks, cylinders and heads.
When a file is written to the disk, the disk is addressed: the cylinder, head (track) and sector are used to find the place where the content is written.

Suppose the disk size is 500GB. To manage such a large space, the system divides it into partitions (just as China is divided into provinces, cities and counties).

inode

An inode is the collection of attributes of a file. Almost every file in Linux has an inode number.

The meta information of the file is stored in the inode, which is a structure.

The above figure shows the on-disk layout of the file system (the layout in kernel memory is certainly different). A disk is a typical block device, and a disk partition is divided into blocks. The size of a block is determined when the partition is formatted and cannot be changed afterwards. For example, the -b option of mke2fs can set the block size to 1024, 2048, or 4096 bytes. In the above figure, the size of the Boot Block is fixed.

  • Block Group: the ext2 file system is divided into several block groups according to the size of the partition. Every Block Group has the same structure, much as a government manages a country district by district
  • Super Block: stores the structure information of the file system itself. The recorded information mainly includes: the total number of blocks and inodes, the number of unused blocks and inodes, the size of a block and of an inode, the time of the last mount, the time of the last write, the time of the last disk check, and other file system related information. If the Super Block is destroyed, the whole file system structure is effectively destroyed
  • GDT, Group Descriptor Table: the block group descriptors, which describe the attributes of each block group. You can read up on it if you are interested
  • Block Bitmap: records which data blocks are occupied and which are free
  • inode Bitmap: each bit indicates whether an inode is free and available
  • inode Table: stores file attributes, such as file size, owner, latest modification time, etc
  • Data blocks: store file contents

The data area is organized block by block; each block is 4KB here and is used to store data. (There is also a multi-level index, which I won't cover until I have learned it.)

There is an array (int block[12]) in the inode structure that records the locations of the data blocks.

Creation of an ordinary file

First, the kernel finds an unused inode in the inode bitmap and allocates it to record the attributes of the file. If content is to be written to the file, the system allocates the required free blocks from the block bitmap according to the size of the content and writes the content to them. The kernel records this block list in the disk allocation area of the inode. Finally, the kernel adds the inode number and file name of the file to the directory file, so that the file name maps to the inode number.

Directory creation

A directory is also a file and has its own inode number. Creating a directory is similar to creating an ordinary file as described above. The difference is that the content of a directory file is the list of file names in the directory together with their inode numbers, so that each file name corresponds to exactly one inode.

ls command:

ls -l command

This also shows how directories and files are tied together.

Deletion of files

Deleting a file is not that complicated. It is enough to clear the file's bit in the inode bitmap (set 1 to 0) and the bits of its blocks in the block bitmap (set 1 to 0). This is also why deleted files can be recovered: as long as the blocks have not been reused, the bitmap bits only need to be set back.

Creating a new file involves four main operations:

  1. Store attributes
    The kernel first finds a free inode (263466 here) and records the file information in it.
  2. Store data
    The file needs three disk blocks, and the kernel finds three free blocks: 300, 500 and 800. It copies the first block of data in the kernel buffer to 300, the next block to 500, and so on.
  3. Record allocation
    The contents of the file are stored in the order 300, 500, 800. The kernel records this block list in the disk allocation area of the inode.
  4. Add the file name to the directory
    The new file is named abc. How does Linux record this file in the current directory? The kernel adds the entry (263466 abc) to the directory file. The correspondence between file name and inode connects the file name with the contents and attributes of the file.

Hard link

ln <file name> <name of the link to create>


The inode numbers of the two files are the same, which shows that myfile-s is not an independent file. Only a new file name has been added to the directory data, and the inode that this name maps to is the same as that of myfile.

Number of hard links

The number of hard links is the number of file names that share the same inode number.
myfile and myfile-s have the same inode number, so the hard link count is 2.

The disk space of a file is released only when its hard link count drops to 0. Deleting one of the file names that share the inode number decreases the hard link count by 1.

In other words:

  • abc and def have exactly the same link status. They are called hard links to the file. The kernel records the link count in the inode;
    the hard link count of inode 263466 is 2.

  • Deleting a file does two things: 1. it deletes the corresponding record in the directory; 2. it decreases the hard link count by 1. If the count reaches 0, the corresponding disk space is released.

Soft link

ln -s <file name> <name of the link to create>


Hard links refer to another file through inode, and soft links refer to another file by name.

The three timestamps of a file

Use stat <file name> to view them:

  • Access: last time the file was accessed
  • Modify: last time the file content was modified
  • Change: last time the file attributes were changed

Posted by utdfederation on Tue, 16 Nov 2021 17:48:11 -0800