How HDFS works

Keywords: Hadoop hdfs


1. NameNode and DataNode

HDFS uses a master/slave architecture. An HDFS cluster consists of a single NameNode and a number of DataNodes. The NameNode is a central server that manages the file system namespace and regulates client access to files. DataNodes, usually one per node in the cluster, manage the storage attached to the nodes they run on. HDFS exposes a file system namespace in which users store data as files.

Internally, each file is split into one or more blocks, and these blocks are stored on a set of DataNodes. The NameNode executes namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes serve read and write requests from file system clients, and they create, delete, and replicate blocks on instruction from the NameNode.

The NameNode maintains the file system namespace and records any change to the namespace or its properties. An application can specify how many replicas of a file HDFS should keep. This number is called the file's replication factor, and the NameNode stores it as well.
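To make the division of labor concrete, here is a toy Python model of the metadata described above: which blocks make up each file, each file's replication factor, and which DataNodes hold each block. All class and method names are hypothetical, and the default of 3 replicas is taken from the upload section below. The real NameNode uses its own internal structures and learns block locations from DataNode reports, so treat this only as a sketch of the mappings.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    replication: int                              # the file's replication factor
    blocks: list = field(default_factory=list)    # ordered block IDs


class ToyNamespace:
    """Hypothetical stand-in for the NameNode's in-memory metadata."""

    def __init__(self, default_replication=3):
        self.default_replication = default_replication
        self.files = {}            # path -> FileEntry
        self.block_locations = {}  # block ID -> set of DataNode names

    def create(self, path, replication=None):
        if path in self.files:
            raise FileExistsError(path)
        if replication is None:
            replication = self.default_replication
        self.files[path] = FileEntry(replication)

    def rename(self, src, dst):
        # A pure namespace operation: only the path changes, blocks stay put.
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)

    def add_block(self, path, block_id, datanodes):
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def locate(self, path):
        """Return [(block_id, sorted DataNode names)] for a file, in block order."""
        return [(b, sorted(self.block_locations[b])) for b in self.files[path].blocks]
```

Renaming touches only `files`, not `block_locations`. This is why a rename is a cheap metadata change on the NameNode and involves no DataNode at all.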

2. HDFS startup process

2.1. NameNode loading and persistence

1. At startup, the NameNode first loads the most recent fsimage file from its name directory into memory. Memory then holds all metadata as of the last checkpoint. Changes made between that checkpoint and the shutdown are still missing; they are stored in the edit log files.
2. Load the edit logs: the NameNode reads every edit log file written since the latest checkpoint and replays the client operations they record. Memory then holds the latest file system metadata.
3. Create a checkpoint: the NameNode closes the edits file it was using and starts a new log file. It then merges all unmerged edit logs with the fsimage to produce a new fsimage.
4. Enter safe mode and wait for heartbeats and block reports from the DataNodes. Once 99.9% of blocks have at least one replica, the NameNode leaves safe mode and begins normal operation.
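Steps 1 to 3 amount to "load a snapshot, replay a log, write a new snapshot". The sketch below shows that order. Plain Python dicts stand in for the fsimage (mapping path to replication factor), and tuples stand in for edit-log records. The record format and operation names are invented for illustration; real fsimage and edits files use Hadoop's own on-disk formats.

```python
import copy


def apply_edit(namespace, record):
    """Apply one hypothetical edit-log record, e.g. ("create", "/a", 3)."""
    op, *args = record
    if op == "create":
        path, replication = args
        namespace[path] = replication
    elif op == "delete":
        namespace.pop(args[0], None)
    elif op == "rename":
        src, dst = args
        namespace[dst] = namespace.pop(src)
    elif op == "setrep":
        path, replication = args
        namespace[path] = replication
    else:
        raise ValueError(f"unknown edit op: {op}")


def start_namenode(fsimage, edit_logs):
    # Step 1: load the last checkpoint into memory.
    namespace = copy.deepcopy(fsimage)
    # Step 2: replay every edit written since that checkpoint, oldest first.
    for log in edit_logs:
        for record in log:
            apply_edit(namespace, record)
    # Step 3: checkpoint. The merged state becomes the new fsimage,
    # and a fresh, empty edit log is started.
    new_fsimage = copy.deepcopy(namespace)
    new_edit_log = []
    return namespace, new_fsimage, new_edit_log
```

The order matters. Replaying edits out of order (for example, a rename before the create it depends on) would produce a different namespace. The real edit log therefore records operations sequentially.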

2.2. Safe mode

When the NameNode starts, it enters a special state called safe mode. In safe mode, the NameNode does not replicate data blocks. It receives heartbeats and block reports from the DataNodes; a block report lists all the data blocks stored on that DataNode. Each block has a specified minimum number of replicas. A block is considered safely replicated once the NameNode confirms it has reached that minimum. The NameNode leaves safe mode after a configurable percentage of blocks has been confirmed as safely replicated, plus an additional 30 seconds. It then determines which blocks still have fewer replicas than specified and replicates them to other DataNodes.
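The exit rule can be written down as a small predicate. This sketch uses the numbers from the text as defaults: a 99.9% threshold, a minimum of one replica, and 30 extra seconds. It also assumes a target replication of 3. In recent Hadoop versions these settings roughly correspond to dfs.namenode.safemode.threshold-pct, dfs.namenode.replication.min, dfs.namenode.safemode.extension and dfs.replication. The function names and the "seconds the threshold has been met" input are assumptions made for illustration.

```python
def safe_fraction(block_replicas, min_replication=1):
    """block_replicas maps block ID -> number of replicas reported by DataNodes."""
    if not block_replicas:
        return 1.0
    # A block is "safely replicated" once at least min_replication replicas are reported.
    safe = sum(1 for n in block_replicas.values() if n >= min_replication)
    return safe / len(block_replicas)


def may_leave_safe_mode(block_replicas, threshold_met_for_s,
                        threshold=0.999, min_replication=1, extension_s=30):
    """Leave only when the safe fraction reaches the threshold AND the extra wait has elapsed."""
    return (safe_fraction(block_replicas, min_replication) >= threshold
            and threshold_met_for_s >= extension_s)


def blocks_to_replicate(block_replicas, target_replication=3):
    """After leaving safe mode: blocks still below their target get re-replicated."""
    return sorted(b for b, n in block_replicas.items() if n < target_replication)
```

With 1000 blocks, a single block with no reported replica still allows the NameNode to leave safe mode (999/1000 = 99.9%), but a second one does not. The missing block is then picked up by `blocks_to_replicate`.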

2.3. Secondary NameNode

The NameNode records changes to the file system by appending them to a log file (edits) on its local file system. When the NameNode starts, it first reads the HDFS state from an image file (fsimage) and then applies the operations recorded in the edits log. It then writes the new HDFS state to fsimage and starts normal operation with an empty edits file. Because the NameNode merges fsimage and edits only during startup, the edits log can grow very large over time, especially on large clusters. A large edits log has another side effect: the next NameNode restart takes a long time.

The Secondary NameNode periodically merges fsimage and the edits log, which keeps the edits log within a size limit. Its memory requirements are of the same order of magnitude as the NameNode's, so the Secondary NameNode usually runs on a different machine. It is started by the script under bin/ on the node specified in conf/masters.
Checkpointing on the Secondary NameNode is controlled by two configuration parameters:
1. fs.checkpoint.period: the maximum interval between two consecutive checkpoints. The default is 1 hour.
2. fs.checkpoint.size: the maximum size of the edits log file. If the log exceeds this size, a checkpoint is forced even if the maximum interval has not been reached. The default is 64 MB.
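Either condition triggers a checkpoint, whichever is reached first. Here is a minimal sketch of that decision using the defaults listed above, with 64 MB interpreted as 64 × 1024 × 1024 bytes. The function and constant names are hypothetical.

```python
CHECKPOINT_PERIOD_S = 3600                 # fs.checkpoint.period default: 1 hour
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024   # fs.checkpoint.size default: 64 MB


def checkpoint_due(seconds_since_last, edits_size_bytes,
                   period_s=CHECKPOINT_PERIOD_S, size_bytes=CHECKPOINT_SIZE_BYTES):
    """True if either limit has been reached; the size limit forces an early checkpoint."""
    return seconds_since_last >= period_s or edits_size_bytes >= size_bytes


def seconds_until_checkpoint(edits_bytes_per_s,
                             period_s=CHECKPOINT_PERIOD_S, size_bytes=CHECKPOINT_SIZE_BYTES):
    """For a constant edit-log growth rate, time until whichever limit is hit first."""
    if edits_bytes_per_s <= 0:
        return float(period_s)
    return min(float(period_s), size_bytes / edits_bytes_per_s)
```

For example, if the edits log grows at 20 KB/s, the 64 MB limit is reached after about 3277 seconds, just before the hourly deadline. At 1 KB/s, the hourly period always fires first.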

The Secondary NameNode stores the latest checkpoint in a directory with the same structure as the NameNode's directory. The NameNode can therefore read the checkpoint image from the Secondary NameNode when needed.
If the NameNode has lost every historical image and edits file except this latest checkpoint, it can import the checkpoint. To do so:
1. Create an empty directory at the location specified in the configuration parameters;
2. Set the configuration parameter fs.checkpoint.dir to the location of the checkpoint directory;
3. Start the NameNode with the -importCheckpoint option.

The NameNode reads the checkpoint from the fs.checkpoint.dir directory and saves it into the empty directory from step 1. If that directory already contains a valid image file, the NameNode fails to start. The NameNode checks the consistency of the image in fs.checkpoint.dir but does not modify it.

3. HDFS file upload and download

3.1. HDFS access mode

3.1.1. Web interface

The NameNode and each DataNode run a built-in web server that shows the current status and basic information of the cluster. With the default configuration, the NameNode home page is at http://namenode-name:50070/ and lists all DataNodes in the cluster along with the basic status of the cluster. The web interface can also be used to browse the entire file system, via the "Browse the file system" link on the NameNode home page. HDFS can also be accessed through the Java API.

3.1.2. Shell commands

Hadoop includes a set of shell-like commands that interact directly with HDFS and the other file systems Hadoop supports. The command bin/hadoop fs -help lists all commands supported by the Hadoop shell. Running bin/hadoop fs -help followed by a command name shows detailed help for that command. These commands support most common file system operations, such as copying files and changing file permissions. They also support some HDFS-specific operations, such as changing a file's replication factor.
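When these commands are driven from a program rather than typed by hand, it helps to build the argument list explicitly instead of assembling a shell string. The sketch below assumes the Hadoop installation directory is the working directory, so the CLI is bin/hadoop. The HDFS paths are made up. The options -help, -put, -chmod and -setrep are standard hadoop fs options; the helper names are hypothetical.

```python
import subprocess

HADOOP = "bin/hadoop"   # assumption: run from the Hadoop installation directory


def fs_cmd(*args):
    """Build the argv list for `bin/hadoop fs <args>` (no shell quoting needed)."""
    return [HADOOP, "fs", *(str(a) for a in args)]


def run(argv):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical local file and HDFS path, for illustration only.
LIST_HELP = fs_cmd("-help")                                   # all commands
SETREP_HELP = fs_cmd("-help", "setrep")                       # one command in detail
UPLOAD = fs_cmd("-put", "report.csv", "/user/demo/report.csv")
CHMOD = fs_cmd("-chmod", "644", "/user/demo/report.csv")      # change permissions
SETREP = fs_cmd("-setrep", "-w", 2, "/user/demo/report.csv")  # HDFS-specific: replication factor
```

On a machine with a configured Hadoop client, `run(UPLOAD)` would perform the upload. The -w flag makes -setrep wait until the new replication factor has been reached.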

3.2. HDFS file upload

With any of the access methods in 3.1 (a browser, a Java program, or a server executing shell commands), the program accessing HDFS acts as an HDFS client. A client uploads a file to HDFS as follows:
1. The client sends a file upload request to the NameNode.
2. The NameNode checks whether the client has write permission, whether the file's parent directory exists, whether a file with the same name already exists in that directory, and so on. If the checks pass, the process continues with step 3; otherwise an exception is thrown.
3. The client sends the file request, which includes the file size.
4. The NameNode calculates how many blocks the file will occupy from the file size, the block size (dfs.blocksize) and the configured replication factor (dfs.replication). It then returns to the client which DataNodes will store each block, and where.
Note: in most cases the replication factor (dfs.replication) is 3. With that factor, HDFS's default placement policy works as follows. The first replica goes on the writer's local node, or on a node in the writer's rack if the client is not a DataNode. The second replica goes on a node in a different rack, and the third on a different node in that same remote rack. This spreads replicas across two racks for fault tolerance while limiting cross-rack write traffic.
5. After receiving this information, the client splits the file into blocks of size dfs.blocksize; only the last block may be smaller.
6. The client starts uploading the first block by building a pipeline. Setting up the pipeline is a chain of requests: the client contacts DataNode01, DataNode01 contacts DataNode02, and DataNode02 contacts DataNode03. Once the pipeline is established, the client sends data in packets (dfs.write.packet.size, 64 KB by default). The client sends each packet to DataNode01, which passes it to DataNode02, which passes it to DataNode03. Every packet the client sends is placed in an ack queue to wait for its acknowledgment. When DataNode03 has stored a packet, it acknowledges to DataNode02, DataNode02 acknowledges to DataNode01, and DataNode01 acknowledges to the client. When the whole block has been transmitted, the queue is cleared and the pipeline is closed.
7. Repeat step 6 for each following block until the whole file has been transferred; a new pipeline is built for every block.
8. After each block has been transferred, the DataNodes report to the NameNode, which records the block information for each DataNode.
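Steps 4 to 7 can be simulated in a few lines: compute the block sizes, then push each block through its own pipeline packet by packet while tracking the ack queue. This is a single-threaded toy, not the HDFS client. Its assumptions:
- The 128 MB block size is the usual Hadoop 2+ default for dfs.blocksize.
- The DataNode names and per-block pipelines are made-up inputs standing in for what the NameNode would return.
- Every packet is acknowledged before the next one is sent, whereas a real client keeps many packets in flight.

```python
from collections import deque

BLOCK_SIZE = 128 * 1024 * 1024   # assumed dfs.blocksize (Hadoop 2+ default)
PACKET_SIZE = 64 * 1024          # dfs.write.packet.size default from the text


def plan_blocks(file_size, block_size=BLOCK_SIZE):
    """Steps 4-5: byte length of each block; only the last one may be short."""
    count = (file_size + block_size - 1) // block_size   # ceiling division
    return [min(block_size, file_size - i * block_size) for i in range(count)]


def write_block(block_len, pipeline, disks, packet_size=PACKET_SIZE):
    """Step 6: send one block through `pipeline` packet by packet.

    `disks` maps DataNode name -> bytes stored (a stand-in for each node's disk).
    Returns the ordered log of packet and ack hops.
    """
    hops = ["client", *pipeline]
    ack_queue = deque()
    log = []
    for seqno, offset in enumerate(range(0, block_len, packet_size)):
        size = min(packet_size, block_len - offset)
        ack_queue.append(seqno)                          # sent, waiting for an ack
        for src, dst in zip(hops, hops[1:]):             # client -> DN1 -> DN2 -> DN3
            disks[dst] = disks.get(dst, 0) + size        # each DataNode stores the packet
            log.append(f"pkt{seqno}: {src} -> {dst}")
        for src, dst in zip(reversed(hops[1:]), reversed(hops[:-1])):  # DN3 -> DN2 -> DN1 -> client
            log.append(f"ack{seqno}: {src} -> {dst}")
        ack_queue.popleft()                              # ack reached the client: dequeue
    # Every packet has been acknowledged, so the queue is empty and the pipeline can close.
    return log


def write_file(file_size, pipelines, block_size=BLOCK_SIZE, packet_size=PACKET_SIZE):
    """Step 7: one new pipeline per block. `pipelines[i]` is the (hypothetical)
    DataNode list the NameNode returned for block i."""
    sizes = plan_blocks(file_size, block_size)
    if len(pipelines) != len(sizes):
        raise ValueError("expected one pipeline per block")
    disks = {}
    for size, pipeline in zip(sizes, pipelines):
        write_block(size, pipeline, disks, packet_size)
    return sizes, disks
```

For instance, with the 128 MB default a 300 MB file is split into blocks of 128, 128 and 44 MB. With 64 KB packets, each full block takes 2048 packets.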

3.3. HDFS file download

1. The client sends a file download request to the NameNode.
2. The NameNode checks that the file exists, that the client has permission, and so on. If the checks pass, it returns the file's metadata to the client, including which DataNodes hold each block and where.
3. Using this metadata, the client reads the first block from the nearest DataNode that holds it (the proximity principle). It then reads the second block and appends it to the first, and so on for the remaining blocks.
4. When the download is complete, the client notifies the NameNode.
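The proximity rule in step 3 can be sketched with a simple distance function in the spirit of Hadoop's network topology. A replica on the same node is closest (distance 0), one in the same rack is next (2), and one in another rack is farthest (4). In this sketch, nodes are modeled as hypothetical (rack, host) pairs, and `read_block` is an assumed stand-in for the actual data transfer.

```python
def distance(a, b):
    """Network distance between two (rack, host) nodes: 0 same node, 2 same rack, 4 otherwise."""
    if a == b:
        return 0
    if a[0] == b[0]:
        return 2
    return 4


def read_file(client, block_locations, read_block):
    """block_locations[i] lists the replicas of block i as (rack, host) pairs,
    in file order, as returned by the NameNode. read_block(node, i) -> bytes
    is a hypothetical fetcher standing in for the DataNode transfer."""
    data = bytearray()
    for i, replicas in enumerate(block_locations):
        nearest = min(replicas, key=lambda node: distance(client, node))
        data += read_block(nearest, i)    # append block i after the blocks already read
    return bytes(data)
```

Ties go to the first replica listed. A real client also falls back to another replica if a read fails, which this sketch omits.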

Posted by SamLiu on Fri, 03 Dec 2021 06:17:26 -0800