2021-11-06 Hadoop safe mode

1 what is safe mode

Safe mode is a special state of HDFS in which the file system accepts only read requests and rejects change requests such as deletion and modification.

When the NameNode starts, HDFS first enters safe mode. As each DataNode starts, it reports its available blocks and other status to the NameNode. Once the whole system meets the safety criteria, HDFS leaves safe mode automatically. While HDFS is in safe mode, no block replication is performed. Whether a block meets the minimum number of replicas is therefore judged purely from what the DataNodes report at startup; no replication is carried out during startup in order to reach that minimum.
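As a rough illustration of the read-only behaviour described above, the following minimal Java sketch models a namespace that serves reads but rejects mutations while a safe mode flag is set. The class and method names (ToyNamespace, checkNotInSafeMode) are hypothetical and are not Hadoop APIs; real HDFS performs this check inside the NameNode.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: ToyNamespace is a hypothetical class, not part of Hadoop.
class ToyNamespace {
    private final Set<String> paths = new HashSet<>();
    private boolean safeMode = true; // start in safe mode, as HDFS does

    void leaveSafeMode() {
        safeMode = false;
    }

    // Reads are served even while safe mode is on.
    boolean exists(String path) {
        return paths.contains(path);
    }

    // Mutations are rejected while safe mode is on.
    void create(String path) {
        checkNotInSafeMode("create " + path);
        paths.add(path);
    }

    void delete(String path) {
        checkNotInSafeMode("delete " + path);
        paths.remove(path);
    }

    private void checkNotInSafeMode(String operation) {
        if (safeMode) {
            throw new IllegalStateException(
                "Cannot " + operation + ": namespace is in safe mode");
        }
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        System.out.println(ns.exists("/a"));   // a read works in safe mode
        try {
            ns.create("/a");                   // a write is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        ns.leaveSafeMode();
        ns.create("/a");                       // the same write now succeeds
        System.out.println(ns.exists("/a"));
    }
}
```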

2. Relevant configuration of safe mode

When the cluster meets certain conditions, it can leave safe mode. Let's first look at the common safe mode configurations:

dfs.namenode.replication.min: the minimum number of replicas a block must have to count as safe. The default value is 1, that is, a block with at least one reported replica meets the requirement;
dfs.namenode.safemode.threshold-pct: the fraction of blocks in the system that must have the minimum number of replicas. Once the actual fraction reaches this value, the cluster can leave safe mode (provided the other conditions are also met). The default value is 0.999f, that is, 99.9% of blocks must meet the minimum number of replicas. A value less than or equal to 0 means the NameNode does not wait for any particular fraction of blocks; a value greater than 1 makes safe mode permanent;
dfs.namenode.safemode.min.datanodes: the minimum number of live DataNodes required before leaving safe mode. The default is 0, that is, the number of live DataNodes does not block leaving safe mode;
dfs.namenode.safemode.extension: once the fraction of safe blocks and the number of live DataNodes meet the requirements, the cluster waits for this extension period and leaves safe mode only if the requirements are still met afterwards. The unit is milliseconds. The default value is 30000, i.e. 30 s.
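To make the threshold-pct setting concrete, here is a small Java sketch that turns a total block count and a percentage into the number of safe blocks required. It assumes the threshold is computed as total × percentage in float arithmetic and truncated to a whole number; SafeModeThresholds is a hypothetical helper, not a Hadoop class.

```java
// Sketch only: SafeModeThresholds is a hypothetical helper, not a Hadoop class.
class SafeModeThresholds {

    // Number of blocks that must have the minimum replica count.
    // Assumption: float multiplication, truncated towards zero.
    static long blockThreshold(long totalBlocks, float thresholdPct) {
        return (long) (totalBlocks * thresholdPct);
    }

    // True once enough safe blocks have been reported.
    static boolean blockThresholdMet(long safeBlocks, long totalBlocks,
                                     float thresholdPct) {
        return safeBlocks >= blockThreshold(totalBlocks, thresholdPct);
    }

    public static void main(String[] args) {
        long total = 1000;
        // Default percentage of 0.999f
        System.out.println(blockThreshold(total, 0.999f));
        // A percentage <= 0 never waits for blocks
        System.out.println(blockThresholdMet(0, total, 0f));
        // A percentage > 1 cannot be reached by a non-empty cluster
        System.out.println(blockThresholdMet(total, total, 1.5f));
    }
}
```

With the default of 0.999f and 1000 blocks, 999 blocks must be safe, so a single missing block still allows the cluster to leave safe mode while two missing blocks do not.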

To sum up, the following conditions need to be met to leave the safe mode:

1. The NameNode's own resources, such as the editLog directories, must be sufficient (NameNode resource safe mode).

The remaining conditions govern leaving the safe mode of BlockManager:

2. The blocks that meet the minimum replica requirement must reach a certain proportion of all blocks;
3. The number of live DataNodes must reach the configured minimum;
4. After conditions 2 and 3 are met, the cluster waits for a period of time (the extension) and checks that they are still met.

3 related commands

hdfs dfsadmin -safemode <command>

command	function
get	View current status
enter	Enter safe mode
leave	Force out of safe mode
wait	Wait until safe mode ends
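Scripts often need to act on the result of the get command. The Java sketch below parses a status line of the form "Safe mode is ON" or "Safe mode is OFF"; that output format is an assumption here and should be checked against the Hadoop version in use. SafeModeStatusParser is a hypothetical helper, not part of Hadoop.

```java
// Sketch only: SafeModeStatusParser is a hypothetical helper.
// Assumption: the status line starts with "Safe mode is ON" or "Safe mode is OFF".
class SafeModeStatusParser {

    // Returns true for ON, false for OFF; throws if the line is not recognised.
    static boolean isOn(String line) {
        String s = line.trim();
        if (s.startsWith("Safe mode is ON")) {
            return true;
        }
        if (s.startsWith("Safe mode is OFF")) {
            return false;
        }
        throw new IllegalArgumentException(
            "Unrecognised safe mode status: " + line);
    }

    public static void main(String[] args) {
        System.out.println(isOn("Safe mode is ON"));
        System.out.println(isOn("Safe mode is OFF"));
    }
}
```

Throwing on unrecognised input, rather than defaulting to OFF, keeps a script from writing to the cluster when the status could not be determined.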

4 source code

Continuing from the previous discussion: when the NameNode starts, initialize() starts the common services. While starting them, it initializes the safe mode parameters and, in some cases, enters the safe mode of BlockManager (BlockManagerSafeMode). The safe mode caused by insufficient NameNode resources, in contrast, is handled as part of enterState during NameNode initialization.

protected void initialize(Configuration conf) throws IOException {
    ...
    // Start the common services
    startCommonServices(conf);
    startMetricsLogger(conf);
}

protected NameNode(Configuration conf, NamenodeRole role) throws IOException {
    ...
    try {
          initializeGenericKeys(conf, nsId, namenodeId);
          // NameNode initialization
          initialize(getConf());
          state.prepareToEnterState(haContext);
          try {
            haContext.writeLock();
            // Start the services for the current HA state (active or standby). Whether to enter safe mode is decided during this step, depending on the actual situation
            state.enterState(haContext);
          } finally {
            haContext.writeUnlock();
          }
        } catch (IOException e) {
          this.stopAtException(e);
          throw e;
        } catch (HadoopIllegalArgumentException e) {
          this.stopAtException(e);
          throw e;
        }
    ...
 }

4.1 startCommonServices

/** 
 * NameNode#startCommonServices
 * Start the services common to active and standby states 
 */
private void startCommonServices(Configuration conf) throws IOException {
    // Start the common services of the namesystem (FSNamesystem#startCommonServices, listed after this method)
    namesystem.startCommonServices(conf, haContext);
    registerNNSMXBean();
    // If the current role of nn is not NAMENODE, that is, backup or checkpoint, start the httpServer of nn
    if (NamenodeRole.NAMENODE != role) {
        startHttpServer(conf);
        httpServer.setNameNodeAddress(getNameNodeAddress());
        httpServer.setFSImage(getFSImage());
        if (levelDBAliasMapServer != null) {
            httpServer.setAliasMap(levelDBAliasMapServer.getAliasMap());
        }
    }
    rpcServer.start();
    try {
        plugins = conf.getInstances(DFS_NAMENODE_PLUGINS_KEY,
                                    ServicePlugin.class);
    } catch (RuntimeException e) {
        String pluginsValue = conf.get(DFS_NAMENODE_PLUGINS_KEY);
        LOG.error("Unable to load NameNode plugins. Specified list of plugins: " +
                  pluginsValue, e);
        throw e;
    }
    // Start nn configured plug-ins
    for (ServicePlugin p: plugins) {
        try {
            p.start(this);
        } catch (Throwable t) {
            LOG.warn("ServicePlugin " + p + " could not be started", t);
        }
    }
    LOG.info(getRole() + " RPC up at: " + getNameNodeAddress());
    if (rpcServer.getServiceRpcAddress() != null) {
        LOG.info(getRole() + " service RPC up at: "
                 + rpcServer.getServiceRpcAddress());
    }
}

/** 
 * FSNamesystem#startCommonServices
 * Start services common to both active and standby states
 */
void startCommonServices(Configuration conf, HAContext haContext) throws IOException {
    this.registerMBean(); // register the MBean for the FSNamesystemState
    writeLock();
    this.haContext = haContext;
    try {
        /**
         * NameNodeResourceChecker reads three settings:
         *   - dfs.namenode.resource.du.reserved: space reserved on each NN storage volume, 100MB by default
         *   - dfs.namenode.resource.checked.volumes: local directories the NameNode resource checker checks in addition to the edits directories
         *   - dfs.namenode.resource.checked.volumes.minimum: minimum number of redundant volumes the NN requires
         * The local edits directories are also read from conf and added to the volumes to check; whether each one is required is determined by dfs.namenode.edits.dir.required
         */
        nnResourceChecker = new NameNodeResourceChecker(conf);
        /**
         * Check whether the volumes registered in NameNodeResourceChecker still have the reserved space (100MB) available, and record the result in hasResourcesAvailable.
         * The logic loops over every volume added in the previous step:
         *   1. For each volume, is it required?
         *      true: required volume count + 1; if it does not have the reserved space available, return false immediately
         *      false: redundant volume count + 1; if it does not have the reserved space available, unavailable redundant volume count + 1
         *   2. After the loop, is the redundant volume count 0?
         *      true: return whether the required volume count is greater than 0
         *      false: return whether (redundant volumes - unavailable redundant volumes) reaches the NN's minimum number of redundant volumes
         */
        checkAvailableResources();
        assert !blockManager.isPopulatingReplQueues();
        StartupProgress prog = NameNode.getStartupProgress();
        prog.beginPhase(Phase.SAFEMODE);
        // Count the blocks in the COMPLETE state; only complete blocks are counted for safe mode
        long completeBlocksTotal = getCompleteBlocksTotal();
        // Record that total for the startup step that waits for the DNs to report their blocks to the NN
        prog.setTotal(Phase.SAFEMODE, STEP_AWAITING_REPORTED_BLOCKS,
                      completeBlocksTotal);
        // The safe mode of blockManager is also entered here, based mainly on the block threshold (0.999f) and the number of live DNs
        blockManager.activate(conf, completeBlocksTotal);
    } finally {
        writeUnlock("startCommonServices");
    }

    registerMXBean();
    DefaultMetricsSystem.instance().register(this);
    if (inodeAttributeProvider != null) {
        inodeAttributeProvider.start();
        dir.setINodeAttributeProvider(inodeAttributeProvider);
    }
    snapshotManager.registerMXBean();
    InetSocketAddress serviceAddress = NameNode.getServiceAddress(conf, true);
    this.nameNodeHostName = (serviceAddress != null) ?
        serviceAddress.getHostName() : "";
}
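The volume check described in the comments above can be sketched in plain Java as follows. This is a simplified model rather than the Hadoop implementation: Volume and ResourcePolicySketch are hypothetical names, a volume counts as available when its free space is at least the reserved 100MB, and the final comparison uses "at least the minimum"; both comparisons are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical names, not the Hadoop classes.
class ResourcePolicySketch {

    // Hypothetical stand-in for one checked volume.
    static final class Volume {
        final boolean required;
        final long freeBytes;

        Volume(boolean required, long freeBytes) {
            this.required = required;
            this.freeBytes = freeBytes;
        }
    }

    // Reserved space per volume (100MB, the default named in the text).
    static final long RESERVED_BYTES = 100L * 1024 * 1024;

    static boolean resourcesAvailable(List<Volume> volumes, int minRedundant) {
        int required = 0;
        int redundant = 0;
        int unavailableRedundant = 0;
        for (Volume v : volumes) {
            boolean available = v.freeBytes >= RESERVED_BYTES;
            if (v.required) {
                required++;
                if (!available) {
                    return false; // a required volume is short on space
                }
            } else {
                redundant++;
                if (!available) {
                    unavailableRedundant++;
                }
            }
        }
        if (redundant == 0) {
            // No redundant volumes: at least one required volume must exist.
            return required > 0;
        }
        return redundant - unavailableRedundant >= minRedundant;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        List<Volume> volumes = Arrays.asList(
            new Volume(false, 200 * mb),
            new Volume(false, 50 * mb));
        System.out.println(resourcesAvailable(volumes, 1));
    }
}
```

A single required volume below the reserved space fails the check at once, whereas redundant volumes only fail it when too few of them remain usable.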

4.2 safe mode entry with insufficient resources

Here we only analyze the logic of ActiveState, which is an implementation of the abstract class HAState:

// ActiveState#enterState
@Override
public void enterState(HAContext context) throws ServiceFailedException {
    try {
        // Continue to the startActiveServices method of NameNode
        context.startActiveServices();
    } catch (IOException e) {
        throw new ServiceFailedException("Failed to start active services", e);
    }
}

// NameNode#startActiveServices
@Override
public void startActiveServices() throws IOException {
    try {
        // Follow up
        namesystem.startActiveServices();
        startTrashEmptier(getConf());
    } catch (Throwable t) {
        doImmediateShutdown(t);
    }
}
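The delegation above follows the State pattern: the state object decides which services to start, and the context actually starts them. A minimal Java sketch of that shape, with hypothetical names (HaStateSketch, Context, RecordingContext) that only mimic the roles of HAState and HAContext:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hypothetical types that mimic the roles of HAState and HAContext.
class HaStateSketch {

    // The context knows how to start each group of services.
    interface Context {
        void startActiveServices();
        void startStandbyServices();
    }

    // Each state decides which services to start when it is entered.
    abstract static class State {
        abstract void enterState(Context context);
    }

    static final class Active extends State {
        @Override
        void enterState(Context context) {
            context.startActiveServices();
        }
    }

    static final class Standby extends State {
        @Override
        void enterState(Context context) {
            context.startStandbyServices();
        }
    }

    // Test double that records which services were started.
    static final class RecordingContext implements Context {
        final List<String> started = new ArrayList<>();

        @Override
        public void startActiveServices() {
            started.add("active");
        }

        @Override
        public void startStandbyServices() {
            started.add("standby");
        }
    }

    public static void main(String[] args) {
        RecordingContext context = new RecordingContext();
        State state = new Active();
        state.enterState(context);
        System.out.println(context.started);
    }
}
```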

The following is FSNamesystem#startActiveServices

/**
 * Start services required in active state
 * @throws IOException
 */
void startActiveServices() throws IOException {
    startingActiveService = true;
    LOG.info("Starting services required for active state");
    writeLock();
    try {
        // Get the editLog. The FSImage it comes from is set when FSNamesystem is initialized, and FSNamesystem itself is created by loadNamesystem during NameNode initialization
        FSEditLog editLog = getFSImage().getEditLog();

        /**
         * isOpenForWrite() is true when the FSEditLog state is IN_SEGMENT or BETWEEN_LOG_SEGMENTS,
         * which is already the case when the NN opened the log for write during initialization.
         * Otherwise (for example after switching from standby) the journals are opened for write here.
         */
        if (!editLog.isOpenForWrite()) {
            // During startup, we're already open for write during initialization.
            editLog.initJournalsForWrite();
            // May need to recover
            editLog.recoverUnclosedStreams();

            LOG.info("Catching up to latest edits from old active before " +
                     "taking over writer role in edits logs");
            editLogTailer.catchupDuringFailover();

            blockManager.setPostponeBlocksFromFuture(false);
            blockManager.getDatanodeManager().markAllDatanodesStale();
            blockManager.clearQueues();
            blockManager.processAllPendingDNMessages();
            blockManager.getBlockIdManager().applyImpendingGenerationStamp();

            // Only need to re-process the queue, If not in SafeMode.
            if (!isInSafeMode()) {
                LOG.info("Reprocessing replication and invalidation queues");
                blockManager.initializeReplQueues();
            }

            if (LOG.isDebugEnabled()) {
                LOG.debug("NameNode metadata after re-processing " +
                          "replication and invalidation queues during failover:\n" +
                          metaSaveAsString());
            }

            long nextTxId = getFSImage().getLastAppliedTxId() + 1;
            LOG.info("Will take over writing edit logs at txnid " + 
                     nextTxId);
            editLog.setNextTxId(nextTxId);

            getFSImage().editLog.openForWrite(getEffectiveLayoutVersion());
        }

        // Initialize the quota.
        dir.updateCountForQuota();
        // Enable quota checks.
        dir.enableQuotaChecks();
        dir.ezManager.startReencryptThreads();

        if (haEnabled) {
            // Renew all of the leases before becoming active.
            // This is because, while we were in standby mode,
            // the leases weren't getting renewed on this NN.
            // Give them all a fresh start here.
            leaseManager.renewAllLeases();
        }
        leaseManager.startMonitor();
        startSecretManagerIfNecessary();

        // ResourceMonitor required only at ActiveNN. See HDFS-2914
        // Start the NameNodeResourceMonitor background thread here. If it finds the NN's resources insufficient at any point, the NN enters safe mode
        this.nnrmthread = new Daemon(new NameNodeResourceMonitor());
        nnrmthread.start();
        ...
    } finally {
        startingActiveService = false;
        blockManager.checkSafeMode();
        writeUnlock("startActiveServices");
    }
}

Take a look at the NameNodeResourceMonitor:

/**
   * Periodically calls hasAvailableResources of NameNodeResourceChecker, and if
   * there are found to be insufficient resources available, causes the NN to
   * enter safe mode. If resources are later found to have returned to
   * acceptable levels, this daemon will cause the NN to exit safe mode.
   */
class NameNodeResourceMonitor implements Runnable  {
    boolean shouldNNRmRun = true;
    @Override
    public void run () {
        try {
            while (fsRunning && shouldNNRmRun) {
                checkAvailableResources();
                // Reads the hasResourcesAvailable flag set by checkAvailableResources() (first called in startCommonServices). If resources are unavailable, enter safe mode
                if(!nameNodeHasResourcesAvailable()) {
                    String lowResourcesMsg = "NameNode low on available disk space. ";
                    if (!isInSafeMode()) {
                        LOG.warn(lowResourcesMsg + "Entering safe mode.");
                    } else {
                        LOG.warn(lowResourcesMsg + "Already in safe mode.");
                    }
                    // Enter FSNamesystem safe mode
                    enterSafeMode(true);
                }
                try {
                    Thread.sleep(resourceRecheckInterval);
                } catch (InterruptedException ie) {
                    // Deliberately ignore
                }
            }
        } catch (Exception e) {
            FSNamesystem.LOG.error("Exception in NameNodeResourceMonitor: ", e);
        }
    }

    public void stopMonitor() {
        shouldNNRmRun = false;
    }
}
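The monitor loop above can be reduced to the following self-contained Java sketch. ResourceMonitorSketch and its members are hypothetical; the resource check is injected as a BooleanSupplier so that a single iteration (checkOnce) can be exercised without a thread. Like the loop shown above, this sketch only ever enters safe mode and never leaves it on its own.

```java
import java.util.function.BooleanSupplier;

// Sketch only: a hypothetical, simplified resource monitor.
class ResourceMonitorSketch implements Runnable {
    private final BooleanSupplier hasResources;
    private final long recheckIntervalMs;
    private volatile boolean running = true;
    private volatile boolean inSafeMode = false;

    ResourceMonitorSketch(BooleanSupplier hasResources, long recheckIntervalMs) {
        this.hasResources = hasResources;
        this.recheckIntervalMs = recheckIntervalMs;
    }

    // One iteration of the loop: enter safe mode if resources are low.
    void checkOnce() {
        if (!hasResources.getAsBoolean()) {
            inSafeMode = true;
        }
    }

    @Override
    public void run() {
        while (running) {
            checkOnce();
            try {
                Thread.sleep(recheckIntervalMs);
            } catch (InterruptedException e) {
                // This sketch stops on interrupt instead of ignoring it.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stopMonitor() {
        running = false;
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    public static void main(String[] args) {
        ResourceMonitorSketch monitor = new ResourceMonitorSketch(() -> false, 10L);
        monitor.checkOnce();
        System.out.println(monitor.isInSafeMode());
    }
}
```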

4.3 safe mode of BlockManager

In FSNamesystem#startCommonServices, we will activate blockManager, and in the process of activating blockManager, we will start the safe mode.

Before that, let's briefly look at BlockManager. If HDFS is compared to a human body, the NameNode is its brain: it knows the state of every node and directs the limbs (DataNodes). The BlockManager is the heart of the body: it continuously receives blood (BlockInfo) from the limbs and pumps blood (Command) back to them so that they keep working normally.

BlockManager mainly keeps the information about the blocks stored in the Hadoop cluster. For block state management, it tries to maintain the safety property "number of live replicas == expected redundancy" under any event (such as decommissioning, NameNode failover, or DataNode failure).
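The safety property quoted above implies a corrective action whenever the two numbers differ. The following Java sketch shows that decision in its simplest form; ReplicaStateSketch and the Action names are hypothetical, and the real BlockManager weighs many more factors (for example decommissioning and corrupt replicas).

```java
// Sketch only: hypothetical names, a deliberately simplified decision.
class ReplicaStateSketch {

    enum Action { NONE, RECONSTRUCT, REMOVE_EXCESS }

    // Compare live replicas with the expected redundancy of a block.
    static Action reconcile(int liveReplicas, int expectedRedundancy) {
        if (liveReplicas < expectedRedundancy) {
            return Action.RECONSTRUCT;   // too few replicas: create more
        }
        if (liveReplicas > expectedRedundancy) {
            return Action.REMOVE_EXCESS; // too many replicas: drop extras
        }
        return Action.NONE;              // property already holds
    }

    public static void main(String[] args) {
        System.out.println(reconcile(2, 3));
        System.out.println(reconcile(3, 3));
        System.out.println(reconcile(4, 3));
    }
}
```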

The logic for starting the safe mode is here:

...
blockManager.activate(conf, completeBlocksTotal);
...

// BlockManager#activate: start the services inside blockManager
public void activate(Configuration conf, long blockTotal) {
    pendingReconstruction.start();
    datanodeManager.activate(conf);
    this.redundancyThread.setName("RedundancyMonitor");
    this.redundancyThread.start();
    storageInfoDefragmenterThread.setName("StorageInfoMonitor");
    storageInfoDefragmenterThread.start();
    this.blockReportThread.start();
    mxBeanName = MBeans.register("NameNode", "BlockStats", this);
    // Trigger the start of safe mode
    bmSafeMode.activate(blockTotal);
}

/**
 * BlockManagerSafeMode#activate
 * Initialize the safe mode information.
 * @param total initial total blocks
 */
void activate(long total) {
    assert namesystem.hasWriteLock();
    assert status == BMSafeModeStatus.OFF;

    startTime = monotonicNow();
    // Set the total number of blocks and calculate the threshold of blocks
    setBlockTotal(total);
    // If the requirements are met (see areThresholdsMet method below), exit the safe mode, otherwise enter the safe mode
    if (areThresholdsMet()) {
        boolean exitResult = leaveSafeMode(false);
        Preconditions.checkState(exitResult, "Failed to leave safe mode.");
    } else {
        // Enter safe mode in the PENDING_THRESHOLD state, i.e. pending on more safe blocks or live datanodes.
        // Once the thresholds are met, the status moves on to EXTENSION
        status = BMSafeModeStatus.PENDING_THRESHOLD;
        initializeReplQueuesIfNecessary();
        reportStatus("STATE* Safe mode ON.", true);
        lastStatusReport = monotonicNow();
    }
}

/**
 * @return true if both block and datanode threshold are met else false.
 */
private boolean areThresholdsMet() {
    assert namesystem.hasWriteLock();
    // Calculating the number of live datanodes is time-consuming
    // in large clusters. Skip it when datanodeThreshold is zero.
    // We need to evaluate getNumLiveDataNodes only when
    // (blockSafe >= blockThreshold) is true and hence moving evaluation
    // of datanodeNum conditional to isBlockThresholdMet as well
    synchronized (this) {
        // Whether the number of safe blocks has reached the block threshold (derived from threshold-pct, 0.999 by default)
        boolean isBlockThresholdMet = (blockSafe >= blockThreshold);
        boolean isDatanodeThresholdMet = true;
        // Only count the live datanodes if the required minimum is > 0
        if (isBlockThresholdMet && datanodeThreshold > 0) {
            int datanodeNum = blockManager.getDatanodeManager().
                getNumLiveDataNodes();
            isDatanodeThresholdMet = (datanodeNum >= datanodeThreshold);
        }
        return isBlockThresholdMet && isDatanodeThresholdMet;
    }
}

The safe mode of blockManager is mainly managed through BlockManagerSafeMode.

/**
 * Block manager safe mode info.
 *
 * During name node startup, counts the number of <em>safe blocks</em>, those
 * that have at least the minimal number of replicas, and calculates the ratio
 * of safe blocks to the total number of blocks in the system, which is the size
 * of blocks. When the ratio reaches the {@link #threshold} and enough live data
 * nodes have registered, it needs to wait for the safe mode {@link #extension}
 * interval. After the extension period has passed, it will not leave safe mode
 * until the safe blocks ratio reaches the {@link #threshold} and enough live
 * data node registered.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
class BlockManagerSafeMode {
  enum BMSafeModeStatus {
    PENDING_THRESHOLD, /** Pending on more safe blocks or live datanode. */
    EXTENSION,         /** In extension period. */
    OFF                /** Safe mode is off. */
  }
  ...
}
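The three statuses form a small state machine. The Java sketch below is a simplified model of the transitions described in the class comment, not the Hadoop implementation: SafeModeStateSketch and tick are hypothetical names, time is passed in explicitly, and the combined block and datanode check is reduced to a single boolean.

```java
// Sketch only: a simplified, hypothetical model of the status transitions.
class SafeModeStateSketch {

    enum Status { PENDING_THRESHOLD, EXTENSION, OFF }

    private final long extensionMs;
    private Status status = Status.PENDING_THRESHOLD;
    private long reachedAtMs; // when the thresholds were first met

    SafeModeStateSketch(long extensionMs) {
        this.extensionMs = extensionMs;
    }

    // Re-evaluate the status at time nowMs.
    Status tick(long nowMs, boolean thresholdsMet) {
        switch (status) {
            case PENDING_THRESHOLD:
                if (thresholdsMet) {
                    if (extensionMs > 0) {
                        status = Status.EXTENSION;
                        reachedAtMs = nowMs;
                    } else {
                        status = Status.OFF; // no extension configured
                    }
                }
                break;
            case EXTENSION:
                // Leave only after the extension has elapsed AND the
                // thresholds are (still) met.
                if (nowMs - reachedAtMs >= extensionMs && thresholdsMet) {
                    status = Status.OFF;
                }
                break;
            case OFF:
                break;
        }
        return status;
    }

    public static void main(String[] args) {
        SafeModeStateSketch sm = new SafeModeStateSketch(30000L);
        System.out.println(sm.tick(0L, false));
        System.out.println(sm.tick(1000L, true));
        System.out.println(sm.tick(32000L, true));
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real monitor would read a monotonic clock on a background thread.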

Posted by Thumper on Sat, 06 Nov 2021 11:06:32 -0700
