HBase Source Code Analysis: compact


Part 1 of this series (compact 1) introduced how HBase schedules compactions. This article covers what actually happens during a compaction. We start from the chore mentioned there: in HRegionServer, the chore() method of the CompactionChecker decides whether a compaction is needed:

protected void chore() {
      // Loop over all online regions of this HRegionServer instance.
      // onlineRegions holds every region that this HRegionServer is currently serving.
      for (HRegion r : this.instance.onlineRegions.values()) {
        if (r == null)
          continue;
        // Iterate over every Store of the region
        for (Store s : r.getStores().values()) {
          try {
            // Compactions are usually triggered by MemStore flushes or similar events; this periodic check covers cases that need a different compaction strategy.
            // A store is checked every hbase.server.compactchecker.interval.multiplier (default 1000) * hbase.server.thread.wakefrequency milliseconds.
            long multiplier = s.getCompactionCheckMultiplier();
            assert multiplier > 0;
            // Only check this store when iteration is an integer multiple of the multiplier.
            if (iteration % multiplier != 0) continue;
            if (s.needsCompaction()) { // Request a system compaction if one is needed
              // Queue a compaction. Will recognize if major is needed.
              this.instance.compactSplitThread.requestSystemCompaction(r, s, getName()
                  + " requests compaction");
            } else if (s.isMajorCompaction()) { // If a major compaction is due, go through requestCompaction
              if (majorCompactPriority == DEFAULT_PRIORITY
                  || majorCompactPriority > r.getCompactPriority()) {
                this.instance.compactSplitThread.requestCompaction(r, s, getName()
                    + " requests major compaction; use default priority", null);
              } else {
                this.instance.compactSplitThread.requestCompaction(r, s, getName()
                    + " requests major compaction; use configured priority",
                  this.majorCompactPriority, null);
              }
            }
          } catch (IOException e) {
            LOG.warn("Failed major compaction check on " + r, e);
          }
        }
      }
      iteration = (iteration == Long.MAX_VALUE) ? 0 : (iteration + 1);
    }
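To make the check cadence concrete, here is a minimal standalone sketch of the two counter operations in chore(): the `iteration % multiplier` test and the wrap-around increment. The class and method names are hypothetical, and the 10-second wake frequency in `main` is only an assumed value for illustration.

```java
public class CompactionCheckCadence {
    // A store is examined only on iterations that are a multiple of its multiplier,
    // mirroring "if (iteration % multiplier != 0) continue;" in chore().
    static boolean shouldCheck(long iteration, long multiplier) {
        if (multiplier <= 0) {
            throw new IllegalArgumentException("multiplier must be > 0");
        }
        return iteration % multiplier == 0;
    }

    // The counter wraps to 0 instead of overflowing, as at the end of chore().
    static long nextIteration(long iteration) {
        return (iteration == Long.MAX_VALUE) ? 0 : iteration + 1;
    }

    // Time between two checks of the same store.
    static long effectiveIntervalMs(long wakeFrequencyMs, long multiplier) {
        return wakeFrequencyMs * multiplier;
    }

    public static void main(String[] args) {
        long iteration = 0;
        int checks = 0;
        for (int i = 0; i < 3000; i++) {
            if (shouldCheck(iteration, 1000)) {
                checks++;
            }
            iteration = nextIteration(iteration);
        }
        System.out.println(checks); // 3 (iterations 0, 1000 and 2000)
        // Assumed 10 s wake frequency, for illustration only.
        System.out.println(effectiveIntervalMs(10_000L, 1000L)); // 10000000
    }
}
```

With a multiplier of 1000, each store is looked at once every thousand chore runs, so the periodic check is deliberately infrequent compared with flush-triggered compactions.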

When s.needsCompaction() returns true, CompactSplitThread.requestSystemCompaction() is called to request a compaction. Otherwise, isMajorCompaction() checks whether a major compaction is due, and if so CompactSplitThread.requestCompaction() is called. Both requestSystemCompaction and requestCompaction end up calling requestCompactionInternal; only the arguments differ. Let's start with requestSystemCompaction, whose logic is as follows:

 public void requestSystemCompaction(
      final HRegion r, final Store s, final String why) throws IOException {
    requestCompactionInternal(r, s, why, Store.NO_PRIORITY, null, false);
  }

Next, let's follow requestCompactionInternal:

private synchronized CompactionRequest requestCompactionInternal(final HRegion r, final Store s,
      final String why, int priority, CompactionRequest request, boolean selectNow)
          throws IOException {
    // First perform some sanity checks: has the HRegionServer stopped, and does the region's table allow compactions?
    if (this.server.isStopped()
        || (r.getTableDesc() != null && !r.getTableDesc().isCompactionEnabled())) {
      return null;
    }

    CompactionContext compaction = null;
    // System compactions triggered automatically pass selectNow = false; manually requested ones (e.g. from the hbase shell) pass selectNow = true.
    if (selectNow) {
      // For a compaction requested from the hbase shell, selectNow is true, so the files to compact are selected right here.
      compaction = selectCompaction(r, s, priority, request);
      if (compaction == null) return null; // message logged inside
    }

    // We assume that most compactions are small. So, put system compactions into small
    // pool; we will do selection there, and move to large pool if necessary.
    // In other words, when selectNow is false (a compaction initiated by the system itself, e.g. after a
    // MemStore flush or by the compaction checker), the request goes to shortCompactions, the small pool.
    // A manually triggered request (e.g. from the hbase shell) goes to longCompactions, the large pool, if the
    // size of its selection exceeds the throttle threshold, and to the small pool otherwise.

    // size is the total size of the HFiles selected for compaction (0 if nothing has been selected yet)
    long size = selectNow ? compaction.getRequest().getSize() : 0;
    ThreadPoolExecutor pool = (!selectNow && s.throttleCompaction(size))
      ? longCompactions : shortCompactions;
    pool.execute(new CompactionRunner(s, r, compaction, pool));
    if (LOG.isDebugEnabled()) {
      String type = (pool == shortCompactions) ? "Small " : "Large ";
      LOG.debug(type + "Compaction requested: " + (selectNow ? compaction.toString() : "system")
          + (why != null && !why.isEmpty() ? "; Because: " + why : "") + "; " + this);
    }
    return selectNow ? compaction.getRequest() : null;
  }

The logic of requestCompactionInternal can be summarized as follows:

  1. Run sanity checks first, e.g. whether the current RegionServer has stopped; if it has, return immediately.
  2. Check the selectNow parameter, which records whether the compaction was requested manually (true) or triggered automatically by the system (false, as in this case).
    1. For a manual request, the selectCompaction method selects the files to compact right away.
  3. Use size and selectNow to decide which thread pool runs the compaction: longCompactions or shortCompactions.
  4. Construct a CompactionRunner and submit it to that thread pool.
    1. For a system compaction, the compaction field of the CompactionRunner is null: selectNow is false, so selectCompaction was never called to assign it.
    2. Therefore CompactionRunner.run() contains its own logic for selecting the files to compact.
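The pool routing described in the comments of requestCompactionInternal can be modeled as a small pure function. This is a sketch of the described intent, not HBase code: `PoolChoiceSketch` and `choosePool` are hypothetical names, and `throttlePoint` is an assumed parameter standing in for the threshold that `HStore.throttleCompaction` compares the selection size against.

```java
public class PoolChoiceSketch {
    enum Pool { SHORT, LONG }

    /**
     * System requests (selectNow == false) have no file selection yet, so they start in the
     * short pool; CompactionRunner may move them after it selects files. Requests that were
     * selected up front go to the long pool only when their size exceeds throttlePoint.
     */
    static Pool choosePool(boolean selectNow, long selectedSize, long throttlePoint) {
        if (!selectNow) {
            return Pool.SHORT;
        }
        return selectedSize > throttlePoint ? Pool.LONG : Pool.SHORT;
    }

    public static void main(String[] args) {
        long throttlePoint = 1_000L; // hypothetical threshold, in bytes
        System.out.println(choosePool(false, 0L, throttlePoint));     // SHORT
        System.out.println(choosePool(true, 5_000L, throttlePoint));  // LONG
        System.out.println(choosePool(true, 500L, throttlePoint));    // SHORT
    }
}
```

Starting system requests in the small pool keeps the common case cheap; the cost of a wrong guess is just one resubmission once the runner knows the real size.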

CompactionRunner is implemented as follows:

private class CompactionRunner implements Runnable, Comparable<CompactionRunner> {
    private final Store store;
    private final HRegion region;
    private CompactionContext compaction;
    private int queuedPriority;
    private ThreadPoolExecutor parent;

    public CompactionRunner(Store store, HRegion region,
        CompactionContext compaction, ThreadPoolExecutor parent) {
      super();
      this.store = store;
      this.region = region;
      this.compaction = compaction;
      // If the compaction context is null, the queue priority comes from HStore.getCompactPriority(); otherwise it comes from the compaction request,
      // which carries the priority that was passed to requestCompactionInternal().
      this.queuedPriority = (this.compaction == null)
          ? store.getCompactPriority() : compaction.getRequest().getPriority();
      this.parent = parent;
    }

    @Override
    public String toString() {
      return (this.compaction != null) ? ("Request = " + compaction.getRequest())
          : ("Store = " + store.toString() + ", pri = " + queuedPriority);
    }

    @Override
    public void run() {
      Preconditions.checkNotNull(server);
      // First perform some sanity checks: has the HRegionServer stopped, and does the region's table allow compactions?
      if (server.isStopped()
          || (region.getTableDesc() != null && !region.getTableDesc().isCompactionEnabled())) {
        return;
      }
      // Common case - system compaction without a file selection. Select now.
      if (this.compaction == null) {
        int oldPriority = this.queuedPriority;
        this.queuedPriority = this.store.getCompactPriority();
        // A larger value means a lower priority: check whether queuedPriority has grown since the request was queued
        if (this.queuedPriority > oldPriority) {
          // Store priority decreased while we were in queue (due to some other compaction?),
          // requeue with new priority to avoid blocking potential higher priorities.
          // Resubmit this CompactionRunner to the thread pool
          this.parent.execute(this);
          return;
        }
        try {
          // Select the candidate HFiles
          this.compaction = selectCompaction(this.region, this.store, queuedPriority, null);
        } catch (IOException ex) {
          LOG.error("Compaction selection failed " + this, ex);
          server.checkFileSystem();
          return;
        }
        if (this.compaction == null) return; // nothing to do
        // Now see if we are in correct pool for the size; if not, go to the correct one.
        // We might end up waiting for a while, so cancel the selection.
        assert this.compaction.hasSelection();
        ThreadPoolExecutor pool = store.throttleCompaction(
            compaction.getRequest().getSize()) ? longCompactions : shortCompactions;
        if (this.parent != pool) { // The selection belongs in the other pool
          this.store.cancelRequestedCompaction(this.compaction); // HStore cancels the compaction request
          this.compaction = null; // Reset compaction to null
          this.parent = pool; // Switch to the correct pool
          this.parent.execute(this); // Resubmit; the files are selected again when the runner runs
          return;
        }
      }
      // Finally we can compact something.
      assert this.compaction != null;
      // Pre-execution hook
      this.compaction.getRequest().beforeExecute();
      try {
        // Note: please don't put single-compaction logic here;
        //       put it into region/store/etc. This is CST logic.
        long start = EnvironmentEdgeManager.currentTime();

        // Call HRegion.compact() to compact the store
        boolean completed =
            region.compact(compaction, store, compactionThroughputController);
        long now = EnvironmentEdgeManager.currentTime();
        LOG.info(((completed) ? "Completed" : "Aborted") + " compaction: " +
              this + "; duration=" + StringUtils.formatTimeDiff(now, start));
        if (completed) {
          // degenerate case: blocked regions require recursive enqueues
          if (store.getCompactPriority() <= 0) {
            // A priority <= 0 means the store has too many files (at least the blocking file count), so request another system compaction.
            requestSystemCompaction(region, store, "Recursive enqueue");
          } else {
            // Request a split: this checks whether the compaction pushed the region over the maximum region size.
            // see if the compaction has caused us to exceed max region size
            requestSplit(region);
          }
        }
      } catch (IOException ex) {
        IOException remoteEx = RemoteExceptionHandler.checkIOException(ex);
        LOG.error("Compaction failed " + this, remoteEx);
        if (remoteEx != ex) {
          LOG.info("Compaction failed at original callstack: " + formatStackTrace(ex));
        }
        server.checkFileSystem();
      } catch (Exception ex) {
        LOG.error("Compaction failed " + this, ex);
        server.checkFileSystem();
      } finally {
        LOG.debug("CompactSplitThread Status: " + CompactSplitThread.this);
      }
      this.compaction.getRequest().afterExecute();
    }

As shown above, CompactionRunner.run() works as follows:

  1. If the compaction field is null:
    1. Re-check the store's priority; if the store has become less urgent (its priority value grew), resubmit the CompactionRunner to the thread pool.
    2. Call the selectCompaction method to select the candidate HFiles.
    3. Use store.throttleCompaction to decide which thread pool should run the compaction. If the pool changes, the selection is cancelled and the runner is resubmitted to the new pool, where the files are selected again.
  2. Call compaction.getRequest().beforeExecute() to do pre-compaction work. By default it is an empty method that does nothing; if a coprocessor is installed, the corresponding hook is executed.
  3. Record the start time.
  4. Call region.compact() to compact the store.
  5. If the compaction completed, use the store's priority afterwards to decide whether to request another compaction or a split.
  6. Call compaction.getRequest().afterExecute().
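The priority decisions in steps 1.1 and 5 can be sketched as two small functions. This sketch assumes, based on the HBase 1.x source as we understand it, that a store's priority is `blockingFileCount - storeFileCount` (bumped by one if it would equal the user priority) and that `Store.PRIORITY_USER` is 1; lower values mean more urgent. All names here are hypothetical.

```java
public class RunnerPrioritySketch {
    static final int PRIORITY_USER = 1; // assumed value of Store.PRIORITY_USER

    enum BeforeRun { REQUEUE, SELECT_FILES }
    enum AfterRun { RECOMPACT, CHECK_SPLIT }

    // Assumed store priority: how many more files the store can take before it blocks.
    static int storePriority(int blockingFileCount, int storeFileCount) {
        int p = blockingFileCount - storeFileCount;
        return (p == PRIORITY_USER) ? p + 1 : p; // never collide with the user priority
    }

    // Step 1.1: if the priority value grew while queued (store is less urgent), requeue.
    static BeforeRun beforeRun(int queuedPriority, int currentPriority) {
        return currentPriority > queuedPriority ? BeforeRun.REQUEUE : BeforeRun.SELECT_FILES;
    }

    // Step 5: priority <= 0 means the store is still at or above the blocking file count.
    static AfterRun afterRun(int priorityAfterCompaction) {
        return priorityAfterCompaction <= 0 ? AfterRun.RECOMPACT : AfterRun.CHECK_SPLIT;
    }

    public static void main(String[] args) {
        // Hypothetical store: blocking count 7, still 9 files after compacting.
        int p = storePriority(7, 9);
        System.out.println(p);           // -2
        System.out.println(afterRun(p)); // RECOMPACT
    }
}
```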

Next, let's look at each stage, starting with selectCompaction. This method selects the files to compact and returns them wrapped in a CompactionContext object. The logic is as follows:

private CompactionContext selectCompaction(final HRegion r, final Store s,
      int priority, CompactionRequest request) throws IOException {

    // Call HStore.requestCompaction() to obtain the CompactionContext
    CompactionContext compaction = s.requestCompaction(priority, request);
    if (compaction == null) {
      if(LOG.isDebugEnabled()) {
        LOG.debug("Not compacting " + r.getRegionNameAsString() +
            " because compaction request was cancelled");
      }
      return null;
    }
    // Make sure the CompactionContext contains a file selection
    assert compaction.hasSelection();
    if (priority != Store.NO_PRIORITY) {
      compaction.getRequest().setPriority(priority);
    }
    return compaction;
  }

As you can see, selectCompaction ultimately calls the store's requestCompaction method to get the CompactionContext. Let's follow it and see what happens:

public CompactionContext requestCompaction(int priority, CompactionRequest baseRequest)
      throws IOException {
    // don't even select for compaction if writes are disabled
    // If the corresponding HRegion is not writable, return null directly
    if (!this.areWritesEnabled()) {
      return null;
    }

    // Before we do compaction, try to get rid of unneeded files to simplify things.
    removeUnneededFiles();
    // Create the CompactionContext through the store engine (storeEngine)
    CompactionContext compaction = storeEngine.createCompaction();
    CompactionRequest request = null;
    // Acquire the read lock
    this.lock.readLock().lock();
    try {
      synchronized (filesCompacting) {
        // First, see if coprocessor would want to override selection.
        if (this.getCoprocessorHost() != null) {
          // preSelect() returns the candidate StoreFile list offered to the coprocessor
          List<StoreFile> candidatesForCoproc = compaction.preSelect(this.filesCompacting);
          boolean override = this.getCoprocessorHost().preCompactSelection(
              this, candidatesForCoproc, baseRequest);
          if (override) {
            // Coprocessor is overriding normal file selection.
            compaction.forceSelect(new CompactionRequest(candidatesForCoproc));
          }
        }

        // Normal case - coprocessor is not overriding file selection.

        if (!compaction.hasSelection()) { // No coprocessor has overridden the selection
          // Is this a user compaction?
          boolean isUserCompaction = priority == Store.PRIORITY_USER;
          boolean mayUseOffPeak = offPeakHours.isOffPeakHour() &&
              offPeakCompactionTracker.compareAndSet(false, true);
          try {
            // Call CompactionContext.select()
            compaction.select(this.filesCompacting, isUserCompaction,
              mayUseOffPeak, forceMajor && filesCompacting.isEmpty());
          } catch (IOException e) {
            if (mayUseOffPeak) {
              offPeakCompactionTracker.set(false);
            }
            throw e;
          }
          assert compaction.hasSelection();
          if (mayUseOffPeak && !compaction.getRequest().isOffPeak()) {
            // Compaction policy doesn't want to take advantage of off-peak.
            offPeakCompactionTracker.set(false);
          }
        }
        if (this.getCoprocessorHost() != null) {
          this.getCoprocessorHost().postCompactSelection(
              this, ImmutableList.copyOf(compaction.getRequest().getFiles()), baseRequest);
        }

        // Selected files; see if we have a compaction with some custom base request.
        // If a base request was passed in, combine it with the selection
        if (baseRequest != null) {
          // Update the request with what the system thinks the request should be;
          // its up to the request if it wants to listen.
          compaction.forceSelect(
              baseRequest.combineWith(compaction.getRequest()));
        }
        // Finally, we have the resulting files list. Check if we have any files at all.
        // Get the compaction request
        request = compaction.getRequest();
        // Get the selected files from the request
        final Collection<StoreFile> selectedFiles = request.getFiles();
        if (selectedFiles.isEmpty()) {
          return null;
        }
        // Add the selected files to filesCompacting (this answers a question raised in the previous article)
        addToCompactingFiles(selectedFiles);
        // If we're enqueuing a major, clear the force flag.
        this.forceMajor = this.forceMajor && !request.isMajor();

        // Set common request properties.
        // Set priority, either override value supplied by caller or from store.
        request.setPriority((priority != Store.NO_PRIORITY) ? priority : getCompactPriority());
        request.setDescription(getRegionInfo().getRegionNameAsString(), getColumnFamilyName());
      }
    } finally {
      this.lock.readLock().unlock();
    }

    LOG.debug(getRegionInfo().getEncodedName() + " - "  + getColumnFamilyName()
        + ": Initiating " + (request.isMajor() ? "major" : "minor") + " compaction"
        + (request.isAllFiles() ? " (all files)" : ""));
    // Report the start of the compaction request via HRegion.reportCompactionRequestStart()
    this.region.reportCompactionRequestStart(request.isMajor());
    // Return the compaction context
    return compaction;
  }

To summarize the logic above:

  1. Try to remove unneeded files first to simplify things: removeUnneededFiles().
  2. Create a CompactionContext with storeEngine.createCompaction().
  3. Call CompactionContext.select() to select the files.
  4. Add the selected files to filesCompacting and return the CompactionContext.
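One detail in requestCompaction worth isolating is the off-peak guard: `offPeakHours.isOffPeakHour() && offPeakCompactionTracker.compareAndSet(false, true)` lets at most one selection at a time use the more aggressive off-peak ratio, and the flag is reset when selection fails or the policy declines off-peak. The sketch below reproduces that pattern with `java.util.concurrent.atomic.AtomicBoolean`; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class OffPeakTrackerSketch {
    // true while some compaction holds the single off-peak "slot"
    private final AtomicBoolean offPeakCompactionTracker = new AtomicBoolean(false);

    // Short-circuit: outside off-peak hours the flag is never touched.
    boolean tryAcquire(boolean isOffPeakHour) {
        return isOffPeakHour && offPeakCompactionTracker.compareAndSet(false, true);
    }

    // Called when selection fails or the selected request does not use off-peak.
    void release() {
        offPeakCompactionTracker.set(false);
    }

    public static void main(String[] args) {
        OffPeakTrackerSketch tracker = new OffPeakTrackerSketch();
        System.out.println(tracker.tryAcquire(true)); // true: first caller gets the slot
        System.out.println(tracker.tryAcquire(true)); // false: slot already taken
        tracker.release();
        System.out.println(tracker.tryAcquire(true)); // true again after release
    }
}
```

Using compareAndSet instead of a plain check-then-set makes the acquisition atomic, so two stores selecting concurrently cannot both grab the off-peak slot.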

Let's start with removeUnneededFiles. Based on each file's maximum timestamp, it finds expired store files that are no longer needed, adds them to filesCompacting, and then removes them from the store:

private void removeUnneededFiles() throws IOException {
    if (!conf.getBoolean("hbase.store.delete.expired.storefile", true)) return;
    if (getFamily().getMinVersions() > 0) {
      LOG.debug("Skipping expired store file removal due to min version being " +
          getFamily().getMinVersions());
      return;
    }
    this.lock.readLock().lock();
    Collection<StoreFile> delSfs = null;
    try {
      synchronized (filesCompacting) {
        // Get the configured TTL; if none is set, it defaults to Long.MAX_VALUE
        long cfTtl = getStoreFileTtl();
        if (cfTtl != Long.MAX_VALUE) { // Only if the TTL is not "forever"
          // Ultimately calls getUnneededFiles
          delSfs = storeEngine.getStoreFileManager().getUnneededFiles(
              EnvironmentEdgeManager.currentTime() - cfTtl, filesCompacting);
          // Add the unneeded files to filesCompacting
          addToCompactingFiles(delSfs);
        }
      }
    } finally {
      this.lock.readLock().unlock();
    }
    if (delSfs == null || delSfs.isEmpty()) return;

    Collection<StoreFile> newFiles = new ArrayList<StoreFile>(); // No new files.
    writeCompactionWalRecord(delSfs, newFiles);
    replaceStoreFiles(delSfs, newFiles);
    completeCompaction(delSfs);
    LOG.info("Completed removal of " + delSfs.size() + " unnecessary (expired) file(s) in "
        + this + " of " + this.getRegionInfo().getRegionNameAsString()
        + "; total size for store is "
        + TraditionalBinaryPrefix.long2String(storeSize, "", 1));
  }

The logic of getUnneededFiles is as follows:

public Collection<StoreFile> getUnneededFiles(long maxTs, List<StoreFile> filesCompacting) {
    Collection<StoreFile> expiredStoreFiles = null;
    ImmutableList<StoreFile> files = storefiles;
    // 1) We can never get rid of the last file which has the maximum seqid.
    // 2) Files that are not the latest can't become one due to (1), so the rest are fair game.
    for (int i = 0; i < files.size() - 1; ++i) {
      StoreFile sf = files.get(i);
      long fileTs = sf.getReader().getMaxTimestamp();
      // The file's max timestamp is older than the TTL cutoff and the file is not already compacting
      if (fileTs < maxTs && !filesCompacting.contains(sf)) {
        LOG.info("Found an expired store file: " + sf.getPath()
            + " whose maxTimeStamp is " + fileTs + ", which is below " + maxTs);
        if (expiredStoreFiles == null) {
          expiredStoreFiles = new ArrayList<StoreFile>();
        }
        expiredStoreFiles.add(sf);
      }
    }
    // Return the list of files to remove
    return expiredStoreFiles;
  }
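The TTL arithmetic in removeUnneededFiles and the loop in getUnneededFiles can be exercised without HBase. The sketch below is a simplified model: `FileInfo` is a hypothetical stand-in for StoreFile that only carries a max timestamp, and the list is assumed to be ordered by sequence id, oldest first, as `storefiles` is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ExpiredFileSketch {
    // Hypothetical stand-in for StoreFile: a name plus the file's max timestamp.
    static final class FileInfo {
        final String name;
        final long maxTimestamp;

        FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /**
     * Returns files whose max timestamp is older than nowMs - ttlMs, skipping the newest
     * (last) file and files already compacting, or null if nothing qualifies.
     */
    static List<FileInfo> unneededFiles(List<FileInfo> files, Set<FileInfo> compacting,
                                        long nowMs, long ttlMs) {
        if (ttlMs == Long.MAX_VALUE) {
            return null; // TTL is "forever": nothing can expire
        }
        long maxTs = nowMs - ttlMs; // the TTL cutoff passed to getUnneededFiles
        List<FileInfo> expired = null;
        // Stop before the last file: the file with the max seqid is never removed.
        for (int i = 0; i < files.size() - 1; ++i) {
            FileInfo f = files.get(i);
            if (f.maxTimestamp < maxTs && !compacting.contains(f)) {
                if (expired == null) {
                    expired = new ArrayList<>();
                }
                expired.add(f);
            }
        }
        return expired;
    }
}
```

For example, with `now = 1000` and `ttl = 850` the cutoff is 150: a file with max timestamp 100 is expired, but if the newest file also has an old timestamp (say 50) it is still kept, exactly as rule (1) in the source comment requires.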

 

Back in requestCompaction, the files are chosen by calling the select method of CompactionContext:

public boolean select(List<StoreFile> filesCompacting, boolean isUserCompaction,
        boolean mayUseOffPeak, boolean forceMajor) throws IOException {

      // Obtain the compaction request from the compaction policy's selectCompaction() method
      request = compactionPolicy.selectCompaction(storeFileManager.getStorefiles(),
          filesCompacting, isUserCompaction, mayUseOffPeak, forceMajor);

      // Return whether a request was produced
      return request != null;
    }

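So select delegates to the selectCompaction method of the configured compaction policy. Our production environment does not configure a policy, and the analysis below follows the ratio-based policy (RatioBasedCompactionPolicy). Note that since HBase 0.96 the default store engine actually uses ExploringCompactionPolicy, a subclass of RatioBasedCompactionPolicy that inherits the selectCompaction method below and overrides only applyCompactionPolicy: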

public CompactionRequest selectCompaction(Collection<StoreFile> candidateFiles,
      final List<StoreFile> filesCompacting, final boolean isUserCompaction,
      final boolean mayUseOffPeak, final boolean forceMajor) throws IOException {
    // Preliminary compaction subject to filters
    // Build the list of candidate StoreFiles from the candidateFiles argument
    ArrayList<StoreFile> candidateSelection = new ArrayList<StoreFile>(candidateFiles);
    // Stuck and not compacting enough (estimate). It is not guaranteed that we will be
    // able to compact more if stuck and compacting, because ratio policy excludes some
    // non-compacting files from consideration during compaction (see getCurrentEligibleFiles).
    // futureFiles is 0 if filesCompacting is empty, and 1 otherwise
    int futureFiles = filesCompacting.isEmpty() ? 0 : 1;
    // Use the blocking store files setting to decide whether the store may be stuck
    boolean mayBeStuck = (candidateFiles.size() - filesCompacting.size() + futureFiles)
        >= storeConfigInfo.getBlockingFileCount();
    // Exclude files already being compacted (those in filesCompacting) from candidateSelection
    candidateSelection = getCurrentEligibleFiles(candidateSelection, filesCompacting);
    LOG.debug("Selecting compaction from " + candidateFiles.size() + " store files, " +
        filesCompacting.size() + " compacting, " + candidateSelection.size() +
        " eligible, " + storeConfigInfo.getBlockingFileCount() + " blocking");

    // If we can't have all files, we cannot do major anyway
    // isAllFiles: does candidateSelection still contain every file of the store (candidateFiles)?
    boolean isAllFiles = candidateFiles.size() == candidateSelection.size();
    if (!(forceMajor && isAllFiles)) {
      // Not a forced major compaction over all files: skip files that are too large
      candidateSelection = skipLargeFiles(candidateSelection);
      // Recompute isAllFiles
      isAllFiles = candidateFiles.size() == candidateSelection.size();
    }

    // Try a major compaction if this is a user-requested major compaction,
    // or if we do not have too many files to compact and this was requested as a major compaction
    // isTryingMajor is true in either of two cases:
    // 1. A forced major compaction covering all files, requested by a user
    // 2. A forced major compaction covering all files, or a major compaction the policy decides is due,
    //    provided candidateSelection has fewer files than the configured maximum
    boolean isTryingMajor = (forceMajor && isAllFiles && isUserCompaction)
        || (((forceMajor && isAllFiles) || isMajorCompaction(candidateSelection))
            && (candidateSelection.size() < comConf.getMaxFilesToCompact()));
    // Or, if there are any references among the candidates.
    // References among the candidates mean the files come from a split
    boolean isAfterSplit = StoreUtils.hasReferences(candidateSelection);
    if (!isTryingMajor && !isAfterSplit) {
      // We're are not compacting all files, let's see what files are applicable
      // filterBulk() removes files that must not take part in minor compactions
      candidateSelection = filterBulk(candidateSelection);
      // applyCompactionPolicy() applies the ratio-based selection algorithm
      candidateSelection = applyCompactionPolicy(candidateSelection, mayUseOffPeak, mayBeStuck);
      // checkMinFilesCriteria() enforces the minimum number of files to compact
      candidateSelection = checkMinFilesCriteria(candidateSelection);
    }
    // Remove excess files from candidateSelection
    candidateSelection = removeExcessFiles(candidateSelection, isUserCompaction, isTryingMajor);
    // Now we have the final file list, so we can determine if we can do major/all files.
    isAllFiles = (candidateFiles.size() == candidateSelection.size());
    // Build the CompactionRequest from candidateSelection
    CompactionRequest result = new CompactionRequest(candidateSelection);
    result.setOffPeak(!candidateSelection.isEmpty() && !isAllFiles && mayUseOffPeak);
    result.setIsMajor(isTryingMajor && isAllFiles, isAllFiles);
    return result;
  }
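The two boolean decisions at the heart of selectCompaction, `mayBeStuck` and `isTryingMajor`, are easy to get wrong when reading them inline. The sketch below lifts them into standalone functions with the same expressions; `policySaysMajor` is a hypothetical stand-in for the result of `isMajorCompaction(candidateSelection)`.

```java
public class MajorDecisionSketch {
    // Same expression as mayBeStuck in selectCompaction.
    static boolean mayBeStuck(int storeFileCount, int compactingCount, int blockingFileCount) {
        int futureFiles = compactingCount == 0 ? 0 : 1; // a running compaction will add one file
        return storeFileCount - compactingCount + futureFiles >= blockingFileCount;
    }

    // Same expression as isTryingMajor in selectCompaction.
    static boolean isTryingMajor(boolean forceMajor, boolean isAllFiles, boolean isUserCompaction,
                                 boolean policySaysMajor, int candidateCount, int maxFilesToCompact) {
        return (forceMajor && isAllFiles && isUserCompaction)
            || (((forceMajor && isAllFiles) || policySaysMajor)
                && candidateCount < maxFilesToCompact);
    }

    public static void main(String[] args) {
        // A user-forced major over all 50 files ignores the max-files limit...
        System.out.println(isTryingMajor(true, true, true, false, 50, 10));  // true
        // ...but a system-forced one over the same 50 files does not.
        System.out.println(isTryingMajor(true, true, false, false, 50, 10)); // false
    }
}
```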

The core logic lies in filterBulk, applyCompactionPolicy, and checkMinFilesCriteria, each described below.

 private ArrayList<StoreFile> filterBulk(ArrayList<StoreFile> candidates) {
    candidates.removeAll(Collections2.filter(candidates,
        new Predicate<StoreFile>() {
          @Override
          public boolean apply(StoreFile input) {
            return input.excludeFromMinorCompaction();
          }
        }));
    return candidates;
  }

filterBulk drops every file whose excludeFromMinorCompaction() returns true; that flag is read from the HFile's file info, so such files never take part in a minor compaction.
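The Guava idiom `candidates.removeAll(Collections2.filter(...))` has the same effect here as the JDK's `Collection.removeIf`, available since Java 8. The sketch below uses `removeIf` on a hypothetical `Candidate` type that carries only the exclude flag.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterBulkSketch {
    // Hypothetical candidate: a name plus the exclude-from-minor-compaction flag.
    static final class Candidate {
        final String name;
        final boolean excludeFromMinorCompaction;

        Candidate(String name, boolean excludeFromMinorCompaction) {
            this.name = name;
            this.excludeFromMinorCompaction = excludeFromMinorCompaction;
        }
    }

    // Drop flagged files in place and return the same list, like filterBulk.
    static List<Candidate> filterBulk(List<Candidate> candidates) {
        candidates.removeIf(c -> c.excludeFromMinorCompaction);
        return candidates;
    }

    public static void main(String[] args) {
        List<Candidate> files = new ArrayList<>(List.of(
            new Candidate("file-1", false),
            new Candidate("flagged", true),
            new Candidate("file-2", false)));
        filterBulk(files);
        files.forEach(c -> System.out.println(c.name)); // file-1, then file-2
    }
}
```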

The most important step is applyCompactionPolicy, whose logic is as follows:

ArrayList<StoreFile> applyCompactionPolicy(ArrayList<StoreFile> candidates,
      boolean mayUseOffPeak, boolean mayBeStuck) throws IOException {
    if (candidates.isEmpty()) {
      return candidates;
    }

    // we're doing a minor compaction, let's see what files are applicable
    int start = 0;
    // Get the compaction ratio from hbase.hstore.compaction.ratio (default 1.2)
    double ratio = comConf.getCompactionRatio();
    if (mayUseOffPeak) {
      // Take the parameter hbase.hstore.compaction.ratio.offpeak, which defaults to 5.0

      ratio = comConf.getCompactionRatioOffPeak();
      LOG.info("Running an off-peak compaction, selection ratio = " + ratio);
    }

    // get store file sizes for incremental compacting selection.
    final int countOfFiles = candidates.size();
    long[] fileSizes = new long[countOfFiles];
    long[] sumSize = new long[countOfFiles];
    for (int i = countOfFiles - 1; i >= 0; --i) {
      StoreFile file = candidates.get(i);
      fileSizes[i] = file.getReader().length();
      // calculate the sum of fileSizes[i,i+maxFilesToCompact-1) for algo
      // tooFar = i + maxFilesToCompact - 1, so sumSize[i] is the total size of the files in the
      // half-open window [i, tooFar), i.e. at most maxFilesToCompact - 1 files starting at i.
      int tooFar = i + comConf.getMaxFilesToCompact() - 1;
      sumSize[i] = fileSizes[i]
        + ((i + 1 < countOfFiles) ? sumSize[i + 1] : 0)
        - ((tooFar < countOfFiles) ? fileSizes[tooFar] : 0);
    }

    // Walk forward from the oldest file. While at least minFilesToCompact files remain and file[start]
    // is larger than both minCompactSize and ratio * sumSize[start + 1] (the size of the window after it),
    // skip it. The selection therefore starts at the first file that is not too large relative to the files after it.
    while (countOfFiles - start >= comConf.getMinFilesToCompact() &&
      fileSizes[start] > Math.max(comConf.getMinCompactSize(),
          (long) (sumSize[start + 1] * ratio))) {
      ++start;
    }
    if (start < countOfFiles) {
      LOG.info("Default compaction algorithm has selected " + (countOfFiles - start)
        + " files from " + countOfFiles + " candidates");
    } else if (mayBeStuck) {
      // We may be stuck. Compact the latest files if we can.
      // Fall back to compacting only the newest minFilesToCompact files
      int filesToLeave = candidates.size() - comConf.getMinFilesToCompact();
      if (filesToLeave >= 0) {
        start = filesToLeave;
      }
    }
    candidates.subList(0, start).clear();
    return candidates;
  }

This is the selection algorithm of RatioBasedCompactionPolicy, which many other articles cover in detail, so it is not described further here.
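To see the window arithmetic and the skip loop on concrete numbers, here is a standalone sketch of the same computation on plain file sizes (oldest first). It reproduces the `sumSize` recurrence and the `while` loop of applyCompactionPolicy; the `mayBeStuck` fallback is omitted for brevity, and the class name is hypothetical. It assumes `minFiles >= 2`, since the loop reads `sumSize[start + 1]`.

```java
import java.util.Arrays;

public class RatioSelectionSketch {
    /** Returns the selected sizes; the result is always a suffix of sizes (oldest first). */
    static long[] select(long[] sizes, double ratio, int minFiles, int maxFiles,
                         long minCompactSize) {
        if (minFiles < 2 || maxFiles < minFiles) {
            throw new IllegalArgumentException("need 2 <= minFiles <= maxFiles");
        }
        int n = sizes.length;
        if (n == 0) {
            return sizes;
        }
        // sumSize[i] = total size of files in the half-open window [i, i + maxFiles - 1)
        long[] sumSize = new long[n];
        for (int i = n - 1; i >= 0; --i) {
            int tooFar = i + maxFiles - 1;
            sumSize[i] = sizes[i]
                + ((i + 1 < n) ? sumSize[i + 1] : 0)
                - ((tooFar < n) ? sizes[tooFar] : 0);
        }
        // Skip old files that are too big compared with the window of newer files after them.
        int start = 0;
        while (n - start >= minFiles
                && sizes[start] > Math.max(minCompactSize, (long) (sumSize[start + 1] * ratio))) {
            ++start;
        }
        return Arrays.copyOfRange(sizes, start, n);
    }

    public static void main(String[] args) {
        long[] picked = select(new long[] {500, 50, 23, 12, 12}, 1.2, 3, 10, 10);
        System.out.println(Arrays.toString(picked)); // [50, 23, 12, 12]
    }
}
```

In the example, 500 is skipped because it exceeds 1.2 * (50 + 23 + 12 + 12) = 116, while 50 is kept because it does not exceed 1.2 * (23 + 12 + 12) = 56, so the four newer files are selected. If the loop leaves fewer than minFiles files (e.g. sizes {500, 400, 10} leave [400, 10]), checkMinFilesCriteria then clears the selection.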

Next, checkMinFilesCriteria checks whether the files chosen by applyCompactionPolicy meet the minimum number of files required for a compaction. If they do not, the candidate list is cleared:

  private ArrayList<StoreFile> checkMinFilesCriteria(ArrayList<StoreFile> candidates) {
    int minFiles = comConf.getMinFilesToCompact();
    if (candidates.size() < minFiles) {
      if(LOG.isDebugEnabled()) {
        LOG.debug("Not compacting files because we only have " + candidates.size() +
          " files ready for compaction. Need " + minFiles + " to initiate.");
      }
      candidates.clear();
    }
    return candidates;
  }

After the candidate files are selected, the removeExcessFiles method checks whether their number exceeds the configured maximum number of files per compaction (hbase.hstore.compaction.max). If it does, the excess files are dropped from the selection so that it stays within the configured limit.
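The article does not list removeExcessFiles. Here is a minimal sketch of the trimming step, based on my reading of the HBase 1.x source. It assumes candidates are ordered oldest first, so trimming the tail drops the newest files. It also assumes a user-requested major compaction is exempt from the limit; treat that exemption as something to verify against your HBase version. The class ExcessFilesSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the removeExcessFiles step; candidates are file sizes, oldest first.
class ExcessFilesSketch {

  static List<Long> removeExcessFiles(List<Long> candidates, int maxFiles,
      boolean isUserCompaction, boolean isMajorCompaction) {
    List<Long> result = new ArrayList<>(candidates);
    int excess = result.size() - maxFiles;
    // Assumed exemption: a user-requested major compaction keeps every file
    if (excess > 0 && !(isMajorCompaction && isUserCompaction)) {
      // Keep the first maxFiles entries and drop the rest (the newest files)
      result.subList(maxFiles, result.size()).clear();
    }
    return result;
  }
}
```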

Finally, a CompactionRequest is built from the candidate files.

That covers the compaction-selection part of the CompactionRunner run method. The compaction itself is carried out by the region.compact method:

public boolean compact(CompactionContext compaction, Store store,
      CompactionThroughputController throughputController) throws IOException {
    assert compaction != null && compaction.hasSelection();
    assert !compaction.getRequest().getFiles().isEmpty();
    // If the region is closing or already closed, cancel the compaction
    if (this.closing.get() || this.closed.get()) {
      LOG.debug("Skipping compaction on " + this + " because closing/closed");
      store.cancelRequestedCompaction(compaction);
      return false;
    }
    MonitoredTask status = null;
    boolean requestNeedsCancellation = true;
    // block waiting for the lock for compaction
    lock.readLock().lock();
    try {
      byte[] cf = Bytes.toBytes(store.getColumnFamilyName());
      // Check that this store is still the one registered for the column family
      if (stores.get(cf) != store) {
        LOG.warn("Store " + store.getColumnFamilyName() + " on region " + this
            + " has been re-instantiated, cancel this compaction request. "
            + " It may be caused by the roll back of split transaction");
        return false;
      }

      status = TaskMonitor.get().createStatus("Compacting " + store + " in " + this);
      if (this.closed.get()) {
        String msg = "Skipping compaction on " + this + " because closed";
        LOG.debug(msg);
        status.abort(msg);
        return false;
      }
      boolean wasStateSet = false;
      try {
        synchronized (writestate) {
          if (writestate.writesEnabled) { // By default readOnly is false and writesEnabled is true
            // Record that a compaction is running by incrementing writestate.compacting
            wasStateSet = true;
            ++writestate.compacting;
          } else {
            String msg = "NOT compacting region " + this + ". Writes disabled.";
            LOG.info(msg);
            status.abort(msg);
            return false;
          }
        }
        LOG.info("Starting compaction on " + store + " in region " + this
            + (compaction.getRequest().isOffPeak()?" as an off-peak compaction":""));
        doRegionCompactionPrep();
        try {
          status.setStatus("Compacting store " + store);
          // We no longer need to cancel the request on the way out of this
          // method because Store#compact will clean up unconditionally
          requestNeedsCancellation = false;
          // Delegate the actual compaction work to Store.compact
          store.compact(compaction, throughputController);
        } catch (InterruptedIOException iioe) {
          String msg = "compaction interrupted";
          LOG.info(msg, iioe);
          status.abort(msg);
          return false;
        }
      } finally {
        if (wasStateSet) {
          synchronized (writestate) {
            --writestate.compacting;
            if (writestate.compacting <= 0) {
              writestate.notifyAll();
            }
          }
        }
      }
      status.markComplete("Compaction complete");
      return true;
    } finally {
      try {
        if (requestNeedsCancellation) store.cancelRequestedCompaction(compaction);
        if (status != null) status.cleanup();
      } finally {
        lock.readLock().unlock();
      }
    }
  }
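In region.compact, the writestate.compacting counter is incremented under the writestate lock. It is decremented in a finally block, and notifyAll is called when it drops to zero, so code waiting for compactions to finish can wake up. The following is a minimal sketch of that counter-plus-notifyAll pattern. The class CompactionGateSketch, its WriteState, and the disableWritesAndWait method are hypothetical illustrations of the waiting side, not copies of HRegion code.

```java
// Hypothetical model of the writestate bookkeeping used around store.compact.
class CompactionGateSketch {

  // Stand-in for HRegion's writestate object
  static final class WriteState {
    boolean writesEnabled = true;
    int compacting = 0;
  }

  private final WriteState writestate = new WriteState();

  /** Runs the task as a compaction; returns false without running it if writes are disabled. */
  boolean compact(Runnable storeCompaction) {
    boolean wasStateSet = false;
    try {
      synchronized (writestate) {
        if (!writestate.writesEnabled) {
          return false; // corresponds to "NOT compacting region ... Writes disabled."
        }
        wasStateSet = true;
        ++writestate.compacting;
      }
      storeCompaction.run(); // stands in for store.compact(...); runs outside the lock
      return true;
    } finally {
      if (wasStateSet) {
        synchronized (writestate) {
          --writestate.compacting;
          if (writestate.compacting <= 0) {
            writestate.notifyAll(); // wake threads waiting for compactions to drain
          }
        }
      }
    }
  }

  /** Rejects new compactions, then blocks until running ones have finished. */
  void disableWritesAndWait() {
    synchronized (writestate) {
      writestate.writesEnabled = false;
      boolean interrupted = false;
      while (writestate.compacting > 0) {
        try {
          writestate.wait();
        } catch (InterruptedException e) {
          interrupted = true; // keep waiting, restore the flag afterwards
        }
      }
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }

  int compactingCount() {
    synchronized (writestate) {
      return writestate.compacting;
    }
  }
}
```

Incrementing and decrementing the counter in short synchronized blocks keeps the lock free while the long-running compaction executes. Waiters always re-check the counter in a loop, as wait/notifyAll requires.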

Next is the store.compact method, where the time-consuming work happens. It calls the compact method of the CompactionContext, which uses a Compactor to carry out the compaction. The details will be covered in a follow-up article.

Posted by gizzmo on Tue, 06 Aug 2019 02:52:14 -0700