Mini lesson: several ways to calculate a file checksum in Java

Have you ever noticed a checksum, a long string of characters, next to a file on a download page?

For example, when downloading Apache POI, the library for working with Excel files, from the Apache website, you can see SHA-256 and SHA-512 checksums listed for each file, as shown in the following figure:

Taking the file poi-bin-4.1.0-20190412.tar.gz as an example, clicking the SHA-256 and SHA-512 links shows the following values:

##Value of SHA-256
d8db4f8228d87935ca46b0af72db68ad83f45b31d885e67b089d195b5ee800bb

##Value of SHA-512
87499ab94882605ee2f407fc66e24c613ae98896b8d5f527b6cd8c604574922fc72d148da42962b2ee30ad18cd712e3de42bfe14770261b07217717c52a738a9

This article briefly introduces checksums (what they are and what they are for) and shows how to calculate them in Java with different algorithms: MD5, SHA-1, SHA-256 and SHA-512.

Checksum: a check value computed from data.
In data processing and data communication, it was originally the sum of a set of data items, calculated for verification purposes.
The data items can be numbers, or strings that are treated as numbers when the sum is calculated.
Today the term also covers the output of hash functions such as MD5 and the SHA family, which is what this article calculates.
A checksum is usually written in hexadecimal.
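
To make the "sum of data items" idea concrete, here is a minimal sketch of a literal additive checksum: it adds up all bytes of the input modulo 256 and returns the result as two hexadecimal digits. The class and method names (SimpleSumChecksum, sum8) are made up for illustration, and this toy is not one of the algorithms used in the rest of the article.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a literal "sum" checksum, not MD5 or SHA.
class SimpleSumChecksum {

  // Adds all bytes modulo 256 and returns the result as two lowercase hex digits.
  static String sum8(byte[] data) {
    int sum = 0;
    for (byte b : data) {
      // Treat each byte as unsigned and keep only the low 8 bits of the running sum
      sum = (sum + (b & 0xff)) & 0xff;
    }
    return String.format("%02x", sum);
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    // 0x61 + 0x62 + 0x63 = 0x126, and the low byte is 0x26
    System.out.println(sum8(data)); // prints 26
  }
}
```

Swapping two bytes of the input leaves such a sum unchanged, which is one reason practical tools rely on hash functions instead of plain sums.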

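[Function] A checksum is used to verify the integrity of a file and to detect whether it has been corrupted or maliciously tampered with, for example during file transfer (plug-ins, firmware upgrade packages, and so on). Note that MD5 and SHA-1 are no longer considered collision-resistant, so SHA-256 or SHA-512 is the better choice when tampering is the concern.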
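
As a rough sketch of such an integrity check, the following compares a freshly computed SHA-256 digest with an expected hexadecimal value. The class name ChecksumVerifier and its methods are hypothetical, and the data is an in-memory byte array that stands in for a downloaded file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: checks data against a published SHA-256 value.
class ChecksumVerifier {

  // Returns the SHA-256 digest of data as lowercase hex.
  static String sha256Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is not expected
      throw new IllegalStateException(e);
    }
  }

  // True if the computed digest equals the expected one,
  // ignoring letter case and surrounding whitespace.
  static boolean matches(byte[] data, String expectedHex) {
    return sha256Hex(data).equalsIgnoreCase(expectedHex.trim());
  }

  public static void main(String[] args) {
    byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
    // Stands in for the value shown on a download page
    String published = sha256Hex(original);
    byte[] tampered = "hellp".getBytes(StandardCharsets.UTF_8);
    System.out.println(matches(original, published)); // true
    System.out.println(matches(tampered, published)); // false
  }
}
```

Changing a single character of the input produces a completely different digest, so the second comparison fails.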

Next, let's look at how to generate these checksum values in Java. This article uses the file poi-bin-4.1.0-20190412.tar.gz as an example, which can be downloaded from the following URL:

http://mirror.bit.edu.cn/apache/poi/release/bin/poi-bin-4.1.0-20190412.tar.gz

Because we want to calculate checksums with several algorithms (MD5, SHA-1, SHA-256 and SHA-512), we first define an enumeration class to distinguish them.

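package com.wangmengjun.tutorial.checksum;

/** Digest algorithms used by the examples in this article. */
public enum CheckSumAlgoType {

  // "SHA1" is an alias that the default JDK provider accepts; the standard name is "SHA-1"
  MD5("MD5"), SHA_256("SHA-256"), SHA_512("SHA-512"), SHA_1("SHA1");

  // Algorithm name as passed to MessageDigest.getInstance.
  // Enum constants are shared, so the field is final and has no setter.
  private final String name;

  private CheckSumAlgoType(String name) {
    this.name = name;
  }

  public String getName() {
    return name;
  }

}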
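
The name strings stored in the enumeration are what java.security.MessageDigest.getInstance expects. As a small sketch, assuming a standard JDK, the following checks that an algorithm name resolves and reports the digest length in bytes. The class AlgorithmNameCheck and the method digestLength are hypothetical names used only for this illustration, and the standard algorithm names are listed directly so that the example stays self-contained.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking algorithm names.
class AlgorithmNameCheck {

  // Returns the digest length in bytes for the given algorithm name,
  // or -1 if the running JVM does not know the name.
  static int digestLength(String algorithmName) {
    try {
      return MessageDigest.getInstance(algorithmName).getDigestLength();
    } catch (NoSuchAlgorithmException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    for (String name : new String[] {"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
      System.out.println(name + " -> " + digestLength(name) + " bytes");
    }
  }
}
```

Each byte becomes two hexadecimal characters, which is why a SHA-256 checksum is 64 characters long and a SHA-512 checksum is 128 characters long.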

Next, let's take a look at several methods to calculate the file checksum:

  1. Use java.security.MessageDigest
  2. Use org.apache.commons.codec.digest.DigestUtils
  3. Use com.google.common.io.Files.hash

1. Use java.security.MessageDigest

  // Requires: java.io.File, java.io.IOException, java.nio.file.Files,
  // java.security.MessageDigest, java.security.NoSuchAlgorithmException
  public static String genChecksum1(File file, String checkSumAlgo) throws NoSuchAlgorithmException, IOException {
    MessageDigest messageDigest = MessageDigest.getInstance(checkSumAlgo);
    // Note: readAllBytes loads the whole file into memory
    messageDigest.update(Files.readAllBytes(file.toPath()));
    byte[] digestBytes = messageDigest.digest();
    // Convert each byte to two lowercase hex digits
    StringBuilder sb = new StringBuilder();
    for (byte b : digestBytes) {
      sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
    }
    return sb.toString();
  }

The hex-conversion part of this method,

    StringBuilder sb = new StringBuilder();
    for (byte b : digestBytes) {
      sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
    }
    return sb.toString();

can be replaced with javax.xml.bind.DatatypeConverter. The simplified code is as follows:

  public static String genChecksum1(File file, String checkSumAlgo) throws NoSuchAlgorithmException, IOException {
    MessageDigest messageDigest = MessageDigest.getInstance(checkSumAlgo);
    messageDigest.update(Files.readAllBytes(file.toPath()));
    byte[] digestBytes = messageDigest.digest();
    return DatatypeConverter.printHexBinary(digestBytes).toLowerCase();
  }

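DatatypeConverter.printHexBinary(digestBytes) returns uppercase characters, so toLowerCase() is added to keep the output consistent with the first version. Note that javax.xml.bind is no longer bundled with the JDK as of Java 11, so on newer JDKs this variant needs the JAXB API as an extra dependency.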
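
Both variants above read the whole file into memory with Files.readAllBytes, which may be a problem for large files. One possible alternative, sketched here using only the standard library, feeds the file to MessageDigest in fixed-size chunks and builds the hex string by hand, so it does not depend on javax.xml.bind. The class name StreamingChecksum, the method names, and the 8 KB buffer size are illustrative choices and not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: chunked hashing with plain JDK classes.
class StreamingChecksum {

  // Hashes the file in 8 KB chunks so memory use does not grow with file size.
  static String genChecksumStreaming(Path path, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest messageDigest = MessageDigest.getInstance(algorithm);
    byte[] buffer = new byte[8192];
    try (InputStream in = Files.newInputStream(path)) {
      int read;
      while ((read = in.read(buffer)) != -1) {
        // Only the bytes actually read in this pass are added to the digest
        messageDigest.update(buffer, 0, read);
      }
    }
    return toHex(messageDigest.digest());
  }

  // Converts bytes to lowercase hex, two characters per byte.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(Character.forDigit((b >> 4) & 0xf, 16)); // high nibble
      sb.append(Character.forDigit(b & 0xf, 16));        // low nibble
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // Demo on a small temporary file; pass a real path in practice
    Path tmp = Files.createTempFile("checksum-demo", ".txt");
    try {
      Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
      System.out.println(genChecksumStreaming(tmp, "MD5")); // 900150983cd24fb0d6963f7d28e17f72
    } finally {
      Files.delete(tmp);
    }
  }
}
```

The result should be identical to that of genChecksum1 for the same file and algorithm, because MessageDigest produces the same digest whether the data arrives in one call to update or in many.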

2. Use org.apache.commons.codec.digest.DigestUtils

To use Apache Commons Codec in a Maven project, add the dependency, for example:

<!-- https://mvnrepository.com/artifact/commons-codec/commons-codec -->
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.13</version>
</dependency>

The code is simple: the checksum for a given algorithm is calculated by calling the corresponding static method of the DigestUtils class:

  // Requires: java.io.File, java.io.FileInputStream, java.io.IOException,
  // java.io.InputStream, org.apache.commons.codec.digest.DigestUtils
  public static String genChecksum2(File file, CheckSumAlgoType checkSumAlgoType) throws IOException {
    // The DigestUtils methods read the stream but do not close it,
    // so try-with-resources is used to avoid leaking the file handle.
    try (InputStream in = new FileInputStream(file)) {
      switch (checkSumAlgoType) {
      case MD5:
        return DigestUtils.md5Hex(in);
      case SHA_1:
        return DigestUtils.sha1Hex(in);
      case SHA_256:
        return DigestUtils.sha256Hex(in);
      case SHA_512:
        return DigestUtils.sha512Hex(in);
      default:
        // Fall back to MD5 for any other value
        return DigestUtils.md5Hex(in);
      }
    }
  }

3. Use com.google.common.io.Files.hash

To use Guava in a Maven project, add the dependency, for example:

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>23.0</version>
</dependency>

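The code is as follows; it calls the hash method of com.google.common.io.Files. Note that Guava marks Files.hash as deprecated in favor of Files.asByteSource(file).hash(hashFunction), so check this against the Guava version you use.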

  // Requires: java.io.File, java.io.IOException,
  // com.google.common.hash.HashFunction, com.google.common.hash.Hashing
  public static String genChecksum3(File file, CheckSumAlgoType checkSumAlgoType) throws IOException {
    // Use Guava: pick the hash function first, then hash the file once
    HashFunction hashFunction;
    switch (checkSumAlgoType) {
    case SHA_1:
      hashFunction = Hashing.sha1();
      break;
    case SHA_256:
      hashFunction = Hashing.sha256();
      break;
    case SHA_512:
      hashFunction = Hashing.sha512();
      break;
    case MD5:
    default:
      // MD5 is also the fallback for any other value
      hashFunction = Hashing.md5();
    }
    return com.google.common.io.Files.hash(file, hashFunction).toString();
  }

Verification

Finally, let's check that the three methods above produce the same checksums for the file.

  public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
    File file = new File("/users/wmj/Downloads/poi-bin-4.1.0-20190412.tar.gz");
    for (CheckSumAlgoType type : CheckSumAlgoType.values()) {
      // Spaces around the algorithm name keep the header line readable
      System.out.println("use " + type.getName() + " calculation checksum");
      System.out.println(
          String.format("method=%s,checksum=%s", "genChecksum1", genChecksum1(file, type.getName())));
      System.out.println(String.format("method=%s,checksum=%s", "genChecksum2", genChecksum2(file, type)));
      System.out.println(String.format("method=%s,checksum=%s", "genChecksum3", genChecksum3(file, type)));
      System.out.println();
    }
  }

The output is as follows:

use MD5 calculation checksum
method=genChecksum1,checksum=2fa39c79790c29c53368ec0c14fdea97
method=genChecksum2,checksum=2fa39c79790c29c53368ec0c14fdea97
method=genChecksum3,checksum=2fa39c79790c29c53368ec0c14fdea97

use SHA-256 calculation checksum
method=genChecksum1,checksum=d8db4f8228d87935ca46b0af72db68ad83f45b31d885e67b089d195b5ee800bb
method=genChecksum2,checksum=d8db4f8228d87935ca46b0af72db68ad83f45b31d885e67b089d195b5ee800bb
method=genChecksum3,checksum=d8db4f8228d87935ca46b0af72db68ad83f45b31d885e67b089d195b5ee800bb

use SHA-512 calculation checksum
method=genChecksum1,checksum=87499ab94882605ee2f407fc66e24c613ae98896b8d5f527b6cd8c604574922fc72d148da42962b2ee30ad18cd712e3de42bfe14770261b07217717c52a738a9
method=genChecksum2,checksum=87499ab94882605ee2f407fc66e24c613ae98896b8d5f527b6cd8c604574922fc72d148da42962b2ee30ad18cd712e3de42bfe14770261b07217717c52a738a9
method=genChecksum3,checksum=87499ab94882605ee2f407fc66e24c613ae98896b8d5f527b6cd8c604574922fc72d148da42962b2ee30ad18cd712e3de42bfe14770261b07217717c52a738a9

use SHA1 calculation checksum
method=genChecksum1,checksum=f56e42474fa81676d82a38ae6a8df67194a50b93
method=genChecksum2,checksum=f56e42474fa81676d82a38ae6a8df67194a50b93
method=genChecksum3,checksum=f56e42474fa81676d82a38ae6a8df67194a50b93

We can see that all three methods agree, and that the SHA-256 and SHA-512 results match the checksums published on the Apache website.

This article has presented three ways to calculate a checksum:

  1. Use java.security.MessageDigest
  2. Use org.apache.commons.codec.digest.DigestUtils
  3. Use com.google.common.io.Files.hash

Of course, there may be other implementations and toolkits. If you know of others, feel free to share them so that we can learn together.

Posted by glenelkins on Mon, 22 Nov 2021 21:31:19 -0800