Large file segmentation and merging in Java

I recently needed to split a large file into several small files and then merge the small files back into the original large file.

The requirement is simple, and so is the implementation.

  • File splitting reads the large file into a buffer of the specified size, one chunk at a time, and writes each buffer to its own small file.
  • File merging reads the small files in order and appends the contents of each one to the large file.

Without further ado, here is the code:

/**
 * Split a large file into small files
 *
 * @param inputFile  Large file to split
 * @param tmpPath    Temporary directory for the small files
 * @param bufferSize Maximum size of each small file, in bytes
 */
public static void splitFile(String inputFile, String tmpPath, Integer bufferSize) {
    // try-with-resources closes the input stream even if an exception is thrown
    try (FileInputStream fis = new FileInputStream(inputFile)) {
        // Read buffer; a read may return fewer bytes than requested,
        // so a small file can be shorter than bufferSize
        byte[] buffer = new byte[bufferSize];
        int len;

        // Number of small files written so far (also used as the file name)
        int fileNum = 0;

        // Write the result of each read to its own small file
        while ((len = fis.read(buffer)) != -1) {
            try (FileOutputStream fos = new FileOutputStream(tmpPath + "/" + fileNum)) {
                fos.write(buffer, 0, len);
            }
            fileNum++;
        }
        System.out.println("Split of " + inputFile + " complete, " + fileNum + " files created");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

/**
 * Merge the small files back into one large file
 *
 * @param tmpPath    Temporary directory holding the small files
 * @param outputPath Output directory
 * @param bufferSize Read buffer size, in bytes
 */
public static void mergeFile(String tmpPath, String outputPath, Integer bufferSize) {
    // The number of small files is taken from the directory listing,
    // so the directory must contain nothing but the small files
    File[] files = new File(tmpPath).listFiles();
    if (files == null || files.length == 0) {
        System.out.println("No file.");
        return;
    }
    int fileNum = files.length;

    // Path of the restored large file
    String outputFile = outputPath + "/" + generateFileName();

    try (FileOutputStream fos = new FileOutputStream(outputFile)) {
        // File read buffer
        byte[] buffer = new byte[bufferSize];
        int len;

        // Append the small files in name order: 0, 1, 2, ...
        for (int i = 0; i < fileNum; i++) {
            // try-with-resources closes each small file before the next one is opened
            try (FileInputStream fis = new FileInputStream(tmpPath + "/" + i)) {
                // A single read() may return only part of the file, so read until EOF
                while ((len = fis.read(buffer)) != -1) {
                    fos.write(buffer, 0, len);
                }
            }
        }
        System.out.println("Merge of directory " + tmpPath + " complete, generated file: " + outputFile);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

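/**
 * Generate a timestamp-based file name
 *
 * @return file name
 */
public static String generateFileName() {
    // DateFormatUtils is from Apache Commons Lang
    String time = DateFormatUtils.format(new Date(), "yyyyMMddHHmmss");
    // The extension is hard-coded to match the .7z test file
    return time + ".7z";
}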

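Test the code above (the temporary and output directories must already exist, and the temporary directory must be empty):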

String localInputFile = "F:/test/file/in/in.7z";
String localTmpPath = "F:/test/file/tmp";
String localOutputPath = "F:/test/file/out";
Integer localBufferSize = 1024 * 1024;
splitFile(localInputFile, localTmpPath, localBufferSize);
mergeFile(localTmpPath, localOutputPath, localBufferSize);

The above is the code for splitting and merging files. The logic is very simple. The data is read and written as raw bytes, so a temporary file opened in a text editor looks garbled.

Storing each chunk as a string avoids the garbled output.

So how can the data be read as a string?
We can read the binary stream into a byte array and encode it with Base64, which turns it into a string. The trade-off is that the small files grow: Base64 represents every 3 bytes as 4 characters, so each file is about one third larger.
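
Before the full implementation, here is a minimal sketch of the Base64 round trip on its own. The class and method names (Base64RoundTrip, encodeChunk, decodeChunk) are made up for illustration. The sketch encodes only the bytes that a read actually returned, which matters for the last chunk of a file, where the read usually fills only part of the buffer.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, for illustration only
class Base64RoundTrip {

    /** Encodes only the first len bytes of buffer, so stale bytes are never included. */
    static String encodeChunk(byte[] buffer, int len) {
        return Base64.getEncoder().encodeToString(Arrays.copyOf(buffer, len));
    }

    /** Decodes a Base64 string back to the original bytes. */
    static byte[] decodeChunk(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Bytes that do not form readable text
        byte[] buffer = {(byte) 0x00, (byte) 0xFF, (byte) 0x37, (byte) 0x7A, (byte) 0xBC};

        // Pretend the last read filled only the first 4 bytes of the buffer
        String text = encodeChunk(buffer, 4);
        byte[] restored = decodeChunk(text);

        System.out.println(text);
        System.out.println(Arrays.equals(restored, Arrays.copyOf(buffer, 4)));
    }
}
```

The encoded string contains only letters, digits, "+", "/" and "=", so it can be written with a FileWriter and opened in any text editor.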

Here is the implementation code:

/**
 * Split a large file into small files (Base64 strings)
 *
 * @param inputFile  Large file to split
 * @param tmpPath    Temporary directory for the small files
 * @param bufferSize Chunk size (raw bytes per read, before Base64 encoding)
 */
public static void splitFileByChar(String inputFile, String tmpPath, Integer bufferSize) {
    // try-with-resources closes the input stream even if an exception is thrown
    try (FileInputStream fis = new FileInputStream(inputFile)) {
        // File read buffer
        byte[] buffer = new byte[bufferSize];
        int len;

        // Number of small files written so far (also used as the file name)
        int fileNum = 0;

        // Write the result of each read to its own small text file
        while ((len = fis.read(buffer)) != -1) {
            // Encode only the len bytes actually read (needs java.util.Arrays); encoding
            // the whole buffer would append stale bytes to a partly filled last chunk
            String tmpStr = Base64.getEncoder().encodeToString(Arrays.copyOf(buffer, len));
            try (FileWriter fw = new FileWriter(tmpPath + "/" + fileNum + ".txt")) {
                fw.write(tmpStr);
            }
            fileNum++;
        }
        System.out.println("Split of " + inputFile + " complete, " + fileNum + " files created");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

/**
 * Merge the small Base64 text files back into one large file
 *
 * @param tmpPath    Temporary directory holding the small files
 * @param outputPath Output directory for the large file
 * @param bufferSize Chunk size used for the split (raw bytes per read)
 */
public static void mergeFileByChar(String tmpPath, String outputPath, Integer bufferSize) {
    // The number of small files is taken from the directory listing,
    // so the directory must contain nothing but the small files
    File[] files = new File(tmpPath).listFiles();
    if (files == null || files.length == 0) {
        System.out.println("No file.");
        return;
    }
    int fileNum = files.length;

    // Path of the restored large file
    String outputFile = outputPath + "/" + generateFileName();

    try (FileOutputStream fos = new FileOutputStream(outputFile)) {
        // Character buffer (* 2 to reduce the number of reads, because Base64 text is larger than the raw data)
        char[] buffer = new char[bufferSize * 2];

        // Append the small files in name order: 0.txt, 1.txt, 2.txt, ...
        for (int i = 0; i < fileNum; i++) {
            StringBuilder tmpStr = new StringBuilder();
            // try-with-resources closes each reader before the next file is opened
            try (FileReader fr = new FileReader(tmpPath + "/" + i + ".txt")) {
                int len;
                while ((len = fr.read(buffer)) != -1) {
                    tmpStr.append(buffer, 0, len);
                }
            }

            // Decode the Base64 text back to the original bytes
            byte[] tmpBuffer = Base64.getDecoder().decode(tmpStr.toString());
            fos.write(tmpBuffer);
        }
        System.out.println("Merge of directory " + tmpPath + " complete, generated file: " + outputFile);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Test the code above (empty the temporary directory first, because the merge counts every file in it):

String localInputFile = "F:/test/file/in/in.7z";
String localTmpPath = "F:/test/file/tmp";
String localOutputPath = "F:/test/file/out";
Integer localBufferSize = 1024 * 1024;
splitFileByChar(localInputFile, localTmpPath, localBufferSize);
mergeFileByChar(localTmpPath, localOutputPath, localBufferSize);

Open the temporary directory and you will see the small files in the string format we wanted. However, each small file exceeds the preset size of 1M: because of the Base64 overhead it is about 1366K.
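
The size follows from the Base64 rule that every 3 input bytes become 4 output characters, with the last group padded. The arithmetic can be sketched as below. The class and method names (Base64Size, encodedLength, maxRawBytes) are hypothetical, and the formula assumes the standard padded encoder returned by Base64.getEncoder().

```java
// Hypothetical helper class, for illustration only
class Base64Size {

    /** Length in characters of the padded Base64 encoding of n bytes: 4 * ceil(n / 3). */
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    /** Largest number of raw bytes whose padded Base64 encoding fits in maxChars characters. */
    static long maxRawBytes(long maxChars) {
        return (maxChars / 4) * 3;
    }

    public static void main(String[] args) {
        long chunk = 1024 * 1024; // the 1M buffer size used in the test
        long encoded = encodedLength(chunk);
        System.out.println(encoded + " characters, about " + (encoded / 1024.0) + " KiB");
        System.out.println("Raw bytes per read to stay within 1M: " + maxRawBytes(chunk));
    }
}
```

If the small files themselves must stay within 1M, one option is to pass the value returned by maxRawBytes as the buffer size for the split instead of 1024 * 1024.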

Those are two ways to split and restore files, recorded here for reference.
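
Whichever way is used, it is worth confirming that the restored file is byte-for-byte identical to the original. A minimal sketch of such a check follows, comparing file lengths and SHA-256 digests. The class and method names (RoundTripCheck, sha256, sameContent) are hypothetical and not part of the code above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only
class RoundTripCheck {

    /** Returns the SHA-256 digest of a file as a lowercase hex string. */
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int len;
            // Feed the file to the digest in pieces so large files never sit fully in memory
            while ((len = in.read(buffer)) != -1) {
                digest.update(buffer, 0, len);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if both files have the same length and the same SHA-256 digest. */
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return Files.size(a) == Files.size(b) && sha256(a).equals(sha256(b));
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RoundTripCheck <original file> <restored file>
        if (args.length == 2) {
            boolean same = sameContent(Paths.get(args[0]), Paths.get(args[1]));
            System.out.println(same ? "identical" : "different");
        }
    }
}
```

Because the restored file gets a timestamp-based name, its path has to be looked up in the output directory before running the check.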

Posted by baze on Fri, 26 Jun 2020 19:15:10 -0700