Notes on optimizing a merge sort

Keywords: Java Big Data

Core idea of merge sort:

Merging two sorted arrays into one sorted array is easy.

Steps:

  1. Split the array into two halves at the middle
  2. Repeat step 1 on each half until every piece has length 1
  3. Merge the two halves back together. At this point each half is already sorted

Recently I reviewed data structures and rewrote merge sort in Java. But when I tested it on 200,000 random numbers, I found that merge sort was almost as slow as selection sort. This left me deeply puzzled. What went wrong?

Here is the original code first:

    public int[] mergeSort(int[] array) {
        // Divide and conquer: split first, then merge
        if (array.length > 1) {
            // Divide: sort each half recursively
            int[] left = mergeSort(leftArray(array));
            int[] right = mergeSort(rightArray(array));
            // Conquer: merge the two sorted halves
            array = mergeArray(left, right);
        }
        return array;
    }

    //Get the left half of the array
    private int[] leftArray(int[] a){
        int[] b = new int[(a.length + 1)/2];
        for (int i = 0; i < b.length; i++) {
            b[i] = a[i];
        }
        return b;
    }

    //Get the right half of the array
    private int[] rightArray(int[] a){
        int[] b = new int[a.length/2];
        int lenA = (a.length + 1) / 2;
        for (int i = 0; i < b.length; i++) {
            b[i] = a[i + lenA];
        }
        return b;
    }

    //Merge two ordered arrays
    private int[] mergeArray(int[] left, int[] right){
        int[] a = new int[left.length + right.length];
        int indexL = 0;
        int indexR = 0;
        int lenL = left.length;
        int lenR = right.length;
        for (int i = 0; i < a.length; i++) {
            a[i] = left[indexL] < right[indexR] ? left[indexL++] : right[indexR++];
            if (indexL == lenL || indexR == lenR) {
                // One side is exhausted; the rest is copied below
                break;
            }
        }
        if (indexL == lenL) {
            // Left side exhausted: copy the remaining right elements
            for (int i = indexL + indexR; i < a.length; i++) {
                a[i] = right[indexR++];
            }
        } else if (indexR == lenR) {
            // Right side exhausted: copy the remaining left elements
            for (int i = indexL + indexR; i < a.length; i++) {
                a[i] = left[indexL++];
            }
        }
        return a;
    }

After thinking about what could be causing the slowness (I don't believe the idea of merge sort itself is at fault), I found the culprit. Yes, there is only one culprit: constantly allocating new arrays. Allocating space takes a lot of time, so no wonder it is so slow. Most of the time was spent allocating memory! Shame, shame: this is a glaring example of contemporary programmers abusing space. I remember that in the old days, our predecessors would think hard about whether even a single variable was really necessary, for fear of wasting a little space. But I digress...
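To put a rough number on "constantly allocating", here is a small sketch that counts how many arrays, and how many int slots, the first version creates for an input of length n. The class `AllocationCount` and its method names are made up for illustration; the split sizes mirror `leftArray` and `rightArray` above. It only counts allocations and does not measure time.

```java
class AllocationCount {
    // Arrays created by the first version for an input of length n:
    // every call with n > 1 creates a left copy, a right copy, and a merged array.
    static long arrays(int n) {
        if (n <= 1) {
            return 0;
        }
        return 3 + arrays((n + 1) / 2) + arrays(n / 2);
    }

    // Total int slots in those arrays: n for the two copies plus n for the merged array,
    // repeated at every level of the recursion.
    static long slots(int n) {
        if (n <= 1) {
            return 0;
        }
        return 2L * n + slots((n + 1) / 2) + slots(n / 2);
    }

    public static void main(String[] args) {
        int n = 200_000;
        System.out.println("arrays created: " + arrays(n)); // 3 * (n - 1) = 599997
        System.out.println("int slots allocated: " + slots(n));
    }
}
```

For comparison, a version that reuses the input array and one temporary array creates a single extra array of n slots, no matter how deep the recursion goes.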

Knowing the cause, let's fix it. Since allocating space takes time, we will avoid allocating: reuse the original array's space as much as possible and add at most one temporary array. Will that solve the problem? To be honest, I don't yet know how much faster it will be, but it is certain to improve. I'm looking forward to it.

The modified code is as follows:

    public int[] mergeSort(int[] array) {
        // Nothing to sort for null, empty, or single-element input
        if (array == null || array.length < 2) {
            return array;
        }
        // Allocate temp once up front so that doMerge does not create a temporary array on every call
        int[] temp = new int[array.length];
        merge(array, 0, array.length - 1, temp);
        return array;
    }

    private void merge(int[] array, int start, int end, int[] temp) {
        if (start < end) {
            // Divide: the unsigned shift keeps mid correct even if start + end overflows int
            int mid = (start + end) >>> 1;
            merge(array, start, mid, temp);
            merge(array, mid + 1, end, temp);
            // Conquer: merge the two sorted halves
            doMerge(array, start, mid, end, temp);
        }
    }

    private void doMerge(int[] array, int start, int mid, int end, int[] temp){
        int left = start;
        int right = mid + 1;
        int index = start;
        // Take the smaller head element first (ascending order, as in the first version)
        while (left <= mid && right <= end) {
            temp[index++] = array[left] <= array[right] ? array[left++] : array[right++];
        }

        // Copy whatever remains on either side
        while (left <= mid) {
            temp[index++] = array[left++];
        }

        while (right <= end) {
            temp[index++] = array[right++];
        }

        // Copy the merged range from temp back into array
        for (int i = start; i <= end; i++) {
            array[i] = temp[i];
        }
    }

Enough talk. If there are problems, we can discuss them after running it. The result:

OMG! How can this be?!

I must be reading it wrong. Do I need to see an eye doctor? What is going on? After all those changes nothing has changed at all: the time is still basically the same as selection sort.

After some digging, I found out why the time was basically the same as selection sort:

Ha, the merge sort test was actually calling selection sort.

How embarrassing.
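One way to make this kind of mix-up harder is to pass the sort under test into the timing code explicitly and check its output. The following is only a sketch under my own assumptions: the class `SortTiming`, the method `timeMillis`, and the baseline `selectionSort` are illustrative names, not the test code used in this post. It assumes ascending order and times a single run with no JIT warm-up, so the numbers are rough.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.UnaryOperator;

class SortTiming {
    // Runs `sort` on a private copy of `data`, checks the result against
    // Arrays.sort (ascending order assumed), and returns the elapsed milliseconds.
    static long timeMillis(String name, UnaryOperator<int[]> sort, int[] data) {
        int[] expected = Arrays.copyOf(data, data.length);
        Arrays.sort(expected);

        int[] input = Arrays.copyOf(data, data.length);
        long start = System.nanoTime();
        int[] result = sort.apply(input);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        if (!Arrays.equals(expected, result)) {
            throw new IllegalStateException(name + " returned a wrongly sorted array");
        }
        System.out.println(name + ": " + elapsedMillis + " ms");
        return elapsedMillis;
    }

    // Plain selection sort, included only as the slow baseline.
    static int[] selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
        return a;
    }

    public static void main(String[] args) {
        // Same random data for every sort; each call works on its own copy.
        int[] data = new Random(42).ints(20_000).toArray();
        timeMillis("selection sort", SortTiming::selectionSort, data);
        timeMillis("library sort", a -> { Arrays.sort(a); return a; }, data);
    }
}
```

To time the merge sort from this post, pass a method reference to its `mergeSort` in the same way. Because every call names the method being measured and verifies the result, both a wrong call target and a wrongly sorted result are easier to spot.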

After fixing the call, here is how fast the merge sort really is:

For comparison, the merge sort time before the change:

Well, this shows that my changes work quite well: with 200,000 items it is about 80% faster. Now let's increase the data volume and see how it does.

First, the sorting time for 100 million random numbers with the old version:

And the sorting time for 100 million random numbers with the new version:

So overall, on large data it is about 25% faster. That still falls short of my expectation; I thought it would be at least twice as fast. Sad.

Finally, a suggestion: if you want to understand the idea of merge sort, read the old version of the code; if you want speed, read the new version.


Posted by watski on Thu, 12 Mar 2020 03:59:07 -0700