Algorithms and data structures - bucket sort

Keywords: Java Algorithm data structure

Bucket sort:

Sorts built on the bucket idea are not comparison-based, so their range of application is limited: the data must be of a kind that can be divided into buckets. This post describes two sorting methods based on the bucket idea:
(1) Counting sort (2) Radix sort

Time complexity

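O(N), provided the value range (counting sort) or the number of digits (radix sort) is small relative to N. More precisely, counting sort costs O(N + K) for values in 0..K, and radix sort with 10 buckets costs O(d * (N + 10)) for numbers of at most d decimal digits.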

Counting sort:

All the sorting algorithms introduced previously, such as insertion sort and heap sort, are comparison-based. There are also sorts that are not based on comparison.
For example, sort the ages of a company's employees. Given the realistic age range, you can allocate an array of size 100 and count how many employees have each age: the number of employees aged i is stored at index i of the array.
Finally, walk the array in index order and write out each age as many times as its count (i.e. its frequency) to get the sorted ages. This sort is called counting sort.

Code implementation:

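public class CountSort {
    // Assumes non-negative values with a reasonably small maximum
    public static void countSort(int[] arr) {
        if (arr == null || arr.length < 2) {
            return;
        }
        // Find the maximum value to size the counter array
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < arr.length; i++) {
            max = Math.max(max, arr[i]);
        }
        // bucket[v] counts how many times the value v occurs
        int[] bucket = new int[max + 1];
        for (int i = 0; i < arr.length; i++) {
            bucket[arr[i]]++;
        }
        // Write each value back as many times as it occurs, in increasing order
        int i = 0;
        for (int j = 0; j < bucket.length; j++) {
            while (bucket[j]-- > 0) {
                arr[i++] = j;
            }
        }
    }
}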
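
The implementation above allocates max + 1 counters and only works for non-negative input. As a minimal sketch of one possible variant (not part of the original code), the counters can be indexed relative to the minimum value, so the array needs only max - min + 1 slots and negative values are allowed. The names CountSortRange and countSortRange are made up for this illustration.

```java
import java.util.Arrays;

public class CountSortRange {
    // Hypothetical variant: counts values relative to the minimum,
    // so the counter array has (max - min + 1) slots.
    public static void countSortRange(int[] arr) {
        if (arr == null || arr.length < 2) {
            return;
        }
        int min = arr[0];
        int max = arr[0];
        for (int v : arr) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        // Assumption: max - min is small enough to allocate (no overflow check here)
        int[] bucket = new int[max - min + 1];
        for (int v : arr) {
            bucket[v - min]++;
        }
        // Expand the counts back into the array in increasing order
        int i = 0;
        for (int j = 0; j < bucket.length; j++) {
            while (bucket[j]-- > 0) {
                arr[i++] = j + min;
            }
        }
    }

    public static void main(String[] args) {
        int[] ages = {34, 22, 58, 22, 41, 34, 19};
        countSortRange(ages);
        System.out.println(Arrays.toString(ages)); // [19, 22, 22, 34, 34, 41, 58]
    }
}
```

For the employee-age example this means the counter array only spans the ages that actually occur (19 to 58 here) instead of starting at 0.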

Radix sort:

Suppose the array to be sorted is [17, 13, 25, 100, 72]:
1. Determine the number of digits of the largest number. The maximum in the array is 100, which has 3 digits. Pad every number with leading zeros to 3 digits to get [017, 013, 025, 100, 072]
2. Each digit takes a value from 0 to 9, so 10 buckets numbered 0-9 are required. A queue is used as the bucket's data structure so that numbers leave each bucket first in, first out (other structures such as an array can also be used, as long as the order within each bucket is preserved)
3. Put the padded numbers into the buckets according to one digit position at a time:
3.1 Look at the ones digit first, which gives [7, 3, 5, 0, 2]. Put the padded numbers into the 10 buckets according to their ones digit:
No. 0: [100] No. 1: [] No. 2: [072] No. 3: [013]
No. 4: [] No. 5: [025] No. 6: [] No. 7: [017]
No. 8: [] No. 9: []
Take the numbers out of the buckets in order to get a new array [100, 072, 013, 025, 017]
3.2 Next look at the tens digit, which gives [0, 7, 1, 2, 1]. Put the padded numbers into the 10 buckets according to their tens digit:
No. 0: [100] No. 1: [013, 017] No. 2: [025] No. 3: []
No. 4: [] No. 5: [] No. 6: [] No. 7: [072]
No. 8: [] No. 9: []
Take the numbers out of the buckets in order to get a new array [100, 013, 017, 025, 072]
3.3 Finally look at the hundreds digit, which gives [1, 0, 0, 0, 0]. Put the padded numbers into the 10 buckets according to their hundreds digit:
No. 0: [013, 017, 025, 072] No. 1: [100] No. 2: [] No. 3: []
No. 4: [] No. 5: [] No. 6: [] No. 7: []
No. 8: [] No. 9: []
Take the numbers out of the buckets in order to get the sorted array [013, 017, 025, 072, 100]
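
The walkthrough above can be sketched directly with one queue per bucket. This is only an illustration of the queue-based description, assuming non-negative ints; the class name QueueRadixSort and its sort method are made up here and are not the implementation given later in this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class QueueRadixSort {
    // Sketch for non-negative ints only; names are illustrative.
    public static void sort(int[] arr) {
        if (arr == null || arr.length < 2) {
            return;
        }
        int max = 0;
        for (int v : arr) {
            max = Math.max(max, v);
        }
        // Ten FIFO buckets, one per decimal digit 0..9
        List<Queue<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++) {
            buckets.add(new ArrayDeque<>());
        }
        // divisor = 1, 10, 100, ... selects the ones, tens, hundreds digit;
        // long avoids overflow when max is close to Integer.MAX_VALUE
        for (long divisor = 1; max / divisor > 0; divisor *= 10) {
            for (int v : arr) {
                int digit = (int) ((v / divisor) % 10);
                buckets.get(digit).add(v);
            }
            // Empty the buckets in order 0..9, first in, first out
            int i = 0;
            for (Queue<Integer> q : buckets) {
                while (!q.isEmpty()) {
                    arr[i++] = q.poll();
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] arr = {17, 13, 25, 100, 72};
        sort(arr);
        System.out.println(Arrays.toString(arr)); // [13, 17, 25, 72, 100]
    }
}
```

Integer division plays the role of the leading-zero padding: a number with fewer digits simply yields 0 for the higher positions.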

The code implementation differs somewhat from the analysis above. Suppose the array is arr = [13, 21, 11, 52, 62]:
1. Create a count array (size 10) that records how many numbers have each ones digit, so the count array is [0, 2, 2, 1, 0, 0, 0, 0, 0, 0]
2. Replace each entry of the count array with its prefix sum, giving [0, 2, 4, 5, 5, 5, 5, 5, 5, 5], where 2 = 0 + 2, 4 = 0 + 2 + 2, 5 = 0 + 2 + 2 + 1, and so on. The count array now means something different from step 1: for example, the 4 at index 2 means that 4 numbers in the original array have a ones digit <= 2
3. Now traverse the original array from right to left, so 62 comes first. Its ones digit is 2, so after this pass it belongs at index 3 of the bucket array (an auxiliary array that temporarily holds the numbers sorted by the current digit)
Why? 3 = 4 - 1, where 4 is count[2]. Because array indices start from 0, 62 is placed at index 3, and the other 3 numbers whose ones digit is <= 2 will fill indices 0, 1 and 2. Together that makes exactly the 4 numbers whose ones digit is <= 2. Traversing from the right matches the first in, first out order of the buckets: among numbers with the same digit, the one that appears later in the array takes the later slot. After placing 62, count[2] is reduced by 1 (to 3); the other entries of count are left unchanged. The final bucket array is [(21, 11), (52, 62), (13)], where each pair of brackets can be regarded as the contents of one bucket
4. Copy the numbers in the bucket array back into arr, then repeat steps 1 to 3 for the tens digit and the hundreds digit
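
To make steps 1 to 3 concrete, here is a small sketch that runs a single pass on the ones digit of the example array and prints the intermediate arrays. The class name RadixPassTrace and the method onesDigitPass are made up for this illustration.

```java
import java.util.Arrays;

public class RadixPassTrace {
    // One stable counting pass on the ones digit; returns the auxiliary array.
    // Assumes non-negative input.
    public static int[] onesDigitPass(int[] arr) {
        // Step 1: frequency of each ones digit
        int[] count = new int[10];
        for (int v : arr) {
            count[v % 10]++;
        }
        System.out.println("count:  " + Arrays.toString(count));

        // Step 2: prefix sums, count[i] = how many numbers have ones digit <= i
        for (int i = 1; i < 10; i++) {
            count[i] += count[i - 1];
        }
        System.out.println("prefix: " + Arrays.toString(count));

        // Step 3: traverse from right to left, placing each number at index count[digit] - 1
        int[] bucket = new int[arr.length];
        for (int i = arr.length - 1; i >= 0; i--) {
            int digit = arr[i] % 10;
            bucket[--count[digit]] = arr[i];
        }
        return bucket;
    }

    public static void main(String[] args) {
        int[] arr = {13, 21, 11, 52, 62};
        System.out.println("bucket: " + Arrays.toString(onesDigitPass(arr)));
        // count:  [0, 2, 2, 1, 0, 0, 0, 0, 0, 0]
        // prefix: [0, 2, 4, 5, 5, 5, 5, 5, 5, 5]
        // bucket: [21, 11, 52, 62, 13]
    }
}
```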

Code implementation:

public class RadixSort {
    // For non-negative values only
    public static void radixSort(int[] arr) {
        if (arr == null || arr.length < 2) {
            return;
        }
        radixSort(arr, 0, arr.length - 1, maxbits(arr));

    }

    // Sort arr[L..R]; digit is the number of decimal digits of the maximum value
    public static void radixSort(int[] arr, int L, int R, int digit) {
        // radix is 10 because each decimal digit is in 0 ~ 9, i.e. there are 10 buckets
        final int radix = 10;
        int i = 0, j = 0;
        int[] bucket = new int[R - L + 1];
        // The number of passes in and out of the buckets equals the number of decimal digits
        for (int d = 1; d <= digit; d++) {
            // count is first a frequency array, then a prefix-sum array:
            // count[0]: how many numbers have digit d equal to 0
            // count[1]: how many numbers have digit d equal to 0 or 1
            // count[i]: how many numbers have digit d in 0 ~ i
            int[] count = new int[radix];

            // First, count how many numbers fall into each bucket (R is inclusive)
            for (i = L; i <= R; i++) {
                j = getDigit(arr[i], d);
                count[j]++;
            }
            // Turn the counts into prefix sums
            for (i = 1; i < radix; i++) {
                count[i] = count[i] + count[i - 1];
            }

            // Traverse the original array from right to left and place each number into the bucket array (an auxiliary array)
            for (i = R; i >= L; i--) {
                j = getDigit(arr[i], d);
                bucket[count[j] - 1] = arr[i];
                count[j]--;
            }
            // Copy the bucket array back into the original array
            for (i = L, j = 0; i <= R; i++, j++) {
                arr[i] = bucket[j];
            }

        }
    }

    // Get the d-th decimal digit of x, counting from the right (shorter numbers behave as if padded with leading zeros)
    public static int getDigit(int x, int d) {
        return ((x / ((int) Math.pow(10, d - 1))) % 10);
    }

    // Count how many decimal digits the maximum value in the array has
    public static int maxbits(int[] arr) {
        int max = Integer.MIN_VALUE;
        // Select the maximum value in the array
        for (int i = 0; i < arr.length; i++) {
            max = Math.max(max, arr[i]);
        }
        int res = 0;
        while (max != 0) {
            res++;
            max /= 10;
        }
        return res;
    }
}
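
The getDigit helper above goes through Math.pow, which works in double. As a hedged alternative sketch, the same digit can be extracted with integer arithmetic only. The class name DigitDemo is made up for this illustration, and the method assumes x >= 0 and d >= 1.

```java
public class DigitDemo {
    // Returns the d-th decimal digit of x, counting from the right (d = 1 is the ones digit).
    // Assumes x >= 0 and d >= 1.
    public static int getDigit(int x, int d) {
        // Drop the d - 1 lowest digits, then take the last remaining one
        for (int k = 1; k < d; k++) {
            x /= 10;
        }
        return x % 10;
    }

    public static void main(String[] args) {
        System.out.println(getDigit(72, 1));  // 2
        System.out.println(getDigit(72, 2));  // 7
        System.out.println(getDigit(72, 3));  // 0, as if 72 were padded to 072
        System.out.println(getDigit(100, 3)); // 1
    }
}
```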

Tips:

Compared with counting sort, radix sort places far fewer demands on the data range: even if the maximum value is 100 million, the number of buckets stays fixed at 10. It does, however, place demands on the data type: only data that can be written in a positional (base-N) notation can be sorted this way. This again shows that sorts not based on comparison depend on the properties of the data.
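
As a rough illustration of that difference, the sketch below compares how many counters counting sort would allocate with how many passes decimal radix sort would need for a given maximum value. The class name BucketCostDemo and both helper methods are made up for this example; they only do arithmetic and sort nothing.

```java
public class BucketCostDemo {
    // Counting sort as written in this post allocates one counter per value in 0..max
    static long countingSortCounters(int max) {
        return (long) max + 1;
    }

    // Decimal radix sort needs one pass per decimal digit of max, with 10 counters per pass
    static int radixPasses(int max) {
        int digits = 0;
        while (max != 0) {
            digits++;
            max /= 10;
        }
        return digits;
    }

    public static void main(String[] args) {
        int max = 100_000_000;
        System.out.println(countingSortCounters(max)); // 100000001
        System.out.println(radixPasses(max));          // 9
    }
}
```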

Posted by Chronos on Wed, 17 Nov 2021 07:34:48 -0800