[Data structures and algorithms] Sorting algorithms: merge sort and radix sort

Merge sort and radix sort

Merge sort

Merge sort is a sorting method built on the idea of merging. It uses the classic divide-and-conquer strategy: the divide stage splits the problem into smaller subproblems and solves them recursively, and the conquer stage "patches" the answers from those subproblems together.

Basic idea

The divide-and-conquer process forms a structure very similar to a complete binary tree. Merge sort can be implemented with recursion or with iteration; this article uses recursion. The divide stage can be understood as recursively splitting the sequence into smaller subsequences.
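For comparison with the recursive version used in this article, here is a minimal sketch of the iterative (bottom-up) variant. The class name `BottomUpMergeSort` and its helper are made up for this illustration, and the sketch assumes arrays far smaller than `Integer.MAX_VALUE / 2` elements so that the run width never overflows.

```java
import java.util.Arrays;

// Hypothetical class for illustration: iterative (bottom-up) merge sort.
class BottomUpMergeSort {

    // Sorts arr in place by merging runs of width 1, 2, 4, ...
    // until a single run covers the whole array.
    static void sort(int[] arr) {
        int n = arr.length;
        int[] temp = new int[n];
        for (int width = 1; width < n; width *= 2) {
            // Each iteration merges the adjacent runs [left..mid] and [mid+1..right]
            for (int left = 0; left < n - width; left += 2 * width) {
                int mid = left + width - 1;
                int right = Math.min(left + 2 * width - 1, n - 1);
                merge(arr, left, mid, right, temp);
            }
        }
    }

    // Merges the sorted ranges arr[left..mid] and arr[mid+1..right].
    static void merge(int[] arr, int left, int mid, int right, int[] temp) {
        int i = left, j = mid + 1, t = left;
        while (i <= mid && j <= right) {
            temp[t++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
        }
        while (i <= mid) temp[t++] = arr[i++];
        while (j <= right) temp[t++] = arr[j++];
        // Copy only the merged range back into arr
        System.arraycopy(temp, left, arr, left, right - left + 1);
    }

    public static void main(String[] args) {
        int[] arr = {8, 4, 5, 7, 1, 3, 6, 2};
        sort(arr);
        System.out.println(Arrays.toString(arr)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

The merge step is the same as in the recursive version; only the order in which ranges are merged differs, so no call stack is needed.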

Merging adjacent sorted subsequences:
In the conquer stage, we need to merge two sorted subsequences into one sorted sequence. For example, in the last merge of the example, we need to merge [4,5,7,8] and [1,2,3,6] into the final sequence [1,2,3,4,5,6,7,8]. The implementation that follows shows the steps.
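The merge of [4,5,7,8] and [1,2,3,6] can be tried in isolation with a small sketch. The class `MergeTwoSorted` is a made-up name for this illustration; unlike the article's `merge` method, it takes two separate arrays and returns a new one.

```java
import java.util.Arrays;

// Hypothetical class for illustration: merging two already-sorted arrays.
class MergeTwoSorted {

    // Returns a new sorted array containing all elements of a and b.
    // Both inputs are assumed to be sorted in ascending order.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, t = 0;
        while (i < a.length && j < b.length) {
            // <= takes from the left array on ties, which keeps the merge stable
            out[t++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        // At most one of these two loops copies anything
        while (i < a.length) out[t++] = a[i++];
        while (j < b.length) out[t++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{4, 5, 7, 8}, new int[]{1, 2, 3, 6});
        System.out.println(Arrays.toString(merged)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```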

Merge sort implementation

  • Time complexity: O(n log n)
    // Divide + merge method
	public static void mergeSort(int[] arr, int left, int right, int[] temp) {
		if(left < right) {
			int mid = left + (right - left) / 2; // Middle index (written this way to avoid int overflow)
			// Recursively split and sort the left half
			mergeSort(arr, left, mid, temp);
			// Recursively split and sort the right half
			mergeSort(arr, mid + 1, right, temp);
			// Merge the two sorted halves
			merge(arr, left, mid, right, temp);
		}
	}

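	// Merge method
	/**
	 * Merges the sorted ranges arr[left..mid] and arr[mid+1..right].
	 *
	 * @param arr   the array being sorted
	 * @param left  start index of the left sorted subsequence
	 * @param mid   middle index (end of the left sorted subsequence)
	 * @param right end index of the right sorted subsequence
	 * @param temp  temporary array used as a staging buffer
	 */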
	public static void merge(int[] arr, int left, int mid, int right, int[] temp) {

		int i = left; // Initialize i: start index of the left sorted subsequence
		int j = mid + 1; // Initialize j: start index of the right sorted subsequence
		int t = 0; // Current index into the temp array

		// (1)
		// Fill temp from the left and right (sorted) subsequences in order,
		// until one of the two subsequences has been fully consumed
		while (i <= mid && j <= right) {
			// If the current left element is less than or equal to the current right element,
			// copy the left element into temp,
			// then advance t and i
			if(arr[i] <= arr[j]) {
				temp[t] = arr[i];
				t += 1;
				i += 1;
			} else { // Otherwise, copy the current right element into temp
				temp[t] = arr[j];
				t += 1;
				j += 1;
			}
		}

		// (2)
		// Copy whatever remains on the side that still has elements into temp
		while( i <= mid) { // The left subsequence has elements left: copy them all into temp
			temp[t] = arr[i];
			t += 1;
			i += 1;
		}

		while( j <= right) { // The right subsequence has elements left: copy them all into temp
			temp[t] = arr[j];
			t += 1;
			j += 1;
		}


		// (3)
		// Copy the merged elements from temp back into arr
		// Note: only the range [left, right] is copied each time, not the whole array
		t = 0;
		int tempLeft = left;
		// First merge: tempLeft = 0, right = 1; then tempLeft = 2, right = 3; then tempLeft = 0, right = 3
		// Last merge: tempLeft = 0, right = 7
		while(tempLeft <= right) {
			arr[tempLeft] = temp[t];
			t += 1;
			tempLeft += 1;
		}

	}

Test code

// Imports needed: java.text.SimpleDateFormat, java.util.Date (and java.util.Arrays for the commented-out print)
public static void main(String[] args) {
		//int arr[] = { 8, 4, 5, 7, 1, 3, 6, 2 };

		// Test the execution speed of merge sort
		// Create an array of 8,000,000 random numbers
		int[] arr = new int[8000000];
		for (int i = 0; i < 8000000; i++) {
			arr[i] = (int) (Math.random() * 8000000); // Generate a number in [0, 8000000)
		}
		System.out.println("Before sorting");
		Date data1 = new Date();
		SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
		String date1Str = simpleDateFormat.format(data1);
		System.out.println("The time before sorting is=" + date1Str);

		int temp[] = new int[arr.length]; // Merge sort needs an extra array of the same size as arr
		mergeSort(arr, 0, arr.length - 1, temp);

		Date data2 = new Date();
		String date2Str = simpleDateFormat.format(data2);
		System.out.println("The time after sorting is=" + date2Str);

		//System.out.println("After merge sort: " + Arrays.toString(arr));
	}

In this test, sorting 8,000,000 elements takes about 2 s, roughly the same as quicksort.
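Timestamps formatted to the second give only a coarse reading. A finer measurement could look like the sketch below, which uses `System.nanoTime()` and also checks that the result is sorted. The class `SortTiming` and its helpers are made up for this illustration, and `Arrays.sort` stands in for the sort under test so that the snippet is self-contained; the article's `mergeSort` could be passed in the same way.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class for illustration: timing a sort in milliseconds.
class SortTiming {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Returns true if arr is in ascending order.
    static boolean isSorted(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            if (arr[i - 1] > arr[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 8_000_000;
        Random random = new Random(42); // fixed seed so runs are comparable
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = random.nextInt(n); // a number in [0, n)
        }
        // Arrays.sort is only a stand-in here; swap in the sort being measured
        long elapsed = timeMillis(() -> Arrays.sort(arr));
        System.out.println("Elapsed: " + elapsed + " ms, sorted: " + isSorted(arr));
    }
}
```

A single run like this is still only a rough indication, because JIT warm-up and garbage collection affect the first measurements.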

Radix sort

  • Radix sort is a distribution sort, also known as bucket sort or bin sort. As the name implies, it distributes the elements to be sorted into buckets according to the value of each digit of the key, and sorts them that way

  • Radix sort is a stable sort, and an efficient one

  • Radix sort is an extension of bucket sort

  • Radix sort was invented by Herman Hollerith in 1887. It works by splitting integers into individual digits by position, and then sorting them by each digit in turn.

  • All the values to be compared are first treated as having the same number of digits, with shorter numbers padded with leading zeros. Sorting then starts from the lowest digit and proceeds one digit at a time. Once the pass on the highest digit is complete, the sequence is sorted.
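The zero padding never has to be done explicitly: integer division already yields 0 for positions a number does not have. The sketch below illustrates this; the class and method name `DigitAt.digitAt` are made up for this example, and values are assumed to be non-negative.

```java
// Hypothetical class for illustration: extracting one decimal digit.
class DigitAt {

    // Returns the decimal digit of a non-negative value at the given place:
    // place = 1 for ones, 10 for tens, 100 for hundreds, ...
    static int digitAt(int value, int place) {
        return value / place % 10;
    }

    public static void main(String[] args) {
        System.out.println(digitAt(748, 1));   // 8
        System.out.println(digitAt(748, 10));  // 4
        System.out.println(digitAt(748, 100)); // 7
        // 3 has no hundreds digit, so it behaves like 003
        System.out.println(digitAt(3, 100));   // 0
    }
}
```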

This description is hard to follow on its own, so let's walk through the steps of radix sort with an example (sorting numbers of up to three digits).

Round 1: sort by the ones digit

Round 2: sort by the tens digit

Round 3: sort by the hundreds digit
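The three rounds can be traced on the sample array {53, 3, 542, 748, 14, 214} with a compact sketch. The class `RadixRounds` and its `pass` method are made up for this illustration, and list-based buckets are used here for brevity instead of a two-dimensional array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class for illustration: one radix sort round at a time.
class RadixRounds {

    // Performs one stable distribution pass on the digit at the given place
    // (1 = ones, 10 = tens, 100 = hundreds). Values must be non-negative.
    static void pass(int[] arr, int place) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            buckets.add(new ArrayList<>());
        }
        // Distribute: append each value to the bucket of its current digit
        for (int value : arr) {
            buckets.get(value / place % 10).add(value);
        }
        // Collect: read the buckets back in order 0..9
        int index = 0;
        for (List<Integer> bucket : buckets) {
            for (int value : bucket) {
                arr[index++] = value;
            }
        }
    }

    public static void main(String[] args) {
        int[] arr = {53, 3, 542, 748, 14, 214};
        pass(arr, 1);
        System.out.println(Arrays.toString(arr)); // [542, 53, 3, 14, 214, 748]
        pass(arr, 10);
        System.out.println(Arrays.toString(arr)); // [3, 14, 214, 542, 748, 53]
        pass(arr, 100);
        System.out.println(Arrays.toString(arr)); // [3, 14, 53, 214, 542, 748]
    }
}
```

Each pass keeps elements with the same digit in their previous relative order, which is why the result of the earlier rounds survives the later ones.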

Step-by-step code for the above algorithm

 public static void radixSort(int[] arr){
        // Round 1: sort by the ones digit

        // Define the buckets: ten buckets, each of which is a one-dimensional array
        // Notes:
        // 1. A two-dimensional array holding 10 one-dimensional arrays, one per digit value 0-9
        // 2. To prevent overflow when filling a bucket, each bucket has size arr.length
        // 3. Radix sort is a classic algorithm that trades space for time
        int[][] bucket = new int[10][arr.length];

        // Define an array recording how many elements each bucket currently holds
        // For example, bucketElementCounts[0] is the number of elements in bucket 0
        int[] bucketElementCounts = new int[10];

        // Traverse the array and put each number into the bucket matching its ones digit
        for (int i=0;i<arr.length;i++){
            // Get the ones digit of the element
            int digitElement = arr[i] % 10;
            // Put it into its bucket; bucketElementCounts[digitElement] is the next free slot in that bucket
            bucket[digitElement][bucketElementCounts[digitElement]] = arr[i];
            // Increase that bucket's element count
            bucketElementCounts[digitElement]++;
        }

        // Auxiliary variable index: the next position in arr to write to
        int index = 0;
        // Copy the elements from the buckets back into the array, bucket by bucket
        for (int j=0;j<bucketElementCounts.length;j++){
            // Only buckets that contain elements need to be copied
            if (bucketElementCounts[j]!=0){
                // Loop over bucket j, which holds bucketElementCounts[j] elements
                for (int k=0;k<bucketElementCounts[j];k++){
                    // Put the elements of the bucket back into arr in order
                    arr[index++] = bucket[j][k];
                }
            }
            // After emptying bucket j, reset its count to 0 so it can be reused in the next round. This is important!
            bucketElementCounts[j]=0;
        }
        System.out.println("Round 1 result: "+Arrays.toString(arr));

        // --------------Round 2: sort by the tens digit----------------------
        // Traverse the array and put each number into the bucket matching its tens digit
        for (int i=0;i<arr.length;i++){
            // Get the tens digit of the element
            int digitElement = arr[i] /10% 10;
            // Put it into its bucket; bucketElementCounts[digitElement] is the next free slot in that bucket
            bucket[digitElement][bucketElementCounts[digitElement]] = arr[i];
            // Increase that bucket's element count
            bucketElementCounts[digitElement]++;
        }

        // Reset index: the next position in arr to write to
        index = 0;
        // Copy the elements from the buckets back into the array, bucket by bucket
        for (int j=0;j<bucketElementCounts.length;j++){
            // Only buckets that contain elements need to be copied
            if (bucketElementCounts[j]!=0){
                // Loop over bucket j, which holds bucketElementCounts[j] elements
                for (int k=0;k<bucketElementCounts[j];k++){
                    // Put the elements of the bucket back into arr in order
                    arr[index++] = bucket[j][k];
                }
            }
            // After emptying bucket j, reset its count to 0 so it can be reused in the next round. This is important!
            bucketElementCounts[j]=0;
        }
        System.out.println("Round 2 result: "+Arrays.toString(arr));

        //--------------------Round 3: sort by the hundreds digit-----------------------
        for (int i=0;i<arr.length;i++){
            // Get the hundreds digit of the element
            int digitElement = arr[i] /100 %10;
            // Put it into its bucket; bucketElementCounts[digitElement] is the next free slot in that bucket
            bucket[digitElement][bucketElementCounts[digitElement]] = arr[i];
            // Increase that bucket's element count
            bucketElementCounts[digitElement]++;
        }

        // Reset index: the next position in arr to write to
        index = 0;
        // Copy the elements from the buckets back into the array, bucket by bucket
        for (int j=0;j<bucketElementCounts.length;j++){
            // Only buckets that contain elements need to be copied
            if (bucketElementCounts[j]!=0){
                // Loop over bucket j, which holds bucketElementCounts[j] elements
                for (int k=0;k<bucketElementCounts[j];k++){
                    // Put the elements of the bucket back into arr in order
                    arr[index++] = bucket[j][k];
                }
            }
            // After emptying bucket j, reset its count to 0 so it can be reused. This is important!
            bucketElementCounts[j]=0;
        }
        System.out.println("Round 3 result: "+Arrays.toString(arr));

    }

A general radix sort method

  • Time complexity: O(d · (n + r)), where n is the number of elements, r is the radix (10 here, for the digits 0-9), and d is the number of rounds, roughly log_r(B) for a largest value B
 /**
     * Generalizes the three-digit version to numbers with any number of digits
     * (assumes all values are non-negative)
     * @param arr the array to sort
     */
    public static void radixSort2(int[] arr){
        // Generalizing the step-by-step derivation above gives the full radix sort code
        // Find the largest number in the array
        int max = arr[0];
        for (int i=1;i<arr.length;i++){
            if(arr[i]>max){
                max = arr[i];
            }
        }
        // Then count how many digits the largest number has
        int maxLength = (max + "").length();


        // Define the buckets: ten buckets, each of which is a one-dimensional array
        // Notes:
        // 1. A two-dimensional array holding 10 one-dimensional arrays, one per digit value 0-9
        // 2. To prevent overflow when filling a bucket, each bucket has size arr.length
        // 3. Radix sort is a classic algorithm that trades space for time
        int[][] bucket = new int[10][arr.length];

        // Define an array recording how many elements each bucket currently holds
        // For example, bucketElementCounts[0] is the number of elements in bucket 0
        int[] bucketElementCounts = new int[10];

        // Run maxLength rounds, one per digit position; n is the place value (1, 10, 100, ...)
        for (int m=0,n=1;m<maxLength;m++,n*=10){
            // Traverse the array and put each number into the bucket matching its current digit
            for (int i=0;i<arr.length;i++){
                // Get the digit of the element at the current position
                int digitElement = arr[i] /n % 10;
                // Put it into its bucket; bucketElementCounts[digitElement] is the next free slot in that bucket
                bucket[digitElement][bucketElementCounts[digitElement]] = arr[i];
                // Increase that bucket's element count
                bucketElementCounts[digitElement]++;
            }

            // Auxiliary variable index: the next position in arr to write to
            int index = 0;
            // Copy the elements from the buckets back into the array, bucket by bucket
            for (int j=0;j<bucketElementCounts.length;j++){
                // Only buckets that contain elements need to be copied
                if (bucketElementCounts[j]!=0){
                    // Loop over bucket j, which holds bucketElementCounts[j] elements
                    for (int k=0;k<bucketElementCounts[j];k++){
                        // Put the elements of the bucket back into arr in order
                        arr[index++] = bucket[j][k];
                    }
                }
                // After emptying bucket j, reset its count to 0 so it can be reused in the next round. This is important!
                bucketElementCounts[j]=0;
            }
            System.out.println("Round "+(m+1)+" result: "+Arrays.toString(arr));
        }
    }

Testing the radix sort code

  • Running the radix sort shows that it is very fast: sorting 8,000,000 elements takes about 1 s
  • Because this algorithm is a typical space-for-time algorithm, sorting 80,000,000 elements needs 11 arrays of that length (the input plus ten buckets). Each int is 4 bytes, so it needs 80000000 * 11 * 4 / 1024 / 1024 / 1024 ≈ 3.3 GB of memory. If the machine (or the JVM heap) does not have enough memory, an OOM (out-of-memory) error occurs
	public static void main(String[] args) {
		int arr[] = { 53, 3, 542, 748, 14, 214};

		// 80000000 * 11 * 4 / 1024 / 1024 / 1024 ≈ 3.3 GB
//		int[] arr = new int[8000000];
//		for (int i = 0; i < 8000000; i++) {
//			arr[i] = (int) (Math.random() * 8000000); // Generate a number in [0, 8000000)
//		}
		System.out.println("Before sorting");
		Date data1 = new Date();
		SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
		String date1Str = simpleDateFormat.format(data1);
		System.out.println("The time before sorting is=" + date1Str);

		radixSort2(arr);

		Date data2 = new Date();
		String date2Str = simpleDateFormat.format(data2);
		System.out.println("The time after sorting is=" + date2Str);

		System.out.println("After radix sort: " + Arrays.toString(arr));

	}
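If the memory cost of ten full-size buckets is a concern, one common alternative is to run a counting sort on each digit, which needs only one extra array of length n plus ten counters. This is a different technique from the bucket arrays used in this article, shown here as a minimal sketch; the class name `CountingRadixSort` is made up for this illustration, and values are assumed to be non-negative.

```java
import java.util.Arrays;

// Hypothetical class for illustration: LSD radix sort with counting sort per digit.
class CountingRadixSort {

    // Sorts non-negative ints in place using one auxiliary array of the same length.
    static void sort(int[] arr) {
        if (arr.length == 0) {
            return;
        }
        int max = arr[0];
        for (int value : arr) {
            max = Math.max(max, value);
        }
        int[] output = new int[arr.length];
        // place is a long so that place *= 10 cannot overflow for large max values
        for (long place = 1; max / place > 0; place *= 10) {
            int[] count = new int[10];
            // Count how many values have each digit at this place
            for (int value : arr) {
                count[(int) (value / place % 10)]++;
            }
            // Prefix sums: count[d] becomes the exclusive end position of digit d
            for (int d = 1; d < 10; d++) {
                count[d] += count[d - 1];
            }
            // Walk backwards so equal digits keep their relative order (stability)
            for (int i = arr.length - 1; i >= 0; i--) {
                int d = (int) (arr[i] / place % 10);
                output[--count[d]] = arr[i];
            }
            System.arraycopy(output, 0, arr, 0, arr.length);
        }
    }

    public static void main(String[] args) {
        int[] arr = {53, 3, 542, 748, 14, 214};
        sort(arr);
        System.out.println(Arrays.toString(arr)); // [3, 14, 53, 214, 542, 748]
    }
}
```

With this layout the memory use is about two arrays of length n instead of eleven, at the cost of an extra counting pass per digit.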

Summary of common sorting algorithms

Posted by jcd on Mon, 24 Feb 2020 06:32:42 -0800