The Top K problem: what is it, and how to implement it with a heap and with quicksort

Keywords: Java

Contents

1. What is the Top K problem?

2. A practical application of Top K

3. Code implementations of the Top K problem and their efficiency comparison

1. Implement Top K with a heap

2. Implement Top K with quicksort

3. Efficiency comparison of heap-based and quicksort-based Top K

 

* Body

1. What is the Top K problem?

Given an unordered array of length N, output its K smallest (or K largest) elements.
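
To make the definition concrete, here is a minimal baseline sketch that simply sorts a copy of the array and slices it. The class and method names (TopKBySortingSketch, smallestK, largestK) are made up for illustration and are not from the original code.

```java
import java.util.Arrays;

final class TopKBySortingSketch {

    /** Returns the k smallest values in ascending order; the input array is not modified. */
    static int[] smallestK(int[] input, int k) {
        checkK(input, k);
        int[] sorted = Arrays.copyOf(input, input.length); // work on a copy
        Arrays.sort(sorted);                                // ascending order
        return Arrays.copyOfRange(sorted, 0, k);            // the first k are the k smallest
    }

    /** Returns the k largest values in ascending order; the input array is not modified. */
    static int[] largestK(int[] input, int k) {
        checkK(input, k);
        int[] sorted = Arrays.copyOf(input, input.length);
        Arrays.sort(sorted);
        return Arrays.copyOfRange(sorted, sorted.length - k, sorted.length); // the last k
    }

    private static void checkK(int[] input, int k) {
        if (k < 0 || k > input.length) {
            throw new IllegalArgumentException("k must be between 0 and input.length");
        }
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 4, 1, 8};
        System.out.println(Arrays.toString(smallestK(data, 3))); // [1, 2, 4]
        System.out.println(Arrays.toString(largestK(data, 3)));  // [7, 8, 9]
    }
}
```

This is only a reference point: it sorts all N elements even though only K of them are needed.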

 

2. A practical application of Top K

Leaderboards: there are millions of users, but only the top 100 results need to be shown, and the leaderboard changes in real time.
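
As a rough illustration of the "changes in real time" part, the sketch below keeps only the best few scores as new ones arrive, using the min-heap technique that section 3 describes. LeaderboardSketch, offer and top are hypothetical names, and scores are plain ints for simplicity; a real leaderboard would also track which user each score belongs to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class LeaderboardSketch {
    private final int capacity;
    // Min-heap: its head is the lowest score currently on the board.
    private final PriorityQueue<Integer> heap;

    LeaderboardSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity);
    }

    /** Offers a new score; returns true if it made it onto the board. */
    boolean offer(int score) {
        if (heap.size() < capacity) {
            heap.add(score);
            return true;
        }
        if (score > heap.peek()) { // beats the lowest score on the board
            heap.poll();
            heap.add(score);
            return true;
        }
        return false;
    }

    /** Returns the current board, highest score first. */
    List<Integer> top() {
        List<Integer> scores = new ArrayList<>(heap);
        scores.sort(Collections.reverseOrder());
        return scores;
    }

    public static void main(String[] args) {
        LeaderboardSketch board = new LeaderboardSketch(3);
        for (int score : new int[] {50, 20, 90, 10, 70}) {
            board.offer(score);
        }
        System.out.println(board.top()); // [90, 70, 50]
    }
}
```

Each offer touches at most one heap node, so the board stays cheap to update no matter how many scores have been seen.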

 

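3. Code implementations of the Top K problem

Requirement: given an unordered array of length N, output the smallest 5 numbers. (Note that the heap code in part 1 keeps the largest 5, while the quicksort code in part 2 takes the smallest 5; the two cases are symmetric.)

1. Implement Top K with a heap (min-heap)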

(1) Outline of the steps:

① Create a min-heap with k nodes;

② While fewer than k elements have been seen, put each element directly into the min-heap; the top node of the heap is always the minimum value;

③ Once k or more elements have been seen, compare each new element with the top node of the heap: if the new element is greater than the top node, delete the top node and put the new element into the heap. The heap reorders itself, and its total number of nodes stays at k;

(2) The central idea is to keep the total number of nodes in the heap at k.

(3) Code implementation:

    import java.util.Iterator;
    import java.util.PriorityQueue;

    import org.junit.Test;

    @Test
    public void getTopKByHeapInsertTopKElement() {
        int arrayLength = 10000000 + 10;
        int topK = 5;

        // Prepare an unordered array of length arrayLength:
        int[] array = A03TopKByQuickSortAndNewArray.getDisorderlyArray(arrayLength);

        // Prepare a min-heap with an initial capacity of topK nodes:
        PriorityQueue<Integer> heap = new PriorityQueue<>(topK);

        long start = System.currentTimeMillis();

        // Insert every element while keeping the heap at topK nodes:
        insertButmaintainTheHeapAtTopK(heap, array, topK);

        // Print the largest topK elements:
        printHeap(heap);

        long end = System.currentTimeMillis();
        System.out.println("Time to get the largest " + topK + " elements (ms): " + (end - start));
    }

    /**
     * Get the largest topK elements with a min-heap: once the heap holds topK elements,
     * each new element is compared directly with the top node of the heap.
     */
    private static void insertButmaintainTheHeapAtTopK(PriorityQueue<Integer> heap, int[] array, int topK) {
        for (int i = 0; i < array.length; i++) {
            if (i < topK) {
                heap.add(array[i]);
            } else { // This branch is the key to keeping the heap at topK nodes:
                if (null != heap.peek() && array[i] > heap.peek()) {
                    heap.poll();
                    heap.add(array[i]);
                }
            }
        }
    }

    /**
     * Print the elements held in the heap (the largest topK).
     * Note: the iterator of a PriorityQueue does not guarantee any particular order.
     * @param heap the min-heap holding the topK elements
     */
    static void printHeap(PriorityQueue<Integer> heap) {
        Iterator<Integer> iterator = heap.iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next());
        }
    }
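
The requirement at the start of this section asks for the smallest 5 numbers, whereas a min-heap maintained as above ends up holding the largest ones. A mirrored sketch for the smallest K could use a max-heap instead, built here with Comparator.reverseOrder(). The class name SmallestKByMaxHeapSketch and the method smallestK are hypothetical; the result is drained with poll() so that it comes out sorted, since heap iteration order is not.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

final class SmallestKByMaxHeapSketch {

    /** Returns the k smallest values of array in ascending order. */
    static int[] smallestK(int[] array, int k) {
        if (k < 1 || k > array.length) {
            throw new IllegalArgumentException("k must be between 1 and array.length");
        }
        // Max-heap: its head is the largest of the k smallest values seen so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k, Comparator.reverseOrder());
        for (int value : array) {
            if (heap.size() < k) {
                heap.add(value);
            } else if (value < heap.peek()) {
                heap.poll();      // drop the current largest
                heap.add(value);  // keep the heap at exactly k nodes
            }
        }
        // poll() returns the largest remaining value, so fill the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 4, 1, 8};
        System.out.println(Arrays.toString(smallestK(data, 3))); // [1, 2, 4]
    }
}
```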

  

 

2. Implement Top K with quicksort

(1) Outline of the steps:

① First sort the unordered array with quicksort;

② Take the smallest 5 elements and put them in topArray; [Key]

③ After the initial arrayLength elements, insertNumber new elements arrive: compare each one directly with topArray and, if it qualifies, place it in topArray as well; [Key]

(2) Time complexity:

① Time complexity of sorting: O(N*logN) on average;

② Time complexity of taking the top k from the sorted array: O(k), which is effectively O(1) for a small fixed k such as 5, because only the first k elements are read.
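
For comparison with the heap approach in part 1, a rough worked estimate may help. The O(N log K) bound for the heap is a standard result that the original does not state, and the figures below simply plug in N = 10^7 and K = 5 from the tests:

```latex
\begin{aligned}
T_{\text{sort}}(N)    &= O(N \log N), & \log_2 10^{7} &\approx 23.3 \\
T_{\text{heap}}(N, K) &= O(N \log K), & \log_2 5      &\approx 2.3
\end{aligned}
```

By these bounds the heap does roughly a tenth of the work per element. On random input the gap is likely larger still, because most elements fail the comparison with the heap top and cost only that single comparison.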

(3) Code implementation:

    import java.util.Random;

    import org.junit.Test;

    @Test
    public void testGetTopKByQuickSortToNewArray() {
        int topK = 5;
        int arrayLength = 10000000;

        // Prepare an unordered array
        int[] array = getDisorderlyArray(arrayLength);

        long start = System.currentTimeMillis();

        // 1. Sort the unordered array with quicksort
        quickSort(array, 0, array.length - 1);

        // 2. Take the smallest topK elements and put them in topKArray:
        int[] topKArray = insertToTopArrayFromDisorderlyArray(array, topK);

        // 3. After the initial arrayLength elements, insertNumber new elements arrive:
        //    compare each with topKArray[topKArray.length-1] and, if smaller, put it into topKArray.
        //    Here: generate 10 random numbers below 100 as the new data.
        insertToTopKArray(topKArray, 10, 100, topKArray.length - 1);

        long end = System.currentTimeMillis();
        System.out.println("Time to get the smallest " + topK + " elements (ms): " + (end - start));
    }

    /**
     * Generate new data and compare it with the topKArray array. A new value that belongs
     * in the top K replaces the last element, and topKArray is then re-sorted.
     *
     * @param topKArray topK array, sorted from smallest to largest
     * @param insertNumber Number of newly generated values
     * @param randomIntRange Exclusive upper bound for the new values, e.g. 10 for random numbers below 10
     * @param topK Index in topKArray of the element to be replaced. For the smallest topK, this is the last element of topKArray.
     */
    private static void insertToTopKArray(int[] topKArray, int insertNumber, int randomIntRange, int topK) {
        Random random = new Random();
        int randomInt;
        for (int i = 0; i < insertNumber; i++) {
            randomInt = random.nextInt(randomIntRange); // fixed: was hard-coded to 100
            if (randomInt < topKArray[topK]) { // Smaller than topKArray[topK]: replace it, then re-sort topKArray.
                topKArray[topK] = randomInt;
                quickSort(topKArray, 0, topKArray.length - 1);
            }
        }
    }

    /**
     * Copy the first topK elements of the sorted array into a new topK array.
     *
     * @param sourceArray Sorted array
     * @param topK Number of elements to take
     * @return TopK array
     */
    private static int[] insertToTopArrayFromDisorderlyArray(int[] sourceArray, int topK) {
        int[] topArray = new int[topK];
        for (int i = 0; i < topK; i++) { // fixed: was hard-coded to 5
            topArray[i] = sourceArray[i];
        }
        return topArray;
    }

    /**
     * Quicksort (ascending), using the leftmost element as the pivot.
     * @param target array to sort in place
     * @param left index of the first element of the range
     * @param right index of the last element of the range
     */
    static void quickSort(int[] target, int left, int right) {
        if (left >= right) {
            return;
        }
        int pivot = target[left]; // Pivot
        int temp;
        int i = left;
        int j = right;
        while (i < j) {
            while (target[j] >= pivot && i < j) {
                j--;
            }
            while (target[i] <= pivot && i < j) {
                i++;
            }
            if (i < j) {
                temp = target[i];
                target[i] = target[j];
                target[j] = temp;
            }
        }
        // i and j have met:
        // ① Swap the element at the meeting point with the pivot:
        target[left] = target[j];
        target[j] = pivot;
        // ② Sort the elements on either side of the pivot separately:
        quickSort(target, left, j - 1);
        quickSort(target, j + 1, right);
    }

    /**
     * Prepare an unordered array of random values in [0, arrayLength)
     *
     * @param arrayLength length of the array to generate
     * @return int[]
     */
    static int[] getDisorderlyArray(int arrayLength) {
        int[] disorderlyArray = new int[arrayLength];
        Random random = new Random();
        for (int i = 0; i < arrayLength; i++) {
            disorderlyArray[i] = random.nextInt(arrayLength);
        }
        return disorderlyArray;
    }

    /**
     * Print every element of the array
     */
    static void showArray(int[] target) {
        for (int element : target) {
            System.out.println(element);
        }
    }
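
insertToTopKArray re-sorts the whole topKArray after every replacement. Because that array is already sorted, a single insertion step would also do the job; the sketch below shows one possible version. The names SortedTopKInsertSketch and insertIntoSorted are hypothetical and not part of the original code.

```java
import java.util.Arrays;

final class SortedTopKInsertSketch {

    /**
     * Inserts value into an ascending array that holds the k smallest values seen so far,
     * dropping the current largest one. Returns false if value does not belong in the array.
     */
    static boolean insertIntoSorted(int[] topKArray, int value) {
        int last = topKArray.length - 1;
        if (last < 0 || value >= topKArray[last]) {
            return false; // not smaller than the current k-th smallest
        }
        int i = last;
        // Shift larger elements one slot to the right; the old last element falls off.
        while (i > 0 && topKArray[i - 1] > value) {
            topKArray[i] = topKArray[i - 1];
            i--;
        }
        topKArray[i] = value;
        return true;
    }

    public static void main(String[] args) {
        int[] top = {1, 3, 5, 7, 9};
        insertIntoSorted(top, 4);
        System.out.println(Arrays.toString(top));      // [1, 3, 4, 5, 7]
        System.out.println(insertIntoSorted(top, 8));  // false
    }
}
```

For k = 5 the difference is negligible; the point is only that each insertion needs at most k shifts rather than a full sort.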

 

3. Efficiency comparison between heap-based Top K and quicksort-based Top K:

Data size | Min-heap | Quicksort
1,000,000 + 10 elements | 11 ms | 124 ms
10,000,000 + 10 elements | 28 ms | 1438 ms

Posted by Azala on Sun, 08 Dec 2019 11:43:06 -0800