LeetCode practice log, day 009: problems 28 and 8

Keywords: Java Algorithm leetcode Interview string

28. Easy difficulty: acceptance rate about 40%

Explanation:
What value should we return when needle is an empty string? This is a good question to ask in an interview.
For this problem, we return 0 when needle is an empty string. This is consistent with the definitions of strstr() in C and indexOf() in Java.
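
The indexOf() behaviour mentioned above can be checked directly. This is a small sketch; the class name EmptyNeedleDemo and the helper find are made up for illustration.

```java
class EmptyNeedleDemo {
    // Thin wrapper around String.indexOf, used only to make the behaviour easy to test.
    static int find(String haystack, String needle) {
        return haystack.indexOf(needle);
    }

    public static void main(String[] args) {
        // An empty needle matches at the very start of any haystack.
        System.out.println(find("hello", ""));    // 0
        System.out.println(find("", ""));         // 0
        // A non-empty needle that is absent yields -1.
        System.out.println(find("hello", "xyz")); // -1
    }
}
```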

Constraint given by the problem: haystack and needle consist only of lowercase English characters

Method 1: KMP algorithm

The brute-force enumeration method, without any pruning, has complexity O(m * n), while the KMP algorithm has complexity O(m + n).
KMP can finish the search in O(m + n) because it extracts useful information from each partial match and reuses it, which reduces the cost of repeated matching.
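
As a baseline for comparison, the O(m * n) enumeration can be sketched as follows. This is a minimal illustration and not taken from the linked solution; the class name NaiveSearch is an assumption.

```java
class NaiveSearch {
    // Brute-force search: try every start position i and compare up to m characters.
    // In the worst case this performs O(m * n) character comparisons.
    static int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        for (int i = 0; i + m <= n; i++) {
            int k = 0;
            while (k < m && haystack.charAt(i + k) == needle.charAt(k)) {
                k++;
            }
            // k == m means the whole needle matched; this also covers m == 0, which returns 0.
            if (k == m) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(strStr("hello", "ll"));  // 2
        System.out.println(strStr("aaaaa", "bba")); // -1
        System.out.println(strStr("abc", ""));      // 0
    }
}
```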
The principle of KMP is not described here; explanatory videos can be found on Bilibili.
Or follow the link below to see how the code is implemented:
Author: AC_OIer
Link: https://leetcode-cn.com/problems/implement-strstr/solution/shua-chuan-lc-shuang-bai-po-su-jie-fa-km-tb86/

class Solution {
    // KMP algorithm
    // ss: original string, pp: pattern string
    public int strStr(String ss, String pp) {
        if (pp.isEmpty()) return 0;
        
        // Read the length of the original string and the matching string respectively
        int n = ss.length(), m = pp.length();
        // Prepend a space to both the original string and the pattern string so that their indices start at 1
        ss = " " + ss;
        pp = " " + pp;

        char[] s = ss.toCharArray();
        char[] p = pp.toCharArray();

        // Build the next array for the pattern string; index 0 is unused because indices start at 1
        int[] next = new int[m + 1];
        // Construction starts with i = 2, j = 0 and runs while i <= m (next[1] is always 0)
        for (int i = 2, j = 0; i <= m; i++) {
            // On a mismatch, fall back: j = next[j]
            while (j > 0 && p[i] != p[j + 1]) j = next[j];
            // On a match, advance j first
            if (p[i] == p[j + 1]) j++;
            // Record next[i]; the loop then moves on with i++
            next[i] = j;
        }

        // Matching starts with i = 1, j = 0 and runs while i <= n (the original string length)
        for (int i = 1, j = 0; i <= n; i++) {
            // On a mismatch, fall back: j = next[j]
            while (j > 0 && s[i] != p[j + 1]) j = next[j];
            // On a match, advance j; the loop then moves on with i++
            if (s[i] == p[j + 1]) j++;
            // If the whole pattern has matched, return the start index directly
            if (j == m) return i - m;
        }

        return -1;
    }
}
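
To make the next array less abstract, the sketch below reuses the same 1-indexed construction loop as the code above and prints the result for a sample pattern. The class name NextArrayDemo and the pattern "aabaaf" are chosen here for illustration only.

```java
import java.util.Arrays;

class NextArrayDemo {
    // next[i] is the length of the longest proper prefix of p[1..i]
    // that is also a suffix of p[1..i]; next[0] is unused padding.
    static int[] buildNext(String pattern) {
        int m = pattern.length();
        char[] p = (" " + pattern).toCharArray();
        int[] next = new int[m + 1];
        for (int i = 2, j = 0; i <= m; i++) {
            while (j > 0 && p[i] != p[j + 1]) j = next[j];
            if (p[i] == p[j + 1]) j++;
            next[i] = j;
        }
        return next;
    }

    public static void main(String[] args) {
        // For "aabaaf": the prefix "aa" reappears as a suffix of "aabaa", so next[5] = 2.
        System.out.println(Arrays.toString(buildNext("aabaaf"))); // [0, 0, 1, 0, 1, 2, 0]
    }
}
```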

Method 2: Sunday algorithm

Its core idea is: during matching, the pattern string does not have to be compared strictly from left to right or from right to left. When a mismatch is found, the algorithm skips as many characters as possible before the next attempt, which improves matching efficiency.

Let the text string be S and the pattern string be T, with lengths N and M respectively.
For T, we do a simple but clever preprocessing step: record the last (rightmost) position of each character in T and store it in an array.
The idea of the Sunday algorithm is very similar to the BM algorithm. When a match fails, we look at the character in the text string immediately after the last character of the current window. If that character does not appear in the pattern string, we skip over it directly, that is, shift = pattern string length + 1
(moving from the current start position by the pattern string length lands on "the character after the last character"; since that character is not in the pattern string, we move 1 further);
(reasoning: the match at the current position has failed. If a match starting anywhere between the next position and that character could succeed, its window would have to cover that character, so the character would have to appear in the pattern string. Since it does not, no such match exists, and we restart at current position + pattern string length + 1)
If the character does appear in the pattern string, then, as in the BM algorithm, shift = the distance from the rightmost occurrence of that character in the pattern string to the end of the pattern string + 1
(the shift of the "first comparison position" = pattern string length - rightmost position of the character (0-based) = distance from the character's rightmost position in the pattern string to the tail + 1)
(if this part is hard to follow, see the example below)
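
The shift rule above can be sketched as a small lookup. The class name SundayShiftDemo, the helper names, and the sample pattern "abca" are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class SundayShiftDemo {
    // For each character, store (pattern length - rightmost 0-based position).
    // Later occurrences overwrite earlier ones, so the rightmost position wins.
    static Map<Character, Integer> buildShiftTable(String pattern) {
        int m = pattern.length();
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m; i++) {
            shift.put(pattern.charAt(i), m - i);
        }
        return shift;
    }

    // Shift to apply when 'next' is the text character right after the current window.
    static int shiftFor(String pattern, char next) {
        return buildShiftTable(pattern).getOrDefault(next, pattern.length() + 1);
    }

    public static void main(String[] args) {
        System.out.println(shiftFor("abca", 'a')); // 1 (rightmost 'a' is the last character)
        System.out.println(shiftFor("abca", 'b')); // 3
        System.out.println(shiftFor("abca", 'z')); // 5 (not in the pattern: length + 1)
    }
}
```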

Algorithm example (the pictures come from https://blog.csdn.net/u013001763/article/details/69397504):
A is the text string and B is the pattern string. At the start, each pointer points to the first character of its string.
[Figure: initial alignment of A and B (image unavailable)]

The first characters differ (t in A versus c in B). The character after the last character of the current window is i, and i is in B. Therefore the next "first comparison position" is "pattern string length - rightmost position of the character" = 4 - 1 = 3 positions after element t; that is, the red pointer moves 3 positions from element t to s (this aligns the i in A with the i in B).
[Figure: pointer moved 3 positions, from t to s (image unavailable)]

Now s and c differ. The character after the last character of the current window is b, and b is not in B. Therefore we jump directly to the element i that follows b and use it as the next "first comparison position" (that is, advance from the position of s by pattern string length + 1 = 4 + 1 = 5 to the position of i).
[Figure: pointer moved 5 positions, from s to i (image unavailable)]

The characters still differ. The character after the last character of the current window is t, and t is in B, so the next "first comparison position" is "pattern string length - rightmost position of the character" = 4 - 2 = 2 positions after element i; that is, the red pointer moves 2 positions from element i to c.
[Figure: pointer moved 2 positions, from i to c (image unavailable)]
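
The walkthrough can be replayed in code. The strings below are hypothetical: they were picked for this sketch because they produce the same shifts (3, then 5, then 2) as the walkthrough, and they are not necessarily the strings shown in the original pictures. The class name SundayTrace is also an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SundayTrace {
    // Runs a Sunday search and records the shift taken after each failed window.
    static List<Integer> shifts(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        Map<Character, Integer> table = new HashMap<>();
        for (int i = 0; i < m; i++) {
            table.put(pattern.charAt(i), m - i);
        }

        List<Integer> taken = new ArrayList<>();
        int i = 0;
        while (i + m <= n) {
            if (text.startsWith(pattern, i)) break; // window matches, stop
            if (i + m >= n) break;                  // no character after the window
            int step = table.getOrDefault(text.charAt(i + m), m + 1);
            taken.add(step);
            i += step;
        }
        return taken;
    }

    public static void main(String[] args) {
        // Hypothetical text and pattern; the pattern has length 4 as in the walkthrough.
        System.out.println(shifts("thisisabigcity", "city")); // [3, 5, 2]
    }
}
```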

import java.util.HashMap;
import java.util.Map;

class Solution {
    public int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        if (m == 0) return 0;
        if (m > n) return -1;
        char[] haystacks = haystack.toCharArray();
        char[] needles = needle.toCharArray();
        Map<Character, Integer> shiftTable = new HashMap<>();
        for (int i = 0; i < m; i++) {
            // Shift for character c = distance from its rightmost position in the pattern string to the tail + 1
            shiftTable.put(needles[i], m - i);
        }

        int i = 0;
        // The loop ends when i + len(pattern) > len(text)
        while (i + m <= n) {
            int k = 0;
            // Compare the window with the pattern string, character by character from the start
            while (k < m && needles[k] == haystacks[i + k]) {
                k++;
            }
            // If k equals m, every character matched, so return the start index directly
            if (k == m) return i;
            else {
                // On a mismatch: if i + m < n, there is still a character after the window to inspect; otherwise no match is possible, so return -1
                // Let c = haystacks[i + m], the character right after the current window
                // If c is not in the pattern string, move i by m + 1 (pattern string length + 1)
                // If c is in the pattern string, move i by shiftTable[c]
                if (i + m < n) i += shiftTable.get(haystacks[i + m]) == null ? m + 1 : shiftTable.get(haystacks[i + m]);
                else return -1;
            }
        }

        return -1;
    }  
}

The logic of the main loop in the code above is:
Each round takes a window of the text string, the same length as the pattern string, and compares it with the pattern string:
If it matches, return the current idx (the start position of the window)
If it does not match, look at the character c in the text string immediately after the window:
If c exists in the pattern, idx = idx + offset table[c]
Otherwise, idx = idx + len(pattern) + 1
Repeat until idx + len(pattern) > len(string) (current position + pattern string length > text string length)

Method 3: Horspool algorithm. It is worth studying in detail; the algorithm is similar to Sunday.

import java.util.HashMap;
import java.util.Map;

class Solution {
    public static int strStr(String haystack, String needle) {
        if (needle.equals("")) return 0;
        if (needle.length() > haystack.length()) return -1;
        Map<Character, Integer> shiftTable = new HashMap<>();
        char[] haystacks = haystack.toCharArray();
        char[] needles = needle.toCharArray();
        int n = haystacks.length, m = needles.length;
        // Starting from the first element, record the rightmost position of each character in the pattern (its last character excluded) in a hash table
        for (int i = 0; i < needles.length - 1; i++) {
            // Key: the character; value: the shift derived from where the character appears
            // If character a is recorded at position 1 and appears again at position 3, the entry is overwritten, so the rightmost position wins
            // Shift for character c = distance from its rightmost position in the pattern string to the tail
            shiftTable.put(needles[i], needles.length - 1 - i);
        }
        // Initially, i points at the text position aligned with the end of the pattern string (0-based)
        int i = m - 1; 
        while (i < n) { 
            int k = 0; 
            while (k < m && needles[m - k - 1] == haystacks[i - k]) {
                k++;
            }
            if (k == m) return i - m + 1;
            else i += shiftTable.get(haystacks[i]) == null ? m : shiftTable.get(haystacks[i]);
        }

        return -1;
    }
}
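
The main difference from Sunday is the shift table: Horspool looks at the text character aligned with the last character of the window (not the one after it), and the last pattern character is left out of the table. A small sketch, with the class name HorspoolShiftDemo and the pattern "abca" chosen here for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolShiftDemo {
    // For every pattern character except the last, store the distance
    // from its rightmost position to the last position of the pattern.
    static Map<Character, Integer> buildShiftTable(String pattern) {
        int m = pattern.length();
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }
        return shift;
    }

    // Shift to apply when 'windowLast' is the text character under the pattern's last position.
    static int shiftFor(String pattern, char windowLast) {
        return buildShiftTable(pattern).getOrDefault(windowLast, pattern.length());
    }

    public static void main(String[] args) {
        System.out.println(shiftFor("abca", 'a')); // 3 (only the first 'a' is in the table)
        System.out.println(shiftFor("abca", 'c')); // 1
        System.out.println(shiftFor("abca", 'z')); // 4 (not in the table: full pattern length)
    }
}
```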

8. Medium difficulty: the acceptance rate of this problem is only about 20%

If the rules are hard to follow, look directly at the examples given in the problem statement, which cover all the cases.
1. Read in the string and discard useless leading spaces.
2. Check whether the next character (assuming the end of the string has not been reached) is a plus or minus sign, and read it if present. This determines whether the final result is negative or positive. If neither is present, the result is assumed to be positive.
3. Read the following characters until a non-digit character or the end of the input is reached. The rest of the string is ignored.
(Examples: ① in "123word", reading stops at the first non-digit character w, so w and everything after it are ignored; ② in "wer 123", the non-digit character comes first, so everything is ignored, no digits are read, and the result is 0.)
4. Convert the digits read in the previous steps into an integer (e.g. "123" -> 123, "0032" -> 32). If no digits were read, the integer is 0. Change the sign if necessary (from step 2).
5. If the integer is outside the 32-bit signed integer range [−2^31, 2^31 − 1], clamp it so that it stays within the range. Specifically, integers less than −2^31 become −2^31, and integers greater than 2^31 − 1 become 2^31 − 1.
6. Return the integer as the final result.
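
The steps above can be restated compactly in code. This is a minimal sketch written for illustration and is not the referenced author's solution: it accumulates the unsigned magnitude and clamps before overflow, and the class name AtoiSketch is an assumption.

```java
class AtoiSketch {
    static int myAtoi(String s) {
        int i = 0, n = s.length();

        // Step 1: skip leading spaces.
        while (i < n && s.charAt(i) == ' ') i++;

        // Step 2: read an optional sign.
        boolean negative = false;
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            negative = s.charAt(i) == '-';
            i++;
        }

        // Steps 3-5: accumulate the magnitude digit by digit, clamping before it can overflow.
        int magnitude = 0;
        while (i < n && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            int digit = s.charAt(i) - '0';
            // magnitude * 10 + digit > MAX_VALUE  <=>  magnitude > (MAX_VALUE - digit) / 10
            if (magnitude > (Integer.MAX_VALUE - digit) / 10) {
                return negative ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            }
            magnitude = magnitude * 10 + digit;
            i++;
        }

        // Step 6: apply the sign and return.
        return negative ? -magnitude : magnitude;
    }

    public static void main(String[] args) {
        System.out.println(myAtoi("42"));              // 42
        System.out.println(myAtoi("   -42"));          // -42
        System.out.println(myAtoi("123word"));         // 123
        System.out.println(myAtoi("wer 123"));         // 0
        System.out.println(myAtoi("-91283472332"));    // -2147483648
    }
}
```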

Note:
The only whitespace character in this problem is the space character ' '
Do not ignore any characters other than the leading spaces and the rest of the string after the digits
0 <= s.length <= 200
s consists of English letters (uppercase and lowercase), digits (0-9), ' ', '+', '-' and '.'
The problem asks for the string to be converted into an integer, so decimals do not need to be handled

The code and its explanation come from the following reference:

Author: Liwei 1419
Link: https://leetcode-cn.com/problems/string-to-integer-atoi/solution/jin-liang-bu-shi-yong-ku-han-shu-nai-xin-diao-shi-/

//Time complexity: O(N) space complexity: O(1)
public class Solution {

    public int myAtoi(String str) {
        int len = str.length();
        // str.charAt(i) checks the validity of the index on every call, so the string is usually converted into a character array first (toCharArray())
        char[] charArray = str.toCharArray();

        // 1. Remove leading spaces
        int index = 0;
        
        // Keep advancing while index < len and the current character is a space; this also handles several consecutive leading spaces
        while (index < len && charArray[index] == ' ') { 
            index++;
        }

        // 2. If the whole string has been traversed (extreme case: a string that contains only spaces)
        if (index == len) {
            return 0;
        }

        // 3. If a symbol character appears, only the first one is valid and the positive and negative are recorded
        int sign = 1;
        char firstChar = charArray[index];
        if (firstChar == '+') {
            index++;
        } else if (firstChar == '-') {
            index++;
            sign = -1;
        }

        // 4. Converts subsequent numeric characters
        // The long type cannot be used here; the problem statement says so
        int res = 0;
        while (index < len) {
            char currChar = charArray[index];
            // 4.1 judge the illegal situation first
            if (currChar > '9' || currChar < '0') {
                break;
            }

            // The problem says the environment can only store 32-bit signed integers, so we must check in advance whether multiplying by 10 and adding the digit would overflow
            // Integer.MAX_VALUE / 10 is the limit with its last digit dropped; the difference between that number * 10 and the limit is Integer.MAX_VALUE % 10
            // Integer.MAX_VALUE % 10 = 7, because % 10 keeps only the ones digit
            if (res > Integer.MAX_VALUE / 10 || (res == Integer.MAX_VALUE / 10 && (currChar - '0') > Integer.MAX_VALUE % 10)) {
                return Integer.MAX_VALUE;
            }
            if (res < Integer.MIN_VALUE / 10 || (res == Integer.MIN_VALUE / 10 && (currChar - '0') > -(Integer.MIN_VALUE % 10))) {
                return Integer.MIN_VALUE;
            }

            // 4.2 conversion is considered only when it is legal, and sign bits are multiplied in each step
            res = res * 10 + sign * (currChar - '0');
            index++;
        }
        return res;
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        String str = "2147483646";
        int res = solution.myAtoi(str);
        System.out.println(res);

        System.out.println(Integer.MAX_VALUE);
        System.out.println(Integer.MIN_VALUE);
    }
}

In fact, this problem does not test algorithm knowledge. It simulates the handling of raw data in everyday development (for example "parameter validation" scenarios).
Business requirements at work often look like this. If you run into them:
1. Where ready-made tools and libraries exist, use them as much as possible, because they perform better and are more reliable after more rigorous testing;
2. Whatever can be extracted into utility classes and utility methods should be extracted, to highlight the main logic and make the code easier to reuse;
3. When you have to write cumbersome, lengthy code, write clear comments and make the logical structure visible, so that troubleshooting after release and later maintenance are easier.
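
As an illustration of points 1 and 2, a parameter check in everyday code might lean on Integer.parseInt and be wrapped in a small utility method. The helper parseOrDefault and the class name ParamParsing are hypothetical. Note that this is stricter than the atoi rules of this problem: malformed or out-of-range input is rejected instead of being truncated or clamped.

```java
class ParamParsing {
    // Hypothetical utility: parse a raw parameter strictly, falling back to a default value.
    static int parseOrDefault(String raw, int fallback) {
        if (raw == null) return fallback;
        try {
            // parseInt rejects trailing garbage and values outside the int range.
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault(" 42 ", 0));       // 42
        System.out.println(parseOrDefault("123word", -1));   // -1
        System.out.println(parseOrDefault("2147483648", 0)); // 0
    }
}
```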

Several key points:
1. According to example 1, leading spaces need to be removed;
2. According to example 2, we need to check whether the first character is + or -. A variable sign can be used for this: it is initialized to 1 and set to -1 if the character is -;
3. To check whether a character is a digit, compare its ASCII value, i.e. '0' <= c <= '9';
4. According to examples 3 and 4, conversion stops and the loop exits at the first character that is not a digit;
5. According to example 5, if the converted number exceeds the range of int, it needs to be clamped. The result variable res cannot be declared as long here.
Note: since the converted input may even exceed the range of long, the out-of-range check has to be done inside the loop. Exiting the loop as soon as the value goes out of range also avoids unnecessary computation;
6. Since array index access is involved, we need to make sure throughout that the index does not go out of bounds;
7. Because the problem says that "the environment can only store 32-bit integers", each iteration needs to check whether multiplying by 10 would overflow before doing it. See the code for details.
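
The pre-multiplication check from point 7 can be isolated and tried on boundary values. This sketch mirrors the two conditions used in the code above; the class name OverflowCheckDemo and the helper names are assumptions.

```java
class OverflowCheckDemo {
    // True if res * 10 + digit would exceed Integer.MAX_VALUE (res >= 0, digit in 0..9).
    static boolean overflowsMax(int res, int digit) {
        return res > Integer.MAX_VALUE / 10
                || (res == Integer.MAX_VALUE / 10 && digit > Integer.MAX_VALUE % 10);
    }

    // True if res * 10 - digit would fall below Integer.MIN_VALUE (res <= 0, digit in 0..9).
    static boolean overflowsMin(int res, int digit) {
        return res < Integer.MIN_VALUE / 10
                || (res == Integer.MIN_VALUE / 10 && digit > -(Integer.MIN_VALUE % 10));
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE / 10); // 214748364
        System.out.println(Integer.MAX_VALUE % 10); // 7
        System.out.println(Integer.MIN_VALUE / 10); // -214748364
        System.out.println(Integer.MIN_VALUE % 10); // -8

        System.out.println(overflowsMax(214748364, 7));  // false: exactly 2147483647
        System.out.println(overflowsMax(214748364, 8));  // true
        System.out.println(overflowsMin(-214748364, 8)); // false: exactly -2147483648
        System.out.println(overflowsMin(-214748364, 9)); // true
    }
}
```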

Posted by cavemaneca on Thu, 23 Sep 2021 04:25:57 -0700