LeetCode 438. Find All Anagrams in a String

Keywords: C++

Description

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Strings consist of lowercase English letters only, and the length of both strings s and p will not be larger than 20,100.

The order of output does not matter.

Example 1:

Input:
s: "cbaebabacd" p: "abc"

Output:
[0, 6]

Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input:
s: "abab" p: "ab"

Output:
[0, 1, 2]

Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

My solution

  • The simplest idea is to build a letter-count map for every window of s (positions 1 to psize, 2 to psize + 1, 3 to psize + 2, ...) and compare it with the map of p each time.
  • Since each slide brings in exactly one new character and drops exactly one old character, the algorithm can be simplified to update only the two map entries that change.
  • Comparing every entry of the two maps at each step is also redundant, because only the changed entries can affect the result; letters on which s and p already agree can be ignored. My approach is to keep a map of the outstanding differences (named dif). When dif is empty, the window and p have identical counts, so res.push_back is all that is needed; when dif is not empty, the window simply slides on. This avoids the O(psize) cost of comparing two maps at every step.
  • In short, the basic idea is a sliding window over an unordered_map. The code can certainly be optimized further.

class Solution {
public:
    vector<int> findAnagrams(string s, string p) {
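        vector<int> res;
        int ssize = s.size();
        int psize = p.size();
        // No window of length psize fits; without this check s[i] below reads out of range.
        if (ssize < psize) return res;
        // mp holds minus the count of each letter in p.
        unordered_map<char, int> mp;
        for (int i = 0; i < psize; ++i) --mp[p[i]];
        // dif = (counts in the current window) - (counts in p); zero entries are erased.
        unordered_map<char, int> dif = mp;
        for (int i = 0; i < psize; ++i) {
            if (++dif[s[i]] == 0) dif.erase(s[i]);
        }
        if (dif.empty()) res.push_back(0);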
        for (int i = psize; i < ssize; ++i) {
            if (++dif[s[i]] == 0) dif.erase(s[i]);
            if (--dif[s[i - psize]] == 0) dif.erase(s[i - psize]);
            if (dif.empty()) res.push_back(i - psize + 1);
        }
        return res;
    }
};
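
As a cross-check, the naive approach from the first bullet can be sketched roughly as follows. The name findAnagramsNaive is a made-up helper, not part of the solution above, and the sketch assumes lowercase a-z input as the problem states. It recounts every window from scratch, so it costs about O(ssize * psize), but it is handy for testing the faster version against.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical baseline: recount each window of s and compare with p.
// Assumes both strings contain only 'a'..'z'.
std::vector<int> findAnagramsNaive(const std::string& s, const std::string& p) {
    std::vector<int> res;
    if (s.size() < p.size()) return res;  // no window fits

    // Letter counts of the pattern.
    std::array<int, 26> want{};
    for (char c : p) ++want[c - 'a'];

    for (std::size_t start = 0; start + p.size() <= s.size(); ++start) {
        // Letter counts of the window s[start, start + p.size()).
        std::array<int, 26> have{};
        for (std::size_t k = 0; k < p.size(); ++k) ++have[s[start + k] - 'a'];
        if (have == want) res.push_back(static_cast<int>(start));
    }
    return res;
}
```

Calling findAnagramsNaive("cbaebabacd", "abc") should give the same indices as Example 1, and findAnagramsNaive("abab", "ab") the same as Example 2.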

Discuss

The following code is basically the same as my idea, but because the input is limited to lowercase letters, it stores the counts directly in 26-element vectors (a special case that only works for such a small, fixed alphabet). I would expect the vector version to cost more time, especially in the pv == sv step, which compares all 26 entries every time the window moves.

class Solution {
public:
    vector<int> findAnagrams(string s, string p) {
        vector<int> pv(26,0), sv(26,0), res;
        if(s.size() < p.size())
           return res;
        // fill pv, vector of counters for pattern string and sv, vector of counters for the sliding window
        for(int i = 0; i < p.size(); ++i)
        {
            ++pv[p[i]-'a'];
            ++sv[s[i]-'a'];
        }
        if(pv == sv)
           res.push_back(0);

        //here window is moving from left to right across the string. 
        //window size is p.size(), so s.size()-p.size() moves are made 
        for(int i = p.size(); i < s.size(); ++i) 
        {
            // window extends one step to the right. counter for s[i] is incremented
            ++sv[s[i]-'a'];
            
            // since we added one element to the right, 
            // one element to the left should be forgotten. 
            //counter for s[i-p.size()] is decremented
            --sv[s[i-p.size()]-'a']; 

            // if after move to the right the anagram can be composed, 
            // add new position of window's left point to the result 
            if(pv == sv)  
               res.push_back(i-p.size()+1);
        }
        return res;
    }
};
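
One possible way to combine the two ideas, sketched under my own assumptions: keep the fixed 26-slot array, but track how many letters currently disagree so that no full comparison is needed per step. The names findAnagramsCounted, diff, mismatched and bump are all invented for this illustration, and lowercase a-z input is assumed.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: array of differences plus a mismatch counter.
// Assumes both strings contain only 'a'..'z'.
std::vector<int> findAnagramsCounted(const std::string& s, const std::string& p) {
    std::vector<int> res;
    const std::size_t n = s.size(), m = p.size();
    if (n < m) return res;  // no window fits

    // diff[c] = (count of c in the window) - (count of c in p).
    std::array<int, 26> diff{};
    for (char c : p) --diff[c - 'a'];
    for (std::size_t i = 0; i < m; ++i) ++diff[s[i] - 'a'];

    // Number of letters whose counts currently disagree.
    int mismatched = 0;
    for (int d : diff) {
        if (d != 0) ++mismatched;
    }
    if (mismatched == 0) res.push_back(0);

    // Apply delta to one letter and keep mismatched in sync.
    auto bump = [&](char c, int delta) {
        int& d = diff[c - 'a'];
        if (d == 0) ++mismatched;  // was balanced, is about to change
        d += delta;
        if (d == 0) --mismatched;  // became balanced
    };

    for (std::size_t i = m; i < n; ++i) {
        bump(s[i], +1);      // character entering on the right
        bump(s[i - m], -1);  // character leaving on the left
        if (mismatched == 0) res.push_back(static_cast<int>(i - m + 1));
    }
    return res;
}
```

Here mismatched == 0 plays the same role as dif.empty() in my solution, while each slide touches only two array slots instead of comparing two whole vectors. Whether this is actually faster in practice would need measuring.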

Epilogue
Why do I think my way is excellent? Is it an illusion?

Posted by Digital Wallfare on Thu, 14 Feb 2019 22:21:18 -0800