Hash table: General hash, string hash (with examples)

Keywords: data structure

storage structure

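1. Open addressing method (generally recommended by y)
The principle of open addressing is similar to looking for a free toilet stall: if the slot you hash to is occupied, move on to the next one until you find an empty slot.
2. Zipper method (separate chaining)
Similar to an adjacency list: each slot holds a linked list of the elements that hash to it.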

String hash mode

p-ary
1. Treat the string as a number written in base p
2. Convert that base-p number to an ordinary base-10 integer
3. Take the whole number mod Q
In this way, any string can be mapped to a number between 0 and Q-1
Two principles
1. No character may be mapped to 0 (otherwise strings such as "A" and "AA" would get the same hash value)
2. Assume luck is good enough that no conflicts occur; collisions are not handled
Empirical values: p = 131 or 13331,
with Q taken as 2^64.
With these values there is no conflict in most cases.
Apart from finding the loop section (period) of a string, string hashing can solve most of the problems that KMP can.
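The three steps above can be sketched as a single loop. This is a minimal sketch, not code from the original post: the name poly_hash is hypothetical, P = 131 is one of the values suggested above, and Q = 2^64 is obtained implicitly because 64-bit unsigned arithmetic wraps around on overflow.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: reads s as a base-P number and reduces it mod 2^64.
// The reduction is implicit: std::uint64_t arithmetic wraps on overflow.
std::uint64_t poly_hash(const std::string& s, std::uint64_t P = 131) {
    std::uint64_t h = 0;
    for (unsigned char c : s)   // most significant "digit" first
        h = h * P + c;          // character codes of letters and digits are never 0
    return h;
}
```

For example, poly_hash("ab") is 97 * 131 + 98, since 'a' and 'b' have ASCII codes 97 and 98.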

Common operations of hash table

Algorithm problems usually test only two operations: insert and find.
If you have to support deletion, don't physically remove the element; keep a boolean variable for it and mark it as deleted.
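The boolean-mark idea can be sketched for an open-addressing table as follows. This is an illustrative sketch under assumptions, not the author's code: the struct name LazySet, its member names, and the capacity 200003 are all hypothetical, and the table is assumed never to become full.

```cpp
#include <vector>

// Hypothetical open-addressing set of ints with lazy deletion.
struct LazySet {
    static constexpr int N = 200003;   // assumed prime capacity, larger than the element count
    std::vector<int> key;
    std::vector<char> used, removed;   // used: slot was ever filled; removed: logically deleted

    LazySet() : key(N), used(N, 0), removed(N, 0) {}

    // Slot that holds x, or the first never-used slot on x's probe path.
    int find(int x) const {
        int t = (x % N + N) % N;
        while (used[t] && key[t] != x)
            if (++t == N) t = 0;
        return t;
    }
    void insert(int x) { int t = find(x); key[t] = x; used[t] = 1; removed[t] = 0; }
    // Only flag the slot: clearing it would cut the probe path of later elements.
    void erase(int x) { int t = find(x); if (used[t]) removed[t] = 1; }
    bool contains(int x) const { int t = find(x); return used[t] && !removed[t]; }
};
```

Because a deleted slot stays occupied, elements that were inserted after it and probed past it can still be found.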

hash function

Generally, we take the value modulo the table size directly. The modulus should be a prime number, as far away as possible from any integer power of 2; it can be shown that this keeps the probability of conflicts lowest. Since x may be negative and the % operator in C++ can return a negative result, the slot is computed as (x % N + N) % N.
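One way to pick such a modulus is to search upward from the maximum number of elements until a prime is found. The sketch below is only an illustration: the names next_prime and slot are hypothetical, and trial division is assumed to be fast enough for table sizes of this order.

```cpp
// Hypothetical helper: smallest prime >= n, found by trial division.
int next_prime(int n) {
    if (n < 2) n = 2;
    for (int i = n;; i++) {
        bool is_prime = true;
        for (int j = 2; (long long)j * j <= i; j++)
            if (i % j == 0) { is_prime = false; break; }
        if (is_prime) return i;
    }
}

// Maps any int (including negatives) to a slot in [0, mod).
int slot(int x, int mod) {
    return (x % mod + mod) % mod;
}
```

Called as next_prime(100000), this yields 100003, the table size used in the zipper-method code.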

conflict

Zipper method: open a one-dimensional array with one slot per hash value, and hang a chain (linked list) off each slot to store all the numbers that conflict in that slot.

AcWing 840. Simulating a hash table
Maintain a collection and support the following operations:
"I x", insert a number x;
"Q x", query whether the number x has appeared in the set;
Now, N operations will be performed, and the corresponding results will be output for each query operation.

Input format
The first line contains the integer N, which represents the number of operations.

Next N lines, each line contains an operation instruction, which is one of "I x" and "Q x".

Output format
For each query instruction "Q x", output a query result. If x has appeared in the set, output "Yes", otherwise output "No".

One line for each result.

Data range
1 ≤ N ≤ 10^5
−10^9 ≤ x ≤ 10^9
Input example:
5
I 1
I 2
I 3
Q 2
Q 5
Output example:
Yes
No

Zipper method

#include <bits/stdc++.h>
using namespace std;
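const int N = 100003;   // table size: the smallest prime above 10^5
// h[k]: index of the first node in the chain of slot k (-1 if the chain is empty)
// e[i]: value stored in node i; ne[i]: index of the next node; idx: next unused node
int h[N], ne[N], e[N], idx;
void myinsert(int x)
{
    int k = (x % N + N) % N;   // "+ N" keeps the slot non-negative when x is negative
    e[idx] = x;
    ne[idx] = h[k];            // insert the new node at the head of the chain
    h[k] = idx++;
}
bool myfind(int x)
{
    int k = (x % N + N) % N;
    for(int i = h[k]; i != -1; i = ne[i])   // walk the chain of slot k
    {
        if(e[i] == x)
            return true;
    }
    return false;
}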
int main()
{
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    string s;
    memset(h, -1, sizeof h);
    cin >> n;
    while(n--)
    {
        cin >> s >> x;
        if(s == "I")
            myinsert(x);
        if(s == "Q")
        {
            if(myfind(x))
                cout << "Yes" << '\n';
            else
                cout << "No" << '\n';
        }
    }
    return 0;
}

open addressing

#include <bits/stdc++.h>
using namespace std;
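const int N = 200003;          // prime, about twice the maximum number of elements
const int null = 0x3f3f3f3f;   // marks an empty slot; larger than 10^9, so it is never a valid x
int h[N];
// Returns the slot that holds x, or, if x is absent, the empty slot where x should go.
int myfind(int x)
{
    int t = (x % N + N) % N;
    while(h[t] != null && h[t] != x)   // slot taken by another value: try the next one
    {
        t++;
        if(t == N)                     // wrap around to the start of the table
            t = 0;
    }
    return t;
}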
int main()
{
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    string s;
    memset(h, 0x3f, sizeof h);
    cin >> n;
    while(n--)
    {
        cin >> s >> x;
        if(s == "I")
        {
            h[myfind(x)] = x;
        }
        if(s == "Q")
        {
            if(h[myfind(x)] == null)
                cout << "No" << '\n';
            else
                cout << "Yes" << '\n';
        }
    }
    return 0;
}

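Note: memset fills memory byte by byte, so in general the only safe fill values for an int array are 0 and -1 (every byte is 0x00 or 0xff). The value 0x3f also works here because all four bytes of the sentinel 0x3f3f3f3f are equal to 0x3f.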
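The byte-wise behaviour is easy to check on a small array. In this sketch the function name fill_value is hypothetical, and int is assumed to be 32 bits wide, as it is on the usual judge platforms.

```cpp
#include <cstring>

// Hypothetical demo: fill an int array with memset and return one element.
int fill_value(int byte) {
    int a[4];
    std::memset(a, byte, sizeof a);   // every BYTE of a is set to (unsigned char)byte
    return a[0];
}
```

With 32-bit int, fill_value(0) is 0, fill_value(-1) is -1, and fill_value(0x3f) is 0x3f3f3f3f, but fill_value(1) is 0x01010101 (16843009), not 1.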

AcWing 841. String hash
Problem description
Given a string of length n and m queries, each query contains four integers l1, r1, l2, r2. Please judge whether the substrings in the intervals [l1, r1] and [l2, r2] are exactly the same.

The string contains only upper and lower case English letters and numbers.

Input format
The first line contains integers n and m, indicating the length of the string and the number of queries.

The second line contains a string of length n, which contains only upper and lower case English letters and numbers.

The next m lines each contain four integers l1, r1, l2, r2, representing the two intervals involved in a query.

Note that the position of the string starts with 1.

Output format
One result is output for each query. If the two string substrings are exactly the same, "Yes" is output, otherwise, "No" is output.

One line for each result.

Data range
1 ≤ n, m ≤ 10^5
Input example:
8 3
aabbaabb
1 3 5 7
1 3 6 8
1 2 1 2
Output example:
Yes
No
Yes

AC code

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ULL;
const int N = 100010, P = 131;
// h[i]: hash of the prefix x[1..i]; p[i]: P^i.
// Both must be ULL so that overflow wraps around, i.e. works mod 2^64.
ULL h[N], p[N];
string x;
// Hash value of the substring x[l..r]
ULL query(int l, int r)
{
    return h[r] - h[l - 1] * p[r - l + 1];
}
int main()
{
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, m, l1, r1, l2, r2;
    h[0] = 0, p[0] = 1;
    cin >> n >> m;
    cin >> x;
    x = " " + x;   // pad so that the characters sit at x[1..n]
    for(int i = 1; i <= n; i++)
    {
        h[i] = h[i - 1] * P + x[i];
        p[i] = p[i - 1] * P;
    }
    while(m--)
    {
        cin >> l1 >> r1 >> l2 >> r2;
        if(query(l1, r1) == query(l2, r2))
            cout << "Yes" << '\n';
        else
            cout << "No" << '\n';
    }
    return 0;
}

Note: p[i] is P to the power i, and h[i] is the hash value of the string from position 1 to i, that is, the prefix hash of the string.
Storing the h and p arrays as unsigned long long saves the explicit mod Q step: unsigned overflow wraps around automatically, which is the same as taking the result mod 2^64.
Formula:
For a string of the form X1 X2 X3 ... Xn-1 Xn, the hash value is calculated by multiplying the ASCII code of each character by the corresponding power of P and adding the terms up.
Mapping formula: (X1 × P^(n-1) + X2 × P^(n-2) + ... + Xn-1 × P^1 + Xn × P^0) mod Q

For the string x1x2x3:
h[1] = x1
h[2] = h[1] * P + x2 = x1 * P + x2
h[3] = h[2] * P + x3 = x1 * P^2 + x2 * P + x3
Now we want the hash value of x2x3, which is x2 * P + x3. This expression differs from h[3] only by the term x1 * P^2, and that term is h[1] multiplied by P^2. Here the exponent is 2 = 3 - 2 + 1 (that is, r - l + 1) and the index is 1 = 2 - 1 (that is, l - 1).
Therefore, the string hash value from 2 to 3 is: h[3] - h[2-1] * p[3-2+1]

In general, the string hash value from l to r is: h[r] - h[l-1] * p[r-l+1]
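As a sanity check, the general formula can be compared with a hash computed directly from the substring. The sketch below is an illustration only: PrefixHash and direct_hash are hypothetical names, P = 131 and Q = 2^64 follow the values used above, and positions are 1-indexed as in the problem.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: prefix hashes of s in base P, implicitly mod 2^64.
struct PrefixHash {
    std::vector<std::uint64_t> h, p;   // h[i]: hash of s[1..i]; p[i]: P^i

    explicit PrefixHash(const std::string& s, std::uint64_t P = 131)
        : h(s.size() + 1, 0), p(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); i++) {
            h[i] = h[i - 1] * P + static_cast<unsigned char>(s[i - 1]);
            p[i] = p[i - 1] * P;
        }
    }

    // Hash of the 1-indexed substring [l, r]: h[r] - h[l-1] * p[r-l+1].
    std::uint64_t query(int l, int r) const {
        assert(1 <= l && l <= r && r < static_cast<int>(h.size()));
        return h[r] - h[l - 1] * p[r - l + 1];
    }
};

// Direct definition, for comparison: t read as a base-P number.
std::uint64_t direct_hash(const std::string& t, std::uint64_t P = 131) {
    std::uint64_t v = 0;
    for (unsigned char c : t) v = v * P + c;
    return v;
}
```

For the sample string "aabbaabb", query(2, 3) equals direct_hash("ab"), and query(1, 3) equals query(5, 7) because both substrings are "aab".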

Posted by ChibiGuy on Fri, 12 Nov 2021 19:15:23 -0800