## Storage structure

1. Open addressing (the method recommended by the course instructor)

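Open addressing works like looking for a free toilet stall. Start at the slot the key hashes to, and if that slot is occupied, move on to the next one until you find an empty slot.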

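2. Separate chaining (the "zipper" method)

Each slot holds a linked list of all the keys that hash to it, much like an adjacency list in a graph.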

## String hashing

Idea: treat the string as a base-P number.

1. Treat the string as a number in base P, with each character as one digit.

2. Convert that base-P number to its ordinary (decimal) value.

3. Take the whole value mod Q.

In this way, any string can be mapped to an integer between 0 and Q−1.
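The three steps above collapse into a single loop using Horner's rule. The following is a minimal sketch that assumes an explicit modulus Q = 1e9 + 7 and P = 131, chosen only for illustration. The helper name `polyHash` is hypothetical, and the AC code later in these notes avoids the explicit mod by relying on unsigned overflow.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical helper: read s as a base-P number and reduce it mod Q.
// P = 131 and Q = 1e9 + 7 are illustrative assumptions.
long long polyHash(const string& s, long long P = 131, long long Q = 1000000007LL) {
    long long value = 0;
    for (unsigned char c : s) {
        // Horner's rule: shift one base-P digit left, then add the new digit (its ASCII code).
        // value < Q, so value * P stays far below the long long limit.
        value = (value * P + c) % Q;
    }
    return value;  // always in [0, Q - 1]
}
```

For example, `polyHash("ab")` is 97 × 131 + 98 = 12805, because 'a' and 'b' have ASCII codes 97 and 98.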

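Two rules:

1. No character may be mapped to the digit 0. Otherwise, strings that differ only in leading zero-digits collide. For example, if a maps to 0, then "a" and "aa" get the same value.

2. Assume we are lucky enough that no collisions occur. String hashing does not handle collisions at all.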
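Here is a small sketch of rule 1. It compares a character mapping that sends 'a' to 0 with one that starts at 1. The helper names (`hashWith`, `zeroBased`, `oneBased`) are hypothetical and only handle lowercase letters. The AC code later in these notes sidesteps the issue by using ASCII codes, which are never 0 for letters or digits.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical helper: base-P hash where digitOf(c) gives the digit for character c.
unsigned long long hashWith(const string& s, int (*digitOf)(char), unsigned long long P = 131) {
    unsigned long long value = 0;
    for (char c : s) value = value * P + digitOf(c);
    return value;
}

// Bad mapping: 'a' -> 0, so leading 'a's contribute nothing to the value.
int zeroBased(char c) { return c - 'a'; }

// Safe mapping: 'a' -> 1, ..., 'z' -> 26 (never 0).
int oneBased(char c) { return c - 'a' + 1; }
```

With `zeroBased`, both "a" and "aa" hash to 0, and both "b" and "ab" hash to 1. With `oneBased`, "a" and "aa" give 1 and 132.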

With P = 131 or P = 13331 and Q = 2^64, collisions almost never happen in practice.
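Choosing Q = 2^64 means the mod never has to be written. C++ defines unsigned arithmetic to wrap modulo 2^w, where w is the type's bit width, so storing hashes in `unsigned long long` (64 bits on common platforms) reduces them mod 2^64 automatically. The sketch below checks the same idea at 32 bits, where overflow is easy to compare against an explicit mask. All function names are hypothetical.

```cpp
#include <bits/stdc++.h>
using namespace std;

const uint32_t P = 131;

// Overflow version: uint32_t arithmetic wraps, i.e. it is reduced mod 2^32 automatically.
uint32_t hashWrap32(const string& s) {
    uint32_t v = 0;
    for (unsigned char c : s) v = v * P + c;
    return v;
}

// Explicit version: compute in 64 bits and reduce mod 2^32 by masking after each step.
uint64_t hashMasked32(const string& s) {
    uint64_t v = 0;
    for (unsigned char c : s) v = (v * P + c) & 0xFFFFFFFFULL;
    return v;
}

// The version used in practice: the mod 2^64 step is free.
unsigned long long hashWrap64(const string& s) {
    unsigned long long v = 0;
    for (unsigned char c : s) v = v * 131 + c;
    return v;
}
```

Both 32-bit versions agree even on long strings that overflow many times. This is the same reason the 64-bit version needs no `% Q`.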

Apart from finding the period (repeating unit) of a string, most problems that KMP can solve can also be solved with string hashing.
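As an example of hashing standing in for KMP, here is a sketch of pattern matching by comparing substring hashes. It assumes P = 131 and Q = 2^64 (unsigned overflow), as recommended above. Like any pure hash comparison, it assumes no collisions. The function name `findAll` is hypothetical.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ULL;

// Hypothetical helper: every 0-based index where `pattern` occurs in `text`,
// found by comparing substring hashes instead of running KMP.
vector<int> findAll(const string& text, const string& pattern) {
    const ULL P = 131;
    int n = text.size(), m = pattern.size();
    vector<int> result;
    if (m == 0 || m > n) return result;

    // h[i]: hash of text[0..i-1]; p[i]: P^i (both mod 2^64 via overflow)
    vector<ULL> h(n + 1, 0), p(n + 1, 1);
    for (int i = 1; i <= n; i++) {
        h[i] = h[i - 1] * P + (unsigned char)text[i - 1];
        p[i] = p[i - 1] * P;
    }

    ULL target = 0;  // hash of the whole pattern
    for (unsigned char c : pattern) target = target * P + c;

    // Slide a window of length m; [l, r] is 1-indexed in prefix-hash terms.
    for (int l = 1; l + m - 1 <= n; l++) {
        int r = l + m - 1;
        if (h[r] - h[l - 1] * p[r - l + 1] == target) result.push_back(l - 1);
    }
    return result;
}
```

For example, `findAll("ababa", "aba")` returns `{0, 2}`, including the overlapping match. Each window comparison is O(1) after O(n) preprocessing.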

## Common hash table operations

Algorithm problems usually need only two operations: insert and find.

If deletion is required, don't actually remove the element. Instead, keep a boolean flag that marks it as deleted.
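Here is a minimal sketch of this lazy deletion on top of open addressing. The table layout mirrors the AcWing 840 open-addressing solution in these notes, while the flag array and all function names (`init`, `slotOf`, `insertValue`, `contains`, `eraseValue`) are hypothetical. For simplicity, a deleted slot is never reused for a different value. Emptying the slot instead would be wrong, because it would cut off probe chains that pass through it.

```cpp
#include <bits/stdc++.h>
using namespace std;

const int N = 200003;              // prime table size
const int kEmpty = 0x3f3f3f3f;     // sentinel for a never-used slot
int h[N];
bool deleted_[N];                  // hypothetical flag: true = value in this slot is "deleted"

void init() {
    memset(h, 0x3f, sizeof h);     // every int becomes 0x3f3f3f3f
    memset(deleted_, 0, sizeof deleted_);
}

// Slot that holds x, or the empty slot where x would be placed (linear probing).
int slotOf(int x) {
    int t = (x % N + N) % N;
    while (h[t] != kEmpty && h[t] != x) {
        if (++t == N) t = 0;
    }
    return t;
}

void insertValue(int x) {
    int t = slotOf(x);
    h[t] = x;
    deleted_[t] = false;           // re-inserting revives a deleted value
}

bool contains(int x) {
    int t = slotOf(x);
    return h[t] == x && !deleted_[t];
}

void eraseValue(int x) {
    int t = slotOf(x);
    if (h[t] == x) deleted_[t] = true;  // keep the value so probe chains stay intact
}
```

If 5 and 5 + N collide, erasing 5 still leaves 5 + N reachable. This works because the probe walks past the marked slot instead of stopping at a hole.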

## Hash function

The usual choice is to take the key modulo the table size. The modulus is normally a prime that is as far as possible from any power of 2. In practice this spreads keys more evenly across the slots and tends to reduce collisions.
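The sketch below shows two pieces of this recipe. `nextPrime` is a hypothetical helper that finds the smallest prime at or above a target table size. `hashMod` applies the division method so that negative keys also land in [0, M).

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical helper: smallest prime >= n, by trial division
// (fast enough for table sizes around 1e5 to 1e6).
int nextPrime(int n) {
    auto isPrime = [](int v) {
        if (v < 2) return false;
        for (int d = 2; (long long)d * d <= v; d++)
            if (v % d == 0) return false;
        return true;
    };
    while (!isPrime(n)) n++;
    return n;
}

// Division-method hash that maps any int, including negatives, into [0, M).
int hashMod(int x, int M) {
    // In C++, x % M is negative when x is negative, so shift it back into range.
    return (x % M + M) % M;
}
```

For targets of 100000 and 200000, `nextPrime` returns 100003 and 200003, the table sizes used in the AcWing 840 solutions in these notes.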

## Collision handling

Separate chaining (zipper method): use a one-dimensional array of slots indexed by hash value, and hang a linked list off each slot to hold every number that hashes to it.

AcWing 840. Simulated Hash Table

Maintain a set that supports the following operations:

"I x": insert a number x;

"Q x": query whether the number x has appeared in the set.

N operations will be performed. Output the result of each query operation.

Input format

The first line contains the integer N, which represents the number of operations.

The next N lines each contain one operation, either "I x" or "Q x".

Output format

For each query instruction "Q x", output a query result. If x has appeared in the set, output "Yes", otherwise output "No".

One line for each result.

Data range

1 ≤ N ≤ 10^5

−10^9 ≤ x ≤ 10^9

Input example:

```
5
I 1
I 2
I 3
Q 2
Q 5
```

Output example:

```
Yes
No
```

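Separate chaining (zipper method):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Number of slots: a prime; also enough nodes for up to 1e5 insertions
const int N = 100003;

// h[k]: head node of slot k's chain (-1 = empty)
// e[i]: value stored in node i; ne[i]: next node in the same chain
int h[N], e[N], ne[N], idx;

void myinsert(int x) {
    int k = (x % N + N) % N;  // map negative x into [0, N)
    e[idx] = x;
    ne[idx] = h[k];           // insert the new node at the head of slot k's chain
    h[k] = idx++;
}

bool myfind(int x) {
    int k = (x % N + N) % N;
    for (int i = h[k]; i != -1; i = ne[i])
        if (e[i] == x) return true;
    return false;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    string s;
    memset(h, -1, sizeof h);  // every chain starts empty
    cin >> n;
    while (n--) {
        cin >> s >> x;
        if (s == "I") myinsert(x);
        else if (s == "Q") cout << (myfind(x) ? "Yes" : "No") << '\n';
    }
    return 0;
}
```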

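Open addressing (linear probing):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Table size: a prime about twice the number of insertions, so probe runs stay short
const int N = 200003;
// Empty-slot sentinel; 0x3f3f3f3f > 1e9, so it never equals a valid x
const int null = 0x3f3f3f3f;
int h[N];

// Returns the slot holding x if x is present, otherwise the empty slot where x should go
int myfind(int x) {
    int t = (x % N + N) % N;
    while (h[t] != null && h[t] != x) {
        t++;                  // slot taken by another value: try the next one
        if (t == N) t = 0;    // wrap around to the start of the table
    }
    return t;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, x;
    string s;
    memset(h, 0x3f, sizeof h);  // byte-wise fill: every int becomes 0x3f3f3f3f
    cin >> n;
    while (n--) {
        cin >> s >> x;
        if (s == "I") h[myfind(x)] = x;
        else if (s == "Q") cout << (h[myfind(x)] == null ? "No" : "Yes") << '\n';
    }
    return 0;
}
```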

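Note: memset fills memory byte by byte, so the only int values it sets safely are 0 and -1 (all bytes 0x00 or 0xff). Filling with 0x3f gives 0x3f3f3f3f in every int. That value works here as a large "empty" sentinel because it exceeds 10^9.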
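The quick check below shows what memset actually writes into an int. It assumes a 4-byte, two's-complement `int`, as on common platforms, and `memsetValue` is a hypothetical helper.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Hypothetical helper: memset a small int array and return one element.
int memsetValue(int byte) {
    int a[4];
    memset(a, byte, sizeof a);  // sets every BYTE to (unsigned char)byte, not every int
    return a[0];
}
```

`memsetValue(0)` and `memsetValue(-1)` return 0 and -1 as expected. However, `memsetValue(1)` returns 0x01010101 (16843009) rather than 1, and `memsetValue(0x3f)` returns 0x3f3f3f3f.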

AcWing 841. String Hashing

Problem description

Given a string of length n and m queries, where each query contains four integers l1, r1, l2, r2, determine whether the substrings [l1, r1] and [l2, r2] are exactly the same.

The string contains only upper and lower case English letters and numbers.

Input format

The first line contains integers n and m, indicating the length of the string and the number of queries.

The second line contains a string of length n, which contains only upper and lower case English letters and numbers.

The next m lines each contain four integers l1, r1, l2, r2, representing the two intervals involved in a query.

Note that string positions start from 1.

Output format

For each query, output one result: "Yes" if the two substrings are exactly the same, otherwise "No".

One line for each result.

Data range

1 ≤ n, m ≤ 10^5

Input example:

```
8 3
aabbaabb
1 3 5 7
1 3 6 8
1 2 1 2
```

Output example:

```
Yes
No
Yes
```

AC code

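```cpp
#include <bits/stdc++.h>
using namespace std;

typedef unsigned long long ULL;
const int N = 100010, P = 131;

// h[i]: hash of prefix s[1..i]; p[i]: P^i (both implicitly mod 2^64).
// They must be ULL: with int, overflow would be signed overflow (undefined behavior).
ULL h[N], p[N];

// Hash of substring [l, r] (1-indexed)
ULL query(int l, int r) {
    return h[r] - h[l - 1] * p[r - l + 1];
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, m, l1, r1, l2, r2;
    string s;  // read into std::string; cin >> (char*) was removed in C++20
    cin >> n >> m >> s;
    h[0] = 0, p[0] = 1;
    for (int i = 1; i <= n; i++) {
        h[i] = h[i - 1] * P + s[i - 1];  // s is 0-indexed, h is 1-indexed
        p[i] = p[i - 1] * P;
    }
    while (m--) {
        cin >> l1 >> r1 >> l2 >> r2;
        cout << (query(l1, r1) == query(l2, r2) ? "Yes" : "No") << '\n';
    }
    return 0;
}
```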

Note: p[i] stores P^i, and h[i] stores the hash of the prefix made of characters 1 through i.

Storing h and p as unsigned long long lets every multiplication and addition wrap around modulo 2^64. This performs the mod Q step with Q = 2^64 at no cost. Plain int would not work: signed overflow is undefined behavior in C++.

Formula:

For a string $X_1 X_2 X_3 \cdots X_{n-1} X_n$, the hash value is computed by multiplying the ASCII code of each character by a power of P:

$$\text{hash}(X_1 X_2 \cdots X_n) = \left(X_1 \times P^{n-1} + X_2 \times P^{n-2} + \cdots + X_{n-1} \times P^{1} + X_n \times P^{0}\right) \bmod Q$$

For the string x1x2x3, using the prefix recurrence h[i] = h[i-1] * P + xi:

h[1] = x1

h[2] = h[1] * P + x2 = x1 * P + x2

h[3] = h[2] * P + x3 = x1 * P^2 + x2 * P + x3

Now we want the hash value of the substring x2x3, which is x2 * P + x3. It differs from h[3] by exactly x1 * P^2, and x1 = h[1]. So we multiply h[1] by P^2 and subtract the result from h[3]. Here the index 1 is l - 1 = 2 - 1, and the exponent 2 is r - l + 1 = 3 - 2 + 1 (the substring length).

Therefore, the hash value of the substring from position 2 to position 3 is: h[3] - h[2-1] * p[3-2+1]

In general, the hash value of the substring from position l to position r is: h[r] - h[l-1] * p[r-l+1]
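The derivation can be checked mechanically. The sketch below compares the prefix-hash formula against a hash computed directly from the substring. It assumes P = 131 and mod 2^64 via unsigned overflow, as in the AC code, and both helper names are hypothetical. Wraparound keeps the subtraction correct mod 2^64 even when `h[l-1] * p[r-l+1]` exceeds `h[r]` numerically.

```cpp
#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ULL;

const ULL P = 131;

// Direct hash of s[l..r] (1-indexed, inclusive), computed from scratch.
ULL directHash(const string& s, int l, int r) {
    ULL v = 0;
    for (int i = l; i <= r; i++) v = v * P + (unsigned char)s[i - 1];
    return v;
}

// Same value via prefix hashes: h[r] - h[l-1] * P^(r-l+1).
ULL prefixHash(const string& s, int l, int r) {
    int n = s.size();
    vector<ULL> h(n + 1, 0), p(n + 1, 1);
    for (int i = 1; i <= n; i++) {
        h[i] = h[i - 1] * P + (unsigned char)s[i - 1];
        p[i] = p[i - 1] * P;
    }
    return h[r] - h[l - 1] * p[r - l + 1];
}
```

On the sample string "aabbaabb", both functions agree for every interval. `prefixHash(s, 1, 3) == prefixHash(s, 5, 7)` holds ("aab" vs "aab"), while [1, 3] and [6, 8] differ ("aab" vs "abb"), matching the expected Yes/No output.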