Mo team (offline processing interval query)

It was a long time ago that I first heard about the Mo team's algorithm. I always thought it was a difficult algorithm, so I never tried to learn from it. Until I saw Mo team today, I knew that Mo team is actually an optimization of solving interval query problems by using block thinking.

Next, I will analyze the optimization process of Mo team step by step. Let me introduce an example first:

How should we solve this problem? At first, it must be a violent practice. We can initially define an L, R and calculate the answer of the interval [l, R]. We might as well assume that the next question is [l - p] , r - Q], then we can move l left, add the contribution of l every time we move left, move r left, and subtract the contribution of R every time we move left. In this way, we can get the interval [l - p] at the end , r - Answer to q]. However, there will be a problem: if the distance between two queries is large, the complexity is very high, such as (1, 10), (1000010008), (2, 15), (1005010100)...... for such an interval, l moves from 1 to 10000, then to 2, and then to 10050. Each span is very large, so the complexity is very high.

At this time, some students will think of such a solution, which is to sort from small to large according to l as the first priority and r as the second priority, and then do it according to the previous idea of violence.

This is indeed a good optimization idea, but there are still data that can block this approach, such as (1, 15), (100200), (2, 50), (15000), (24000), (33000). If we sort from small to large according to l as the first priority and r as the second priority of each query, the queries after sorting are (1, 15), (15000), (2, 50), (24000), (32000), (1005000). In this case, the moving order of L is from 1 - > 2 - > 3 - > 100, l is 99 times in total, and the moving order of R is 15 - > 5000 - > 50 - > 4000 - > 2000 - > 5000, so r is 4985 + 4950 + 3950 + 2000 + 3000 = 18885 times in total. Obviously, R moves too many times. If we ask in the following order:

(1, 15), (2, 50), (32000), (24000), (15000), (1005000). In this way, the moving order of l is 1 - > 2 - > 3 - > 2 - > 1 - > 100, the moving times are 103, the moving order of r is 15 - > 50 - > 2000 - > 4000 - > 5000 - > 5000, and the moving times of r is 4985. In this way, the moving times of the following method is obviously less than that of the above method. Then the following method is optimized by using the idea of Mo team.

What is the idea of Mo team, that is, first divide the left boundary of the query into blocks, and then sort according to the block where the left boundary is located as the first priority and the right boundary as the second priority. Then solve it according to the idea of violence. How to divide it? In fact, according to the mathematical derivation, when the size of the block is the square of the total number, the complexity of the algorithm is the lowest, and the complexity is o (n^1.5). Let me write it briefly:

Suppose we divide the whole interval into x blocks, then each block has n/x, l moves up to n/x times each time on average, and r moves up to N times each time. For m times, the query complexity is max (n*m/x,m*x). Obviously, when the two are equal, this formula obtains the minimum value, that is, x=sqrt (n)

Let's talk about the nature of the problems that can be solved by Mo team:

The answer o(1) of [l, R] can be used to calculate the answers of [L - 1, R], [L + 1, R], [l, R - 1], [l, R + 1].

The following is the template: (take Luogu P2709 as an example)

#include<cstdio>
#include<cstring>
#include<algorithm>
#include<iostream>
#include<queue>
#include<cmath>
using namespace std;
typedef long long ll;
const int N=1e5+10;
ll pn,n,m,k,ans;
//pn is the size of the block 
struct node{
	ll id,l,r,pl;
}p[N];
bool cmp(node a,node b)
{
	//The block where the left boundary is located is the first priority, and the size of the right boundary is the second priority
	if(a.pl!=b.pl) return a.pl<b.pl;
	return a.r<b.r;
}
ll cnt[N],sum[N],a[N];
//For different topics, just change the add and sub functions 
void add(ll x) 
{
	ans-=cnt[x]*cnt[x];
	cnt[x]++;
	ans+=cnt[x]*cnt[x];
}
void sub(ll x)
{
	ans-=cnt[x]*cnt[x];
	cnt[x]--;
	ans+=cnt[x]*cnt[x];
}
int main()
{
	scanf("%lld%lld%lld",&n,&m,&k);
	pn=sqrt(n);
	for(int i=1;i<=n;i++)
		scanf("%lld",&a[i]);
	for(int i=1;i<=m;i++)
	{
		scanf("%lld%lld",&p[i].l,&p[i].r);
		p[i].id=i;
		p[i].pl=(p[i].l-1)/pn+1;//Block the left boundary of the query 
	}
	sort(p+1,p+m+1,cmp);
	ll l=1,r=0;
	for(int i=1;i<=m;i++)
	{
		while(l<p[i].l)
		{
			sub(a[l]);
			l++;
		}
		while(l>p[i].l)
		{
			l--;
			add(a[l]);
		}
		while(r<p[i].r)
		{
			r++;
			add(a[r]);
		}
		while(r>p[i].r)
		{
			sub(a[r]);
			r--;
		}
		sum[p[i].id]=ans;
	}
	//Output answer 
	for(int i=1;i<=m;i++)
		printf("%lld\n",sum[i]);
	return 0;
}

Posted by Threepwood on Wed, 13 Oct 2021 04:40:46 -0700

Programmer Group

Mo team (offline processing interval query)

Hot Keywords