Introduction to ACM [input and output optimization]

Keywords: C C++ Algorithm

This article is mainly a summary based on OI Wiki (oiwiki), and more articles in the series will follow. If you want to keep reading, you can follow the column.
The purpose of this column is to lay a foundation for my own recent systematic study, so I want to write it for beginners who like ACM.

Why optimize input and output

The purpose of input and output optimization is to keep a program from getting TLE (Time Limit Exceeded) because of slow reading and writing.
Faster I/O reduces the running time and helps us get the problem accepted (AC).

Do we need to optimize the input and output for every problem?

A: It depends on the problem. If the input / output data of the problem is large (e.g. more than 1e5), add the optimization to be safe.
Of course, many experienced contestants habitually add it to every solution, regardless of the input / output size.

Optimization method 1: disable synchronization / untie

By default, scanf and printf are often much faster than cin and cout.
The reason is that, to stay compatible with C, C++ keeps its standard streams synchronized with the C stdio streams, so that input and output do not get out of order when printf / scanf and cin / cout are mixed. This synchronization costs time.
The solution is as follows:

std::ios::sync_with_stdio(false);
This function is the switch for "compatibility with stdio". To stay compatible with C, C++ keeps cout synchronized with the stdio streams, so that a program mixing printf and cout does not produce output in the wrong order.
This is a conservative default that C++ adopts for compatibility. We can turn the synchronization off before performing any I/O, but after doing so, cin / cout should no longer be mixed with printf / scanf, because the relative order of their input and output is no longer guaranteed.
tie
tie is a function that binds an input stream to an output stream. Called with no argument, it returns a pointer to the currently tied output stream.
By default, std::cin is tied to std::cout, so std::cout is flushed before every input operation on std::cin, which adds to the I/O cost. You can untie std::cin from std::cout with std::cin.tie(0) (0 means NULL) to speed things up further.
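
As a small illustration of the tie behaviour described above, the sketch below checks which stream std::cin is tied to before and after untying. The helper name untie_cin is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper (not part of the article's template): unties std::cin
// and returns the stream it was tied to before the call.
inline std::ostream* untie_cin() {
    std::ostream* previous = std::cin.tie(nullptr); // tie(p) returns the previously tied stream
    assert(std::cin.tie() == nullptr);              // tie() with no argument reports the current one
    return previous;
}
```

On a freshly started program the returned pointer is expected to be &std::cout, since that is the default tie of std::cin.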

The total optimization code is as follows:

std::ios::sync_with_stdio(false);
std::cin.tie(0);
// With C++11 or later, std::cin.tie(nullptr) is recommended instead.
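
To show where these two lines usually go, here is a minimal sketch. The names fast_io and sum_input are hypothetical, and the task (read n, then n integers, and add them up) is only an illustration. In a real solution, fast_io() would be called once at the top of main(), before any input or output, and sum_input would be given std::cin.

```cpp
#include <cassert>
#include <iostream>
#include <sstream> // only needed for the string-stream check mentioned below

// Hypothetical helper: call once, before any I/O is performed.
inline void fast_io() {
    std::ios::sync_with_stdio(false); // stop synchronizing C++ streams with C stdio
    std::cin.tie(nullptr);            // stop flushing std::cout before each std::cin read
}

// Hypothetical example task: read n, then n integers, and return their sum.
inline long long sum_input(std::istream& in) {
    int n = 0;
    in >> n;
    assert(n >= 0); // assumption: the input is well-formed
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        long long x = 0;
        in >> x;
        total += x;
    }
    return total;
}
```

For a quick check without a console, sum_input can also be fed a std::istringstream holding the sample input.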

Optimization method 2: fast read and fast write

scanf and printf still leave room for optimization. Note: the fast read and fast write shown here handle integers only. For floating-point numbers you will need to write your own version.
principle
As we all know, getchar reads one byte of input and returns it as an int character code. It is very fast, so we can "read characters and convert them to an integer" instead of using the slower formatted integer input.
Each integer consists of two parts: a sign and digits. The '+' of a positive integer is usually omitted and does not affect the value of the digits that follow, while the '-' cannot be omitted, so it has to be checked for.
A decimal integer contains no spaces or any characters other than 0 ~ 9 and the plus and minus signs. Therefore, as soon as a character that cannot belong to an integer (usually a space) is read, the integer is known to have ended.
C and C++ provide the function isdigit in the headers ctype.h and cctype respectively. It checks whether its argument is a decimal digit character and returns true if so, false otherwise. Accordingly, in the code below, isdigit(ch) can replace ch >= '0' && ch <= '9', and !isdigit(ch) can replace ch < '0' || ch > '9'.
The template is as follows:

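#include <cctype> // isdigit
#include <cstdio> // getchar, putchar, EOF

template <typename T>
inline T read()
{
	// Function template: the caller names the result type T, e.g. read<int>()
	T sum = 0, fl = 1; // sum accumulates the digits, fl holds the sign
	int ch = getchar(); // int, so that EOF can be told apart from real characters
	// Skip non-digit characters, remembering a minus sign; stop at end of input
	for (; ch != EOF && !isdigit(ch); ch = getchar())
		if (ch == '-') fl = -1;
	// Accumulate decimal digits; isdigit(EOF) is false, so this also stops at EOF
	for (; isdigit(ch); ch = getchar()) sum = sum * 10 + ch - '0';
	return sum * fl;
}
template <typename T>
inline void write(T x)
{
    static int sta[40]; // digit stack; 40 slots cover the 39 digits of __int128
    int top = 0;
    const bool neg = x < 0;
    do {
        int d = x % 10; // zero or negative when x is negative
        sta[top++] = neg ? -d : d;
        x /= 10;
    } while (x);
    if (neg) putchar('-');
    while (top) putchar(sta[--top] + '0'); // pop digits, most significant first
}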
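
The "read characters, convert to an integer" idea can be tried without standard input. The sketch below applies the same two loops as read() to a C string. parse_int is a hypothetical name, and unlike read() it stops at the terminating '\0' instead of calling getchar.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper: parses the first decimal integer found in s with the
// same steps as read(): skip non-digits (remembering a '-'), then accumulate.
template <typename T>
T parse_int(const char* s) {
    assert(s != nullptr);
    T sum = 0, fl = 1;
    // Skip everything that is not a digit; a '-' seen on the way flips the sign.
    for (; *s != '\0' && !std::isdigit(static_cast<unsigned char>(*s)); ++s)
        if (*s == '-') fl = -1;
    // Each new digit shifts the value one decimal place to the left.
    for (; std::isdigit(static_cast<unsigned char>(*s)); ++s)
        sum = sum * 10 + (*s - '0');
    return sum * fl;
}
```

For example, parse_int<int>("  -123 ") skips the two spaces, records the minus sign, and then accumulates 1, 12, 123 before returning -123.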

They are used as follows:

int a = read<int>();
long long b = read<long long>();
__int128 c = read<__int128>(); // __int128 is a GCC / Clang extension
write(a);
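
std::cout and printf cannot print an __int128 directly, which is one practical reason to keep write() around. The sketch below uses the same digit-stack idea but collects the characters in a std::string so that the result can be inspected. to_decimal is a hypothetical name, and the code assumes a compiler that provides __int128 (such as GCC or Clang).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts x to decimal text with the digit-stack
// approach of write(). Assumes the compiler provides __int128.
inline std::string to_decimal(__int128 x) {
    char sta[40];            // a signed 128-bit value has at most 39 digits
    int top = 0;
    const bool neg = x < 0;
    do {
        int d = static_cast<int>(x % 10); // zero or negative when x is negative
        sta[top++] = static_cast<char>('0' + (neg ? -d : d));
        x /= 10;
    } while (x != 0);
    assert(top <= 39);
    std::string out;
    if (neg) out.push_back('-');
    while (top) out.push_back(sta[--top]); // pop digits, most significant first
    return out;
}
```

Taking each digit as the negated remainder, rather than negating x up front, avoids overflow on the most negative value of the type.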

Time comparison in various cases

[Figure: running-time comparison of the input / output methods above]
The figure gives a general idea of how the methods compare. Finally, you can add a "locomotive" (a long block of GCC optimization pragmas) to optimize further and shave off a little more time.
The locomotive code is as follows:

#pragma GCC optimize(2)
#pragma GCC optimize(3)
#pragma GCC optimize("Ofast")
#pragma GCC optimize("inline")
#pragma GCC optimize("-fgcse")
#pragma GCC optimize("-fgcse-lm")
#pragma GCC optimize("-fipa-sra")
#pragma GCC optimize("-ftree-pre")
#pragma GCC optimize("-ftree-vrp")
#pragma GCC optimize("-fpeephole2")
#pragma GCC optimize("-ffast-math")
#pragma GCC optimize("-fsched-spec")
#pragma GCC optimize("unroll-loops")
#pragma GCC optimize("-falign-jumps")
#pragma GCC optimize("-falign-loops")
#pragma GCC optimize("-falign-labels")
#pragma GCC optimize("-fdevirtualize")
#pragma GCC optimize("-fcaller-saves")
#pragma GCC optimize("-fcrossjumping")
#pragma GCC optimize("-fthread-jumps")
#pragma GCC optimize("-funroll-loops")
#pragma GCC optimize("-fwhole-program")
#pragma GCC optimize("-freorder-blocks")
#pragma GCC optimize("-fschedule-insns")
#pragma GCC optimize("inline-functions")
#pragma GCC optimize("-ftree-tail-merge")
#pragma GCC optimize("-fschedule-insns2")
#pragma GCC optimize("-fstrict-aliasing")
#pragma GCC optimize("-fstrict-overflow")
#pragma GCC optimize("-falign-functions")
#pragma GCC optimize("-fcse-skip-blocks")
#pragma GCC optimize("-fcse-follow-jumps")
#pragma GCC optimize("-fsched-interblock")
#pragma GCC optimize("-fpartial-inlining")
#pragma GCC optimize("no-stack-protector")
#pragma GCC optimize("-freorder-functions")
#pragma GCC optimize("-findirect-inlining")
#pragma GCC optimize("-fhoist-adjacent-loads")
#pragma GCC optimize("-frerun-cse-after-loop")
#pragma GCC optimize("inline-small-functions")
#pragma GCC optimize("-finline-small-functions")
#pragma GCC optimize("-ftree-switch-conversion")
#pragma GCC optimize("-foptimize-sibling-calls")
#pragma GCC optimize("-fexpensive-optimizations")
#pragma GCC optimize("-funsafe-loop-optimizations")
#pragma GCC optimize("inline-functions-called-once")
#pragma GCC optimize("-fdelete-null-pointer-checks")

Using the locomotive is not recommended. It is reportedly not allowed in formal competitions.

Complete templates

From the above, we get the following templates.
Template 1: fast read and fast write

Benefits: cin, cout, printf and scanf can be mixed freely
Disadvantages: the template is a little long

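#include <cctype> // isdigit
#include <cstdio> // getchar, putchar, EOF

template <typename T>
inline T read()
{
	// Function template: the caller names the result type T, e.g. read<int>()
	T sum = 0, fl = 1; // sum accumulates the digits, fl holds the sign
	int ch = getchar(); // int, so that EOF can be told apart from real characters
	// Skip non-digit characters, remembering a minus sign; stop at end of input
	for (; ch != EOF && !isdigit(ch); ch = getchar())
		if (ch == '-') fl = -1;
	// Accumulate decimal digits; isdigit(EOF) is false, so this also stops at EOF
	for (; isdigit(ch); ch = getchar()) sum = sum * 10 + ch - '0';
	return sum * fl;
}
template <typename T>
inline void write(T x)
{
    static int sta[40]; // digit stack; 40 slots cover the 39 digits of __int128
    int top = 0;
    const bool neg = x < 0;
    do {
        int d = x % 10; // zero or negative when x is negative
        sta[top++] = neg ? -d : d;
        x /= 10;
    } while (x);
    if (neg) putchar('-');
    while (top) putchar(sta[--top] + '0'); // pop digits, most significant first
}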

Template 2: disable synchronization and untie

Benefit: the optimization is very short, only two lines
Disadvantages: after the optimization, only cin and cout should be used

std::ios::sync_with_stdio(false);
std::cin.tie(0);
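
A related detail when relying on cin and cout only, which the two-line template itself does not cover: std::endl flushes the stream every time it is used, so ending lines with '\n' helps keep the benefit of the optimization. The sketch below is a hypothetical helper, print_lines, that writes one number per line to any output stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream> // only needed for the string-stream check mentioned below
#include <vector>

// Hypothetical helper: writes each value on its own line without forcing a flush.
inline void print_lines(std::ostream& out, const std::vector<long long>& values) {
    assert(out.good()); // assumption: the stream is in a usable state
    for (long long v : values)
        out << v << '\n'; // '\n' only ends the line; std::endl would also flush
}
```

In a solution it would be called as print_lines(std::cout, answers); passing a std::ostringstream instead makes the output easy to inspect.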

Personal recommendation:
I generally prefer not to use a template. For problems with a lot of input and output data, scanf and printf are usually enough.
In online contests (such as Codeforces), I use the first template, because I like to use printf when I need formatted output.

Posted by ugriffin on Sat, 04 Dec 2021 18:18:09 -0800

Programmer Group