Author: Wang Zhanghao

https://zhuanlan.zhihu.com/p/143052860

Python is a scripting language and falls short of compiled languages such as C/C++ in efficiency and performance. In many cases, however, the gap is not as large as you might think. This article summarizes a number of techniques for speeding up Python code.

## 0. Code optimization principles

This article introduces a number of Python code acceleration techniques. Before going into the details of code optimization, you need to understand some basic principles of code optimization.

The first basic principle is not to optimize prematurely. Many people write code with performance optimization as the goal from the very start, but "making a correct program fast is much easier than making a fast program correct." Optimization therefore presupposes that the code already works properly. Premature optimization can also lose sight of overall performance; do not get the priorities backwards before you have results for the program as a whole.

The second basic principle is to weigh the cost of optimization. Optimization comes at a cost, and it is almost impossible to solve every performance problem. The usual trade-off is time for space or space for time. Development cost also needs to be considered.
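As a small illustration of trading space for time, the sketch below caches the results of a deliberately naive recursive function with `functools.lru_cache`. The function `fib` and the input value are illustrative choices and do not come from the original article.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # Unbounded cache: spends memory to avoid recomputation
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


# Each distinct argument is computed once and then served from the cache.
result = fib(80)
cache_stats = fib.cache_info()  # Named tuple: hits, misses, maxsize, currsize
```

Without the decorator the same call would repeat an exponential number of sub-calls; with it, the cost is one cache entry per distinct argument.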

The third principle is not to optimize the parts that don't matter. If every part of the code is optimized, the changes make the code hard to read and understand. If your code runs slowly, first find where it is slow, which is usually an inner loop, and focus your optimization there. Elsewhere, a small loss of time has little effect.
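One way to find the slow spot is the standard library profiler. The following is a minimal sketch, assuming a hypothetical workload made of `slow_part` and `fast_part` (both invented for illustration); it runs `workload` under `cProfile` and reports the functions with the highest cumulative time.

```python
import cProfile
import io
import pstats
from typing import Callable


def slow_part(n: int) -> int:
    # Hypothetical hot spot: a pure-Python inner loop
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n: int) -> int:
    # Hypothetical cheap step
    return n + 1


def workload() -> int:
    return slow_part(200_000) + fast_part(10)


def profile_report(func: Callable[[], object]) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # Show the top 5 entries
    return stream.getvalue()


report = profile_report(workload)
print(report)
```

In a report like this, the function that dominates the cumulative time is the one worth optimizing first.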

## 1. Avoid global variables

```python
# Not recommended. Run time: 26.8 seconds
import math

size = 10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)
```

Many programmers start out writing simple scripts in Python, and in scripts they tend to put everything directly at global scope, as in the code above. However, because global and local variables are implemented differently, code that runs at global scope is noticeably slower than the same code inside a function. Placing script statements in a function typically yields a 15% - 30% speedup.

```python
# Recommended. Run time: 20.6 seconds
import math

def main():  # Define inside a function to reduce the use of global variables
    size = 10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)

main()
```

## 2. Avoid `.`

### 2.1 Avoid access to module and function attributes

```python
# Not recommended. Run time: 14.5 seconds
import math

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
```

Each use of `.` (the attribute access operator) triggers specific methods such as `__getattribute__()` and `__getattr__()`. These methods perform dictionary operations and therefore add time overhead. Attribute access can be eliminated with a `from ... import` statement.

```python
# First optimization. Run time: 10.9 seconds
from math import sqrt

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
```

In section 1 we noted that looking up a local variable is faster than looking up a global one. For a frequently accessed name such as sqrt, binding it to a local variable therefore speeds things up further.

```python
# Second optimization. Run time: 9.9 seconds
import math

def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  # Assign to a local variable
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
```

Besides math.sqrt, the computeSqrt function contains one more `.`: the call to the list's append method. By assigning that method to a local variable as well, the use of `.` inside the for loop of computeSqrt can be eliminated completely.

```python
# Recommended. Run time: 7.9 seconds
import math

def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt  # Assign to a local variable
    for i in range(size):
        append(sqrt(i))  # Avoid using result.append and math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()
```

### 2.2 Avoid attribute access within classes

```python
# Not recommended. Run time: 10.4 seconds
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value

    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()
```

The principle of avoiding `.` also applies to attributes within classes: accessing self._value is slower than accessing a local variable. Assigning a frequently accessed instance attribute to a local variable speeds up the code.

```python
# Recommended. Run time: 8.0 seconds
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value

    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value
        for _ in range(size):
            append(sqrt(value))  # Avoid using self._value
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)

main()
```

## 3. Avoid unnecessary abstraction

```python
# Not recommended. Run time: 0.55 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, x: int):
        self._value = x

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()
```

Any time you wrap code in an additional processing layer, such as a decorator, a property accessor, or a descriptor, the code gets slower. In most cases it is worth reexamining whether property accessors are needed at all; getter/setter functions are usually a coding style carried over by C/C++ programmers. If they are not really necessary, use simple attributes.

```python
# Recommended. Run time: 0.33 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value  # Avoid unnecessary property accessors

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()
```

## 4. Avoid data copying

### 4.1 Avoid meaningless data copying

```python
# Not recommended. Run time: 6.5 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]

main()
```

The value_list in the code above is completely unnecessary; it creates a needless data structure and copy.

```python
# Recommended. Run time: 4.8 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  # Avoid meaningless copying

main()
```

Another case is being overly paranoid about Python's data-sharing mechanism: programmers who do not understand or trust Python's memory model well tend to overuse functions such as copy.deepcopy(). The copying can usually be removed from such code.
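The sketch below illustrates the point with hypothetical helper functions (none of these names come from the original article). A function that only reads its argument needs no copy at all, and when a modified version is needed, a shallow copy is often enough because immutable elements such as ints can be shared safely.

```python
import copy
from typing import List


def total_defensive(values: List[int]) -> int:
    # Unnecessary: the function only reads the data, yet copies all of it first
    local_values = copy.deepcopy(values)
    return sum(local_values)


def total_shared(values: List[int]) -> int:
    # Reads the caller's list directly; nothing is mutated, so no copy is needed
    return sum(values)


def appended(values: List[int], extra: int) -> List[int]:
    # A shallow copy is enough here: the int elements are immutable,
    # so sharing them between the two lists is safe
    new_values = list(values)
    new_values.append(extra)
    return new_values


data = [1, 2, 3]
assert total_defensive(data) == total_shared(data) == 6
extended = appended(data, 4)  # data itself is left unchanged
```

A deep copy remains appropriate when nested mutable objects really must be isolated from each other; the point is only that it should be a deliberate choice.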

### 4.2 Do not use intermediate variables when swapping values

```python
# Not recommended. Run time: 0.07 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp

main()
```

The code above creates a temporary variable temp when swapping values. Without the intermediate variable, the code is both simpler and faster.

```python
# Recommended. Run time: 0.06 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  # No intermediate variable

main()
```

### 4.3 Concatenate strings with join instead of +

```python
# Not recommended. Run time: 2.6 seconds
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i
    return result

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()
```

When strings are concatenated with a + b, a new block of memory is allocated and both a and b are copied into it, because Python strings are immutable. Concatenating n strings therefore produces n-1 intermediate results, each of which requires an allocation and a copy, which seriously hurts efficiency. With join(), the total memory required is computed first, the memory is allocated once, and each string element is then copied into it.

```python
# Recommended. Run time: 0.3 seconds
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  # Use join instead of +

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()
```

## 5. Exploit the short-circuit behavior of if conditions

```python
# Not recommended. Run time: 0.05 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()
```

The short-circuit behavior of if conditions means that for a statement such as if a and b, the result is returned as soon as a is False and b is not evaluated; for if a or b, the result is returned as soon as a is True and b is not evaluated. To save running time, in an or expression the operand more likely to be True should therefore be written first, while in an and expression it should be written last.

```python
# Recommended. Run time: 0.03 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        # Exploit short-circuiting: the cheap check runs first
        if str_i[-1] == '.' and str_i in abbreviations:
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()
```
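To make the short-circuit behavior visible, the sketch below records which operands are actually evaluated. The helper functions `cheap_false`, `cheap_true`, and `expensive` are hypothetical stand-ins, not part of the original example.

```python
from typing import List

calls: List[str] = []  # Records which operands were actually evaluated


def cheap_false() -> bool:
    calls.append("cheap_false")
    return False


def cheap_true() -> bool:
    calls.append("cheap_true")
    return True


def expensive() -> bool:
    # Hypothetical stand-in for a costly check
    calls.append("expensive")
    return True


and_result = cheap_false() and expensive()  # Left side is False: right side skipped
or_result = cheap_true() or expensive()     # Left side is True: right side skipped
```

After running this, `calls` contains only the two cheap functions, which is why putting the cheap, decisive operand first pays off.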

## 6. Loop optimization

### 6.1 Replace while loops with for loops

```python
# Not recommended. Run time: 6.7 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
```

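In Python, a for loop over range is considerably faster than the equivalent while loop, because the while version performs the comparison and the increment as separate Python-level operations on every iteration.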

```python
# Recommended. Run time: 4.3 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  # for loop instead of while loop
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
```

### 6.2 Use implicit for loops instead of explicit for loops

In the example above, the explicit for loop can in turn be replaced by an implicit one, here the built-in sum.

```python
# Recommended. Run time: 1.7 seconds
def computeSum(size: int) -> int:
    return sum(range(size))  # Implicit for loop instead of explicit for loop

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)  # Named sum_ to avoid shadowing the built-in sum

main()
```

### 6.3 Reduce the computation inside inner for loops

```python
# Not recommended. Run time: 12.8 seconds
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)

main()
```

In the code above, sqrt(x) sits inside the inner for loop, so it is recomputed on every inner iteration even though x does not change there, which adds to the running time.

```python
# Recommended. Run time: 7.0 seconds
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  # Reduce the computation inside the inner for loop
        for y in range(size):
            z = sqrt_x + sqrt(y)

main()
```

## 7. Use numba.jit

Continuing with the example above, we now apply numba.jit to it. Numba can JIT-compile Python functions into machine code, which greatly improves execution speed. For more information on numba, see its home page: http://numba.pydata.org/

```python
# Recommended. Run time: 0.62 seconds
import numba

@numba.jit
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()
```

## 8. Choose the appropriate data structure

Python's built-in data structures, such as str, tuple, list, set, and dict, are all implemented in C and are very fast. A data structure you implement yourself is very unlikely to match their performance.

list is a dynamic array similar to std::vector in C++. It pre-allocates a certain amount of memory; when that memory runs out and you keep adding elements, it requests a larger block, copies all existing elements over, frees the old block, and then inserts the new element.

Deleting elements works similarly: when less than half of the pre-allocated memory is in use, a smaller block is requested, the elements are copied over, and the old block is freed.
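The pre-allocation described above can be observed indirectly. The sketch below assumes CPython, where `sys.getsizeof` reports the memory currently allocated for a list object; the helper `count_reallocations` is a hypothetical name introduced for illustration. It counts how often the reported size changes while elements are appended.

```python
import sys
from typing import List


def count_reallocations(n: int) -> int:
    """Append n items to a list and count how often its allocated size changes."""
    items: List[int] = []
    last_size = sys.getsizeof(items)
    reallocations = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:  # The underlying buffer was resized
            reallocations += 1
            last_size = size
    return reallocations


resize_count = count_reallocations(1000)
print(f"1000 appends triggered {resize_count} resizes")
```

The number of resizes is far smaller than the number of appends, which is what makes append cheap on average.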

Therefore, if elements are added and removed frequently and in large numbers, list is not efficient. In that case, consider collections.deque. collections.deque is a double-ended queue that combines the characteristics of a stack and a queue and supports O(1) insertion and deletion at both ends.
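A minimal sketch of the deque operations mentioned above (the variable names are illustrative):

```python
from collections import deque

queue = deque([1, 2, 3])

queue.append(4)         # Add at the right end, O(1)
queue.appendleft(0)     # Add at the left end, O(1)
right = queue.pop()     # Remove from the right end, O(1)
left = queue.popleft()  # Remove from the left end, O(1)
```

By comparison, removing from the front of a list with `list.pop(0)` has to shift every remaining element, which makes it O(n).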

Searching a list is also time-consuming. When you need to look up elements in a list frequently, or access them in sorted order, you can use bisect to keep the list sorted and run binary searches on it, which makes lookups more efficient.
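As a rough sketch of this approach, the code below keeps a list sorted with `bisect.insort` and looks values up with `bisect.bisect_left`. The helper `contains` and the sample values are assumptions made for illustration.

```python
import bisect
from typing import List


def contains(sorted_items: List[int], target: int) -> bool:
    """Binary search in an already sorted list, O(log n)."""
    index = bisect.bisect_left(sorted_items, target)
    return index < len(sorted_items) and sorted_items[index] == target


scores: List[int] = []
for value in [42, 7, 19, 88, 3]:
    bisect.insort(scores, value)  # Insert while keeping the list sorted

found = contains(scores, 19)
missing = contains(scores, 20)
```

Note that `insort` finds the position in O(log n) but the insertion itself still shifts elements, so it is O(n) overall; the benefit is in the repeated lookups.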

Another common requirement is finding the minimum or maximum value. In that case you can use the heapq module to turn the list into a heap, so that getting the minimum value takes O(1) time.
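A short sketch of the heapq functions involved, using made-up sample data. heapq implements a min-heap, so the smallest element is always at index 0.

```python
import heapq
from typing import List

numbers: List[int] = [9, 4, 7, 1, 8, 2]
heapq.heapify(numbers)            # Rearrange the list into a min-heap in place, O(n)

smallest = numbers[0]             # Peek at the minimum, O(1)
popped = heapq.heappop(numbers)   # Remove and return the minimum, O(log n)
heapq.heappush(numbers, 0)        # Insert a new value, O(log n)
three_largest = heapq.nlargest(3, numbers)  # Largest values, in descending order
```

For repeated access to the maximum instead, a common workaround is to store negated values in the heap.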

The following page gives the time complexity of operations on common Python data structures: https://wiki.python.org/moin/TimeComplexity