Eight Python Tips for Speeding Up Your Code

Author: Wang Zhanghao
https://zhuanlan.zhihu.com/p/143052860

Python is a scripting language, and compared with compiled languages such as C/C++ it has some shortcomings in efficiency and performance. In many cases, however, Python is not as inefficient as you might imagine. This article summarizes a number of techniques for speeding up Python code.

0. Code optimization principles

This article introduces a number of Python code acceleration techniques. Before going into the details of code optimization, you need to understand some basic principles of code optimization.

The first basic principle is not to optimize prematurely. Many people start writing code with performance optimization as the goal, but "making a correct program fast is much easier than making a fast program correct." Optimization therefore presupposes that the code already works properly. Premature optimization can lose sight of the overall performance picture, so do not invert priorities before you have results for the program as a whole.

The second basic principle is to weigh the cost of optimization. Optimization comes at a price, and it is almost impossible to solve every performance problem. Usually the choice is between trading time for space and trading space for time. Development cost also needs to be considered.

The third principle is not to optimize the parts that don't matter. If every part of the code is optimized, the changes make the code hard to read and understand. If your code runs slowly, first find where it is slow, usually an inner loop, and focus your optimization there. Elsewhere, losing a little time has little effect.
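
One way to find the slow spot before optimizing is the standard-library profiler. The following is a minimal sketch and not part of the original article; slow_part, fast_part, workload, and profile_report are hypothetical names made up for illustration.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Deliberately heavy: a nested loop, the typical hot spot.
    total = 0
    for i in range(n):
        for j in range(100):
            total += i * j
    return total


def fast_part(n):
    # Cheap by comparison; not worth optimizing.
    return sum(range(n))


def workload():
    return slow_part(2000) + fast_part(2000)


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the report, the function with the largest cumulative time is the one to look at first; here that should be slow_part rather than fast_part.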

1. Avoid global variables

# Not recommended. Run time: 26.8 seconds
import math

size = 10000
for x in range(size):
    for y in range(size):
        z = math.sqrt(x) + math.sqrt(y)

Many programmers first use Python to write simple scripts, and when writing scripts they tend to put the code directly at module level, where every variable is global, as in the code above. However, because global and local variables are implemented differently, code at module level runs noticeably slower than the same code inside a function. Placing script statements in a function typically gives a 15% - 30% speed-up.

# Recommended. Run time: 20.6 seconds
import math

def main():  # Define the code in a function to reduce the use of global variables
    size = 10000
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)

main()
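
The difference in how global and local variables are implemented can be made visible with the standard dis module. This is a minimal sketch, assuming CPython (opcode names are an implementation detail and differ between versions); MODULE_LEVEL_SOURCE, in_function, and opnames are hypothetical names used only for illustration.

```python
import dis

# The same two statements, once as module-level code and once inside a function.
MODULE_LEVEL_SOURCE = "size = 10\ntotal = size + size\n"


def in_function():
    size = 10
    total = size + size
    return total


def opnames(code_or_func):
    """Return the set of bytecode instruction names used by a code object or function."""
    return {ins.opname for ins in dis.get_instructions(code_or_func)}


module_ops = opnames(compile(MODULE_LEVEL_SOURCE, "<module-level>", "exec"))
function_ops = opnames(in_function)

if __name__ == "__main__":
    # Module-level code uses name-based (dictionary) instructions ...
    print(sorted(op for op in module_ops if op.endswith("_NAME")))
    # ... while the function uses slot-based "fast" instructions.
    print(sorted(op for op in function_ops if "FAST" in op))
```

Module-level variables are read and written through name lookups, whereas locals inside a function live in fixed slots, which is consistent with the timing difference reported above.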

2. Avoid . (the dot operator)

2.1 Avoid accessing module and function attributes

# Not recommended. Run time: 14.5 seconds
import math

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i))
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

Every use of . (the attribute access operator) triggers specific methods such as __getattribute__() and __getattr__(). These methods perform dictionary operations and therefore add time overhead. For module attributes, the attribute access can be eliminated with a from ... import statement.

# First optimization. Run time: 10.9 seconds
from math import sqrt

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

In section 1 we noted that local variables are looked up faster than global ones, so for a frequently accessed name such as sqrt, binding it to a local variable speeds things up further.

# Second optimization. Run time: 9.9 seconds
import math

def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  # Assigning to a local variable
    for i in range(size):
        result.append(sqrt(i))  # Avoid using math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

Besides math.sqrt, the computeSqrt function contains another use of ., namely the call to the list's append method. By assigning this method to a local variable as well, the use of . inside the for loop of computeSqrt can be eliminated completely.

# Recommended. Run time: 7.9 seconds
import math

def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt    # Assigning to a local variable
    for i in range(size):
        append(sqrt(i))  # Avoid use of result.append and math.sqrt
    return result

def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

2.2 Avoid attribute access within classes

# Not recommended. Run time: 10.4 seconds
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()

The principle of avoiding . also applies to attributes within classes: accessing self._value is slower than accessing a local variable. You can speed up the code by assigning frequently accessed instance attributes to a local variable.

# Recommended. Run time: 8.0 seconds
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value
        for _ in range(size):
            append(sqrt(value))  # Avoid using self._value
        return result

def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)

main()
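
The claim that each . on an instance goes through __getattribute__() can be observed directly by counting the calls. This is an illustrative sketch only: CountingDemo, sum_with_dot, and sum_with_local are hypothetical names, and overriding __getattribute__ itself slows attribute access, so this class is meant for counting, not for timing.

```python
class CountingDemo:
    """Hypothetical class that counts attribute lookups made on its instances."""

    lookups = 0  # class-level counter, bumped on every instance attribute lookup

    def __init__(self, value):
        self._value = value

    def __getattribute__(self, name):
        CountingDemo.lookups += 1
        return object.__getattribute__(self, name)


def sum_with_dot(demo, size):
    total = 0
    for _ in range(size):
        total += demo._value  # one attribute lookup per iteration
    return total


def sum_with_local(demo, size):
    value = demo._value  # a single attribute lookup, hoisted out of the loop
    total = 0
    for _ in range(size):
        total += value
    return total


if __name__ == "__main__":
    demo = CountingDemo(3)
    for func in (sum_with_dot, sum_with_local):
        CountingDemo.lookups = 0
        func(demo, 1000)
        print(func.__name__, CountingDemo.lookups)
```

Hoisting the attribute into a local variable reduces the number of lookups from one per iteration to one per call.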

3. Avoid unnecessary abstraction

# Not recommended. Run time: 0.55 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, x: int):
        self._value = x

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

Any time you wrap code in an additional layer of processing, such as a decorator, property access, or a descriptor, the code gets slower. In most cases it is worth re-examining whether property accessors are really needed; getter/setter functions are often a coding habit carried over by C/C++ programmers. If they are not really necessary, use simple attributes.

# Recommended. Run time: 0.33 seconds
class DemoClass:
    def __init__(self, value: int):
        self.value = value  # Avoid unnecessary property accessors

def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

4. Avoid data copying

4.1 Avoid meaningless data copying

# Not recommended. Run time: 6.5 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]
        square_list = [x * x for x in value_list]

main()

value_list in the code above is completely unnecessary; it only creates a redundant data structure and an extra copy.

# Recommended. Run time: 4.8 seconds
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  # Avoid meaningless duplication

main()

Another case is being overly paranoid about Python's data-sharing mechanism: without a good understanding of, or trust in, Python's memory model, it is easy to abuse functions such as copy.deepcopy(). The copying in such code can usually be removed.
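
As a small illustration of removable copying, the sketch below contrasts a function that defensively deep-copies its argument with one that simply reads it. The function names (total_length_defensive, total_length, with_extra_row) are hypothetical and not from the original article.

```python
import copy


def total_length_defensive(rows):
    rows = copy.deepcopy(rows)  # unnecessary: the function only reads rows
    return sum(len(row) for row in rows)


def total_length(rows):
    # Reads the caller's data directly; nothing is modified, so no copy is needed.
    return sum(len(row) for row in rows)


def with_extra_row(rows, new_row):
    # When the result must differ from the input, build a new outer list.
    # The inner rows are shared with the original, not copied.
    return rows + [new_row]


if __name__ == "__main__":
    rows = [[1, 2], [3]]
    print(total_length(rows), total_length_defensive(rows))
    print(with_extra_row(rows, [4, 5, 6]), rows)
```

A copy is only needed when the function would otherwise mutate data the caller still relies on, and even then a new outer container is often enough.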

4.2 Do not use intermediate variables when exchanging values

# Not recommended. Run time: 0.07 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp

main()

The code above creates a temporary variable temp to swap the values. Swapping without an intermediate variable makes the code simpler and run faster.

# Recommended. Run time: 0.06 seconds
def main():
    size = 1000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  # Do not use intermediate variables

main()

4.3 Concatenate strings with join instead of +

# Not recommended. Run time: 2.6 seconds
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i
    return result

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()

When strings are concatenated with a + b, because Python strings are immutable, a new block of memory is requested and a and b are each copied into it. Concatenating n strings therefore produces n-1 intermediate results, each of which requires a memory request and a copy, which seriously hurts efficiency. When join() is used, the total memory required is computed first, that memory is requested once, and each string element is then copied into it.

# Recommended. Run time: 0.3 seconds
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  # Use join instead of +

def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()
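
The run times quoted in this article depend on the machine and the Python version, so it is worth measuring on your own setup. The following is a minimal sketch using the standard timeit module; concat_plus, concat_join, and measure are hypothetical helper names. Note that CPython can sometimes extend a string in place for +=, so the gap you measure may be smaller than the figures above.

```python
import string
import timeit


def concat_plus(parts):
    result = ''
    for part in parts:
        result += part  # may create a new string on each iteration
    return result


def concat_join(parts):
    return ''.join(parts)  # builds the result in a single pass


def measure(func, parts, repeat=200):
    """Return the total time in seconds for `repeat` calls of func(parts)."""
    return timeit.timeit(lambda: func(parts), number=repeat)


if __name__ == "__main__":
    parts = list(string.ascii_letters * 100)
    print("+=  :", measure(concat_plus, parts))
    print("join:", measure(concat_join, parts))
```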

5. Use the short-circuit behavior of if conditions

# Not recommended. Run time: 0.05 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations:
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()

The short-circuit behavior of if conditions means that for a statement such as if a and b, when a is False the result is returned immediately and b is not evaluated; for a statement such as if a or b, when a is True the result is returned immediately and b is not evaluated. Therefore, to save running time, in an or expression the operand more likely to be True should be written first, while in an and expression it should be written last.

# Recommended. Run time: 0.03 seconds
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i[-1] == '.' and str_i in abbreviations:  # Use the short-circuit behavior of the if condition
            result += str_i
    return result

def main():
    for _ in range(10000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()
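
The short-circuit behavior described in this section can be checked by recording which operands actually run. This is a minimal sketch; cheap_check, expensive_check, both, and either are hypothetical names used only for illustration.

```python
calls = []  # records which checks were actually evaluated


def cheap_check(value):
    calls.append("cheap")
    return value


def expensive_check(value):
    calls.append("expensive")
    return value


def both(a, b):
    # With `and`, the second operand is skipped when the first is falsy.
    return cheap_check(a) and expensive_check(b)


def either(a, b):
    # With `or`, the second operand is skipped when the first is truthy.
    return cheap_check(a) or expensive_check(b)


if __name__ == "__main__":
    both(False, True)
    print(calls)  # only the cheap check ran
    calls.clear()
    either(True, False)
    print(calls)  # again, only the cheap check ran
```

Putting the cheap, decisive test first means the expensive one runs only when it can still change the result.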

6. Loop optimization

6.1 Replace while loop with for loop

# Not recommended. Run time: 6.7 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

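In CPython, a for loop over range is noticeably faster than the equivalent while loop, because the counter is advanced and tested inside the interpreter rather than by separate Python statements.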

# Recommended. Run time: 4.3 seconds
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  # for loop instead of while loop
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

6.2 Use implicit for loop instead of explicit for loop

For the example above, the explicit for loop can in turn be replaced by an implicit for loop.

# Recommended. Run time: 1.7 seconds
def computeSum(size: int) -> int:
    return sum(range(size))  # Implicit for loop instead of explicit for loop

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

6.3 Reduce computation in the inner for loop

# Not recommended. Run time: 12.8 seconds
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)

main()

In the code above, sqrt(x) sits in the inner for loop, so it is recalculated on every inner iteration even though x does not change there, which adds to the time cost.

# Recommended. Run time: 7.0 seconds
import math

def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  # Reduce calculation of inner for loop
        for y in range(size):
            z = sqrt_x + sqrt(y)

main()

7. Use numba.jit

We continue with the example above and apply numba.jit to it. Numba JIT-compiles Python functions into machine code, which greatly improves execution speed. For more information on numba, see its home page: http://numba.pydata.org/

# Recommended. Run time: 0.62 seconds
import numba

@numba.jit
def computeSum(size: int) -> int:  # size is passed to range(), so it is an int
    sum_ = 0  # Named sum_ to avoid shadowing the built-in sum
    for i in range(size):
        sum_ += i
    return sum_

def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

8. Choose the appropriate data structure

Python's built-in data structures, such as str, tuple, list, set, and dict, are all implemented in C and are very fast. A new data structure that you implement yourself is almost never going to match the built-in ones in performance.

list is a dynamic array similar to std::vector in C++. It will pre-allocate a certain amount of memory space, and when the pre-allocated memory space runs out and you continue to add elements to it, it will request a larger piece of memory space, then copy all the original elements, destroy the previous memory space, and insert new elements.

Deleting elements works similarly: when less than half of the pre-allocated memory is in use, a smaller block of memory is requested, the elements are copied over, and the old memory is released.
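
The pre-allocation behavior of list can be observed with sys.getsizeof. This is a minimal sketch, assuming CPython (the growth pattern is an implementation detail, and other interpreters may not support sys.getsizeof); allocation_sizes and count_resizes are hypothetical helper names.

```python
import sys


def allocation_sizes(count):
    """Append `count` items one by one and record the list's size in bytes after each append."""
    items = []
    sizes = []
    for i in range(count):
        items.append(i)
        sizes.append(sys.getsizeof(items))
    return sizes


def count_resizes(sizes):
    """Count how often the reported size changed between consecutive appends."""
    return sum(1 for prev, cur in zip(sizes, sizes[1:]) if cur != prev)


if __name__ == "__main__":
    sizes = allocation_sizes(1000)
    print("appends:", len(sizes), "resizes:", count_resizes(sizes))
```

The reported size changes on only a small fraction of the appends, which shows that most appends reuse memory that was allocated in advance.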

Therefore, if elements are added and deleted frequently and in large numbers, a list is not efficient. In that case, consider using collections.deque. collections.deque is a double-ended queue that has the characteristics of both a stack and a queue, and supports O(1) insertion and deletion at both ends.
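
A short sketch of collections.deque used as a queue and as a sliding window follows; drain_fifo and recent_window are hypothetical function names chosen for illustration.

```python
from collections import deque


def drain_fifo(items):
    """Consume items in first-in, first-out order using a deque."""
    queue = deque(items)
    order = []
    while queue:
        order.append(queue.popleft())  # O(1), unlike list.pop(0), which shifts every element
    return order


def recent_window(stream, size):
    """Keep only the last `size` items seen; older items fall off the left end."""
    window = deque(maxlen=size)
    for item in stream:
        window.append(item)
    return list(window)


if __name__ == "__main__":
    print(drain_fifo([1, 2, 3]))
    print(recent_window(range(10), 3))
```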

Searching a list is also time-consuming. When you need to look up elements in a list frequently or access them in sorted order, you can use bisect to keep the list sorted and perform binary searches on it, which makes lookups more efficient.
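
A minimal sketch of keeping a list sorted with bisect and searching it by binary search follows; insert_sorted and contains are hypothetical helper names. Note that bisect.insort finds the position in O(log n) but the insertion itself is still O(n), because list elements have to be shifted.

```python
import bisect


def insert_sorted(sorted_items, value):
    """Insert value into an already sorted list, keeping it sorted."""
    bisect.insort(sorted_items, value)


def contains(sorted_items, value):
    """Binary-search membership test on a sorted list."""
    index = bisect.bisect_left(sorted_items, value)
    return index < len(sorted_items) and sorted_items[index] == value


if __name__ == "__main__":
    scores = []
    for score in (42, 7, 19, 73):
        insert_sorted(scores, score)
    print(scores)
    print(contains(scores, 19), contains(scores, 20))
```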

Another common requirement is to find the minimum or maximum value. In that case you can use the heapq module to turn the list into a heap, so that retrieving the minimum value takes O(1) time.
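
A short sketch of heapq follows; smallest_first and largest are hypothetical helper names. heapq implements a min-heap, so for the maximum this sketch stores negated values, which assumes the values are numeric.

```python
import heapq


def smallest_first(values):
    """Return the minimum and all values in ascending order, using a heap."""
    heap = list(values)
    heapq.heapify(heap)   # O(n) conversion, in place
    smallest = heap[0]    # O(1) peek at the minimum
    ordered = [heapq.heappop(heap) for _ in range(len(heap))]  # each pop is O(log n)
    return smallest, ordered


def largest(values):
    """Return the maximum by heapifying the negated values."""
    heap = [-value for value in values]
    heapq.heapify(heap)
    return -heap[0]


if __name__ == "__main__":
    print(smallest_first([5, 1, 4]))
    print(largest([5, 1, 4]))
```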

The following page gives the time complexity of operations on common Python data structures: https://wiki.python.org/moin/TimeComplexity

Posted by gammaster on Mon, 18 Oct 2021 09:47:11 -0700
