Python testing: beware of the pitfalls behind simplicity

Many people ask: why is the code I write so slow?

I remember that when I was a graduate student, my advisor visited Taiwan and later told me about his experience there. The algorithm he had written produced a result in half an hour on a personal computer, while the students in Taiwan had not obtained a result after 2-3 days. When he asked, they said it was still calculating.

The advisor said the problem was the stress analysis of a crack. For example, if a bullet hits a pane of glass, will the glass crack? How is the stress distributed around the crack? After the crack appears, does the glass still have enough load-bearing capacity to remain in use?

This is a very complex calculation, so it is fair to say that the advisor's algorithm was very good. Of course, those of us who do numerical computing push the pursuit of performance to the extreme. In applied work, however, development cost is also a very important consideration.

Our company's hydrodynamics simulation software has always been very performance oriented. At the same time, as a commercial software company, we also care a great deal about development cost. Everything from computing performance to visualization performance takes time and experience to analyze.

Today we talk about performance.

About performance and development

1. What are the principles of the beauty of programming?

To speak of beauty and ugliness is to draw a distinction, and any distinction needs a standard, that is, an accurate measurement.

As an algorithm engineer, it is natural to implement the function first and address performance second. Functional requirements are usually well described and the standard is relatively clear, whereas performance work never ends.

In general, our criterion is this: once the function meets the customer's needs, we take the user experience of comparable software currently on the market as the standard. If other products make the user wait 0.1 seconds for an operation, ours should finish within 0.1 seconds.

2. What sacrificing performance buys: freedom in how work is organized and a broader talent pool

In short, there are three aspects:

(1) We can organize development work more easily, so that architects and algorithm engineers have more tools and options.

(2) With lower performance requirements, the skill level required of developers drops considerably, which can greatly reduce personnel costs.

(3) Development is more efficient: shorter development cycles are easier to achieve, and the algorithms are better organized.

Python performance test

1. Speed of logical tests and basic operations

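import time

def get_time():
    # perf_counter is a high-resolution clock meant for timing short intervals
    return time.perf_counter()

a = 1
b = 2

t0 = get_time()
for i in range(100000):     # addition used as a condition
    if a + b:
        pass
t1 = get_time()
for i in range(100000):     # comparison plus assignment
    c = a > b
t2 = get_time()
for i in range(100000):     # comparison used as a condition
    if a > b:
        pass
t3 = get_time()

print("add time:", t1 - t0)
print(">   time:", t2 - t1)
print("if  time:", t3 - t2)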

Output:

add time: 0.005983114242553711
>   time: 0.008975982666015625
if  time: 0.00628209114074707

It can be seen that the conditional test and the basic operation are of the same order of magnitude. Each loop iteration takes roughly 6e-8 seconds, that is, about 6e-5 milliseconds. The second loop costs about 3e-8 seconds more per iteration because of the assignment.
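Per-operation costs like these can also be measured with the standard library's timeit module, which repeats a statement many times and makes it easy to take the best of several runs. The following is only a minimal sketch: the statement labels and the per_op dictionary are names chosen for this illustration, and the numbers will differ from machine to machine.

```python
import timeit

a, b = 1, 2
N = 100_000  # executions per measurement, matching the loops above

# Statements to time; the labels are illustrative names only.
statements = {
    "add": "a + b",
    "compare": "a > b",
    "compare+assign": "c = a > b",
}

per_op = {}
for label, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(stmt, globals={"a": a, "b": b}, number=N, repeat=5))
    per_op[label] = best / N  # seconds per single execution
    print(f"{label:>15}: {per_op[label]:.2e} s per operation")
```

Taking the minimum of several repeats reduces the influence of other processes on the measurement.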

2. Python's built-in max versus an if statement

Code:

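import time

def get_time():
    # perf_counter is a high-resolution clock meant for timing short intervals
    return time.perf_counter()

def max1(a, b):
    # Hand-written two-value maximum
    if a > b:
        return a
    else:
        return b

a = 1
b = 2
c = 0

t0 = get_time()
for i in range(100000):     # direct if statement (baseline)
    if a > b:
        c = a
    else:
        c = b
t1 = get_time()
for i in range(100000):     # built-in max
    c = max(a, b)
t2 = get_time()
for i in range(100000):     # if statement wrapped in a function
    c = max1(a, b)
t3 = get_time()

print("check time:", t1 - t0, "rate:", "1")
print("max   time:", t2 - t1, "rate", (t2 - t1) / (t1 - t0))
print("max1  time:", t3 - t2, "rate", (t3 - t2) / (t1 - t0))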

Here we compare two values in three ways: with a direct if statement, with Python's built-in max function, and with the if statement wrapped in a function, max1.

Let's look at the results:

check time: 0.009987831115722656 rate: 1
max   time: 0.019948720932006836 rate 1.9973025876062256
max1  time: 0.014958620071411133 rate 1.497684522104459

The direct if statement is the fastest, so we take it as the baseline. max takes about 2 times as long and max1 about 1.5 times as long. So a function you write yourself is not necessarily slower than a built-in one. This does not mean that max is needlessly slow: it does extra work because it handles more cases than a comparison of two values.
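The extra work in max presumably comes from the additional call forms it supports. The sketch below shows a few of them next to a two-argument function in the style of max1. It only illustrates the difference in generality and makes no claim about how max is implemented internally.

```python
def max1(a, b):
    """Two-argument maximum, in the style of the article's max1."""
    return a if a > b else b


# The built-in max accepts several call forms that max1 does not.
print(max(1, 2))                     # two positional arguments
print(max([3, 7, 5]))                # a single iterable
print(max("apple", "fig", key=len))  # a key function for the comparison
print(max([], default=0))            # a default for an empty iterable
print(max1(1, 2))                    # max1 handles exactly two values
```

If only two values are ever compared, none of this generality is used, which fits the point that a narrower hand-written function can keep up with the built-in.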

So the best choice is to use exactly what you need and nothing more. But why compare things this way? Here max1 is clearly slower than the direct if statement, yet wrapping code in a function is not always slower, as we will see further on.

3. Class example

Today I posted a micro headline about judging whether a point lies inside a rectangular area, and that post prompted me to find out which approach performs faster. Let's test it!

The test mainly compares the performance of a distance-function check against direct comparisons, in three variants:

Use a distance defined with max and abs
Use an if statement instead of max
Use only if statements

The code is long.

All three classes inherit from a Rect base class that stores the rectangle's x, y, w and h.
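For readers who want to run the listings, a minimal Rect could look like the sketch below. This is a hypothetical stand-in, not the author's class: the attributes x, y, w and h follow from the constructor calls in the subclasses, while wph and radius are added here only on the assumption that the base class provides them, because Area2 reads self.wph and self.radius.

```python
class Rect:
    """Hypothetical base class: an axis-aligned rectangle.

    (x, y) is taken to be the corner with the smallest coordinates,
    w the width and h the height. wph and radius are assumptions,
    included only because Area2 reads them.
    """

    def __init__(self, x, y, w, h):
        self.x = x
        self.y = y
        self.w = w
        self.h = h
        self.wph = w / h     # width-to-height ratio used by the norm
        self.radius = w / 2  # norm of a corner, measured from the center


rect = Rect(10, 10, 10, 10)
print(rect.x, rect.y, rect.w, rect.h, rect.wph, rect.radius)
```

The value w / 2 for radius follows from the norm used in the listings: at the corner (x, y) the horizontal offset from the center is w/2 and the scaled vertical offset is (w/h) * (h/2), which is also w/2.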

The first case uses the distance defined with max and abs. The class name is Area1.

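class Area1(Rect):
    def __init__(self, x, y, w, h):
        super().__init__(x, y, w, h)

    @property
    def center(self):
        # Recomputed on every access
        return self.x + self.w/2, self.y + self.h/2

    def norm(self, pos=None):
        """
        Define a distance: a max-norm measured from the center, with the
        vertical offset scaled by w/h. With pos=None, return the norm of
        the corner (x, y), which serves as the rectangle's "radius".
        """
        wph = self.w / self.h
        if pos is not None:
            norm0 = max(abs(pos[0] - self.center[0]), wph * abs(pos[1] - self.center[1]))
            return norm0
        else:
            radius = max(abs(self.x - self.center[0]), wph * abs(self.y - self.center[1]))
            return radius

    def __contains__(self, pos):
        # Assumed implementation, by analogy with Area2: the test code
        # needs `pos in area1` to work.
        return self.norm(pos) <= self.norm()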

The second case is the Area2 class, which replaces max with an if statement.

class Area2(Rect):
    def __init__(self, x, y, w, h):
        super().__init__(x, y, w, h)
        # self.wph (w / h) and self.radius (norm of a corner) are used below;
        # they are assumed to be set by Rect.

    @property
    def center(self):
        return self.x + self.w/2, self.y + self.h/2

    def norm(self, pos):
        """
        Define a distance (same as Area1.norm, with if in place of max)
        """
        if abs(pos[0] - self.center[0]) > self.wph * abs(pos[1] - self.center[1]):
            return abs(pos[0] - self.center[0])
        return self.wph * abs(pos[1] - self.center[1])

    def __contains__(self, pos):
        # Inside when the distance from the center does not exceed the radius
        return self.norm(pos) <= self.radius

The third case is the Area0 class, which uses only if statements.

class Area0(Rect):
    def __init__(self, x, y, w, h):
        super().__init__(x, y, w, h)

    @property
    def center(self):
        return self.x + self.w/2, self.y + self.h/2

    def __contains__(self, pos):
        # Return as soon as one coordinate falls outside the rectangle
        if pos[0] < self.x:
            return False
        if pos[0] > self.x + self.w:
            return False
        if pos[1] < self.y:
            return False
        if pos[1] > self.y + self.h:
            return False

        return True

In terms of code organization, I prefer the Area1 class. Take a closer look at the differences between these listings: they differ only in the operations they use, which lets us see the performance characteristics of Python code.
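The distance-based test and the direct comparisons should accept exactly the same points, since the scaled max-norm is at most w/2 precisely when both coordinates lie within the rectangle. The sketch below checks this on a small grid. The two standalone functions are illustrative rewrites of the Area0 and Area1 logic, not the classes themselves, and the 10 x 4 rectangle is an arbitrary example.

```python
def contains_direct(x, y, w, h, pos):
    """Area0-style test: four coordinate comparisons."""
    return x <= pos[0] <= x + w and y <= pos[1] <= y + h


def contains_norm(x, y, w, h, pos):
    """Area1-style test: scaled max-norm distance from the center."""
    cx, cy = x + w / 2, y + h / 2
    wph = w / h
    norm = max(abs(pos[0] - cx), wph * abs(pos[1] - cy))
    return norm <= w / 2


# Compare both tests on a grid of integer points around a 10 x 4 rectangle.
rect = (10, 10, 10, 4)
points = [(px, py) for px in range(5, 26) for py in range(5, 20)]
mismatches = [p for p in points
              if contains_direct(*rect, p) != contains_norm(*rect, p)]
print("mismatches:", mismatches)
```

With arbitrary floating-point dimensions the two tests could disagree for points lying exactly on the boundary because of rounding, so the check uses values that are exactly representable.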

Test code:

import time

if __name__ == "__main__":
    area0 = Area0(10, 10, 10, 10)
    area1 = Area1(10, 10, 10, 10)
    area2 = Area2(10, 10, 10, 10)
    pos1 = (1, 2)   # a point outside the rectangle

    def check1():
        # Same loop as the Area1 test, wrapped in a function
        for i in range(100000):
            if pos1 in area1:
                pass

    t0 = time.perf_counter()
    for i in range(100000):
        if pos1 in area0:
            pass
    t1 = time.perf_counter()
    for i in range(100000):
        if pos1 in area1:
            pass
    t2 = time.perf_counter()
    for i in range(100000):
        if pos1 in area2:
            pass
    t3 = time.perf_counter()
    check1()
    t4 = time.perf_counter()

    dt = t1 - t0    # Area0 is the baseline
    print("Area0:", t1 - t0, "rate:", "1")
    print("Area1:", t2 - t1, "rate:", (t2 - t1) / dt)
    print("Area2:", t3 - t2, "rate:", (t3 - t2) / dt)
    print("check1:", t4 - t3, "rate:", (t4 - t3) / dt)

Take a look at the results:

Area0: 0.017013072967529297 rate: 1
Area1: 0.23638176918029785 rate: 13.894125395891141
Area2: 0.12012887001037598 rate: 7.060974242551641
check1: 0.22240829467773438 rate: 13.072787914459486

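Again, the direct if test (Area0) is the baseline, with rate 1.

The direct test is indeed the fastest, and by a wide margin. In other words, the algorithm I considered very elegant performs poorly: it is about an order of magnitude slower than the plain comparisons.

Why? Because it calls Python's built-in functions (max and abs) and the center property, each of which costs more basic operations than a plain comparison. In addition, for this test point (1, 2), Area0 returns after its very first comparison.

It is also worth noting that check1 merely wraps the Area1 loop in a function, yet it runs slightly faster. The gain is small, but it at least suggests that wrapping a complex process in a function does not cost performance. A likely reason is that the loop variable inside a function is a local variable, which CPython accesses faster than a module-level name.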
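The effect of wrapping a loop in a function can be isolated with an empty loop body. The sketch below is a rough illustration under stated assumptions: the module-level loop is emulated by compiling a code string in "exec" mode, the names module_code and function_loop are invented for the example, and the measured difference depends on the Python version and the machine.

```python
import time

N = 1_000_000

# Module-level style loop: compiled in "exec" mode, so the loop variable
# is stored by name in the namespace dictionary on every iteration.
module_code = compile("for i in range(N):\n    pass\n", "<module-loop>", "exec")


def function_loop():
    # Inside a function the loop variable is a local variable.
    for i in range(N):
        pass


t0 = time.perf_counter()
exec(module_code, {"N": N})
t1 = time.perf_counter()
function_loop()
t2 = time.perf_counter()

module_time = t1 - t0
function_time = t2 - t1
print(f"module-level loop: {module_time:.4f} s")
print(f"function loop:     {function_time:.4f} s")
```

On CPython the function version is usually the faster of the two, but a single run like this is noisy, so small differences should not be over-interpreted.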

Conclusion and performance improvement tips

Assignment, comparison and basic arithmetic are of the same order of magnitude. On my computer each takes on the order of a ten-millionth of a second, so they can be used freely.
Write only the functions you need and no more. This pulls against good code organization, so it is a trade-off.
Move operations that do not depend on the loop outside the loop.
For a complex process, wrapping it in a function does not cost performance.
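The tip about moving loop-independent work out of the loop applies to the Area1 listing: the center and the w/h ratio do not change between calls, yet they are recomputed on every call to norm. A minimal sketch of the idea follows, assuming the rectangle is never modified after construction. CachedArea is a hypothetical class written for this illustration, not one of the article's classes, and it has not been timed here.

```python
class CachedArea:
    """Hypothetical variant of Area1 that precomputes invariant values."""

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h
        # Computed once here instead of on every norm() call.
        self.cx = x + w / 2
        self.cy = y + h / 2
        self.wph = w / h
        self.radius = w / 2  # norm of a corner, measured from the center

    def norm(self, pos):
        return max(abs(pos[0] - self.cx), self.wph * abs(pos[1] - self.cy))

    def __contains__(self, pos):
        return self.norm(pos) <= self.radius


area = CachedArea(10, 10, 10, 10)
print((1, 2) in area)    # a point outside the rectangle
print((15, 15) in area)  # the center of the rectangle
```

The trade-off is that the cached values go stale if x, y, w or h are changed later, which is why the sketch assumes an immutable rectangle.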

Python performance is a very complex topic, and I hope this article gives you a first feel for it. We often find that elegant algorithms and elegant code organization are obtained at the expense of performance.

In talking with many programmers, we found that most blame performance problems on the language itself, so most complaints are about the language. But no language, however powerful, can replace good algorithms, for example in fields such as AI.

A good algorithm brings unmatched improvement to software. Pursuing it is part of a programmer's self-cultivation, and it is also environmentally responsible, because computing consumes resources.

Some people say performance counts for 70%, while others think performance is unimportant and prefer elegance. What do you think?

Posted by Codewarrior123 on Wed, 06 May 2020 00:11:36 -0700