Python Comprehensions in Detail

Keywords: Python

Python has a distinctive comprehension syntax, a form of syntactic sugar that lets you write more concise and elegant code in certain situations, and it can run faster than the equivalent loop. It is mainly used to initialize lists, as well as sets and dictionaries.

1. Types of comprehensions and their usage

1.1 List comprehensions

A list comprehension is a quick way to build a list. It is written inside '[]', for example:

>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This is the most basic use. A list comprehension runs the for loop and collects each element (or the value of an expression computed from it) into a new list.

>>> [i*i for i in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

It's equivalent to

>>> l = []
>>> for i in range(10):
...   l.append(i*i)
...
>>>

We can also use a list comprehension to initialize a two-dimensional array quickly. Written out by hand, or built with nested loops, a 3x3 array of zeros looks like this:

m = [[0,0,0],
     [0,0,0],
     [0,0,0]
     ]

n = []
for row in range(3):
    r = []
    for col in range(3):
        r.append(0)
    n.append(r)
print(n)

The following comprehension produces the same two-dimensional array:

>>> [[0]*3 for i in range(3)]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
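One detail worth spelling out, as a small sketch: the comprehension is used here rather than the shorter [[0]*3]*3 because the latter repeats a reference to one and the same inner list. The variable names shared and independent are only illustrative.

```python
# Repeating the outer list copies references to ONE inner list.
shared = [[0] * 3] * 3
shared[0][0] = 1

# The comprehension evaluates [0] * 3 once per iteration,
# so each row is a separate list.
independent = [[0] * 3 for _ in range(3)]
independent[0][0] = 1

print(shared)       # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```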

List comprehensions come in several forms.

if...else... before the for loop

This form does not reduce the number of elements; it only selects a different expression for each element produced by the for loop.

# If i is a multiple of 5 the result is i, otherwise 0
>>> [i if i % 5 == 0 else 0 for i in range(20)]
[0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 10, 0, 0, 0, 0, 15, 0, 0, 0, 0]

# Add 100 to even numbers and subtract 100 from odd numbers
>>> [i+100 if i % 2 == 0 else i-100 for i in range(10)]
[100, -99, 102, -97, 104, -95, 106, -93, 108, -91]

if... after the for loop

This form keeps only the elements that satisfy the condition, so the number of elements depends on the condition.

# Keep only the even numbers
>>> [i for i in range(10) if i % 2 == 0]
[0, 2, 4, 6, 8]
# Keep only the numbers that are multiples of both 2 and 3
>>> [i for i in range(10) if i % 2 == 0 and i % 3 == 0]
[0, 6]
# Keep only the even numbers and apply str to each of them
>>> [str(i) for i in range(10) if i % 2 == 0]
['0', '2', '4', '6', '8']
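The two forms can also be combined in one comprehension. The following is a minimal sketch with made-up values (the label "fizz" is purely illustrative): the trailing if filters the elements, and the leading if...else... chooses the expression for each element that survives the filter.

```python
# Trailing "if": keep only even numbers.
# Leading "if ... else ...": pick the value stored for each kept number.
labels = ["fizz" if i % 3 == 0 else str(i) for i in range(10) if i % 2 == 0]
print(labels)  # ['fizz', '2', '4', 'fizz', '8']
```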

Nested loops

To flatten a two-dimensional matrix such as m below, we can use nested loops.

m = [[1,2,3],
     [4,5,6],
     [7,8,9]
     ]

n = []
for row in m:
    for col in row:
        n.append(col)
print(n)

With a list comprehension, the variable row from the outer for clause can be used in the inner for clause:

m = [[1,2,3],
     [4,5,6],
     [7,8,9]
     ]
n = [col for row in m for col in row]
print(n)

Another example is the following

>>> [a + b for a in '123' for b in 'abc']
['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']
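As a sketch of how to read such comprehensions: the for clauses run left to right in the same order as the equivalent nested loops. The second half is an additional illustration, not from the original text, of a comprehension over zip(*m), which builds a list of lists (here the transpose of m) instead of a flat list.

```python
pairs = [a + b for a in '123' for b in 'abc']

pairs_loop = []
for a in '123':        # first for clause = outer loop
    for b in 'abc':    # second for clause = inner loop
        pairs_loop.append(a + b)

print(pairs == pairs_loop)  # True

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
# zip(*m) yields the columns of m as tuples; each one is turned into a list.
transposed = [list(col) for col in zip(*m)]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```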

More usages

List comprehensions are flexible. You don't need to master every variation, but you should be able to read them.

>>> dic = {"k1":"v1","k2":"v2"}
>>> a = [k+":"+v for k,v in dic.items()]
>>> a
['k1:v1', 'k2:v2']

1.2 Set comprehensions

The syntax of a set comprehension is the same as that of a list comprehension, except that it uses "{}", and the resulting set automatically removes duplicates.

>>> { i for i in range(10)}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> { 0 if i % 2 == 0 else 1 for i in range(10)}
{0, 1}

1.3 Dictionary comprehensions

The syntax of a dictionary comprehension is similar to the others, except that the leading expression has the form key:value. Keys are also deduplicated.

>>> { i : i.upper() for i in 'hello world'}
{'h': 'H', 'e': 'E', 'l': 'L', 'o': 'O', ' ': ' ', 'w': 'W', 'r': 'R', 'd': 'D'}
>>> { str(i) : i*i for i in range(10)}
{'0': 0, '1': 1, '2': 4, '3': 9, '4': 16, '5': 25, '6': 36, '7': 49, '8': 64, '9': 81}
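A short sketch of what key deduplication means in practice: when two elements produce the same key, the later value overwrites the earlier one. The word list is invented for illustration. The second part shows a common use, swapping the keys and values of an existing dictionary.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# The key is the first letter; later words overwrite earlier ones
# that share the same key.
by_initial = {w[0]: w for w in words}
print(by_initial)  # {'a': 'avocado', 'b': 'blueberry', 'c': 'cherry'}

# Swap keys and values of an existing dictionary.
squares = {str(i): i * i for i in range(4)}
inverted = {v: k for k, v in squares.items()}
print(inverted)  # {0: '0', 1: '1', 4: '2', 9: '3'}
```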

1.4 Tuple comprehensions? They don't exist

Since list comprehensions are written with [], can a tuple comprehension be written with ()? No, because () is already used for a different kind of object: a generator.

>>> a = (i for i in range(10))
>>> print(a)
<generator object <genexpr> at 0x000001A6100869C8>
>>> type(a)
<class 'generator'>

A generator is an object that produces its elements one at a time. It can only be accessed sequentially, can only move forward, and can only be traversed once.

The next() function returns the next element and raises a StopIteration exception when no elements are left. A for loop can also be used to traverse a generator.

A generator cannot be indexed with subscripts, and calling next() repeatedly eventually raises an exception:

>>> a = (i for i in range(0,2))
>>> a[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
>>> next(a)
0
>>> next(a)
1
>>> next(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
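If the exception is inconvenient, the built-in next() also accepts a default value as its second argument. A minimal sketch, using None as an arbitrary sentinel:

```python
a = (i for i in range(0, 2))

# The second argument is returned instead of raising StopIteration
# once the generator has no more elements.
first = next(a, None)
second = next(a, None)
third = next(a, None)

print(first, second, third)  # 0 1 None
```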

Traversing with a for loop

>>> a = (i for i in range(0,2))
>>> for i in a:
...   print(i)
...
0
1

Calling next() first, then using a for loop

>>> a = (i for i in range(0,3))
>>> next(a)
0
>>> for i in a:
...   print(i)
...
1
2
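The "traversed only once" property can be seen in a small sketch like the following (the variable names are illustrative): a second pass over the same generator yields nothing.

```python
squares = (i * i for i in range(4))

first_pass = list(squares)   # consumes every element
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []
```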

We can pass a generator expression to list, tuple, set, and so on, although for list and set this is unnecessary because the comprehension forms already exist. To initialize a tuple, pass the generator expression to tuple(). No extra parentheses are needed.

>>> a = tuple(i for i in range(0,3))
>>> a
(0, 1, 2)
>>> a = tuple( (i for i in range(0,3)) )
>>> a
(0, 1, 2)

Generators are evaluated lazily: an element is computed only when you actually ask for it. The advantage is that this saves memory; the disadvantage is that random access is not possible.
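A rough sketch of the memory difference, assuming CPython: sys.getsizeof reports only the size of the container object itself, which is enough to show that the list grows with the number of elements while the generator object stays small. The size n is an arbitrary choice.

```python
import sys

n = 100_000
as_list = [i for i in range(n)]  # all elements exist in memory now
as_gen = (i for i in range(n))   # nothing has been computed yet

# getsizeof measures the container only, not the integers it refers to.
list_is_bigger = sys.getsizeof(as_list) > sys.getsizeof(as_gen)
print(list_is_bigger)  # True

# A generator expression can be passed straight to a consuming function.
# No extra parentheses are needed when it is the only argument.
total = sum(i for i in range(n))
print(total == sum(as_list))  # True
```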

2. Performance of comprehensions

2.1 List comprehensions versus loops

Let's use the timeit module to compare performance.

import timeit

def getlist1():
    l = []
    for i in range(10000):
        l.append(i)
    return l

def getlist2():
    return [i for i in range(10000)]

# 10,000 executions each
t1 = timeit.timeit('getlist1()',"from __main__ import getlist1", number=10000)
t2 = timeit.timeit('getlist2()',"from __main__ import getlist2", number=10000)

print('Loop:',t1)
print('Comprehension:',t2)

The results are as follows:

Loop: 5.343517699991935
Comprehension: 2.6003115000057733

As you can see, the loop is about twice as slow as the comprehension. Why is that? Let's look at the difference by disassembling both functions. The dis module disassembles Python code into bytecode.

The line numbers in the disassembly refer to the source listing above.

In the disassembly of getlist1, the number in the left-hand column is the source line number. The bytecode for line 6 of the source, l.append(i), shows that each iteration looks up the append method, loads its argument, and then calls it, and a call is relatively expensive.

Now look at the disassembly of the list comprehension in getlist2.

First, the list comprehension produces fewer bytecode instructions than the loop with append. Second, the comprehension does not go through a method call; it uses the LIST_APPEND instruction directly, as described in the official Python documentation.
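The comparison can be reproduced with a sketch along these lines. It assumes CPython; the exact opcode names for the method call differ between versions, so the check below only looks for LIST_APPEND. The helper opnames is a hypothetical name introduced here, and it also walks nested code objects because before Python 3.12 the body of a comprehension is compiled into a separate code object.

```python
import dis
import types

def getlist1():
    l = []
    for i in range(10000):
        l.append(i)
    return l

def getlist2():
    return [i for i in range(10000)]

def opnames(func):
    """Return the set of opcode names in func, including nested code objects."""
    names = set()
    pending = [func.__code__]
    while pending:
        code = pending.pop()
        names.update(instr.opname for instr in dis.get_instructions(code))
        # Nested code objects (e.g. a comprehension body) are stored in co_consts.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names

print("LIST_APPEND" in opnames(getlist2))  # True
print("LIST_APPEND" in opnames(getlist1))  # False
```

To read the full listings, call dis.dis(getlist1) and dis.dis(getlist2).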

Strictly speaking, that description is misleading: LIST_APPEND in bytecode is completely different from calling append in Python code, even though the two have the same effect and few people care about the underlying details. In 2008 a patch was submitted to Python in the hope that list.append() would be optimized automatically into LIST_APPEND instead of going through a function call, but it has not been adopted.

The proposer wanted to add compile-time options such as -O1 and -O2, like the ones gcc offers for different optimization levels. CPython currently has no such levels (its existing -O and -OO switches essentially only remove assert statements and docstrings), because performance is not a primary concern for most Python developers.

If we replace the list above with a set or a dictionary, the difference becomes even larger, so using comprehensions wherever possible can improve performance.
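To check this on your own machine, a sketch such as the following can be used. The function names are hypothetical, and no timings are shown because they depend on the Python version and hardware.

```python
import timeit

def set_loop():
    s = set()
    for i in range(10000):
        s.add(i)
    return s

def set_comp():
    return {i for i in range(10000)}

def dict_loop():
    d = {}
    for i in range(10000):
        d[i] = i
    return d

def dict_comp():
    return {i: i for i in range(10000)}

# Passing the callable directly avoids the "from __main__ import ..." setup string.
for func in (set_loop, set_comp, dict_loop, dict_comp):
    print(func.__name__, timeit.timeit(func, number=200))
```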

2.2 List comprehensions versus generator expressions

Strictly speaking, the two are not comparable, because they do not produce the same kind of result. We can easily predict that creating a generator is faster than building a list, but that traversing a generator is slower than traversing a list.

import timeit

def getlist1():
    return [i for i in range(10000)]

def getlist2():
    return (i for i in range(10000))

# 10,000 executions each
t1 = timeit.timeit('getlist1()',"from __main__ import getlist1", number=10000)
t2 = timeit.timeit('getlist2()',"from __main__ import getlist2", number=10000)

print('List:',t1)
print('Generator:',t2)

def getlist11():
    a = [i for i in range(10000)]
    sum = 0
    for i in a:
        sum += i

def getlist22():
    a = (i for i in range(10000))
    sum = 0
    for i in a:
        sum += i

# 10,000 executions each
t1 = timeit.timeit('getlist11()',"from __main__ import getlist11", number=10000)
t2 = timeit.timeit('getlist22()',"from __main__ import getlist22", number=10000)

print('List:',t1)
print('Generator:',t2)

Execution results:

List: 2.5977418000111356
Generator: 0.006076899997424334
List: 6.336311199993361
Generator: 9.1819036995019

Creating a generator is far faster than building a list, but traversing a generator is slower than traversing a list. On balance, generators still look attractive. Don't forget, however, that a generator does not support random access and can only be used once. Neither is simply better than the other; the point is to use the right type of object in the right place.

Posted by Fireglo on Sun, 22 Mar 2020 10:20:32 -0700

Programmer Group