Python - iterators and generators

Keywords: Python Programming

Introduction

If I have a list l=['a','b','c','d','e'], how can I get at its contents?

First, I can use an index, such as l[0]. Second, I can use a for loop to get each value.

Have you ever thought carefully about this? There is a subtle difference between getting values by index and getting them with a for loop.

With an index, you can get the value at any position, provided you know where the value is.

With a for loop, we get every value without needing to care about its position. However, we can only get the values in sequence; we cannot skip ahead and fetch a particular value directly.

But have you ever wondered why a for loop can fetch values at all?

How does the for loop work internally?

Iterators

The for loop in Python

To understand what the for loop in Python is doing, let's start with some code.

First, we loop over a list.

for i in [1,2,3,4]:  
    print(i)

This code works fine. Now let's try looping over the number 1234 instead:

for i in 1234:
    print(i)

Result:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    for i in 1234:
TypeError: 'int' object is not iterable

It fails! What does the error say? "TypeError: 'int' object is not iterable", which means the int type is not iterable. So what is an "iterable"?

If you don't know what "iterable" means, you could look it up in a dictionary first. Even after reading the definition you may still not quite get it, but that doesn't matter: I'll walk you through it step by step.

Iterables and the iterable protocol

What is an iterable

Now we have a new clue: a concept called "iterable".

First, let's analyze the error. It seems 1234 can't be looped over with for because it isn't iterable. So if something is iterable, it should be possible to loop over it with for.

We know that strings, lists, tuples, dictionaries and sets can all be looped over with for, which suggests that they are all iterable.

How can we prove that?

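from collections.abc import Iterable  # Iterable lives in collections.abc (the old collections alias was removed in Python 3.10)

l = [1,2,3,4]
t = (1,2,3,4)
d = {1:2,3:4}
s = {1,2,3,4}

print(isinstance(l,Iterable))  # True
print(isinstance(t,Iterable))  # True
print(isinstance(d,Iterable))  # True
print(isinstance(s,Iterable))  # True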

Combining this with what we saw when looping with for, and taking the word literally, iteration is exactly what we just described: extracting the data in a collection "one by one".

The iterable protocol

So far we have reasoned backward from results: whatever can be looped over with for is "iterable". But think about it the other way round: how does for know what is iterable?

If we write a data type ourselves and want its contents to be retrievable one by one with for, the type must meet for's requirements. Such a requirement is called a "protocol".

The requirement an object must satisfy to be iterable is called the iterable protocol. Its definition is very simple: the object implements an __iter__ method.

Next, let's verify:

print(dir([1,2]))
print(dir((2,3)))
print(dir({1:2}))
print(dir({1,2}))
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
['__and__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']
Result: the four lists above are the output, and each of them includes '__iter__'.

To sum up what we know now: anything that can be looped over with for is iterable, and to be iterable, an object must have an __iter__ method.
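The protocol can also be checked from the other direction: a type we write ourselves becomes iterable just by implementing __iter__. The following is a minimal sketch; the class name Bookshelf is hypothetical, and delegating to the internal list's own iterator is only one possible way to implement __iter__.

```python
from collections.abc import Iterable


class Bookshelf:
    """A hypothetical container whose books can be looped over with for."""

    def __init__(self, *books):
        self._books = list(books)

    def __iter__(self):
        # Implementing __iter__ is what makes the object iterable.
        # Here we simply hand back the iterator of the internal list.
        return iter(self._books)


shelf = Bookshelf('Python', 'Go', 'Rust')
print(isinstance(shelf, Iterable))  # True: the Iterable check looks for __iter__
for book in shelf:
    print(book)
```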

So what does the __iter__ method do?

print([1,2].__iter__())

Result:
<list_iterator object at 0x1024784a8>

Calling the __iter__ method of the list [1,2] gives us a list iterator (list_iterator). This introduces a new term: iterator.

"Iterator" is a standard computing term for an object that hands out the elements of a collection one at a time.

The iterator protocol

What is an "iterable", and what is an "iterator"?

We don't yet know exactly what an iterator is, but we do have one now: a list iterator.

Let's see which methods this list iterator implements that the list itself does not; that should help uncover the mystery of the iterator.

'''
dir([1,2].__iter__()) lists all the methods implemented by the list iterator, and dir([1,2]) lists all the methods implemented by the list; both are returned as lists. To see the difference more clearly, we convert each into a set
and take the set difference.
'''
#print(dir([1,2].__iter__()))
#print(dir([1,2]))
print(set(dir([1,2].__iter__()))-set(dir([1,2])))

Result:
{'__length_hint__', '__next__', '__setstate__'}

The list iterator has three extra methods. What does each of them do?

iter_l = [1,2,3,4,5,6].__iter__()
# Number of elements left in the iterator
print(iter_l.__length_hint__())     # 6
# Set the index from which iteration continues (returns None)
print('*',iter_l.__setstate__(4))   # * None
# Take values one at a time
print('**',iter_l.__next__())       # ** 5
print('***',iter_l.__next__())      # *** 6

Of these three methods, which one is the magic that lets us take values one at a time?

That's right: it's __next__.

Internally, the for loop calls the __next__ method to get each value.

Next, let's use the iterator's __next__ method to traverse the list without a for loop.

l = [1,2,3,4]
l_iter = l.__iter__()
item = l_iter.__next__()
print(item)
item = l_iter.__next__()
print(item)
item = l_iter.__next__()
print(item)
item = l_iter.__next__()
print(item)
item = l_iter.__next__()  # Fifth call: the list is exhausted, so this raises StopIteration
print(item)

This code raises an error: once the iterator has no elements left, calling __next__ again raises a StopIteration exception to tell us there is nothing more to fetch.

To deal with this, we use exception handling:

l = [1,2,3,4]
l_iter = l.__iter__()
while True:
    try:
        item = l_iter.__next__()
        print(item)
    except StopIteration:
        break

Now we have used a while loop to reproduce what the for loop does. Which object do we get the values from? From l_iter. Right: l_iter is an iterator.

An iterator follows the iterator protocol: it must have both an __iter__ method and a __next__ method.
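To see the iterator protocol from the implementer's side, here is a minimal sketch of a hand-written iterator. The class name Countdown is hypothetical; the essential parts are that __iter__ returns the iterator itself and that __next__ raises StopIteration when the values run out.

```python
class Countdown:
    """A hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__,
        # so it can be used directly in a for loop.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more values.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```

Because this class keeps its position in self.current, a Countdown object can be traversed only once, just like the list iterator above.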

Settling the debt: the __next__ and __iter__ methods

With that, we have settled the two methods behind iterables and iterators, __iter__ and __next__. Finally, let's see what range() is. It is certainly iterable, but is it an iterator? Let's test it:

print('__next__' in dir(range(12)))  # Does the object returned by range() have __next__? False
print('__iter__' in dir(range(12)))  # Does it have __iter__? True

from collections.abc import Iterator
print(isinstance(range(100000000),Iterator))  # False: the result of range() is not an iterator

# So range() returns an iterable, not an iterator
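A short sketch of why the distinction matters in practice: an iterable such as a range object can be traversed again and again, because each traversal calls iter() on it and gets a fresh iterator, whereas an iterator is used up after one pass.

```python
r = range(3)          # an iterable, not an iterator
it = iter(r)          # calling iter() gives a fresh iterator

# An iterable can be traversed again and again:
print(list(r), list(r))    # [0, 1, 2] [0, 1, 2]

# An iterator is used up after one pass:
print(list(it), list(it))  # [0, 1, 2] []

# Each call to iter() on an iterable creates an independent iterator,
# while calling iter() on an iterator returns the same object.
print(iter(r) is iter(r))  # False
print(iter(it) is it)      # True
```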

Why have a for loop at all?

Looking at the list traversals above, a sharp reader will spot the catch right away and protest: "Are you kidding me? With indexing, I can traverse a list like this:"

l=[1,2,3]

index=0
while index < len(l):
    print(l[index])
    index+=1

# Who needs a for loop? Who needs iterables? Who needs iterators?

True: the sequence types (strings, lists and tuples) all have indexes, so you can access them this way. Perfect! But spare a thought for non-sequence types such as dictionaries, sets and file objects, which have no indexes. So, young friend, the for loop is built on the iterator protocol, which provides one uniform way to traverse all of these objects: before traversing, it calls the object's __iter__ method to turn it into an iterator, and then uses the iterator protocol to visit the elements one after another. That is why all of these objects can be traversed with a for loop, with exactly the same effect. This is the all-purpose for loop. Wake up, young friend!
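The two steps just described can be written out by hand. This sketch uses the built-ins iter() and next(), which call an object's __iter__ and __next__ methods; manual_for is a hypothetical helper name. Because it never uses indexes, it works just as well for dictionaries and sets.

```python
def manual_for(iterable):
    """Collect items in the order a for loop would visit them (illustrative helper)."""
    items = []
    iterator = iter(iterable)          # step 1: call __iter__ via iter()
    while True:
        try:
            item = next(iterator)      # step 2: call __next__ via next()
        except StopIteration:          # step 3: stop when the iterator is exhausted
            break
        items.append(item)
    return items


print(manual_for({'a': 1, 'b': 2}))   # ['a', 'b']: iterating a dict yields its keys
print(manual_for({3}))                # [3]
print(manual_for('hi'))               # ['h', 'i']
```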

Generators

Getting to know generators

The iterators we have met so far come from two sources: some are returned directly by calling a function (map() and filter(), for example), and others are obtained by calling the __iter__ method of an iterable. One advantage of iterators is that they save memory.

When we need that memory saving for our own data, we have to write the iterator ourselves. An iterator that we write ourselves in this way is called a generator.

Python provides generators in two forms:

1. Generator functions: defined like ordinary functions, but they use a yield statement instead of a return statement to return results. A yield statement returns one result at a time, and between results the function's state is suspended, so that execution can resume where it left off the next time a value is requested.

2. Generator expressions: similar to list comprehensions, but instead of building the whole result list at once, they return an object that produces results on demand.

Generators:

Nature: iterators (so we don't need to implement the __iter__ and __next__ methods ourselves)

Features: lazy evaluation, customized by the developer
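A quick sketch to confirm that both forms really are iterators with __iter__ and __next__ built in. The function name squares is a hypothetical example, not from the original.

```python
from collections.abc import Iterator


def squares(n):
    # A generator function: the yield makes calling it return a generator.
    for i in range(n):
        yield i * i


gen_f = squares(4)                    # generator from a generator function
gen_e = (i * i for i in range(4))     # the equivalent generator expression

for g in (gen_f, gen_e):
    # Both are iterators, and iter() on a generator returns the generator itself.
    print(isinstance(g, Iterator), iter(g) is g)   # True True

print(list(gen_f), list(gen_e))       # [0, 1, 4, 9] [0, 1, 4, 9]
```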

Generator functions

A function that contains the yield keyword is a generator function. Like return, yield hands a value back from the function, but the two differ: executing return ends the function, whereas yield only pauses it. Calling a generator function does not run its body or return a concrete value; it returns a generator, which is an iterator. Each time we take a value from it, the function runs on to the next yield and produces a new value, until the function finishes.

# Getting to know generators (1)
import time

def genrator_fun1():
    a = 1
    print('Now variable a is defined')
    yield a
    b = 2
    print('Now variable b is defined')
    yield b

g1 = genrator_fun1()
print('g1 : ',g1)   # Printing g1 shows that g1 is a generator
print('-'*20)       # A gorgeous dividing line
print(next(g1))
time.sleep(1)       # Sleep for one second so we can watch the execution order
print(next(g1))

What are the benefits of generators? They don't create a large amount of data in memory all at once.

Suppose I ask a factory to make 2,000,000 school uniforms for students. The factory should first agree to the order and then produce, so that I can collect the uniforms one at a time, or in batches as groups of students need them.
It would make no sense for the factory to insist on producing all 2,000,000 uniforms before handing any over: by the time they were done, the students would all have graduated...
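To put a rough number on "not too much data in memory at once", this sketch compares a list comprehension with a generator expression using sys.getsizeof. The sizes reported are implementation details that vary by Python version and platform, and getsizeof measures only the container object itself, so only the order of magnitude matters here.

```python
import sys

n = 1_000_000
as_list = [i for i in range(n)]    # builds all one million items right away
as_gen = (i for i in range(n))     # builds nothing until values are requested

# Size of the container object itself (exact numbers vary by version/platform).
print(sys.getsizeof(as_list))   # several megabytes
print(sys.getsizeof(as_gen))    # a small, fixed size regardless of n
```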

# Getting to know generators (2)

def produce():
    """Produce school uniforms"""
    for i in range(2000000):
        yield "Uniform No.%s"%i

product_g = produce()
print(product_g.__next__()) # Ask for one uniform
print(product_g.__next__()) # One more
print(product_g.__next__()) # One more
num = 0
for i in product_g:         # Then a batch, say five
    print(i)
    num +=1
    if num == 5:
        break

# So far we have collected 8 uniforms from the factory, while our production function (the produce generator function) is able to make 2,000,000.
# Plenty are left: we can keep taking them, or wait until we need more.
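The manual counter and break above can also be written with itertools.islice from the standard library, which takes a fixed-size batch from any iterator without producing the rest. A sketch; the produce function here is a self-contained restatement for illustration, not the original code.

```python
from itertools import islice


def produce():
    """Produce uniforms one at a time, only when asked."""
    for i in range(2000000):
        yield "Uniform No.%s" % i


product_g = produce()
first = next(product_g)              # take one uniform
batch = list(islice(product_g, 5))   # take the next batch of five
print(first)
print(batch)
```

The generator keeps its place between requests, so the next call to next(product_g) continues with uniform No.6.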

More applications

Example: a generator that watches a file for new lines

import time

def tail(filename):
    with open(filename) as f:
        f.seek(0, 2)  # Start from the end of the file
        while True:
            line = f.readline()  # Read a newly appended line, if any
            if not line:
                time.sleep(0.1)  # Nothing new yet: wait briefly and try again
                continue
            yield line

tail_g = tail('tmp')
for line in tail_g:
    print(line, end='')  # line already ends with a newline

send

def generator():
    print(123)
    content = yield 1
    print('=======',content)
    print(456)
    yield 2

g = generator()
ret = g.__next__()
print('***',ret)
ret = g.send('hello')   # send gets the next value just like next, and also passes 'hello' in
print('***',ret)

# send fetches the next value in basically the same way as next,
# except that it also passes a value to the position of the previous yield.
# Precautions when using send:
    # The first time you use a generator, get the first value with next (or send(None))
    # The last yield cannot usefully receive a value: sending to it resumes the generator to its end, which raises StopIteration
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total/count


g_avg = averager()
next(g_avg)
print(g_avg.send(10))
print(g_avg.send(30))
print(g_avg.send(5))
The averager above computes a moving average: each number passed in with send updates the running total and count, and the new average comes back as the value of send.
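Forgetting the initial next() call is a common mistake: sending a non-None value to a generator that has just been created raises TypeError. A decorator can do this "priming" step for us. The following is a sketch; the decorator name coroutine is an assumption, and the same pattern appears as init in the coroutine pipeline at the end of this article.

```python
from functools import wraps


def coroutine(func):
    """Decorator that primes a generator-based coroutine with one next()."""
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)          # advance to the first yield so send() works at once
        return gen
    return primer


@coroutine
def averager():
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield average   # receive a value, hand back the current average
        total += term
        count += 1
        average = total / count


g_avg = averager()             # already primed: no explicit next() needed
print(g_avg.send(10))  # 10.0
print(g_avg.send(30))  # 20.0
print(g_avg.send(5))   # 15.0
```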

yield from

def gen1():
    for c in 'AB':
        yield c
    for i in range(3):
        yield i

print(list(gen1()))

def gen2():
    yield from 'AB'
    yield from range(3)

print(list(gen2()))
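Besides flattening nested loops, yield from also hands back the return value of the sub-generator it delegates to. A small sketch; inner and outer are hypothetical names chosen for illustration.

```python
def inner():
    yield 1
    yield 2
    return 'inner done'     # becomes the value of the yield from expression


def outer(log):
    result = yield from inner()   # yields 1 and 2 straight through to the caller
    log.append(result)
    yield 3


log = []
print(list(outer(log)))  # [1, 2, 3]
print(log)               # ['inner done']
```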

List comprehensions and generator expressions

# Thanks to Brother Feng's powerful arrival, Old Boy is quickly heading for an IPO; alex thinks it over and decides to lay some eggs to repay Brother Feng

egg_list=['Egg%s' %i for i in range(10)] # List comprehension

# Brother Feng looks at alex's basket of eggs, covers his nose and says: Brother, you'd better just give me a hen, and I'll get the eggs at home myself

laomuji=('Egg%s' %i for i in range(10)) # Generator expression
print(laomuji)
print(next(laomuji)) # next() essentially calls __next__
print(laomuji.__next__())
print(next(laomuji))

Conclusion:

1. Replacing the [] of a list comprehension with () gives a generator expression

2. List comprehensions and generator expressions are both convenient to write, but generator expressions save more memory

3. Python does not use the iterator protocol only to make the for loop general: most built-in functions also access objects through it. For example, the built-in sum accesses objects via the iterator protocol, and generators implement the iterator protocol, so we can compute the sum of a series of values directly like this:

sum(x ** 2 for x in range(4))

Instead of building a list:

sum([x ** 2 for x in range(4)]) 
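Other built-ins that accept any iterable work the same way. This sketch (the word list is made-up sample data) passes generator expressions straight to sum, max and any, and shows that any() stops pulling values as soon as it finds a true one.

```python
words = ['apple', 'kiwi', 'banana']

# sum, max and any accept any iterable, so a generator expression can be
# passed directly; as the sole argument it needs no extra parentheses.
total_len = sum(len(w) for w in words)          # 5 + 4 + 6
longest = max(len(w) for w in words)
has_k = any(w.startswith('k') for w in words)

print(total_len, longest, has_k)  # 15 6 True

# Because values are produced on demand, any() can stop early:
checked = []


def starts_with_k(word):
    checked.append(word)          # record every word that gets examined
    return word.startswith('k')


any(starts_with_k(w) for w in words)
print(checked)  # ['apple', 'kiwi']: 'banana' is never examined
```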

 

For more details, see this article on iterators and generators: http://www.cnblogs.com/Eva-J/articles/7276796.html

Summary of this chapter

Iterables:

Have an __iter__ method

Features: can be looped over with for; some, such as range(), produce their values lazily

For example: range(), str, list, tuple, dict, set

Iterators:

Have both an __iter__ method and a __next__ method

For example: iter(range()), iter(str), iter(list), iter(tuple), iter(dict), iter(set), reversed(list_o), map(func, list_o), filter(func, list_o), file_o

Generators:

Nature: iterators, so they have __iter__ and __next__ methods

Features: lazy evaluation, customized by the developer

Advantages of using generators:

1. Delayed computation, returning one result at a time. In other words, they don't produce all the results at once, which is very useful when processing large amounts of data.

# List comprehension
sum([i for i in range(100000000)]) # Builds a huge list first: heavy memory use, can bog the machine down

# Generator expression
sum(i for i in range(100000000)) # Produces one value at a time: almost no extra memory

2. Improved code readability

Interview questions about generators

Generators play many roles in programming, and using them well can help us solve many complex problems.

Generators are also a favorite interview topic: beyond their practical uses, people have come up with many tricky interview questions about them.
Let's take a look at a few.

def demo():
    for i in range(4):
        yield i

g=demo()

g1=(i for i in g)
g2=(i for i in g1)

print(list(g1))
print(list(g2))
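The key to this question is that a generator can be consumed only once, and g1 and g2 do not copy anything: g2 pulls from g1, which pulls from g. A sketch of the same code with the answers recorded:

```python
def demo():
    for i in range(4):
        yield i


g = demo()
g1 = (i for i in g)    # demo's body has not started running yet
g2 = (i for i in g1)   # g2 only holds a reference to g1

first = list(g1)       # drains g1, and therefore g, completely
second = list(g2)      # g2 asks g1 for values, but g1 is already exhausted
print(first, second)   # [0, 1, 2, 3] []
```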

Another question:

def add(n,i):
    return n+i

def test():
    for i in range(4):
        yield i

g=test()
for n in [1,10]:
    g=(add(n,i) for i in g)

print(list(g))
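What makes this question tricky is lazy evaluation: the generator expressions do not run until list(g) is called, and by then the loop has finished and n is 10 in both layers, so each value gets 10 added twice. This sketch compares the lazy version with an eager one that uses list comprehensions, which evaluate n immediately on each pass.

```python
def add(n, i):
    return n + i


def test():
    for i in range(4):
        yield i


# Lazy version from the question: n is looked up only when list() runs,
# and by then the loop has finished with n == 10 for both layers.
g = test()
for n in [1, 10]:
    g = (add(n, i) for i in g)
lazy = list(g)

# Eager version: a list comprehension uses the current n on each pass.
g = test()
for n in [1, 10]:
    g = [add(n, i) for i in g]
eager = list(g)

print(lazy)    # [20, 21, 22, 23]
print(eager)   # [11, 12, 13, 14]
```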

Finally, a coroutine application:

import os

def init(func):
    def wrapper(*args,**kwargs):
        g=func(*args,**kwargs)
        next(g)
        return g
    return wrapper

@init
def list_files(target):
    while 1:
        dir_to_search=yield
        for top_dir,dir,files in os.walk(dir_to_search):
            for file in files:
                target.send(os.path.join(top_dir,file))
@init
def opener(target):
    while 1:
        file=yield
        fn=open(file)
        target.send((file,fn))
@init
def cat(target):
    while 1:
        file,fn=yield
        for line in fn:
            target.send((file,line))

@init
def grep(pattern,target):
    while 1:
        file,line=yield
        if pattern in line:
            target.send(file)
@init
def printer():
    while 1:
        file=yield
        if file:
            print(file)

g=list_files(opener(cat(grep('python',printer()))))

g.send('/test1')

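The pipeline above is a coroutine application that mimics grep -rl: it walks /test1 and prints the path of each file containing 'python' (once per matching line, unlike grep -l, which prints each file once).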
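To try the same coroutine pipeline without touching the file system, the first stages can be replaced by sending (filename, line) pairs by hand. This sketch restates the init priming decorator; collector is a hypothetical sink stage that stores names instead of printing them and, unlike printer above, reports each file only once.

```python
from functools import wraps


def init(func):
    """Prime a coroutine so it is ready to receive values with send()."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)
        return g
    return wrapper


@init
def grep(pattern, target):
    # Forward the name of every (name, line) pair whose line contains pattern.
    while True:
        name, line = yield
        if pattern in line:
            target.send(name)


@init
def collector(results):
    # Sink stage: store each name once, like grep -l.
    while True:
        name = yield
        if name not in results:
            results.append(name)


found = []
pipe = grep('python', collector(found))
for name, line in [('a.txt', 'i like python'), ('b.txt', 'go'),
                   ('a.txt', 'python again'), ('c.txt', 'python')]:
    pipe.send((name, line))

print(found)  # ['a.txt', 'c.txt']
```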

Posted by talor123 on Sat, 15 Feb 2020 12:00:24 -0800