Understanding Python's Iterable, Iterator and Generator Concepts

Keywords: Python github Android Java

About me
A thoughtful programmer and lifelong learner, currently working as a team leader in a startup team. My technology stack covers Android, Python, Java and Go, which is also the main technology stack of our team.
GitHub: https://github.com/hylinux1024
WeChat Official Account: Angrycode

Iterable, Iterator and Generator are common concepts in Python that are often confused at first. This article clarifies them one by one.

0x00 Iterable

Simply put, any object (everything in Python is an object) that implements the __iter__() method is an Iterable. This can be checked with the isinstance() function against collections.abc.Iterable.

For example:

class IterObj:

    def __iter__(self):
        # Here we simply return the object itself.
        # This is not a correct implementation, because IterObj
        # does not define __next__(); a working version that
        # delegates to a built-in iterable is shown later.
        return self

This defines a class IterObj that implements the __iter__() method, so its instances are Iterable objects.

    from collections.abc import Iterable, Iterator, Generator

    it = IterObj()
    print(isinstance(it, Iterable))  # True
    print(isinstance(it, Iterator))  # False
    print(isinstance(it, Generator)) # False

Keep this class in mind, and we'll see the definition of this class later.

Common Iterable Objects

What are the common iterable objects in Python?

  1. Collection or sequence types (such as list, tuple, set, dict, str)
  2. File objects
  3. Instances of any class that defines the __iter__() method. For a user-defined iterable to work correctly in a for loop, however, its __iter__() implementation must be correct, that is, the built-in iter() function must be able to turn the object into an Iterator. Iterators are explained later; for now, just remember that iter() converts an iterable object into an iterator object, which the for loop then consumes.
  4. If a class implements only __getitem__(), iter() can still turn its instances into an iterator, but isinstance() does not report them as Iterable. So an object that works in a for loop is not necessarily an Iterable object.

With regard to points 1 and 2, we can verify them by the following.

    import os

    print(isinstance([], Iterable))  # True, a list is iterable
    print(isinstance({}, Iterable))  # True, a dictionary is iterable
    print(isinstance((), Iterable))  # True, a tuple is iterable
    print(isinstance(set(), Iterable))  # True, a set is iterable
    print(isinstance('', Iterable))  # True, a string is iterable
    
    currPath = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(currPath, 'model.py')) as file:
        print(isinstance(file, Iterable)) # True

Let's look at point 3 again.

    print(hasattr([], "__iter__")) # True
    print(hasattr({}, "__iter__")) # True
    print(hasattr((), "__iter__")) # True
    print(hasattr('', "__iter__")) # True

These built-in collection and sequence objects all have an __iter__ attribute, that is, they all implement the method of that name. But for an iterable object to be usable in a for loop, the built-in iter() function must be able to convert it into an Iterator object.
For example, let's look at the built-in iterable objects:

    print(iter([])) # <list_iterator object at 0x110243f28>
    print(iter({})) # <dict_keyiterator object at 0x110234408>
    print(iter(())) # <tuple_iterator object at 0x110243f28>
    print(iter('')) # <str_iterator object at 0x110243f28>

They are all converted into their corresponding Iterator objects.
Now look back at the IterObj class defined at the beginning:

class IterObj:
    
    def __iter__(self):
        return self 
        
it = IterObj()
print(iter(it))

Calling the iter() function on it prints the following on the console:

Traceback (most recent call last):
  File "/Users/mac/PycharmProjects/iterable_iterator_generator.py", line 71, in <module>
    print(iter(it))
TypeError: iter() returned non-iterator of type 'IterObj'

A TypeError is raised: __iter__() returned the IterObj instance itself, which has no __next__() method and is therefore not an iterator, so iter() rejects it.
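
To make the cause of this error concrete, here is a minimal sketch that repeats the failing class and inspects what is missing. The variable names has_next and error_message are only for illustration.

```python
class IterObj:
    def __iter__(self):
        # Returns self, but IterObj defines no __next__(),
        # so the returned object is not an iterator.
        return self


it = IterObj()
has_next = hasattr(it, "__next__")

try:
    iter(it)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(has_next)       # False
print(error_message)  # iter() returned non-iterator of type 'IterObj'
```

The built-in iter() only accepts the return value of __iter__() if that value has a __next__() method.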

So how can an Iterable object be turned into an Iterator object?
Let's modify the definition of the IterObj class:

class IterObj:

    def __init__(self):
        self.a = [3, 5, 7, 11, 13, 17, 19]

    def __iter__(self):
        return iter(self.a)

We define a list named a in the constructor, and implement the __iter__() method by returning the iterator of that list.

Instances of the modified class can be passed to iter(), which means they can also be used in for loops.

    it = IterObj()
    print(isinstance(it, Iterable)) # True
    print(isinstance(it, Iterator)) # False
    print(isinstance(it, Generator)) # False
    print(iter(it)) # <list_iterator object at 0x102007278>
    for i in it:
        print(i) # Prints the elements 3, 5, 7, 11, 13, 17, 19

Therefore, when defining an Iterable object, pay close attention to the implementation of the __iter__() method: it must return an iterator. In practice this is usually done by delegating to a known Iterable object (a collection, a sequence, a file, or another correctly defined Iterable object as described above).
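
One consequence of delegating to a built-in iterable is that every call to iter() hands out a fresh, independent iterator, so the object can be traversed more than once. The following is a small sketch of that behavior; the class name Primes is hypothetical and simply mirrors the IterObj example.

```python
class Primes:
    """Hypothetical iterable that delegates iteration to an internal list."""

    def __init__(self):
        self.a = [3, 5, 7, 11, 13, 17, 19]

    def __iter__(self):
        # Each call builds a new list iterator over the same data
        return iter(self.a)


primes = Primes()
first_pass = list(primes)
second_pass = list(primes)
print(first_pass == second_pass)  # True, the object can be traversed twice

it1 = iter(primes)
it2 = iter(primes)
next(it1)                # advances it1 only
print(it1 is it2)        # False, the two iterators are independent
```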

Point 4 means that the iter() function can convert an object that implements the __getitem__() method into an iterator, so the object can be used in a for loop, yet isinstance() does not report it as an Iterable object.

class IterObj:
    
    def __init__(self):
        self.a = [3, 5, 7, 11, 13, 17, 19]
    
    def __getitem__(self, i):
        return self.a[i]
        
it = IterObj()
print(isinstance(it, Iterable)) # False
print(isinstance(it, Iterator)) # False
print(isinstance(it, Generator)) # False
print(hasattr(it, "__iter__")) # False
print(iter(it)) # <iterator object at 0x10b231278>

for i in it:
    print(i) # Print out 3, 5, 7, 11, 13, 17, 19

This example illustrates that an object that can be used in a for loop is not necessarily an Iterable object.
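
How does iter() manage this without __iter__()? It falls back to calling __getitem__() with the indexes 0, 1, 2, ... until IndexError is raised. The function below is only a rough sketch of that fallback under this assumption, not CPython's actual code; collect_by_getitem and Seq are hypothetical names.

```python
def collect_by_getitem(obj):
    """Rough sketch of the index-based fallback; not CPython's actual code."""
    items = []
    index = 0
    while True:
        try:
            items.append(obj[index])  # calls obj.__getitem__(index)
        except IndexError:
            break  # IndexError marks the end of the sequence
        index += 1
    return items


class Seq:  # hypothetical class, mirrors the IterObj example above
    def __init__(self):
        self.a = [3, 5, 7, 11, 13, 17, 19]

    def __getitem__(self, i):
        return self.a[i]


seq = Seq()
via_sketch = collect_by_getitem(seq)
via_builtin = list(iter(seq))
print(via_sketch == via_builtin)  # True
```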

To summarize:

  1. An iterable object is an object that implements the __iter__() method.
  2. To be used in a for loop, it must work with iter() (that is, calling iter() on it raises no error and returns an Iterator object).
  3. We can use known iterable objects to help implement our own iterable objects.
  4. An object that implements only the __getitem__() method can be converted to an Iterator by the iter() function, so it can be used in a for loop, but isinstance() does not report it as an Iterable object.

0x01 Iterator

Iterator has been mentioned in many places above. Now let's fill in that gap.
Once the concept of an iterable is clear, iterators are easier to understand.
An object that implements both the __iter__() and __next__() methods is an iterator. For example:

class IterObj:

    def __init__(self):
        self.a = [3, 5, 7, 11, 13, 17, 19]

        self.n = len(self.a)
        self.i = 0

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.i < self.n:
            v = self.a[self.i]
            self.i += 1
            return v
        # Once exhausted, keep raising StopIteration on every later call
        raise StopIteration

In IterObj, the constructor defines a list a, its length n and the current index i.

    it = IterObj()
    print(isinstance(it, Iterable)) # True
    print(isinstance(it, Iterator)) # True
    print(isinstance(it, Generator)) # False
    print(hasattr(it, "__iter__")) # True
    print(hasattr(it, "__next__")) # True

We can also verify what was mentioned above.
Collection and sequence objects are iterable but are not iterators:

    print(isinstance([], Iterator)) # False
    print(isinstance({}, Iterator)) # False
    print(isinstance((), Iterator)) # False
    print(isinstance(set(), Iterator)) # False
    print(isinstance('', Iterator)) # False

A file object, on the other hand, is an iterator:

    currPath = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(currPath, 'model.py')) as file:
        print(isinstance(file, Iterator)) # True
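
A practical difference follows from this: an iterator returns itself from iter() and can only be consumed once, while a list hands out a new iterator each time. The sketch below uses io.StringIO as an in-memory stand-in for a real file (an assumption made so that the example does not depend on model.py).

```python
import io
from collections.abc import Iterator

numbers = [3, 5, 7]
list_iter = iter(numbers)
stream = io.StringIO("a\nb\n")  # in-memory stand-in for a file object

print(isinstance(stream, Iterator))    # True
print(iter(numbers) is numbers)        # False, a list is not its own iterator
print(iter(list_iter) is list_iter)    # True, an iterator returns itself
print(iter(stream) is stream)          # True

lines = list(stream)   # ['a\n', 'b\n']
again = list(stream)   # [], the iterator is already exhausted
```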

An Iterator object can be used not only in a for loop, but also with the built-in function next(). For example:

it = IterObj()
next(it) # 3
next(it) # 5
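
Putting iter(), next() and StopIteration together, a for loop can be understood roughly as the following while loop. This is a simplified sketch of the protocol, and manual_for is a hypothetical helper name.

```python
def manual_for(iterable):
    """Collect the items of an iterable the way a for loop would visit them."""
    iterator = iter(iterable)      # step 1: get an iterator
    collected = []
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next item
        except StopIteration:
            break                  # step 3: stop when the iterator is exhausted
        collected.append(item)     # the "loop body"
    return collected


print(manual_for([3, 5, 7]))  # [3, 5, 7]
print(manual_for("ab"))       # ['a', 'b']
```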

0x02 Generator

Now let's see what a generator is.
A generator is both an iterable and an iterator.

There are two ways to define a generator:

  1. Generator expressions
  2. Generator functions defined with yield

Let's look at the first case.

    g = (x * 2 for x in range(10)) # Generator of the even numbers 0 to 18
    print(isinstance(g, Iterable)) # True
    print(isinstance(g, Iterator)) # True
    print(isinstance(g, Generator)) # True
    print(hasattr(g, "__iter__")) # True
    print(hasattr(g, "__next__")) # True
    print(next(g)) # 0
    print(next(g)) # 2

Generator expressions can describe a huge sequence without consuming much memory, because each element is computed only when it is requested.
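
As a rough illustration of the memory difference, the sketch below compares a list comprehension with the equivalent generator expression using sys.getsizeof(). Note that getsizeof() only measures the container object itself (not the integers it refers to), so the numbers are indicative only; the size n is an arbitrary choice.

```python
import sys

n = 100_000
as_list = [x * 2 for x in range(n)]  # all elements are built immediately
as_gen = (x * 2 for x in range(n))   # nothing is computed yet

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)  # True

total = sum(as_gen)  # elements are produced one at a time while summing
print(total == sum(as_list))  # True
```
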
Look at the second scenario.

def gen():
    for i in range(10):
        yield i 

Here yield plays a role similar to return, except that the function is suspended instead of finished. This function yields the integers in [0, 10), that is 0 through 9, in order, and the result can be traversed with next() or with a for loop.
When execution reaches the yield keyword, the generator function hands a value back to the caller and pauses. When next() is called again, execution resumes from the point where it last paused. In other words, at each yield the generator saves its execution position, local variables and other state, and the next call continues from right after that yield.
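
The pause-and-resume behavior can be observed by recording what runs between calls to next(). This is a small sketch; the events list and the function name traced_gen are hypothetical and exist only to trace the execution.

```python
events = []


def traced_gen():
    events.append("start")
    for i in range(3):
        events.append(f"before yield {i}")
        yield i
        events.append(f"after yield {i}")
    events.append("end")


g = traced_gen()
started = list(events)      # [], calling the function runs none of its body

first = next(g)             # runs up to the first yield
after_first = list(events)  # ['start', 'before yield 0']

second = next(g)            # resumes right after the first yield
print(first, second)        # 0 1
print(events)
# ['start', 'before yield 0', 'after yield 0', 'before yield 1']
```
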
In Python, these features of generators can be used to implement coroutines. A coroutine can be understood as a lightweight thread, which has a number of advantages over threads in high-concurrency scenarios.

Look at the following producer-consumer model implemented with a coroutine:

def producer(c):
    n = 0
    while n < 5:
        n += 1
        print('producer {}'.format(n))
        r = c.send(n)
        print('consumer return {}'.format(r))


def consumer():
    r = ''
    while True:
        n = yield r
        if not n:
            return
        print('consumer {} '.format(n))
        r = 'ok'


if __name__ == '__main__':
    c = consumer()
    next(c)  # Start consumer
    producer(c)

Running this code prints the following:

producer 1
consumer 1 
consumer return ok
producer 2
consumer 2 
consumer return ok
producer 3
consumer 3 
consumer return ok
producer 4
consumer 4 
consumer return ok
producer 5
consumer 5 
consumer return ok

The coroutine achieves the effect of concurrency by switching execution back and forth between the two functions within a single thread.
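
The next(c) call that starts the consumer matters: a generator must first be advanced to its first yield before it can receive a value through send(). The sketch below demonstrates this with a hypothetical echo() coroutine that simply yields back whatever it receives.

```python
def echo():
    received = None
    while True:
        # yield hands `received` to the caller and waits for the next send()
        received = yield received


c = echo()

try:
    c.send(1)  # not started yet, so there is no yield waiting for a value
    priming_error = None
except TypeError as exc:
    priming_error = str(exc)

next(c)            # advance to the first yield (same as c.send(None))
reply = c.send(1)  # resumes the coroutine; it loops and yields 1 back
print(reply)       # 1
c.close()          # stop the coroutine explicitly
```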

0x03 Reference

  1. https://docs.python.org/3.7/

Posted by amithn12 on Tue, 20 Aug 2019 20:01:06 -0700