Iterables vs. Generators in Python

Iterables vs. Generators in Python

I’ve had some people asking me lately what the difference between a Python iteratable and a Python generator is, and when to use them. This is a short write-up to show some of the differences and the benefits of use.

Iterables

Simply put, an iteratable in Python is any object that will allow you to use it in with an in operator, e.g.:

    A = [1,2,3,4,5]
    for a in A:
        print(a)

will output:

1
2
3
4
5

An iteratable is an object that will hold other items, primitives or more complex objects. Lists, tuples, sets, dicts are all examples of iterables. So are things like strings and files. We use them all the time, quite properly, in our code.

And when we use a comprehension, (hint, hint!) we are also creating an iterable, so

    B = [x for x in range(1,10000)]

creates a list of 100 values, from 0…10000.

So that’s great. but perhaps we should be concerned about things like efficiency, and speed, and memory when we are building larger applications. Let’s examine a couple of data points. Here I’m using sys.getsizeof to get a fairly decent idea of memory usage.

sys.getsizeof(A)
920
sys.getsizeof(B)
87632

not surprisingly, B > A. Reasonable given that B has 10000 items while A has 5. But think about your application and how long A or B are going to hang around. Are you using them once or many times? Are you losing available memory for a one time operation or do you need to access that iterable again and again? If you only need it once, it is costly to make the iterable and then have it hang around.

Generators

Welcome the generator that will not store the values but rather the method used to compute the values. Let’s create a generator that gives the same result as B above.

    C = (x for x in range(1,10000))

The difference in the definition is the use of () rather than the list comprehension []. What did we get? Let’s look at the differences.

type(A)
< class ‘list’>
type(B)
< class ‘list’>
type(C)
< class ‘generator’>

But try using C in your print loop and you will get the same result! Size differences?

sys.getsizeof(A)
920
sys.getsizeof(B)
87632
sys.getsizeof(C)
80

Now that’s interesting! not only is C 0.09% the size of B, both giving the same result when used for computation, but C is 8.7% the size of A, which only has 5 values compared to C’s 10000 (0.05% of 10000)!

I got the output I wanted from C with a similar iteration, namely:

    for c in C:
        print(c)

and, sure enough, I have 10000 values in output. But when I ran it a second time, I got no output! That’s because my generator was consumed when I iterated over it. They can only be used once. So there are many times when I only need to make the iterable and run over it once in flow, and so that’s fine. And I gain considerable efficiencies.

A more complex example

And these are merely simple iterables. Imagine what happens with more complex structures, e.g.

    letters = ['a', 'b', 'c', 'd', 'e']
    colors = ['red', 'yellow', 'blue', 'green']
    squares = [x*x for x in range(1,10)]

    newlist = [(l, c, s) for l in letters for c in colors for s in squares]
    newgen = ((l, c, s) for l in letters for c in colors for s in squares)

Both newgen and newlist produce objects that can yield a 180 element group of tuples, e.g.:

(‘a’, ‘red’, 1)
(‘a’, ‘red’, 4)
(‘a’, ‘red’, 9)
(‘a’, ‘red’, 16)
(‘a’, ‘red’, 25)
(‘a’, ‘red’, 36)
(‘a’, ‘red’, 49)
(‘a’, ‘red’, 64)
(‘a’, ‘red’, 81)
(‘a’, ‘yellow’, 1)
(‘a’, ‘yellow’, 4)
(‘a’, ‘yellow’, 9)
(‘a’, ‘yellow’, 16)
(‘a’, ‘yellow’, 25)
(‘a’, ‘yellow’, 36)
(‘a’, ‘yellow’, 49)
(‘a’, ‘yellow’, 64)
(‘a’, ‘yellow’, 81)
(‘a’, ‘blue’, 1)
(‘a’, ‘blue’, 4)
(‘a’, ‘blue’, 9)
(‘a’, ‘blue’, 16)
(‘a’, ‘blue’, 25)
(‘a’, ‘blue’, 36)
(‘a’, ‘blue’, 49)
(‘a’, ‘blue’, 64)
(‘a’, ‘blue’, 81)
(‘a’, ‘green’, 1)
(‘a’, ‘green’, 4)
(‘a’, ‘green’, 9)
(‘a’, ‘green’, 16)
(‘a’, ‘green’, 25)
(‘a’, ‘green’, 36)
(‘a’, ‘green’, 49)
(‘a’, ‘green’, 64)
(‘a’, ‘green’, 81)
(‘b’, ‘red’, 1)
(‘b’, ‘red’, 4)
(‘b’, ‘red’, 9)
(‘b’, ‘red’, 16)
(‘b’, ‘red’, 25)
(‘b’, ‘red’, 36)
(‘b’, ‘red’, 49)
(‘b’, ‘red’, 64)
(‘b’, ‘red’, 81)
(‘b’, ‘yellow’, 1)
(‘b’, ‘yellow’, 4)
(‘b’, ‘yellow’, 9)
(‘b’, ‘yellow’, 16)
(‘b’, ‘yellow’, 25)
(‘b’, ‘yellow’, 36)
(‘b’, ‘yellow’, 49)
(‘b’, ‘yellow’, 64)
(‘b’, ‘yellow’, 81)
(‘b’, ‘blue’, 1)
(‘b’, ‘blue’, 4)
(‘b’, ‘blue’, 9)
(‘b’, ‘blue’, 16)
(‘b’, ‘blue’, 25)
(‘b’, ‘blue’, 36)
(‘b’, ‘blue’, 49)
(‘b’, ‘blue’, 64)
(‘b’, ‘blue’, 81)
(‘b’, ‘green’, 1)
(‘b’, ‘green’, 4)
(‘b’, ‘green’, 9)
(‘b’, ‘green’, 16)
(‘b’, ‘green’, 25)
(‘b’, ‘green’, 36)
(‘b’, ‘green’, 49)
(‘b’, ‘green’, 64)
(‘b’, ‘green’, 81)
(‘c’, ‘red’, 1)
(‘c’, ‘red’, 4)
(‘c’, ‘red’, 9)
(‘c’, ‘red’, 16)
(‘c’, ‘red’, 25)
(‘c’, ‘red’, 36)
(‘c’, ‘red’, 49)
(‘c’, ‘red’, 64)
(‘c’, ‘red’, 81)
(‘c’, ‘yellow’, 1)
(‘c’, ‘yellow’, 4)
(‘c’, ‘yellow’, 9)
(‘c’, ‘yellow’, 16)
(‘c’, ‘yellow’, 25)
(‘c’, ‘yellow’, 36)
(‘c’, ‘yellow’, 49)
(‘c’, ‘yellow’, 64)
(‘c’, ‘yellow’, 81)
(‘c’, ‘blue’, 1)
(‘c’, ‘blue’, 4)
(‘c’, ‘blue’, 9)
(‘c’, ‘blue’, 16)
(‘c’, ‘blue’, 25)
(‘c’, ‘blue’, 36)
(‘c’, ‘blue’, 49)
(‘c’, ‘blue’, 64)
(‘c’, ‘blue’, 81)
(‘c’, ‘green’, 1)
(‘c’, ‘green’, 4)
(‘c’, ‘green’, 9)
(‘c’, ‘green’, 16)
(‘c’, ‘green’, 25)
(‘c’, ‘green’, 36)
(‘c’, ‘green’, 49)
(‘c’, ‘green’, 64)
(‘c’, ‘green’, 81)
(‘d’, ‘red’, 1)
(‘d’, ‘red’, 4)
(‘d’, ‘red’, 9)
(‘d’, ‘red’, 16)
(‘d’, ‘red’, 25)
(‘d’, ‘red’, 36)
(‘d’, ‘red’, 49)
(‘d’, ‘red’, 64)
(‘d’, ‘red’, 81)
(‘d’, ‘yellow’, 1)
(‘d’, ‘yellow’, 4)
(‘d’, ‘yellow’, 9)
(‘d’, ‘yellow’, 16)
(‘d’, ‘yellow’, 25)
(‘d’, ‘yellow’, 36)
(‘d’, ‘yellow’, 49)
(‘d’, ‘yellow’, 64)
(‘d’, ‘yellow’, 81)
(‘d’, ‘blue’, 1)
(‘d’, ‘blue’, 4)
(‘d’, ‘blue’, 9)
(‘d’, ‘blue’, 16)
(‘d’, ‘blue’, 25)
(‘d’, ‘blue’, 36)
(‘d’, ‘blue’, 49)
(‘d’, ‘blue’, 64)
(‘d’, ‘blue’, 81)
(‘d’, ‘green’, 1)
(‘d’, ‘green’, 4)
(‘d’, ‘green’, 9)
(‘d’, ‘green’, 16)
(‘d’, ‘green’, 25)
(‘d’, ‘green’, 36)
(‘d’, ‘green’, 49)
(‘d’, ‘green’, 64)
(‘d’, ‘green’, 81)
(‘e’, ‘red’, 1)
(‘e’, ‘red’, 4)
(‘e’, ‘red’, 9)
(‘e’, ‘red’, 16)
(‘e’, ‘red’, 25)
(‘e’, ‘red’, 36)
(‘e’, ‘red’, 49)
(‘e’, ‘red’, 64)
(‘e’, ‘red’, 81)
(‘e’, ‘yellow’, 1)
(‘e’, ‘yellow’, 4)
(‘e’, ‘yellow’, 9)
(‘e’, ‘yellow’, 16)
(‘e’, ‘yellow’, 25)
(‘e’, ‘yellow’, 36)
(‘e’, ‘yellow’, 49)
(‘e’, ‘yellow’, 64)
(‘e’, ‘yellow’, 81)
(‘e’, ‘blue’, 1)
(‘e’, ‘blue’, 4)
(‘e’, ‘blue’, 9)
(‘e’, ‘blue’, 16)
(‘e’, ‘blue’, 25)
(‘e’, ‘blue’, 36)
(‘e’, ‘blue’, 49)
(‘e’, ‘blue’, 64)
(‘e’, ‘blue’, 81)
(‘e’, ‘green’, 1)
(‘e’, ‘green’, 4)
(‘e’, ‘green’, 9)
(‘e’, ‘green’, 16)
(‘e’, ‘green’, 25)
(‘e’, ‘green’, 36)
(‘e’, ‘green’, 49)
(‘e’, ‘green’, 64)
(‘e’, ‘green’, 81)

Yet look at the type and size differences:

< class ‘list’> 1680
< class ‘generator’> 80

newlist is 21 times larger than newgen. So, as you write iterable obejcts, especially iterables that are being used for another computation, consider generators!

twitter
twitter

Leave a Reply

Your email address will not be published. Required fields are marked *