Iterables vs. Generators in Python

I’ve had some people asking me lately what the difference is between a Python iterable and a Python generator, and when to use each. This short write-up shows some of the differences and the benefits of each.

Iterables

Simply put, an iterable in Python is any object that you can loop over with a for ... in statement, e.g.:

    A = [1,2,3,4,5]
    for a in A:
        print(a)

will output:

1
2
3
4
5

An iterable is typically an object that holds other items, whether primitives or more complex objects. Lists, tuples, sets and dicts are all examples of iterables. So are things like strings and files. We use them all the time, quite properly, in our code.
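
For the curious, here is a minimal sketch of what a for loop does with an iterable behind the scenes: it asks for an iterator with the built-in iter() and then calls next() until StopIteration is raised. The names it and collected are only for illustration.

```python
A = [1, 2, 3, 4, 5]

# Roughly what `for a in A` does behind the scenes:
it = iter(A)                         # ask the iterable for an iterator
collected = []
while True:
    try:
        collected.append(next(it))   # fetch the next item
    except StopIteration:            # raised once the items run out
        break

print(collected)

# Strings and dicts support iter() too.
print(list(iter("abc")))
print(list(iter({"x": 1, "y": 2})))  # iterating a dict yields its keys
```

Anything that works with iter() in this way can be used in a for loop.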

And when we use a list comprehension (hint, hint!), we are also creating an iterable, so

    B = [x for x in range(1,10000)]

creates a list of 9999 values, from 1 to 9999 (range excludes its upper bound).

So that’s great. But perhaps we should be concerned about things like efficiency, speed and memory when we are building larger applications. Let’s examine a couple of data points. Here I’m using sys.getsizeof to get a rough idea of memory usage; the exact numbers vary with Python version and platform.

sys.getsizeof(A)
920
sys.getsizeof(B)
87632

Not surprisingly, B > A. That is reasonable given that B has 9999 items while A has 5. But think about your application and how long A or B are going to hang around. Are you using them once or many times? Are you giving up available memory for a one-time operation, or do you need to access that iterable again and again? If you only need it once, it is costly to build the iterable and then have it hang around.
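
A minimal sketch for reproducing this kind of measurement yourself. Note that sys.getsizeof reports only the size of the list object itself (its array of references), not the items it points to; the shallow and deep names below are my own, and the deep figure is only a rough estimate because CPython shares some small integer objects.

```python
import sys

A = [1, 2, 3, 4, 5]
B = [x for x in range(1, 10000)]

# Item count and shallow size in bytes; numbers differ across Python builds.
print(len(A), sys.getsizeof(A))
print(len(B), sys.getsizeof(B))

# Rough estimate that also counts the integer objects the list refers to.
shallow = sys.getsizeof(B)
deep = shallow + sum(sys.getsizeof(x) for x in B)
print(shallow, deep)
```

Either way, the cost of a list grows with the number of items it holds.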

Generators

Enter the generator, which stores not the values but the recipe used to compute them. Let’s create a generator that gives the same result as B above.

    C = (x for x in range(1,10000))

The only difference in the definition is the use of () rather than the [] of a list comprehension. What did we get? Let’s look at the types.

type(A)
<class 'list'>
type(B)
<class 'list'>
type(C)
<class 'generator'>

But try using C in your print loop and you will get the same output as with B! What about the size differences?

sys.getsizeof(A)
920
sys.getsizeof(B)
87632
sys.getsizeof(C)
80

Now that’s interesting! Not only is C about 0.09% the size of B while giving the same result when used for computation, it is also about 8.7% the size of A, which holds only 5 values compared to the 9999 that C produces (about 0.05% as many)!
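
A small sketch of why the generator stays tiny: its size does not depend on how many values it will eventually produce, because each value is computed only when asked for. The names small and large are illustrative, and the byte counts depend on your Python build.

```python
import sys

small = (x for x in range(1, 10))
large = (x for x in range(1, 10_000_000))

# Both generator objects are small, whatever the length of the range.
print(sys.getsizeof(small), sys.getsizeof(large))

# Values are produced one at a time, on demand.
first = next(large)
second = next(large)
print(first, second)

# A generator can feed a computation directly without building a list.
total = sum(x for x in range(1, 10000))
print(total)
```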

I got the output I wanted from C with a similar iteration, namely:

    for c in C:
        print(c)

and, sure enough, I have 9999 values in output. But when I ran it a second time, I got no output! That’s because my generator was consumed when I iterated over it. A generator can only be used once. Often I only need to build the iterable and run over it once in the flow of the program, so that’s fine, and I gain considerable efficiencies.
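
Here is a minimal sketch of that one-shot behaviour, plus two common ways around it when you do need a second pass. The helper make_numbers is a hypothetical name of my own.

```python
C = (x for x in range(1, 10000))

first_pass = sum(1 for _ in C)    # consumes the generator
second_pass = sum(1 for _ in C)   # nothing left to yield
print(first_pass, second_pass)

# Option 1: wrap the expression in a function and build a fresh
# generator each time you need one.
def make_numbers():
    return (x for x in range(1, 10000))

print(sum(1 for _ in make_numbers()))
print(sum(1 for _ in make_numbers()))

# Option 2: if you truly need many passes, pay the memory cost once
# and materialize a list.
numbers = list(make_numbers())
print(len(numbers))
```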

A more complex example

And these are merely simple iterables. Imagine what happens with more complex structures, e.g.

    letters = ['a', 'b', 'c', 'd', 'e']
    colors = ['red', 'yellow', 'blue', 'green']
    squares = [x*x for x in range(1,10)]

    newlist = [(l, c, s) for l in letters for c in colors for s in squares]
    newgen = ((l, c, s) for l in letters for c in colors for s in squares)

Both newlist and newgen produce the same 180 tuples (5 letters × 4 colors × 9 squares), e.g.:

('a', 'red', 1)
('a', 'red', 4)
('a', 'red', 9)
('a', 'red', 16)
('a', 'red', 25)
('a', 'red', 36)
('a', 'red', 49)
('a', 'red', 64)
('a', 'red', 81)
('a', 'yellow', 1)
('a', 'yellow', 4)
('a', 'yellow', 9)
('a', 'yellow', 16)
('a', 'yellow', 25)
('a', 'yellow', 36)
('a', 'yellow', 49)
('a', 'yellow', 64)
('a', 'yellow', 81)
('a', 'blue', 1)
('a', 'blue', 4)
('a', 'blue', 9)
('a', 'blue', 16)
('a', 'blue', 25)
('a', 'blue', 36)
('a', 'blue', 49)
('a', 'blue', 64)
('a', 'blue', 81)
('a', 'green', 1)
('a', 'green', 4)
('a', 'green', 9)
('a', 'green', 16)
('a', 'green', 25)
('a', 'green', 36)
('a', 'green', 49)
('a', 'green', 64)
('a', 'green', 81)
('b', 'red', 1)
('b', 'red', 4)
('b', 'red', 9)
('b', 'red', 16)
('b', 'red', 25)
('b', 'red', 36)
('b', 'red', 49)
('b', 'red', 64)
('b', 'red', 81)
('b', 'yellow', 1)
('b', 'yellow', 4)
('b', 'yellow', 9)
('b', 'yellow', 16)
('b', 'yellow', 25)
('b', 'yellow', 36)
('b', 'yellow', 49)
('b', 'yellow', 64)
('b', 'yellow', 81)
('b', 'blue', 1)
('b', 'blue', 4)
('b', 'blue', 9)
('b', 'blue', 16)
('b', 'blue', 25)
('b', 'blue', 36)
('b', 'blue', 49)
('b', 'blue', 64)
('b', 'blue', 81)
('b', 'green', 1)
('b', 'green', 4)
('b', 'green', 9)
('b', 'green', 16)
('b', 'green', 25)
('b', 'green', 36)
('b', 'green', 49)
('b', 'green', 64)
('b', 'green', 81)
('c', 'red', 1)
('c', 'red', 4)
('c', 'red', 9)
('c', 'red', 16)
('c', 'red', 25)
('c', 'red', 36)
('c', 'red', 49)
('c', 'red', 64)
('c', 'red', 81)
('c', 'yellow', 1)
('c', 'yellow', 4)
('c', 'yellow', 9)
('c', 'yellow', 16)
('c', 'yellow', 25)
('c', 'yellow', 36)
('c', 'yellow', 49)
('c', 'yellow', 64)
('c', 'yellow', 81)
('c', 'blue', 1)
('c', 'blue', 4)
('c', 'blue', 9)
('c', 'blue', 16)
('c', 'blue', 25)
('c', 'blue', 36)
('c', 'blue', 49)
('c', 'blue', 64)
('c', 'blue', 81)
('c', 'green', 1)
('c', 'green', 4)
('c', 'green', 9)
('c', 'green', 16)
('c', 'green', 25)
('c', 'green', 36)
('c', 'green', 49)
('c', 'green', 64)
('c', 'green', 81)
('d', 'red', 1)
('d', 'red', 4)
('d', 'red', 9)
('d', 'red', 16)
('d', 'red', 25)
('d', 'red', 36)
('d', 'red', 49)
('d', 'red', 64)
('d', 'red', 81)
('d', 'yellow', 1)
('d', 'yellow', 4)
('d', 'yellow', 9)
('d', 'yellow', 16)
('d', 'yellow', 25)
('d', 'yellow', 36)
('d', 'yellow', 49)
('d', 'yellow', 64)
('d', 'yellow', 81)
('d', 'blue', 1)
('d', 'blue', 4)
('d', 'blue', 9)
('d', 'blue', 16)
('d', 'blue', 25)
('d', 'blue', 36)
('d', 'blue', 49)
('d', 'blue', 64)
('d', 'blue', 81)
('d', 'green', 1)
('d', 'green', 4)
('d', 'green', 9)
('d', 'green', 16)
('d', 'green', 25)
('d', 'green', 36)
('d', 'green', 49)
('d', 'green', 64)
('d', 'green', 81)
('e', 'red', 1)
('e', 'red', 4)
('e', 'red', 9)
('e', 'red', 16)
('e', 'red', 25)
('e', 'red', 36)
('e', 'red', 49)
('e', 'red', 64)
('e', 'red', 81)
('e', 'yellow', 1)
('e', 'yellow', 4)
('e', 'yellow', 9)
('e', 'yellow', 16)
('e', 'yellow', 25)
('e', 'yellow', 36)
('e', 'yellow', 49)
('e', 'yellow', 64)
('e', 'yellow', 81)
('e', 'blue', 1)
('e', 'blue', 4)
('e', 'blue', 9)
('e', 'blue', 16)
('e', 'blue', 25)
('e', 'blue', 36)
('e', 'blue', 49)
('e', 'blue', 64)
('e', 'blue', 81)
('e', 'green', 1)
('e', 'green', 4)
('e', 'green', 9)
('e', 'green', 16)
('e', 'green', 25)
('e', 'green', 36)
('e', 'green', 49)
('e', 'green', 64)
('e', 'green', 81)

Yet look at the type and size differences:

<class 'list'> 1680
<class 'generator'> 80

newlist is 21 times larger than newgen. So, as you write iterable objects, especially iterables that feed another computation, consider generators!
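
When the logic outgrows a one-line expression, the same lazy behaviour is available from a generator function that uses yield. The sketch below is one possible equivalent of newgen; the function name combos is hypothetical, and itertools.islice is used to peek at the first few tuples without producing all 180.

```python
from itertools import islice

letters = ['a', 'b', 'c', 'd', 'e']
colors = ['red', 'yellow', 'blue', 'green']
squares = [x * x for x in range(1, 10)]

def combos(letters, colors, squares):
    """Lazily yield every (letter, color, square) tuple, one at a time."""
    for l in letters:
        for c in colors:
            for s in squares:
                yield (l, c, s)

# Take only the first three tuples; the rest are never computed.
first_three = list(islice(combos(letters, colors, squares), 3))
print(first_three)

# Count all tuples without ever holding them in a list.
total = sum(1 for _ in combos(letters, colors, squares))
print(total)
```

The standard library's itertools.product(letters, colors, squares) yields the same tuples in the same order, also lazily.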

R Data Structures

R Data Structures overview by Hadley Wickham

If you are working with any programming language, nothing is more important to understand than the language’s underlying data structures. Wickham on R Data Structures is an excellent overview for R programmers.

There are five fundamental data structures in R.

  • Homogeneous
    1. 1D – Atomic vector
    2. 2D – Matrix
    3. nD – Array
  • Heterogeneous
    1. List
    2. Data frame

Hadley goes through the five to show how they compare, how they contrast and, most importantly, how they are interrelated. Important stuff. He also goes through a small set of exercises to test comprehension. I think some of these could be used as the bones of interview questions.

Taken from his book, Advanced R, which is well worth the price and should be read by serious R folks.

Practitioner Conference for UX/UI/Testers

[SDTConf 2012](http://t.co/b6moYAN1) is at the University of Houston, April 27-29, 2012. SDTConf is an all open space conference that gives software practitioners a platform to meet face-to-face and discuss and demonstrate simple design and testing principles and approaches. At this conference you’ll meet real, hands-on practitioners interested in peer-to-peer learning and exploration. We strive hard to avoid fluffy marketing talks and other nonsense.

Enterprise Architecture definition

I’ve been following a very flouncy thread on *structure smells* in an EA, and the guy who started the thread couldn’t adequately define what he meant by *enterprise* and *application* architectures. Here is the [TOGAF](http://www.opengroup.org/togaf/) definition, which most sane people would accept, along with some examples of the other stuff that was coming out, for … comparative value.