Nasty bug in CPython 2.x

Turning to more general Python topics once more, here is the story of a very nasty bug in the 2.x series of the CPython interpreter that I once battled with in Pyrit. This fellow is not going to get fixed, so you should know about it if your code is supposed to run on CPython 2.x.

Suppose you define a class that produces its own kind of iterator object. Also suppose the way your object creates its iterators is not trivial and may involve actions that can fail (e.g. reading from an outside I/O source). You correctly handle failures by raising exceptions; let’s work with raising an instance of IOError in this example. The simplest class doing so looks like this:

class Foobar(object):
    def __iter__(self):
        raise IOError()

Let’s get an instance of that class and iterate over its members:

>>> f = Foobar()
>>> for member in f:
...     print member
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __iter__
IOError

In the second line of the code above, the interpreter will call Foobar.__iter__(f) and encounter an IOError, which then becomes the exception state of our current frame. If the code above is the top frame, the program will crash to the console with a traceback and complain about an IOError. This is expected behaviour. We could deal with this by placing the code in a try-except clause (never mind that the try block covers more than it should in our example).
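
A minimal sketch of such a try-except clause, written so that it runs on both 2.x and 3.x. The helper name collect_members and the error message are made up for illustration and are not part of the original code:

```python
class Foobar(object):
    def __iter__(self):
        # Creating the iterator fails, e.g. because an I/O source is gone.
        raise IOError("cannot create iterator")


def collect_members(obj):
    """Return (members, error); error is the IOError hit while iterating, or None."""
    members = []
    try:
        # Note the "bad range": the try block also covers the loop body.
        for member in obj:
            members.append(member)
    except IOError as exc:
        return members, exc
    return members, None


members, error = collect_members(Foobar())
```

Here the for-loop lets the IOError through untouched, so the except clause sees exactly the exception that __iter__() raised.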

Here comes the tricky part: one of the most popular built-in functions of Python is map(). It takes any function and any iterable object and applies that function to the values yielded by the iterator; the result is a list of the function’s return values. Let’s do this with our Foobar object from above and apply the identity function:

>>> f = Foobar()
>>> map(lambda x:x, f)

The map()-function will also call Foobar.__iter__(f) to get an iterator. It encounters the IOError we placed there to indicate some I/O-related problem while creating the iterator for that object. We therefore expect the second line to fail with an IOError now. What you get instead in CPython 2.x is always a TypeError.

>>> map(lambda x:x, f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 2 to map() must support iteration

This is confusing, to say the least: the error message from map() demands an object that is iterable (one that has an __iter__()-function of some meaning). Clearly, our object has such a function, and raising an IOError is a matter of behaviour, not definition. Where is the TypeError coming from? Even more important, where did our IOError go? In real code, our IOError may be some custom exception that indicates a case other code knows about and can act upon (e.g. OhJustATemporaryIOError). The type (in fact the whole instance) of an exception is very important to us.

The behaviour shown by the map()-function prevents us from raising meaningful exceptions in __iter__(): anyone using map() will run into trouble, as everything appears to be a TypeError. You can also trade the plague for cholera by sticking Post-it notes to your screen, reminding you to never use map() on classes written by the guys in the other building, or to always catch TypeError when doing so (this will get you laid off).

The reason for all this is hidden within CPython’s implementation of the builtin map()-function. In Python/bltinmodule.c:975 we find this:

sqp->it = PyObject_GetIter(curseq);
if (sqp->it == NULL) {
    static char errmsg[] =
        "argument %d to map() must support iteration";
    char errbuf[sizeof(errmsg) + 25];
    PyOS_snprintf(errbuf, sizeof(errbuf), errmsg, i+2);
    PyErr_SetString(PyExc_TypeError, errbuf);
    goto Fail_2;
}

The function PyObject_GetIter() (we are talking CPython’s C API now) gets an iterator from an object. If it fails, the return value is NULL and the caller will find PyErr_Occurred() to be true: there is a pending exception, and the interpreter will act accordingly when given the chance. The code above, however, does not care about which exception has been raised and goes directly to calling PyErr_SetString(PyExc_TypeError, errbuf). It therefore overwrites any other exception and raises a TypeError instead. In other words: the map()-function swallows any exception raised in any __iter__() and always replaces it with a TypeError. This took me a while to figure out.
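
To make the effect of that C snippet easier to see, here is a rough pure-Python model of it. This is only a sketch: map_like_cpython2 is a hypothetical name, it handles a single iterable only, and it is not how the built-in is actually written:

```python
def map_like_cpython2(function, iterable):
    """Model of the quoted C code for one iterable; NOT the real built-in."""
    try:
        iterator = iter(iterable)
    except BaseException:
        # Mirrors the unconditional PyErr_SetString(PyExc_TypeError, ...):
        # whatever was raised while getting the iterator is thrown away.
        raise TypeError("argument 2 to map() must support iteration")
    return [function(item) for item in iterator]


class Foobar(object):
    def __iter__(self):
        raise IOError()


try:
    map_like_cpython2(lambda x: x, Foobar())
except TypeError as exc:
    # We land here even though __iter__() raised an IOError.
    caught = exc
```

The except clause in the model does not look at what went wrong, just like the C code only checks for NULL and never inspects the pending exception.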

Luckily, there is an easy workaround for this: if your object’s __iter__()-function can raise exceptions, on its own or through underlying code, make sure that the __iter__() of the iterator it returns always returns self (that is, the iterator over the iterator is the iterator itself). This is already true for the iterators of most simple objects in Python:

>>> mylist = []
>>> myiter = iter(mylist)
>>> myiter is iter(myiter)
True

To make our Foobar object behave like this, we need to expand our example as follows:

class FoobarIterator(object):
    def __init__(self, fbar):
        self.fbar = fbar

    def __iter__(self):
        return self

    def next(self):
        pass #return values of self.fbar

class Foobar(object):
    def __iter__(self):
        if True:
            raise IOError()
        else:
            return FoobarIterator(self)

Now we must do the following:

>>> map(lambda x:x, iter(f))

The explicit call to iter(f) looks mundane but is in fact the key to getting correct behaviour. Remember that the map()-function will always get an iterator for any object that you pass as its second argument. What we pass now is already an iterator (it would have been an instance of FoobarIterator), and the iterator over it is just the iterator itself, so nothing happens due to map()’s internal call. However, we now make an explicit call to Foobar.__iter__(f) through iter(), and iter()’s implementation in CPython 2.x handles exceptions as it should:

>>> map(lambda x:x, iter(f))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __iter__
IOError

This is the expected behaviour of our object and it now works in all cases.
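
For completeness, here is a sketch of the two classes that actually yields values and runs on both 2.x and 3.x. The attributes values, fail and index are assumptions made for this illustration; a real Foobar would read from its I/O source instead:

```python
class FoobarIterator(object):
    def __init__(self, fbar):
        self.fbar = fbar
        self.index = 0

    def __iter__(self):
        # The iterator over the iterator is the iterator itself.
        return self

    def __next__(self):
        if self.index >= len(self.fbar.values):
            raise StopIteration
        value = self.fbar.values[self.index]
        self.index += 1
        return value

    next = __next__  # Python 2.x spelling of the same method


class Foobar(object):
    def __init__(self, values, fail=False):
        self.values = list(values)
        self.fail = fail  # hypothetical switch to simulate an I/O failure

    def __iter__(self):
        if self.fail:
            raise IOError("could not read from source")
        return FoobarIterator(self)


# list() is a no-op on 2.x and forces the lazy map object on 3.x.
good = list(map(lambda x: x, iter(Foobar([1, 2, 3]))))

try:
    list(map(lambda x: x, iter(Foobar([], fail=True))))
except IOError as exc:
    failure = exc
```

The IOError is raised by the explicit iter() call before map() is even entered, which is why it survives.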

The bug described above also affects Python 2.6 and 2.7 but will not be fixed in any new version of the 2.x series; the 3.x series is unaffected. The CPython overlords decided not to act on this bug as 2.x is running out of business and changes in behaviour are no longer allowed. Personally, I don’t find this argument overly compelling: the interpreter is clearly doing things wrong and causes unexpected behaviour. While there is a way to handle this situation by restricting the behaviour of iterator objects, there is actually no way to generically remove that workaround when porting code from 2.x to 3.x, as 2to3 can’t assume that it’s safe to remove an explicit call to iter().

The only lesson we are left with is this: there is a very nasty bug in CPython 2.x’s map()-function that you have to know about and deal with yourself. The solution is to always use map() in conjunction with an explicit call to iter(). Otherwise all exceptions raised in __iter__() will be mangled into a TypeError, which you have reason to handle only in special cases.
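
One way to avoid forgetting the explicit iter() is to hide it in a small wrapper. The helper below is only a sketch under that assumption; safe_map is a hypothetical name, it takes a single iterable, and it always returns a list so that it behaves the same on 2.x and 3.x:

```python
def safe_map(function, iterable):
    # iter() propagates whatever __iter__() raises. map() then only sees an
    # iterator, and a well-behaved iterator's own __iter__() returns self.
    return list(map(function, iter(iterable)))


class Foobar(object):
    def __iter__(self):
        raise IOError("temporary I/O problem")


try:
    safe_map(lambda x: x, Foobar())
except IOError as exc:
    # The original exception arrives intact instead of a TypeError.
    seen = exc
```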
