Monday, April 27, 2009

Final Words on Tail Calls

A lot of people remarked that in my post on Tail Recursion Elimination I confused tail self-recursion with other tail calls, which proper Tail Call Optimization (TCO) also eliminates. I now feel more educated: tail calls are not just about loops. I started my blog post when someone pointed out several recent posts by Pythonistas playing around with implementing tail self-recursion through decorators or bytecode hacks. In the eyes of the TCO proponents those were all amateurs, and perhaps that's so.

The one issue on which TCO advocates seem to agree with me is that TCO is a feature, not an optimization. (Even though in some compiled languages it really is provided by a compiler optimization.) We can argue over whether it is a desirable feature. Personally, I think it is a fine feature for some languages, but I don't think it fits Python: The elimination of stack traces for some calls but not others would certainly confuse many users, who have not been raised with tail call religion but might have learned about call semantics by tracing through a few calls in a debugger.

The main issue here is that I expect that in many cases tail calls are not of a recursive nature (neither direct nor indirect), so the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder. For example, if your have a function ending in something like this:
if x > y:
return some_call(z)
else:
return 42
and you end up in the debugger inside some_call() whereas you expected to have taken the other branch, with TCO as a feature your debugger can't tell you the value of x and y, because the stack frame has been eliminated.

(I'm sure at this point someone will bring up that the debugger should be smarter. Sure. I'm expecting your patch for CPython any minute now.)

The most interesting use case brought up for TCO is the implementation of algorithms involving state machines. The proponents of TCO claim that the only alternative to TCO is a loop with lots of state, which they consider ugly. Now, apart from the observation that since TCO essentially is a GOTO, you write spaghetti code using TCO just as easily, Ian Bicking gave a solution that is as simple as it is elegant. (I saw it in a comment to someone's blog that I can't find back right now; I'll add a link if someone adds it in a comment here.) Instead of this tail call:
return foo(args)
you write this:
return foo, (args,)
which doesn't call foo() but just returns it and an argument tuple, and embed everything in a "driver" loop like this:
func, args = ...initial func/args pair...
while True:
func, args = func(*args)
If you need an exit condition you can use an exception, or you could invent some other protocol to signal the end of the loop (like returning None).

And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as "the Basic of the future". Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages -- as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say "tail call optimization" or "proper tail recursion." :-)

Saturday, April 25, 2009

People Who Annoy Me

It's Grumpy Saturday. If I didn't respond to your email or "friend" request, maybe you fall in one of the following categories.

1. People who have exchanged some email with me (or who once commented on my blog, or were once in the same line at a conference) and send me a Linkedin friend request with the default invite message.

2. People I've never heard from asking to become my friend on Facebook.

3. People who send me a long form letter inviting me to speak at their conference, without any kind of personal note at the top.

4. People who resend that form letter three times.

5. People who get all upset when I respond to the last identical copy of that form letter with "Please stop spamming me."

6. People who send me a long rambling email asking me a Python programming question.

7. People who send me a long rambling email proposing a change to the Python language.

8. People who collect pictures of famous people's desktops.

9. People who send me email asking if they can put their ads on my website.

10. People who create fake blogs with computer-generated pseudo-nonsense text that happens to include my name.

Wednesday, April 22, 2009

Tail Recursion Elimination

I recently posted an entry in my Python History blog on the origins of Python's functional features. A side remark about not supporting tail recursion elimination (TRE) immediately sparked several comments about what a pity it is that Python doesn't do this, including links to recent blog entries by others trying to "prove" that TRE can be added to Python easily. So let me defend my position (which is that I don't want TRE in the language). If you want a short answer, it's simply unpythonic. Here's the long answer:

First, as one commenter remarked, TRE is incompatible with nice stack traces: when a tail recursion is eliminated, there's no stack frame left to use to print a traceback when something goes wrong later. This will confuse users who inadvertently wrote something recursive (the recursion isn't obvious in the stack trace printed), and makes debugging hard. Providing an option to disable TRE seems wrong to me: Python's default is and should always be to be maximally helpful for debugging. This also brings me to the next issue:

Second, the idea that TRE is merely an optimization, which each Python implementation can choose to implement or not, is wrong. Once tail recursion elimination exists, developers will start writing code that depends on it, and their code won't run on implementations that don't provide it: a typical Python implementation allows 1000 recursions, which is plenty for non-recursively written code and for code that recurses to traverse, for example, a typical parse tree, but not enough for a recursively written loop over a large list.

Third, I don't believe in recursion as the basis of all programming. This is a fundamental belief of certain computer scientists, especially those who love Scheme and like to teach programming by starting with a "cons" cell and recursion. But to me, seeing recursion as the basis of everything else is just a nice theoretical approach to fundamental mathematics (turtles all the way down), not a day-to-day tool.

For practical purposes, Python-style lists (which are flexible arrays, not linked lists), and sequences in general, are much more useful to start exploring the wonderful world of programming than recursion. They are some of the most important tools for experienced Python programmers, too. Using a linked list to represent a sequence of value is distinctly unpythonic, and in most cases very inefficient. Most of Python's library is written with sequences and iterators as fundamental building blocks (and dictionaries, of course), not linked lists, so you'd be locking yourself out of a lot of pre-defined functionality by not using lists or sequences.

Last, let's look at how we could implement tail recursion elimination. The first observation is that you can't do it at compile time. I've seen at least one blog entry that used a bytecode hack to replace a CALL opcode immediately before a RETURN opcode with a jump to the top of the function body. This may be a nice demo, but unfortunately Python's compiler cannot reliably determine whether any particular call is actually reference the current function, even if it appears to have the same name. Consider this simple example:
def f(x):
if x > 0:
return f(x-1)
return 0

It looks like you could replace the body with something like this:
if x > 0:
x = x-1
<jump to top>
return 0

This seems simple enough, but now add this:
g = f
def f(x):
return x
g(5)

The call to g(5) invokes the earlier f, but the "recursive" call no longer recurses! At run-time, the name 'f' is rebound to the later non-recursive definition, so the returned value is 4, not 0. While I agree that this particual example is bad style, it is a well-defined part of Python's semantics that has plenty of legitimate uses, and a compiler that made this replacement in the optimistic hope that f's definition will remain unchanged would introduce enough bugs in real-world code to cause an outrage.

Another blog post showed decorators that can be used to implement tail recursion using magical exceptions or return values. These can be written in plain Python (though that post shows an optimized Cython version that is claimed to be "only 10% slower", though it doesn't seem to be thread-safe). If this tickles your fancy I won't try to stop you, but I would strongly object against inclusion of something like this in the core distribution: there are many caveats to the use of such a decorator, since it has to assume that any recursive call (in the decorated function) is tail-recursive and can be eliminated. In the hands of less experienced users this could easily lead to disasters. For example, the common recursive definition of factorial is not tail-recursive:
def fact(n):
if n > 1:
return n * fact(n-1)
return 1

There are also plenty of functions that contain a tail-recursive call and another recursive call that isn't tail-recursive; the decorators don't handle such cases. Another subtlety that those decorators don't handle is tail-recursive calls inside try blocks: these may look like they could be eliminated, but they can't, because TRE could remove the exception handling which is guaranteed by the language. For all these reasons I think the decorator approach is doomed, at least for a general audience.

Still, if someone was determined to add TRE to CPython, they could modify the compiler roughly as follows. First, determine "safe" tail-recursive call sites. This would be something like a CALL opcode immediately followed by a RETURN opcode, and completely outside any try blocks. (Note: I'm ignoring the different CALL_* opcodes, which should be easy enough to handle using the same approach.) Next, replace each such CALL-RETURN opcode pair with a single CALL_RETURN opcode. There's no need for the compiler to try and check if the name of the function being called is the same as the current function: the new opcode can represent savings for all CALL+RETURN combinations merely by saving the time to decode a second opcode. If at run time the determination is made that this particular call is not applicable for TRE, the usual actions for a CALL followed by a RETURN opcode are carried out. (I suppose you could add some kind of caching mechanism indexed by call site to speed up the run-time determination.)

In the determination of wheter TRE can be applied, there are several levels of aggressiveness that you could apply. The least aggressive, "vanilla" approach would only optimize the call if the object being called is the function that is already running in the current stack frame. All we have to do at this point is clear the locals out of the current stack frame (and other hidden state like active loops), set the arguments from the evaluation stack, and jump to the top. (Subtlety: the new arguments are, by definition, in the current stack frame. It's probably just a matter of copying them first. More subtleties are caused by the presence of keyword arguments, variable length argument lists, and default argument values. It's all a simple matter of programming though.)

A more aggressive version would also recognize the situation where a method is tail recursive (i.e. the object being called is a bound method whose underlying function is the same as the one in the current stack frame). This just requires a bit more programming; the CPython interpreter code (ceval.c) already has an optimization for method calls. ( I don't know how useful this would be though: I expect the tail recursive style to be popular with programmers who like to use a functional programming style overall, and would probably not be using classes that much. :-)

In theory, you could even optimize all cases where the object being called is a function or method written in Python, as long as the number of local variables needed for the new call can be accommodated in the current stack frame object. (Frame objects in CPython are allocated on the heap and have a variable allocation size based on the required space for the locals; there is already machinery for reusing frame objects.) This would optimize mutually tail-recursive functions, which otherwise wouldn't be optimized. Alas, it would also disable stack traces in most cases, so it would probably not be a good idea.

A more benign variant would be to create Python-level stack frames objects just like before, but reuse the C stack frame. This would create an approximation of Stackless Python, though it would still be easy enough to run out of C stack by recursing through a built-in function or method.

Of course, none of this does anything to address my first three arguments. Is it really such a big deal to rewrite your function to use a loop? (After all TRE only addresses recursion that can easily be replaced by a loop. :-)

Tuesday, April 7, 2009

Italia Here I Come!

That is: PyCon Italia, here I come! It's still a month away (May 8-10), but I'm already looking forward to my vacation in the historic city of Florence and on the beautiful west coast of Italy. The organizers of PyCon Italia kindly invited me to give a keynote at their annual conference, making me an offer I couldn't refuse.

Kidding aside, it looks like it will be a very exciting conference, with Googlers Fredrik Lundh and Alex Martelli also coming to speak, as well as Python core developer Raymond Hettinger. And even though much of the program will be in Italian, real-time translations will be available for the main track. Personally, I'm most looking forward to a mysterious late-night event labeled PyBirra, where the locals will try to drink me under the table. Salute!