Friday, June 3, 2011

The depth and breadth of Python

As of late I'm noticing a trend: I'm spending more time having in-person in-depth conversations, and less time coding. While I regret the latter, I really enjoy the former. Certainly more than weekly meetings, code reviews, or bikeshedding email threads. (I'm not all that excited about blogging either, as you may have guessed; but some things just don't fit in 140 characters.)

Two conversations with visitors I particularly enjoyed this week were both with very happy Python users, and yet they couldn't be more different. This to me is a confirmation of Python's enduring depth and breadth: it is as far away of a one-trick language as you can imagine.

My first visitor was Annie Liu, a professor of computer science (with a tendency to theory :-) at Stony Brook University in New York State. During an animated conversation that lasted nearly three hours (and still she had more to say :-) she explained to me the gist of her research, which appears to be writing small Python programs that implement fundamental algorithms using set comprehensions, and then optimizing the heck out of it using an automated approach she summarized as the three I's: Iterate, incrementalize, and implement. While her academic colleagues laugh at her for her choice of such a non-theoretical language like Python, her students love it, and she seems to be having the last laugh, obtaining publication-worthy results that don't require advanced LaTeX skills, nor writing in a dead language like SETL (of which she is also a great fan, and which, via ABC, had some influence on Python -- see also below).

Annie told me an amusing anecdote about an inscrutable security standard produced by NiST a decade ago, with a fifty-page specification written in Z. She took a 12-page portion of it and translated it into a 120-line Python program, which was much more readable than the original, and in the process she uncovered some bugs in the spec!

Another anecdote she recounted had reached me before, but somehow I had forgotten about it until she reminded me. It concerns the origins of Python's use of indentation. The anecdote takes place long before Python was created. At an IFIP working group meeting in a hotel, one night the delegates could not agree about the best delimiters to use for code blocks. On the table were the venerable BEGIN ... END, the newcomers { ... }, and some oddities like IF ... FI and indentation. In desperation someone said the decision had to be made by a non-programmer. The only person available was apparently Robert Dewar's wife, who in those days traveled with her husband to these events. Despite the late hour, she was called down from her hotel room and asked for her independent judgement. Immediately she decided that structuring by pure indentation was the winner. Now, I've probably got all the details wrong here, but apparently Lambert Meertens was present, who went on to design Python's predecessor, ABC, though at the time he called it B (the italics meant that B was not the name of the language, but the name of the variable containing the name of the language). I checked my personal archives, and the first time I heard this was from Prof. Paul Hilfinger at Berkeley, who recounted a similar story. In his version, it was just Lambert Meertens and Robert Dewar, and Robert Dewar's wife chose indentation because she wanted to go to bed. Either way it is a charming and powerful story. (UPDATE: indeed the real story was quite different.)

Of course Annie had some requests as well. I'll probably go over these in more detail on python-ideas, but here's a quick rundown (of what I could remember):
  • Quantifiers. She is really longing for the "SOME x IN xs HAS pred" notation from ABC (and its sibling "EACH x IN xs HAS pred"), which superficially resemble Python's any() and all() functions, but have the added semantics of making x available in the scope executed when the test succeeds (or fails, in the case of EACH -- then x represents a counterexample).
  • Type declarations. (Though I think she would be happy with Python 3 function annotations, possibly augmented with the attribute declarations seen in e.g. Django and App Engine's model classes.)
  • Pattern matching, a la Erlang. I have been eying these myself from time to time; it is hard to find a syntax that really shines, but it seems to be a useful feature.
  • Something she calls labels or yield points. It seems somewhat similar to yield statements in generators, but not quite.
  • She has only recently begun to look at distributed algorithms (she had some Leslie Lamport anecdotes as well) and might prefer sets to be immutable after all. Though that isn't so clear; her work so far has actually benefited from mutating sets to maintain some algorithmic invariant. (The "incrementalize" of the three I's actually refers to a form of "differentiation" of expressions that produce a new set for each input.)
The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox. Dropbox currently has 25 million users, stores petabytes of data on Amazon S3, is profitable, and is not for sale. Drew is an easygoing MIT graduate who is equally comfortable discussing custom memory allocators, the world of venture capitalism, and how to keep engineers happy; he likes hard problems and winning.

Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to the portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java.

Next Monday I'm having lunch with another high-tech enterpreneur, a Y-combinator start-up founder using (and contributing to) App Engine. Maybe I should just cancel all weekly meetings and sign off from all mailing lists and focus on two things: meeting Python users and coding. That's the life!

38 comments:

  1. As a programmer for SpiderOak, I believe we make much betteer use of Python than that competitor you talked to.

    ReplyDelete
  2. "SOME x IN xs HAS test" seems equivalent to "x = first(x in xs if test)" where "def first(it): it=iter(it); try: return it.next() except StopIteration: return None" (on Blogger you must infer indentation ;). My understanding of EACH is that it's the same as "SOME x IN xs HAS not test"? Or, are these distinct from assignment because they are expressions?

    I feel like pattern matching is largely possible right now in Python, just not sufficiently explored. PJE's generic functions are certainly a form of pattern matching, and a fairly mature one, though also hacky (e.g., putting expressions in strings). As example I tried that is not as hacky but maybe not as general is patmatch -- though still using the same basic decorator-based syntax as PJE generic functions. From an implementation point of view generic functions also show that efficient comparisons can be made (using its technique of finding common subexpressions and avoiding multiple evaluations of those subexpressions.

    IMHO a general concept of immutability would be really nice, as I think it is a really nice building block for distributed processing. It's kind of ad hoc right now.

    ReplyDelete
  3. @dougfort How tacky to say something like that and offer no further elaboration. SpiderOak looks a lot more amaturish than DropBox.

    ReplyDelete
  4. @dougfort: please write to psf@python.org to tell them your success story!

    @Ian Bicking: The key feature of SOME is that it is an expression with a side effect (an assignment). SOME is the mathematical "exists" operator (upside-down E) and EACH is the "all" operator (upside-down A). "EACH x IN xs HAS pred" is the same as "NOT SOME x IN xs HAS NOT pred".

    I don't doubt that pattern matching is implementable; I just think that a really nice notation as available in some other languages would make it much more attractive (just like decorators didn't add new possibilities to the language but still changed how it is commonly used).

    ReplyDelete
  5. I've been similarly amazed by the breadth of Python, I started using it to do semiconductor design almost 10 years ago; it was embedded in our chip layout editor and swig wrapped the underlying C-api, so you could do all nature of complex layout editing tasks with a smallish Python script -- super powerful and fast to boot.

    I now work in the internet startup space and my bridge for transitioning from hardware design was literally Python -- I was a web-tech newb, but Python savvy and there are just loads of web startups using Python.

    So today I use it for basically everything, automating code and server deployment, running our entire backend http stack, ingesting and munging content feeds in xml and json, the works.

    ReplyDelete
  6. Isn't SOME x IN xs HAS pred written as:

    for x in itertools.filter(pred, xs):
    . use(x)
    . break
    else:
    . none_matched()

    And EACH x IN xs HAS pred:

    for x in (x for x in xs if not pred(x)):
    . use(x)
    . break
    else:
    . all_matched()

    Though imaginary, more-useful syntax for that loop could be:

    for x in xs if not pred(x):
    . ...

    ReplyDelete
  7. @Roger Pate: The problem with the for-loop expansion is that you can't combine it with WHILE. Annie's favorite example is WHILE SOME x IN xs HAS pred:

    ReplyDelete
  8. Whoops, itertools.ifilter. And I notice itertools.ifilterfalse simplifies the EACH case.

    ReplyDelete
  9. Great post Guido, as always! Fun to see the ObjC-Python bridge mentioned as I like to relate the ostensible story of why CNRI pursued you back in the day, how you excited us all with how easy it would be to marry Python and ObjC, and then (rightly!) disavowed us of the misguided notion that we couldn't just do it all in Python. :)

    ReplyDelete
  10. Semantic indentation was a major reason I latched onto Python back in '99. If another language could have offered me less curly brackets and more expressivity I might have had a flirt with it, but it seems Python is pretty unique, although Coffeescript seems to be doing very well for javascript by adopting a similar principle of legibility.

    It's amazing how much difference such an implementation detail can have on the accessibility of a language. I like to think that Python is the least gratuitously nerdy language out there.

    If computers are now more about communication that calculation, that bodes very well for the future of this language.

    ReplyDelete
  11. +1 for the Pattern matching, it would be great feature.

    ReplyDelete
  12. It's interesting to see that Annie requested type declarations. I'm currently working on a static type-checking utility for Python as a senior thesis. The idea was proposed by my professor; he likes teaching in Python for intro CS classes, but feels that people just getting started with programming should really stick to static typing coding style until they understand enough to have good reason to need dynamic typing features. Now I'm starting to wonder if this is something that a lot of professors want to see in Python.

    ReplyDelete
  13. SOME = ∃ = U+2203 '\N{THERE EXISTS}'
    EACH = ∀ = U+2200 '\N{FOR ALL}'

    In the Unicode age (hurrah!), periphrasis isn't that needed :)

    ReplyDelete
  14. @Snagglepants
    Apparently the Dropbox fan's have more in common with Apple users then previously believed.

    Tacky? Dude, you might want to read up on netiquette and buy a dictionary.

    ReplyDelete
  15. One more flame about dropbox v. spideroak (or other competitors) and I'll turn off commenting and delete all comments about that topic.

    ReplyDelete
  16. Please dont turn commenting off: they are very interesting

    ReplyDelete
  17. Thought about any(): it might have been interesting to have it return either an empty tuple (if False) or a 1-tuple containing the value (if True). Too late for that now, though.

    Regarding while/if: Perhaps it's time to revisit the idea of an 'as' clause on these conditionals (the tuple trick above could handle the cases where the condition and required value inside the clause don't match up)

    As far as immutability goes, perhaps someone would care to take the freeze() PEP back up (PEP 351)

    ReplyDelete
  18. As we're talking about language requests:
    Something that really could be done but probably won't are ranges in the form 1..10. :)

    As Ellipsis took the triple dot syntax a ruby-like range can't be made.

    BTW, I always tought it was inconsistent to have mylist[1:5] but 1:5 alone don't work as a range.

    Just toughts... :P

    ReplyDelete
  19. As far as I can tell, wouldn't an optional second arg to any() do the job for SOME:

    def any(iterable, fn = ident):
    for x in iterable:
    if fn(x):
    return True
    return False

    ReplyDelete
  20. @Carl
    You can already use: any(ident(x) for x in iterable)

    ReplyDelete
  21. Hi Guido. This "while some x in xs has pred:", will x on each iteration be the first element of xs where pred(x) and xs needs changing each iteration? If so, we've only x, and not its index to be able to efficiently alter xs. Also, "some" doesn't emphasise first and is more suggestive of a x where pred(x) being plucked at random. Cheers, Ralph.

    P.S. Isn't an upside-down E an E by another name? :-)

    ReplyDelete
  22. @Ralph Corderoy: xs is typically a set or dict so no index is needed. And the suggestion of "some" (or "any" for that matter) that it's an arbitrary element is intentional, since loops over sets/dicts use an arbitrary order.

    Which reminds me, Annie also asked for an arb() function that is like Ian's first(). It's been discussed on python-dev in the past, but we could never agree on what it should return if the collection is empty. None feels wrong because the collection could very well contain None as a legitimate value.

    ReplyDelete
  23. Why not have arb() behave like next(iterator[, default])? (Throw by default, but a value can be specified.)

    ReplyDelete
  24. Besides all the stories about some important decision making, it's really cool to see how some of the best ideas are coming from interacting with other people.
    So I totally agree with "focus on two things: meeting Python users and coding. That's the life!"
    I find programming less fun without interacting with other people. I guess it's all about sharing ideas and coming up with great solutions together. It probably makes us feel like a better programmer as well as a better human being.

    Thank you for sharing your ideas about programming and life in general, Guido! Good luck!

    PS: Robert Dewar's wife should be given an award for her input on the greatest programming language of all! ;)

    ReplyDelete
  25. Oh! Pattern Matching. Guido, please give us pattern matching. It makes life so much more beautiful in Erlang. With Python being a thing of beauty, it must must must have Pattern Matching.

    ReplyDelete
  26. Quick question: when you said you "checked your personal archives" per the indentation question, were you going back and reading journals that you'd kept or simply browsing over old mail?

    I ask because I'm curious as to how you record, organize, and store your thoughts.

    ReplyDelete
  27. I wish I'd kept journals! No, I just searched my GMail for "Robert Dewar" and found a message from 2005 where I had summarized the story as I had heard it then and sent it off to an old colleague (who didn't reply).

    ReplyDelete
  28. Some proposition to yield "SOME x IN xs HAS pred" and "EACH x IN xs HAS pred" on existing Python language programming constructs, mentioned in answers, definitely will suffer by lack of performance. I have awful lot of "all" and "any" in my Python mathematical code (OpenOpt, FuncDesigner), and both Pyhon and numpy versions of "all" and "any" suffer the mentioned drawback of evaluating all elements before yielding result, while very often checking only several elements would be enough.

    The thing I would like you to pay attention - it would be very essential to get the new Python constructs numpy-style compatible, where "all" and "any" often are evaluated along a chosen dimension of n-dimensional array (e.g. all(X==10, axis=1) or mere all(X==10, 1)).

    Incompatibility between numpy and Python "min" and "max" styles already is a terrible headache, especially for newbies. It's very hard to find the bug. Python max(-1,0) is 0, while numpy treats max(-1,0) as max of array with single element "-1" along axis 0 and thus result is -1.

    I hope it will be taken into account to reduce numpy-python incompatibility.

    ReplyDelete
  29. @Dmitrey: we're getting pretty far off-topic here, but I suggest that NumPy was in the fault by redefining min() and max() that way. This could be avoided by requiring a keyword for axis, e.g. max(a, axis=0). Perhaps NumPy can evolve in this direction?

    ReplyDelete
  30. Well, at first, it will bring some backward incompatibility to previous numpy versions, at second, I'm not a NumPy developer and thus cannot affect decisions they make. Thus here I mere want to prevent appearing of another one incompatibility. Checking predicates along axis of n-dimensional array will certainly be used very often in scientific/engineering Python software to speedup calculations instead of using "all"/"any" along the involved axis.

    Sometimes I wonder what NumPy developers do (they often doesn't care even about bug reports I inform), but of course it's absolutely up to them. For example, one of most essential task numpy certainly require - possibility for downloading it from easy_install or linux channels (e.g. apt-get) with ACML (for AMD processors, free of charge and without royalites) or MKL (Intel processors, free-for-educational) (currently it requires lots of makefile changes and other efforts). But since Enthought (who sponsor numpy and thus affect their decisions) included MKL into their EPD, they seems to be not interested of it any longer - they prefer to force people buy their EPD with MKL than having it easily available with numpy download for free.

    ReplyDelete
  31. By the way, Python and numpy "sum" are also incompatible. E.g. Python:

    >>> sum([1,2],-1)
    2
    Now suppose a programmer replaced in a Python file header "from numpy import func1, func2, ..." by "from numpy import *", or imported "sum" from numpy to perform his own operations somewhere in his part of the file. Then somewhere in other part of the file we'll get

    >>> from numpy import *
    >>> sum([1,2],-1)
    3

    It can be obtained in most inappropriate time and be very hard to find the error (or even worse, it can be not revealed at all).

    As for the predicates check, I think it's better to handle them in more lower level (e.g. C language), and possibly to have it parallel computed (at least, by default or by an optional argument). On the other hand, maybe it's better to wait with all those parallel computations changes till Python 3.3 where LLVM-based JIT will be implemented (I hope it will be already done that time, isn't it?)

    ReplyDelete
  32. Sorry, Unladen Swallow is dead: http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

    ReplyDelete
  33. Another +1 for pattern matching arguments.

    What about something like

    def f([h,*t]) :

    for cutting up lists? It's close enough to [H|T] and consistent with *args notation.

    Empty lists would still be [].

    ReplyDelete
  34. Seize the day Guido, just make sure everything you do in your life is pure enjoyment! (And I bet it is!)

    I'll just quote this lyric from a song. (http://www.pubnub.com/blog/internet-heroes-song-crockford-dahl-wall-torvalds-resig-souders)

    "Guido Van Rossum, thanks for the Python. More like Van Awesome, maybe Van Right On."

    ReplyDelete
  35. I would like to thank Guido van Rossum for Python. I am a student of computer science and started python 3 weeks ago and now my world is changed. Honestly I never learned anything with such joy like Python. In python I feel like everything is possible to solve. .......

    Thanks Guido van Rossum ...you are my favorite person.

    ReplyDelete
  36. I would like to thank Guido van Rossum for Python. I am a student of computer science and started python 3 weeks ago and now my world is changed. Honestly I never learned anything with such joy like Python. In python I feel like everything is possible to solve. .......

    Thanks Guido van Rossum ...you are my favorite person.

    ReplyDelete
  37. I think pattern matching would be a great addition to Python. I did a small experiment implementing it as an external library (here: http://github.com/martinblech/pyfpm) but the new syntax requires a parser of its own. Language-level support would be much better. Still, pyfpm lets you do stuff like:

    unpacker('head :: tail') << (1, 2, 3)
    unpacker.head # 1
    unpacker.tail # (2, 3)

    or

    @match_args('head :: tail')
    def match(head, tail):
    return head, tail

    match([1, 2, 3]) # (1, (2, 3))

    ReplyDelete

Note: Only a member of this blog may post a comment.