Wednesday, April 27, 2016

King's Day Speech

Today the Netherlands celebrates King's Day. To honor this tradition, the Dutch embassy in San Francisco invited me to give a "TED talk" to an audience of Dutch and American entrepreneurs. Here's the text I read to them. Part of it is the tl;dr of my autobiography; part of it is about the significance of programming languages; part of it is about Python's big idea. Leve de koning! (Long live the king!)

Python: a programming language created by a community

Excuse my ramblings. I’ll get to a point eventually.

Let me introduce myself. I’m a nerd, a geek. I’m probably somewhere on the autism spectrum. Im also a late bloomer. I graduated from college when I was 26. I was 45 when I got married. Im now 60 years old, with a 14 year old son. Maybe I just have a hard time with decisions: I’ve lived in the US for over 20 years and I am still a permanent resident.

I'm no Steve Jobs or Mark Zuckerberg. But at age 35 I created a programming language that got a bit of a following. What happened next was pretty amazing. But I'll get to that.

At age 10 my parents gave me an educational electronics kit. The kit was made by Philips, and it was amazing. At first I just followed the directions and everything worked; later I figured out how to design my own circuits. My prized possessions were the kit's three (!) transistors.

I took one of my first electronics models, a blinking light, to show and tell in 5th grade. It was a total dud nobody cared or understood its importance. I think that's one of my earliest memories of finding myself a geek: until then I had just been a quiet quick learner.

In high school I developed my nerdiness further I hung out with a few other kids interested in electronics, and during physics class we sat in the back of the class discussing NAND gates while the rest of the class was still figuring out Ohm's law.

Fortunately our physics teacher had figured us out: he employed us to build a digital timer that he used to demonstrate the law of gravity to the rest of the class. It was a great project and showed us that our skills were useful. The other kids still thought we were weird: it was the seventies and many were into smoking pot and rebelling; another group was already preparing for successful careers as doctors or lawyers or tech managers. But they left me alone, I left them alone, and I graduated as one of the best of my year.

After high school I went to the University of Amsterdam: It was close to home, and to a teen growing up in the Netherlands in the seventies, Amsterdam was the only cool city. (Yes, the student protests of 1968 did touch me a bit.) Much to my high school physics teacher's surprise and disappointment, I chose to major in math, not physics. But looking back I think it didn’t matter.

In the basement of the science building was a mainframe computer, and it was love at first sight. Card punches! Line printers! Batch jobs! More to the point, I quickly learned to program, in languages with names like Algol, Fortran and Pascal. Mostly forgotten names, but highly influential at the time. Soon I was, again, sitting in the back of class, ignoring the lecture, correcting my computer programs. And why was that?

In that basement, around the mainframe, something amazing was happening. There was a loosely-knit group of students and staff with similar interests, and we exchanged tricks of the trade. We shared subroutines and programs. We united in our alliances against the mainframe staff, especially in the endless cat-and-mouse games over disk space. (Disk space was precious in a way you cannot understand today.)

But the most important lesson I learned was about sharing: while most of the programming tricks I learned there died with the mainframe era, the idea that software needs to be shared is stronger than ever. Today we call it open source, and it’s a movement. Hold that thought!

At the time, my immediate knowledge of the tricks and the trade seemed to matter most though. The mainframe’s operating system group employed a few part-time students, and when they posted a vacancy, I applied, and got the job. It was a life-changing event! Suddenly I had unlimited access to the mainframe no more fighting for space or terminalsplus access to the source code for its operating system, and dozens of colleagues who showed me how all that stuff worked.

I now had my dream job, programming all day, with real customers: other programmers, the users of the mainframe. I stalled my studies and essentially dropped out of college, and I would not have graduated if not for my enlightened manager and a professor who hadn't given up on me. They nudged me towards finishing some classes and pulled some strings, and eventually, with much delay, I did graduate. Yay!

I immediately landed a new dream job that would not have been open to me without that degree. I had never lost my interest in programming languages as an object of study, and I joined a team building a new programming language — not something you see every day. The designers hoped their language would take over the world, replacing Basic.

It was the eighties now, and Basic was the language of choice for a new generation of amateur programmers, coding on microcomputers like the Apple II and the Commodore 64. Our team considered the Basic language a pest that the world should be rid of. The language we were building, ABC, would "stamp out Basic", according to our motto.

Sadly, for a variety of reasons, our marketing (or perhaps our timing) sucked, and after four years, ABC was abandoned. Since then I've spent many hours trying to understand why the project failed, despite its heart being so clearly in the right place. Apart from being somewhat over-engineered, my best answer is that ABC died because there was no internet in those days, and as a result there could not be a healthy feedback loop between the makers of the language and its users. ABC’s design was essentially a one-way street.

Just half a decade later, when I was picking through ABC’s ashes looking for ideas for my own language, that missing feedback loop was one of the things I decided to improve upon. “Release early, release often” became my motto (freely after the old Chicago Democrats’ encouragement, “vote early, vote often”). And the internet, small and slow as it was in 1990, made it possible.

Looking back 25 years, the Internet and the Open Source movement (a.k.a. Free Software) really did change everything. Plus something called Moore's Law, which makes computers faster every year. Together, these have entirely changed the interaction between the makers and users of computer software. It is my belief that these developments (and how I managed to make good use of them) have contributed more to the success of “my” programming language than my programming skills and experience, no matter how awesome.

It also didn't hurt that I named my language Python. This was a bit of unwitting marketing genius on my part. I meant to honor the irreverent comedic genius of Monty Python's Flying Circus, and back in 1990 I didn't think I had much to lose. Nowadays, I'm sure "brand research" firms would be happy to to charge you a very large fee to tell you exactly what complex of associations this name tickles in the subconscious of the typical customer. But I was just being flippant.

I have promised the ambassador not to bore you with a technical discussion of the merits of different programming languages. But I would like to say a few things about what programming languages mean to the people who use them programmers. Typically when you ask a programmer to explain to a lay person what a programming language is, they will say that it is how you tell a computer what to do. But if that was all, why would they be so passionate about programming languages when they talk among themselves?

In reality, programming languages are how programmers express and communicate ideas and the audience for those ideas is other programmers, not computers. The reason: the computer can take care of itself, but programmers are always working with other programmers, and poorly communicated ideas can cause expensive flops. In fact, ideas expressed in a programming language also often reach the end users of the program people who will never read or even know about the program, but who nevertheless are affected by it.

Think of the incredible success of companies like Google or Facebook. At the core of these are ideas ideas about what computers can do for people. To be effective, an idea must be expressed as a computer program, using a programming language. The language that is best to express an idea will give the team using that language a key advantage, because it gives the team members — people! — clarity about that idea. The ideas underlying Google and Facebook couldn't be more different, and indeed these companies' favorite programming languages are at opposite ends of the spectrum of programming language design. And that’s exactly my point.

True story: The first version of Google was written in Python. The reason: Python was the right language to express the original ideas that Larry Page and Sergey Brin had about how to index the web and organize search results. And they could run their ideas on a computer, too!

So, in 1990, long before Google and Facebook, I made my own programming language, and named it Python. But what is the idea of Python? Why is it so successful? How does Python distinguish itself from other programming languages? (Why are you all staring at me like that? :-)

I have many answers, some quite technical, some from my specific skills and experience at the time, some just about being in the right place at the right time. But I believe the most important idea is that Python is developed on the Internet, entirely in the open, by a community of volunteers (but not amateurs!) who feel passion and ownership.

And that is what that group of geeks in the basement of the science building was all about.

Surprise: Like any good inspirational speech, the point of this talk is about happiness!

I am happiest when I feel that I'm part of such a community. I’m lucky that I can feel it in my day job too. (I'm a principal engineer at Dropbox.) If I can't feel it, I don't feel alive. And so it is for the other community members. The feeling is contagious, and there are members of our community all over the world.

The Python user community is formed of millions of people who consciously use Python, and love using it. There are active members organizing Python conferences — affectionately known as PyCons — in faraway places like Namibia, Iran, Iraq, even Ohio!

My favorite story: A year ago I spent 20 minutes on a video conference call with a classroom full of faculty and staff at Babylon University in southern Iraq, answering questions about Python. Thanks to the efforts of the audacious woman who organized this event in a war-ridden country, students at Babylon University are now being taught introductory programming classes using Python. I still tear up when I think about the power of that experience. In my wildest dreams I never expected I’d touch lives so far away and so different from my own.

And on that note I'd like to leave you: a programming language created by a community fosters happiness in its users around the world. Next year I may go to PyCon Cuba!

Monday, October 28, 2013

Book review: Introduction to Computer Science Using Python (by Charles Dierbach)

After much back and forth I received a nice new Python book in the mail. The book's full title is "Introduction to Computer Science Using Python: A Computational Problem-Solving Focus", and its author is a very experienced educator, Charles Dierbach.

This is not your average Python book -- it is a college text intended for first-semester CS courses that happens to use Python. As such, in assumes absolutely no previous programming experience, and it looks like any previous computer experience is optional. Not only that, but the book starts with a step-by-step introduction to the art of computational problem solving. This is an idea that goes well beyond hacking together a website!

The book is incredibly thorough: there are exercises throughout the text (not just at the end of each chapter), and it includes a plethora of examples, screenshots, tables, charts, diagrams, and photos. (Yes, my picture is in there -- so are Alan Turing, JFK, and K&R. :-)

The author is not afraid of taking a stance; for example, he omits the 'break' and 'continue' statement because they do not fit within the paradigm of structured programming. This actually fits with the general goal of the book, which is to give an overview of many areas of computer science without getting too deep into the minutiae of any topic. I love the final chapter, which is an overview of the history of computing, starting with Charles Babbage and Ada Lovelace.

At the same time, the book gives plenty of useful practical information, such as instructions for using IDLE and an extensive explanation of turtle graphics, culminating in a horse race simulation. (The author's Baltimore roots seem to show through here. :-)

All in all, I think this book is a great text for anyone teaching CS1 or interested in familiarizing themselves with computer science through serious self-study.

Thursday, October 24, 2013

Letter to a young programmer

Dear (insert name here),

I heard you enjoy a certain programming language named Python. Programming is a wonderful activity. I am a little jealous that you have access to computers at your age; when I grew up I didn't even know what a computer was! I was an electronics hobbyist though, and my big dream was to build my own electronic calculator from discrete components. I never did do that, but I did build several digital clocks, and it was amazing to build something that complex and see it work. I hope you dream big too -- programmers can make computers (and robots!) do amazing things, and this is a great time to become a programmer. Just imagine how much faster computers will be in five or ten years, and what you will be able to do with your skills then!

--Guido van Rossum (inventor of Python)

Thursday, August 25, 2011

Compare-And-Set in Memcache

With the most recent release (1.5.3, last week) App Engine's Python API for Memcache has added a new feature, Compare-And-Set. This feature (with a different API) was already available in Java; it has also been available in the non-App-Engine pure-Python memcache client. In fact, I designed the App Engine Python API for this feature to be compatible with the latter, since most of the rest of the App Engine Python API also strives to be at least a superset of that package.

But what is it? There seems to be little information on how to use Compare-And-Set with memcache. It is also sometimes (incorrectly) referred to as Compare-And-Swap -- incorrect, because the cas() operation does not actually "swap" anything. The first response when we closed the bug requesting this feature was "Some examples of usage are appreciated." So here goes.

The basic use case for Compare-And-Set is when multiple requests that are being handled concurrently need to update the same memcache key in an atomic fashion. Let's assume you are managing a counter in memcache. (Actually, you could use the incr() and decr() operations to update 64-bit integer counters atomically, but just for argument's sake assume you cannot use those -- there are other data types for which the memcache service does not have built-in support.)

The naive code to update a counter would be something like this:

def init_counter(key):
. memcache.set(key, 0)

def bump_counter(key):
. counter = memcache.get(key)
. assert counter is not None, 'Uninitialized counter'
. memcache.set(key, counter+1)

(Aside: I don't want to have to think about how to get blogger to properly format Python code. I really don't. So just bear with the dots I use for indentation. Okay? Comments pointing me to solutions will be DELETED.)

(Aside 2: The assert is kind of naive; in practice you'll have to somehow deal with counter initialization. You also should implement a backup for your counter using the App Engine datastore, so that it can survive eviction by the memcache service. However interesting these details are on their own, I leave them for another time.)

Hopefully you can spot the problem in this version of bump_counter(): if two requests execute concurrently (on different instances of the same app), the sequence of operations might be as follows, labeling the two requests as A and B:

A: counter = memcache.get(key) # Reads 42
B: counter = memcache.get(key) # Reads 42
A: memcache.set(key, counter+1) # Writes 43
B: memcache.set(key, counter+1) # Writes 43

So even though two requests were executed, the counter only gets incremented by one. This is called a race condition. Various interlacings of these lines can have the same effect; a race condition occurs whenever B reads the counter before A has written it (or vice versa).

You could try to guard against this by reading the counter value back and checking that it was incremented by one; however this solution still has a race condition (see if you can figure it out for yourself). There are other solutions possible involving a separate "lock" variable, managed using the add() and delete() operations. However these in general require more server roundtrips, and it is pretty hard to manufacture a decent lock out of the basic memcache operations (try for yourself -- think about what would happen if your request was somehow aborted after acquiring the lock, without having a chance of releasing it).

Using the Compare-And-Set operation, writing a reliable bump_counter() function is a cinch:

def bump_counter(key):
. client = memcache.Client()
. while True: # Retry loop
. . counter = client.gets(key)
. . assert counter is not None, 'Uninitialized counter'
. . if client.cas(key, counter+1):
. . . break

I've highlighted the changes from the previous version. There are several essential differences:
The Client object is required because the gets() operation actually squirrels away some hidden information that is used by the subsequent cas() operation. Because the memcache functions are stateless (meaning they don't alter any global values), these operations are only available as methods on the Client object, not as functions in the memcache module. (Apart from these two, the methods on the Client object are exactly the same as the functions in the module, as you can tell by comparing the documentation.)

The retry loop is necessary because this code doesn't actually avoid race conditions -- it just detects them! The memcache service guarantees that when used in the pattern shown here (i.e. using gets() instead of get() and cas() instead of set()), if two (or more) different client instances happen to be involved a race condition like I showed earlier, only the first one to execute the cas() operation will succeed (return True), while the second one (and later ones) will fail (return False). Let's spell out the events that happen when a race condition occurs:

A: counter = memcache.gets(key) # Reads 42
B: counter = memcache.gets(key) # Reads 42
A: memcache.cas(key, counter+1) # Writes 43, returns True
B: memcache.cas(key, counter+1) # Returns False
B: counter = memcache.gets(key) # Reads 43
B: memcache.cas(key, counter+1) # Writes 44, returns True

Another refinement I've left out here for brevity is to set a limit on the number of retries, to avoid an infinite loop in worst-case scenarios where there is a lot of contention for the same counter (meaning more requests are trying to update the counter than the memcache service can process in real time). You can figure out how to code this for yourself. (UPDATE: see comments #1 and #2 for a note about busy-waiting.)

Now let me explain roughly how this actually works. For some people that helps understanding how to use it: this is often the case for me when I am trying to understand some new concept.

The gets() operation internally receives two values from the memcache service: the value stored for the key (in our example the counter value), and a timestamp (also known as the cas_id). The timestamp is an opaque number; only the memcache service knows what it means. The important thing is that each time the value associated with a memcache key is updated, the associated timestamp is changed. The gets() operation stores this timestamp in a Python dict on the Client object, using the key passed to gets() as the dict key.

The cas() operation internally adds the timestamp to the request it sends to the memcache service. The service then compares the timestamp received with a cas() operation to the timestamp currently associated with the key. If they match, it updates the value and the timestamp, and returns success. If they don't match, it leaves the value and timestamp alone, and returns failure. (By the way, it does not send the new timestamp back with a successful response. The only way to retrieve the timestamp is to call gets().)

Of course, there's one more important ingredient: the App Engine memcache service itself behaves atomically. That is, when two concurrent requests (for the same app id) use memcache, they will go to the same memcache service instance (for historic reasons called a shard), and the memcache service has enough internal locking so that concurrent requests for the same key are properly serialized. In particular this means that two cas() requests for the same key do not actually run in parallel -- the service handles the first request that came in until completion (i.e., updating the value and timestamp) before it starts handling the second request.

And that's Compare-And-Set in a nutshell. If you have questions please don't hesitate to ask!

(UPDATE: The memcache API defines batch versions of most of its APIs. For example, to get multiple keys in a single call, there is get_multi(); to set multiple keys, there is set_multi(). Corresponding to cas(), there is cas_multi(). But there is no gets_multi(): instead, you can use get_multi(keys, for_cas=True). Finally, there's cas_reset(), which clears the dict used to store timestamps. But I haven't figured out what to do with it yet. :-)

Monday, July 25, 2011

Before Python

This morning I had a chat with the students at Google's CAPE program. Since I wrote up what I wanted to say I figured I might as well blog it here. Warning: this is pretty unedited (or else it would never be published :-). I'm posting it in my "personal" blog instead of the "Python history" blog because it mostly touches on my career before Python. Here goes.

Have you ever written a computer program? Using which language?
  • HTML
  • Javascript
  • Java
  • Python
  • C++
  • C
  • Other - which?
[It turned out the students had used a mixture of Scratch, App Inventor, and Processing. A few students had also used Python or Java.]

Have you ever invented a programming language? :-)

If you have programmed, you know some of the problems with programming languages. Have you ever thought about why programming isn't easier? Would it help if you could just talk to your computer? Have you tried speech recognition software? I have. It doesn't work very well yet. :-)

How do you think programmers will write software 10 years from now? Or 30? 50?

Do you know how programmers worked 30 years ago?

I do.

I was born in Holland in 1956. Things were different.

I didn't know what a computer was until I was 18. However, I tinkered with electronics. I built a digital clock. My dream was to build my own calculator.

Then I went to university in Amsterdam to study mathematics and they had a computer that was free for students to use! (Not unlimited though. We were allowed to use something like one second of CPU time per day. :-)

I had to learn how to use punch cards. There were machines to create them that had a keyboard. The machines were as big as a desk and made a terrible noise when you hit a key: a small hole was punched in the card with a huge force and great precision. If you made a mistake you had to start over.

I didn't get to see the actual computer for several more years. What we had in the basement of the math department was just an end point for a network that ran across the city. There were card readers and line printers and operators who controlled them. But the actual computer was elsewhere.

It was a huge, busy place, where programmers got together and discussed their problems, and I loved to hang out there. In fact, I loved it so much I nearly dropped out of university. But eventually I graduated.

Aside: Punch cards weren't invented for computers; they were invented for sorting census data and the like before WW2. [UPDATE: actually much earlier, though the IBM 80-column format I used did originate in 1928.] There were large mechanical machines for sorting stacks of cards. But punch cards are the reason that some software still limits you (or just defaults) to 80 characters per line.

My first program was a kind of "hello world" program written in Algol-60. That language was only popular in Europe, I believe. After another student gave me a few hints I learned the rest of the language straight from the official definition of the language, the "Revised Report on the Algorithmic Language Algol-60." That was not an easy report to read! The language was a bit cumbersome, but I didn't mind, I learned the basics of programming anyway: variables, expressions, functions, input/output.

Then a professor mentioned that there was a new programming language named Pascal. There was a Pascal compiler on our mainframe so I decided to learn it. I borrowed the book on Pascal from the departmental library (there was only one book, and only one copy, and I couldn't afford my own). After skimming it, I decided that the only thing I really needed were the "railroad diagrams" at the end of the book that summarized the language's syntax. I made photocopies of those and returned the book to the library.

Aside: Pascal really had only one new feature compared to Algol-60, pointers. These baffled me for the longest time. Eventually I learned assembly programming, which explained the memory model of a computer for the first time. I realized that a pointer was just an address. Then I finally understood them.

I guess this is how I got interested in programming languages. I learned the other languages of the day along the way: Fortran, Lisp, Basic, Cobol. With all this knowledge of programming, I managed to get a plum part-time job at the data center maintaining the mainframe's operating system. It was the most coveted job among programmers. It gave me access to unlimited computer time, the fastest terminals (still 80 x 24 though :-), and most important, a stimulating environment where I got to learn from other programmers. I also got access to a Unix system, learned C and shell programming, and at some point we had an Apple II (mostly remembered for hours of playing space invaders). I even got to implement a new (but very crummy) programming language!

All this time, programming was one of the most fun things in my life. I thought of ideas for new programs to write all the time. But interestingly, I wasn't very interested in using computers for practical stuff! Nor even to solve mathematical puzzles (except that I invented a clever way of programming Conway's Game of Life that came from my understanding of using logic gates to build a binary addition circuit).

What I liked most though was write programs to make the life of programmers better. One of my early creations was a text editor that was better than the system's standard text editor (which wasn't very hard :-). I also wrote an archive program that helped conserve disk space; it was so popular and useful that the data center offered it to all its customers. I liked sharing programs, and my own principles for sharing were very similar to what later would become Open Source (except I didn't care about licenses -- still don't :-).

As a term project I wrote a static analyzer for Pascal programs with another student. Looking back I think it was a horrible program, but our professor thought it was brilliant and we both got an A+. That's where I learned about parsers and such, and that you can do more with a parser than write a compiler.

I combined pleasure with a good cause when I helped out a small left-wing political party in Holland automate their membership database. This was until then maintained by hand as a collection of metal plates plates into which letters were stamped using an antiquated machine not unlike a steam hammer :-). In the end the project was not a great success, but my contributions (including an emulation of Unix's venerable "ed" editor program written in Cobol) piqued the attention of another volunteer, whose day job was as computer science researcher at the Mathematical Center. (Now CWI.)

This was Lambert Meertens. It so happened that he was designing his own programming language, named B (later ABC), and when I graduated he offered me a job on his team of programmers who were implementing an interpreter for the language (what we would now call a virtual machine).

The rest I have written up earlier in my Python history blog.

Friday, June 3, 2011

The depth and breadth of Python

As of late I'm noticing a trend: I'm spending more time having in-person in-depth conversations, and less time coding. While I regret the latter, I really enjoy the former. Certainly more than weekly meetings, code reviews, or bikeshedding email threads. (I'm not all that excited about blogging either, as you may have guessed; but some things just don't fit in 140 characters.)

Two conversations with visitors I particularly enjoyed this week were both with very happy Python users, and yet they couldn't be more different. This to me is a confirmation of Python's enduring depth and breadth: it is as far away of a one-trick language as you can imagine.

My first visitor was Annie Liu, a professor of computer science (with a tendency to theory :-) at Stony Brook University in New York State. During an animated conversation that lasted nearly three hours (and still she had more to say :-) she explained to me the gist of her research, which appears to be writing small Python programs that implement fundamental algorithms using set comprehensions, and then optimizing the heck out of it using an automated approach she summarized as the three I's: Iterate, incrementalize, and implement. While her academic colleagues laugh at her for her choice of such a non-theoretical language like Python, her students love it, and she seems to be having the last laugh, obtaining publication-worthy results that don't require advanced LaTeX skills, nor writing in a dead language like SETL (of which she is also a great fan, and which, via ABC, had some influence on Python -- see also below).

Annie told me an amusing anecdote about an inscrutable security standard produced by NiST a decade ago, with a fifty-page specification written in Z. She took a 12-page portion of it and translated it into a 120-line Python program, which was much more readable than the original, and in the process she uncovered some bugs in the spec!

Another anecdote she recounted had reached me before, but somehow I had forgotten about it until she reminded me. It concerns the origins of Python's use of indentation. The anecdote takes place long before Python was created. At an IFIP working group meeting in a hotel, one night the delegates could not agree about the best delimiters to use for code blocks. On the table were the venerable BEGIN ... END, the newcomers { ... }, and some oddities like IF ... FI and indentation. In desperation someone said the decision had to be made by a non-programmer. The only person available was apparently Robert Dewar's wife, who in those days traveled with her husband to these events. Despite the late hour, she was called down from her hotel room and asked for her independent judgement. Immediately she decided that structuring by pure indentation was the winner. Now, I've probably got all the details wrong here, but apparently Lambert Meertens was present, who went on to design Python's predecessor, ABC, though at the time he called it B (the italics meant that B was not the name of the language, but the name of the variable containing the name of the language). I checked my personal archives, and the first time I heard this was from Prof. Paul Hilfinger at Berkeley, who recounted a similar story. In his version, it was just Lambert Meertens and Robert Dewar, and Robert Dewar's wife chose indentation because she wanted to go to bed. Either way it is a charming and powerful story. (UPDATE: indeed the real story was quite different.)

Of course Annie had some requests as well. I'll probably go over these in more detail on python-ideas, but here's a quick rundown (of what I could remember):
  • Quantifiers. She is really longing for the "SOME x IN xs HAS pred" notation from ABC (and its sibling "EACH x IN xs HAS pred"), which superficially resemble Python's any() and all() functions, but have the added semantics of making x available in the scope executed when the test succeeds (or fails, in the case of EACH -- then x represents a counterexample).
  • Type declarations. (Though I think she would be happy with Python 3 function annotations, possibly augmented with the attribute declarations seen in e.g. Django and App Engine's model classes.)
  • Pattern matching, a la Erlang. I have been eying these myself from time to time; it is hard to find a syntax that really shines, but it seems to be a useful feature.
  • Something she calls labels or yield points. It seems somewhat similar to yield statements in generators, but not quite.
  • She has only recently begun to look at distributed algorithms (she had some Leslie Lamport anecdotes as well) and might prefer sets to be immutable after all. Though that isn't so clear; her work so far has actually benefited from mutating sets to maintain some algorithmic invariant. (The "incrementalize" of the three I's actually refers to a form of "differentiation" of expressions that produce a new set for each input.)
The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox. Dropbox currently has 25 million users, stores petabytes of data on Amazon S3, is profitable, and is not for sale. Drew is an easygoing MIT graduate who is equally comfortable discussing custom memory allocators, the world of venture capitalism, and how to keep engineers happy; he likes hard problems and winning.

Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to the portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java.

Next Monday I'm having lunch with another high-tech enterpreneur, a Y-combinator start-up founder using (and contributing to) App Engine. Maybe I should just cancel all weekly meetings and sign off from all mailing lists and focus on two things: meeting Python users and coding. That's the life!

Monday, January 24, 2011

Asynchronous RPC in App Engine Today

While I was laying the groundwork for a new datastore client library with support for asynchronous requests, I added some low-level support for asynchronous RPCs that you can use today. The only App Engine API with documented support for asynchronous RPCs is urlfetch, and it happens to be quite useful with that.

Suppose you want to fetch some data from a remote service. The remote service has two instances, both of which are slightly flaky. What you want to do is send off requests to both servers simultaneous (this is the easy part) and then wait for the first one to give you a result. The latter uses the new API that I'm about to describe here.

from google.appengine.api import urlfetch, apiproxy_stub_map

urls = ['http://service1.com', 'http://service2.com'] # Etc.

rpcs = []
for url in urls:
rpc = urlfetch.create_rpc(deadline=1.0)
urlfetch.make_fetch_call(rpc, url)
rpcs.append(rpc)

rpc = apiproxy_stub_map.UserRPC.wait_any(rpcs)
# Now rpc is the first rpc that returned a result. Have at it!

That's all! If you're interested in learning more about this handy class method, just check out its docstring in the App Engine SDK. Note that technically you should loop until it doesn't return None.

You can also repeatedly call wait_any() to get subsequent result. Make sure to remove the rpc it returns (if any) from the list, since otherwise it will return the same rpc over and over again: the specification of wait_any() says it returns the first rpc in the given list that completes, regardless of whether you have seen it before.

Also note that there currently is no way to cancel the other RPCs, which is why I passed a low deadline to the create_rpc() call. The problem is that even if you completely ignore the other RPCs, the App Engine runtime still waits for them to finish or timeout.

Finally, there is also a similar class method UserRPC.wait_all(), which waits until all RPCs in the list you pass it are complete. (It doesn't return anything.)

PS. Don't look too closely at the implementation of these methods. It may change as we think of a better way to do it. But we're committed to the API.