Thursday, December 18, 2008

The Emperor's New Mind

I recently re-read Roger Penrose's The Emperor's New Mind. I read it first in 1990, soon after it came out, and I still find it a stunning book, despite disagreeing with Penrose's speculations about consciousness somehow being boosted by quantum effects.

What makes the book so great in my view is that, in order to make his point, he has written an incredibly accessible textbook about a large number of topics in mathematics, physics and computer science. Penrose doesn't dumb down anything: he assumes the reader is an intelligent layperson, and explains each topic without compromise (though without getting into the technicalities of proofs).

Mathematical/computer topics include the theories of computability and algorithm complexity, Gödel's theorem, the Mandelbrot set, Turing machines (there's an appendix with several pages of ones and zeros encoding a universal Turing machine -- I haven't checked it, but I'm sure that Penrose did), the Church-Turing thesis, and so on. In physics, he explains Maxwell's theory of electromagnetism (this was the first time I understood it), phase space, special and general relativity, quantum theory, cosmology, black holes, space-time singularities and more. The table of contents reads like an overview of modern "hard" science. Along the way Penrose also manages to explain Penrose tiling (the nonperiodic tiling of the plane, which he invented in the 1970s), the physics of nerve signals, Platonism, and important topics from the modern history of mathematics like Hilbert's program and Bertrand Russell's Principia Mathematica.

It's not surprising that Penrose shows up prominently in the acknowledgments for Neal Stephenson's novel Anathem (in fact, finding him there made me pick up my copy and start re-reading it).

Despite all this, I disagree with Penrose's claims about consciousness. Penrose does a good job separating his opinions on this topic from the science he presents, and in the end he cannot prove his hunches about consciousness any more than I can mine. Personally, I think that evolution is a perfectly fine explanation for our brains -- in other words, Richard Dawkins did a better job convincing me than Penrose. (I cannot recommend Dawkins' books enough -- you can skip The God Delusion though, just stick to his many excellent works about evolution and memes.)

Monday, December 8, 2008

In the Cloud or Not?

Some people love the cloud. (As in cloud computing, e.g. Google App Engine, Amazon EC2, Microsoft Azure.) Others hate it. They gripe about lock-in, proprietary APIs and so on. (I would provide links to examples of both attitudes, but I don't have time right now, and you can fill those in yourself easily. :-)

I wonder if, apart from the field being young, the differences of opinion may be similar to the different attitudes towards home ownership: some folks hate renting, citing landlord conflicts, noisy neighbors and so on. Others hate home ownership, due to the outsize financial commitment and risk (all too clear to many these days) and the never-ending maintenance (new roof, new fence, new furnace, new bathroom, new kitchen -- it never ends). I'm kind of in the middle myself, having had good landlords in the past, and disliking the maintenance effort/cost for my own home these days, but enjoying the independence.

Obviously cloud computing would be more similar to renting, while traditional datacenter ownership to home ownership (though without the aspect of building up wealth through ownership :-). Someone else can take the analogy further, and compare different styles of cloud services to different ways landlords can run their business. (E.g. with Google App Engine you get carpet and furniture as part of the deal, and meals delivered as an option, while Amazon EC2 rents out bare concrete units where you can do as you please. There's a market for both.)

If that's the case, we should expect that the love/hate posts will never stop, and we'll never convince all haters to love the cloud. But there will be plenty of business for the landlords from those who prefer not to own their own servers, and we might as well cater to them rather than be discouraged by the cloud haters.

Friday, November 21, 2008

Scala?

I received a free copy of the Scala eBook by Martin Odersky, Lex Spoon, and Bill Venners. I tried to read it on my Caltrain commute, but ended up only skimming -- at 700+ pages it's a hefty tome. I'm also not in the primary target audience, as its focus seems to bounce between trying to entice Java programmers to try a better language and serving as an introduction to programming that happens to use Scala. I ended up going to the Scala website, where I downloaded various other documents, such as a tutorial, a more technical overview paper, and the language reference.

Sad to say, I'm quite disappointed (in the language -- can't really say much about the quality of the eBook except that it's as big as the language). One of the first times I heard Scala mentioned was in a talk by Steve Yegge at Google I/O last May. Steve characterized Scala as "static typing's revenge" (I'm paraphrasing -- too lazy to watch the video to find that section) and went on to make fun of the incredible complexity of Scala's type system, which contains such esoteric concepts as type erasure, variance annotations, existential types, polymorphic method types, and many more (just scan the ToC of the language reference).

I have to agree with Steve -- if this is what it takes to have compile-time type-safety in a language, I'll take dynamic typing any day. There's got to be a better way -- perhaps Haskell? Haskell is a pure functional language with a fast implementation that seems to have solved the I/O problem of functional languages well -- while its Monads are truly deep and take real work to create, their use seems quite straightforward.

Perhaps my biggest disappointment in Scala is that they have so many rules to make it possible to write code that looks straightforward, while being anything but -- these rules all seem to have innumerable exceptions, making it hard to know when writing simple code will work and when not. For example, the Scala book explains that you can use multiple 'if' qualifiers in its for-statement, but points out that you have to insert a semicolon in a position where you'd expect it to be inferred by the parser. To make matters even more mysterious, if you use curly braces instead of parentheses, you can omit the semicolon. All this is presented without any explanation. I found the explanation by combining two seemingly unrelated rules in the reference manual: the for-statement (technically called a for-comprehension) allows either braces or parentheses with exactly the same semantics, but the rules for inferring semicolons are different inside braces than they are inside parentheses. Obvious, eh! :-)

Friday, November 14, 2008

Overheard

"All you can do with a shell script is make it worse. But since this is Python, you can make it better."

Monday, November 10, 2008

Yes I am a sore loser

I'm still devastated and baffled by the passing of California Proposition 8, the gay marriage ban.

What is the rationale behind a Yes vote? I cannot believe over half of Californians simply do what their pastor says. Actually, I cannot even believe that half the pastors in California are against gay marriage -- this is not a black and white issue even in religious circles, no matter what some claim the Bible says. I'd like to hear from people who voted Yes and can explain their rationale. (So far the only pro-Prop-8 comments I've seen were slogans that fit on a bumper sticker but do not explain anything.)

And how is it possible that something called a state's constitution can be changed with a single, simple majority vote? In fact I would say it's not even a majority, since a significant percentage of Californians eligible to vote didn't vote. Does this mean that in two years we can remove that amendment from the Californian constitution (or amend it out of existence) with a new referendum that is also approved by 50% plus one vote? That's not much of a constitution if you ask me! (In contrast, to change the Dutch constitution, both chambers of two successive parliaments must approve of the change, with a two-thirds majority.)

Thursday, November 6, 2008

Cisco Developer Contest

This hasn't had enough attention yet: Cisco is inviting application developers who "think outside the box", to innovate and promote the concept of the network as a platform. This is your opportunity to build exciting Linux based applications on the Cisco Application Extension Platform (AXP), and win a share of the total prize pool valued at US $100,000.

Read more at Cisco's contest site.

Monday, November 3, 2008

Bibles or computers -- it's the same thing

This subject was on my list to blog eventually; but this clipping from the weekly OLPC mailing made it relevant:
Antonio Battro presented Pope Benedict XVI with an XO on Friday at the Vatican. The occasion was a papal audience for the Pontifical Academy of Sciences, of which Antonio is a member. They spoke about OLPC's philosophy and objectives in the developing world. Benedict seemed deeply pleased by our work.
I'm not surprised that the pope is pleased by the OLPC program. The mentality from which it springs is the same mentality which in past centuries created the missionary programs. The idea is that we, the west, know what's good for the rest of the world, and that we therefore must push our ideas onto the "third world" by means of the most advanced technology available. In past centuries, that was arguably the printing press, so we sent missionaries armed with stacks of bibles. These days, we have computers, so we send modern missionaries (of our western lifestyle, including consumerism, global warming, and credit default swaps) armed with computers.

It's the same thing, really.

Thursday, October 30, 2008

Vote for Obama, No on California Prop 8

I'm not a US citizen, so I can't vote in US elections (though I pay taxes like the best of them!), but my wife is. We have two signs in our front yard this week. One simply says "Obama", the other "No on Prop 8". I'd like to put up the equivalent of those signs in my blog.

If you're eligible to vote in the US, do me and the rest of the world a favor and vote for Obama. If you're voting in California, please help to reject Proposition 8, which is a blatant curtailing of freedom, equality and happiness for gay Californians. Defeating Prop 8 in California will pave the way for equal rights in other states and eventually all of the US. Don't make me ashamed of living here!

PS. If you think you have no time to vote, think again.

Wednesday, October 29, 2008

What makes me feel good

Whenever I feel down, I look at the TIOBE programming community index, and it makes me feel better. :-)

Monday, October 27, 2008

Questions Answered

I have now answered the top 20 questions in my section of "Ask a Google Engineer". Many of the remaining ones sound inappropriate or unanswerable, so I don't expect I'll be answering them, unless the popular vote really brings some of them to the top.

Sunday, October 26, 2008

Why explicit self has to stay

Bruce Eckel has blogged about a proposal to remove 'self' from the formal parameter list of methods. I'm going to explain why this proposal can't fly.

Bruce's Proposal

Bruce understands that we still need a way to distinguish references to instance variables from references to other variables, so he proposes to make 'self' a keyword instead. Consider a typical class with one method, for example:
class C:
    def meth(self, arg):
        self.val = arg
        return self.val
Under Bruce's proposal this would become:
class C:
    def meth(arg):  # Look ma, no self!
        self.val = arg
        return self.val
That's a saving of 6 characters per method. However, I don't believe Bruce proposes this so that he has to type less. I think he's more concerned about the time wasted by programmers (presumably coming from other languages where the 'self' parameter doesn't need to be specified) who occasionally forget it (even though they know better -- habit is a powerful force). It's true that omitting 'self' from the parameter list tends to lead to more obscure error messages than forgetting to type 'self.' in front of an instance variable or method reference. Perhaps even worse (as Bruce mentions) is the error message you get when the method is declared correctly but the call has the wrong number of arguments, as in this example given by Bruce:
Traceback (most recent call last):
  File "classes.py", line 9, in <module>
    obj.m2(1)
TypeError: m2() takes exactly 3 arguments (2 given)
I agree that this is confusing, but I would rather fix this error message without changing the language.

Why Bruce's Proposal Can't Work

Let me first bring up a few typical arguments that are brought in against Bruce's proposal.

There's a pretty good argument to make that requiring explicit 'self' in the parameter list reinforces the theoretical equivalency between these two ways of calling a method, given that 'foo' is an instance of 'C':
foo.meth(arg) == C.meth(foo, arg)
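This equivalence can be checked directly; here's a minimal sketch using the class from the example above:

```python
class C:
    def meth(self, arg):
        self.val = arg
        return self.val

foo = C()
# Calling the bound method is the same as calling the function
# on the class with the instance passed explicitly as 'self':
assert foo.meth(42) == C.meth(foo, 42)
```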


Another argument for keeping explicit 'self' in the parameter list is the ability to dynamically modify a class by poking a function into it, which creates a corresponding method. For example, we could create a class that is completely equivalent to 'C' above as follows:
# Define an empty class:
class C:
    pass

# Define a global function:
def meth(myself, arg):
    myself.val = arg
    return myself.val

# Poke the method into the class:
C.meth = meth
Note that I renamed the 'self' parameter to 'myself' to emphasize that (syntactically) we're not defining a method here. Now instances of C have a method with one argument named 'meth' that works exactly as before. It even works for instances of C that were created before the method was poked into the class.
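Putting the pieces together (same names as in the example above), this really does work, even for an instance created before the poke:

```python
class C:
    pass

def meth(myself, arg):
    myself.val = arg
    return myself.val

c = C()        # created before the function is poked into the class
C.meth = meth  # now every instance, old or new, has the method
assert c.meth(42) == 42
assert c.val == 42
```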

I suppose that Bruce doesn't particularly care about the former equivalency. I agree that it's more of theoretical importance. The only exception I can think of is the old idiom for calling a super method. However, this idiom is pretty error-prone (exactly due to the requirement to explicitly pass 'self'), and that's why in Python 3000 I'm recommending the use of 'super()' in all cases.
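For illustration, here is the old error-prone idiom next to the Python 3 spelling, using a hypothetical Base/Derived pair:

```python
class Base:
    def greet(self):
        return "base"

class Derived(Base):
    def greet(self):
        # Old idiom: name the superclass and pass 'self' explicitly --
        # easy to get wrong when the class hierarchy changes:
        #   return Base.greet(self) + "/derived"
        # Python 3 spelling, with no explicit class or self:
        return super().greet() + "/derived"

assert Derived().greet() == "base/derived"
```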

Bruce can probably think of a way to make the second equivalency work -- there are some use cases where this is really important. I don't know how much time Bruce spent thinking about how to implement his proposal, but I suppose he is thinking along the lines of automatically adding an extra formal parameter named 'self' to all methods defined directly inside a class (I have to add 'directly' so that functions nested inside methods are exempted from this automatism). This way the first equivalency can still be made to hold.

However, there's one situation that I don't think Bruce can fix without adding some kind of ESP to the compiler: decorators. This I believe is the ultimate downfall of Bruce's proposal.

When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). Without knowing what the decorator does, there's no way to decide whether to endow the method being defined with an implicit 'self' argument or not.
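To illustrate, here is a bare-bones pure-Python stand-in for '@staticmethod' (a sketch using the descriptor protocol; 'my_staticmethod' is a made-up name, not the real built-in):

```python
class my_staticmethod:
    """A minimal pure-Python imitation of the built-in staticmethod."""
    def __init__(self, func):
        self.func = func
    def __get__(self, obj, objtype=None):
        return self.func  # no binding: no implicit first argument

class C:
    @my_staticmethod
    def double(x):  # no 'self' -- the decorator arranged that
        return 2 * x

# Works the same on the class and on an instance:
assert C.double(21) == 42
assert C().double(21) == 42
```

The compiler sees only the decorator's name, not its behavior, so it cannot know that 'double' should not receive an implicit 'self'.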

I reject hacks like special-casing '@classmethod' and '@staticmethod'. I also don't think it would be a good idea to automagically decide whether something is supposed to be a class method, instance method, or static method from inspection of the body alone (as someone proposed in the comments on Bruce's proposal): this makes it harder to tell how it should be called from the 'def' heading alone.

In the comments I saw some pretty extreme proposals to save Bruce's proposal, but generally at the cost of making the rules harder to follow, or requiring deeper changes elsewhere to the language -- making it infinitely harder to accept the proposal as something we could do in Python 3.1. For 3.1, by the way, the rule will be once again that new features are only acceptable if they remain backwards compatible.

The one proposal that has something going for it (and which can trivially be made backwards compatible) is to simply accept
def self.foo(arg): ...

inside a class as syntactic sugar for
def foo(self, arg): ...

I see no reason with this proposal to make 'self' a reserved word or to require that the prefix name be exactly 'self'. It would be easy enough to allow this for class methods as well:
@classmethod
def cls.foo(arg): ...
Now, I'm not saying that I like this better than the status quo. But I like it a lot better than Bruce's proposal or the more extreme proposals brought up in the comments to his blog, and it has the great advantage that it is backward compatible, and can be evolved into a PEP with a reference implementation without too much effort. (I think Bruce would have found out the flaws in his own proposal if he had actually gone through the effort of writing a solid PEP for it or trying to implement it.)

I could go on more, but it's a nice sunny Sunday morning, and I have other plans... :-)

Wednesday, October 22, 2008

Sorting a million 32-bit integers in 2MB of RAM using Python

Someone jokingly asked me how I would sort a million 32-bit integers in 2 megabytes of RAM, using Python. Taking up the challenge, I learned something about buffered I/O.

Obviously this is a joke question -- the data alone would take up 4 megabytes, assuming binary encoding! But there's a possible interpretation: given a file containing a million 32-bit integers, how would you sort them with minimal memory usage? This would have to be some kind of merge sort, where small chunks of the data are sorted in memory and written to a temporary file, and then the temporary files are merged into the eventual output area.

Here is my solution. I'll annotate it in a minute.

NOTE: All my examples use Python 3.0. The main difference in this case is the use of file.buffer to access the binary stream underlying the text stream file.

#!/usr/bin/env python3.0
import sys, array, tempfile, heapq
assert array.array('i').itemsize == 4

def intsfromfile(f):
    while True:
        a = array.array('i')
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x

iters = []
while True:
    a = array.array('i')
    a.fromstring(sys.stdin.buffer.read(40000))
    if not a:
        break
    f = tempfile.TemporaryFile()
    array.array('i', sorted(a)).tofile(f)
    f.seek(0)
    iters.append(intsfromfile(f))

a = array.array('i')
for x in heapq.merge(*iters):
    a.append(x)
    if len(a) >= 1000:
        a.tofile(sys.stdout.buffer)
        del a[:]
if a:
    a.tofile(sys.stdout.buffer)
On my Google desktop (a 3 year old PC running a Googlified Linux, rating about 34000 Python 3.0 pystones) this took about 5.4 seconds to run, with an input file containing exactly 1,000,000 32-bit random integers. That's not so bad, given that a straightforward in-memory sort of the same input took about 2 seconds:
#!/usr/bin/env python3.0
import sys, array
a = array.array('i', sys.stdin.buffer.read())
a = list(a)
a.sort()
a = array.array('i', a)
a.tofile(sys.stdout.buffer)
Back to the merge-sort solution. The first three lines are obvious:
#!/usr/bin/env python3.0
import sys, array, tempfile, heapq
assert array.array('i').itemsize == 4
The first line says we're using Python 3.0. The second line imports the modules we're going to need. The third line here makes it break on those 64-bit systems where the 'i' typecode doesn't represent a 32-bit int; I am making no attempts to write this code portably.

Then we define a little helper that is a generator which reads integers from a file and yields them one at a time:
def intsfromfile(f):
    while True:
        a = array.array('i')
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x
This is where the performance tuning of the algorithm takes place: it reads up to 1000 integers at a time, and yields them one by one. I had originally written this without buffering -- it would just read 4 bytes from the file, convert them to an integer, and yield the result. But that ran about four times slower! Note that we can't use a.fromfile(f, 1000), because the fromfile() method complains bitterly when there aren't enough values in the file, and I want the code to adapt automatically to however many integers are in the file. (It turns out we write about 10,000 integers to a typical temp file.)
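The fromfile() complaint is easy to reproduce in a small sketch (using the struct module to write a file that holds fewer values than we ask for):

```python
import array, struct, tempfile

# Write a temp file holding only three 32-bit integers:
f = tempfile.TemporaryFile()
f.write(struct.pack('3i', 1, 2, 3))
f.seek(0)

a = array.array('i')
try:
    a.fromfile(f, 1000)  # ask for far more than the file holds
except EOFError:
    pass                 # fromfile() complains bitterly...
assert list(a) == [1, 2, 3]  # ...but keeps the values it did read
```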

Next we have the input loop. This repeatedly reads a chunk of 10,000 integers from the input file, sorts them in memory, and writes them to a temporary file. We then add an iterator over that temporary file, using the above intsfromfile() function, to a list of iterators that we'll use in the subsequent merge phase.

iters = []
while True:
    a = array.array('i')
    a.fromstring(sys.stdin.buffer.read(40000))
    if not a:
        break
    f = tempfile.TemporaryFile()
    array.array('i', sorted(a)).tofile(f)
    f.seek(0)
    iters.append(intsfromfile(f))
Note that for an input containing a million values, this creates 100 temporary files each containing 10,000 values.

Finally we merge all these files (each of which is sorted) together. The heapq module has a really nice function for this purpose: heapq.merge(iter1, iter2, ...) returns an iterator that yields the input values in order, assuming each input itself yields its values in order (as is the case here).

a = array.array('i')
for x in heapq.merge(*iters):
    a.append(x)
    if len(a) >= 1000:
        a.tofile(sys.stdout.buffer)
        del a[:]
if a:
    a.tofile(sys.stdout.buffer)
This is another place where buffered I/O turned out to be essential: Writing each individual value to a file as soon as it is available slows the algorithm down about twofold. Thus, by simply adding input and output buffering, we gained a tenfold speed-up!

And that's the main moral of the story.

Another lesson is praise for the heapq module, which contains the iterator merge functionality needed in the output phase. Also, let's not forget the utility of the array module for managing binary data.
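In fact heapq.merge() is easy to demonstrate on its own, with small in-memory runs standing in for the sorted temp files:

```python
import heapq

runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  # each run already sorted
merged = list(heapq.merge(*runs))
assert merged == [1, 2, 3, 4, 5, 6, 7, 8, 9]
```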

And finally, let this example remind you that Python 3.0 is not so different from Python 2.5!

Friday, October 10, 2008

About This Blog

You may wonder why I started a new blog, when I already have a perfectly fine blog on Artima. I am currently wondering too, since somehow my very first entry was flagged as spam by the helpful Blogger spam detection software, which now claims I have to request a manual review to prove I'm not a spammer. I'm guessing that I either used too many long, difficult words referring to schools of philosophy -- I've seen splogs containing long strings of text copied from philosophy textbooks -- or quoted too many nonsense words from Anathem. Anyway, I've requested two reviews so far, but still nada. Fingers crossed.

As for the reason to start a new blog, the Artima blog is meant to be technical stuff about Python -- Artima in general is a site about OO programming, and I wanted to write about other stuff that interests me. I have a slew of topics lined up: other books I've read, more about my views on consciousness (with a lowercase 'c' :-), and a silly idea I call "planet of the nerds": what if Captain Kirk ran into a planet whose entire population was made up of nerds with IQs of 250 and no social skills? Would such a society even survive?

Anyway, this is just a test -- Blogger claims I cannot make new posts. Let's see if that's true!

Sunday, October 5, 2008

Thoughts after reading Neal Stephenson's Anathem

I just finished reading Neal Stephenson's new novel, Anathem. I don't know where to start: it's a fascinating story, if occasionally slow, but that seems unavoidable given the deep ideas Stephenson explores. I'm not an avid Science Fiction reader, though I don't avoid the genre. There's just so much else to read about! (And I'm a slow reader, especially in English, which is not my first language.) But I like Stephenson's work, which I first encountered with Cryptonomicon. I ate up the Baroque Cycle, and when Stephenson showed up at Google a few weeks ago to discuss Anathem, I tuned in from the SF office. Hearing him talk about his ideas made me want to read the book even more -- but I decided to do some "homework" first, and bought Snow Crash in the same Amazon order as Anathem, and read it first.

(Warning: mild spoilers may be ahead.)

Snow Crash is fascinating, but it rings a bit hollow -- in the end I felt like it was mostly a superbly crafted fantastic adventure story. The "deep ideas" in that book felt a bit fake: the technical details of how the Metaverse works make little sense from what I know about virtual reality systems (this was corroborated by a friend who worked in the game industry for 25 years), and the idea of an "ur-language" that is like the low-level software of our brains strikes me as nonsense from what I've read about the functioning of our brains (see note above :-). The idea of a virus or anti-virus that can be fed into our brains by appealing directly to this low-level software of hackers' brains then becomes rather bizarre, and the idea of feeding this same information into non-hackers' brains via the blood is utter nonsense. So then we're left with a great, dark view of a possible future where many phenomena of modern society, from the mafia's code of honor to suburbs full of bored teenagers, have become fantastically exaggerated, and a lot of cool wish-it-could-be technology, from the Metaverse to skateboards and motorcycles with "smart wheels" (why doesn't the Wikipedia article mention those?). I find it hard to believe that there are many women who would enjoy this book: it's the ultimate testosterone fantasy.

Fast-forward to Anathem. Stephenson has gotten a lot better. It's more likely that women might appreciate this book (and not just because in the end the couple "get" each other and clearly will live happily ever after). As long as they're into philosophy and not afraid of long detailed explanations of space technology. Ok, so maybe it's still mostly a guy's thing (please prove me wrong!) but it's a lot less testosterone-laden, and the technology and "deep ideas" presented are a lot more realistic.

I find it interesting that in Snow Crash the (near) future has gotten completely out of hand, while in Anathem a world is depicted that develops more or less cyclically for thousands of years: after the high point of technology (roughly, today's age) things stay more or less the same, society and prosperity going down and up cyclically without much news being added. Stephenson's explanation for this is that most people prefer to deal with technology they can understand and tinker with, like internal combustion engines, rather than the more advanced space-age stuff that most Sci-Fi authors (including Stephenson, see Snow Crash) love to make up. Although nobody seems to object to the ubiquity of cell phones connected to the Internet -- they are apparently too useful (for society, or for the plot) to ban.

I loved the puzzles for the reader, small and large, that Stephenson spreads throughout the book. I recall that in Cryptonomicon he often starts a new chapter with a minute, detailed description of an everyday phenomenon from an unexpected vantage point or using precise scientific language, which often made me chuckle. There is some of that in Anathem, but the main class of puzzles is the introduction of new vocabulary (ostensibly because the world depicted is not Earth) for familiar things. E.g. ark for church, drummon, fetch and mobe for freight truck, pick-up truck and car, syntactic device or syndev for computer, praxis for engineering, Andrakhonic theorem for Pythagorean theorem, and Gardan’s Steelyard for Occam’s Razor. (I found one place where Stephenson apparently forgot to apply global substitution and used the word "computer-generated" instead of syndev-generated.) In general the new words are remarkably easy to remember once you've figured it out -- sometimes it helps to know Latin or French (for example fraa and suur are clearly from frère and soeur).

The only stuff that's hard is the names for the various philosophical schools -- probably because I'm not really all that well-schooled in this. I can recognize the references to Plato, but that's about it. There are probably tons of references to philosophic schools of the past two centuries, but I don't know my Nietzsche from my Kant, and I'm not all that well-versed in interpretations of quantum mechanics. Except I did recognize the idea of consciousness as a quantum computer as originating with Roger Penrose -- I read his book long ago, and decided I simply disagreed. The whole idea of merging together world lines in such a way that the history of one world appears in the present of another sounds like a Sci-Fi device of the same nature as travel faster than light: useful as a plot device, but not realistic.

Yes, of course it's fiction! But Stephenson really does seem to try to make everything else realistic -- the ideas he presents have roots in real science, and so does much of the technology, from the "long now" clock to the electrodynamic tether propulsion device (both mentioned in the acknowledgements). So I guess at this point we (Stephenson and I) simply disagree on the correct interpretation of quantum mechanics, which is not something to be particularly worried about, as it's all highly speculative in the real scientific world too.

When we're talking about the nature of consciousness, I'm definitely in the camp of Douglas Hofstadter, who makes (what I think is) an eloquent point that it's just a bunch of extremely sophisticated software running on highly developed hardware, both evolved over a very long time, starting with the earliest animals that were capable of making decisions based on sensory input. I recently bought and read Hofstadter's I am a Strange Loop and loved most of it, although I had to skip some of the sentimental stuff; it made me pull out Gödel, Escher, Bach and The Mind's I (both of which I'd read when I was about half my current age) and re-read much of them, enjoying them just as much as the first time (and probably understanding a lot more).

My only quibble with Hofstadter is his insistence on the importance of self-referentiality of consciousness. While I appreciate the relevance of introspection, and love the idea that the algorithms for searching our (various kinds of) memory are derived from the pattern matching algorithms originating in our sensory processing (which clearly was a precursor of consciousness), I don't believe that in order to be called "intelligent" a piece of software would necessarily have to have a concept of "I" that resembles that of a person. (And no, I don't think we're anywhere near developing such intelligent software yet -- we don't know enough about the most basic stuff like perception. For that matter, I think that the whole "Singularity" idea promoted by Ray Kurzweil is bulshytt, as the inhabitants of Stephenson's world would say.)

But I do agree with Hofstadter that there is no a priori reason why sufficiently complex (presumably layered) software couldn't be developed that has representations of real-world concepts that can be queried and updated and connected in ways very similar to the way this apparently happens in our brains. Personally, I think that one of the keys to the development of such software is a better understanding of our perceptual mechanisms. Which is a long-winded segue into the work of Oliver Sacks, whose works I've also been reading recently. (I am about to dive into Musicophilia, his latest work.) Sacks writes yarns that are as accessible as Stephenson's (or more so), and almost as exciting (for me anyway), with the added value that they are science, not fiction.

I wish I could work in a reference to Richard Dawkins, whose ideas on evolution also seem highly relevant. (Even though to a 3rd-generation atheist like myself his rant against organized religion seems a little over the top.) But I'll save that for later.