Sunday, October 26, 2008

Why explicit self has to stay

Bruce Eckel has blogged about a proposal to remove 'self' from the formal parameter list of methods. I'm going to explain why this proposal can't fly.

Bruce's Proposal

Bruce understands that we still need a way to distinguish references to instance variables from references to other variables, so he proposes to make 'self' a keyword instead. Consider a typical class with one method, for example:
class C:
    def meth(self, arg):
        self.val = arg
        return self.val
Under Bruce's proposal this would become:
class C:
    def meth(arg): # Look ma, no self!
        self.val = arg
        return self.val
That's a saving of 6 characters per method. However, I don't believe Bruce proposes this so that he has to type less. I think he's more concerned about the time wasted by programmers (presumably coming from other languages where the 'self' parameter doesn't need to be specified) who occasionally forget it (even though they know better -- habit is a powerful force). It's true that omitting 'self' from the parameter list tends to lead to more obscure error messages than forgetting to type 'self.' in front of an instance variable or method reference. Perhaps even worse (as Bruce mentions) is the error message you get when the method is declared correctly but the call has the wrong number of arguments, like in this example given by Bruce:
Traceback (most recent call last):
  File "classes.py", line 9, in <module>
    obj.m2(1)
TypeError: m2() takes exactly 3 arguments (2 given)
I agree that this is confusing, but I would rather fix this error message without changing the language.
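
To make the two mistakes concrete, here is a minimal sketch. The class, method, and helper names are made up for illustration, and the exact wording of the messages differs between Python versions, so none is shown here:

```python
class Demo:
    def no_self(arg):          # 'self' forgotten in the parameter list
        return arg

    def m2(self, a, b):        # declared correctly
        return a + b


def error_message(call):
    """Run call() and return the TypeError message it raises, or None."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None


obj = Demo()

# The instance is passed implicitly, so no_self() receives two arguments.
print(error_message(lambda: obj.no_self(1)))

# Correct declaration, but one argument is missing from the call.
print(error_message(lambda: obj.m2(1)))
```

In both cases the count in the message includes the implicitly passed instance, which is what tends to confuse people.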

Why Bruce's Proposal Can't Work

Let me first bring up a few arguments that are typically raised against Bruce's proposal.

There's a pretty good argument to make that requiring explicit 'self' in the parameter list reinforces the theoretical equivalency between these two ways of calling a method, given that 'foo' is an instance of 'C':
foo.meth(arg) == C.meth(foo, arg)
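As a small runnable sketch of this equivalence (reusing the class 'C' from above; the variable names are only for illustration):

```python
class C:
    def meth(self, arg):
        self.val = arg
        return self.val


foo = C()

# Calling through the instance passes 'foo' as 'self' implicitly...
via_instance = foo.meth(42)

# ...while calling through the class passes it explicitly.
via_class = C.meth(foo, 42)

assert via_instance == via_class == 42
```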


Another argument for keeping explicit 'self' in the parameter list is the ability to dynamically modify a class by poking a function into it, which creates a corresponding method. For example, we could create a class that is completely equivalent to 'C' above as follows:
# Define an empty class:
class C:
    pass

# Define a global function:
def meth(myself, arg):
    myself.val = arg
    return myself.val

# Poke the method into the class:
C.meth = meth
Note that I renamed the 'self' parameter to 'myself' to emphasize that (syntactically) we're not defining a method here. Now instances of C have a method named 'meth' that takes one argument and works exactly as before. It even works for instances of C that were created before the method was poked into the class.
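
A short sketch of that last point, assuming nothing beyond the class and function already shown (the names 'early' and 'late' are made up for illustration):

```python
class C:
    pass


# Create an instance *before* the class has any methods.
early = C()


def meth(myself, arg):
    myself.val = arg
    return myself.val


# Poke the function into the class; it becomes a method of every instance.
C.meth = meth

late = C()
assert early.meth(1) == 1   # works for the pre-existing instance
assert late.meth(2) == 2    # and for instances created afterwards
```

This works because the method is looked up on the class at call time, not copied into each instance when it is created.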

I suppose that Bruce doesn't particularly care about the former equivalency. I agree that it's more of theoretical importance. The only exception I can think of is the old idiom for calling a super method. However, this idiom is pretty error-prone (exactly due to the requirement to explicitly pass 'self'), and that's why in Python 3000 I'm recommending the use of 'super()' in all cases.
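
For readers who have not seen the two idioms side by side, here is a minimal sketch using Python 3's zero-argument 'super()'. The class names are hypothetical:

```python
class Base:
    def __init__(self, name):
        self.name = name


class OldStyleCall(Base):
    def __init__(self, name):
        # Old idiom: name the base class and pass 'self' by hand.
        Base.__init__(self, name)


class NewStyleCall(Base):
    def __init__(self, name):
        # Python 3 idiom: super() finds the next class and binds 'self' itself.
        super().__init__(name)


assert OldStyleCall("a").name == "a"
assert NewStyleCall("b").name == "b"
```

The old idiom goes wrong quietly if 'self' is left out or the wrong base class is named, which is the error-proneness mentioned above.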

Bruce can probably think of a way to make the second equivalency work -- there are some use cases where this is really important. I don't know how much time Bruce spent thinking about how to implement his proposal, but I suppose he is thinking along the lines of automatically adding an extra formal parameter named 'self' to all methods defined directly inside a class (I have to add 'directly' so that functions nested inside methods are exempted from this automatism). This way the first equivalency can be made to hold still.

However, there's one situation that I don't think Bruce can fix without adding some kind of ESP to the compiler: decorators. This I believe is the ultimate downfall of Bruce's proposal.

When a method definition is decorated, we don't know whether to automatically give it a 'self' parameter or not: the decorator could turn the function into a static method (which has no 'self'), or a class method (which has a funny kind of self that refers to a class instead of an instance), or it could do something completely different (it's trivial to write a decorator that implements '@classmethod' or '@staticmethod' in pure Python). There's no way, without knowing what the decorator does, to tell whether to endow the method being defined with an implicit 'self' argument.
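
As an illustration of the parenthetical remark, here is one possible pure-Python sketch built on the descriptor protocol. The names 'my_staticmethod' and 'my_classmethod' are made up, and this is not how the built-ins are actually implemented:

```python
class my_staticmethod:
    """Pure-Python stand-in for @staticmethod (illustrative only)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # No binding at all: hand back the plain function.
        return self.func


class my_classmethod:
    """Pure-Python stand-in for @classmethod (illustrative only)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)

        def bound(*args, **kwargs):
            # Bind the class, not the instance, as the first argument.
            return self.func(objtype, *args, **kwargs)

        return bound


class C:
    @my_staticmethod
    def add(a, b):            # no 'self' at all
        return a + b

    @my_classmethod
    def name(cls):            # first argument is the class
        return cls.__name__

    def meth(self, arg):      # ordinary method: first argument is the instance
        return arg
```

Nothing in the three 'def' lines tells the compiler which of them should receive an implicit 'self'; that is decided only when the decorator object runs.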

I reject hacks like special-casing '@classmethod' and '@staticmethod'. I also don't think it would be a good idea to automagically decide whether something is supposed to be a class method, instance method, or static method from inspection of the body alone (as someone proposed in the comments on Bruce's proposal): this makes it harder to tell how it should be called from the 'def' heading alone.

In the comments I saw some pretty extreme proposals to save Bruce's proposal, but generally at the cost of making the rules harder to follow, or requiring deeper changes elsewhere to the language -- making it infinitely harder to accept the proposal as something we could do in Python 3.1. For 3.1, by the way, the rule will be once again that new features are only acceptable if they remain backwards compatible.

The one proposal that has something going for it (and which can trivially be made backwards compatible) is to simply accept
def self.foo(arg): ...

inside a class as syntactic sugar for
def foo(self, arg): ...

I see no reason with this proposal to make 'self' a reserved word or to require that the prefix name be exactly 'self'. It would be easy enough to allow this for class methods as well:
@classmethod
def cls.foo(arg): ...
Now, I'm not saying that I like this better than the status quo. But I like it a lot better than Bruce's proposal or the more extreme proposals brought up in the comments to his blog, and it has the great advantage that it is backward compatible, and can be evolved into a PEP with a reference implementation without too much effort. (I think Bruce would have found out the flaws in his own proposal if he had actually gone through the effort of writing a solid PEP for it or trying to implement it.)

I could go on more, but it's a nice sunny Sunday morning, and I have other plans... :-)

36 comments:

Matt Wilson said...

I like the explicit "self".

Anyhow, right now, the functools.partial takes a perfectly good regular method on a class and then turns it into a staticmethod.

Is this unavoidable? Is this a bug? Is there some way I can manually undo this effect, and add self back into my function?

Stuart Langridge said...

One thing that might be useful is to throw out a warning if a method is defined and the first parameter to it isn't called self. I know that "self" is only a convention, but *everyone* does it. The one or two people who decide to maliciously call their "self" parameter "this" can just block the warning, no? And warnings aren't fatal anyway. It'd annoy about 9 people and it'd save me about fifty times a day when I forget to put self at the beginning of a method's parameter list...

Georg said...

@stuart: Since people are free to call the argument whatever they like, Python would be enforcing a convention, and it has never done that. That's what tools like pylint are for. Pylint produces an error message 'Method should have "self" as first argument' for this case in its default configuration.

cratuki said...

I think it would be detrimental to have a fixed 'self' for methods (or to encourage that name through warnings) because it hampers nesting of classes within other class definitions. I had trouble thinking of an example of where you'd want to do this that isn't evil and may have failed:

class Example(object):
    def blah_method(self):
        self.counter=0;
        class Registers(object):
            def __init__(self_i):
                self.counter = self.counter+1
                self_i.id = self.counter
                self.another()
        Registers()
        Registers()
    def another(self):
        print 'another'

shevegen said...

Personally I do not especially appreciate "self". My bigger complaint though is that it can be an arbitrary name, and at least one project (conary) uses "r" instead of "self". I assume because it is shorter to type. And this is my main complaint - why is it possible to use any name one wants to for it? Of course most will use self, but Python enforces a rather strict non-ambiguity "there should be one obvious and easy way" ruleset, and I believe if in this case it is open for a change, in other cases it could be considered to change as well. To me there is no real big conceptual difference between a parser interpreting something as an error, as opposed to a "convention" which we could change at our own discretion - but let's face it, in the case for self, I claim that about 95% of every python writer will call it "self".

(Note though - I don't really complain per se, it will be much more interesting once most people will be using "Python 3000" or whatever name it gets to have.)

codditor said...

"The one proposal that has something going for it (and which can trivially be made backwards compatible) is to simply accept
def self.foo(arg): ..."

+1.

edcrypt said...

The "def self.foo(arg):" syntax reminds me of the method definition syntax of Prothon (a dead Python-like prototyped language). By the way, someone could resurrect this idea. A language with Python's syntax and Io (iolanguage.com) prototyped semantics would be interesting to see.

tef said...

Being able to reason about the lifetime of self is why I like it being in the argument list.

In other languages, this or self is late bound within the language, and so you now have two rules - one for self and one for arguments.

I found that in javascript, it was awkward and clunky to use inner classes, or refer to the class within closures.

On the other hand, foo.bar(a,b,c) being syntactically equivalent to bar(foo,a,b,c) in both function calls *and* function definitions does seem like a neat proposal.

Daniel said...

"@classmethod
def cls.foo(arg): ..."

Wouldn't the decorator in this case be redundant with the cls in cls.foo?

@Georg
True, but on the other hand, Python should have some kind of optional 'teacher' mode, in which it would emit such warnings and extra-detailed exception texts.

Terra said...

Personally I like self. At first when moving from java I didn't like it as it doesn't make any sense from a java perspective, but now that I've gotten used to it I like it.

I think that such stylistic measures once taken in a language should be stuck to as they define the language and the new features.

For example java has its style of explicitly defining things in order to prevent people from doing bad things. If you don't want that you shouldn't use java.

Likewise with python it to me is defined around a simple parser so I expect things to happen simply without any complex or especially clever logic. To remove self might be clever, but it would also be confusing as the simple philosophy becomes clouded with things that are not really necessary.

Leonardo Santagada said...

We don't need any more warning messages, we need to fix the one we have.

"
TypeError: m2() takes exactly 3 arguments (2 given)
"

could be:
"
TypeError: m2() takes exactly 3 arguments (2 given)
maybe you forgot the self argument
"

Okay, it is not the best message, but something like it should be good enough

PouleJapon said...

By the way,
map(Buddy.name, buddies)
looks slightly better than :
map(lambda buddy: buddy.name(), buddies)

wavesplash said...

Quoting from reddit:

redditrasberry:
I guess it's an aesthetic thing and therefore hard to reason about (I completely agree that there's no gigantic practical import to it).
I think this is a case where once you use the language enough you no longer notice the blemish, but if you are an occasional user (as I am) it stands out like a sore thumb.
To make a bad analogy, it's like a stain on your carpet in your front hallway. If you live with it for long enough you won't even see it any more. But to visitors coming to your house it's the most obvious thing. And it's particularly noticeable because the rest of Python is so nice - it's like I'm visiting an art gallery and everything is beautiful and pristine, but there on the carpet at the entrance is this huge stain that nobody has ever cleaned up.


http://www.reddit.com/r/programming/comments/79h9y/guido_van_rossum_responds_to_bruce_eckel_why/

Michele Cella said...

I really like the explicit "self"... to the point that I hate the implicit "super".

monk.e.boy said...

I like self, the 'wrong number of arguments' error message to me now parses as 'you missed self again'

I like that it forces you to see objects and methods as non magical beasts.

Would a good editor that auto completes methods solve this problem? Hack IDLE to do this? :-)

BTW why is it 'self' and not 'this'?

M-MZ said...

The explicit self is wonderful. Instead of wondering why you have to type this in Python, I've always wondered why you don't have to in other languages. It takes away the implicit "this" magic. Self makes perfect sense.

porneL said...

Method poking is a non-issue: just require explicit self for functions defined outside a class.

I'd really like explicit self in class definitions to be removed. It makes OO in Python look like an ugly afterthought/hack.

ondavian said...

After having programmed in Python for quite a while, I actually miss the explicitness of self in other languages.

It really has strong "say what you mean" semantics -- taking a superficial glance at the code, you can at once see whether instance or non-instance variables are accessed. Compare this with using markup (prefixing _ to member vars) to relay the meaning -- you have to know the conventions to grasp the difference, i.e. there is an extra level in comprehension (aha, _foo denotes this.foo). It's not uncommon to see people use explicit this in Java and C++ as well for that particular reason.

I'd say most people who grok Python at the "idiomatic" level instead of using it as an easier way to write Java would be seriously upset if self is abandoned.

Hopefully nothing like that will ever happen.

Ralph said...

I like the explicit self, and don't think `self' should be mandated, e.g. `my' avoids some clutter. BTW, it seems the code that's colourising your Python is treating `self' as a reserved word! Did Bruce write it? :-)

opensores said...

Personally, I like explicit self. The general idea I get reading comments about it here and elsewhere is that people that are new to python generally don't like it and people that are used to python generally do.

That probably means something.

Kevin Dangoor said...

I like explicit self, too. Just thought I'd point out that Python is flexible enough that if you don't like explicit self, you can just use the Selfless metaclass:

http://www.voidspace.org.uk/python/articles/metaclasses.shtml

jim said...

I always loved the explicit nature of self, yet I remember preferring a "def self.foo(arg)" syntax in the beginning.

I think what disturbs people the most is the optical 'verbosity' it causes. You have your normal arguments in one place, and you don't want this special argument together with them (e.g. when counting your arguments, it doesn't feel nice).

But now, I simply add two spaces after my self's comma, just to visibly unbind it a little from the other args, and I think it's the best solution overall.

Just my 2 cents of today's worthless dollar on the topic.

kevinle said...

Shouldn't the problem of forgetting typing self be easily addressed by the IDEs' Intellisense?

D said...

My apologies if this has already been stated, but chromakode on Reddit.com had some very good comments that I think nicely summarize how explicit self helps you out.

Here is the entire comment (minus a couple of sentences that pertained to a reply to an earlier comment)

"""Python's use of 'self' as an explicit argument is a slight syntactic trick that extremely cleverly glues together the bound/unbound programming experience. Having programmed in many OOP languages where 'this'/'self' are implicit, I have to say that I greatly prefer Python's way of doing it. It answers the following questions very elegantly:

How did self get defined locally in my method?

Explicit: you specified it as an argument, either via instance.meth(args) or class.method(instance, args).

Implicit: I put it there for you automatically because you called a class instance method.

So how do I specify 'self' myself?

Explicit: you pass it as an argument.

Implicit: you use a language construct such as method.apply(instance, args).

How do I pass around bound methods (or more general: closures) for callbacks?

Explicit: Evaluating instance.method results in a bound method that calls method(instance, args). Notice how this syntax applies to normal invocations too... ( instance.method(args) <=> (instance.method)(args) ) == method(instance, args)

Implicit: You'll typically have to store 'self' on your own in a closure and use the language construct from above to call the method using it."""

Andrew said...

I think it makes sense to have an explicit self when grafting random methods onto classes since they presumably are not in the lexical scope of the class. When they are contained in a class block I don't see why you would need to define self.

It seems like self should always be bound to the instance of the surrounding class the method is defined in. So a global def has no implicit self but a method does.

To keep things from getting lame like javascript, closures would include self. The only problem is nested classes: how do you refer to an outer self?

def self.meth() seems unnecessary.

freed said...

I like the explicit 'self' in the body. Had a little concern of it in the argument list.

It seems to me Guido may consider the following syntax:

def self.foo(arg): ...
def cls.foo(arg): ...

I want to say that: I LOVE this syntactic sugar and hope it could be considered to be implemented sooner.

jcd said...

I'd propose splitting the error message into two distinct error messages:

TypeError: m2() takes exactly 3 arguments (2 given)

Could become:

TypeError: unbound method m2() takes exactly 3 arguments (2 given)

and

TypeError: bound method m2() takes exactly 2 arguments (1 given)

(WV: ainsheti: "My proposal isn't great, but it ainsheti either.")

greg said...

I'd rather not have 'def self.meth(args)' because it would close off the possibility of generalizing the def statement to allow an arbitrary lvalue in place of the name. This would be useful for things like setting up a dictionary of functions:

funcs = {}

def funcs['a'](args):
    ...

def funcs['b'](args):
    ...

wickedbeast said...

Excellent post.

Love the BDFL's responsiveness.

I think the error message as it is...

TypeError: m2() takes exactly 3 arguments (2 given)

...says what it needs to say: any thinking person would go find out what arguments s/he missed and correct.

sit1way said...

Coming from the PHP "language", we have far worse problems than the explicit self required in Python.

Compared to Groovy, and especially Ruby, the so-called elegant language, Python is the absolute natural choice for a sick-of-PHP developer.

Nonetheless, this one small issue will drive me nuts, an entire application overflowing with an apparently optional self method param -- uggghhh, if only, please, we the numberless want-to-convert-to-python beings beg of you, get rid of the explicit self method param or I'll...learn Ruby, even @ and @@ is better than explicit self.

Seriously, everyone is still using Python 2, just sneak out the explicit self in 3.2 stable and nobody will notice.

Alright, alright, money, money talks, how much, in Euros, will it take to get explicit self removed??

ArneBab said...

I kinda like the proposal of

def a.b(*args)

automatically turning into

def b(a, *args)

This would make the dot-syntax more general and I would have a symmetry between calling methods and creating them.

Note however, that this would mean, that people would expect things like the following to work, too:

def a.b.c()

which would be equal to

def c(a, b)

and this might create some unclear situations, when calling a function from a module:

import a.b
a.b.c()

which would not be equal to

from a.b import c
c()

a.b.c() == c(a,b) != c()

(since in the first case, the function would get the package and module as arguments and in the second it would not)

And I feel that making this consistent, too, would require quite some changes in Python.

So I would prefer to have an improved Error message:

If a method I call misses one parameter and the defined method does not have self as first parameter, tell me:

“TypeError: b() takes n arguments (n+1 given). Did you add self as first argument?”

Example:

>>> class A:
...  def b():
...   pass
>>> a = A()
>>> a.b()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: b() takes no arguments (1 given). Methods need self as first argument.

Rich said...

In my opinion, this aspect of python really just encourages me to write procedural code if/when I use python (which may or may not be a good thing).
Why write
class Foo:
    def bar(self,arg):
        *code*

when I can just write
def foo_bar(foo,arg):
    *code*

Cédric Baudry said...

self is useless.

the arguments presented here are irrelevant.

Unknown said...

Coming from using Java for the past year or two, I've found that the only reason I would have needed to use self-references is when I wrote redundant variable names. I've literally never required the use of such a thing in Java because I don't reuse variable names in the same scope.

The larger issue I have with people getting snooty about stuff like this though, is that we're talking about an interpreted language which abstracts most of the low-level stuff away from the programmer anyway. It seems to me that being anal-retentive about a self-reference is splitting hairs somewhat. Also, Python doesn't support overloading, so let's step back for a moment:

A: Python doesn't support overloading, so you can't pull a Java and call self(args) from an overloaded constructor.

B: Using self allows you to re-use variable names, which -- and this is my opinion of course -- I think is a bad practice to begin with.

I like python, but I personally chalk up explicitly passed "self" references on the "weird idiosyncratic stuff programmers think is a good idea" board where Python is concerned -- right next to the "why the hell did Sun think it was a good idea to make everyone write System.out.print() for every console print statement" entry.

michael lovett said...

I don't like explicit self. I've never seen a case where it was to my advantage to specify "self", but on the contrary have had several instances where it was a pain when I forgot it.

I also don't buy "explicit is better than implicit". There are plenty of things the language does implicitly that we don't complain about and would be horrified if we had to be explicit. Some cases in point:

You can declare a local variable right now without a keyword. You don't have to say local foo = "bar", the language infers that you want a local variable from scope.

Along these same lines, why can't the language infer "self" as well?

Corn8Bit said...

@Michael lovett

I absolutely agree, and I'm not convinced at all that a decorator couldn't infer when self should be passed or not, which is Guido's claim.

My vote is for an implicit self parameter, and self being a keyword.