Stack Overflow: "Least Astonishment" in Python: The Mutable Default Argument
[+804] [21] Stefano Borini
[2009-07-15 18:00:37]
[ python language-design least-astonishment ]
[ http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument ]

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
[5, 5, 5, 5, 5]

A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

Edit:

Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:

>>> def a():
...     print "a executed"
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print x
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.

The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
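
A minimal sketch of what "evaluated when that line is executed" implies, written in Python 3 syntax. The name make_foo is hypothetical and introduced only for illustration: each time the def statement runs, a fresh default list is created and attached to the new function object.

```python
def make_foo():
    # The def statement below is executed on every call to make_foo(),
    # so the default expression [] is evaluated anew each time.
    def foo(a=[]):
        a.append(5)
        return a
    return foo

first = make_foo()
second = make_foo()

first()
first()
print(first())   # [5, 5, 5]: three calls share first's default list
print(second())  # [5]: second has its own, separate default list
```

So the default is shared between calls of one function object, not between all functions created from the same source line.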

Guido is a fantastic designer.

Edit

I reread all the very interesting and good answers you provided, and it was hard to assign a "correct tickmark", as everyone had good points in the answer. I marked Roberto's answer as correct because it was simpler and revealing, so that newcomers browsing this question can start from his answer and then delve into the remaining, more complex (but very insightful) answers.

(28) The real issue is the scope of the variable. No answer has yet discussed this, or addressed why a parameter to a function would have its default parameters globally scoped. It's certainly contrary to all other languages I've worked with. - Kieveli
(56) @Kieveli: It's not globally scoped. It's tied to the function object itself. When the function is instantiated (when the declaration is executed), the default parameter expression is evaluated (in the example above, a list literal), and the resulting object is bound to the created function object. If you place that function definition inside of another function and return it from the outer function, you will obtain a new function object each time the outer function is called, and each one will have its own default a list. They are not globally shared. - Miles
(2) Ahm, but for a global function, they are globally scoped, which is what he meant. Just as for a function defined in the function, the scope is the first function. So you are both right. @Kieveli: Well, it's a natural effect of the fact that Python is interpreted, dynamic and uses references for everything. What other languages have you used that fits that bill? Can't be that many. :-) - Lennart Regebro
(3) I may never get tired of posting this link to explain the difference between variable and names in python, tho I just learned about it: python.net/~goodger/projects/pycon/2007/idiomatic/… - This is for your first edit. Note how a from the first example and x from the second are just naming the same (mutable) object. - Cawas
How could the default values to a function parameter use any scope other than global or, in the case of class methods, class? Doesn't the scope of the default value have to be at least as broad as the function itself? If they had function scope, they wouldn't be initialized the first time the function was called, so they wouldn't be very useful as default values. - intuited
(2) Complementary question - Good uses for mutable default arguments - Jonathan
Didn't Guido once say that he meant "least surprising /to me/"? :) - fluffels
"who really used static variables in C, without breeding bugs ?" Static variables in C are a great way to create singleton modules; granted, they need to be used with care, but they do have their uses. - Woodrow Douglass
This reminds me of the famous similar gotcha in Lisp: "don't you modify a constant list, or you gonna have bad time!!" - Le Curious
And by the way, a = []; b = (a, a, a); a.append(1) (^_~) - Izkata
If you DO in fact want to create an empty list on every invocation if the parameter is omitted, couldn't you simply do: def foo(a=None): a = a or [] This is very common in JavaScript, and I'm wondering why I haven't seen this in Python since these conditionals are very common. - John Syrinek
static variables can be very useful in lambdas. - ValekHalfHeart
Given that only a handful of obscure use cases of thus behaving mutable default arguments have been discovered and documented in the complementary question, thanks @Jonathan, it seems that allowing them is a design flaw after all (from a usability perspective, in particular a noobtrap that sometimes bites even advanced noobs). - Evgeni Sergeev
@EvgeniSergeev - agreed. Also, there are more explicit ways to answer these Good uses for mutable function argument default values - Jonathan
@JohnSyrinek That is very common in Python, too; it's the standard approach for when you want a new mutable object on each call as the default value for an argument. - Carl Meyer
The "hybrid binding" argument doesn't really hold up. We could easily resolve the binding of a just before calling it. The resulting semantics would be identical to the ubiquitous def f(x=None): if x is None: x = mutable() pattern. - bukzor
[+633] [2009-07-17 21:29:39] Roberto Liffredo [ACCEPTED]

Actually, this is not a design flaw, and it is not because of internals, or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

As soon as you think of it this way, it makes complete sense: a function is an object evaluated at its definition; default parameters are a kind of "member data", and therefore their state may change from one call to the next, exactly as in any other object.
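
The "member data" view can be checked directly. The following is a small sketch in Python 3 syntax, where the evaluated defaults are stored on the function object in its `__defaults__` attribute:

```python
def foo(a=[]):
    a.append(5)
    return a

print(foo.__defaults__)  # ([],)
foo()
foo()
print(foo.__defaults__)  # ([5, 5],)

# The list returned by a default call is the very object stored on the function.
assert foo() is foo.__defaults__[0]
```

Nothing here is hidden interpreter state: the list is an ordinary attribute of an ordinary object, which is why it keeps its contents between calls.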

In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python [1].
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.

[1] http://effbot.org/zone/default-values.htm

What do you mean by "first-class objects", Roberto? Is this it, the "programmer's note" right above the "class definitions"? docs.python.org/reference/compound_stmts.html#class-definitions - Cawas
I mean that they are "objects", not much different from what you create by instantiating a class. - Roberto Liffredo
(49) Good answer, but I still think that it is a design flaw - Casebash
(17) To anyone reading the above answer, I strongly recommend you take the time to read through the linked Effbot article. As well as all the other useful info, the part on how this language feature can be used for result caching/memoisation is very handy to know! - Cam Jackson
(9) Even if it's a first-class object, one might still envision a design where the code for each default value is stored along with the object and re-evaluated each time the function is called. I'm not saying that would be better, just that functions being first-class objects does not fully preclude it. - gerrit
(1) @CamJackson - As a guy taking a college course in recursive algos, I'm curious how much these interesting memoization approaches in the effbot article are holding true today (2013) and in the light of Python 3. Thoughts? - DeaconDesperado
@DeaconDesperado I'm not exactly sure what you're referring to... Is there something in particular in Python 3 that makes memoization-via-mutable-default-argument unnecessary now? (My main projects have required libraries that only run on Python 2, so I haven't done a lot with Python 3.) - Cam Jackson
(1) I'm surprised that no one has mentioned jeffknupp.com/blog/2013/02/14/… which gives you a clearer understanding of the execution model and why a would "resurface" instead of being recreated every call? - Torxed
I would argue that this is a language design flaw for one very simple reason: A function definition should be internally state-less between invocations. I say "internally stateless", because state can externally be changed if data is passed by reference or as a side-effect of calling other state-changing methods. But with all external state removed, subsequent calls of a function should not effect the future behavior of said function. Violating this becomes a liability that undermines code modularization (in-terms of both separation of concerns and functional programming). - Ryan Delucchi
(12) Sorry, but anything considered "The biggest WTF in Python" is most definitely a design flaw. This is a source of bugs for everyone at some point, because no one expects that behavior at first - which means it should not have been designed that way to begin with. I don't care what hoops they had to jump through, they should have designed Python so that default arguments are non-static. - BlueRaja - Danny Pflughoeft
This is not a design flaw. It is a design decision; perhaps a bad one, but not an accident. The state thing is just like any other closure: a closure is not a function, and a function with mutable default argument is not a function. x = [ [1,2,3] ] * 5 is also a serious source of bugs, but not a design flaw. - Elazar
@RyanDelucchi Your comment very nearly amounts to 'closures that allow assignment to closure variables, or closures that can involve mutable objects, are inherently bad'. You'll hate JavaScript. Things I find awesome, like the ability to construct generators with closures like var numbersIterator = function () {var i=0; return function () {return i++;}}(), will probably give you nightmares. - Mark Amery
(7) Whether or not it's a design flaw, your answer seems to imply that this behaviour is somehow necessary, natural and obvious given that functions are first-class objects, and that simply isn't the case. Python has closures. If you replace the default argument with an assignment on the first line of the function, it evaluates the expression each call (potentially using names declared in an enclosing scope). There is no reason at all that it wouldn't be possible or reasonable to have default arguments evaluated each time the function is called in exactly the same way. - Mark Amery
(1) The design doesn't directly follow from functions are objects. In your paradigm, the proposal would be to implement functions' default values as properties rather than attributes. - bukzor
@MarkAmery the difference is, perhaps, that in Javascript you can write normal functions too. You're unlikely to write a generator by accident. Python has the more specialised and advanced behaviour enabled by default, violating the principle of least surprise. - Robert Grant
@MarkAmery Having said that, I totally agree that the code that evaluates default arguments doesn't have to go look and see if there's already a member var with that name and use that instead of what the programmer specified the default argument to be. I think when they say default argument, they mean variable initialisation on object creation. - Robert Grant
@BlueRaja, I completely disagree. What if I want to create functions on the fly, with dynamically-generated default arguments, e.g. for i in range(10): { def newfunc(v=i): { pass } do_something_with_newfunc }? (Yes, this is something I have actually used.) When should the value of i be evaluated in your interpretation? - Dan Lenski
If you called foo() several times and then called it with foo([]), given your explanation, I would expect that it would "reset" the function. But it doesn't. On the next call of foo(), you still see the long list of 5's. - Saish
Why would you expect foo([]) to "reset" the function (defined as def foo(bar=[]):)? When you call it with the default argument, the bar name always refers to a specific list object. Calling it with a non-default argument does not change the list object referred to in the default case. - Dan Lenski
@DanLenski That is a horrible idea, and you should use closures. - Asad
Asad, certainly closures are a good alternative for the case I mentioned. But they wouldn't change the situation with regards to mutable defaults at all... - Dan Lenski
@DanLenski, it's nice for you that you can do that, but anything that allows this to the detriment of the normal use cases is a design flaw. Yes, it was intentional, but it was a (rare!) bad design decision. - alexis
@alexis, I haven't seen anyone propose a coherent model of objects and parameter declarations that would fix this particular case without a bunch of other obviously undesirable effects. Which is why I disagree that this behavior can be called a "bad design decision"; it's not simply an isolated feature but rather a consequence of many other desirable and useful features of Python's object and function model. - Dan Lenski
Right, I agree that it's a big-picture question. I've read the discussion in the python list, and it's clear that they were pushed to this model (and to keeping it in python 3) for complex reasons. I just don't find that convenience in generating functions in a loop is a decisive argument. - alexis
1
[+77] [2009-07-15 18:11:26] Eli Courtwright

Suppose you have the following code

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, it will be equal to the tuple ("apples", "bananas", "loganberries").

However, suppose later on in the code, I do something like

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

then if default parameters were bound at function execution rather than function declaration then I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
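
A sketch of this scenario in Python 3 syntax. The body of eat is a hypothetical stand-in (the original elides it) that simply returns its argument so the default can be observed:

```python
fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    # Hypothetical body: return the argument so the default is visible.
    return food

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")   # rebinds the global name

some_random_function()

print(fruits)  # ('blueberries', 'mangos')
print(eat())   # ('apples', 'bananas', 'loganberries')
```

With definition-time binding, the default keeps referring to the original tuple even though the global name now points elsewhere.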

The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map. (This is actually why Python doesn't allow its mutable builtin data types to be used as dictionary keys.)
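
The parenthetical about dictionary keys can be checked with a short sketch (Python 3 syntax): a list refuses to be hashed, while an immutable tuple with the same contents works as a key and is unaffected by later changes to the list.

```python
counts = {}
key = ["Hello", "World"]

try:
    counts[key] = 5            # lists are mutable, hence unhashable
except TypeError as exc:
    error = str(exc)           # exact wording varies by Python version
else:
    error = None

counts[tuple(key)] = 5         # an immutable snapshot works as a key
key.append("!!!!")             # mutating the list cannot disturb the dict

print(error)
print(counts)  # {('Hello', 'World'): 5}
```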

Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.

I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.


(8) I think it's a matter of debate. You are acting on a global variable. Any evaluation performed anywhere in your code involving your global variable will now (correctly) refer to ("blueberries", "mangos"). the default parameter could just be like any other case. - Stefano Borini
(9) Actually, I don't think I agree with your first example. I'm not sure I like the idea of modifying an initializer like that in the first place, but if I did, I'd expect it to behave exactly as you describe — changing the default value to ("blueberries", "mangos"). - Ben Blank
(8) The default parameter is like any other case. What is unexpected is that the parameter is a global variable, and not a local one. Which in turn is because the code is executed at function definition, not call. Once you get that, and that the same goes for classes, it's perfectly clear. - Lennart Regebro
(2) Brilliant counterexample, +1 - IfLoop
I have no idea what to expect in that first code example. It really shouldn't be legal code to begin with (I seriously hope it isn't) - BlueRaja - Danny Pflughoeft
@EliCourtwright, this is the best answer to this question I have read. Nicely explained. As you show, a "fix" is not desirable here. My proposal for how to not confuse the newbies would be to emit a compile-time warning about mutable built-in types as default arguments, although this is not a very Pythonic thing to do either :-P - Dan Lenski
I find the example misleading rather than brilliant. If some_random_function() appends to fruits instead of assigning to it, the behaviour of eat() will change. So much for the current wonderful design. If you use a default argument that's referenced elsewhere and then modify the reference from outside the function, you are asking for trouble. The real WTF is when people define a fresh default argument (a list literal or a call to a constructor), and still get bit. - alexis
2
[+43] [2012-07-10 14:50:42] glglgl

AFAICS no one has yet posted the relevant part of the documentation [1]:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function [...]
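
The workaround mentioned at the end of the quoted passage is usually written along these lines (a minimal sketch in Python 3 syntax, reusing the foo example from the question):

```python
def foo(a=None):
    # None is immutable, so sharing it between calls is harmless.
    if a is None:
        a = []          # a fresh list is created on each call
    a.append(5)
    return a

print(foo())     # [5]
print(foo())     # [5]
print(foo([1]))  # [1, 5]
```

The explicit `is None` test differs slightly from the shorter `a = a or []`: with the latter, an empty list passed in by the caller is also replaced by a fresh one, so the caller's list would not be modified.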

[1] http://docs.python.org/reference/compound_stmts.html#function-definitions

(8) The phrases "this is not generally what was intended" and "a way around this is" smell like they're documenting a design flaw. - bukzor
Documentation at last! +1! @bukzor: see the link in Liffredo's answer. The 'automatic' caching ability is amazing. - Matthew
@Matthew: I'm well aware, but it's not worth the pitfall. You'll generally see style guides and linters unconditionally flag mutable default values as wrong for this reason. The explicit way to do the same thing is to stuff an attribute onto the function (function.data = []) or better yet, make an object. - bukzor
@bukzor: Pitfalls need to be noted and documented, which is why this question is good and has received so many upvotes. At the same time, pitfalls don't necessarily need to be removed. How many Python beginners have passed a list to a function that modified it, and were shocked to see the changes show up in the original variable? Yet mutable object types are wonderful, when you understand how to use them. I guess it just boils down to opinion on this particular pitfall. - Matthew
3
[+37] [2009-07-15 23:21:09] Utaal

I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything nonsensical or impossible.

Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:

a = []

you expect to get a new list referenced by a.

Why should the a=[] in

def x(a=[]):

instantiate a new list on function definition and not on invocation? It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller". I think this is ambiguous instead:

def x(a=datetime.datetime.now()):

user, do you want a to default to the datetime corresponding to when you're defining or executing x? In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:

b = datetime.datetime.now()
def x(a=b):

I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:

def x(static a=b):
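
The two binding times contrasted in this answer can be sketched in today's Python as follows (Python 3 syntax). The names tick, at_definition and at_call are hypothetical, and a simple counter stands in for datetime.datetime.now() so that the result is deterministic:

```python
import itertools

_clock = itertools.count(1)   # hypothetical stand-in for a real clock

def tick():
    return next(_clock)

def at_definition(a=tick()):  # tick() runs once, when the def executes
    return a

def at_call(a=None):          # tick() runs on each call without an argument
    if a is None:
        a = tick()
    return a

print(at_definition(), at_definition())  # 1 1
print(at_call(), at_call())              # 2 3
```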

(6) You could do: def x(a=None): And then, if a is None, set a=datetime.datetime.now() - Anon
(3) I know, that was just an example to explain why I would prefer execution-time binding. - Utaal
(4) Thank you for this. I couldn't really put my finger on why this irks me to no end. You have done it beautifully with a minimum of fuss and confusion. As someone coming from systems programming in C++ and sometimes naively "translating" language features, this false friend kicked me in the soft of the head big time, just like class attributes. I understand why things are this way, but I cannot help but dislike it, no matter what positive might come of it. At least it is so contrary to my experience, that I'll probably (hopefully) never forget it... - AndreasT
(2) @Andreas once you use Python for long enough, you begin to see how logical it is for Python to interpret things as class attributes the way it does - it is only because of the particular quirks and limitations of languages like C++ (and Java, and C#...) that it makes any sense for contents of the class {} block to be interpreted as belonging to the instances :) But when classes are first-class objects, obviously the natural thing is for their contents (in memory) to reflect their contents (in code). - Karl Knechtel
(1) Normative structure is no quirk or limitation in my book. I know it can be clumsy and ugly, but you can call it a "definition" of something. The dynamic languages seem a bit like anarchists to me: Sure everybody is free, but you need structure to get someone to empty the trash and pave the road. Guess I'm old... :) - AndreasT
4
[+28] [2009-07-15 18:54:45] Lennart Regebro

Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the function is defined.

Compare this:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added for all instances. The reason is exactly the same.
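
A short sketch of the difference (Python 3 syntax; the class names SharedBunch and OwnBunch are hypothetical, chosen only to contrast the two placements of the list):

```python
class SharedBunch:
    bananas = []                  # one list, created when the class body runs

    def add_banana(self, banana):
        self.bananas.append(banana)

class OwnBunch:
    def __init__(self):
        self.bananas = []         # a new list for every instance

    def add_banana(self, banana):
        self.bananas.append(banana)

a, b = SharedBunch(), SharedBunch()
a.add_banana("x")
print(b.bananas)   # ['x']

c, d = OwnBunch(), OwnBunch()
c.add_banana("x")
print(d.bananas)   # []
```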

It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.

Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.

That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.


How do you define a class attribute that is different for each instance of a class? - Kieveli
(11) If it's different for each instance it's not a class attribute. Class attributes are attributes on the CLASS. Hence the name. Hence they are the same for all instances. - Lennart Regebro
(2) He wasn't asking for a description of Python's behavior, he was asking for the rationale. Nothing in Python is just "How It Works"; it all does what it does for a reason. - Glenn Maynard
(2) And I gave the rationale. - Lennart Regebro
(1) I wouldn't say that this "it's a good teaching aid", because it's not. - Tempus
How do you define an attribute in a class that is different for each instance of a class? (Re-defined for those who could not determine that a person not familiar with Python's naming conventions might be asking about normal member variables of a class). - Kieveli
@Geo: Except that it is. It helps you understand a lot of things in Python. - Lennart Regebro
@Kievieli: You ARE talking about normal member variables of a class. :-) You define instance attributes by saying self.attribute = value in any method. For example __init__(). - Lennart Regebro
@Kieveli: Two answers: you can't, because any thing you define at a class level will be a class attribute, and any instance that accesses that attribute will access the same class attribute; you can, /sort of/, by using propertys -- which are actually class level functions that act like normal attributes but save the attribute in the instance instead of the class (by using self.attribute = value as Lennart said). - Ethan Furman
5
[+18] [2009-07-16 10:05:09] Brian

I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:

1. Performance

def foo(arg=something_expensive_to_compute()):
    ...

If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.

2. Forcing bound parameters

A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:

funcs = [ lambda i=i: i for i in range(10)]

This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.

The only way to implement this otherwise would be to create a further closure with the i bound, ie:

def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
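
For comparison, a sketch (Python 3 syntax) of what happens without the default-argument trick; the names late and early are hypothetical. A plain closure looks up i when it is called, by which time the loop has finished:

```python
late = [lambda: i for i in range(10)]        # i is looked up at call time
early = [lambda i=i: i for i in range(10)]   # i is captured at definition time

print([f() for f in late])   # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
print([f() for f in early])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```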

3. Introspection

Consider the code:

def foo(a='test', b=100, c=[]):
   print a,b,c

We can get information about the arguments and defaults using the inspect module:

>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))

This information is very useful for things like document generation, metaprogramming, decorators etc.
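
The inspect.getargspec call above is the Python 2 era spelling and was removed in Python 3.11. A rough Python 3 equivalent, sketched with inspect.signature (the body of foo is changed to return its arguments so the defaults can be compared):

```python
import inspect

def foo(a='test', b=100, c=[]):
    return a, b, c

sig = inspect.signature(foo)
defaults = {name: p.default for name, p in sig.parameters.items()}
print(defaults)  # {'a': 'test', 'b': 100, 'c': []}

# The introspected default is the live object the function will actually use.
assert defaults['c'] is foo()[2]
```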

Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:

_undefined = object()  # sentinel value

def foo(a=_undefined, b=_undefined, c=_undefined):
    if a is _undefined: a='test'
    if b is _undefined: b=100
    if c is _undefined: c=[]

However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.


you could achieve introspection also if for each there was a function to create the default argument instead of a value. the inspect module will just call that function. - yairchu
@SilentGhost: I'm talking about if the behaviour was changed to recreate it - creating it once is the current behaviour, and why the mutable default problem exists. - Brian
@yairchu: That assumes the construction is safe to do (ie has no side effects). Introspecting the args shouldn't do anything, but evaluating arbitrary code could well end up having an effect. - Brian
A different language design often just means writing things differently. Your first example could easily be written as: _expensive = expensive(); def foo(arg=_expensive), if you specifically don't want it reevaluated. - Glenn Maynard
@Glenn - that's what I was referring to with "cache the variable externally" - it is a bit more verbose, and you end up with extra variables in your namespace though. - Brian
I think the comment problems are because they've implemented restricted markdown for comments, so your _ is being treated as italic. if this shows as code, then you can use backticks to prevent it - Brian
@SilentGhost: Actually, rereading that, I can see your point - I worded that pretty badly. Edited to clarify my meaning. - Brian
In addition, to special-case this would add considerable complexity, especially in understanding the effect of inner function definitions. - Marcin
6
[+16] [2009-07-15 19:15:25] ymv

This behavior is easily explained by:

  1. function (class etc.) declaration is executed only once, creating all default value objects
  2. everything is passed by reference

So:

def x(a=0, b=[], c=[], d=0):
    a = a + 1
    b = b + [1]
    c.append(1)
    print a, b, c

  1. a doesn't change - every assignment creates a new int object - the new object is printed
  2. b doesn't change - a new list is built from the default value and printed
  3. c changes - the operation is performed on the same object - and it is printed
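
The three cases can be verified by looking at the stored defaults after a couple of calls. This is a sketch in Python 3 syntax; the function returns its values instead of printing them, and the unused parameter d is dropped:

```python
def x(a=0, b=[], c=[]):
    a = a + 1       # rebinds the local name a to a new int
    b = b + [1]     # builds a new list; the default list is untouched
    c.append(1)     # mutates the default list in place
    return a, b, c

print(x())             # (1, [1], [1])
print(x())             # (1, [1], [1, 1])
print(x.__defaults__)  # (0, [], [1, 1])
```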

Your #4 could be confusing to people, since integers are immutable and so that "if" is not true. For instance, with d set to 0, d.__add__(1) would return 1, but d would still be 0. - Anon
(Actually, add is a bad example, but integers being immutable still is my main point.) - Anon
yes, that wasn't good example - ymv
Realized it to my chagrin after checking to see that, with b set to [], b.__add__([1]) returns [1] but also leaves b still [] even though lists are mutable. My bad. - Anon
@ANon: there is __iadd__, but it doesn't work with int. Of course. :-) - Veky
7
[+12] [2009-07-15 20:18:14] Glenn Maynard

What you're asking is why this:

def func(a=[], b = 2):
    pass

isn't internally equivalent to this:

def func(a=None, b = None):
    a_default = lambda: []
    b_default = lambda: 2
    def actual_func(a=None, b=None):
        if a is None: a = a_default()
        if b is None: b = b_default()
    return actual_func
func = func()

except for the case of explicitly calling func(None, None), which we'll ignore.

In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?

One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
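
To make the closure point concrete, here is a runnable variant of the hypothetical transformation above (Python 3 syntax). The name make_func and the body of actual_func are assumptions added for illustration; this is a sketch of the idea, not how any interpreter implements defaults:

```python
def make_func():
    a_default = lambda: []        # a stored recipe, not a stored value

    def actual_func(a=None):
        if a is None:
            a = a_default()       # the recipe is evaluated on every call
        a.append(5)
        return a

    return actual_func

func = make_func()
print(func())  # [5]
print(func())  # [5]

# The stored recipe lives in a closure cell attached to the function object.
print(func.__closure__ is not None)  # True
```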


(1) It wouldn't need to be a closure - a better way to think of it would simply to make the bytecode creating defaults the first line of code - after all you're compiling the body at that point anyway - there's no real difference between code in the arguments and code in the body. - Brian
(5) True, but it would still slow Python down, and it would actually be quite surprising, unless you do the same for class definitions, which would make it stupidly slow as you would have to re-run the whole class definition each time you instantiate a class. As mentioned, the fix would be more surprising than the problem. - Lennart Regebro
Agreed with Lennart. As Guido is fond of saying, for every language feature or standard library, there's someone out there using it. - Jason Baker
(1) Changing it now would be insanity--we're just exploring why it is the way it is. If it did late default evaluation to begin with, it wouldn't necessarily be surprising. It's definitely true that such a core difference in parsing would have sweeping, and probably many obscure, effects on the language as a whole. - Glenn Maynard
8
[+12] [2009-07-15 23:18:36] Jason Baker

It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?

def print_tuple(some_tuple=(1,2,3)):
    print some_tuple

print_tuple()        #1
print_tuple((1,2,3)) #2

I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):

#1

0 LOAD_GLOBAL              0 (print_tuple)
3 CALL_FUNCTION            0
6 POP_TOP
7 LOAD_CONST               0 (None)
10 RETURN_VALUE

#2

 0 LOAD_GLOBAL              0 (print_tuple)
 3 LOAD_CONST               4 ((1, 2, 3))
 6 CALL_FUNCTION            1
 9 POP_TOP
10 LOAD_CONST               0 (None)
13 RETURN_VALUE

"I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs ?)"

As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.
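A small sketch of how this can be inspected, assuming Python 3 (where the stored defaults are exposed as __defaults__) and a variant of print_tuple that returns its argument so the result can be checked. The exact opcodes printed by dis differ between interpreter versions, so no output is reproduced here:

```python
import dis

def print_tuple(some_tuple=(1, 2, 3)):
    return some_tuple

# The default tuple is built once and stored on the function object.
stored_default = print_tuple.__defaults__[0]

# A call without arguments hands back that very object, with no rebuilding.
same_object = print_tuple() is stored_default

# Disassemble the two call forms; the second has to load an extra argument.
dis.dis(compile("print_tuple()", "<call-1>", "eval"))
dis.dis(compile("print_tuple((1, 2, 3))", "<call-2>", "eval"))
```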


how do you obtain the disassembly? - Tempus
(4) Use the dis module: docs.python.org/library/dis.html - Jason Baker
How much of a difference could 3 LOAD_CONST 4 ((1, 2, 3)) make over even millions of iterations? ^0^. Maybe I'll profile and report back... - dimadima
9
[+12] [2011-05-23 04:24:30] Ben

This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

>>> def foo(a):
    a.append(5)
    print a

>>> a  = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]

No default values in sight in this code, but you get exactly the same problem.

The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).

Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.

If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.


(1) Although related, I think this is distinct behaviour (as we expect append to change a "in-place"). That a default mutable is not re-instantiated on each call is the "unexpected" bit... at least for me. :) - Andy Hayden
10
[+11] [2012-11-22 18:09:04] hynekcer

1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem also suffer from a similar side-effect problem on the actual parameter."
That goes against the rules of functional programming, is usually undesirable, and both problems should be fixed together.

Example:

def foo(a=[]):                 # the same problematic function
    a.append(5)
    return a

>>> somevar = [1, 2]           # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5]                      # usually expected [1, 2]

Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.

def foo(a=[]):
    a = a[:]     # a copy
    a.append(5)
    return a     # or everything safe by one line: "return a + [5]"

Many builtin mutable types have a copy method, like some_dict.copy() or some_set.copy(), or can be copied easily, like somelist[:] or list(some_list). Every object can also be copied by copy.copy(any_object), or more thoroughly by copy.deepcopy() (the latter is useful if the mutable object is composed of mutable objects). Some objects, like the "file" object, are fundamentally based on side effects and cannot be meaningfully reproduced by a copy. copying [1]
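A short sketch of the difference between the two copy functions mentioned above, using a nested list as an illustrative input (the variable names are only for this example):

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)      # new outer list, inner lists are shared
deep = copy.deepcopy(original)     # new outer list and new inner lists

shallow.append([5, 6])             # outer list is separate: original unaffected
shallow[0].append(99)              # inner list is shared: original sees this
deep[1].append(-1)                 # fully independent: original unaffected
```

After these three operations original is [[1, 2, 99], [3, 4]], which is why deepcopy is the safer choice when the argument contains other mutable objects.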

Example problem for a similar SO question [2]

class Test(object):            # the original problematic class
  def __init__(self, var1=[]):
    self._var1 = var1

somevar = [1, 2]               # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar                  # [1, 2, [1]] but usually expected [1, 2]
print t2._var1                 # [1, 2, [1]] but usually expected [1, 2]

The input object also shouldn't be saved in any public attribute of an instance returned by this function. (This assumes that, by convention, private attributes of an instance should not be modified from outside the class or its subclasses, i.e. _var1 is a private attribute.)

Conclusion:
Input parameter objects shouldn't be modified in place (mutated), nor should they be bound into an object returned by the function. (This applies if we prefer programming without side effects, which is strongly recommended; see Wiki about "side effect" [3]. The first two paragraphs are relevant in this context.)
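One possible way to apply this conclusion to the Test class, as a sketch rather than the only fix: store a copy of the argument instead of the argument itself. The default is left as [] here only to stay close to the original; the copy is what makes it harmless.

```python
class Test(object):
    def __init__(self, var1=[]):
        # Store a private copy, so neither the caller's list nor the
        # shared default list is ever bound into the instance.
        self._var1 = list(var1)

somevar = [1, 2]
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])               # only t1's private copy changes

d1 = Test()
d2 = Test()
d1._var1.append("x")               # the shared default is not touched
```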

2)
Only if the side effect on the actual parameter is required, but unwanted on the default parameter, is the useful solution def ...(var1=None): if var1 is None: var1 = []. More.. [4]

3) In some cases the mutable behavior of default parameters is useful [5].

[1] http://effbot.org/pyfaq/how-do-i-copy-an-object-in-python.htm
[2] http://stackoverflow.com/q/13484107/448474
[3] http://en.wikipedia.org/wiki/Side_effect_%28computer_science%29
[4] http://effbot.org/zone/default-values.htm#what-to-do-instead
[5] http://effbot.org/zone/default-values.htm#valid-uses-for-mutable-defaults

(1) I hope you're aware that Python is not a functional programming language. - Veky
(1) Yes, Python is a multi-paradigm language with some functional features. ("Don't make every problem look like a nail just because you have a hammer.") Many of them are in Python best practices. Python has an interesting HOWTO Functional Programming. Other features are closures and currying, not mentioned here. - hynekcer
11
[+8] [2009-07-16 12:19:23] Baczek

the shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. as a more contrived example, you may cite this:

def a(): return []

def b(x=a()):
    print x

hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.

i agree it's a gotcha when you try to use default constructors, though.


12
[+8] [2012-04-24 19:43:13] dimadima

This behavior is not surprising if you take the following into consideration:

  1. The behavior of read-only class attributes upon assignment attempts, and that
  2. Functions are objects (explained well in the accepted answer).

The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.

(1) is described in the Python tutorial on classes [1]. In an attempt to assign a value to a read-only class attribute:

...all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).

Look back to the original example and consider the above points:

def foo(a=[]):
    a.append(5)
    return a

Here foo is an object and a is an attribute of foo (available at foo.func_defaults[0] in Python 2, or foo.__defaults__[0] in Python 3). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default's value from foo.func_defaults. In this case, foo.func_defaults[0] is used for a within the function object's code scope. Changes to a change foo.func_defaults[0], which is part of the foo object and persists between executions of the code in foo.
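This can be observed directly. The sketch below assumes Python 3, where the tuple of stored defaults is exposed as foo.__defaults__; it takes snapshots with list() because the stored default itself keeps changing:

```python
def foo(a=[]):
    a.append(5)
    return a

default_before = list(foo.__defaults__[0])   # snapshot before any call
result = foo()
foo()
default_after = list(foo.__defaults__[0])    # snapshot after two calls

# The list that foo returns is the stored default itself, not a copy.
same_object = result is foo.__defaults__[0]
```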

Now, compare this to the example from the documentation on emulating the default argument behavior of other languages [2], such that the function signature defaults are used every time the function is executed:

def foo(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:

  • When the foo function object is instantiated, foo.func_defaults[0] is set to None, an immutable object.
  • When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defaults[0] (None) is available in the local scope as L.
  • Upon L = [], the assignment cannot succeed at foo.func_defaults[0], because that attribute is read-only.
  • Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defaults[0] thus remains unchanged for future invocations of foo.
[1] http://docs.python.org/tutorial/classes.html
[2] http://docs.python.org/tutorial/controlflow.html#default-argument-values
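
A quick check of the documentation's version, again assuming Python 3 and its __defaults__ attribute. Whatever terms one uses to describe it, the stored default stays None after any number of calls, because L = [] only rebinds the local name L:

```python
def foo(a, L=None):
    if L is None:
        L = []          # rebinds the local name; the stored default is untouched
    L.append(a)
    return L

first = foo(1)
second = foo(2)
defaults_after = foo.__defaults__
```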

13
[+8] [2013-01-15 11:02:03] jdborg

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.


(2) This is a nice solution in cases where you're using automatic documentation generation software to document the types of arguments expected by the function. Putting a=None and then setting a to [] if a is None doesn't help a reader understand at a glance what is expected. - Michael Scott Cuthbert
14
[+6] [2013-02-28 11:10:16] hugo24

A simple workaround using None

>>> def bar(b, data=None):
...     data = data or []
...     data.append(b)
...     return data
... 
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
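
One caveat with the data or [] shortcut, sketched below with two hypothetical variants of bar: the or test replaces any falsy argument, so an empty list passed in by the caller is silently swapped for a new one, whereas an explicit is None test keeps it.

```python
def bar_or(b, data=None):
    data = data or []          # replaces any falsy value, including a caller's []
    data.append(b)
    return data

def bar_is_none(b, data=None):
    if data is None:           # replaces only the "no argument" case
        data = []
    data.append(b)
    return data

mine = []
returned_or = bar_or(3, mine)          # mine is left empty

theirs = []
returned_is_none = bar_is_none(3, theirs)   # theirs receives the element
```

Which behaviour is wanted depends on whether the caller expects its list to be filled in place.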

15
[+5] [2012-03-20 17:22:11] Marcin

The solutions here are:

  1. Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
  2. Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).

The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
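
A minimal sketch of the second option, with hypothetical names collect and make_container. It leaves out the try block mentioned above and simply requires the default to be a zero-argument callable:

```python
from collections import deque

def collect(item, make_container=list):
    # make_container is called on every invocation, so each call
    # gets a fresh container unless the caller decides otherwise.
    container = make_container()
    container.append(item)
    return container

first = collect(1)
second = collect(2)
as_deque = collect(3, deque)             # an existing type works as the factory
prefilled = collect(4, lambda: [0])      # so does a lambda
```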


16
[+4] [2009-07-16 19:17:59] JonathanHayward

It may be true that:

  1. Someone is using every language/library feature, and
  2. Switching the behavior here would be ill-advised, but

it is entirely consistent to hold to both of the points above and still make another point:

  3. It is a confusing feature and it is unfortunate in Python.

The other answers, or at least some of them either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.

It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,

The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.


(1) -1 Although a defensible perspective, this not an answer, and I disagree with it. Too many special exceptions beget their own corner cases. - Marcin
"The existing behavior is not Pythonic" is an amazingly ignorant thing to say about something so fundamental to Python, and betrays a woeful under-appreciation of its object model. - Matthew Trevor
(1) So then, it is "amazingly ignorant" to say that in Python it would make more sense for a default argument of [] to remain [] every time the function is called? - JonathanHayward
(1) And it is ignorant to consider as an unfortunate idiom setting a default argument to None, and then in the body of the function setting if argument == None: argument = []? Is it ignorant to consider this idiom unfortunate as often people want what a naive newcomer would expect, that if you assign f(argument = []), argument will automatically default to a value of []? - JonathanHayward
In C, it is arguably not a FAIL for a main() with 'int a; scanf("%d", a);' to crash; the answer involves a dive into internals and an explanation that pass by reference is achieved by an (arguably surrogate) use of pointers. And in Tcl, it is arguably not a FAIL that things do not work in '{[...]}' that work everywhere else; in both cases the answer is, "Take a deep dive into internals, deeper than you understand now." - JonathanHayward
(1) But in Python, part of the spirit of the language is that you don't have to take too many deep dives; array.sort() works, and works regardless of how little you understand about sorting, big-O, and constants. The beauty of Python in the array sorting mechanism, to give one of innumerable examples, is that you are not required to take a deep dive into internals. And to say it differently, the beauty of Python is that one is not ordinarily required to take a deep dive into implementation to get something that Just Works. And there is a workaround (...if argument == None: argument = []), FAIL. - JonathanHayward
I agree with this post and have upvoted. - javadba
As a standalone, the statement x=[] means "create an empty list object, and bind the name 'x' to it." So, in def f(x=[]), an empty list is also created. It doesn't always get bound to x, so instead it gets bound to the default surrogate. Later when f() is called, the default is hauled out and bound to x. Since it was the empty list itself that was squirreled away, that same list is the only thing available to bind to x, whether anything has been stuck inside it or not. How could it be otherwise? - Jerry B
To achieve what you want, you'd have to save the text of the default value, and compile and execute it each time the function is called, or compile the text and execute it for each call. The overhead for every call would be horrendous, multiplied by the number of default arguments. And a lot of the standard library functions have several defaults. And the results would depend on the context the call is made in, not the context the function was defined in. - Jerry B
17
[+2] [2012-07-04 19:51:30] jai
>>> def a():
...     print "a executed"
...     return []
...
>>> x = a()
a executed
>>> def b(m=[]):
...     m.append(5)
...     print m
...
>>> b(x)
[5]
>>> b(x)
[5, 5]

(2) @AustinHenley lack of explanation what's going on? - Tshepang
18
[+2] [2013-07-22 07:35:28] Norfeldt

This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)

I'm gonna give you what I see as a useful example.

def example(errors=[]):
    # statements
    # Something went wrong
    mistake = True
    if mistake:
        tryToFixIt(errors)
        # Didn't work.. let's try again
        tryToFixItAnotherway(errors)
        # This time it worked
    return errors

def tryToFixIt(err):
    err.append('Attempt to fix it')

def tryToFixItAnotherway(err):
    err.append('Attempt to fix it by another way')

def main():
    for item in range(2):
        errors = example()
    print '\n'.join(errors)

main()

prints the following

Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way

19
[+2] [2013-08-22 05:58:41] user2384994

I think the answer to this question lies in how Python passes data to parameters (pass by value or by reference), not in mutability or in how Python handles the "def" statement.

A brief introduction. First, there are two kinds of data types in Python: simple elementary data types, like numbers, and objects. Second, when passing data to parameters, Python passes elementary data types by value, i.e., makes a local copy of the value in a local variable, but passes objects by reference, i.e., as pointers to the object.

Granting the above two points, let's explain what happens in the Python code. It is only because objects are passed by reference; it has nothing to do with mutable/immutable, or arguably with the fact that the "def" statement is executed only once, when the function is defined.

[] is an object, so Python passes the reference to [] to a, i.e., a is only a pointer to [], which lives in memory as an object. There is only one copy of [] but possibly many references to it. For the first foo(), the list [] is changed to [1] by the append method. But note that there is only one copy of the list object, and this object now becomes [1]. When running the second foo(), what the effbot webpage says (items is not evaluated any more) is wrong. a is evaluated to be the list object, although now the content of the object is [1]. This is the effect of passing by reference! The result of foo(3) can easily be derived in the same way.

To further validate my answer, let's take a look at two more code samples.

====== No. 2 ========

def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items

foo(1)  #return [1]
foo(2)  #return [2]
foo(3)  #return [3]

[] is an object, and so is None (the former is mutable while the latter is immutable, but mutability has nothing to do with the question). None is somewhere in memory, but we know it's there and there is only one copy of it. So every time foo is invoked, items is evaluated (contrary to some answers saying it is only evaluated once) to be None, or, to be clear, the reference (or the address) of None. Then, inside foo, items is changed to [], i.e., it points to another object which has a different address.

====== No. 3 =======

def foo(x, items=[]):
    items.append(x)
    return items

foo(1)    # returns [1]
foo(2,[]) # returns [2]
foo(3)    # returns [1,3]

The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is then changed to [1] inside the foo function, but the address is not changed; it is still 11111111. Then foo(2,[]) comes. Although the [] in foo(2,[]) has the same content as the default parameter [] when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [], say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What's the default value? It is set when defining the foo function: the list object located at 11111111. So items is evaluated to be the address 11111111, whose list has one element, 1. The list located at 2222222 also contains one element, 2, but items no longer points to it. Consequently, appending 3 makes items [1,3].

From the above explanations, we can see that the effbot [5] webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:

for i in range(10):
    def callback():
        print "clicked button", i
    UI.Button("button %s" % i, callback)

Each button can hold a distinct callback function which will display a different value of i. I can provide an example to show this:

x=[]
for i in range(10):
    def callback():
        print(i)
    x.append(callback) 

If we execute x[7]() we'll get 7 as expected, and x[9]() will give 9, another value of i.

[1] http://effbot.org/zone/default-values.htm
[2] http://effbot.org/zone/default-values.htm
[3] http://effbot.org/zone/default-values.htm
[4] http://effbot.org/zone/default-values.htm
[5] http://effbot.org/zone/default-values.htm

(1) Your last point is wrong. Try it and you'll see that x[7]() is 9. - Duncan
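
Duncan's correction can be checked with a small sketch. The callbacks here return i instead of printing it so the results can be collected; the second loop uses a default argument (i=i) to capture the value at definition time, which is one common way to get a distinct value per callback:

```python
# Closures look up i when they are called, after the loop has finished.
late_bound = []
for i in range(10):
    def callback():
        return i
    late_bound.append(callback)

late_values = [cb() for cb in late_bound]

# A default argument is evaluated when each def statement runs,
# so every callback keeps the value of i from its own iteration.
early_bound = []
for i in range(10):
    def callback(i=i):
        return i
    early_bound.append(callback)

early_values = [cb() for cb in early_bound]
```

With the first loop every callback reports 9; with the second, the callbacks report 0 through 9.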
20
[+2] [2014-09-11 22:05:43] Saish

When we do this:

def foo(a=[]):
    ...

... we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?

def foo(a=pavlo):
   ...

At any time, if the caller doesn't tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable), and foo ends up modifying it, we notice the effect the next time foo is called without specifying a.

So this is what you see (Remember, pavlo is initialized to []):

>>> foo()
[5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

>>> foo()
[5, 5, 5]
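
The same walk-through can be run with the list actually given a name, as a sketch. pavlo and ivan are the illustrative names from above, and __defaults__ assumes Python 3:

```python
pavlo = []

def foo(a=pavlo):              # the default is the list object named pavlo
    a.append(5)
    return a

foo()
foo()
pavlo_after_two_calls = list(pavlo)    # snapshot

ivan = [1, 2, 3, 4]
foo(a=ivan)                            # modifies ivan, leaves pavlo alone
pavlo_after_ivan = list(pavlo)         # snapshot

foo()                                  # back to pavlo
default_is_pavlo = foo.__defaults__[0] is pavlo
```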

21