What is the use of the
yield keyword in Python? What does it do?
For example, I'm trying to understand this code (**):
def node._get_child_candidates(self, distance, min_dist, max_dist): if self._leftchild and distance - max_dist < self._median: yield self._leftchild if self._rightchild and distance + max_dist >= self._median: yield self._rightchild
And this is the caller:
result, candidates = list(), [self] while candidates: node = candidates.pop() distance = node._get_dist(obj) if distance <= max_dist and distance >= min_dist: result.extend(node._values) candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) return result
What happens when the method
_get_child_candidates is called?
A list is returned? A single element is returned? Is it called again? When subsequent calls do stop?
** The code comes from Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: http://well-adjusted.de/~jrschulz/mspace/.
To understand what
yield does, you must understand what generators are. And before generators come iterables.
When you create a list, you can read its items one by one, and it's called iteration:
>>> mylist = [1, 2, 3] >>> for i in mylist: ... print(i) 1 2 3
Mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:
>>> mylist = [x*x for x in range(3)] >>> for i in mylist: ... print(i) 0 1 4
Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as much as you wish, but you store all the values in memory and it's not always what you want when you have a lot of values.
Generators are iterators, but you can only iterate over them once. It's because they do not store all the values in memory, they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3)) >>> for i in mygenerator: ... print(i) 0 1 4
It is just the same except you used
() instead of
. BUT, you can not perform
for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.
Yield is a keyword that is used like
return, except the function will return a generator.
>>> def createGenerator(): ... mylist = range(3) ... for i in mylist: ... yield i*i ... >>> mygenerator = createGenerator() # create a generator >>> print(mygenerator) # mygenerator is an object! <generator object createGenerator at 0xb7555c34> >>> for i in mygenerator: ... print(i) 0 1 4
Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.
yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)
Then, your code will be run each time the
for uses the generator.
Now the hard part:
The first time the
for calls the generator object created from your function, it will run the code in your function from the beginning until it hits
yield, then it'll return the first value of the loop. Then, each other call will run the loop you have written in the function one more time, and return the next value, until there is no value to return.
The generator is considered empty once the function runs but does not hit yield anymore. It can be because the loop had come to an end, or because you do not satisfy a "if/else" anymore.
# Here you create the method of the node object that will return the generator def node._get_child_candidates(self, distance, min_dist, max_dist): # Here is the code that will be called each time you use the generator object: # If there is still a child of the node object on its left # AND if distance is ok, return the next child if self._leftchild and distance - max_dist < self._median: yield self._leftchild # If there is still a child of the node object on its right # AND if distance is ok, return the next child if self._rightchild and distance + max_dist >= self._median: yield self._rightchild # If the function arrives here, the generator will be considered empty # there is no more than two values: the left and the right children
# Create an empty list and a list with the current object reference result, candidates = list(), [self] # Loop on candidates (they contain only one element at the beginning) while candidates: # Get the last candidate and remove it from the list node = candidates.pop() # Get the distance between obj and the candidate distance = node._get_dist(obj) # If distance is ok, then you can fill the result if distance <= max_dist and distance >= min_dist: result.extend(node._values) # Add the children of the candidate in the candidates list # so the loop will keep running until it will have looked # at all the children of the children of the children, etc. of the candidate candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) return result
This code contains several smart parts:
The loop iterates on a list but the list expands while the loop is being iterated :-) It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case,
candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but
while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.
extend() method is a list object method that expects an iterable and adds its values to the list.
Usually we pass a list to it:
>>> a = [1, 2] >>> b = [3, 4] >>> a.extend(b) >>> print(a) [1, 2, 3, 4]
But in your code it gets a generator, which is good because:
And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples and generators! This is called duck typing and is one of the reason why Python is so cool. But this is another story, for another question...
You can stop here, or read a little bit to see a advanced use of generator:
>>> class Bank(): # let's create a bank, building ATMs ... crisis = False ... def create_atm(self): ... while not self.crisis: ... yield "$100" >>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want >>> corner_street_atm = hsbc.create_atm() >>> print(corner_street_atm.next()) $100 >>> print(corner_street_atm.next()) $100 >>> print([corner_street_atm.next() for cash in range(5)]) ['$100', '$100', '$100', '$100', '$100'] >>> hsbc.crisis = True # crisis is coming, no more money! >>> print(corner_street_atm.next()) <type 'exceptions.StopIteration'> >>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs >>> print(wall_street_atm.next()) <type 'exceptions.StopIteration'> >>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty >>> print(corner_street_atm.next()) <type 'exceptions.StopIteration'> >>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business >>> for cash in brand_new_atm: ... print cash $100 $100 $100 $100 $100 $100 $100 $100 $100 ...
It can be useful for various things like controlling access to a resource.
The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one liner? Map / Zip without creating another list?
An example? Let's see the possible orders of arrival for a 4 horse race:
>>> horses = [1, 2, 3, 4] >>> races = itertools.permutations(horses) >>> print(races) <itertools.permutations object at 0xb754f1dc> >>> print(list(itertools.permutations(horses))) [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]
Iteration is a process implying iterables (implementing the
__iter__() method) and iterators (implementing the
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
More about it in this article about how does the for loop work . http://effbot.org/zone/python-for-statement.htm
When you see a function with
yield statements, apply this easy trick to understand what will happen:
result = at the start of the function.
return resultat the bottom of the function.
yieldstatements! Read and figure out code.
This trick may give you an idea of the logic behind the function, but what actually happens with
yield is significantly different that what happens in the list based approach. In many cases the yield approach will be a lot more memory efficient and faster too. In other cases this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...
First, the iterator protocol - when you write
for x in mylist: ...loop body...
Python performs the following two steps:
Gets an iterator for
iter(mylist) -> this returns an object with a
next() method (or
__next__() in Python 3).
[This is the step most people forget to tell you about]
Uses the iterator to loop over items:
Keep calling the
next() method on the iterator returned from step 1. The return value from
next() is assigned to
x and the loop body is executed. If an exception
StopIteration is raised from within
next(), it means there are no more values in the iterator and the loop is exited.
The truth is Python performs the above two steps anytime it wants to loop over the contents of an object - so it could be a for loop, but it could also be code like
otherlist is a Python list).
mylist is an iterable because it implements the iterator protocol. In a user defined class, you can implement the
__iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a
next() method. It is possible to implement both
next() on the same class, and have
self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.
So that's the iterator protocol, many objects implement this protocol:
Note that a
for loop doesn't know what kind of object it's dealing with - it just follows the iterator protocol, and is happy to get item after item as it calls
next(). Built-in lists return their items one by one, dictionaries return the keys one by one, files return the lines one by one, etc. And generators return... well that's where
yield comes in:
def f123(): yield 1 yield 2 yield 3 for item in f123(): print item
yield statements, if you had three
return statements in
f123() only the first would get executed, and the function would exit. But
f123() is no ordinary function. When
f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit - it goes into a suspended state. When the
for loop tries to loop over the generator object, the function resumes from its suspended state, runs until the next
yield statement and returns that as the next item. This happens until the function exits, at which point the generator raises
StopIteration, and the loop exits.
So the generator object is sort of like an adapter - at one end it exhibits the iterator protocol, by exposing
next() methods to keep the
for loop happy. At the other end however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.
Usually you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list 'trick' I mentioned before. That will not work in all cases, for e.g. if you have infinite loops, or it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class
SomethingIter that keeps state in instance members and performs the next logical step in it's
__next__() in Python 3) method. Depending on the logic, the code inside the
next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.
Think of it this way:
An iterator is just a fancy sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:
def some_function(): for i in xrange(4): yield i for i in some_function(): print i
This is basically what the python interpreter does with the above code:
class it: def __init__(self): #start at -1 so that we get 0 when we add 1 below. self.count = -1 #the __iter__ method will be called once by the for loop. #the rest of the magic happens on the object returned by this method. #in this case it is the object itself. def __iter__(self): return self #the next method will be called repeatedly by the for loop #until it raises StopIteration. def next(self): self.count += 1 if self.count < 4: return self.count else: #a StopIteration exception is raised #to signal that the iterator is done. #This is caught implicitly by the for loop. raise StopIteration def some_func(): return it() for i in some_func(): print i
For more insight as to what's happening behind the scenes, the for loop can be rewritten to this:
iterator = some_func() try: while 1: print iterator.next() except StopIteration: pass
Does that make more sense or just confuse you more? :)
EDIT: I should note that this IS an oversimplification for illustrative purposes. :)
EDIT 2: Forgot to throw the StopIteration exception
I feel like I post a link to this presentation every day: David M. Beazly's Generator Tricks for Systems Programmers . If you're a Python programmer and you're not extremely familiar with generators, you should read this. It's a very clear explanation of what generators are, how they work, what the yield statement does, and it answers the question "Do you really want to mess around with this obscure language feature?"
SPOILER ALERT. The answer is: Yes. Yes, you do. http://www.dabeaz.com/generators/
yield keyword is reduced to two simple facts:
yieldkeyword anywhere inside a function, that function no longer returns via the
returnstatement. Instead, it immediately returns a lazy "pending list" object called a generator
rangeor dict-view, with a built-in protocol for visiting each element in a certain order.
In a nutshell: a generator is a lazy, incrementally-pending list, and
yield statements allow you to use function notation to program the list values the generator should incrementally spit out.
generator = myYieldingFunction(...) x = list(generator) generator v [x, ..., ???] generator v [x, x, ..., ???] generator v [x, x, x, ..., ???] StopIteration exception [x, x, x] done list==[x, x, x]
Let's define a function
makeRange that's just like Python's
makeRange(n) RETURNS A GENERATOR:
def makeRange(n): # return 0,1,2,...,n-1 i = 0 while i < n: yield i i += 1 >>> makeRange(5) <generator object makeRange at 0x19e4aa0>
To force the generator to immediately return its pending values, you can pass it into
list() (just like you could any iterable):
>>> list(makeRange(5)) [0, 1, 2, 3, 4]
The above example can be thought of as merely creating a list which you append to and return:
# list-version # # generator-version def makeRange(n): # def makeRange(n): """return [0,1,2,...,n-1]""" #~ """return 0,1,2,...,n-1""" TO_RETURN =  #> i = 0 # i = 0 while i < n: # while i < n: TO_RETURN += [i] #~ yield i i += 1 # i += 1 return TO_RETURN #> >>> makeRange(5) [0, 1, 2, 3, 4]
There is one major difference though; see the last section.
An iterable is the last part of a list comprehension, and all generators are iterable, so they're often used like so:
# _ITERABLE_ >>> [x+10 for x in makeRange(5)] [10, 11, 12, 13, 14]
To get a better feel for generators, you can play around with the
itertools module (be sure to use
chain.from_iterable rather than
chain when warranted). For example, you might even use generators to implement infinitely-long lazy lists like
itertools.count(). You could implement your own
def enumerate(iterable): zip(count(), iterable), or alternatively do so with the
yield keyword in a while-loop.
Please note: generators can actually be used for many more things, such as implementing coroutines  or non-deterministic programming or other elegant things. However, the "lazy lists" viewpoint I present here is the most common use you will find.
This is how the "Python iteration protocol" works. That is, what is going on when you do
list(makeRange(5)). This is what I describe earlier as a "lazy, incremental list".
>>> x=iter(range(5)) >>> next(x) 0 >>> next(x) 1 >>> next(x) 2 >>> next(x) 3 >>> next(x) 4 >>> next(x) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
The built-in function
next() just calls the objects
.next() function, which is a part of the "iteration protocol" and is found on all iterators. You can manually use the
next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that...
Normally, most people would not care about the following distinctions and probably want to stop reading here.
In Python-speak, an iterable is any object which "understands the concept of a for-loop" like a list
[1,2,3], and an iterator is a specific instance of the requested for-loop like
[1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).
When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just gives you a copy of itself.
Thus, in the unlikely event that you are failing to do something like this...
> x = myRange(5) > list(x) [0, 1, 2, 3, 4] > list(x) 
... then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call
myRange(...) again. Those who absolutely need to clone a generator (for example, who are doing terrifyingly hackish metaprogramming) can use
itertools.tee if absolutely necessary, since the copyable iterator Python
 standards proposal has been deferred.
There's one extra thing to mention: a function that yields doesn't actually have to terminate. I've written code like this:
def fib(): last, cur = 0, 1 while True: yield cur last, cur = cur, last + cur
Then I can use it in other code like this:
for f in fib(): if some_condition: break coolfuncs(f);
It really helps simplify some problems, and makes some things easier to work with.
yield is just like return. It returns whatever you tell it to. The only difference is that the next time you call the function, execution starts from the last call to the yield statement.
In the case of your code, the function get_child_candidates is acting like an iterator so that when you extend your list, it adds one element at a time to the new list.
list.extend calls an iterator until it's exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.
An example in plain language. I will provide a correspondence between high-level human concepts to low-level python concepts.
I want to operate on a sequence of numbers, but I don't want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:
defining the generator function, i.e. the function containing a
.next()on the generator object.
StopIterationexception The generator function does not need to raise the exception, it's raised automatically when the function ends or issues a
This is what a generator does (a function that contains a
yield); it starts executing, pauses whenever it does a
yield, and when asked for a
.next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of python, which describes how to sequentially request for values.
The most famous user of the iterator protocol is the
for command in python. So, whenever you do a:
for item in sequence:
it doesn't matter if
sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.
defining a function which contains a
yield keyword is not the only way to create a generator; it's just the easiest way to create one.
It's returning a generator. I'm not particularly familiar with Python, but I believe it's the same kind of thing as C#'s iterator blocks  if you're familiar with those.
There's an IBM article  which explains it reasonably well (for Python) as far as I can see.
The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values - as if the generator method was paused. Now obviously you can't really "pause" a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself. http://csharpindepth.com/Articles/Chapter11/StreamingAndIterators.aspx
Yield gives you a generator.
def get_odd_numbers(i): return range(1, i, 2) def yield_odd_numbers(i): for x in range(1, i, 2): yield x foo = get_odd_numbers(10) bar = yield_odd_numbers(10) foo [1, 3, 5, 7, 9] bar <generator object yield_odd_numbers at 0x1029c6f50> bar.next() 1 bar.next() 3 bar.next() 5
As you can see, in the first case foo holds the entire list in memory at once. It's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called. In the second case, bar just gives you a generator. A generator is an iterable--which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object "remembers" where it was in the looping the last time you called it--this way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion all at once and store the 50 billion numbers to count through. Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. :)
This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
For those who prefer a minimal working example, meditate on this interactive Python  session:
>>> def f(): ... yield 1 ... yield 2 ... yield 3 ... >>> g = f() >>> for i in g: ... print i ... 1 2 3 >>> for i in g: ... print i ... >>> # Note that this time nothing was printed
There is one type of answer that I don't feel has been given yet, among the many great answers that describe how to use generators. Here is the PL theory answer:
yield statement in python returns a generator. A generator in python is a function that returns continuations (and specifically a type of coroutines, but continuations represent the more general mechanism to understand what is going on).
Continuations in programming languages theory are a much more fundamental kind of computation than their utility in practice would give credit for. This is perhaps because they are extremely hard to reason about in the general case and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished.
To use a continuation basically entails that you start out at one point in your program, where the state of all variables looks like one thing and define the continuation. Then you make some changes, call some functions. Upon applying the continuation, your original state is restored.
yield is called, what it does is tell the function to return a continuation. When the function is called again, the function starts off in the place that it left off. So the function returns some value, and rebinds the function name to have the value of a continuation that starts off at some arbitrary point in the middle of the function again.
In other words,
yield is sort of short for:
def generatorfun(): return (some_value,lambda:<g>)
where g represents the rest of the generator (e.g. additional instances of yield) and function application actually is syntactic sugar for rebinding the rest of the function to the resulting generator.
as such the function returns both the current value, and the remainder of the computation to be performed. This design pattern is an example of continuation passing style, where the remaining computation is passed as a return argument. I use this notation instead of call/cc and mutation -- where the same behavior is achieved with control flow magic and global variables because it is significantly easier to understand and corresponds more closely to existing language constructs in python.
As a Python generator:
from itertools import islice def fib_gen(): a, b = 1, 1 while True: yield a a, b = b, a + b assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))
Using lexical closures instead of generators
def ftake(fnext, last): return [fnext() for _ in xrange(last)] def fib_gen2(): #funky scope due to python2.x workaround #for python 3.x use nonlocal def _(): _.a, _.b = _.b, _.a + _.b return _.a _.a, _.b = 0, 1 return _ assert [1,1,2,3,5] == ftake(fib_gen2(), 5)
Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent )
class fib_gen3: def __init__(self): self.a, self.b = 1, 1 def __call__(self): r = self.a self.a, self.b = self.b, self.a + self.b return r assert [1,1,2,3,5] == ftake(fib_gen3(), 5)
I was going to post "read page 19 of Beazley's 'Python: Essential Reference' for a quick description of generators", but so many others have posted good descriptions already.
Also, note that
yield can be used in coroutines as the dual of their use in generator functions. Although it isn't the same use as your code snippet,
(yield) can be used as an expression in a function. When a caller sends a value to the method using the
send() method, then the coroutine will execute until the next
(yield) statement is encountered.
Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the
yield statement in functions.
From a programming viewpoint, the iterators are implemented as thunks
To implement thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to "messages".
"next" is a message sent to a closure, created by "iter" call.
Here is a mental image of what
I like to think of a thread as having a stack (even if it's not implemented that way).
When a normal function is called, it puts its local variables on the stack, does some computation, returns and clears the stack. The values of its local variables are never seen again.
yield function, when it's called first, it similarly adds its local variables to the stack, but then takes its local variables to a special hideaway instead of clearing them, when it returns via
yield. A possible place to put them would be somewhere in the heap.
Note that it's not the function any more, it's a kind of an imprint or ghost of the function that the
for loop is hanging onto.
When it is called again, it retrieves its local variables from its special hideaway and puts them back on the stack and computes, then hides them again in the same way.