The function makes few assumptions about the representation of the graph; instead of a graph object, it takes in two function arguments that are called to describe the graph:
- def NODES(): an iterable returning all nodes
- def EDGES(node): an iterable returning all nodes reached via node's outgoing edges
- def READY(node): called when we know node is not part of any cycles
def find_cycle(NODES, EDGES, READY):Discussion: The EDGES() function may be called multiple times for the same node, and the for loop does some duplicate work in that case. A straightforward fix for this inefficiency is to maintain a parallel stack of iterators that is pushed and popped at the same times at the main stack, and at all times contains iter(node). I'll leave that version as an excercise.
todo = set(NODES())
while todo:
node = todo.pop()
stack = [node]
while stack:
top = stack[-1]
for node in EDGES(top):
if node in stack:
return stack[stack.index(node):]
if node in todo:
stack.append(node)
todo.remove(node)
break
else:
node = stack.pop()
READY(node)
return None
Update: Fixed a typo in the algorithm (EDGES(top)) and renamed all to todo.
I think another potential inefficiency are the linear searches for what's on the stack. A fix might be to maintain a map nodes_by_index, which map from the nodes on the stack to their indices... Still, I like your version, it's very readable.
ReplyDeleteThat's very nice. I think it would make it even more easy to read and understand if `all` was renamed to `unchecked`.
ReplyDeleteIf you need to do this for large graphs, Tarjan's algorithm can do it in linear time.
ReplyDeleteHere is a Python implementation.
@Paul: I believe my algorithm can be linear time too (in something like the number of nodes+edges) with the iter-stack optimization and e.g. a set parallel to the stack to be able to test "node in stack" in constant time (what @Johannes pointed out).
ReplyDeleteThat's assuming stack pushes and pops and hash table lookups are linear time, which IMO they are or can be made to be with enough trickery (e.g. pre-allocating a fixed stack).
@mlk: good point. Or 'todo' (I like short names :-).
You could also make the function in to an iterable so that it can find all cycles. Someone who wants just one could just call it and get the first element.
ReplyDeleteAh, true.
ReplyDeleteTarjan's algorithm can find *all* the cycles in a directed graph (or rather, all the strongly connected components, which includes things more complicated than cycles), with the same worst case complexity as detecting a single cycle, (which, now that I read your post more carefully, is what you are doing here). It's interesting that it's possible to do this fast.
By the way, "for node in EDGES(node):" should probably read "for node in EDGES(top):".
http://grapy.googlecode.com has a strongly connected components function you can use for this as well.
ReplyDelete@Paul: I agree with you, the top variable is not used, make only sense if it's:
ReplyDelete"for node in EDGES(top):"
I would recommend checking out the networkx graph library.
ReplyDeleteApart from being quite mature, it can also find strongly connected parts, which seems to be what you're interested in.
This is not a linear algorithm because statements like "if node in stack" and "if node in all" are actually nested loops.
ReplyDeleteHere is another way to find cycles:
- count the number of incoming edges for each node. O(n+e)
- put each node with 0 incoming edges in a set S. O(n)
- while S has nodes:
- take a node n from S.
- delete n and update the number of incoming edges of the nodes connected to n. if any of those nodes now have 0 incoming edges, put them in S.
- endwhile O(n+e)
- if S is empty put there are still nodes left, you have cycle.
@rjprins: True, "if node in stack" is a search loop (but could easily be replaced by a hash table lookup as I said above). However "if node in todo" is a set membership test, which is a hash table lookup, not an implied loop.
ReplyDeleteTo everyone recommending existing graphing libraries: I'm actually not interested in the cycles, I'm interesting in proving their absence, and then doing a bottom-up algorithm on the DAG. The graphs I am thinking of are not all that large; I'm expecting thousands of nodes but not millions (if you must know, it represents a program, not a social network :-).
I had this problem way back in schooldays, I resolved it (in Fortran...) like this :
ReplyDeleteAnother way to see the problem is through node activation, which results in matrix/vector multiplication.
"Activate" each node.
Iterate :
On each iteration, each node will activate its descendents and desactivate itself, and keep being activated only if one of its predecessors do activate it (see it as a flow of information that nodes push into their children).
After n iterations, if any node is still activated, there must be a cycle (or else all of the initial activations on the nodes would have extinguished).
To achieve it, the mathematical operation is matrix multiplication.
See the graph as a matrix of 0's where i does not point to j, and 1 where it does.
Multiply your matrix by a vector of 1's, get the result and iterate : multiply n times (n is the number of nodes) your matrx by the last result. If the sum of your last vector is greater than 1, you have a graph.
I didn't count the brute version of this algorythm might not be linear at all, rather in the O(n*n). Optimizations can be done (iterate only while you still have 1's, etc.).
@Guido:
ReplyDelete1. Even when your problem is more limited in scope, I find using existing libraries is best. (Especially since finding strongly connected parts is O(V+E) (DFS twice), and any algorithm to prove that there aren't any cycles could not be better in this regard.)
2. Your programs don't have loops? Care to elaborate, or is this a future blog post? :)
@lorg: (a) It's not so much the limited scope of my problem but the fact that the algorithm is really only a small part of much more complex code I am writing anyway. (It's less than 20 lines!)
ReplyDelete(b) Loops are runtime cycles, not compile-time cycles. My edges are dependencies between variable and function definitions, not between values. Eventually I will have to deal with recursive function definitions, but then I'll somehow have to break those cycles since I don't want to deal with cyclical definitions.
your code is similar to something i used just the other day - it's basically your routine (i assume - i've not read it in detail, but assume it's dfs checking for duplicates) turned into a generator for edges: http://www.ics.uci.edu/~eppstein/PADS/DFS.py
ReplyDeletein your case i think you'd just filter for the nontree nodes.
I'm going to be off topic for which I apologize, but the bit of code here reminded me of something I was curious about earlier.
ReplyDeleteIs the 2 space indentation part of an internal Google coding guideline? I first noticed it in the app engine SDK. It seems all Python code coming out of Google shares this trait.
Just curious.. and sorry to get off topic. :D
Thank you for starting the history of Python blog. I find it fascinating and look forward to reading the next post.
Why the uppercased NODES, EDGES, and READY? Don't want to be a hobgoblin, and I'm not, I prefer `my' to `self', but PEP 8 exists. :-)
ReplyDelete> Why the uppercased NODES, EDGES, and READY?
ReplyDeleteI was trying to emphasize the algorithm. These three parametrize the algorithm, and I felt the need to call them out. Any realistic implementation would not use functions but expand these in-line according to the needs of the application. So you can think of them as template parameters or macros. :-)
It is interesting discussion. I am also looking for an algorithm to find cycles + loop from directed graph possibly with linear complexity. This discussion gives me some idea. I will post which method I choose later
ReplyDeleteAbrehet
sorry.what we want is not present here.we want the efficient algorithm to find out loop in a directed graph.
ReplyDeleteWhat is the definition of a node that is READY? Playing with some cycles, it isn't quite obvious to me.
ReplyDelete# aren't d and e not part of a cycle?
graph = { 'a': ['b', 'c'],
'b': ['c'],
'c': ['a'], # cycle
'd': ['e',],
'e': [],
}
def ready(node):
print "node %s is not in a cycle" % node
def nodes():
""" Return all of the nodes """
return graph.keys()
def edges(node):
""" Return the children of node """
return graph[node]
cycles = find_cycle(nodes, edges, ready)
print "cycles: %s" % cycles
$ ./cycle.py
cycles: ['a', 'b', 'c']
I don't think Guido's definition is that it will call READY() on all nodes that aren't in a cycle before returning a cycle that exists, or None. From a quick look, whether some get passed to READY() before the first cycle is spotted depends. Break you cycle and then see if READY() is being called.
ReplyDeleteI just got an email fro Peter Norvig pointing out that the efficiency of the algorithm can be fixed by using an OrderedDict. Nice!
ReplyDeleteHere's my version using OrderedDict -- O(1) membership, and no ned to re-generate links.
ReplyDeletedef find_cycle(nodes, links):
"""Given a set of nodes, and a function links(node) => [child, ...], return
a path of nodes that forms a cycle, or None if there is no cycle. Example:
>>> find_cycle(range(10), lambda x: [x+1, (x+2)%10])
[9, 1, 3, 5, 7, 9]
"""
# Adapted from a blog post by Guido:
# http://neopythonic.blogspot.com/2009/01/detecting-cycles-in-directed-graph.html
# Maintain path as a a depth-first path in the implicit tree;
# path is represented as an OrderedDict of {node: [child,...]} pairs.
while nodes:
root = nodes.pop()
path = OrderedDict({root: links(root)})
while path:
# Consider the children at the fringe of the tree
children = path[next(reversed(path))]
while children:
child = children.pop()
if child in path: ## Cycle found
pathlist = list(path)
return pathlist[pathlist.index(child):] + [child]
if child in nodes: ## New child not seen before
path[child] = links(child)
nodes.remove(child)
break
else: ## no more children; pop back up a level
path.popitem()
return None
Here's Peter's version: https://www.dropbox.com/s/s0hc6jr47hmoitf/find_cycle_norvig.py (He posted it here but the indentation was all messed up.)
ReplyDeleteSome literatur:
ReplyDeleteK.A.HawickandH.A.James,
EnumeratingCircuitsandLoopsinGraphswithSelf-ArcsandMultiple-Arcs