Comments on Neopythonic: Detecting Cycles in a Directed Graph

Some literatur: K.A.HawickandH.A.James, Enumerat...

2013-05-26T03:40:18.966-07:00

Some literatur:

K.A.HawickandH.A.James,
EnumeratingCircuitsandLoopsinGraphswithSelf-ArcsandMultiple-Arcs

Here's Peter's version: https://www.dropbo...

2013-03-01T09:30:01.452-08:00

Here's Peter's version: https://www.dropbox.com/s/s0hc6jr47hmoitf/find_cycle_norvig.py (He posted it here but the indentation was all messed up.)

Here's my version using OrderedDict -- O(1) me...

2013-02-28T21:55:43.976-08:00

Here's my version using OrderedDict -- O(1) membership, and no ned to re-generate links.

def find_cycle(nodes, links):
"""Given a set of nodes, and a function links(node) => [child, ...], return
a path of nodes that forms a cycle, or None if there is no cycle. Example:
>>> find_cycle(range(10), lambda x: [x+1, (x+2)%10])
[9, 1, 3, 5, 7, 9]
"""
# Adapted from a blog post by Guido:
# http://neopythonic.blogspot.com/2009/01/detecting-cycles-in-directed-graph.html
# Maintain path as a a depth-first path in the implicit tree;
# path is represented as an OrderedDict of {node: [child,...]} pairs.
while nodes:
root = nodes.pop()
path = OrderedDict({root: links(root)})
while path:
# Consider the children at the fringe of the tree
children = path[next(reversed(path))]
while children:
child = children.pop()
if child in path: ## Cycle found
pathlist = list(path)
return pathlist[pathlist.index(child):] + [child]
if child in nodes: ## New child not seen before
path[child] = links(child)
nodes.remove(child)
break
else: ## no more children; pop back up a level
path.popitem()
return None

I just got an email fro Peter Norvig pointing out ...

2013-02-28T21:29:41.597-08:00

I just got an email fro Peter Norvig pointing out that the efficiency of the algorithm can be fixed by using an OrderedDict. Nice!

I don't think Guido's definition is that i...

2012-04-12T11:55:31.028-07:00

I don't think Guido's definition is that it will call READY() on all nodes that aren't in a cycle before returning a cycle that exists, or None. From a quick look, whether some get passed to READY() before the first cycle is spotted depends. Break you cycle and then see if READY() is being called.

What is the definition of a node that is READY? P...

2012-04-12T11:00:39.755-07:00

What is the definition of a node that is READY? Playing with some cycles, it isn't quite obvious to me.

# aren't d and e not part of a cycle?
graph = { 'a': ['b', 'c'],
'b': ['c'],
'c': ['a'], # cycle
'd': ['e',],
'e': [],
}

def ready(node):
print "node %s is not in a cycle" % node

def nodes():
""" Return all of the nodes """
return graph.keys()

def edges(node):
""" Return the children of node """
return graph[node]

cycles = find_cycle(nodes, edges, ready)

print "cycles: %s" % cycles

$ ./cycle.py
cycles: ['a', 'b', 'c']

sorry.what we want is not present here.we want the...

2010-08-27T21:55:55.013-07:00

sorry.what we want is not present here.we want the efficient algorithm to find out loop in a directed graph.

It is interesting discussion. I am also looking fo...

2009-03-17T09:50:00.000-07:00

It is interesting discussion. I am also looking for an algorithm to find cycles + loop from directed graph possibly with linear complexity. This discussion gives me some idea. I will post which method I choose later
Abrehet

> Why the uppercased NODES, EDGES, and READY?I ...

2009-03-06T11:48:00.000-08:00

> Why the uppercased NODES, EDGES, and READY?

I was trying to emphasize the algorithm. These three parametrize the algorithm, and I felt the need to call them out. Any realistic implementation would not use functions but expand these in-line according to the needs of the application. So you can think of them as template parameters or macros. :-)

Why the uppercased NODES, EDGES, and READY? Don't...

2009-02-25T08:12:00.000-08:00

Why the uppercased NODES, EDGES, and READY? Don't want to be a hobgoblin, and I'm not, I prefer `my' to `self', but PEP 8 exists. :-)

I'm going to be off topic for which I apologize, b...

2009-02-12T14:28:00.000-08:00

I'm going to be off topic for which I apologize, but the bit of code here reminded me of something I was curious about earlier.

Is the 2 space indentation part of an internal Google coding guideline? I first noticed it in the app engine SDK. It seems all Python code coming out of Google shares this trait.

Just curious.. and sorry to get off topic. :D

Thank you for starting the history of Python blog. I find it fascinating and look forward to reading the next post.

your code is similar to something i used just the ...

2009-02-09T13:59:00.000-08:00

your code is similar to something i used just the other day - it's basically your routine (i assume - i've not read it in detail, but assume it's dfs checking for duplicates) turned into a generator for edges: http://www.ics.uci.edu/~eppstein/PADS/DFS.py

in your case i think you'd just filter for the nontree nodes.

@lorg: (a) It's not so much the limited scope of m...

2009-01-30T21:15:00.000-08:00

@lorg: (a) It's not so much the limited scope of my problem but the fact that the algorithm is really only a small part of much more complex code I am writing anyway. (It's less than 20 lines!)

(b) Loops are runtime cycles, not compile-time cycles. My edges are dependencies between variable and function definitions, not between values. Eventually I will have to deal with recursive function definitions, but then I'll somehow have to break those cycles since I don't want to deal with cyclical definitions.

@Guido:1. Even when your problem is more limited i...

2009-01-30T13:50:00.000-08:00

@Guido:
1. Even when your problem is more limited in scope, I find using existing libraries is best. (Especially since finding strongly connected parts is O(V+E) (DFS twice), and any algorithm to prove that there aren't any cycles could not be better in this regard.)

2. Your programs don't have loops? Care to elaborate, or is this a future blog post? :)

I had this problem way back in schooldays, I resol...

2009-01-30T10:46:00.000-08:00

I had this problem way back in schooldays, I resolved it (in Fortran...) like this :

Another way to see the problem is through node activation, which results in matrix/vector multiplication.

"Activate" each node.

Iterate :
On each iteration, each node will activate its descendents and desactivate itself, and keep being activated only if one of its predecessors do activate it (see it as a flow of information that nodes push into their children).

After n iterations, if any node is still activated, there must be a cycle (or else all of the initial activations on the nodes would have extinguished).

To achieve it, the mathematical operation is matrix multiplication.

See the graph as a matrix of 0's where i does not point to j, and 1 where it does.

Multiply your matrix by a vector of 1's, get the result and iterate : multiply n times (n is the number of nodes) your matrx by the last result. If the sum of your last vector is greater than 1, you have a graph.

I didn't count the brute version of this algorythm might not be linear at all, rather in the O(n*n). Optimizations can be done (iterate only while you still have 1's, etc.).

@rjprins: True, "if node in stack" is a search loo...

2009-01-30T10:35:00.000-08:00

@rjprins: True, "if node in stack" is a search loop (but could easily be replaced by a hash table lookup as I said above). However "if node in todo" is a set membership test, which is a hash table lookup, not an implied loop.

To everyone recommending existing graphing libraries: I'm actually not interested in the cycles, I'm interesting in proving their absence, and then doing a bottom-up algorithm on the DAG. The graphs I am thinking of are not all that large; I'm expecting thousands of nodes but not millions (if you must know, it represents a program, not a social network :-).

This is not a linear algorithm because statements ...

2009-01-30T09:44:00.000-08:00

This is not a linear algorithm because statements like "if node in stack" and "if node in all" are actually nested loops.

Here is another way to find cycles:

- count the number of incoming edges for each node. O(n+e)
- put each node with 0 incoming edges in a set S. O(n)
- while S has nodes:
- take a node n from S.
- delete n and update the number of incoming edges of the nodes connected to n. if any of those nodes now have 0 incoming edges, put them in S.
- endwhile O(n+e)
- if S is empty put there are still nodes left, you have cycle.

I would recommend checking out the networkx graph ...

2009-01-30T08:37:00.000-08:00

I would recommend checking out the networkx graph library.

Apart from being quite mature, it can also find strongly connected parts, which seems to be what you're interested in.

@Paul: I agree with you, the top variable is not u...

2009-01-30T04:47:00.000-08:00

@Paul: I agree with you, the top variable is not used, make only sense if it's:
"for node in EDGES(top):"

http://grapy.googlecode.com has a strongly connect...

2009-01-30T00:06:00.000-08:00

http://grapy.googlecode.com has a strongly connected components function you can use for this as well.

Ah, true.Tarjan's algorithm can find all the cyc...

2009-01-29T17:08:00.000-08:00

Ah, true.

Tarjan's algorithm can find *all* the cycles in a directed graph (or rather, all the strongly connected components, which includes things more complicated than cycles), with the same worst case complexity as detecting a single cycle, (which, now that I read your post more carefully, is what you are doing here). It's interesting that it's possible to do this fast.

By the way, "for node in EDGES(node):" should probably read "for node in EDGES(top):".

You could also make the function in to an iterable...

2009-01-29T16:40:00.000-08:00

You could also make the function in to an iterable so that it can find all cycles. Someone who wants just one could just call it and get the first element.

@Paul: I believe my algorithm can be linear time t...

2009-01-29T14:58:00.000-08:00

@Paul: I believe my algorithm can be linear time too (in something like the number of nodes+edges) with the iter-stack optimization and e.g. a set parallel to the stack to be able to test "node in stack" in constant time (what @Johannes pointed out).

That's assuming stack pushes and pops and hash table lookups are linear time, which IMO they are or can be made to be with enough trickery (e.g. pre-allocating a fixed stack).

@mlk: good point. Or 'todo' (I like short names :-).

If you need to do this for large graphs, Tarjan's ...

2009-01-29T14:19:00.000-08:00

If you need to do this for large graphs, Tarjan's algorithm can do it in linear time.

Here is a Python implementation.

That's very nice. I think it would make it even mo...

2009-01-29T13:46:00.000-08:00

That's very nice. I think it would make it even more easy to read and understand if `all` was renamed to `unchecked`.