In the past few days I’ve been thinking about implementing depth-first-search algorithm that doesn’t rely on the stack (data structure) and the stack (activation record). Graph traversal relying on the most minimum data structures. I decided to go ahead and try. To refresh my memory, after a few years of not writing graph algorithms, I decided on https://en.wikipedia.org/wiki/Depth-first_search
Why I avoided recursion:
Graph/Tree problems naturally lend themselves to recursion: A tree is a tree and it’s children can be thought of as trees (smaller trees). A graph is a graph and it’s nodes are also special cases of graphs (smaller graphs). This concept lends to the tendency of thinking of each individual node (sub tree or sub graph) as a unique tree/graph which is true. Therefore examining each node might require putting each node in the stack, calling each traversal function n times to traverse n nodes. Depending on the size of the graph, in the worst case, this would overload the stack with too much data. Additionally for each function added to the call stack the cpu will do more work to remember all the nodes added to the stack (inserting and popping data on the activation records). This isn’t ideal.
An alternative to loading each node into the stack is to use one activation record and keep replacing it’s contents with the current node being examined during traversal and this can be done by using a loop.
Loop based implementation:
An approach to a loop based implementation is to first be aware of all the potential states a node can be in.
A node can either be the root node (where the graph traversal begins at), a node with children (1 or more nodes it’s pointing to), a node that has no children (points to no nodes). When a node is loaded into the stack it will be examined for the above properties. If the value being searched for isn’t found in the current node the search (traversal) can continue by examining the children nodes (if any) or the parent node (previous node) if any, if no other possible path is found then the search must be complete.
Avoiding searching already searched nodes:
It’s important to keep track of already visited nodes otherwise the search may never end. One way to avoid visiting already visited nodes is to write each node and its children onto a stack data structure and pop each node once visited. In lieu of a stack I added a “prev” field to each node indicating each node’s parent or null/None if not applicable. In Depth First Search the ability to go back to the previous node allows the search to continue if a leaf node is encountered. In Breadth First Search the previous node may need to be the next sibling node as opposed to the parent since the BFS occurs one depth at a time.
After writing about avoiding searching previously searched nodes, during an implementation of BFS I forgot about keeping track of visited nodes. If you notice that you’re experiencing an infinite loop during testing you’re probably not marking visited nodes.
One way to notice infinite loops is to print the values being added to the queue and the current queue size in each iteration. This will visibly show an infinite loop is happening especially if testing your algorithm on a small graph. If the printed output shows the size of the queue as strangely large or as never reaching 0 (zero) then something is wrong in the implementation.
Marking nodes as visited is the only way to prevent cycles (if any cycles exist).
Thinking about BFS without the use of any additional data structures will require some more additional thought. During the winter break I’ll make time to examine the issue properly :}
[TODO: Implement BFS without using additional data structures]
Here is the working Python code I have so far: