Chatbots and wanderer - What is graph traversal?

2021-3-13 by Chris

Chatbot conversations can be easily depicted as a graph. Graph databases work fundamentally differently than classic relational approaches. Not only the storage of the data is different. Also the way you can read the data is special.

One way to read data from a graph database is the Graph Traversal Algorithm in its various facets. Most graph databases master this in more or less adaptable versions.

Imagine a network of nodes connected by directional lines. Directed means that the connections always have a certain direction. So each of them will have a starting node and an end node. Within the nodes data can be stored. Data can also be stored directly inside the connections.

In the example below, the green nodes are questions with their suggestions. The blue ones are chat messages. The challenge now is to get information from this network like:

  • Which question is the first one?
  • Which questions comes second?
  • How many different answere nodes must be displayed?
  • When to send which message?

There is a big yellow starting point within this network. This is the beginning of our conversation. From there we can start a so-called traversal process, which gradually follows the outgoing connections. On the way, the process collects information from our nodes and edges and returns it as a result.


Take a look at this live example.

Hiking on winding paths

One could also imagine this process as a kind of wanderer following connections in the form of streets. The questions and answers on his way are crosses with signposts. These give the hiker information about where he comes from and where he can still walk to. Each of the street is a one-way road. The hiker can only go through it in one direction. This means that the questions and all other nodes in the network have incoming and outgoing connections. Only when the hiker comes to a dead end, he must return the way he came from until he discovers a path that was not walked before.

This traversal of the network-like structure is also referred as Graph Traversal, Path Traversal or Graph Search. Basically, you can distinguish between depth-first search and breadth-first search. In the first variant, the hiker first follows the paths increasingly deeper into the thicket. In the second method, he first follows all outgoing roads of an intersection and only then he will go further into the depths. uses the depth first approach most of the time.

Collapsing networks

OK. But how to get the information in the correct order? Which is the first question? What is the second? Its easy: By traversing the structure, the network becomes a simple sequence of nodes. Because we will not visit the already visited nodes again in one traversal.

The graph above collapses into this chain during the first traversal:

  • Message 1
  • Message 2
  • Question 1

Depending on your decisions the chain may looks different through the next traversal cycle. For example if we choose suggstion 2 from the first question the chain ends up like this:

  • Message 1
  • Message 2
  • Question 1
  • Message 4
  • Question 2

Hiking maps and signposts

The exciting thing about it: Everything happens in just one query. So I just have to ask the database once and will get back a result chain containing all found nodes, their order and their data. Among other things, the result can be controlled by giving rules to the wanderer before he starts walking. For example:

  • Follow only answered suggestions
  • Go first in the depth and then in the width
  • Take bigger edges first
  • Etc...

In this way, we can use simple rules to define which information from the network is relevant to us. It does not matter if we have to cross 10, 100 or 1000 different items. At this point, an incredible strength of the graph database becomes visible: We can not only map the data in a realistic way, but also read it out in the same uncomplicated way, thanks to the connections, by literally traversing it.

I have shown you a simplified example here because there are not just messages, questions and answer options in our network. There is a complex context that needs to be crossed.

Title image from Pixabay