Here we go, my first blog entry, ever. Let’s start with two short quizzes.
1. The well-known Dijkstra’s algorithm is:
[a] A greedy algorithm
[b] A dynamic programming algorithm
[c] A primal-dual algorithm
[d] It was discovered by Dantzig
2. Which is the best C++ implementation of Dijkstra’s algorithm among the following?
[a] The Boost Graph Library (BGL)
[b] The COIN-OR Lemon Graph Library
[c] The Google OR-Tools
[d] Hey dude! We can do better!!!
What is your answer for the first question? … well, the answers are all correct!
And for the second question? To know the correct answer, sorry, you have to read this post to the end…
If you are curious to learn more about the classifications of Dijkstra’s algorithm given in the first three answers,
please consider reading [1] and [2]. Honestly, I did not know that the algorithm was independently discovered by Dantzig [3] as a special
case of Linear Programming.
However, Dantzig is credited with the first version of the bidirectional
Dijkstra’s algorithm (should we call it Dantzig’s algorithm?), which is nowadays the best performing
algorithm on general graphs.
The bidirectional Dijkstra’s algorithm is used as a benchmark to measure the speedup of modern specialized
shortest path algorithms for road networks [4,5], the algorithms that are implemented, for instance, in our
GPS navigation systems, in your smartphones (I don’t have one, argh!), in Google Maps Directions, and in Microsoft Bing Maps.
Why a first blog entry on Dijkstra’s algorithm? That’s simple.
 Have you ever implemented an efficient version of this well-known and widely studied algorithm?
 Have you ever used the version that is implemented in well-reputed graph libraries, such as the
Boost Graph Library (BGL),
the COIN-OR Lemon, and/or
Google OR-Tools?
I did while programming in C++, and I want to share with you my experience.
The Algorithm
The algorithm is quite simple. First, partition the nodes of the input graph G=(N,A) into three
sets: the sets of (1) scanned, (2) reachable, and (3) unvisited nodes.
Every node i has a distance label d(i) and a predecessor vertex p(i). Initially, set the
label of the source node s to d(s)=0, while setting d(i)=+∞ for all other nodes. Moreover,
the node s is placed in the set of reachable nodes, while all the other nodes are unvisited.
The algorithm proceeds as follows: select a reachable node i with minimum distance label,
and move it into the set of scanned nodes; it will never be selected again.
For each arc (i,j) in the forward star of node i, check whether node j has distance label d(j) > d(i) + c(i,j);
if that is the case, update the label d(j) = d(i) + c(i,j) and the predecessor vertex p(j) = i.
In addition, if node j was unvisited, move it into the
set of reachable nodes. If the selected node i is the destination node
t, stop the algorithm.
Otherwise, continue by selecting the next reachable node i with minimum distance label.
The algorithm stops either when it scans the destination node t or when the set of reachable nodes is empty.
For the nice properties of the algorithm, consult any textbook on computer science or operations research.
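In C++, the procedure above can be sketched as follows (a minimal sketch with illustrative names, not the implementation benchmarked below; predecessor vertices are omitted for brevity):

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// graph[i] holds the forward star of node i as pairs (j, c(i,j)).
using Arc = std::pair<int, int>;
using Graph = std::vector<std::vector<Arc>>;

std::vector<int> dijkstra(const Graph& graph, int s, int t) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(graph.size(), INF);   // distance labels d(i)
    // Min-heap of (d(i), i): the top is a reachable node with minimum label.
    std::priority_queue<Arc, std::vector<Arc>, std::greater<Arc>> Q;
    dist[s] = 0;
    Q.push({0, s});
    while (!Q.empty()) {
        auto [d, i] = Q.top();
        Q.pop();
        if (d > dist[i]) continue;  // stale entry: node i was already scanned
        if (i == t) break;          // destination scanned: stop early
        for (auto [j, c] : graph[i]) {
            if (dist[i] + c < dist[j]) {   // d(j) > d(i) + c(i,j): update label
                dist[j] = dist[i] + c;
                Q.push({dist[j], j});
            }
        }
    }
    return dist;
}
```

With a mutable heap, the stale-entry check would be replaced by a decrease-key (update) operation; the sketch simply re-pushes the node instead.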
At this point it should be clear why Dijkstra’s algorithm is greedy: it always selects a reachable node with
minimum distance label. It is a dynamic programming algorithm because it maintains the
recursive relation d(j) = min { d(i) + c(i,j) : (i,j) ∈ A }
for every scanned node j.
If you are familiar with Linear Programming, you should recognize that the distance labels
play the role of dual variables of a flow-based formulation of the shortest path problem,
and Dijkstra’s algorithm constructs a primal solution (i.e., a path) that satisfies the dual
constraints d(j) - d(i) ≤ c(i,j) for every arc (i,j) ∈ A.
Graphs and Heaps
The algorithm uses two data structures: the input graph G and the set of reachable nodes Q.
The graph G can be stored with an adjacency list, but be sure that the arcs are stored in contiguous memory,
in order to reduce the chance of cache misses when scanning the forward stars. In my implementation, I have used
a std::vector to store the forward star of each node.
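A common contiguous layout is the forward-star (compressed) representation; here is a sketch of the idea (the struct and function names are illustrative, not the ones in my code):

```cpp
#include <array>
#include <cassert>
#include <vector>

// All arcs live in flat arrays: start[i]..start[i+1] delimits the forward
// star of node i, so scanning it touches consecutive memory locations.
struct ForwardStarGraph {
    std::vector<int> start;  // size n+1: offsets into head/cost
    std::vector<int> head;   // head[k] = head node of arc k
    std::vector<int> cost;   // cost[k] = cost of arc k
};

// Build from an arc list (tail, head, cost) with a counting-sort pass.
ForwardStarGraph build(int n, const std::vector<std::array<int, 3>>& arcs) {
    ForwardStarGraph g;
    g.start.assign(n + 1, 0);
    for (const auto& a : arcs) ++g.start[a[0] + 1];           // out-degrees
    for (int i = 0; i < n; ++i) g.start[i + 1] += g.start[i]; // prefix sums
    g.head.resize(arcs.size());
    g.cost.resize(arcs.size());
    std::vector<int> next(g.start.begin(), g.start.end() - 1);
    for (const auto& a : arcs) {
        int k = next[a[0]]++;  // next free slot in the forward star of a[0]
        g.head[k] = a[1];
        g.cost[k] = a[2];
    }
    return g;
}
```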
The second data structure, the most important, is the priority queue Q.
The queue has to support three operations: push, update, and extract-min.
The type of priority queue used determines the worst-case complexity of Dijkstra’s algorithm.
Theoretically, the best strongly polynomial worst-case complexity is achieved via a Fibonacci heap.
On road networks, the Multi Bucket heap yields a weakly polynomial worst-case complexity that
is more efficient in practice [4,5]. Unfortunately, the Fibonacci heap is a rather complex data structure,
and lazy implementations end up using a simpler Binomial heap.
The good news is that, starting from version 1.49, the Boost Library has a Heap library.
This library contains several types of heaps that share a common interface:
d-ary heap, binomial heap, fibonacci heap, pairing heap, and skew heap.
The worst-case complexities of the basic operations are summarized in a
nice table.
Contrary to textbooks, these heaps are max-heaps instead of min-heaps: the top of the heap
is always the element with the highest priority. For implementing Dijkstra,
where all arc lengths are non-negative, this is not a problem: we can store the elements with
the distance changed in sign (sorry for the rough explanation, but if you are really interested it is better to read the source code directly).
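To make the rough explanation concrete, here is the sign trick illustrated with std::priority_queue, which is also a max-heap by default (the helper names are illustrative):

```cpp
#include <cassert>
#include <queue>
#include <utility>

// std::priority_queue follows the same convention as boost::heap: top() is
// the element with the highest priority. Pushing (-d(i), i) therefore makes
// top() the reachable node with MINIMUM distance label.
using Entry = std::pair<long, int>;  // (-distance, node)
using MaxHeap = std::priority_queue<Entry>;

void push_node(MaxHeap& Q, long dist, int node) {
    Q.push({-dist, node});           // negate the key on the way in
}

std::pair<long, int> pop_min(MaxHeap& Q) {
    Entry e = Q.top();               // max of the negated keys = true minimum
    Q.pop();
    return {-e.first, e.second};     // negate back on the way out
}
```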
The big advantage of boost::heap is that it allows you to program Dijkstra once, and to compile it
with different heaps via templates. If you wonder why the Boost Graph Library does not use boost::heap,
well, the reason is that BGL was implemented a few years ago, while boost::heap appeared only this year.
Benchmarking on Road Networks
Here is the point that maybe interests you the most: can we do better than well-reputed C++ graph libraries?
I have tried three graph libraries:
Boost Graph Library (BGL) v1.51,
COIN-OR Lemon v1.2.3, and
Google OR-Tools, checked out from svn on Sep 7th, 2012.
They all provide a Dijkstra implementation, even if I don’t know their implementation details.
As a plus, all three libraries have Python wrappers (but I have not tested them).
The BGL is a header-only library. Lemon came after BGL.
BGL, Lemon, and my implementation use (different) Fibonacci heaps, while it is not clear to me what type of priority queue OR-Tools uses.
Disclaimer: Google OR-Tools is much more than a graph library: among other things, it has a Constraint Programming solver with very nice
features for Large Neighborhood Search; however, here we are interested only in its Dijkstra implementation.
Constraint Programming will be the subject of a future post.
A few tests on instances taken from the last DIMACS challenge on Shortest Path problems
show the pros and cons of each implementation. Three instances are generated using the rand graph generator, while 10 instances are
road networks. The tests were run on my late 2008 MacBook Pro using the Apple gcc 4.2 compiler.
All the source code, the scripts, and even the text of this post are available on GitHub.
RAND Graphs
The first test compares the four implementations on 3 graphs with different density d, that is, the ratio d = m/n².
The graphs are:
 Rand 1: with n=10000, m=100000, d=0.001
 Rand 2: with n=10000, m=1000000, d=0.01
 Rand 3: with n=10000, m=10000000, d=0.1
For each graph, 50 queries between different pairs of source and destination nodes are performed.
The table below reports the average query time (total time divided by the number of queries).
The entries in bold highlight the shortest time per row.
| Graph  | MyGraph    | BGL    | Lemon  | OR-Tools |
|--------|------------|--------|--------|----------|
| Rand 1 | **0.0052** | 0.0059 | 0.0074 | 1.2722   |
| Rand 2 | **0.0134** | 0.0535 | 0.0706 | 1.6128   |
| Rand 3 | **0.0705** | 0.5276 | 0.7247 | 4.2535   |
In these tests, it looks like my implementation is the winner… wow!
However, the true winner is the boost::heap library, since the nasty implementation details
are delegated to that library.
… but come on! These are artificial graphs: who is really interested in shortest paths on random graphs?
Road Networks
The second test uses road networks, which are very sparse graphs.
We report only the average computation time in seconds over 50 different pairs of source-destination nodes.
We decided to leave out OR-Tools, since it is not competitive on very sparse graphs.
The table below shows the average query time for the standard implementations that use Fibonacci heaps.
| Area                  | nodes     | arcs       | MyGraph    | BGL        | Lemon  |
|-----------------------|-----------|------------|------------|------------|--------|
| Western USA           | 6,262,104 | 15,248,146 | **2.7215** | 2.7804     | 3.8181 |
| Eastern USA           | 3,598,623 | 8,778,114  | 1.9425     | **1.4255** | 2.7147 |
| Great Lakes           | 2,758,119 | 6,885,658  | **0.1808** | 0.8946     | 0.2602 |
| California and Nevada | 1,890,815 | 4,657,742  | **0.5078** | 0.5808     | 0.7083 |
| Northeast USA         | 1,524,453 | 3,897,636  | 0.6061     | **0.5662** | 0.8335 |
| Northwest USA         | 1,207,945 | 2,840,208  | 0.3652     | **0.3506** | 0.5152 |
| Florida               | 1,070,376 | 2,712,798  | **0.1141** | 0.2753     | 0.1574 |
| Colorado              | 435,666   | 1,057,066  | 0.1423     | **0.1117** | 0.1965 |
| San Francisco Bay     | 321,270   | 800,172    | 0.1721     | **0.0836** | 0.2399 |
| New York City         | 264,346   | 733,846    | **0.0121** | 0.0677     | 0.0176 |
From this table, BGL and my implementation are equally good, while Lemon comes third.
What would happen if we used a different type of heap?
This second table shows the average query time of the Lemon graph library with its specialized
Binary Heap implementation,
and of my own implementation with generic 2-Heap and 3-Heap (binary and ternary heaps) and with a Skew Heap.
Note that in order to use a different heap I just modify a single line of code.
| Area                  | nodes     | arcs       | 2-Heap | 3-Heap | Skew Heap | Lemon 2-Heap |
|-----------------------|-----------|------------|--------|--------|-----------|--------------|
| Western USA           | 6,262,104 | 15,248,146 | 1.977  | 1.934  | 2.104     | **1.359**    |
| Eastern USA           | 3,598,623 | 8,778,114  | 1.406  | 1.372  | 1.492     | **0.938**    |
| Great Lakes           | 2,758,119 | 6,885,658  | 0.132  | 0.130  | 0.135     | **0.109**    |
| California and Nevada | 1,890,815 | 4,657,742  | 0.361  | 0.353  | 0.372     | **0.241**    |
| Northeast USA         | 1,524,453 | 3,897,636  | 0.433  | 0.421  | 0.457     | **0.287**    |
| Northwest USA         | 1,207,945 | 2,840,208  | 0.257  | 0.252  | 0.256     | **0.166**    |
| Florida               | 1,070,376 | 2,712,798  | 0.083  | 0.081  | 0.080     | **0.059**    |
| Colorado              | 435,666   | 1,057,066  | 0.100  | 0.098  | 0.100     | **0.064**    |
| San Francisco Bay     | 321,270   | 800,172    | 0.121  | 0.117  | 0.122     | **0.075**    |
| New York City         | 264,346   | 733,846    | 0.009  | 0.009  | 0.009     | **0.007**    |
Mmmm… I am no longer the winner: COIN-OR Lemon is!
This is likely due to the specialized binary heap implementation of the Lemon library.
The boost::heap library, instead, has a d-ary heap, which for d=2 is a generic binary heap.
So what?
Dijkstra’s algorithm is so beautiful because it has the elegance of simplicity.
Using an existing efficient heap data structure, it is easy to implement an “efficient” version of the algorithm.
However, even if you have spare time, or you need to solve shortest path problems on a specific type of graph (e.g., road networks),
you might give the existing graph libraries a try before investing development time in your own implementation.
In addition, be sure to read [4] and the references contained therein.
All the code I have used to write this post is available on GitHub.
If you have any comment or criticism, do not hesitate to comment below.
References
[1] Pohl, I.
Bidirectional and heuristic search in path problems.
Department of Computer Science, Stanford University, 1969.
[pdf]

[2] Sniedovich, M.
Dijkstra’s algorithm revisited: the dynamic programming connexion.
Control and Cybernetics, vol. 35(3), pages 599–620, 2006.
[pdf]

[3] Dantzig, G.B.
Linear Programming and Extensions.
Princeton University Press, Princeton, NJ, 1962.

[4] Delling, D., Sanders, P., Schultes, D., and Wagner, D.
Engineering route planning algorithms.
Algorithmics of Large and Complex Networks,
Lecture Notes in Computer Science, Volume 5515, pages 117–139, 2009.
[doi]

[5] Goldberg, A.V. and Harrelson, C.
Computing the shortest path: A* search meets graph theory.
Proc. of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 156–165, 2005.
[pdf]