Skip to content

g105b/graph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Implementation of Dijkstra's algorithm

Finds the shortest path between two nodes in a graph, featuring asymmetric, weighted connections.

See the Wikipedia article on Dijkstra's algorithm, and see the example provided on the Wikipedia article in example/wikipedia.php.

This repository represents a Graph object, which can contain multiple Node objects. Nodes can be given multiple Connection objects, which connect two Nodes -- in one direction -- with a weight. A bidirectional connection requires two separate connections, and a bidirectional connection may have different weights in either direction.

Given a starting Node and an ending Node, the algorithm provides an array of Node objects that represent the shortest path between the Connections in the Graph.

The underlying data structure in this implementation is a Priority Queue.

Example usages

  • In a city, many roads are connected with junctions. Some roads are faster than others. With a list of all junctions as Nodes, and all roads as Connections, from point A to point B (any two junctions) in the city, what is the path that takes the least amount of travel time, given that longer roads may have faster speeds? This is sometimes known as "The Travelling Salesperson problem".
  • In computer networks, multiple devices are connected with physical wire or wireless radio links. Each link has its own bandwidth. When transferring data between device A and device B, calculations can be made to optimise speed of transfer versus cost of bandwidth.
  • In a national electrical grid simulation, there will be hundreds of parallel routes, all with different lengths and number of connected homes/businesses. The flow of electricity will always flow through the path of least resistance, and calculating this path for the simulation will allow for better efficiencies in the grid.
  • A large social network of a billion users may need to provide context to connections outside your personal network - suggesting that you might know person A because you are connected to person B, who is connected to person C, who in turn, is connected to person A.

Complexity & efficiency

In this implementation, all nodes of the graph are imported into the Priority Queue. This is not required for the shortest path to be calculated, but is done this way for completeness and code readability. From testing, the underlying optimisations of the SplPriorityQueue provide their own optimisations, and no noticeable efficiency difference can be perceived when only adding the nodes as they are detected.

Below is the result of a stress test, available at example/complexity.php, sorted by seconds taken to execute. The test was performed on a 2017 model ThinkPad X1 Carbon laptop, generation 6 (Intel Core vPro i7-8650U). Each row of the table represents a separate Graph, showing the total nodes in the first column and number of connections in the second column. Complexity is calculated by multiplying Nodes by Connections.

The Graph::findShortestPath() function takes an optional callback function. If provided, it is called for each connection of each discovered node. This is counted and labelled "Iterations". Seconds is calculated for the findShortestPath call, not including any setup time.

The Complexity:Seconds relationship is exponential, and begins rising past 1 second after 10,000 Nodes with 5,000 connections.

Nodes Connections Complexity Iterations Seconds
10 0 0 0 0.000
10 1 10 1 0.000
10 5 50 2 0.000
10 10 100 6 0.000
100 1 100 0 0.000
100 10 1,000 3 0.000
100 50 5,000 32 0.000
100 100 10,000 28 0.000
1,000 10 10,000 6 0.000
1,000 100 100,000 95 0.008
10,000 100 1,000,000 20 0.018
1,000 500 500,000 438 0.035
1,000 1,000 1,000,000 906 0.062
10,000 10,000 100,000,000 928 0.641
10,000 1,000 10,000,000 996 0.700
10,000 5,000 50,000,000 4,910 3.373
100,000 1,000 100,000,000 830 5.630
100,000 10,000 1,000,000,000 6,388 45.002
100,000 50,000 5,000,000,000 25,515 228.543
100,000 100,000 10,000,000,000 70,283 992.409

Future optimisations

The repository is currently written as a pure implementation of the algorithm, but many future optimisations are possible. TThis can be achieved by extending the base classes provided in this repository.

Optimisations are often incredibly specific to particular use cases. A typical optimisation with this algorithm is to introduce caching, and an amount of estimation (a close-enough shortest path). This can be achieved by choosing how many steps to cache for each node (X), then storing a list of all connections within X steps. This information can be stored within the Graph, greatly reducing the time complexity required to calculate lengthy paths in enormous Graphs, with the trade-off of being an estimation, rather than a perfect calculation.

Another optimisation is to split the loop into multiple index-based chunks, so each chunk can be run in parallel on separate threads, or even separate computers.