Finding the shortest route between every Oxford college




Some runs can be physically hard: the route might be very long, there might be lots of elevation gain, the path might be muddy. But fairly few runs are computationally hard in that you have to solve a difficult optimisation problem in order to run them. The travelling salesman problem (or “TSP”) is a graph-theoretical problem that leads to computationally hard running routes. Formally, it is defined like so:

TSP (formally): Let $G = (V, E, w)$ be a weighted graph where:

  • $V$ is a finite set of vertices
  • $E \subseteq V \times V$ is a finite set of edges, a symmetric relation on vertices
  • $w : E \to \mathbb R _ {\ge 0}$ is a symmetric function assigning each edge $(u, v) \in E$ a positive weight

A tour in $G$ is a sequence of vertices $(v _ 0, \ldots, v _ k)$ such that $v _ 0 = v _ k$ and each vertex occurs in $V$ exactly once. Define the weight of a tour as

\[\sum^{k} _ {i = 1} w(v _ {i-1},v _ i)\]

Call a tour optimal if it is the lowest weight among all possible tours. Given such a $G$, can you find an optimal tour?

But informally, it asks the question:

TSP (informally): Given a list of landmarks and distances between them, can you find the shortest-distance route that visits every landmark?

So called the “travelling salesman” problem because it lets you save the most amount of fuel as you drive from city to city selling your product, the TSP is very famous in computer science as a prototypical example of a hard optimisation problem: it’s very likely that it’s impossible to write a computer program that does much better than checking every possible route.

The travelling scholar problem

Many online examples for the travelling salesman problem use abstract examples, like finding the shortest TSP tour of a graph like this:

Here’s a more concrete example. This is a plot of every college and permanent private hall in Oxford:

The travelling salesman problem asks: what is the shortest running route that visits every college? Here is one optimal solution:

  1. Christ Church
  2. Pembroke College
  3. Oriel College
  4. Corpus Christi College
  5. Merton College
  6. The Queen’s College
  7. St Edmund Hall
  8. University College
  9. St Hilda’s College
  10. Magdalen College
  11. St Catherine’s College
  12. Linacre College
  13. Mansfield College
  14. Harris Manchester College
  15. New College
  16. Hertford College
  17. All Souls College
  18. Brasenose
  19. Lincoln College
  20. Jesus College
  21. Exeter College
  22. Balliol College
  23. Trinity College
  24. Wadham College
  25. Reuben College
  26. Keble College
  27. Lady Margaret Hall
  28. Wolfson College
  29. St Hugh’s College
  30. Kellogg College
  31. Wycliffe Hall
  32. St Antony’s College
  33. St. Anne’s College
  34. Green Templeton College
  35. Somerville College
  36. Regen’s Park Colelge
  37. St Cross College
  38. Blackfriars
  39. St John’s College
  40. Worcester College
  41. Nuffield College
  42. St. Peter’s College
  43. Campion Hall
  44. Christ Church

And here’s what actually running it looks like:

(GPX file of the route).

Of course, this is not unique: it doesn’t matter if you start and end at any other college or run around in reverse since it will still be the same distance. This solution is optimal according to the software I used, assuming that it knows the accurate walking distances between each college. It’s possible that there’s shortcuts that it doesn’t know about (e.g. in the above it’s actually possible to go diagonally across the park next to “New Marston Meadows”) and this could change the solution.

Cooking your own optimal tours at home

It would be very easy to come up with a route between every college by hand, but you wouldn’t know that it was optimal. Here’s how you can use a computer to find a guaranteed optimal solution.

  • Ingredients: I used Wolfram Language (formely Mathematica), but if you don’t have access to this, it would also be possible to find a tour using free and open source tools like OR-Tools (Python, C# or Java) as well as manually calculating all the $N^2$ pairwise distances for the distance matrix.
  • Make a spreadsheet of the coordinates for every landmark you want to visit. I did this using Google Sheets and Google Maps. Google Maps has a very handy feature where if you right click anywhere, you can immediately copy the coordinates of that point to the clipboard.

  • Convert this into a list that can be used in Wolfram Language. I did this by writing an Google Sheets formula which would generate the strings you see on the right in the image above. There’s probably much better ways of doing this but I’m not very familiar with Wolfram. I used the following formula for this: =CONCATENATE("{", char(34), A1, char(34), ", ", "GeoPosition[{", B1, "}]", "},")

  • Use the following Wolfram Language program to actually calculate the optimal tour:

namedCoords = {...the list from above...};

distanceFunc[i_, j_] := TravelDistance[{namedCoords[[i, 2]], namedCoords[[j, 2]]}, TravelMethod -> "Walking"];

tour = FindShortestTour[Range[Length[namedCoords]], DistanceFunction -> distanceFunc, PerformanceGoal -> "Quality"];

namedTour = namedCoords[[#]] & /@ tour[[2]]; namedTour
  • There are two important things in the above code:
    • Firstly, the TravelMethod needs to be set to "Walking" so that it will use walking distances rather than driving distances.
    • Secondly, setting PerformanceGoal to "Quality" ensures that the solution is optimal rather than an approximation. If you have a lot landmarks, you might need to set the goal to "Speed" instead for it to find a route in any reasonable amount of time. After all, it is a hard problem!
  • Look at the list and then plot the route. The above program prints out a list of landmark names. I then used plotaroute.com to create a GPX file which routed between each of the colleges in order.
  • Go and run/walk/drive it! If you’re lacking motivation while you’re out actually completing it, remember out of all possible running routes you could’ve made, this one will be the shortest.

Why is the TSP difficult?

One way of measuring the difficulty of problems is the idea of a complexity class. These are ways of categorising problems by measuring the amount of time or space that’s needed to solve them according to how complicated their input is.

Two very important classes are $\mathbf P$, the set of all problems solvable in “polynomial time”, and $\mathbf{NP}$, the set of all problems solvable in “nondeterministic polynomial time”. Skimming over a lot of detail, if a problem is in $\mathbf P$ it means that there’s a computer program that solves it in an amount of time that scales polynomially with the size of the input. Polynomial growth roughly means that the time it takes is given by a formula like $\text{time taken} = \text{size of input}^n$ for some $n$ that stays the same.

If a problem is in $\mathbf{NP}$, it means that it has an algorithm where the amount of time it takes scales exponentially with the size of the input (sort of). For example, the time taken might be given by a formula like $\text{time taken} = n^{\text{size of input}}$ where $n$ is once again a number that stays the same (and is greater than $1$). This is much faster than polynomial!

To be slightly more specific, $\mathbf P$ and $\mathbf{NP}$ also contain all problems where the time it takes to solve them grows slower than polynomially or exponentially. As a consequence of this definition, every problem that is in $\mathbf P$ is also in $\mathbf{NP}$ since exponential growth always beats polynomial growth in the long run. Written mathematically, this is $\mathbf P \subseteq \mathbf{NP}$.

These classes are concerned with “decision problems”: given some input, an algorithm should output true or false depending on whether some condition is met. This includes problems like determining whether a number is prime, checking whether a list is sorted, or seeing if graph admits a Hamiltonian cycle.

The TSP isn’t strictly a decision problem, since a solution to the TSP is instead a path instead of a simple yes or no answer. But it’s possible to convert the TSP into a decision problem by constructing a related question: “given this weighted graph $G = (V, E, w)$ and a positive number $k$, is there a travelling salesman tour with weight less than $k$?”. This question has a yes or no answer and so is a decision problem.

It turns out that you can give an exponential time algorithm for this problem and so it belongs in $\mathbf{NP}$. But this isn’t quite enough to say that the TSP has no algorithm that is quicker than exponential, since to do this, you need to be able to say that the TSP is also not in $\mathbf{P}$.

If the TSP were not in $\mathbf P$, it would imply that $\mathbf P \ne \mathbf{NP}$, which is perhaps the biggest and most famous unsolved problem in computer science. As far as we know, it could be that $\mathbf P = \mathbf{NP}$, and that there is some polynomial time algorithm for TSP, but we just don’t know what that algorithm is. But all evidence so far points towards this not being the case.

Appendix: Some specifics

The tour above uses the walking distance calculated by the TravelDistance function between each college, rather than the straight-line distance. This is to avoid routes that superficially seem to minimise the distance, but actually involve impossible cuts through buildings and grass you’re not supposed to walk on. Interestingly, this actually makes the problem harder – if you’re using straight line distance, then it’s actually an instance of the “metric travelling salesman problem” which is a little easier to solve. For example, the metric travelling salesman problem admits quick approximation algorithms. These are polynomial time algorithms which give you a solution to the TSP, not necessarily optimal, but guaranteed to be within some factor of the optimal. The Christofides algorithm is an approximation algorithm for the TSP which is guaranteed to be at most $1.5$ times the length of the optimal solution.

I used the following coordinates for the colleges. I tried to pick the front of the colleges since the location you get from just typing the name into Google Maps would sometimes be at a random spot inside the college, which might mess up the walking distance calculation made by Wolfram.

College Longitude, Latitude
Christ Church 51.75020082543448, -1.2566674102132913
All Souls College 51.75343005838777, -1.2535421840374497
Jesus College 51.75355829550146, -1.2563561206347236
Harris Manchester College 51.75570974565153, -1.2517368067224688
Hertford College 51.75413914302557, -1.2537729448834982
Mansfield College 51.7576346663018, -1.252968999604003
Linacre College 51.75922245440031, -1.2503182622363669
St Catherine’s College 51.7569560469285, -1.2448284803107663
St Hilda’s College 51.74886513448575, -1.2450390380822247
St Edmund Hall 51.75306261199469, -1.2503734001348958
The Queen’s College 51.75279045166381, -1.2510212502819025
University College 51.75263560778315, -1.2519799786046888
New College 51.754856449240314, -1.2506139859520802
Merton College 51.75132657767746, -1.2521736550639375
Campion Hall 51.74985372046716, -1.258275242760393
Exeter College 51.75365371291273, -1.2562940981724775
Corpus Christi College 51.75110042582413, -1.2536934627049032
St. Peter’s College 51.752834313714196, -1.2604859882110688
Nuffield College 51.753037984505816, -1.26353223356112
Oriel College 51.75137850581435, -1.2539442321908754
Pembroke College 51.75019381391679, -1.2578721890536433
Worcester College 51.75492721828783, -1.2632964159183049
Regen’s Park Colelge 51.756683990593, -1.2609539167942967
Reuben College 51.758043352078516, -1.255855936159926
Somerville College 51.75975987446296, -1.2613391652668022
Green Templeton College 51.76135742874665, -1.262502359524542
St. Anne’s College 51.762115950301705, -1.2625991961845513
Wycliffe Hall 51.76286088344048, -1.2602895767179565
Kellogg College 51.76402677743072, -1.2604664652219917
St Antony’s College 51.76345024923746, -1.263612658298954
St Hugh’s College 51.76743490692448, -1.2624612514859312
Wolfson College 51.771054769081786, -1.2556974680921367
Lady Margaret Hall 51.76464338450338, -1.2543814764066628
Blackfriars 51.756251586014706, -1.2596273012134334
St Cross College 51.75667802506971, -1.2600647545638362
St John’s College 51.75605832863825, -1.2590340001478713
Balliol College 51.75442804120147, -1.2572171251803663
Wadham College 51.75574172460674, -1.254760154606231
Trinity College 51.75449619271991, -1.256710320873993
Keble College 51.759217321427734, -1.257168600436832
Magdalen College 51.75204251112559, -1.247512344710711
Lincoln College 51.75313221405275, -1.2560645073210122
Brasenose 51.75331532290772, -1.2543009131353227



Related posts