TSP (formally): Let $G = (V, E, w)$ be a weighted graph where:
A tour in $G$ is a sequence of vertices $(v _ 0, \ldots, v _ k)$ such that $v _ 0 = v _ k$ and every vertex in $V$ occurs exactly once (counting $v _ 0$ and $v _ k$ as a single visit). Define the weight of a tour as
\[\sum^{k}_{i = 1} w(v_{i-1},v_i)\]Call a tour optimal if it has the lowest weight among all possible tours. Given such a $G$, can you find an optimal tour?
But informally, it asks the question:
TSP (informally): Given a list of landmarks and distances between them, can you find the shortest-distance route that visits every landmark?
The TSP is so called because solving it lets you save the most fuel as you drive from city to city selling your product. It is famous in computer science as a prototypical example of a hard optimisation problem: it’s very likely that no computer program can do much better than checking every possible route.
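To make the definition concrete, here is a brute-force Python sketch (my own illustration, not a practical solver): it checks all $(|V|-1)!$ tours, so it is only feasible for a handful of vertices.

```python
from itertools import permutations

def optimal_tour(vertices, weight):
    """Brute-force search over all tours, fixing the start vertex
    (rotations and reversals don't change a tour's weight)."""
    start, *rest = vertices
    best_tour, best_weight = None, float("inf")
    for perm in permutations(rest):
        tour = (start, *perm, start)
        w = sum(weight[a, b] for a, b in zip(tour, tour[1:]))
        if w < best_weight:
            best_tour, best_weight = tour, w
    return best_tour, best_weight

# A toy complete graph on four vertices with symmetric weights.
w = {}
edges = {("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 3,
         ("B", "C"): 2, ("B", "D"): 5, ("C", "D"): 1}
for (a, b), d in edges.items():
    w[a, b] = w[b, a] = d

tour, total = optimal_tour(["A", "B", "C", "D"], w)
```

On this toy graph the optimal tour is A–B–C–D–A with weight $1 + 2 + 1 + 3 = 7$.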
Many online examples of the travelling salesman problem use abstract, made-up graphs. Here’s a more concrete example. This is a plot of every college and permanent private hall in Oxford:
The travelling salesman problem asks: what is the shortest running route that visits every college? Here is one optimal solution:
And here’s what actually running it looks like:
Of course, it doesn’t matter which college you start and end at, or whether you run the loop in reverse: it’s still the same distance. This solution is optimal according to the software I used, assuming it knows the accurate walking distances between each college. It’s possible that there are shortcuts it doesn’t know about (e.g. in the above it’s actually possible to go diagonally across the park next to “New Marston Meadows”), and this could change the solution.
It would be very easy to come up with a route between every college by hand, but you wouldn’t know that it was optimal. Here’s how you can use a computer to find a guaranteed optimal solution.
Convert this into a list that can be used in Wolfram Language. I did this by writing a Google Sheets formula which would generate the strings you see on the right in the image above. There are probably much better ways of doing this, but I’m not very familiar with Wolfram. I used the following formula for this: =CONCATENATE("{", char(34), A1, char(34), ", ", "GeoPosition[{", B1, "}]", "},")
Use the following Wolfram Language program to actually calculate the optimal tour:
namedCoords = {...the list from above...};
distanceFunc[i_, j_] := TravelDistance[{namedCoords[[i, 2]], namedCoords[[j, 2]]}, TravelMethod -> "Walking"];
tour = FindShortestTour[Range[Length[namedCoords]], DistanceFunction -> distanceFunc, PerformanceGoal -> "Quality"];
namedTour = namedCoords[[#]] & /@ tour[[2]]; namedTour
TravelMethod needs to be set to "Walking" so that it uses walking distances rather than driving distances. Setting PerformanceGoal to "Quality" ensures that the solution is optimal rather than an approximation. If you have a lot of landmarks, you might need to set the goal to "Speed" instead for it to find a route in any reasonable amount of time. After all, it is a hard problem!

One way of measuring the difficulty of problems is the idea of a complexity class. These are ways of categorising problems by measuring the amount of time or space needed to solve them, according to how complicated their input is.
Two very important classes are $\mathbf P$, the set of all problems solvable in “polynomial time”, and $\mathbf{NP}$, the set of all problems solvable in “nondeterministic polynomial time”. Skimming over a lot of detail, if a problem is in $\mathbf P$ it means that there’s a computer program that solves it in an amount of time that scales polynomially with the size of the input. Polynomial growth roughly means that the time it takes is given by a formula like $\text{time taken} = \text{size of input}^n$ for some $n$ that stays the same.
If a problem is in $\mathbf{NP}$, it means that it has an algorithm where the amount of time it takes scales exponentially with the size of the input (sort of). For example, the time taken might be given by a formula like $\text{time taken} = n^{\text{size of input}}$ where $n$ is once again a number that stays the same (and is greater than $1$). This grows much faster than any polynomial, so these algorithms are much slower!
To be slightly more specific, $\mathbf P$ and $\mathbf{NP}$ also contain all problems where the time it takes to solve them grows slower than polynomially or exponentially. As a consequence of this definition, every problem that is in $\mathbf P$ is also in $\mathbf{NP}$ since exponential growth always beats polynomial growth in the long run. Written mathematically, this is $\mathbf P \subseteq \mathbf{NP}$.
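The claim that exponential growth beats polynomial growth in the long run is easy to check numerically; a quick illustrative snippet in Python:

```python
# Compare a polynomial running time (n^3) against an exponential one
# (2^n): for small inputs the polynomial can even be larger, but the
# exponential eventually dwarfs it and never looks back.
for n in [1, 5, 10, 20, 30]:
    print(f"n={n:>2}  n^3={n**3:>6}  2^n={2**n:>10}")
```

At $n = 5$ the polynomial is still ahead ($125 > 32$), but by $n = 20$ the exponential is over a hundred times larger.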
These classes are concerned with “decision problems”: given some input, an algorithm should output true or false depending on whether some condition is met. This includes problems like determining whether a number is prime, checking whether a list is sorted, or seeing if a graph admits a Hamiltonian cycle.
The TSP isn’t strictly a decision problem, since a solution to the TSP is instead a path instead of a simple yes or no answer. But it’s possible to convert the TSP into a decision problem by constructing a related question: “given this weighted graph $G = (V, E, w)$ and a positive number $k$, is there a travelling salesman tour with weight less than $k$?”. This question has a yes or no answer and so is a decision problem.
It turns out that you can give an exponential time algorithm for this problem, and so it belongs in $\mathbf{NP}$. But this isn’t quite enough to conclude that the TSP has no algorithm quicker than exponential: to do that, you would also need to show that the TSP is not in $\mathbf{P}$.
If the TSP were not in $\mathbf P$, it would imply that $\mathbf P \ne \mathbf{NP}$, which is perhaps the biggest and most famous unsolved problem in computer science. As far as we know, it could be that $\mathbf P = \mathbf{NP}$, and that there is some polynomial time algorithm for TSP, but we just don’t know what that algorithm is. But all evidence so far points towards this not being the case.
The tour above uses the walking distance calculated by the TravelDistance
function between each college, rather than the straight-line distance. This is to avoid routes that superficially seem to minimise the distance, but actually involve impossible cuts through buildings and grass you’re not supposed to walk on. Interestingly, this actually makes the problem harder: if you use straight-line distance, then it’s an instance of the “metric travelling salesman problem”, which is a little easier to solve. For example, the metric travelling salesman problem admits quick approximation algorithms. These are polynomial time algorithms which give you a solution to the TSP, not necessarily optimal, but guaranteed to be within some factor of the optimal. The Christofides algorithm is an approximation algorithm for the metric TSP that produces a tour guaranteed to be at most $1.5$ times the length of an optimal one.
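Christofides itself needs some machinery (a minimum-weight perfect matching), but its simpler cousin, the MST-doubling heuristic, already gives a factor-$2$ guarantee on metric instances and fits in a short sketch. This Python code (the function names are my own) builds a minimum spanning tree with Prim’s algorithm and then shortcuts a preorder walk of the tree; the triangle inequality guarantees the result is at most twice the optimal tour length.

```python
import math

def mst_double_tour(points):
    """2-approximation for the metric TSP on Euclidean points:
    build a minimum spanning tree (Prim's algorithm), then visit
    the vertices in a depth-first preorder of the tree."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm, rooted at vertex 0.
    in_tree = [False] * n
    parent = [None] * n
    cost = [math.inf] * n
    cost[0] = 0.0
    children = {i: [] for i in range(n)}
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: cost[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < cost[v]:
                cost[v], parent[v] = dist(u, v), u
    # A preorder walk of the tree gives the visiting order
    # (shortcutting past already-visited vertices).
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    tour = order + [0]
    length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
    return tour, length

# Example: the four corners of the unit square.
corners = [(0, 0), (0, 1), (1, 1), (1, 0)]
```

On the unit-square example the heuristic happens to find the optimal tour of length $4$; in general it can be up to twice as long as optimal.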
I used the following coordinates for the colleges. I tried to pick the front of the colleges since the location you get from just typing the name into Google Maps would sometimes be at a random spot inside the college, which might mess up the walking distance calculation made by Wolfram.
College | Latitude, Longitude |
---|---|
Christ Church | 51.75020082543448, -1.2566674102132913 |
All Souls College | 51.75343005838777, -1.2535421840374497 |
Jesus College | 51.75355829550146, -1.2563561206347236 |
Harris Manchester College | 51.75570974565153, -1.2517368067224688 |
Hertford College | 51.75413914302557, -1.2537729448834982 |
Mansfield College | 51.7576346663018, -1.252968999604003 |
Linacre College | 51.75922245440031, -1.2503182622363669 |
St Catherine’s College | 51.7569560469285, -1.2448284803107663 |
St Hilda’s College | 51.74886513448575, -1.2450390380822247 |
St Edmund Hall | 51.75306261199469, -1.2503734001348958 |
The Queen’s College | 51.75279045166381, -1.2510212502819025 |
University College | 51.75263560778315, -1.2519799786046888 |
New College | 51.754856449240314, -1.2506139859520802 |
Merton College | 51.75132657767746, -1.2521736550639375 |
Campion Hall | 51.74985372046716, -1.258275242760393 |
Exeter College | 51.75365371291273, -1.2562940981724775 |
Corpus Christi College | 51.75110042582413, -1.2536934627049032 |
St. Peter’s College | 51.752834313714196, -1.2604859882110688 |
Nuffield College | 51.753037984505816, -1.26353223356112 |
Oriel College | 51.75137850581435, -1.2539442321908754 |
Pembroke College | 51.75019381391679, -1.2578721890536433 |
Worcester College | 51.75492721828783, -1.2632964159183049 |
Regent’s Park College | 51.756683990593, -1.2609539167942967 |
Reuben College | 51.758043352078516, -1.255855936159926 |
Somerville College | 51.75975987446296, -1.2613391652668022 |
Green Templeton College | 51.76135742874665, -1.262502359524542 |
St. Anne’s College | 51.762115950301705, -1.2625991961845513 |
Wycliffe Hall | 51.76286088344048, -1.2602895767179565 |
Kellogg College | 51.76402677743072, -1.2604664652219917 |
St Antony’s College | 51.76345024923746, -1.263612658298954 |
St Hugh’s College | 51.76743490692448, -1.2624612514859312 |
Wolfson College | 51.771054769081786, -1.2556974680921367 |
Lady Margaret Hall | 51.76464338450338, -1.2543814764066628 |
Blackfriars | 51.756251586014706, -1.2596273012134334 |
St Cross College | 51.75667802506971, -1.2600647545638362 |
St John’s College | 51.75605832863825, -1.2590340001478713 |
Balliol College | 51.75442804120147, -1.2572171251803663 |
Wadham College | 51.75574172460674, -1.254760154606231 |
Trinity College | 51.75449619271991, -1.256710320873993 |
Keble College | 51.759217321427734, -1.257168600436832 |
Magdalen College | 51.75204251112559, -1.247512344710711 |
Lincoln College | 51.75313221405275, -1.2560645073210122 |
Brasenose College | 51.75331532290772, -1.2543009131353227 |
Consider a signature with a single binary relation $<$. The theory $\pmb T _ {UDLO}$ is the set of sentences entailed by the axioms:
\[\begin{aligned} F_1 &: \forall x \lnot(x < x) \\\\ F_2 &: \forall x \forall y \forall z(x < y \land y < z \to x < z) \\\\ F_3 &: \forall x \forall y (x < y \lor y < x \lor x = y) \\\\ F_4 &: \forall x \forall y (x < y \to \exists z(x < z \land z < y)) \\\\ F_5 &: \forall x \exists y \exists z (y < x \land x < z) \end{aligned}\]@Prove that $\pmb T _ {UDLO}$ is complete, decidable and has quantifier elimination.
We first show that $\pmb T _ {UDLO}$ has an effective quantifier-elimination procedure.
Consider a formula $\exists x F$ where $F$ is quantifier free. We aim to construct a quantifier-free formula $G$ with the same free variables as $\exists x F$ such that for any structure $\mathcal A$ that satisfies all sentences in $\pmb T _ {UDLO}$ and any valuation $\pmb a$ in $\mathcal A$ of the free variables, we have $\mathcal A \models \exists x F(\pmb a)$ if and only if $\mathcal A \models G(\pmb a)$.
Firstly, convert $F$ into a logically equivalent formula in DNF. Furthermore, we can eliminate negative literals by replacing the subformula $\lnot (x _ i < x _ j)$ with $x _ i = x _ j \lor x _ j < x _ i$ and replacing $\lnot (x _ i = x _ j)$ with $x _ i < x _ j \lor x _ j < x _ i$.
So we may assume $F$ is in DNF and is negation-free. Then using the equivalence
\[\exists x (F_1 \lor F_2) \equiv \exists x F_1 \lor \exists x F_2\]it suffices to show how to eliminate the quantifier $\exists x$ in the case $F$ is a conjunction of atomic formulas. And using the equivalence $\exists x (F _ 1 \land F _ 2) \equiv \exists x F _ 1 \land F _ 2$ where $x$ is not free in $F _ 2$, it suffices to be able to eliminate the quantifier $\exists x$ in the case where $F$ is a conjunction of atomic formulas all of which mention $x$.
Each conjunct of such a formula has one of the forms $x < y$, $y < x$, $x = y$, $x < x$ or $x = x$, for some variable $y$ different from $x$.
If $F$ contains a conjunct $x < x$, then we have $G = \mathbf{false}$.
If $F$ contains a conjunct $x = y$ for some variable $y$, then we have $G = F[y/x]$.
If neither of the above applies then (after deleting conjuncts of the form $x = x$ if present), we can write $F$ in the form:
\[F = \bigwedge^m_{i = 1} l_i < x \land \bigwedge^n_{j = 1} x < u_j\]where $l _ i$ and $u _ j$ are variables different from $x$. If $m = 0$, then there are no lower bounds on $x$ and $G = \mathbf{true}$ since the order relation is unbounded. If $n = 0$, then $G = \mathbf{true}$ also. Otherwise, by the density of the order relation,
\[G = \bigwedge^m_{i = 1} \bigwedge^n_{j = 1} l_i < u_j\]Decidability of $\pmb T _ {UDLO}$ follows from the existence of this procedure: removing the quantifiers from the inside out turns any sentence into a boolean combination of $\mathbf{true}$ and $\mathbf{false}$, which can be evaluated directly.
$\pmb T _ {UDLO}$ is also complete for the same reason, since this shows that for every sentence, either it or its negation is entailed by the axioms.
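The final case of the elimination step can be sketched in Python (an illustration of the proof’s case analysis; the representation of constraints as string triples is my own):

```python
def eliminate_exists(lowers, uppers):
    """One step of quantifier elimination for T_UDLO, following the
    case analysis above: given F = (/\ l_i < x) /\ (/\ x < u_j),
    return a quantifier-free conjunction equivalent to 'exists x. F'.
    Assumes conjuncts of the form x < x and x = y have already been
    handled as in the earlier cases of the proof."""
    # If there are no lower bounds or no upper bounds, unboundedness
    # of the order means a witness x always exists.
    if not lowers or not uppers:
        return []  # the empty conjunction, i.e. "true"
    # Otherwise, density means a witness exists iff every lower
    # bound is strictly below every upper bound.
    return [(l, "<", u) for l in lowers for u in uppers]

# exists x (a < x and b < x and x < c)  becomes  a < c and b < c:
result = eliminate_exists(["a", "b"], ["c"])
```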
The following closure properties are useful in showing certain theories are decidable.
Suppose $f : \Sigma \to \Gamma$ is a map between alphabets, and $f^\ast : \Sigma^\ast \to \Gamma^\ast$ is its extension to strings by
\[f^\ast(\sigma_1 \cdots \sigma_m) = f(\sigma_1) \cdots f(\sigma_m)\]In this context, can @state two closure properties of regular languages.

If $L \subseteq \Sigma^\ast$ is regular, then its image $f^\ast(L) \subseteq \Gamma^\ast$ is regular, and if $M \subseteq \Gamma^\ast$ is regular, then its preimage $(f^\ast)^{-1}(M) \subseteq \Sigma^\ast$ is regular.
@Define Presburger arithmetic.
The theory of the structure $(\mathbb N, 0, 1, +, <)$ under the intended interpretation (@todo, is this an accurate definition? Or should you state the axioms?)
@State a decidability result about Presburger arithmetic.
Presburger arithmetic, i.e. $\text{Th}(\mathbb N, 0, 1, +, <)$, is decidable.
@Prove that Presburger arithmetic is decidable.
It suffices to show that $\text{Th}(\mathbb N, 0, 1, +)$ is decidable, since any formula over $(\mathbb N, 0, 1, +, <)$ can be rewritten as a formula using $+$ and $=$ that defines the same property on $\mathbb N$ (@todo, actually check this).
To show this is decidable, we reduce determining whether a sentence is true to checking whether a certain regular language is non-empty.
Consider a quantifier-free formula $F$ over variables $x _ 1, \ldots, x _ n$ and consider the corresponding alphabet
\[\Sigma_n = \\{ \begin{bmatrix} 0 \\\\ 0 \\\\ \vdots \\\\ 0 \end{bmatrix}, \dots, \begin{bmatrix} 1 \\\\ 1 \\\\ \vdots \\\\ 1 \end{bmatrix} \\}\]whose letters represent one binary digit of a valuation of these variables: a word spells out the values of $x _ 1, \ldots, x _ n$ in binary, least significant digit first (little-endian), where the value of $x _ i$ is read off the $i$-th row of the vectors.
Define $A _ =$ representing the subset of $\Sigma _ 2^\ast$ where the elements satisfy $x _ 1 = x _ 2$ by the DFA:
And define $A _ +$ representing the subset of $\Sigma _ 3^\ast$ where the elements satisfy $x _ 1 + x _ 2 = x _ 3$ by:
We now define $A _ F$ by induction on the structure of $F$. Define
\[\pi_{i_1, \ldots, i_k} : \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \mapsto \begin{bmatrix} x_{i_1} \\ \vdots \\ x_{i_k} \end{bmatrix}\]which projects each letter onto a chosen subset of its rows. Atomic formulas are then handled by composing $A _ =$ or $A _ +$ with the appropriate projection, and the boolean connectives $\land$, $\lor$ and $\lnot$ are handled using the closure of regular languages under intersection, union and complement. This covers all quantifier-free formulas.
Now consider a sentence
\[Q_1 x_1 \cdots Q_n x_n F\]in prenex form. For $k = 0, \ldots, n$ we write $F _ k := Q _ {k+1} x _ {k+1} Q _ n x _ n F^\ast$ an define a corresponding automaton $A _ k$ over alphabet $\Sigma _ k$ such that $A _ k$ accepts the set of values of variables $x _ 1, \ldots, x _ k$ that satisfy $F _ k$. An invariant is that $A _ k$ has non-empty lanugage iff formula $F _ k$ is satisfiable.
Take $A _ n$ to be $A _ F$.
Now suppose that $F _ {k-1} = \exists x _ k F _ k$. By induction, $A _ k$ is an automaton over the alphabet $\Sigma _ k$ corresponding to $F _ k$. Define $A _ {k-1}$ to be an automaton whose language is $\pi(L(A _ k))$, where $\pi : \Sigma _ k \to \Sigma _ {k-1}$ is the map that projects out the $k$-th coordinate of each tuple in $\Sigma _ k$.
Handle the universal quantifier $\forall x _ k$ by treating it as shorthand for $\lnot \exists x _ k \lnot$.
We end up with an automaton $A _ 0$ for the sentence $F _ 0$ over the alphabet $\Sigma _ 0$. This automaton has non-empty language iff $(\mathbb N, +, 0, 1)$ satisfies $F _ 0$.
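As an illustration of the encoding (a Python sketch rather than part of the construction; the function names are mine), the automaton $A _ +$ can be simulated directly: its only state is the carry bit, and it reads bit triples least-significant-digit first.

```python
def accepts_addition(word):
    """Simulate the DFA for A_+ over Sigma_3: a word is a sequence of
    bit triples (x1, x2, x3), little-endian, and the DFA accepts
    exactly when x1 + x2 = x3.  The DFA's state is the carry bit."""
    carry = 0
    for x1, x2, x3 in word:
        s = x1 + x2 + carry
        if s % 2 != x3:      # the output digit must match x3's digit
            return False     # dead state: reject
        carry = s // 2
    return carry == 0        # accept only if no carry is left over

def encode(*nums, width):
    """Little-endian binary encoding of several numbers as bit tuples,
    one tuple per digit position."""
    return [tuple((n >> i) & 1 for n in nums) for i in range(width)]

# 3 + 5 = 8, so the word encode(3, 5, 8, width=4) is accepted.
```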
Suppose $\sigma$ is a signature with a single binary relation $R$. Can you define the “theory of the random graph” $\pmb T _ {RG}$ in this context, and give intuitive definitions of each axiom?
The $\sigma$-theory axiomatised by the sentences
\[\begin{aligned} F_1& : \exists x \exists y \lnot(x=y) \\\\ F_2& : \forall x \lnot R(x, x) \\\\ F_3& : \forall x \forall y(R(x, y) \to R(y, x)) \\\\ \\\\ H_{m,n}&: \forall x_1 \cdots \forall x_m \forall y_1 \cdots \forall y_n \left( \left(\bigwedge^m_{i=1} \bigwedge^n_{j=1} \lnot(x_i = y_j)\right) \to \exists z \bigwedge^m_{i = 1}R(x_i, z) \land \bigwedge^n_{j = 1}\lnot R(y_j, z) \right) \end{aligned}\]Suppose $\sigma$ is a signature with a single binary relation $R$ and $\pmb T _ {RG}$ is the theory of the random graph, described by the axioms
\[\begin{aligned} F_1& : \exists x \exists y \lnot(x=y) \\\\ F_2& : \forall x \lnot R(x, x) \\\\ F_3& : \forall x \forall y(R(x, y) \to R(y, x)) \\\\ \\\\ H_{m,n}&: \forall x_1 \cdots \forall x_m \forall y_1 \cdots \forall y_n \left( \left(\bigwedge^m_{i=1} \bigwedge^n_{j=1} \lnot(x_i = y_j)\right) \to \exists z \bigwedge^m_{i = 1}R(x_i, z) \land \bigwedge^n_{j = 1}\lnot R(y_j, z) \right) \end{aligned}\]@Prove that for all $m, n \in \mathbb N$, we have
\[\lim_{N \to \infty} \mathbb P_N(H_{m,n}) = 1\]where $\mathbb P _ N(\varphi)$ is the probability that a random graph with $N$ nodes satisfies $\varphi$.
Let $N > n+m$ and pick some tuples $\pmb a = (a _ 1, \ldots, a _ m)$, $\pmb b = (b _ 1, \ldots, b _ n)$ drawn from the set $\{1, \ldots, N\}$.
Claim: For a graph $G$ drawn uniformly at random from $\pmb G _ N$,
\[\mathbb P\left( G \not\models \exists z \left( \bigwedge^m_{i = 1}E(a_i, z) \land \bigwedge^n_{j = 1} \lnot E(b_j, z)\right) \right) \le q^{N - m - n}\]where
\[q := 1 - 2^{-n-m} < 1\]Proof: For each possible choice $c$ from
\[\\{1, \ldots, N\\} \setminus \\{a_1, \ldots, a_m, b_1, \ldots, b_n\\}\]the probability that
\[G \not\models \bigwedge^m_{i = 1} E(a_i, c) \land \bigwedge^n_{j = 1} \lnot E(b_j, c)\]is exactly $q$, since each of the $m + n$ relevant edges is present independently with probability $1/2$. Since these are independent events for the $N - m - n$ choices of $c$, the claim follows.
Proof of original statement: Take the union bound ($\mathbb P(\bigcup A _ n) \le \sum \mathbb P(A _ n)$) over the $N^{n+m}$ possible choices of $a _ 1, \ldots, a _ m, b _ 1, \ldots, b _ n \in \{1, \ldots, N\}$. We have that
\[\mathbb P_N(\lnot H_{m,n}) \le N^{n+m} q^{N-n-m}\]Since $q < 1$, we have $N^{n+m} q^{N-n-m} \to 0$ as $N \to \infty$ (exponential decay beats polynomial growth), so $\lim _ {N \to \infty} \mathbb P _ N(H _ {m,n}) = 1$.
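As a sanity check on this limit (not part of the proof, and the function names are my own), you can estimate $\mathbb P _ N(H _ {1,1})$ empirically by sampling graphs with each edge present independently with probability $1/2$:

```python
import random

def satisfies_H11(n_vertices, edge):
    """Check the extension axiom H_{1,1}: for every pair of distinct
    vertices (a, b), some z distinct from both is adjacent to a
    but not adjacent to b."""
    V = range(n_vertices)
    return all(
        any(z not in (a, b) and edge(a, z) and not edge(b, z) for z in V)
        for a in V for b in V if a != b
    )

def estimate_P(N, trials=100, seed=0):
    """Estimate P_N(H_{1,1}) by sampling graphs on N vertices, each
    edge present independently with probability 1/2 (the uniform
    distribution on the set of graphs with N nodes)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = {frozenset((i, j)): rng.random() < 0.5
               for i in range(N) for j in range(i + 1, N)}
        edge = lambda u, v: adj[frozenset((u, v))]
        if satisfies_H11(N, edge):
            hits += 1
    return hits / trials

# estimate_P(N) climbs towards 1 as N increases.
```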
Suppose $\sigma$ is a signature with a single binary relation $R$ and $\pmb T _ {RG}$ is the theory of the random graph, described by the axioms
\[\begin{aligned} F_1& : \exists x \exists y \lnot(x=y) \\\\ F_2& : \forall x \lnot R(x, x) \\\\ F_3& : \forall x \forall y(R(x, y) \to R(y, x)) \\\\ \\\\ H_{m,n}&: \forall x_1 \cdots \forall x_m \forall y_1 \cdots \forall y_n \left( \left(\bigwedge^m_{i=1} \bigwedge^n_{j=1} \lnot(x_i = y_j)\right) \to \exists z \bigwedge^m_{i = 1}R(x_i, z) \land \bigwedge^n_{j = 1}\lnot R(y_j, z) \right) \end{aligned}\]Can you state a result about the probability that a formula is satisfied?
For every $\sigma$-formula $\varphi$ the limit $\lim _ {N \to \infty} \mathbb P _ N(\varphi)$ exists and is either zero or one, and $\pmb T _ {RG} = \{\varphi : \lim _ {N \to \infty} \mathbb P _ N(\varphi) = 1\}$.
Suppose $\sigma$ is a signature with a single binary relation $R$ and $\pmb T _ {RG}$ is the theory of the random graph, described by the axioms
\[\begin{aligned} F_1& : \exists x \exists y \lnot(x=y) \\\\ F_2& : \forall x \lnot R(x, x) \\\\ F_3& : \forall x \forall y(R(x, y) \to R(y, x)) \\\\ \\\\ H_{m,n}&: \forall x_1 \cdots \forall x_m \forall y_1 \cdots \forall y_n \left( \left(\bigwedge^m_{i=1} \bigwedge^n_{j=1} \lnot(x_i = y_j)\right) \to \exists z \bigwedge^m_{i = 1}R(x_i, z) \land \bigwedge^n_{j = 1}\lnot R(y_j, z) \right) \end{aligned}\]@Prove that for every $\sigma$-formula $\varphi$ the limit $\lim _ {N \to \infty} \mathbb P _ N(\varphi)$ exists and is either zero or one, and $\pmb T _ {RG} = \{\varphi : \lim _ {N \to \infty} \mathbb P _ N(\varphi) = 1\}$ (you can assume $\pmb T _ {RG}$ is complete).
Since $\pmb T _ {RG}$ is complete, it suffices to show that
\[\lim_{N \to \infty} \mathbb P_N(\varphi) = 1\]for every $\varphi$ in $\pmb T _ {RG}$.
By the compactness theorem for first-order logic, some finite subset of the axioms entails $\varphi$; and since $H _ {m, n}$ entails $H _ {m', n'}$ whenever $m' \le m$ and $n' \le n$, there exist $m, n \in \mathbb N$ such that $\{F _ 1, F _ 2, F _ 3, H _ {m,n}\}$ entails $\varphi$. Every graph on at least two vertices satisfies $F _ 1, F _ 2, F _ 3$, so $\mathbb P _ N(\varphi) \ge \mathbb P _ N(H _ {m,n})$ for $N \ge 2$, which entails $\lim _ {N \to \infty} \mathbb P _ N(\varphi) = 1$.
@Define what it means for a theory $T$ to be decidable.
There is an algorithm that, given a sentence $F$, determines whether or not $F \in \pmb T$.
@Define what it means for a theory $\pmb T$ to admit quantifier elimination.
For any formula of the form $\exists x F$ where $F$ is quantifier-free, there exists a quantifier-free formula $G$ with the same free variables and $\pmb T \models \exists x F \leftrightarrow G$.
How can you use a quantifier elimination procedure and something else to show a theory $\pmb T$ is decidable?

If the quantifier elimination procedure is effective and the truth of quantifier-free sentences is decidable, then any sentence can be reduced to a quantifier-free one, whose truth can then be checked directly.
@Define what it means for a set to be cofinite.
The complement is a finite set.
Suppose $\sigma$ is a signature. @Define a theory $\pmb T$.
A set of $\sigma$-sentences that is closed under semantic entailment, i.e. if $\pmb T \models F$, then $F \in \pmb T$.
Suppose $\sigma$ is a signature and $\mathcal A$ is a $\sigma$-structure. @Define $\text{Th}(\mathcal A)$.
The set of sentences that are satisfied in $\mathcal A$.
Suppose $\pmb S$ is a set of sentences. How can you construct a theory, and what is $\pmb S$ called in this context?

Take $\pmb T := \{F : \pmb S \models F\}$, the set of all sentences entailed by $\pmb S$. In this context, $\pmb S$ is called the axioms of $\pmb T$.
@Define what it means for a theory $\pmb T$ to be complete.
For any sentence $F$, either $F \in \pmb T$ or $\lnot F \in \pmb T$.
@Define a signature $\sigma$ in first-order logic.
Suppose:
In this context, @define the set of $\sigma$-terms.
Suppose:
In this context, @define the set of formulas in first-order logic.
Suppose:
@Define an atomic formula in this context.
If $t _ 1, \ldots, t _ k$ are terms and $P$ is a $k$-ary relation symbol, then $P(t _ 1, \ldots, t _ k)$ is an atomic formula.
Which is it: is $\forall x F \land G$ parsed as $(\forall x F) \land G$ or as $\forall x (F \land G)$?
In a formula $\exists x G$, what is the scope of $\exists x$?
What does it mean for a variable $x$ in a first-order formula to be bound/free?
Suppose you have a signature $\sigma$ which defines
In this context, @define a $\sigma$-structure $\mathcal A$.
Suppose you have a signature $\sigma$ which defines
and a corresponding $\sigma$-structure $\mathcal A$:
In this context, can you inductively @define for each term $t$ a corresponding function $t^\mathcal A : A^n \to A$?
Suppose:
@Define the satisfaction relation
\[\mathcal A \models F(\pmb a)\].
It is defined inductively like so:
If working in first-order logic with equality, then:
@Define what it means for a first-order formula $F$ over a signature $\sigma$ to be satisfiable.
$\mathcal A \models F(\pmb a)$ for some $\sigma$-structure $\mathcal A$ and valuation $\pmb a$ of free variables in $F$.
@Define what it means for a first-order formula $F$ over a signature $\sigma$ to be valid.
$\mathcal A \models F(\pmb a)$ for every $\sigma$-structure $\mathcal A$ and valuation $\pmb a$ of free variables in $F$.
@Define the notation
\[\pmb S \models F\]where $\pmb S$ is a set of first-order formulas over a signature $\sigma$.
Every $\sigma$-structure $\mathcal A$ that satisfies $\pmb S$ also satisfies $F$. (@todo, does this definition mean that the formulas don’t have any free variables?)
@Define what it means for a formula to be in prenex form, and what the matrix of the formula is.
The formula can be written
\[Q_1 y_1 Q_2 y_2 \cdots Q_n y_n F\]where each $Q _ i$ is a quantifier and $F$ contains no quantifiers; $F$ is called the matrix of the formula.
Suppose:
@Define
\[F[t/x]\]The formula obtained by substituting $t$ for every free occurrence of $x$ in $F$.
@State the translation lemma.
Suppose:
@Define a partial assignment.::
A function
\[v : D \to \\{0, 1\\}\]where $D$ is either the set $\{p _ 1, p _ 2, \ldots\}$ of all propositional variables or a finite initial segment $\{p _ 1, \ldots, p _ n\}$.
Suppose:
@Define what it means for $v’$ to extend $v$.
$\text{dom}(v) \subseteq \text{dom}(v')$ and
\[v[p_i] = v'[p_i]\]for all $p _ i \in \text{dom}(v)$.
@State the compactness theorem.
Suppose:
Then:
@Prove the compactness theorem, i.e. that if:
then:
Forward direction: if $S$ is satisfiable, then any assignment satisfying $S$ also satisfies every finite subset of $S$.
Backward direction: Let $S$ be a set of formulas such that every finite subset of $S$ is satisfiable.
Say a partial assignment $v$ is “good” if it satisfies any formula $F \in S$ that only mentions propositional variables in the domain of $v$.
Note that for each $n \in \mathbb N$, there is a good partial assignment with domain $\{p _ 1, \ldots, p _ n\}$. To see this, consider the subset $S’ \subseteq S$ of formulas that mention only the propositional variables $p _ 1, \ldots, p _ n$.

$S’$ might be infinite, but it can only contain finitely many formulas up to logical equivalence, since there are only finitely many boolean functions of $p _ 1, \ldots, p _ n$.

Picking one representative of each equivalence class gives a finite subset of $S$ that is logically equivalent to $S’$. Since all finite subsets of $S$ are satisfiable, it follows that $S’$ is satisfiable by a partial assignment $v$ with domain $\{p _ 1, \ldots, p _ n\}$, and by construction this assignment is good.
Now we construct a sequence of good partial assignments $v _ 0, v _ 1, \ldots$ with $\text{dom}(v _ n) = \{p _ 1, \ldots, p _ n\}$ where $v _ {n+1}$ extends $v _ n$ for each $n$ and the inductive hypothesis that there are infinitely many good partial assignments that extend $v _ n$.
To do this, take $v _ 0$ to be the assignment with an empty domain. Since there is a good assignment with domain $\{p _ 1, \ldots, p _ n\}$ for every $n$, there are infinitely many good assignments that extend $v _ 0$. So the inductive hypothesis is satisfied.
Now suppose we have constructed assignments $v _ 0, \ldots, v _ n$ all satisfying the inductive hypothesis. Take the two assignments $v, v’$ that extend $v _ n$ with $\text{dom}(v) = \text{dom}(v’) = \{p _ 1, \ldots, p _ {n+1}\}$. Since any extension of $v _ n$ whose domain contains $p _ {n+1}$ is an extension of either $v$ or $v’$, one (or both) of $v$ and $v’$ has infinitely many good extensions. Let $v _ {n+1}$ be whichever of $v$ or $v’$ has this property; then the inductive hypothesis is maintained.
Now define a total assignment $v$ by $v[p _ n] := v _ n[p _ n]$ for each $n \in \mathbb N$. Then $v$ satisfies all formulas in $S$: any formula $F \in S$ mentions only propositional variables among $p _ 1, \ldots, p _ n$ for some $n$, and $v _ n$ is good, so $v _ n$ satisfies $F$. Since $v$ extends $v _ n$, it follows that $v$ satisfies $F$ also.
A textbook written by Michael Artin (son of Emil Artin!) on algebra.