Geometric Deep Learning HT26, Graphs
Flashcards
Basic definitions
4d3d61f89a444dfc8ef08ac6dbc5ffc2@Define the difference between transductive and inductive tasks on graphs.
- In a transductive task, there is a single fixed graph; tasks differ only in which nodes are used (e.g. predicting labels for held-out nodes of the same graph)
- In an inductive task, the model must generalise to graphs that can be completely different from those seen in training
de0936941b464c52a5d9d10d784fd444@State the desired property of functions acting on graphs.
They should be permutation invariant to the ordering of the nodes.
bbb2f80ffe3440b2afd9bcb5352d1ec4Suppose we have some function $f(\mathbf X, \mathbf A)$ operating on graph nodes and features. @Define what it means for $f$ to be permutation invariant and what it means to be permutation equivariant.
- Permutation invariant: \[f(\mathbf{PX}, \mathbf{PAP}^\top) = f(\mathbf X, \mathbf A)\]
- Permutation equivariant: \[\mathbf F(\mathbf{PX}, \mathbf{PAP}^\top) = \mathbf{PF}(\mathbf X, \mathbf A)\]
for any permutation matrix $\mathbf P$.
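As a sanity check, a minimal numpy sketch with illustrative choices of the functions (a feature sum for invariance, a one-hop smoothing for equivariance) verifying both properties:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
X = rng.normal(size=(n, d))                    # node features
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # undirected adjacency
P = np.eye(n)[rng.permutation(n)]              # random permutation matrix

f_inv = lambda X, A: X.sum(axis=0)             # invariant: sum ignores node order
f_equiv = lambda X, A: X + A @ X               # equivariant: output permutes with nodes

assert np.allclose(f_inv(P @ X, P @ A @ P.T), f_inv(X, A))
assert np.allclose(f_equiv(P @ X, P @ A @ P.T), P @ f_equiv(X, A))
```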
5fbae686c642471181a7dabfea0ad54e@State the general blueprint for constructing permutation equivariant functions on graphs, and visualise this procedure.
Define a local function $\phi$ that operates over a node and its neighbourhood, $\phi(\mathbf x _ u, \mathbf X _ {\mathcal N _ u})$. Then a permutation equivariant function $\mathbf F$ can be constructed by applying $\phi$ to every node’s neighbourhood in isolation, i.e.
\[\mathbf F(\mathbf{X}, \mathbf{A}) = \begin{bmatrix} \cdots \quad \phi(\mathbf{x} _ 1, \mathbf{X} _ {\mathcal{N} _ 1}) \quad \cdots \\ \cdots \quad\phi(\mathbf{x} _ 2, \mathbf{X} _ {\mathcal{N} _ 2}) \quad \cdots \\ \vdots \\ \cdots \quad\phi(\mathbf{x} _ n, \mathbf{X} _ {\mathcal{N} _ n}) \quad \cdots \end{bmatrix}\]
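A minimal numpy sketch of this blueprint, with an illustrative local $\phi$ (own features plus the neighbourhood sum); the final assertion checks that the stacked $\mathbf F$ is indeed permutation equivariant:

```python
import numpy as np

def phi(x_u, X_nbrs):
    # an illustrative local function: own features plus the neighbourhood sum
    return x_u + X_nbrs.sum(axis=0)

def F(X, A):
    rows = []
    for u in range(X.shape[0]):
        nbrs = np.flatnonzero(A[u])            # indices of the neighbourhood N_u
        rows.append(phi(X[u], X[nbrs]))
    return np.stack(rows)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
A = (rng.random((5, 5)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T                 # undirected adjacency

P = np.eye(5)[rng.permutation(5)]
assert np.allclose(F(P @ X, P @ A @ P.T), P @ F(X, A))   # equivariance holds
```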
Graph neural networks
ffccc2c7df13486eb753caca31e4008c@Define and @visualise the three “flavours” of ways to aggregate information from surrounding nodes in a graph neural network.
Convolutional:
\[\mathbf h _ u = \phi\left( \mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} c _ {uv} \psi(\mathbf x _ v) \right)\]where $c _ {uv}$ is a constant that depends only on the structure of the graph (i.e. the adjacency matrix), $\psi$ is a (potentially learned) function that transforms the features of each neighbour in $\mathcal N _ u$, $\bigoplus$ is some permutation-invariant aggregation function, and $\phi$ is some (learnable) function that updates these features.
Attentional:
\[\mathbf h _ u = \phi\left( \mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} a(\mathbf x _ u, \mathbf x _ v) \psi(\mathbf x _ v)\right)\]where $a$ is a learnable self-attention mechanism that computes the importance coefficients $\alpha _ {uv} = a(\mathbf x _ u, \mathbf x _ v)$ implicitly; these are often softmax-normalised across all neighbours.
Message-passing:
\[\mathbf h _ u = \phi\left(\mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} \psi(\mathbf x _ u, \mathbf x _ v) \right)\]where $\psi$ is a learnable message function, computing the vector that $v$ sends to $u$.
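The three flavours can be sketched in a few lines of numpy; the aggregator is a sum, and simple untrained linear maps stand in for $\psi$ and $\phi$ (all concrete choices here are illustrative, not canonical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
A = np.zeros((n, n)); A[[0, 1, 2, 3], [1, 2, 3, 4]] = 1; A = A + A.T   # path graph
W = rng.normal(size=(d, d))                    # stands in for psi (and phi)

def convolutional(X, A):
    # c_uv fixed by the graph: symmetric normalisation 1/sqrt(d_u d_v)
    deg = A.sum(1)
    C = A / np.sqrt(np.outer(deg, deg))
    return C @ X @ W

def attentional(X, A):
    # a(x_u, x_v): dot-product scores, softmax-normalised over neighbours
    scores = np.where(A > 0, X @ X.T, -np.inf)
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha = np.where(A > 0, alpha, 0.0)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ X @ W

def message_passing(X, A):
    # psi(x_u, x_v): the message depends on both endpoints (here their sum)
    H = np.zeros_like(X)
    for u in range(n):
        for v in np.flatnonzero(A[u]):
            H[u] += (X[u] + X[v]) @ W
    return H
```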

7a12fb4d86d34fe3ad36d477d497f553@State the containments between the relative expressive power of GNNs implemented via:
- convolution
- attention
- message-passing
Convolutional $\subseteq$ attentional $\subseteq$ message-passing: fixing $a(\mathbf x _ u, \mathbf x _ v) = c _ {uv}$ recovers convolution from attention, and restricting messages to the form $\psi(\mathbf x _ u, \mathbf x _ v) = a(\mathbf x _ u, \mathbf x _ v)\psi(\mathbf x _ v)$ recovers attention from message-passing.
d151a8c220444de7bd16e159e4d6a42b@State how a graph convolution neighbourhood averaging operation could be performed over a graph.
Averaging each node’s neighbourhood is a matrix product with the degree-normalised adjacency matrix:
\[\mathbf H = \mathbf D^{-1} \mathbf A \mathbf X\]often with self-loops added ($\hat{\mathbf A} = \mathbf A + \mathbf I$, with $\hat{\mathbf D}$ its degree matrix) so that each node’s own features are included in the average.

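A minimal numpy sketch of neighbourhood averaging as a matrix product, on a 3-node star graph with self-loops added:

```python
import numpy as np

# Neighbourhood averaging as H = D^{-1} A X, with self-loops added so each
# node's own features enter its average.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)         # star: node 0 joined to 1 and 2
X = np.array([[1., 0.], [0., 1.], [2., 2.]])

A_hat = A + np.eye(3)
D_inv = np.diag(1.0 / A_hat.sum(axis=1))
H = D_inv @ A_hat @ X

# Node 0 averages itself and neighbours 1, 2: ([1,0] + [0,1] + [2,2]) / 3
assert np.allclose(H[0], [1.0, 1.0])
```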
4ee5c9379945456fb637d6aefa74cd4c@Visualise one layer of a graph convolution network with softmax activation.

219b4c7d9e414f9bb525a2c0426d6a93@Visualise a two-layer graph convolution network with activation $\sigma$ and final output $\text{softmax}$.

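A minimal numpy sketch of such a two-layer network with random (untrained) weights, $\sigma = \text{ReLU}$, and the symmetrically normalised adjacency $\hat{\mathbf D}^{-1/2}\hat{\mathbf A}\hat{\mathbf D}^{-1/2}$ commonly used in GCNs:

```python
import numpy as np

# Two-layer GCN forward pass: H1 = sigma(S X W1), out = softmax(S H1 W2),
# where S is the symmetrically normalised adjacency with self-loops.
rng = np.random.default_rng(0)
n, d, h, c = 6, 4, 8, 3
X = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                 # undirected, no self-loops

A_hat = A + np.eye(n)                          # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
S = A_hat * np.outer(d_inv_sqrt, d_inv_sqrt)   # D^{-1/2} (A + I) D^{-1/2}

W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, c))

H1 = np.maximum(S @ X @ W1, 0.0)               # sigma = ReLU
logits = S @ H1 @ W2
out = np.exp(logits - logits.max(axis=1, keepdims=True))
out /= out.sum(axis=1, keepdims=True)          # row-wise softmax over classes
```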
Expressivity
1b7d4207d1e7444b860fd4c61cf85e03@Define what it means for two node-attributed graphs $G = (V, E, \mathbf X)$ and $G’ = (V’, E’, \mathbf X’)$ to be isomorphic.
There exists a bijection $\phi : V \to V’$ such that:
- Structure preservation: $u \sim v$ in $G$ iff $\phi(u) \sim \phi(v)$ in $G’$
- Feature preservation: $\mathbf x _ u = \mathbf x’ _ {\phi(u)}$
(i.e. they are isomorphic as graphs and features are preserved).
515e0b0f94334c9d8399ceabc7a0af73@State a theorem giving a condition for a class of functions to be able to universally approximate any permutation invariant function on graphs.
A class of permutation-invariant functions on graphs with finite node-feature space is universally approximating iff it can distinguish any pair of non-isomorphic graphs.
4dedfb35d72f4d5485d62598624c6066@Visualise the containment between the expressive power of functions that can be computed by message-passing neural networks and all permutation-invariant functions on graphs.

Weisfeiler-Lehman test
7b08daf8a1b9473a9d07d02f65ba79f1What is the Weisfeiler-Lehman test at a high level?
A polynomial-time heuristic for graph isomorphism based on iterative colour refinement: if the test distinguishes two graphs they are certainly non-isomorphic, but it can fail to separate some non-isomorphic graphs (passing it is necessary but not sufficient for isomorphism).
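A minimal pure-Python sketch of colour refinement, demonstrating the classic failure case (a 6-cycle vs two disjoint triangles, both 2-regular):

```python
from collections import Counter

# 1-WL colour refinement on a graph given as an adjacency list; returns the
# multiset (histogram) of final colours, which is the graph's WL signature.
def wl_colours(adj, rounds=3):
    colours = {u: 0 for u in adj}              # uniform initial colour
    for _ in range(rounds):
        # new colour = (own colour, sorted multiset of neighbour colours)
        sigs = {u: (colours[u], tuple(sorted(colours[v] for v in adj[u])))
                for u in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colours = {u: relabel[sigs[u]] for u in adj}
    return Counter(colours.values())

c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}            # 6-cycle
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Both graphs are 2-regular, so refinement never separates them.
assert wl_colours(c6) == wl_colours(two_triangles)
```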
488766392fb44809ad458b14d64bebc6@Visualise the relative expressive powers of functions that can be computed by message-passing graph neural networks, those that can be computed by WL-test style functions, those that depend on the counts of certain substructures, and all permutation invariant functions.

c3675b54999c426c85c10ae4e1819bb5@State a theorem about the expressive power of a graph isomorphism network.
Suppose:
- Graph node features are from a discrete countable set (often not true in practice)
- The aggregator $\bigoplus$ and update function $\phi$ are injective
- The graph-level readout function is injective
Then this MPNN is as powerful as the WL-test.
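A minimal sketch of the corresponding GIN-style layer, $\mathbf h _ u = \mathrm{MLP}\big((1+\epsilon)\mathbf x _ u + \sum _ {v \in \mathcal N _ u} \mathbf x _ v\big)$, with an illustrative one-layer MLP and a sum readout:

```python
import numpy as np

# One GIN layer with a sum aggregator (injective on multisets of features from
# a countable set) followed by a graph-level sum readout. Weights are random
# and untrained; this only illustrates the shape of the computation.
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
A = np.zeros((n, n)); A[[0, 1, 2, 3], [1, 2, 3, 4]] = 1; A = A + A.T   # path graph
eps = 0.1
W = rng.normal(size=(d, d))

H = np.maximum(((1 + eps) * X + A @ X) @ W, 0.0)   # "MLP" = linear + ReLU
graph_embedding = H.sum(axis=0)                    # graph-level sum readout
```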
c5ffb6962e7e4c2aa55e9ae7c6f26d43@State a theorem about the number of aggregators needed to distinguish between real multisets of size $n$.
At least $n$ aggregators are needed.
81b24b214d08437e8abdc3ee612df08f@Define principled neighbourhood aggregation.
An aggregation scheme that combines several moment-based aggregators of the neighbour features with degree-based scalers:
\[\bigoplus = \begin{bmatrix} I \\ S(D, \alpha = 1) \\ S(D, \alpha = -1) \end{bmatrix} \otimes \begin{bmatrix} \mu \\ \sigma \\ \max \\ \min \end{bmatrix}\]where we define the scalers $S(d, \alpha)$ as
\[S(d, \alpha) = \left( \frac{\log(d+1)}{\delta} \right)^\alpha\]with $\delta$ the average of $\log(d+1)$ over the node degrees in the training set.
dbb2b7a46f064a059b316ea3d6a4f4dcWhat are $k$-WL tests at a high-level?
Colour refinement over $k$-tuples, where two $k$-tuples are adjacent if they differ in one node.
9750d04a0a194bd2961498f0bd567a84What is the idea behind random node features and random message-passing neural networks (rMPNNs)?
Attach a random feature to every node of the graph, then apply an MPNN. The random features break symmetries (e.g. between nodes of a regular graph) that confuse the WL-test.
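A sketch of why this helps (a toy setup, not a trained rMPNN): with almost surely distinct random ids, two rounds of message passing let a node test whether two of its neighbours are adjacent to each other, i.e. whether it lies on a triangle — something anonymous 1-WL message passing cannot detect:

```python
import random

def on_triangle(adj, u, ids):
    # Round 1: u collects its neighbours' ids.
    nbr_ids = {ids[v] for v in adj[u]}
    # Round 2: each neighbour v forwards the ids of its own neighbours; if one
    # of them is also a neighbour of u, then u-v-w closes a triangle.
    for v in adj[u]:
        if any(ids[w] in nbr_ids for w in adj[v] if w != u):
            return True
    return False

c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}   # 6-cycle: no triangles
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
ids = {u: random.random() for u in range(6)}             # almost surely distinct

assert not any(on_triangle(c6, u, ids) for u in c6)
assert any(on_triangle(tri, u, ids) for u in tri)
```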
0ff0cdc681ec4bc4bfcfaa6b725e0c17What is the idea behind graph substructure networks (GSNs)?
Choose a bank of substructures containing graphs $H$ of size $k$, and count the occurrences of each $H$ at every node (or edge) of the input graph, appending the counts as structural features.
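A sketch of this preprocessing step for a one-element substructure bank {triangle}, using the identity that $\mathrm{diag}(\mathbf A^3)/2$ counts the triangles through each node:

```python
import numpy as np

# Count, for every node, the triangles it participates in, and append the
# counts to the node features as structural descriptors.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # one triangle (0,1,2) plus edge 2-3
X = np.ones((4, 2))                            # placeholder node features

# diag(A^3)[u] counts closed walks of length 3 through u = 2 * (#triangles).
tri_counts = np.diag(np.linalg.matrix_power(A, 3)) / 2
X_structural = np.hstack([X, tri_counts[:, None]])

assert list(tri_counts) == [1.0, 1.0, 1.0, 0.0]
```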
9b547048a9144eed9418788d351343e8@Visualise the class of graphs that GSNs can distinguish compared to $k$-WL tests.

ca8c6c1bd7f24a39b8a9b70d7faaaec1What is the idea behind subgraph GNNs?
Create a collection of subgraphs of the original graph by deleting edges. Then run a permutation-invariant network over the features generated by GNNs applied to each subgraph. This can distinguish graphs the WL-test cannot.
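A toy pure-Python illustration of the idea (an illustrative hand-picked invariant stands in for the learned GNN on each subgraph): delete each edge in turn, summarise every subgraph by its number of connected components, and compare the resulting bags; this separates the 6-cycle from two disjoint triangles, which 1-WL cannot do:

```python
from collections import Counter

def components(adj, removed_edge):
    # Count connected components of the graph with one edge removed (DFS).
    seen, comps = set(), 0
    for s in adj:
        if s in seen:
            continue
        comps += 1
        stack = [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            for v in adj[u]:
                if frozenset((u, v)) != removed_edge:
                    stack.append(v)
    return comps

def bag(adj):
    # The permutation-invariant "bag of subgraphs": one summary per deleted edge.
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    return Counter(components(adj, e) for e in edges)

c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}   # 6-cycle
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

assert bag(c6) == Counter({1: 6})    # deleting any cycle edge keeps it connected
assert bag(tri) == Counter({2: 6})   # two triangles stay disconnected
```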
Transformers
976290ba2cb84d68b03ff66f730b3565How can you interpret transformers as a type of GNN?
Their architecture is equivalent to an attentional GNN operating on a complete graph, with positional encodings as node features.
Over-squashing and bottlenecks
- Sometimes structural properties of the graph (e.g. bottlenecks) mean that long-range interactions are not modelled well
- Capacity bounds
Exotic message passing
- Some more unusual ways of message passing

