Geometric Deep Learning HT26, Graphs


Flashcards

Basic definitions

4d3d61f89a444dfc8ef08ac6dbc5ffc2

@Define the difference between transductive and inductive tasks on graphs.


  • In a transductive task, training and inference happen on the same fixed graph; only the nodes under consideration (e.g. which nodes are labelled) differ between tasks
  • In an inductive task, the graph can be completely different for each task, so the model must generalise to unseen graphs

de0936941b464c52a5d9d10d784fd444

@State the desired property of functions acting on graphs.


They should be invariant to permutations of the node ordering.

bbb2f80ffe3440b2afd9bcb5352d1ec4

Suppose we have some function $f(\mathbf X, \mathbf A)$ operating on graph nodes and features. @Define what it means for $f$ to be permutation invariant and what it means to be permutation equivariant.


\[f(\mathbf {PX}, \mathbf{PAP}^\top) = f(\mathbf X,\mathbf A)\]

and

\[\mathbf F(\mathbf{PX}, \mathbf{PAP}^\top) = \mathbf{PF}(\mathbf X, \mathbf A)\]

for any permutation matrix $\mathbf P$.
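
A quick numerical check of both properties (a minimal numpy sketch; `f_inv` and `F_eq` are illustrative choices, not from the course):

```python
import numpy as np

def f_inv(X, A):
    # Degree-weighted sum over all nodes: permutation invariant.
    return (A.sum(axis=1, keepdims=True) * X).sum(axis=0)

def F_eq(X, A):
    # One round of neighbour summation: permutation equivariant.
    return A @ X

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1) + np.triu(A, 1).T          # random symmetric adjacency
P = np.eye(n)[rng.permutation(n)]            # random permutation matrix

assert np.allclose(f_inv(P @ X, P @ A @ P.T), f_inv(X, A))
assert np.allclose(F_eq(P @ X, P @ A @ P.T), P @ F_eq(X, A))
```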

5fbae686c642471181a7dabfea0ad54e

@State the general blueprint for constructing permutation equivariant functions on graphs, and visualise this procedure.


Define a local function $\phi$ that operates over a node and its neighbourhood, $\phi(\mathbf x _ u, \mathbf X _ {\mathcal N _ u})$. Then a permutation equivariant function $\mathbf F$ can be constructed by applying $\phi$ to every node’s neighbourhood in isolation, i.e.

\[\mathbf F(\mathbf X, \mathbf A) = \begin{bmatrix} \cdots \quad \phi(\mathbf{x} _ 1, \mathbf{X} _ {\mathcal{N} _ 1}) \quad \cdots \\ \cdots \quad\phi(\mathbf{x} _ 2, \mathbf{X} _ {\mathcal{N} _ 2}) \quad \cdots \\ \vdots \\ \cdots \quad\phi(\mathbf{x} _ n, \mathbf{X} _ {\mathcal{N} _ n}) \quad \cdots \end{bmatrix}\]
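
A minimal numpy sketch of this blueprint (the names `equivariant_layer` and `phi` are illustrative; any $\phi$ that treats its second argument as an unordered multiset works):

```python
import numpy as np

def equivariant_layer(X, A, phi):
    # Apply the local function phi(x_u, X_Nu) to every node in isolation;
    # stacking the results gives a permutation equivariant F(X, A).
    rows = []
    for u in range(X.shape[0]):
        neighbours = np.flatnonzero(A[u])        # indices of N_u
        rows.append(phi(X[u], X[neighbours]))
    return np.stack(rows)

# Example phi: concatenate x_u with the mean of its neighbours' features
# (assumes every node has at least one neighbour).
phi = lambda x_u, X_Nu: np.concatenate([x_u, X_Nu.mean(axis=0)])
```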

Graph neural networks

ffccc2c7df13486eb753caca31e4008c

@Define and @Visualise the three “flavours” of aggregating information from neighbouring nodes in a graph neural network.


Convolutional:

\[\mathbf h _ u = \phi\left( \mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} c _ {uv} \psi(\mathbf x _ v) \right)\]

where $c _ {uv}$ is a constant that depends only on the structure of the graph (i.e. the adjacency matrix), $\psi$ is a (potentially learnable) function that transforms the features of each neighbour $v \in \mathcal N _ u$, $\bigoplus$ is some permutation-invariant aggregation function, and $\phi$ is some (learnable) function that updates these features.

Attentional:

\[\mathbf h _ u = \phi\left( \mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} a(\mathbf x _ u, \mathbf x _ v) \psi(\mathbf x _ v)\right)\]

where $a$ is a learnable self-attention mechanism that implicitly computes the importance coefficients $\alpha _ {uv} = a(\mathbf x _ u, \mathbf x _ v)$, which are often softmax-normalised across each neighbourhood.

Message-passing:

\[\mathbf h _ u = \phi\left(\mathbf x _ u, \bigoplus _ {v \in \mathcal N _ u} \psi(\mathbf x _ u, \mathbf x _ v) \right)\]

where $\psi$ is a learnable message function, computing the message vector that $v$ sends to $u$.
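
A minimal numpy sketch of all three flavours, using sum or mean for $\bigoplus$, linear maps for $\psi$, and omitting the outer update $\phi$; all names here are illustrative:

```python
import numpy as np

def conv_attn_mp(X, A, W_psi, att, W_msg):
    # X: (n, d) features; A: (n, n) adjacency; W_psi: (d, k); W_msg: (k, 2d).
    # att(x_u, x_v) -> scalar attention logit. Assumes no isolated nodes.
    H_conv, H_attn, H_mp = [], [], []
    for u in range(X.shape[0]):
        nbrs = np.flatnonzero(A[u])
        # Convolutional: fixed structural coefficients c_uv = 1 / |N_u|.
        H_conv.append((X[nbrs] @ W_psi).mean(axis=0))
        # Attentional: learned coefficients a(x_u, x_v), softmax-normalised.
        logits = np.array([att(X[u], X[v]) for v in nbrs])
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()
        H_attn.append((alpha[:, None] * (X[nbrs] @ W_psi)).sum(axis=0))
        # Message-passing: each message depends on both endpoint features.
        H_mp.append(sum(W_msg @ np.concatenate([X[u], X[v]]) for v in nbrs))
    return [np.stack(H) for H in (H_conv, H_attn, H_mp)]
```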

7a12fb4d86d34fe3ad36d477d497f553

@State the containments between the relative expressive power of GNNs implemented via:

  • convolution
  • attention
  • message-passing

\[\text{convolution} \subseteq \text{attention} \subseteq \text{message-passing}\]

d151a8c220444de7bd16e159e4d6a42b

@State how a graph convolution neighbourhood averaging operation could be performed over a graph.


Multiply the feature matrix by the row-normalised adjacency matrix:

\[\mathbf H = \mathbf D^{-1} \mathbf A \mathbf X, \qquad \text{i.e.} \quad \mathbf h _ u = \frac{1}{|\mathcal N _ u|} \sum _ {v \in \mathcal N _ u} \mathbf x _ v\]

where $\mathbf D$ is the diagonal degree matrix. (A symmetric alternative is $\mathbf D^{-1/2} \mathbf A \mathbf D^{-1/2}$.)

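A minimal numpy sketch of the same averaging (assuming no isolated nodes):

```python
import numpy as np

def neighbourhood_average(X, A):
    # Row-normalise the adjacency so each row sums to one, then
    # each node's output is the mean of its neighbours' features.
    D_inv = np.diag(1.0 / A.sum(axis=1))
    return D_inv @ A @ X
```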
4ee5c9379945456fb637d6aefa74cd4c

@Visualise one layer of a graph convolution network with softmax activation.
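
A formula-level sketch, assuming the standard Kipf–Welling formulation with normalised adjacency (self-loops added):

\[\mathbf Y = \operatorname{softmax}\left( \hat{\mathbf A} \mathbf X \mathbf W \right), \qquad \hat{\mathbf A} = \tilde{\mathbf D}^{-1/2} (\mathbf A + \mathbf I) \tilde{\mathbf D}^{-1/2}\]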


219b4c7d9e414f9bb525a2c0426d6a93

@Visualise a two-layer graph convolution network with activation $\sigma$ and final output $\text{softmax}$.
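
A formula-level sketch in the same convention, with $\hat{\mathbf A}$ the normalised adjacency with self-loops:

\[\mathbf Y = \operatorname{softmax}\left( \hat{\mathbf A}\, \sigma\!\left( \hat{\mathbf A} \mathbf X \mathbf W^{(1)} \right) \mathbf W^{(2)} \right)\]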


e2207dd66ae7443db54da61c37d5fc17

@Visualise the scalable inception-like graph network architecture.
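
Assuming this refers to SIGN (Scalable Inception Graph Networks), a formula-level sketch: transform precomputed powers of the adjacency applied to the features in parallel inception-style branches, concatenate, and classify:

\[\mathbf Y = \operatorname{softmax}\left( \sigma\left( \left[ \mathbf X \boldsymbol\Theta _ 0 \,\Vert\, \mathbf{AX} \boldsymbol\Theta _ 1 \,\Vert\, \cdots \,\Vert\, \mathbf A^r \mathbf X \boldsymbol\Theta _ r \right] \right) \boldsymbol\Omega \right)\]

Because the products $\mathbf A^k \mathbf X$ contain no learnable parameters, they can be precomputed once offline, which is what makes the architecture scalable.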


b1200c2eb62f468ebd5b00b5e1483f43

@Visualise the GraphSAGE architecture.
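
A formula-level sketch of one GraphSAGE (sample-and-aggregate) layer: aggregate a sampled neighbourhood, concatenate with the node's own features, then transform:

\[\mathbf h _ u = \sigma\left( \mathbf W \left[ \mathbf x _ u \,\Big\Vert\, \operatorname{AGG}\left( \{ \mathbf x _ v : v \in \mathcal N _ u \} \right) \right] \right)\]

where $\operatorname{AGG}$ is e.g. a mean, max-pooling or LSTM aggregator, and $\mathcal N _ u$ is typically a fixed-size random sample of the full neighbourhood for scalability.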


Expressivity

1b7d4207d1e7444b860fd4c61cf85e03

@Define what it means for two node-attributed graphs $G = (V, E, \mathbf X)$ and $G' = (V', E', \mathbf X')$ to be isomorphic.


There exists a bijection $\phi : V \to V'$ such that you have

  • Structure preservation: $u \sim v$ in $G$ iff $\phi(u) \sim \phi(v)$ in $G'$
  • Feature preservation: $\mathbf x _ u = \mathbf x' _ {\phi(u)}$

(i.e. they are isomorphic as graphs and features are preserved).

515e0b0f94334c9d8399ceabc7a0af73

@State a theorem giving a condition for a class of functions to be able to universally approximate any permutation invariant function on graphs.


A class of functions can universally approximate any permutation-invariant function on graphs with finite node features iff it can discriminate between non-isomorphic graphs.

4dedfb35d72f4d5485d62598624c6066

@Visualise the containment between the expressive power of functions that can be computed by message-passing neural networks and all permutation-invariant functions on graphs.


Weisfeiler-Lehman test

7b08daf8a1b9473a9d07d02f65ba79f1

What is the Weisfeiler-Lehman test at a high level?


A necessary but insufficient condition for graph isomorphism, based on iterative colour refinement.
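
A minimal sketch of one-dimensional WL colour refinement in Python (`adj` as an adjacency-list dict is an illustrative representation):

```python
def wl_colours(adj, iters=None):
    """1-WL colour refinement on a graph given as {node: [neighbours]}."""
    colours = {u: 0 for u in adj}        # uniform start (or node features)
    for _ in range(iters or len(adj)):
        # New colour = own colour plus the multiset of neighbour colours.
        sigs = {
            u: (colours[u], tuple(sorted(colours[v] for v in adj[u])))
            for u in adj
        }
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new = {u: palette[sigs[u]] for u in adj}
        if new == colours:               # stable colouring reached
            break
        colours = new
    return colours

# Two graphs can be isomorphic only if their colour histograms match;
# matching histograms do NOT prove isomorphism.
```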

488766392fb44809ad458b14d64bebc6

@Visualise the relative expressive powers of functions that can be computed by message-passing graph neural networks, those that can be computed by WL-test style functions, those that depend on the counts of certain substructures, and all permutation invariant functions.


c3675b54999c426c85c10ae4e1819bb5

@State a theorem about the expressive power of a graph isomorphism network.


Suppose:

  • Graph node features come from a discrete countable set (often not true in practice)
  • Both the aggregator $\bigoplus$ and the update function $\phi$ are injective
  • The graph-level readout function is injective

Then this MPNN is as powerful as the WL test.
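
For reference, the graph isomorphism network (GIN) layer that realises these conditions uses a sum aggregator and an MLP update:

\[\mathbf h _ u = \operatorname{MLP}\left( (1 + \epsilon)\, \mathbf x _ u + \sum _ {v \in \mathcal N _ u} \mathbf x _ v \right)\]

where $\epsilon$ is a fixed or learnable scalar.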

c5ffb6962e7e4c2aa55e9ae7c6f26d43

@State a theorem about the number of aggregators needed to distinguish between real multisets of size $n$.


At least $n$ aggregators are needed.

81b24b214d08437e8abdc3ee612df08f

@Define principled neighbourhood aggregation.


An aggregation scheme combining several aggregators (mean, standard deviation, max and min of the neighbour features) with degree-based scalers:

\[\bigoplus = \begin{bmatrix} I \\ S(D, \alpha = 1) \\ S(D, \alpha = -1) \end{bmatrix} \otimes \begin{bmatrix} \mu \\ \sigma \\ \max \\ \min \end{bmatrix}\]

where we define the scalers $S(d, \alpha)$ as

\[S(d, \alpha) = \left( \frac{\log(d+1)}{\delta} \right)^\alpha\]

with $\delta$ the average of $\log(d+1)$ over the degrees seen in the training set.

dbb2b7a46f064a059b316ea3d6a4f4dc

What are $k$-WL tests at a high level?


Colour refinement over $k$-tuples of nodes, where two $k$-tuples are adjacent if they differ in exactly one node.

9750d04a0a194bd2961498f0bd567a84

What is the idea behind random node features and random message-passing neural networks (rMPNNs)?


Attach a random feature to every node of the graph, then apply an MPNN. This breaks symmetries that would otherwise confuse the WL test.

0ff0cdc681ec4bc4bfcfaa6b725e0c17

What is the idea behind graph substructure networks (GSNs)?


Choose a bank of substructures containing graphs $H$ of size $k$, and count the occurrences of each $H$ at every node or edge of the input graph; these counts are then used as additional structural features.
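
For example, per-node triangle counts (one common substructure choice) can be read off powers of the adjacency matrix; a minimal numpy sketch:

```python
import numpy as np

def triangle_counts(A):
    # For a simple undirected 0/1 adjacency, (A^3)[u, u] counts closed
    # walks of length 3 from u; each triangle through u is counted twice
    # (once per direction), so halve the diagonal.
    return np.diag(np.linalg.matrix_power(A, 3)) // 2
```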

9b547048a9144eed9418788d351343e8

@Visualise the class of graphs that GSNs can distinguish compared to $k$-WL tests.


ca8c6c1bd7f24a39b8a9b70d7faaaec1

What is the idea behind subgraph GNNs?


Create a collection of subgraphs of the original graph by deleting edges. Then run a permutation-invariant network over the features generated by GNNs applied to each subgraph. This can distinguish graphs the WL-test cannot.

Transformers

976290ba2cb84d68b03ff66f730b3565

How can you interpret transformers as a type of GNN?


Their architecture is equivalent to an attentional GNN operating on a complete graph, with positional encodings providing the structural information.

Over-squashing and bottlenecks

  • Sometimes properties of the graph mean that long-range interactions are not modelled well
  • Capacity bounds

Exotic message passing

  • Some more unusual ways of message passing


