Geometric Deep Learning HT26, Core idea


Geometric deep learning is about learning representations of data efficiently by encoding the data's geometric priors into the architectures we choose. Within this framework, several successful architectures (CNNs, RNNs, transformers) can be derived and motivated from symmetries of their input domains.

In the words of the authors of “Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges”:

While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not generic, and come with essential pre-defined regularities arising from the underlying low-dimensionality and structure of the physical world. This text is concerned with exposing these regularities through unified geometric principles that can be applied throughout a wide spectrum of applications.

This is formalised as follows. We have:

  • A domain $\Omega$ (e.g. a grid or a set)
  • The symmetries of this domain $G$ (e.g. translations, permutations)
  • The space of signals $\mathcal X(\Omega)$ on this domain (e.g. images)
  • A hypothesis class $\mathcal F(\mathcal X(\Omega))$ of functions

$G$ acts on the domain $\Omega$, and this action carries over to the signals $\mathcal X(\Omega)$ via a representation $\rho$. We then encode geometric priors in the hypothesis class $\mathcal F(\mathcal X(\Omega))$ by requiring its functions to respect the symmetries of the domain, i.e. to be invariant or equivariant under $G$.
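As a concrete illustration, take the cyclic grid $\Omega = \mathbb Z_n$ with $G$ the group of cyclic translations. The sketch below assumes the usual convention $(\rho(g)x)(u) = x(g^{-1}u)$ for the induced action on signals. The function names `act_on_domain` and `act_on_signal` are made up for this example.

```python
# Domain: the cyclic grid Omega = {0, ..., n-1}.
# Group: cyclic translations g in Z_n (an illustrative choice).

def act_on_domain(g, u, n):
    """Action of G on the domain: translate grid point u by g (mod n)."""
    return (u + g) % n


def act_on_signal(g, x):
    """Induced action on signals: (rho(g) x)(u) = x(g^{-1} u)."""
    n = len(x)
    return [x[act_on_domain(-g, u, n)] for u in range(n)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = act_on_signal(2, x)  # [4.0, 5.0, 1.0, 2.0, 3.0]

# rho is a representation: acting by 2 and then by 1 equals acting by 3.
assert act_on_signal(1, act_on_signal(2, x)) == act_on_signal(3, x)
```

The $g^{-1}$ in the definition is what makes the composition check at the end work out in the right order.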

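The blueprint is as follows. Writing $\mathcal X(\Omega, \mathcal C)$ for the space of signals on $\Omega$ taking values in a channel space $\mathcal C$, we have the following building blocks: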

  • A linear $G$-equivariant layer $B : \mathcal X(\Omega, \mathcal C) \to \mathcal X(\Omega', \mathcal C')$ satisfying $B(g \cdot x) = g \cdot B(x)$ for all $g \in G$ and $x \in \mathcal X(\Omega, \mathcal C)$.
  • A nonlinearity $\sigma : \mathcal C \to \mathcal C'$ applied element-wise as $(\pmb \sigma(x))(u) = \sigma(x(u))$.
  • Local pooling (coarsening) $P : \mathcal X(\Omega, \mathcal C) \to \mathcal X(\Omega', \mathcal C)$ such that $\Omega' \subseteq \Omega$.
  • A $G$-invariant layer (global pooling) $A : \mathcal X(\Omega, \mathcal C) \to \mathcal Y$ satisfying $A(g \cdot x) = A(x)$ for all $g \in G$ and $x \in \mathcal X(\Omega, \mathcal C)$.
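To make the equivariance and invariance conditions tangible, here is a minimal sketch on a cyclic 1D grid with $G$ the cyclic translations. It assumes a circular cross-correlation for $B$, ReLU for $\sigma$ and a global sum for $A$. These are illustrative choices, and the helper names are not taken from any library.

```python
def shift(g, x):
    """Action of a cyclic translation g on a signal x: (g . x)(u) = x(u - g)."""
    n = len(x)
    return [x[(u - g) % n] for u in range(n)]


def circ_conv(x, w):
    """Linear layer B: circular cross-correlation of x with the filter w."""
    n = len(x)
    return [sum(w[k] * x[(u + k) % n] for k in range(len(w))) for u in range(n)]


def relu(x):
    """Element-wise nonlinearity: (sigma(x))(u) = max(0, x(u))."""
    return [max(0.0, v) for v in x]


def global_sum(x):
    """Global pooling A: sum over the whole domain."""
    return sum(x)


x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
w = [0.5, -1.0, 0.25]
g = 2

# Equivariance of B: B(g . x) = g . B(x)
assert circ_conv(shift(g, x), w) == shift(g, circ_conv(x, w))
# An element-wise nonlinearity commutes with the group action
assert relu(shift(g, x)) == shift(g, relu(x))
# Invariance of A: A(g . x) = A(x)
assert abs(global_sum(shift(g, x)) - global_sum(x)) < 1e-9
```

The convolution is equivariant because the same filter is applied at every position, and the sum is invariant because it does not depend on the order of the grid points.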

Using these blocks, we can construct $G$-invariant functions $f : \mathcal X(\Omega, \mathcal C) \to \mathcal Y$ of the form

\[f = A \circ \pmb \sigma_J \circ B_J \circ P_{J-1} \circ \cdots \circ P_1 \circ \pmb \sigma_1 \circ B_1\]

where the blocks are selected such that the output space of each block matches the input space of the next one. Different blocks may exploit different choices of symmetry groups $G$.
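The composition can be sketched for $J = 2$ on a cyclic grid of length 8. Everything here is an assumption made for illustration: the filters `w1` and `w2`, the stride-2 max pooling, and the `compose` helper. Because the pooling halves the grid, a shift by 2 on the fine grid becomes a shift by 1 on the coarse grid, so in this toy setup exact invariance is only checked for even shifts.

```python
def shift(g, x):
    """Cyclic translation of a signal: (g . x)(u) = x(u - g)."""
    n = len(x)
    return [x[(u - g) % n] for u in range(n)]


def circ_conv(x, w):
    """Linear equivariant layer: circular cross-correlation with filter w."""
    n = len(x)
    return [sum(w[k] * x[(u + k) % n] for k in range(len(w))) for u in range(n)]


def relu(x):
    """Element-wise nonlinearity."""
    return [max(0.0, v) for v in x]


def max_pool2(x):
    """Local pooling P: max over each adjacent pair, halving the grid."""
    return [max(x[2 * i], x[2 * i + 1]) for i in range(len(x) // 2)]


def global_sum(x):
    """Invariant layer A: sum over the domain."""
    return sum(x)


def compose(*blocks):
    """Return blocks[0] o blocks[1] o ... (the rightmost block is applied first)."""
    def f(x):
        for block in reversed(blocks):
            x = block(x)
        return x
    return f


w1 = [1.0, -0.5, 0.25]  # hypothetical filter for B_1
w2 = [0.5, 1.0]         # hypothetical filter for B_2


def B1(x):
    return circ_conv(x, w1)


def B2(x):
    return circ_conv(x, w2)


# f = A o sigma_2 o B_2 o P_1 o sigma_1 o B_1
f = compose(global_sum, relu, B2, max_pool2, relu, B1)

x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0, 4.0]
for g in (2, 4, 6):
    assert abs(f(shift(g, x)) - f(x)) < 1e-9
```

Each block's output space matches the next block's input space: `B1` and the first `relu` act on the length-8 grid, `max_pool2` maps to the length-4 grid, and `B2`, the second `relu` and `global_sum` act there.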

Flashcards

@State and @visualise the spaces of interest in geometric deep learning and @describe how these spaces relate to one another.

  • A domain $\Omega$
  • The symmetries of this domain $G$
  • The space of signals $\mathcal X(\Omega)$ on this domain
  • A hypothesis class $\mathcal F(\mathcal X(\Omega))$

  • $G$ acts on the domain $\Omega$ and then on signals $\mathcal X(\Omega)$ via a representation $\rho$.
  • We encode geometric priors in the hypothesis class $\mathcal F(\mathcal X(\Omega))$ by requiring its functions to respect the symmetries of the domain.

Fill out this list of @example uses of the geometric deep learning blueprint.

| Architecture | Domain $\Omega$ | Symmetry group $G$ |
| --- | --- | --- |
| CNN | ? | ? |
| Spherical CNN | ? | ? |
| Intrinsic / Mesh CNN | ? | ? |
| GNN | ? | ? |
| Deep Sets | ? | ? |
| Transformer | ? | ? |
| LSTM | ? | ? |

| Architecture | Domain $\Omega$ | Symmetry group $G$ |
| --- | --- | --- |
| CNN | Grid | Translation |
| Spherical CNN | Sphere / $\mathrm{SO}(3)$ | Rotation $\mathrm{SO}(3)$ |
| Intrinsic / Mesh CNN | Manifold | Isometry $\mathrm{Iso}(\Omega)$ / Gauge symmetry $\mathrm{SO}(2)$ |
| GNN | Graph | Permutation $\Sigma_n$ |
| Deep Sets | Set | Permutation $\Sigma_n$ |
| Transformer | Complete Graph | Permutation $\Sigma_n$ |
| LSTM | 1D Grid | Time warping |