Geometric Deep Learning HT26, Sets


Flashcards

78658aefed5b478ebddff65908e6191e

Suppose:

  • We have a set of $d$-dimensional features, specified as an $n \times d$ matrix $\mathbf X = (\pmb x _ 1, \ldots, \pmb x _ n)^\top$
  • $f$ is a function operating on these matrices
  • $\Sigma _ n$ is the permutation group on $n$ elements

@Define what it means for $f$ to be permutation invariant, and give an @example of a class of such functions.


For any permutation matrix $\mathbf P = \rho(g)$ representing a permutation $g \in \Sigma _ n$, we have

\[f(\mathbf P \mathbf X) = f(\mathbf X)\]

One example is

\[f(\mathbf X) = \phi\left( \sum _ {u \in \mathcal V} \psi(\mathbf x _ u) \right)\]

where $\psi$ is independently applied to every node’s features, and $\phi$ is applied to the sum-aggregated outputs.
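A minimal NumPy sketch of this sum-aggregation form, with arbitrary illustrative choices of $\psi$ (tanh) and $\phi$ (squaring), checking invariance under a random permutation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # n = 5 nodes, d = 3 features

def psi(x):
    return np.tanh(x)                # applied independently to each node's features

def phi(s):
    return s ** 2                    # applied to the sum-aggregated output

def f(X):
    return phi(psi(X).sum(axis=0))   # summing over nodes is order-independent

P = np.eye(5)[rng.permutation(5)]    # random permutation matrix
assert np.allclose(f(P @ X), f(X))   # permutation invariance
```

The invariance holds for any $\psi$ and $\phi$, because the sum over nodes discards their ordering before $\phi$ ever sees the result.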

fcef2e2c11d349d7a7268565ccb2532b

Suppose:

  • We have a set of $d$-dimensional features, specified as an $n \times d$ matrix $\mathbf X = (\pmb x _ 1, \ldots, \pmb x _ n)^\top$, possibly with some graph structure
  • $f$ is a function operating on these matrices

Often we want some function $\mathbf F$ that acts locally in a node-wise manner, so that we may construct a new matrix $\mathbf H = \mathbf F(\mathbf X)$ whose rows stack the node-wise outputs.

We don’t have permutation invariance here. What do we have instead?


Permutation equivariance.

ebadfe578b62419e95054a262820f1b4

Suppose:

  • We have a set of $d$-dimensional features, specified as an $n \times d$ matrix $\mathbf X = (\pmb x _ 1, \ldots, \pmb x _ n)^\top$
  • $\mathbf F$ is a function operating on these matrices
  • $\Sigma _ n$ is the permutation group on $n$ elements

@Define what it means for $\mathbf F$ to be permutation equivariant, and give an @example of a class of such functions.


For any permutation matrix $\mathbf P = \rho(g)$ representing a permutation $g \in \Sigma _ n$, we have

\[\mathbf F(\mathbf P \mathbf X) = \mathbf P \mathbf F(\mathbf X)\]

One example is a shared node-wise linear transform

\[\mathbf F _ {\mathbf \Theta} (\mathbf X) = \mathbf X \mathbf \Theta\]

where $\mathbf \Theta \in \mathbb R^{d \times d'}$ is a weight matrix.
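A quick NumPy sketch of this equivariance: because the same $\mathbf \Theta$ is applied at every node, permuting the rows before or after the transform gives the same result.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, d_out = 5, 3, 4
X = rng.normal(size=(n, d))
Theta = rng.normal(size=(d, d_out))      # shared weight matrix

def F(X):
    return X @ Theta                     # same linear transform at every node

P = np.eye(n)[rng.permutation(n)]
assert np.allclose(F(P @ X), P @ F(X))   # permutation equivariance
```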

156aaec4c50641eeb5d950660d044a46

@State a result about the class of all linear equivariant functions, i.e. functions of the form

\[\mathbf{FPX} = \mathbf{PFX}\]

Give an interpretation of this result.


Any such map can be written as a linear combination of two generators: the identity $\mathbf F _ 1 \mathbf X = \mathbf X$, and the average $\mathbf F _ 2 \mathbf X = \frac 1 n \mathbf 1 \mathbf 1^\top \mathbf X$.

Intuitively, a linear equivariant layer can only do two things: process each node independently (the identity), or aggregate information uniformly across all nodes (the average).
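A small NumPy sketch checking that an arbitrary linear combination of the two generators commutes with a permutation (the coefficients here are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
X = rng.normal(size=(n, 3))
alpha, beta = 0.7, -1.3                       # arbitrary coefficients

F1 = np.eye(n)                                # identity generator
F2 = np.ones((n, n)) / n                      # averaging generator (1/n) 1 1^T
F = alpha * F1 + beta * F2                    # a general linear equivariant map

P = np.eye(n)[rng.permutation(n)]
assert np.allclose(F @ (P @ X), P @ (F @ X))  # F P X = P F X
```

Both generators commute with any permutation matrix: the identity trivially, and the averaging map because $\mathbf P \mathbf 1 = \mathbf 1$ and $\mathbf 1^\top \mathbf P = \mathbf 1^\top$.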
