Optimisation for Data Science HT25, Motivation and examples
- [[Course - Optimisation for Data Science HT25]]
- See also:
- [[Notes - Machine Learning MT23, Classification]]
- [[Notes - Machine Learning MT23, Clustering]]
- [[Notes - Machine Learning MT23, Cross-entropy loss]]
- [[Notes - Machine Learning MT23, Linear regression]]
- [[Notes - Machine Learning MT23, Logistic regression]]
- [[Notes - Machine Learning MT23, Principal component analysis]]
- [[Notes - Machine Learning MT23, Support vector machines]]
- [[Notes - Machine Learning MT23, Singular value decomposition]]
- [[Notes - Numerical Analysis HT24, Singular value decomposition]]
- [[Notes - Numerical Analysis HT24, Least-squares]]
Flashcards
General setup
@State:
- The general setup of a data analysis problem,
- What a loss function typically looks like for a data fitting problem, and
- Some examples of how you could interpret different data analysis problems in this framework
- General setup:
- Data set $D = \{(a _ j, y _ j) \mid j = 1, \ldots, m\} \subseteq V \times W$ where $V$ is a vector space of features and $W$ is a space of observations
- Parametric model: $\phi(a; x) : V \to W$, a feature-observation relation parameterised by a vector $x \in \mathbb R^n$.
- Typical loss function:
- Want to find $x \in \mathbb R^n$ such that $\phi(a _ j; x) \approx y _ j$ for each $j$, i.e. solve $\min _ {x \in \mathbb R^n} f(x)$ where
- $f(x) = \frac 1 m \sum^m _ {j = 1} \ell(a _ j, y _ j; x)$ (see the sketch after this list)
- Interpretations:
- Regression: $W = \mathbb R$
- Classification: $W = \{1, \ldots, M\}$
- Clustering, dimensionality reduction: $W = \emptyset$.
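As a concrete illustration of this setup, here is a minimal Python sketch; the helper `empirical_loss`, the squared-loss/linear-model instantiation, and the synthetic data are my own illustrative choices, not from the course:

```python
import numpy as np

# General setup: data (a_j, y_j), parametric model phi(a; x), and the
# averaged loss f(x) = (1/m) * sum_j l(a_j, y_j; x).
def empirical_loss(loss, phi, x, A, y):
    return np.mean([loss(phi(a, x), y_j) for a, y_j in zip(A, y)])

# One instantiation: linear model with squared loss (regression, W = R).
phi = lambda a, x: a @ x
sq_loss = lambda prediction, observation: 0.5 * (prediction - observation) ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))   # m = 100 samples, n = 3 features
x_star = np.array([1.0, -2.0, 0.5])
y = A @ x_star                      # noiseless observations

print(empirical_loss(sq_loss, phi, x_star, A, y))  # ~0 at the true x
```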
Regression
Can you formalise regression with an intercept as a data analysis problem in the standard framework, where:
- General setup:
- Data set $D = \{(a _ j, y _ j) \mid j = 1, \ldots, m\} \subseteq V \times W$ where $V$ is a vector space of features and $W$ is a space of observations
- Parametric model: $\phi(a; x) : V \to W$, a feature-observation relation parameterised by a vector $x \in \mathbb R^n$.
- Typical loss function:
- Want to find $x \in \mathbb R^n$ such that $\phi(a _ j; x) \approx y _ j$ for each $j$, i.e. solve $\min _ {x \in \mathbb R^n} f(x)$ where
- $f(x) = \frac 1 m \sum^m _ {j = 1} \ell(a _ j, y _ j; x)$
Take $V = \mathbb R^{n}$, $W = \mathbb R$ and $\phi(a; x, \beta) = a^\top x + \beta$. Writing $\tilde a = {a \choose 1}$ and $\tilde x = {x \choose \beta}$, the model becomes $\phi(a; x, \beta) = \tilde a^\top \tilde x$ and the objective is
\[\min_{\tilde x \in \mathbb R^{n+1}} \frac{1}{2m} \sum^m_{j = 1} (\tilde a_j^\top \tilde x - y_j)^2 = \min_{\tilde x \in \mathbb R^{n+1}} \frac{1}{2m} \|A\tilde x - y\|^2\]where $A$ is the $m \times (n+1)$ matrix with rows $\tilde a_j^\top$ (a numerical sketch follows the card).
@example~
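As a sanity check on the formulation above, a minimal NumPy sketch; the variable names and synthetic data are illustrative assumptions, and `np.linalg.lstsq` is used to solve the resulting least-squares problem directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 2
A = rng.standard_normal((m, n))     # rows are the feature vectors a_j
x_true, beta_true = np.array([3.0, -1.0]), 0.7
y = A @ x_true + beta_true + 0.01 * rng.standard_normal(m)

# Augment each a_j with a constant 1 so the intercept beta is the last
# entry of x_tilde = (x, beta); rows of A_tilde are a_tilde_j^T.
A_tilde = np.hstack([A, np.ones((m, 1))])

# The 1/(2m) factor does not change the minimiser, so this is an
# ordinary least-squares problem in x_tilde.
x_tilde, *_ = np.linalg.lstsq(A_tilde, y, rcond=None)
print(x_tilde)  # approximately [3.0, -1.0, 0.7] = (x, beta)
```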