Optimisation for Data Science HT25, Motivation and examples
- [[Course - Optimisation for Data Science HT25]]
- See also:
- [[Notes - Machine Learning MT23, Classification]]
- [[Notes - Machine Learning MT23, Clustering]]
- [[Notes - Machine Learning MT23, Cross-entropy loss]]
- [[Notes - Machine Learning MT23, Linear regression]]
- [[Notes - Machine Learning MT23, Logistic regression]]
- [[Notes - Machine Learning MT23, Principal component analysis]]
- [[Notes - Machine Learning MT23, Support vector machines]]
- [[Notes - Machine Learning MT23, Singular value decomposition]]
- [[Notes - Numerical Analysis HT24, Singular value decomposition]]
- [[Notes - Numerical Analysis HT24, Least-squares]]
Flashcards
General setup
@State:
- The general setup of a data analysis problem,
- What a loss function typically looks like for a data fitting problem, and
- Some examples of how you could interpret different data analysis problems in this framework
- General setup:
- Data set $D = \{(a _ j, y _ j) \mid j = 1, \ldots, m\} \subseteq V \times W$ where $V$ is a vector space of features and $W$ is a space of observations
- Parametric model: $\phi(a; x) : V \to W$, a feature-observation relation parameterised by a vector $x \in \mathbb R^n$.
- Typical loss function:
- Want to find $x \in \mathbb R^n$ such that $\phi(a _ j; x) \approx y _ j$ for each $j$, i.e. solve $\min _ {x \in \mathbb R^n} f(x)$ where
- $f(x) = \frac 1 m \sum^m _ {j = 1} \ell(a _ j, y _ j; x)$ (see the sketch after this list)
- Interpretations:
- Regression: $W = \mathbb R$
- Classification: $W = \{1, \ldots, M\}$
- Clustering, dimensionality reduction: $W = \emptyset$.
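As a concrete illustration of this setup, here is a minimal Python sketch; the helper `empirical_loss`, the squared-loss/linear-model instantiation, and the synthetic data are my own illustrative choices, not from the course:

```python
import numpy as np

# General setup: data (a_j, y_j), parametric model phi(a; x), and the
# averaged loss f(x) = (1/m) * sum_j l(a_j, y_j; x).
def empirical_loss(loss, phi, x, A, y):
    return np.mean([loss(phi(a, x), y_j) for a, y_j in zip(A, y)])

# One instantiation: linear model with squared loss (regression, W = R).
phi = lambda a, x: a @ x
sq_loss = lambda prediction, observation: 0.5 * (prediction - observation) ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))   # m = 100 samples, n = 3 features
x_star = np.array([1.0, -2.0, 0.5])
y = A @ x_star                      # noiseless observations

print(empirical_loss(sq_loss, phi, x_star, A, y))  # ~0 at the true x
```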
Regression
Can you formalise regression with an intercept as a data analysis problem in the standard framework, where:
- General setup:
- Data set $D = \{(a _ j, y _ j) \mid j = 1, \ldots, m\} \subseteq V \times W$ where $V$ is a vector space of features and $W$ is a space of observations
- Parametric model: $\phi(a; x) : V \to W$, a feature-observation relation parameterised by a vector $x \in \mathbb R^n$.
- Typical loss function:
- Want to find $x \in \mathbb R^n$ such that $\phi(a _ j; x) \approx y _ j$ for each $j$, i.e. solve $\min _ {x \in \mathbb R^n} f(x)$ where
- $f(x) = \frac 1 m \sum^m _ {j = 1} \ell(a _ j, y _ j; x)$
Take $V = \mathbb R^{n}$, $W = \mathbb R$ and $\phi(a; x, \beta) = a^\top x + \beta$. Writing $\tilde a = {a \choose 1}$ and $\tilde x = {x \choose \beta}$, the model becomes $\phi(a; x, \beta) = \tilde a^\top \tilde x$ and the objective is
\[\min_{\tilde x \in \mathbb R^{n+1}} \frac{1}{2m} \sum^m_{j = 1} (\tilde a_j^\top \tilde x - y_j)^2 = \min_{\tilde x \in \mathbb R^{n+1}} \frac{1}{2m} \|A\tilde x - y\|^2\]where $A$ is the $m \times (n+1)$ matrix with rows $\tilde a_j^\top$ (a numerical sketch follows the card).
@example~
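As a sanity check on the formulation above, a minimal NumPy sketch; the variable names and synthetic data are illustrative assumptions, and `np.linalg.lstsq` is used to solve the resulting least-squares problem directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 2
A = rng.standard_normal((m, n))     # rows are the feature vectors a_j
x_true, beta_true = np.array([3.0, -1.0]), 0.7
y = A @ x_true + beta_true + 0.01 * rng.standard_normal(m)

# Augment each a_j with a constant 1 so the intercept beta is the last
# entry of x_tilde = (x, beta); rows of A_tilde are a_tilde_j^T.
A_tilde = np.hstack([A, np.ones((m, 1))])

# The 1/(2m) factor does not change the minimiser, so this is an
# ordinary least-squares problem in x_tilde.
x_tilde, *_ = np.linalg.lstsq(A_tilde, y, rcond=None)
print(x_tilde)  # approximately [3.0, -1.0, 0.7] = (x, beta)
```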