Notes - Continuous Mathematics HT23, Root-Finding
Flashcards
Two different stopping conditions for an iterative algorithm are
\[\begin{aligned}
|x_n - x_{n-1}| &< \text{tol} \\\\
|x_n - x_{n-1}| &< \text{tol}|x_n|
\end{aligned}\]
where the first detects small changes in absolute terms and the second detects small changes relative to the size of $x _ n$. How are these two conditions often combined?
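A minimal sketch of these tests in Python; the function names and default tolerance are made up for illustration, and the combined form is an assumed common convention rather than anything stated in these notes:

```python
def stop_absolute(x_new, x_old, tol=1e-10):
    """Stop when successive iterates differ by less than tol in absolute terms."""
    return abs(x_new - x_old) < tol

def stop_relative(x_new, x_old, tol=1e-10):
    """Stop when successive iterates differ by less than tol relative to |x_new|."""
    return abs(x_new - x_old) < tol * abs(x_new)

def stop_combined(x_new, x_old, tol=1e-10):
    """One common (assumed) way to combine the two: behaves like the absolute
    test when x_new is small and like the relative test when x_new is large."""
    return abs(x_new - x_old) < tol * (1 + abs(x_new))
```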
What is a bracket, given some function $f$?
An interval $[a _ 0, b _ 0]$ with $f(a _ 0)f(b _ 0) < 0$.
When using interval bisection, what is an upper bound on the error $ \vert \epsilon _ n \vert $?
What’s a big issue that stops you using interval bisection?
You need to find a starting bracket.
Newton’s method
Can you give an example of a function where Newton’s method fails for almost every starting point?
\[f(x) = \sqrt[3]{x}\]
Newton’s method involves evaluating derivatives. Why can this be bad in two ways?
The derivative might be zero (making the iteration undefined), and it might be expensive to compute.
What is the expression for $x _ {n+1}$ in terms of $x _ n$ for finding the roots of $f : \mathbb R \to \mathbb R$ using Newton’s method?
Where does Newton’s method essentially come from?
Approximating a function $f$ by its first-order Taylor expansion and then solving for the root of the approximation.
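Written out: approximating $f$ near $x _ n$ by $f(x) \approx f(x _ n) + f'(x _ n)(x - x _ n)$ and solving the approximation for zero gives
\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]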
There are two useful theorems that let you examine the error of Newton’s method. One of these places an upper bound on $\epsilon _ {n+1}$ given that $x _ n$ is in an interval $(x^\star - c, x^\star + c)$, where $x^\star$ is the root. Can you state the theorem in full?
Let $f$ have two continuous derivatives and $f(x^\star) = 0$. Let $\epsilon _ n = x _ n - x^\star$. Then if $x _ n \in I = (x^\star - c, x^\star + c)$ and
\[A(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
then
\[|\epsilon_{n+1}| \le \frac{A(c)}{2}\epsilon_n^2\]
There’s a theorem bounding the error at a given step $x _ n$ of Newton’s method that states:
Let $f$ have two continuous derivatives and $f(x^\star) = 0$. Let $\epsilon _ n = x _ n - x^\star$. Then if $x _ n \in I = (x^\star - c, x^\star + c)$ and
\[A(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
then
\[|\epsilon_{n+1}| \le \frac{A(c)}{2}\epsilon_n^2\]
What condition does this allow you to place on the convergence of a given $x _ 0 \in I$?
If $x _ 0 \in I$, and further $\frac{cA(c)}{2} < 1$, then Newton’s method converges at least quadratically to $x^\star$.
There are two useful theorems that let you analyse the error of Newton’s method. One of these guarantees the quadratic convergence of $x _ n$ starting at $x _ 0 = m$ given that the root $x^\star$ is in an interval $(m - c, m + c)$ and that the interval $I = (m - 2c, m + 2c)$ satisfies some necessary conditions. Can you state the theorem in full?
If $x^\star \in (m - c, m + c)$ and for $I = (m - 2c, m + 2c)$, the function
\[B(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
satisfies
\[\frac{cB(c)}{2} < 1\]
then this implies Newton’s method starting at $x _ 0 = m$ converges to $x^\star$.
If $f$ has two continuous derivatives and $f(x^\star) = 0$, and defining $\epsilon _ n = x _ n - x^\star$, then if $x _ n \in I = (x^\star - c, x^\star + c)$ and
\[A(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
then
\[|\epsilon_{n+1}| \le \frac{A(c)}{2}\epsilon_n^2\]
and furthermore, as a consequence if $x _ 0 \in I$, and further $\frac{cA(c)}{2} < 1$, then Newton’s method converges at least quadratically to $x^\star$. Why can there only be one root (including repeated roots) in the interval $I$?
Otherwise, by Rolle’s theorem, $\frac{\text d f}{\text d x}$ would vanish at a turning point between them, making the denominator of $A(c)$ zero, so $A(c)$ would not be defined.
Secant method
What is the main idea behind the secant method?
Approximate the function linearly with a secant between two points.
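Concretely, replacing the derivative in the Newton update with the slope of the secant through $(x _ {n-1}, f(x _ {n-1}))$ and $(x _ n, f(x _ n))$ gives
\[x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}\]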
How many points do you need to start the secant method?
Two.
What’s the formula for $x _ {n+1}$ in terms of $x _ n$ and $x _ {n-1}$ when root-finding with the secant method?
When might you want to use the secant method over Newton’s method?
When calculating the derivative of a function is costly.
What is the order of convergence for the secant method?
Other methods
What is the root finding algorithm where you use the second-order Taylor approximation of a function called?
Halley’s method.
What is the root-finding algorithm that uses a quadratic interpolating polynomial (i.e. a derivative-free version of Halley’s method) called?
Muller’s method.
When are Halley’s method (the root-finding algorithm that uses the second-order Taylor approximation) and Muller’s method (which uses the quadratic interpolating polynomial) not very good?
They run into difficulty when the approximating quadratic doesn’t have a real root.
Halley’s method and Muller’s method both approximate a function $f(x)$ by a quadratic $f(x) \approx ax^2 + bx + c$ and find the root via the quadratic formula. What’s the method that cleverly solves their problems by instead fitting $x = ay^2 + by + c$ (which can always be evaluated at $y = 0$ to give a root estimate) called?
Inverse Quadratic Interpolation.
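Spelling the idea out: fit $x = ay^2 + by + c$ through the three most recent points $(y _ i, x _ i)$, where $y _ i = f(x _ i)$; the next iterate is the fitted curve evaluated at $y = 0$, i.e.
\[x_{n+1} = a \cdot 0^2 + b \cdot 0 + c = c\]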
What’s the order of convergence for inverse quadratic interpolation (IQI)?
Newton’s method in $d$ dimensions
When deriving a version of Newton’s method for $\pmb f : \mathbb R^d \to \mathbb R^d$, what do you break $\pmb f(\pmb x) = \pmb 0$ into?
Deriving Newton’s method in $d$ dimensions involves splitting $\pmb f(\pmb x) = \pmb 0$, for $\pmb f : \mathbb R^d \to \mathbb R^d$, into
\[\begin{aligned}
f_1(\pmb x&) = 0 \\\\
f_2(\pmb x&) = 0 \\\\
& \vdots \\\\
f_d(\pmb x&) = 0
\end{aligned}\]
Now how do we approximate each of these about $\pmb x _ 0$ to get $d$ linear equations?
Deriving Newton’s method in $d$ dimensions involves splitting $\pmb f(\pmb x) = \pmb 0$, for $\pmb f : \mathbb R^d \to \mathbb R^d$, into
\[\begin{aligned}
f_1(\pmb x&) = 0 \\\\
f_2(\pmb x&) = 0 \\\\
& \vdots \\\\
f_d(\pmb x&) = 0
\end{aligned}\]
and then approximating each as
\[\begin{aligned}
f_1(\pmb x_0)+ f_1'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
f_2(\pmb x_0)+ f_2'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
&\vdots \\\\
f_d(\pmb x_0)+ f_d'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
\end{aligned}\]
How can you collect this all into one nice matrix equation?
Newton’s method in $d$-dimensions approximates $\pmb f(\pmb x) = \pmb 0$ as $d$ linear equations
\[\begin{aligned}
f_1(\pmb x_0)+ f_1'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
f_2(\pmb x_0)+ f_2'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
&\vdots \\\\
f_d(\pmb x_0)+ f_d'(\pmb x_0&)^\intercal(\pmb x - \pmb x_0) = 0 \\\\
\end{aligned}\]
which is then collected into the nice matrix equation
\[(\mathbf J(\pmb f)(\pmb x_0))(\pmb x - \pmb x_0) = -\pmb f(\pmb x_0)\]
What iteration does this give rise to for $\pmb x _ {n+1}$ in terms of $\pmb x _ n$?
What type of convergence do you get for Newton’s method in $d$ dimensions?
Quadratic.
How many evaluations of $\pmb f$ are required at each step for Newton’s method in $d$ dimensions?
How many derivatives need to be calculated at each step for Newton’s method in $d$ dimensions?
How many floating point operations need to be done at each step for Newton’s method in $d$ dimensions, and why?
$O(d^3)$, for solving the matrix equation involving the Jacobian at each step.
What is the iteration for $\pmb x _ {n+1}$ in terms of $\pmb x _ n$ for Newton’s method on $\pmb f(\pmb x) = \pmb 0$ in $d$ dimensions?
Newton’s method in $d$ dimensions has iterations that look like
\[\pmb x_{n+1} = \pmb x_n - (\mathbf J(\pmb f)(\pmb x_n))^{-1}\pmb f(\pmb x_n)\]
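A minimal sketch of this iteration in Python/NumPy; the function names, tolerance, and iteration cap are assumptions:

```python
import numpy as np

def newton_nd(f, jacobian, x0, tol=1e-10, max_iter=50):
    """Newton's method in d dimensions: solve J(f)(x_n) dx = -f(x_n)
    and set x_{n+1} = x_n + dx until the step is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Solve the linear system rather than forming the inverse explicitly.
        dx = np.linalg.solve(jacobian(x), -f(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Example: intersect the unit circle with the line y = x.
f = lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]])
J = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton_nd(f, J, np.array([1.0, 0.5]))  # ~ (1/sqrt(2), 1/sqrt(2))
```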
What is $\pmb{\Delta x}$?
Newton’s method in $d$ dimensions has iterations that look like
\[\pmb x_{n+1} = \pmb x_n + \pmb{\Delta x}\]
where
\[\pmb{\Delta x} = - (\mathbf J(\pmb f)(\pmb x_n))^{-1}\pmb f(\pmb x_n)\]
There’s a technique called “damping” that makes Newton’s method more robust (at the cost of slower convergence). This involves computing $\pmb{\Delta x}$ and then seeing if $\pmb x _ {n+1}$ is a better solution than $\pmb x _ n$. What’s a sensible way to define “better” here?
Newton’s method in $d$ dimensions has iterations that look like
\[\pmb x_{n+1} = \pmb x_n + \pmb{\Delta x}\]
where
\[\pmb{\Delta x} = - (\mathbf J(\pmb f)(\pmb x_n))^{-1}\pmb f(\pmb x_n)\]
There’s a technique called “damping” that makes Newton’s method more robust (at the cost of slower convergence). This involves computing $\pmb{\Delta x}$ and then seeing if $\pmb x _ {n+1}$ is a better solution than $\pmb x _ n$, e.g. if
\[||\pmb f(\pmb x_{n+1})|| \le ||\pmb f(\pmb x_n)||\]
If this isn’t the case, how does damped Newton’s method modify the iteration?
It computes $\pmb x _ {n+1}$ as
\[\pmb x_{n+1} = \pmb x_n + \lambda\pmb{\Delta x}\]
for some $\lambda \in (0, 1)$.
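A minimal sketch of a damped step in Python/NumPy, halving $\lambda$ until the residual no longer increases; the halving schedule and cutoff are assumptions, since the notes only require $\lambda \in (0, 1)$:

```python
import numpy as np

def damped_newton_step(f, jacobian, x):
    """One damped Newton step: compute dx, then shrink lambda until
    ||f(x + lambda * dx)|| <= ||f(x)||."""
    fx = f(x)
    dx = np.linalg.solve(jacobian(x), -fx)
    lam = 1.0
    while np.linalg.norm(f(x + lam * dx)) > np.linalg.norm(fx) and lam > 1e-8:
        lam *= 0.5  # assumed backtracking schedule
    return x + lam * dx
```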
Quasi-Newton methods in $d$ dimensions
The idea of quasi-Newton methods in $d$ dimensions is to approximate the Jacobian of $\pmb f$ in the same way that the secant method approximates the derivative of some $f$. In the secant method, we approximate the derivative by
\[f(x_n) = f(x_{n-1}) + \hat{d_n} (x_n - x_{n-1})\]
What’s the analogous secant equation in $d$ dimensions?
The idea of quasi-Newton methods in $d$ dimensions is to approximate the Jacobian of $\pmb f$ in the same way that the secant method approximates the derivative of some $f$. In the secant method, we approximate the derivative by
\[f(x_n) = f(x_{n-1}) + \hat{d_n} (x_n - x_{n-1})\]
The analogous secant equation in $d$ dimensions is
\[\pmb f(\pmb x_n) = \pmb f(\pmb x_{n-1}) + \hat{\mathbf J}_n (\pmb x_n - \pmb x_{n-1})\]
but a big issue here is that $\hat{\mathbf J} _ n$ is underdetermined and so we need to add more constraints. What are some example constraints used?
- Minimising the “distance” (with respect to a matrix norm) between successive approximate Jacobians
- Requiring that the update doesn’t change vectors orthogonal to $\pmb x _ n - \pmb x _ {n-1}$ (these constraints lead to the rank-one update sketched below).
What’s the name of the root-finding method analogous to the secant method in $d$ dimensions?
Broyden’s method.
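For reference, imposing the secant equation together with the constraint that $\hat{\mathbf J} _ n$ agrees with $\hat{\mathbf J} _ {n-1}$ on vectors orthogonal to $\pmb{\Delta x} = \pmb x _ n - \pmb x _ {n-1}$ gives Broyden’s rank-one update:
\[\hat{\mathbf J}_n = \hat{\mathbf J}_{n-1} + \frac{\left(\pmb f(\pmb x_n) - \pmb f(\pmb x_{n-1}) - \hat{\mathbf J}_{n-1}\pmb{\Delta x}\right)\pmb{\Delta x}^\intercal}{\pmb{\Delta x}^\intercal\pmb{\Delta x}}\]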
What type of convergence do you get for Broyden’s method for root-finding in $d$ dimensions?
Superlinear (better than linear), once the iterates are sufficiently close to a root.
How many evaluations of $\pmb f$ are required at each step for Broyden’s method for root-finding in $d$ dimensions?
How many derivatives need to be calculated at each step for Broyden’s method for root-finding in $d$ dimensions?
$0$ per step, but potentially $d^2$ at the start (to initialise $\hat{\mathbf J} _ 0$).
Proofs
Prove that
If $f$ has two continuous derivatives and $f(x^\star) = 0$, and defining $\epsilon _ n = x _ n - x^\star$, then if $x _ n \in I = (x^\star - c, x^\star + c)$ and
\[A(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
then
\[|\epsilon_{n+1}| \le \frac{A(c)}{2}\epsilon_n^2\]
and furthermore, as a consequence if $x _ 0 \in I$, and further $\frac{cA(c)}{2} < 1$, then Newton’s method converges at least quadratically to $x^\star$.
Todo.
Prove that if $x^\star \in (m - c, m + c)$ and for $I = (m - 2c, m + 2c)$, the function
\[B(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
satisfies
\[\frac{cB(c)}{2} < 1\]
then this implies Newton’s method starting at $x _ 0 = m$ converges to $x^\star$.
Todo.
If $f$ has two continuous derivatives and $f(x^\star) = 0$, and defining $\epsilon _ n = x _ n - x^\star$, then if $x _ n \in I = (x^\star - c, x^\star + c)$ and
\[A(c) = \frac{\max_{\beta \in I} \left|\frac{\text d^2 f}{\text d x^2}(\beta)\right|}{\min_{\alpha \in I} \left|\frac{\text d f}{\text d x}(\alpha)\right|}\]
then
\[|\epsilon_{n+1}| \le \frac{A(c)}{2}\epsilon_n^2\]
and furthermore, as a consequence if $x _ 0 \in I$, and further $\frac{cA(c)}{2} < 1$, then Newton’s method converges at least quadratically to $x^\star$. Prove that such an interval always exists.
Todo.
Programs
Implement Newton’s method for finding the roots of a given function.
Todo.
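A minimal sketch in the meantime, in Python; the signature, tolerance, and iteration cap are assumptions:

```python
def newton(f, df, x0, tol=1e-10, max_iter=50):
    """Newton's method: iterate x_{n+1} = x_n - f(x_n) / f'(x_n)
    until successive iterates agree to within tolerance."""
    x = x0
    for _ in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative vanished at x = %r" % x)
        x_new = x - f(x) / dfx
        if abs(x_new - x) < tol * (1 + abs(x_new)):  # combined stopping test
            return x_new
        x = x_new
    return x

# Example: the positive root of f(x) = x^2 - 2.
# newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)  # ~ 1.41421356...
```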
Implement interval bisection for finding the roots of a given function, that initially searches for a starting bracket.
Todo.
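A minimal sketch in the meantime; the outward-expanding bracket search is one possible strategy, an assumption since the notes don't specify how to search:

```python
def find_bracket(f, x0=0.0, step=1.0, max_expansions=50):
    """Search outwards from x0 for an interval [a, b] with f(a) * f(b) < 0."""
    a, b = x0 - step, x0 + step
    for _ in range(max_expansions):
        if f(a) * f(b) < 0:
            return a, b
        step *= 2  # expand the search interval geometrically
        a, b = x0 - step, x0 + step
    raise ValueError("no bracket found")

def bisect(f, tol=1e-10):
    """Interval bisection: halve a bracket [a, b] with f(a) * f(b) < 0
    until it is shorter than tol."""
    a, b = find_bracket(f)
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

# Example: bisect(lambda x: x**3 - 2)  # ~ 1.25992105 (cube root of 2)
```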
Implement the inverse quadratic interpolation method for finding the roots of a given function.
Todo.
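A minimal sketch in the meantime, using the Lagrange form of the interpolant evaluated at $y = 0$; the choice of three starting points and the stopping test are assumptions:

```python
def iqi(f, x0, x1, x2, tol=1e-10, max_iter=50):
    """Inverse quadratic interpolation: fit x as a quadratic in y = f(x)
    through the three most recent points and evaluate the fit at y = 0."""
    for _ in range(max_iter):
        y0, y1, y2 = f(x0), f(x1), f(x2)
        # Lagrange form of the interpolating quadratic x(y), evaluated at y = 0.
        x3 = (x0 * y1 * y2 / ((y0 - y1) * (y0 - y2))
              + x1 * y0 * y2 / ((y1 - y0) * (y1 - y2))
              + x2 * y0 * y1 / ((y2 - y0) * (y2 - y1)))
        if abs(x3 - x2) < tol:
            return x3
        x0, x1, x2 = x1, x2, x3  # keep the three most recent iterates
    return x2

# Example: iqi(lambda x: x**2 - 2, 0.0, 1.0, 2.0)  # ~ 1.41421356...
```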