Fundementals

Floating point

Computers don’t use exact numbers when preforming calculations, but instead use a floating point system.

Definition

A floating point number x is expressed as

x = S \times 2^E \times 1.b_1b_2\ldots b_{52}

Such that S represents the sign of the number, + or -. E is an exponent, and b_1 through b_52 are binary digits.

Example

Suppose we have a variable x = 6 and we wish to express it as a floating point number. We first convert the number to binary

6_{10} = 110_{2}

We now specify where the decimal is, by adding an exponent

6_{10} = 110_{2} = 2^2\times1.10_2

And finally adding the sign

6_{10} = 110_{2} = 1\times2^2\times1.10_2

We can also express fractions, particular ones without a terminating digit. For example,

0.3 = 1 \times 2^{-2} \times 1.00110011001\ldots

Terminating after 52 digits.

Loss of signifigance

Computers themselves store numbers as a collection of 64 bits

x = s_1e_1\ldots e_{11}m_1\ldots m_{52}

One bit for the sign of a number, 11 bits for the exponent, and 52 bits for the mantissa (digits after the decimal place).

Clearly this means that only a finite amount of numbers can be expressed in floating point, and numbers are rounded to their nearest corresponding floating point. This leads to a loss of signifigance, and means functions with extremly large or small inputs will be catastrophically wrong, as the difference between two numbers is completly outside of the range floating point can represent.

Take the identical functions f(x) and g(x)

f(x) = \frac{1 - \cos(x)}{\sin^2(x)}

g(x) = \frac{1}{1 + \cos(x)}

f(x) suffers from an egregious loss of signifigance error as x decreases towards 0, But g(x) won’t.

Rounding

The number 101.101 rounds up to 110, because the digits after the decimal are above .1. If they aren’t the number terminates at the decimal. If 101.1\overline{0} then we get 110, but if we have 110.1\overline{0}, we drop the number. This is based on the digit in the units place. If it is a 1, we round up, if it is not, we drop it.

Taylor series

The taylor expansion of a function f(x) is given by

f(x) = \sum^m_{k=0} \frac{f^{(k)}(x)(x-x_0)^k}{k!} + \frac{f^{(m + 1)}(c)(x-x_0)^{m+1}}{(m+1)!}

Example

Take the function f(x) = e^x We can taylor expand it to get

f(x) = e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots + \frac{x^k}{k!} + \frac{e^cx^{k+1}}{k+1!}

We can also write the Taylor series

f(x + h) = \sum^m_{k=0} \frac{h^kf^{(k)}(x)}{k!}

Example

Find the error

\begin{align*} y^\prime(x) &= ay(x) + by(x + h) + \varepsilon\\ y^\prime(x) &= ay(x) + b\left(y(x) - hy^\prime(x) + \frac{h^2}{2}y^{\prime\prime}(c_1)\right) + \varepsilon\\ y^\prime(x) &= (a+b)y(x) - bhy^\prime(x) + \frac{bh^2}{2}y^{\prime\prime}(c_1) + \varepsilon\\ \end{align*}

a = -b\quad b = -\frac{1}{h}\quad

Thus we get

y^\prime(x) = \frac{y(x+h) - y(x)}{h} + \varepsilon

with an error \varepsilon = -\frac{h}{2}y^{\prime\prime}(c_1).

When we plug these numbers into the computer however, we are going to have an error as we have only 52 digits of precision. This gives us the machine error \epsilon. So if we wanted to numerically approximate the derivative

y^\prime(x) = \frac{y(x+h) - y(x) + 2\epsilon y}{h} + \varepsilon

This gives us the total error, the sum of the machine error, and the taylor error.

\varepsilon_T = \epsilon + \varepsilon

\varepsilon_T = \frac{2\epsilon |y|}{2h} + \frac{2h^2}{12}|y^{\prime\prime\prime}(x)|

By computing the derivative, we can find the smallest total error.

\frac{d\varepsilon_T}{dh} = -\frac{1}{h^2}\epsilon |y| + \frac{h}{3}|y^{\prime\prime\prime}| = 0

Solving for h

h_{opt} =\left(\frac{3\epsilon |y|}{|y^{\prime\prime\prime}|}\right)^\frac{1}{3}

For example, lets try the function y = e^x. This gives us an optimal error

h_{opt} = \left( 3 \times 2^{-52} \frac{e^x}{e^x} \right)^\frac{1}{3}

Example

Lets find the optimal value of h using a different error.

\varepsilon = -\frac{h}{2}y^{\prime\prime}(c_1)

We can rewrite our equation

y^\prime(x) = \frac{y(x+h) - y(x) + 2\epsilon y}{h} + -\frac{h}{2}y^{\prime\prime}(c_1)

Then

\varepsilon_T = \frac{2\epsilon |y|}{h} + \frac{h}{2}y^{\prime\prime}

Taking the derivative

\frac{d \varepsilon_T}{dh} = -\frac{2\epsilon |y|}{h^2} + \frac{1}{2}y^{\prime\prime} = 0

We get

\frac{2\epsilon |y|}{h^2} = \frac{1}{2}y^{\prime\prime}

Thus

h_{opt} = \left(\frac{4\epsilon |y|}{|y^{\prime\prime}|}\right)^\frac{1}{2}

Example

Lets find the error of the following expression

y^{\prime\prime} = ay(x-h) + by(x) + cy(x+h) + \varepsilon\\

\begin{align*} y^{\prime\prime} &= a\left(y(x)-hy^\prime(x) + \frac{h^2}{2}y^{\prime\prime}(x) - \frac{h^3}{6}y^{\prime\prime\prime}(x) + \frac{h^4}{4!}y^{(4)}(x) \right)\\ &+ by(x) \\ &+ c\left(y(x)+hy^\prime(x) + \frac{h^2}{2}y^{\prime\prime}(x) + \frac{h^3}{6}y^{\prime\prime\prime}(x) + \frac{h^4}{4!}y^{(4)}(x) \right) \end{align*}

Thus

\begin{align*} (a + b + c) = 0\\ h(c-a) = 0\\ \frac{h^2}{2}(a-c) = 1 \end{align*}

Thus we have constants

c = a\quad a = \frac{1}{h^2} \quad b = -\frac{2}{h^2}

And we’re left with the error

\begin{align*} \varepsilon = -\frac{h^2}{2y}\left(y^{(4)}(c_1)+y^{(4)}(c_2)\right) \end{align*}

Thus we have

y^{\prime\prime}(x) = \frac{y(x-h)-2y(x)+y(x+h)}{h^2} + \varepsilon

Lets find the optimal error. First we include the machine error

\epsilon_m = \frac{4\epsilon |y|}{h^2}

Combine with our function error \varepsilon_{T} = \frac{4\epsilon |y|}{h^2} + \frac{h^2}{2y}\left(y^{(4)}\right)

Continue rest later.

Example again

Example

Taylor expand f(x) = (1+x)^{1/2}

We get clearly

f^{(k)}(x) = \frac{(-1)^{k+1}}{2^k}(3\cdot 5\cdots (2k-3))(1+x)^{\frac{-2k-1}{2}}

So our taylor expansion

f(x) = (1+x)^{1/2} = 1 + \frac{x}{2} - \frac{1}{2^2}\frac{x^2}{2!} + \frac{3}{2^3}\frac{x^3}{3!} \cdots + \frac{-1^{k+1}}{2^k}(1\cdot 3\cdot 5\cdots (2k-3))\frac{x^k}{k!}

Newtons method

Example

Write the Newton Iteration to solve

x = \cos x

Note that

f(x) = x - \cos x

f^\prime(x) = 1 + \sin x

x_{m+1} = x_m - \frac{x_m - \cos x_m}{1 + \sin x_m}

Remember that the error of Newtons method is approximatly equal to the multiplicity of the root

\varepsilon_{m+1} \approxeq \varepsilon_m^2 \bigg\lvert \frac{f^{\prime\prime}(a)}{2f^\prime(a)} \bigg\rvert

Example

Write the Newton Iteration to solve

f(x) = x^\alpha\quad \alpha>0

We want to write newtons iteration with an initial guess x_0 \not = 0, and find for what values of \alpha does newtons iteration converge to root?

First

x_{m+1} x_m - \frac{x^\alpha_m}{\alpha x^{\alpha-1}_m} = x_m\left(1-\frac{1}{\alpha}\right)

This will converge if

-1 < 1 - \frac{1}{\alpha} < 1

Thus

\frac{1}{\alpha} < 2 \implies \alpha > \frac{1}{2}

Provided the multiplicity of the root is 1, it converges quadratically. Otherwise the newton method converges linearly, with the rate equal to S = \frac{m-1}{m}.

Lets assume you can’t calculate the derivative. We instead use the approximation

f^\prime(x) \approx \frac{f(x + h)-f(x)}{h}

Or we use the secant method.

f^\prime(x_m) \approx \frac{f(x_m) - f(x_{m-1})}{x_m - x_{m-1}}

This is useful when f(x_m) is expensive to compute, but the disadvantage is the convergence is slightly slower.

\varepsilon_{m+1} \approx \varepsilon_m^\alpha \bigg\lvert \frac{f^{\prime\prime}(r)}{2f^\prime(r)}\bigg\rvert

Combination of Newton and Bisection

First we draw a secant line from the points a to b. This gives us an equation

y = f(a) + (x - a)\frac{f(b)-f(a)}{b-a}

Which gives us

x = \frac{a(f(b)-f(a))(b-a)}{f(b)-f(a)} = \frac{af(b)-bf(a)}{f(b)-f(a)}

Interpolation

Given m points (x_1,y_1) through (x_m,y_m), we want to find a continous function P_{m-1}(x) such that for all i = 1,\ldots,m

P_{m-1}(x_i) = y_i

We can find this function through Lagrange interpolation.

Define

L_k(x) = \frac{(x-x_1)(x-x_2)\cdots(x-x_{k-1})(x-x_{k+1})\cdots(x-x_m)}{(x_k-x_1)(x_k-x_2)\cdots(x_k - x_{k-1})(x_k - x_{k+1})\cdots(x_k - x_m)}

We can clearly see that

L_k(x_k) = 1,\quad L_k(x_i) = 0

And that L_k is a polynomial of degree m-1. Here we can define

P_{m-1}(x) = y_1L_1(x) + \cdots + y_mL_m(x)

As a polynomial of degree \leq m-1. This is called the Lagrange polynomial. This gives us a function P_{m-1}(x) such that

P_{m-1}(x_i) = y_1L_1(x_i) + \cdots + y_mL_m(x_i) = y_i

Which is pretty cool. But even better, this has the special property. Assume there exists some true function f(x) that we are interpolating at a finite amount of points through P_{m-1}(x). Then we have the error

y(x) - P_{m-1}(x) = \frac{y^{(m)}(c)}{m!}(x-x_1)(x-x_2)\cdots(x-x_m)

Where c \in (x_1,x_m). Note that we can make m as large as we want by adding more points, and due to the m! in the denominator, we can make our P_{m-1}(x) accurate incredibly quickly.

Example

Interpolate y(x) = \sin(x) using interpolation points

x_1 = 0\; x_2 = \frac{\pi}{2} \; x_3 = \pi

And compute the error.

We can see

y_1 = 0 \; y_2 = 1 \; y_3 = 0

We therefore want a polynomial of degree two that interpolates the function.

P_2(x) = y_1L_1 + y_2L_2 + y_3L_3

P_2(x) = (0)\frac{(x-\pi /2)(x-\pi)}{(0 - \pi/2)(0-\pi)} + (1)\frac{(x-0)(x-\pi)}{(\pi /2 - 0)(\pi /2 - \pi)} + (0)\frac{(x-0)(x- \pi /2)}{(\pi - 0) (\pi - \pi / 2)}

P_2(x) = -x(x-\pi)\frac{4}{\pi^2}

Thus we have a function that approximates \sin(x) using a parabola. Now we need to compute the error.

\varepsilon = \frac{f^{(m)}(c)}{m!}(x-x_1)\cdots(x-x_m)

\varepsilon = \frac{-\cos(c)}{6}(x-0)\left(x-\frac{\pi}{2}\right)(x-\pi)

Taking the absolute value, and the maximum value of cosine, we get our error |\varepsilon| \leq \frac{1}{6}|x\left(x-\frac{\pi}{2}\right)(x-\pi)|

We can find the maximum error taking the derivative of \varepsilon

\varepsilon^{\prime} = 3x^2 + 3\pi x + \frac{\pi^2}{2} = 0

We get

x_{\text{max}} = \frac{-3\pi \pm \pi \sqrt{3}}{6}

Define the function

h(x) = y(x) - P_{m-1}(x) - c(x-x_i)(x-x_2)\cdots(x-x_m)

This has the property that for any i \in 1,\ldots,m

h(x_i) = 0

We can choose a constant c such that for any x^* \in (x_1,x_m)

h(x^*) = 0

This gives us a function with m+1 zeros, specifically a polynomial of degree m+1. We can use Rolle’s theorem in order to get a function h^\prime(x) which has m roots. We can repeat this process until h^{(m)}(x) has one zero. That being

h^{(m)}(c) = 0 = y^{(m)}(c) - 0 - C(m!)

Thus

C = \frac{y^{(m)}(c)}{m!}

That directly leads to our error formula

y(x) - P_{m-1}(x) = \frac{y^{(m)}(c)}{m!}(x-x_1)(x-x_2)\cdots(x-x_m)

You can choose specific x_1,x_2,x_3 in order to make the error as small as possible.

Error

Suppose that we have an error

\varepsilon = \frac{h^3}{4}f^{(4)}(c_1) + \frac{h^3}{8}f^{\prime\prime}(c_2)

We have a simplified expression

\varepsilon = h^3(\frac{1}{4}+\frac{1}{8})f^{\prime\prime}(c)

For some c.

Multi-panel integration formula

\int\limits^b_a f(s) ds = \int\limits^{a+h}_a f(s) ds + \int\limits^{a+2h}_{a+h} f(s) ds \cdots \int\limits^{b}_{b-h} f(s) ds

Using the formula from last class, this is equal too

\int\limits_a^{a+h}f(s)ds = \frac{h}{2}(f(a)+f(a+h))-\frac{h^3}{12} f^{\prime\prime}(c_1)

Expanding our initial integral, we can see it’s telescoping. This gives

\int\limits^b_a f(s) ds = \frac{h}{2}\left(f(a)+2f(a+h)\cdots2f(b-h) + f(b)\right) - \varepsilon

\varepsilon = \frac{h^3}{12}(f^{\prime\prime}(c_1)\cdots f^{\prime\prime}(c_m))

Using the generalized intermediate value theorem, we can simplify

\varepsilon= \frac{h^3}{12}mf^{\prime\prime}(c) = \frac{(a-b)h^2}{12}f^{\prime\prime}(c)

Which gives us an error for the trapezoidal formula.

for what m is the error less than 10^{-6}/

Simpson integration

\mathrlap{\int\limits_x^{x+2h}}\quad\quad f(s) \,ds = \frac{h^3}{3}(f(x) + 4f(x+h) + f(x+2h)) - \frac{h^5}{90}f^{(4)}(c)