Approximations Of The Inverse Error Function

Introduction

When dealing with statistics or generating random numbers, one eventually encounters both the error function and the inverse error function. While some programming languages include the error function in their standard math libraries, the inverse error function is typically excluded. In this article I will discuss the inverse error function, some of its uses, and several different methods of approximating it.

Definition

The error function (see Figure 1) is the result of integrating a normalized Gaussian function (Normal Distribution).

The error function, \(erf(x)\), is defined as:
\(erf(x)\equiv \frac{2}{\sqrt{\pi}} \int_{0}^{x}e^{-t^2}dt \)

Figure 1: The error function.

The inverse error function (see Figure 2), \(erf^{-1}(x)\), needs to satisfy:
\(erf(erf^{-1}(x))=x\)
\(erf^{-1}(erf(x))=x\)

Figure 2: The inverse error function. Its domain is \(-1 \leq x \leq 1 \).

Unfortunately, there is no closed-form expression (one that evaluates in a finite number of operations) for the inverse error function. Instead, the inverse error function is typically defined by an infinite series expansion:
\(erf^{-1}(x)=\sum _{k=0}^{\infty }{\frac {c_{k}}{2k+1}}\left({\frac {\sqrt {\pi }}{2}}x\right)^{2k+1}\)
Where \(c_{0} = 1\) and \(c_{k}\) is:
\(c_{k}=\sum _{m=0}^{k-1}{\frac {c_{m}c_{k-1-m}}{(m+1)(2m+1)}}\)
The first few terms of the inverse error function are:
\(erf^{-1}(x)=\frac{\sqrt {\pi}}{2} \left( x+ \frac{\pi}{12}x^{3} +\frac{7\pi ^{2}}{480}x^{5}+{\frac {127\pi ^{3}}{40320}}x^{7}+{\frac {4369\pi ^{4}}{5806080}}x^{9}+\cdots \right) \)
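
As a quick check of the recursion, the first few coefficients work out to \(c_1 = 1\), \(c_2 = \tfrac{7}{6}\), and \(c_3 = \tfrac{127}{90}\); multiplying \(\tfrac{c_k}{2k+1}\) by \((\pi/4)^k\) reproduces the \(\tfrac{\pi}{12}\), \(\tfrac{7\pi^2}{480}\), and \(\tfrac{127\pi^3}{40320}\) coefficients above.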

Why is it important?

At first, second, and even third glance it is very hard to see the importance of the inverse error function. What makes it so important is that it allows you to take uniformly (flat) distributed random numbers and transform them into a Gaussian distribution.
As seen in Figure 3, we generated random numbers in [-1,1] from a flat distribution and histogrammed them. We then passed each of those random numbers through the inverse error function. Histogramming the results shows that the flat distribution has been transformed into a Gaussian distribution.

\(\overset{erf^{-1}}{\longrightarrow}\)

Figure 3: The inverse error function transforms a flat random generator into a Gaussian random generator.
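
If you want to reproduce the Figure 3 experiment yourself, it can be sketched in a few lines of Python (assuming NumPy and SciPy are available; scipy.special.erfinv serves here as a reference implementation of \(erf^{-1}\)):

```python
# A minimal sketch of the Figure 3 experiment: flat samples in, Gaussian samples out.
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(0)
flat = rng.uniform(-1.0, 1.0, size=100_000)   # flat (uniform) samples on [-1, 1]
gaussian = erfinv(flat)                       # Gaussian samples (standard deviation 1/sqrt(2))

# Histogram both sets to reproduce the before/after comparison of Figure 3.
flat_hist, _ = np.histogram(flat, bins=100)
gauss_hist, _ = np.histogram(gaussian, bins=100)
```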

Besides this useful result, the inverse error function is used extensively in diffusion calculations and statistics, since the error function is closely tied to the CDF of the normal distribution and the inverse error function to its quantile (inverse CDF) function: \(\Phi^{-1}(p) = \sqrt{2}\, erf^{-1}(2p-1)\).

Floating Point Precision and Error

Since we don’t have an easily computed expression for the inverse error function, we need a way to decide whether the accuracy of an approximation is good enough. What does “good enough” mean?
The answer depends on what type of floating point number you are using. As long as the approximation is more accurate than the floating point type’s precision, it cannot be distinguished from the exact function.

Type   | Total Bits | Sign Bit | Exponent Bits | Significand Bits | Precision
Half   | 16         | 1        | 5             | 10               | 9.77E-4
Single | 32         | 1        | 8             | 23               | 1.19E-7
Double | 64         | 1        | 11            | 52               | 2.22E-16

The table above lists the precision (machine epsilon) of each floating point type. Therefore, for a good approximation, all we have to ensure is that the error of the approximation is less than the precision of the floating point type we are using.
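
As a quick sanity check, the precision column can be reproduced with NumPy's finfo (a small illustrative snippet, assuming NumPy is available):

```python
# Machine epsilon for each floating point type, matching the Precision column above.
import numpy as np

for name, dtype in [("Half", np.float16), ("Single", np.float32), ("Double", np.float64)]:
    print(f"{name:6s} {np.finfo(dtype).eps:.2e}")
# Half   9.77e-04
# Single 1.19e-07
# Double 2.22e-16
```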

Calculating the Error

The absolute error is the difference between the calculated value \(x_0\) and the real value \(x\):
\(ABS_{Error} = x_0 - x\)
Unfortunately, as previously stated, we don't have a way to directly calculate the real value of the inverse error function. However, by definition \(erf^{-1}(erf(x))=x\). Since we can calculate \(erf(x)\), the \(ABS_{Error}\) can be calculated as:
\(ABS_{Error}(x) = erf^{-1}(erf(x)) - x\)
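
In code, the measurement looks something like the following sketch (erfinv_approx and my_erfinv stand in for whichever candidate approximation is being tested; Python's math.erf provides the forward error function):

```python
import math

def abs_error(x, erfinv_approx):
    """ABS_Error(x) = erfinv_approx(erf(x)) - x for a candidate approximation."""
    return erfinv_approx(math.erf(x)) - x

# Example: sweep the absolute error of a hypothetical approximation over [-3, 3].
# errors = [abs_error(i / 100.0, my_erfinv) for i in range(-300, 301)]
```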

Multiple approximations of inverse error function

Now that we have a way to evaluate approximations, we will look at four common methods of approximating the inverse error function.

Finite Series Approximation

While the inverse error function is defined by an infinite series expansion, the obvious question is how many terms one needs to keep in order to have a good approximation; a sketch of a truncated series is shown below.
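
Here is a sketch of the truncated series using the coefficient recursion from the Definition section (erfinv_series is my own name for it, and the default number of terms is arbitrary):

```python
import math

def erfinv_series(x, n_terms=100):
    # Coefficients from the recursion c_k = sum_{m=0}^{k-1} c_m * c_{k-1-m} / ((m+1)(2m+1)), c_0 = 1.
    c = [1.0]
    for k in range(1, n_terms):
        c.append(sum(c[m] * c[k - 1 - m] / ((m + 1) * (2 * m + 1)) for m in range(k)))

    # Partial sum of erfinv(x) = sum_k c_k / (2k+1) * (sqrt(pi)/2 * x)^(2k+1).
    y = (math.sqrt(math.pi) / 2.0) * x
    return sum(c[k] / (2 * k + 1) * y ** (2 * k + 1) for k in range(n_terms))
```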

Figure 4: Finite series approximation of the inverse error function.

Figure 5: Zoomed look at the tail end of various finite series approximations.

Figure 6: ABS Error for the finite series approximation. Notice that the finite series needs to be around a 500th-order polynomial before it is accurate to double precision.

Newton Refinement Approximation

This approximation relies on a magic-number initial approximation that is then refined using Newton's method. This algorithm is found in many high-end mathematics programs such as MATLAB. The one drawback of this method is that it requires multiple calls to the error function, which slows down the calculation.
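
A minimal sketch of the idea (not MATLAB's actual implementation; the low-order series starting guess below is my own assumption) looks like this:

```python
import math

SQRT_PI_OVER_2 = math.sqrt(math.pi) / 2.0

def erfinv_newton(x, refinements=2):
    # Cheap initial guess from the first two terms of the series expansion.
    y = SQRT_PI_OVER_2 * (x + (math.pi / 12.0) * x ** 3)

    # Each Newton step costs one call to erf: y -= (erf(y) - x) / erf'(y),
    # with erf'(y) = 2/sqrt(pi) * exp(-y^2).
    for _ in range(refinements):
        y -= (math.erf(y) - x) * SQRT_PI_OVER_2 * math.exp(y * y)
    return y
```

Since Newton's method converges quadratically, each refinement roughly squares the error, which is consistent with Figure 9: one refinement suffices for single precision, while two are needed for double precision.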

Figure 7: Newton Refinement approximation of the error function. Graphed from 0 to 1 since \(erf^{-1}(-x) = -erf^{-1}(x)\).

Figure 8: Zoomed look at the tail end of various Newton Refinement approximations. There is no noticeable difference between single and double refinement.

Figure 9: ABS Error for Newton Refinement approximations. Notice that double refinement is needed to drop the error below double precision, while a single refinement is sufficient for single precision floating point numbers.

Halley Refinement Approximation

Found in Numerical Recipes, Third Edition, p. 265, this approximation computes the complementary inverse error function \(erfc^{-1}(x)\). The complementary inverse error function is easily converted to the inverse error function since \(erf^{-1}(x) = erfc^{-1}(1-x)\). This approximation uses Halley's method to refine an initial estimate.
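
The Halley step itself can be sketched as follows (a paraphrase of the idea rather than the Numerical Recipes listing; the crude starting guess is my own assumption). For \(f(y) = erfc(y) - x\), the second derivative satisfies \(f''(y) = -2y\,f'(y)\), which lets the Halley update collapse into a single compact expression:

```python
import math

TWO_OVER_SQRT_PI = 2.0 / math.sqrt(math.pi)

def erfcinv_halley(x, refinements=2):
    # Crude starting guess via erfc^-1(x) = erf^-1(1 - x) and the low-order series.
    t = 1.0 - x
    y = (math.sqrt(math.pi) / 2.0) * (t + (math.pi / 12.0) * t ** 3)

    for _ in range(refinements):
        err = math.erfc(y) - x
        # Halley update: delta = err / (|f'(y)| - y * err); the f'' term is folded into the denominator.
        y += err / (TWO_OVER_SQRT_PI * math.exp(-y * y) - y * err)
    return y

def erfinv_halley(x, refinements=2):
    # Convert back using erf^-1(x) = erfc^-1(1 - x).
    return erfcinv_halley(1.0 - x, refinements)
```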

Figure 10: Halley Refinement approximation of the inverse error function.

Figure 11: Zoomed look at the tail end of various Halley Refinement approximations. There is no noticeable difference between single and double refinement.

Figure 12: ABS Error for Halley Refinement approximations. Notice that double refinement is needed to drop the error below double precision, while a single refinement is sufficient for single precision floating point numbers.

Figure 13: Zoomed out ABS Error for Halley Refinement approximations.

M. Giles Approximation

Defined in Mike Giles' paper, "Approximating the erfinv function", this approximation is targeted directly at GPUs, which rely primarily on single precision (float) numbers for their calculations. He sacrificed accuracy in order to dramatically increase the speed at which the function can be evaluated. There is a noticeable deviation in the approximation at \(\pm\)0.627271, which is caused by switching to the other branch of the if statement.

Since GPUs calculate in parallel, one of the constraints in the design of this approximation was to make each branch of the if statement take the same amount of time. Due to this constraint, the accuracy of the second branch suffers greatly. The second branch does not even have enough accuracy to be used with half precision floating point numbers, while the first is accurate enough for single precision floating point numbers.

Figure 14: M. Giles approximation of the inverse error function. The jump in the piecewise equation occurs at \(\pm\)0.627271.

Figure 15: ABS Error for the M. Giles approximation. Notice that for \(-0.627271 \leq x \leq 0.627271\) the error is acceptable for single precision floating point numbers. However, outside that range the equation does not have high enough precision for even half precision floating point numbers.

Figure 16: Zoomed out ABS Error for M. Giles approximations.

Conclusion

If you require the highest accuracy, use the Newton refinement approximation. To increase its speed with a very small hit to accuracy, use a single refinement instead of a double refinement. If all you care about is speed and you just want an approximation that is reasonably close to the true value, use M. Giles' method.