
Function approximation with regression analysis

This online calculator uses several regression models for approximation of an unknown function given by a set of data points.

The function approximation problem is how to select a function among a well-defined class that closely matches ("approximates") a target unknown function.

This calculator takes the target function's table data, given as points {x, f(x)}, and builds several regression models: linear regression, quadratic regression, cubic regression, power regression, logarithmic regression, hyperbolic regression, ab-exponential regression and exponential regression. The results can be compared using the correlation coefficient, the coefficient of determination, the average relative error (standard error of the regression) and visually, on a chart. Theory and formulas are given below the calculator, as usual.


Digits after the decimal point: 4
Linear regression
y=0.6233x+129.5721
Linear correlation coefficient
0.7883
Coefficient of determination
0.6214
Average relative error, %
2.4459 %
Quadratic regression
y = -0.0168x^2 + 3.0867x + 41.3745
Correlation coefficient
0.8164
Coefficient of determination
0.6665
Average relative error, %
2.0026 %
Cubic regression
y = 0.0001x^3 - 0.0392x^2 + 4.6976x + 3.1981
Correlation coefficient
0.8165
Coefficient of determination
0.6666
Average relative error, %
2.0166 %
Power regression
y = 56.4834x^0.2642
Correlation coefficient
0.7981
Coefficient of determination
0.6369
Average relative error, %
2.3409 %
ab-Exponential regression
y = 134.4458 · 1.0036^x
Correlation coefficient
0.7842
Coefficient of determination
0.6150
Average relative error, %
2.4781 %
Logarithmic regression
y = -20.1773 + 45.6282 ln x
Correlation coefficient
0.8013
Coefficient of determination
0.6421
Average relative error, %
2.3080 %
Hyperbolic regression
y = 220.7742 - 3264.2257/x
Correlation coefficient
0.8106
Coefficient of determination
0.6570
Average relative error, %
2.1698 %
Exponential regression
y = e^(4.9012 + 0.0036x)
Correlation coefficient
0.7842
Coefficient of determination
0.6150
Average relative error, %
2.4781 %
Results

[Chart: the source data points together with all eight regression curves (linear, quadratic, cubic, power, ab-exponential, logarithmic, hyperbolic, exponential); x axis roughly 55 to 90, y axis roughly 155 to 190]

Results

 i |  x   |  y  | Linear   | Quadratic | Cubic    | Power    | ab-Exponential | Logarithmic | Hyperbolic | Exponential
   | 53.6 |     | 162.9809 | 158.4548  | 158.0846 | 161.6911 | 163.0991       | 161.4938    | 159.8745   | 163.0991
 1 | 57   | 163 | 165.1002 | 162.6187  | 162.5014 | 164.3394 | 165.1101       | 164.3000    | 163.5071   | 165.1101
 2 | 58   | 164 | 165.7235 | 163.7694  | 163.7038 | 165.0961 | 165.7063       | 165.0936    | 164.4944   | 165.7063
 3 | 59   | 158 | 166.3468 | 164.8863  | 164.8633 | 165.8433 | 166.3047       | 165.8735    | 165.4483   | 166.3047
 4 | 62   | 175 | 168.2167 | 168.0351  | 168.0907 | 168.0303 | 168.1127       | 168.1366    | 168.1254   | 168.1127
 5 | 64   | 171 | 169.4633 | 169.9660  | 170.0394 | 169.4454 | 169.3290       | 169.5852    | 169.7707   | 169.3290
 6 | 64   | 172 | 169.4633 | 169.9660  | 170.0394 | 169.4454 | 169.3290       | 169.5852    | 169.7707   | 169.3290
 7 | 65   | 175 | 170.0866 | 170.8809  | 170.9546 | 170.1408 | 169.9404       | 170.2926    | 170.5553   | 169.9404
 8 | 68   | 165 | 171.9565 | 173.4237  | 173.4714 | 172.1808 | 171.7880       | 172.3514    | 172.7709   | 171.7880
 9 | 69   | 178 | 172.5797 | 174.2039  | 174.2360 | 172.8461 | 172.4083       | 173.0175    | 173.4666   | 172.4083

Linear regression

Equation:
\widehat{y}=ax+b

a coefficient
a=\frac{\sum x_i \sum y_i- n\sum x_iy_i}{\left(\sum x_i\right)^2-n\sum x_i^2}

b coefficient
b=\frac{\sum x_i \sum x_iy_i-\sum x_i^2\sum y_i}{\left(\sum x_i\right)^2-n\sum x_i^2}

Linear correlation coefficient
r_{xy}=\frac{n\sum x_iy_i-\sum x_i\sum y_i}{\sqrt{\left(n\sum x_i^2-\left(\sum x_i\right)^2\right)\!\!\left(n\sum y_i^2-\left(\sum y_i\right)^2 \right)}}

Coefficient of determination
R^2=r_{xy}^2

Standard error of the regression
\overline{A}=\dfrac{1}{n}\sum\left|\dfrac{y_i-\widehat{y}_i}{y_i}\right|\cdot100\%
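The formulas above translate directly into code. A minimal pure-Python sketch (the function name and sample data are illustrative, not part of the calculator):

```python
def linear_fit(xs, ys):
    """Least-squares line y = a*x + b, plus the linear correlation
    coefficient and the average relative error defined above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    d = sx ** 2 - n * sxx                      # common denominator of a and b
    a = (sx * sy - n * sxy) / d
    b = (sx * sxy - sxx * sy) / d
    r = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    err = 100.0 / n * sum(abs((y - (a * x + b)) / y) for x, y in zip(xs, ys))
    return a, b, r, err

# On exactly linear data the fit is exact and r = 1:
a, b, r, err = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])   # data from y = 2x + 1
```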

Quadratic regression

Equation:
\widehat{y}=ax^2+bx+c

System of equations to find a, b and c
\begin{cases}a\sum x_i^2+b\sum x_i+nc=\sum y_i\,,\\[2pt] a\sum x_i^3+b\sum x_i^2+c\sum x_i=\sum x_iy_i\,,\\[2pt] a\sum x_i^4+b\sum x_i^3+c\sum x_i^2=\sum x_i^2y_i\,;\end{cases}

Correlation coefficient
R= \sqrt{1-\frac{\sum(y_i-\widehat{y}_i)^2}{\sum(y_i-\overline{y})^2}},
where
\overline{y}= \dfrac{1}{n}\sum y_i

Coefficient of determination
R^2

Standard error of the regression
\overline{A}=\dfrac{1}{n}\sum\left|\dfrac{y_i-\widehat{y}_i}{y_i}\right|\cdot100\%
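The 3x3 system of normal equations can be solved with any small linear solver; Gauss-Jordan elimination is enough. A sketch under those assumptions (names are illustrative):

```python
def solve(M, v):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * w for u, w in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def quadratic_fit(xs, ys):
    """Fit y = a*x^2 + b*x + c via the normal equations above;
    returns (a, b, c, R) where R is the correlation coefficient."""
    n = len(xs)
    S = [sum(x ** k for x in xs) for k in range(5)]            # S[k] = sum of x^k
    rhs = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]
    a, b, c = solve([[S[2], S[1], n],
                     [S[3], S[2], S[1]],
                     [S[4], S[3], S[2]]], rhs)
    ybar = sum(ys) / n
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return a, b, c, (1 - ss_res / ss_tot) ** 0.5

# Exact quadratic data recovers the coefficients and gives R = 1:
a, b, c, R = quadratic_fit([0, 1, 2, 3, 4], [3, 6, 11, 18, 27])  # y = x^2 + 2x + 3
```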

Cubic regression

Equation:
\widehat{y}=ax^3+bx^2+cx+d

System of equations to find a, b, c and d
\begin{cases}a\sum x_i^3+b\sum x_i^2+c\sum x_i+nd=\sum y_i\,,\\[2pt] a\sum x_i^4+b\sum x_i^3+c\sum x_i^2+d\sum x_i=\sum x_iy_i\,,\\[2pt] a\sum x_i^5+b\sum x_i^4+c\sum x_i^3+d\sum x_i^2=\sum x_i^2y_i\,,\\[2pt] a\sum x_i^6+b\sum x_i^5+c\sum x_i^4+d\sum x_i^3=\sum x_i^3y_i\,;\end{cases}

Correlation coefficient, coefficient of determination, standard error of the regression – the same formulas as in the case of quadratic regression.
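The same pattern scales to any polynomial degree: for degree m the normal equations form an (m+1)x(m+1) system in the power sums. A generic sketch (illustrative, not the calculator's own code):

```python
def poly_fit(xs, ys, m):
    """Least-squares polynomial of degree m; returns coefficients from the
    highest power down, e.g. [a, b, c, d] for the cubic above."""
    n = m + 1
    # Row i of the normal equations: sums of x^(i+j) for j = m..0,
    # with right-hand side sum(x^i * y), exactly as in the systems above.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m, -1, -1)]
         + [sum(x ** i * y for x, y in zip(xs, ys))]
         for i in range(n)]
    for col in range(n):                      # Gauss-Jordan elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * w for u, w in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Exact cubic data recovers the coefficients:
coeffs = poly_fit([-2, -1, 0, 1, 2, 3], [-3, 2, 1, 0, 5, 22], 3)  # y = x^3 - 2x + 1
```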

Power regression

Equation:
\widehat{y}=a\cdot x^b

b coefficient
b=\dfrac{n\sum(\ln x_i\cdot\ln y_i)-\sum\ln x_i\cdot\sum\ln y_i }{n\sum\ln^2x_i-\left(\sum\ln x_i\right)^2 }

a coefficient
a=\exp\!\left(\dfrac{1}{n}\sum\ln y_i-\dfrac{b}{n}\sum\ln x_i\right)

Correlation coefficient, coefficient of determination, standard error of the regression – the same formulas as above.
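In code, the power fit is simply a linear fit in log-log coordinates, per the formulas above. A sketch (names and data are illustrative):

```python
from math import exp, log

def power_fit(xs, ys):
    """Fit y = a * x^b by linear least squares on (ln x, ln y);
    requires x > 0 and y > 0."""
    n = len(xs)
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    slx, sly = sum(lx), sum(ly)
    b = (n * sum(u * v for u, v in zip(lx, ly)) - slx * sly) \
        / (n * sum(u * u for u in lx) - slx ** 2)
    a = exp(sly / n - b * slx / n)
    return a, b

# Data from y = 3x^2 is exactly linear in log-log space:
a, b = power_fit([1, 2, 3, 4], [3, 12, 27, 48])
```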

ab-Exponential regression

Equation:
\widehat{y}=a\cdot b^x

b coefficient
b=\exp\dfrac{n\sum x_i\ln y_i-\sum x_i\cdot\sum\ln y_i }{n\sum x_i^2-\left(\sum x_i\right)^2 }

a coefficient
a=\exp\!\left(\dfrac{1}{n}\sum\ln y_i-\dfrac{\ln b}{n}\sum x_i\right)

Correlation coefficient, coefficient of determination, standard error of the regression – the same.

Hyperbolic regression

Equation:
\widehat{y}=a + \frac{b}{x}

b coefficient
b=\dfrac{n\sum\dfrac{y_i}{x_i}-\sum\dfrac{1}{x_i}\sum y_i }{n\sum\dfrac{1}{x_i^2}-\left(\sum\dfrac{1}{x_i}\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum y_i-\dfrac{b}{n}\sum\dfrac{1}{x_i}

Correlation coefficient, coefficient of determination, standard error of the regression – the same as above.

Logarithmic regression

Equation:
\widehat{y}=a + b\ln x

b coefficient
b=\dfrac{n\sum(y_i\ln x_i)-\sum\ln x_i\cdot \sum y_i }{n\sum\ln^2x_i-\left(\sum\ln x_i\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum y_i-\dfrac{b}{n}\sum\ln x_i

Correlation coefficient, coefficient of determination, standard error of the regression – the same as above.

Exponential regression

Equation:
\widehat{y}=e^{a+bx}

b coefficient
b=\dfrac{n\sum x_i\ln y_i-\sum x_i\cdot\sum\ln y_i }{n\sum x_i^2-\left(\sum x_i\right)^2 }

a coefficient
a=\dfrac{1}{n}\sum\ln y_i-\dfrac{b}{n}\sum x_i

Correlation coefficient, coefficient of determination, standard error of the regression – the same as above.
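All four of these models (and the power model above) reduce to the same linear fit on transformed coordinates, so one helper covers them all. A sketch with illustrative names:

```python
from math import exp, log

def fit_linearized(xs, ys, fx, fy):
    """Least squares for fy(y) = A + B * fx(x); returns (A, B)."""
    n = len(xs)
    u = [fx(x) for x in xs]
    v = [fy(y) for y in ys]
    su, sv = sum(u), sum(v)
    B = (n * sum(p * q for p, q in zip(u, v)) - su * sv) \
        / (n * sum(p * p for p in u) - su ** 2)
    A = sv / n - B * su / n
    return A, B

ident = lambda t: t
# Hyperbolic y = a + b/x, on data generated from y = 5 + 2/x:
a_h, b_h = fit_linearized([1, 2, 4], [7, 6, 5.5], lambda x: 1 / x, ident)
# Logarithmic y = a + b ln x uses fx=log; exponential y = e^(a+bx) uses fy=log;
# ab-exponential y = a*b^x is the exponential fit with a = e^A, b = e^B.
```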

Derivation of formulas

Let's start from the problem:
We have an unknown function y=f(x), given in the form of table data (for example, such as those obtained from experiments).
We need to find a function of a known type (linear, quadratic, etc.) y=F(x) whose values are as close as possible to the table values at the same points. In practice, the type of function is chosen by visually comparing the table points to the graphs of known functions.

As a result we should get a formula y=F(x), named the empirical formula (regression equation, function approximation), which allows us to calculate y for x's not present in the table. Thus, the empirical formula "smoothes" y values.

We use the Least Squares Method to obtain parameters of F for the best fit. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model.

Thus, we need to find a function F such that the sum of squared residuals S is minimal:
S=\sum\limits_i(y_i-F(x_i))^2\rightarrow min

Let's describe the solution for this problem using linear regression F=ax+b as an example.
We need to find the best-fit values of the a and b coefficients, so S is a function of a and b. To find the minimum, we look for the extremum points, where the partial derivatives are equal to zero.

Applying the chain rule (the derivative of a composite function), we get the following equations:
\begin{cases} \sum [y_i - F(x_i, a, b)]\cdot F^\prime_a(x_i, a, b)=0 \\ \sum [y_i - F(x_i, a, b)]\cdot F^\prime_b(x_i, a, b)=0 \end{cases}

For function F(x,a,b)=ax+b partial derivatives are
F^\prime_a=x,
F^\prime_b=1

Substituting these partial derivatives into the equations above, we get:
\begin{cases} \sum (y_i - ax_i-b)\cdot x_i=0 \\ \sum (y_i - ax_i-b)=0 \end{cases}

Expanding the brackets, we get the following:
\begin{cases} \sum y_ix_i - a \sum x_i^2-b\sum x_i=0 \\ \sum y_i - a\sum x_i - nb=0 \end{cases}

From these equations we can get formulas for a and b, which will be the same as the formulas listed above.
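As a numeric sanity check (with made-up data), solving this 2x2 system directly agrees with the closed-form a and b from the linear regression section:

```python
xs = [1.0, 2.0, 3.0, 5.0]          # illustrative data points
ys = [2.1, 3.9, 6.2, 9.8]
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

# Closed-form coefficients from the linear regression section
a = (sx * sy - n * sxy) / (sx ** 2 - n * sxx)
b = (sx * sxy - sxx * sy) / (sx ** 2 - n * sxx)

# Solve { a*sxx + b*sx = sxy ;  a*sx + b*n = sy } by Cramer's rule
det = sxx * n - sx * sx
a2 = (sxy * n - sx * sy) / det
b2 = (sxx * sy - sx * sxy) / det

assert abs(a - a2) < 1e-9 and abs(b - b2) < 1e-9
```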

Using the same technique, we can get formulas for all remaining regressions.
