Nonlinear Regression Analysis: Variable Transformation in Regression Analysis
Overview 1
Regression analysis is essentially a method for uncovering linear relationships among variables; when the relationship is not linear, however, the data can be transformed (‘flattened’) so that it can still be analyzed linearly. In effect, the dependent variable is then explained by a nonlinear function of the independent variables.
Practice
Let’s load the built-in data pressure.
Statistical analysis of the pressure data is in fact unnecessary.
This is a natural phenomenon that calls for mathematical derivation as much as for experiment, and the real interest lies in ‘causality’. Still, if one wishes to analyze it through regression analysis, it should be possible to do so. A regression formula fitted to the raw data,
$$(\text{pressure}) = \beta_{0} + \beta_{1} ( \text{temperature} ) + \varepsilon$$
would likely be meaningless. Therefore, let’s take logs of both variables and use $\log ( \text{temperature} )$ as the new independent variable and $\log (\text{pressure})$ as the new dependent variable. The new regression formula is
$$\log (\text{pressure}) = \log b_{0} + b_{1} \log ( \text{temperature} ) + \varepsilon$$
If this analysis is carried out correctly, the right-hand side can be collected into a single log,
$$\log (\text{pressure}) = \log \left( b_{0} ( \text{temperature} )^{b_{1}} \right) + \varepsilon$$
and therefore, by undoing the logs on both sides, we obtain the relationship
$$(\text{pressure}) = b_{0} ( \text{temperature} )^{b_{1}} e^{\varepsilon}$$
Note that the error term, which is additive on the log scale, becomes the multiplicative factor $e^{\varepsilon}$ on the original scale. Naturally, such a formula also appears in actual physics in a similar form. If each regression analysis is conducted and the regression line is drawn, it looks as follows.
Taking the logs and undoing them again is a mathematical step that the analyst must perform by hand.
- On the left, the data was analyzed as is, without any transformation; the fit fails to meet any of the conditions required for regression analysis, so it is meaningless.
- On the right, both variables were put on a log scale, and the fitted line follows the trend of the data closely, with very high explanatory power. Strictly speaking, residual analysis may reveal problems that keep the model from being statistically sound, but it still explains the data well.
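To make the back-transformation concrete, here is a minimal sketch, assuming the same built-in pressure data with the first row dropped (its temperature is 0, so its log is undefined). The names d, fit, b0, b1 and pred are chosen here for illustration only; b0 is recovered by exponentiating the fitted intercept, and b1 is the fitted slope.

```r
# Keep only rows with temperature > 0 so that log(temperature) is finite
d <- pressure[-1, ]

# Log-log fit: log(pressure) = log(b0) + b1 * log(temperature) + error
fit <- lm(log(pressure) ~ log(temperature), data = d)

# The intercept estimates log(b0); the slope estimates b1
b0 <- exp(coef(fit)[[1]])
b1 <- coef(fit)[[2]]

# Fitted curve on the original scale: pressure = b0 * temperature^b1
pred <- b0 * d$temperature^b1

# This matches exponentiating the fitted values of the log-log model
all.equal(pred, unname(exp(fitted(fit))))
```

The final comparison only confirms that the power-law curve and the exponentiated log-scale fit are the same thing; it says nothing about how well the model suits the data.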
Code
Below is example code in R.
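pressure; ?pressure                        # built-in data: temperature and pressure
dev.new(width=4,height=4)                  # portable replacement for the Windows-only win.graph(4,4)
plot(pressure,main='Pressure')
dev.new(width=8,height=4); par(mfrow=c(1,2))
# Left panel: raw scale, pressure regressed on temperature
out1<-lm(pressure~temperature,data=pressure); summary(out1)
plot(pressure,main='scale'); abline(out1,col='red')
# Right panel: log-log scale; the first row has temperature 0, so it is dropped before taking logs
logpress<-log(pressure[-1,]$pressure)
logtemp<-log(pressure[-1,]$temperature)
out2<-lm(logpress~logtemp); summary(out2)
plot(logtemp,logpress,main='log scale'); abline(out2,col='red')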
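The remark about explanatory power and residual analysis can be checked with a short sketch like the following. It is only an illustration under the same assumptions as the code above (first row dropped for the log fit); the names fit_raw, fit_log, r2_raw and r2_log are introduced here and are not part of the original code. Keep in mind that the two R-squared values are computed on different scales, so comparing them is a rough guide rather than a formal test.

```r
# Raw-scale fit and log-log fit on the built-in pressure data
d <- pressure[-1, ]   # drop the row with temperature 0
fit_raw <- lm(pressure ~ temperature, data = pressure)
fit_log <- lm(log(pressure) ~ log(temperature), data = d)

# Explanatory power of each fit, each on its own scale
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
c(raw = r2_raw, log = r2_log)

# Residuals against fitted values for the log-log fit;
# a systematic curve here would indicate remaining misfit
plot(fitted(fit_log), resid(fit_log),
     xlab = 'fitted log(pressure)', ylab = 'residual',
     main = 'log-log residuals')
abline(h = 0, col = 'red')
```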
See Also
Hadi. (2006). Regression Analysis by Example (4th Edition): p152.