
How to View Simple Regression Analysis Results in R

Practice

How to Do Regression Analysis

head(faithful)

In R, check the built-in dataset faithful by looking at its first few rows with the head() function.
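For a quick look at the overall size and variable types as well, base R's str() and dim() can be used; this is just a minimal supplementary check.

str(faithful)   # 272 obs. of 2 numeric variables: eruptions, waiting
dim(faithful)   # 272 rows, 2 columns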

2.png

It's hard to tell whether the two variables have a linear relationship just by looking at the raw numbers, so let's plot the data and see.

win.graph(6,3)
par(mfrow=c(1,2))
plot(faithful, main ="faithful",asp=T)
plot(faithful, main ="faithful")
points(head(faithful),col='red',pch=19)

12.png

The plot on the left keeps the true aspect ratio, which is accurate but hard to read. The plot on the right lets the ratio adjust for easier viewing; the red dots mark the first six data points shown by head().

The pattern of the points suggests that the two variables have a linear relationship, so let's check it with a simple regression analysis. To run one, pass a model formula to the lm() function; here we model waiting as a function of eruptions, as shown below.

out1<-lm(waiting~eruptions,data=faithful); summary(out1)

Interpreting the Results

The summary() function displays a summary of the regression analysis results.

3.png

Parts not highlighted in the box are relatively unimportant, especially in simple regression analysis.

  • (1) Residuals: No need to know. Residuals are used in residual analysis to check whether the assumptions of regression are met, but that is not crucial in simple regression.
  • (2) Estimate: These are the regression coefficients: (Intercept) is the $y$-intercept and eruptions is the slope. The fitted model can be expressed as $$ \text{(waiting)} = 33.4744 + 10.7296 \cdot \text{(eruptions)} + \varepsilon $$
  • (3) Std. Error: No need to know. It is the standard error of each estimate, which can be used to build confidence intervals for the regression coefficients.
  • (4) t value: No need to know. The t value is the estimate divided by its standard error; under the null hypothesis that the coefficient is zero, it follows a t-distribution with $n-2$ degrees of freedom, which allows a hypothesis test of whether the regression coefficient is statistically significant.
  • (5) Pr(>|t|): A small value means the regression coefficient is significant, which is what you want if you are trying to show a relationship; the smaller, the better. Depending on the value, significance stars are printed according to the Signif. codes shown below the table. A significance level of 5% is usually considered adequate, so even a single asterisk * indicates a significant relationship. If this value is large, there is no statistical evidence of a linear relationship, and the estimated coefficient carries no real meaning. Since this p-value comes from the t-distribution above, the label Pr(>|t|) is fitting.
  • (6) Multiple R-squared: This shows how well the fitted model explains the data; the higher, the better. Here it is about 0.81, meaning the model explains roughly 81% of the variance in the data. A short sketch after this list shows how to pull these numbers directly out of the fitted model.
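
These quantities can also be extracted as numbers rather than read off the printed summary. The following is a minimal sketch using only base R (coef(), confint(), and the components of the summary() object); fit is just a local name introduced here.

fit <- summary(out1)              # the same summary shown above
coef(out1)                        # (2) Estimate: intercept and slope
fit$coefficients                  # (2)-(5): Estimate, Std. Error, t value, Pr(>|t|)
fit$coefficients[, "Pr(>|t|)"]    # (5) just the p-values
fit$r.squared                     # (6) Multiple R-squared, about 0.81 here
confint(out1, level = 0.95)       # 95% confidence intervals built from (2) and (3)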

Visualization

Now, let’s draw the regression line ourselves to visually check if the regression analysis was done correctly.

win.graph(6,3)
par(mfrow=c(1,2))
plot(faithful, main ="faithful",asp=T)
abline(out1,col='#47C83E',lwd=2)
plot(faithful, main ="faithful")
abline(out1,col='#47C83E',lwd=2)

34.png

The green line in the graphs above is the fitted regression line, and it passes nicely through the middle of the data.
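
As a quick numerical cross-check, in simple regression the Multiple R-squared equals the squared correlation between the two variables, so the following one-liner (not part of the code above) should reproduce the roughly 0.81 reported by summary().

cor(faithful$eruptions, faithful$waiting)^2   # about 0.81, matches Multiple R-squared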

Code

Below is the complete example code.

head(faithful)
 
win.graph(6,3)
par(mfrow=c(1,2))
plot(faithful, main ="faithful",asp=T)
plot(faithful, main ="faithful")
points(head(faithful),col='red',pch=19)
 
out1<-lm(waiting~eruptions,data=faithful); summary(out1)
 
win.graph(6,3)
par(mfrow=c(1,2))
plot(faithful, main ="faithful",asp=T)
abline(out1,col='#47C83E',lwd=2)
plot(faithful, main ="faithful")
abline(out1,col='#47C83E',lwd=2)

See Also