How to Use Bootstrap Functions in R

Overview

You can write your own code to perform the bootstrap in R, but you can also use the boot() function from the boot package, which ships with standard R installations. The process is as simple as shown below, but its usage differs from most other functions in several ways, so it may feel unfamiliar at first.

Guide

Step 1.

Define a function boot.fn() that returns the statistic you want to obtain. The name of the function does not matter, but its second argument must be an index, as in boot.fn <- function(dataset, index).
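As a minimal sketch of Step 1, assuming the built-in mtcars data and the mean of its mpg column as the statistic (both are chosen here only for illustration):

```r
# Statistic function: the first argument is the data,
# the second is a vector of row indices.
boot.fn <- function(dataset, index) {
  # Compute the statistic only on the rows selected by `index`
  mean(dataset$mpg[index])
}

# Passing every row index once reproduces the full-sample statistic
full_sample <- boot.fn(mtcars, seq_len(nrow(mtcars)))

# Passing indices drawn with replacement gives one bootstrap replicate
set.seed(1)
one_replicate <- boot.fn(mtcars, sample(nrow(mtcars), replace = TRUE))
```

Calling the function by hand like this is a quick way to check that it works before handing it to boot().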


Step 2.

Set the arguments data, statistic, and R in the boot() function and run it.

  • data is the dataset.
  • statistic is the function defined in Step 1.
  • R is the number of bootstrap repetitions.
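A sketch of such a call, assuming the built-in mtcars data, a hypothetical statistic function cor.fn, and an arbitrary R = 500:

```r
library(boot)  # recommended package bundled with standard R installations

# Hypothetical statistic: correlation between mpg and wt in the resampled rows
cor.fn <- function(dataset, index) {
  cor(dataset$mpg[index], dataset$wt[index])
}

set.seed(1)  # resampling is random; fix the seed for reproducibility
result <- boot(data = mtcars, statistic = cor.fn, R = 500)

result          # prints the original, bias and std. error columns
dim(result$t)   # one row per repetition, one column per returned statistic
```

Argument names in R are case-sensitive; boot() spells this one statistic, in lowercase.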

Example

Let’s walk through the example step by step.

20180507\_201348.png

Default is a dataset that records default status together with the conditions associated with it. The default variable is the dependent variable, and since the task is to predict whether a case actually ends in default, logistic regression is appropriate.

20180507\_202245.png

The result of the logistic regression is shown above.

Here we will apply the bootstrap to the regression coefficients.

20180507\_202409.png

The function used for the bootstrap requires two arguments: dataset and index. dataset receives the original data, and index receives the vector of row indices drawn by the bootstrap resampling.

If you look at return(), you can see that the function returns neither the model nor its summary, but only a vector of regression coefficients. Because the function has to be defined in this particular form, the bootstrap function can be hard to follow without basic knowledge of R. Anyone comfortable enough with statistics to use the bootstrap should manage, but it is normal to find it a bit difficult at first, so don't blame yourself.
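A function of this shape might look like the sketch below. The data frame sim is a simulated stand-in with hypothetical values, not the real Default data, and the formula is likewise an assumption for illustration:

```r
# Simulated stand-in data (hypothetical values, for illustration only)
set.seed(1)
n <- 500
sim <- data.frame(
  balance = rnorm(n, mean = 800, sd = 400),
  income  = rnorm(n, mean = 30, sd = 10)   # in thousands
)
sim$default <- rbinom(n, size = 1, prob = plogis(-4 + 0.004 * sim$balance))

boot.fn <- function(dataset, index) {
  # Fit the logistic regression on the resampled rows only
  fit <- glm(default ~ balance + income,
             data = dataset[index, ], family = binomial)
  # Return just the coefficient vector, not `fit` or summary(fit)
  return(coef(fit))
}

# Using every row once gives the coefficients of the ordinary fit
coefs <- boot.fn(sim, seq_len(nrow(sim)))
coefs
```

Returning a plain numeric vector matters because boot() stores one such vector per repetition as a row of a matrix.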

20180507\_202925.png

The boot() function takes the dataset as data and the number of repetitions as R. A common mistake concerns the statistic argument: it must be given the function itself, boot.fn, and not a call such as boot.fn(), which would be evaluated to a vector rather than passed as a function.
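The difference can be seen in a small sketch, assuming the built-in mtcars data and a hypothetical statistic function mean.fn:

```r
library(boot)

# Hypothetical statistic: mean of mpg in the resampled rows
mean.fn <- function(dataset, index) mean(dataset$mpg[index])

set.seed(1)

# Correct: pass the function object itself
ok <- boot(data = mtcars, statistic = mean.fn, R = 200)

# Incorrect: mean.fn() is a call with no arguments, so it fails as soon as
# boot() tries to use it; tryCatch() captures the error for inspection
bad <- tryCatch(
  boot(data = mtcars, statistic = mean.fn(), R = 200),
  error = function(e) e
)

inherits(ok, "boot")    # the correct call returns a boot object
inherits(bad, "error")  # the incorrect call ends in an error
conditionMessage(bad)   # the message explains what went wrong
```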

If you keep getting errors, this mistake is the most likely cause. In the results, t1, t2, t3, and t4 correspond to the regression coefficients in order, starting from the first, and both the coefficients and the standard errors are very similar to those of the initial logistic regression. What the bootstrap aims to obtain here is the std. error, not the original. As you can check by comparison, the original is simply the set of regression coefficients obtained from the entire data, so it does not depend on the resampling and has nothing to do with the bootstrap itself.
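How the two columns arise can be checked directly on the object returned by boot(). The sketch below again uses simulated stand-in data (hypothetical values) and an arbitrary R = 200:

```r
library(boot)

# Simulated stand-in data (hypothetical values, for illustration only)
set.seed(1)
n <- 500
sim <- data.frame(balance = rnorm(n, mean = 800, sd = 400))
sim$default <- rbinom(n, size = 1, prob = plogis(-4 + 0.004 * sim$balance))

boot.fn <- function(dataset, index) {
  coef(glm(default ~ balance, data = dataset[index, ], family = binomial))
}

b <- boot(data = sim, statistic = boot.fn, R = 200)

# `original`: the statistic on the full data, with no resampling involved
fit <- glm(default ~ balance, data = sim, family = binomial)
all.equal(unname(b$t0), unname(coef(fit)))

# `std. error`: standard deviation of the R bootstrap replicates, per column
boot_se <- apply(b$t, 2, sd)

# Model-based standard errors from glm(), for comparison
glm_se <- summary(fit)$coefficients[, "Std. Error"]
cbind(bootstrap = boot_se, glm = glm_se)
```

Here b$t0 holds the original values and b$t holds one row of coefficients per repetition.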

Code

Here is the example code. Because the bootstrap functions are used in such an unusual way, if you don't fully understand them, you can simply adapt the example to your own case.
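A sketch of the whole workflow, under stated assumptions: Default is taken from the ISLR package, the model uses all three predictors (which would give the four coefficients t1 to t4 described above), and R = 100 is an arbitrary choice.

```r
library(boot)

# Statistic: logistic regression coefficients fitted on the resampled rows
boot.fn <- function(dataset, index) {
  fit <- glm(default ~ student + balance + income,
             data = dataset[index, ], family = binomial)
  return(coef(fit))
}

# Assumption: the Default data set comes from the ISLR package
# (install.packages("ISLR") if it is not available)
if (requireNamespace("ISLR", quietly = TRUE)) {
  Default <- ISLR::Default

  # Ordinary logistic regression on the full data
  fit <- glm(default ~ student + balance + income,
             data = Default, family = binomial)
  print(summary(fit))

  # Bootstrap the coefficients; pass boot.fn itself, not boot.fn()
  set.seed(1)
  print(boot(data = Default, statistic = boot.fn, R = 100))
}
```

To adapt it, change the formula inside boot.fn and the data passed to boot(); the two-argument signature stays the same.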