Handling Strings in R

Overview

String handling is not as central in R as it is in general-purpose programming languages, but the need for it comes up surprisingly often. The larger and messier the data, the more these small techniques matter.

Tips

  • The nchar() function returns the number of characters in a string. Those familiar with other languages would probably try length() first, but in R that returns the number of elements in the vector, which is 1 for a single string.
  • The substring() function, as its name suggests, returns a substring. In the example, it returns “God” from “Oh My God”, which spans the 7th to the 9th character. The example passes 10 as the end position; an end position past the end of the string is simply truncated.
  • The gsub() function replaces every match of a pattern in the string with another string. The pattern is treated as a regular expression, and matching is case-sensitive by default.
  • The casefold() function converts all uppercase letters to lowercase by default (upper = TRUE converts to uppercase instead). This is particularly useful in fields like statistics or linguistics, where the distinction between cases is often unnecessary.
  • The strsplit() function splits the given string on a specified separator and returns a list containing one character vector per input string. As seen, even a space can be a separator, and passing an empty string ('') splits the string into individual characters.
  • The paste0() function concatenates the given strings with no separator between them.

You could get the same result from the regular paste() function by setting sep = "", but paste0() keeps the code simpler and more readable.
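As a small sketch of the paste0() point, the snippet below joins three made-up strings (the variable names are illustrative, not from the original example) and compares the result with paste():

```r
# Illustrative inputs; these names are assumptions for the sketch
first  <- "Oh"
second <- "My"
third  <- "God"

# paste0() joins its arguments with no separator
joined_paste0 <- paste0(first, second, third)

# paste() gives the same result only when sep = "" is set explicitly
joined_paste <- paste(first, second, third, sep = "")

# paste() without options inserts a single space between arguments
joined_default <- paste(first, second, third)

# collapse = "" merges the elements of one vector into a single string
collapsed <- paste0(c("a", "b", "c"), collapse = "")

print(joined_paste0)
print(joined_default)
print(collapsed)
```

Note that sep controls what goes between separate arguments, while collapse controls what goes between elements of a vector.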

Code

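OMG <- "Oh My God"

nchar(OMG)                 # 9: number of characters
substring(OMG, 7, 10)      # "God": an end position past the end of the string is truncated
gsub('God', 'Girl', OMG)   # "Oh My Girl"
casefold(OMG)              # "oh my god"
strsplit(OMG, ' ')         # a list holding the vector "Oh" "My" "God"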
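
A few variations on the same functions may be worth trying. This is only a sketch: the variable names are made up for illustration, and it uses nothing beyond base R.

```r
OMG <- "Oh My God"

# length() counts vector elements, nchar() counts characters
len_vec <- length(OMG)
len_chr <- nchar(OMG)

# gsub() is case-sensitive unless ignore.case = TRUE is set
no_match <- gsub("god", "Girl", OMG)
ci_match <- gsub("god", "Girl", OMG, ignore.case = TRUE)

# casefold() can also convert to uppercase
upper <- casefold(OMG, upper = TRUE)

# strsplit() returns a list, so [[1]] extracts the character vector
words <- strsplit(OMG, " ")[[1]]
chars <- strsplit(OMG, "")[[1]]

print(words)
print(chars)
```

Because strsplit() accepts a vector of strings, it returns one list element per input string; for a single string, the `[[1]]` step is what gives you a plain character vector.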