Handling Strings in R
Overview
String handling comes up in R surprisingly often, though less than in general-purpose programming languages. The larger and messier the data, the more these small techniques matter.
Tips
- The nchar() function returns the number of characters in a string. Those familiar with other languages would probably have tried length() first, but in R that returns the number of elements in a vector, which is 1 for a single string.
- The substring() function, as its name suggests, returns a substring. In the example, it returns “God” from “Oh My God”, which spans the 7th to the 9th character; the string has only 9 characters, so an end position of 10 is cut off at the end of the string.
- The gsub() function replaces every match of a pattern in the string with another string. It is case-sensitive by default.
- The casefold() function converts all uppercase letters to lowercase by default. This is particularly useful in fields like statistics or linguistics, where the distinction between cases is often unnecessary.
- The strsplit() function splits the given string at a specified separator and returns the pieces as a character vector wrapped in a list. Even a space can be a separator, and passing the empty string '' splits the string into individual characters.
- The paste0() function concatenates the given strings with no separator. You could get the same result from the regular paste() function by setting sep = "", but paste0() keeps the code simpler and more readable.
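A few of the caveats above are easy to check directly. The following is a minimal sketch, not part of the original example: the variable names are made up for illustration, and the ignore.case and upper arguments are standard options of gsub() and casefold() that the tips themselves do not cover.

```r
OMG <- "Oh My God"

# length() counts vector elements; nchar() counts characters
len_elements <- length(OMG)   # 1
len_chars    <- nchar(OMG)    # 9

# gsub() is case-sensitive unless ignore.case = TRUE is given
no_match   <- gsub("god", "Girl", OMG)                      # unchanged: "Oh My God"
with_match <- gsub("god", "Girl", OMG, ignore.case = TRUE)  # "Oh My Girl"

# casefold() lowers by default; upper = TRUE converts to uppercase instead
lower <- casefold(OMG)                # "oh my god"
upper <- casefold(OMG, upper = TRUE)  # "OH MY GOD"
```

The unchanged result of the first gsub() call is the thing to watch for: a pattern that fails to match does not raise an error, it just returns the input as it was.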
Code
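OMG <- "Oh My God"
nchar(OMG)                  # 9
substring(OMG, 7, 10)       # "God" (characters 7 to 9; the end position 10 is past the end and gets cut off)
gsub('God', 'Girl', OMG)    # "Oh My Girl"
casefold(OMG)               # "oh my god"
strsplit(OMG, ' ')          # a list holding c("Oh", "My", "God")
strsplit(OMG, '')           # a list holding the 9 single characters
paste0('Oh', 'My', 'God')   # "OhMyGod"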
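Because strsplit() returns a list rather than a plain vector, its result usually needs one more step before it can be used. The sketch below, with illustrative variable names that are not from the original, pulls out the vector with [[1]] and then compares paste0() against paste(); the collapse argument is a standard paste() option added here only to show the round trip.

```r
OMG <- "Oh My God"

# strsplit() returns a list; [[1]] extracts the character vector for the first string
words <- strsplit(OMG, " ")[[1]]   # "Oh" "My" "God"
chars <- strsplit(OMG, "")[[1]]    # one element per character, spaces included

# paste0() joins with no separator; paste() needs sep = "" for the same result
joined_paste0 <- paste0("Oh", "My", "God")           # "OhMyGod"
joined_paste  <- paste("Oh", "My", "God", sep = "")  # "OhMyGod"

# collapse glues the elements of a single vector back into one string
rebuilt <- paste(words, collapse = " ")              # "Oh My God"
```

The list wrapper exists because strsplit() is vectorized: given several strings, it returns one character vector per input string.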