Character strings in R (2024)

Posted on February 19, 2014 by thiagogm in R bloggers

[This article was first published on Thiago G. Martins » R, and kindly contributed to R-bloggers.]

This post covers the basics of character strings in R. My main reference is Gaston Sanchez's ebook [1], which is excellent and well worth reading if you are interested in manipulating text in R. The section on encoding draws on [2], which is also a nice reference to have nearby. Text analysis will be a topic of interest on this blog, so expect more posts about it in the near future.

Creating character strings

The class of an object that holds character strings in R is “character”. A string in R can be created using single quotes or double quotes.

chr = 'this is a string'
chr = "this is a string"
chr = "this 'is' valid"
chr = 'this "is" valid'
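
The examples above mix the two quote types to embed one inside the other. As a small sketch of the alternative, the same kind of quote can also be embedded by escaping it with a backslash. The variable names chr1 and chr2 are only illustrative.

```r
# Escape a quote that matches the delimiter with a backslash
chr1 = 'it\'s valid'
chr2 = "say \"hi\""

# Single and double quotes produce the same kind of object
identical('abc', "abc")

# writeLines() shows the string itself, without the escape characters
writeLines(chr2)

# The backslashes are not part of the string, so they are not counted
nchar(chr2)
```

Printing chr2 at the console shows the escaped form, while writeLines() shows the characters the string actually contains.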

We can create an empty string with empty_str = "" or an empty character vector with empty_chr = character(0). Both have class “character” but the empty string has length equal to 1 while the empty character vector has length equal to zero.

empty_str = ""
empty_chr = character(0)
class(empty_str)
[1] "character"
class(empty_chr)
[1] "character"
length(empty_str)
[1] 1
length(empty_chr)
[1] 0
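
The difference in length matters in practice, because a comparison against a zero-length vector returns a zero-length result. The sketch below illustrates this; has_no_text() is a hypothetical helper written for this example, not a base R function.

```r
empty_str = ""
empty_chr = character(0)

# nchar() counts characters per element
nchar(empty_str)   # one element with zero characters
nchar(empty_chr)   # no elements at all, so a zero-length result

# Comparing a zero-length vector yields a zero-length logical,
# which cannot be used directly as an if() condition
empty_chr == ""

# Hypothetical helper: TRUE when x holds no usable text,
# whether it is zero-length or contains only "" and NA
has_no_text = function(x) {
  length(x) == 0 || all(is.na(x) | !nzchar(x))
}

has_no_text(empty_str)
has_no_text(empty_chr)
has_no_text("some text")
```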

The function character() creates a character vector with as many empty strings as we want. We can add new components to the vector just by assigning a value to an index outside the current valid range. The new index does not need to be consecutive: if it skips positions, R fills the gap with NA elements.

chr_vector = character(2) # create char vector
chr_vector
[1] "" ""
chr_vector[3] = "three" # add new element
chr_vector
[1] ""      ""      "three"
chr_vector[5] = "five" # do not need to be consecutive
chr_vector
[1] ""      ""      "three" NA      "five"

Auxiliary functions

The functions as.character() and is.character() can be used to convert non-character objects into character strings and to test whether an object is of type “character”, respectively.
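
A minimal sketch of both functions follows. The values are arbitrary and chosen only for illustration.

```r
x = c(3.14, 42)
is.character(x)      # x is numeric, so this is FALSE

y = as.character(x)  # convert each number to a string
y
is.character(y)      # now TRUE

# Other types convert too
as.character(TRUE)

# For a factor, as.character() returns the labels,
# while as.integer() returns the underlying codes
f = factor(c("b", "a", "b"))
as.character(f)
as.integer(f)
```

The factor case is worth remembering when a data.frame column turns out to be a factor rather than a character vector.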

Strings and data objects

R has five main types of objects to store data: vector, factor, multi-dimensional array, data.frame and list. It is interesting to know how these objects behave when exposed to different types of data (e.g. character, numeric, logical).

  • vector: Vectors must have their values all of the same mode. If we combine mixed types of data in vectors, strings will dominate.
  • arrays: A matrix, which is a 2-dimensional array, has the same behavior found in vectors.
  • data.frame: Before R 4.0.0, a column that contains character strings was converted to a factor by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay character unless we pass stringsAsFactors = TRUE when constructing the data.frame object.
  • list: Each element on the list will maintain its corresponding mode.

# character dominates vector
c(1, 2, "text")
[1] "1"    "2"    "text"

# character dominates arrays
rbind(1:3, letters[1:3])
     [,1] [,2] [,3]
[1,] "1"  "2"  "3"
[2,] "a"  "b"  "c"

# data.frame with stringsAsFactors = TRUE (the default before R 4.0.0)
df1 = data.frame(numbers = 1:3, letters = letters[1:3], stringsAsFactors = TRUE)
df1
  numbers letters
1       1       a
2       2       b
3       3       c
str(df1, vec.len=1)
'data.frame': 3 obs. of 2 variables:
 $ numbers: int 1 2 ...
 $ letters: Factor w/ 3 levels "a","b","c": 1 2 ...

# data.frame with stringsAsFactors = FALSE (the default since R 4.0.0)
df2 = data.frame(numbers = 1:3, letters = letters[1:3], stringsAsFactors = FALSE)
df2
  numbers letters
1       1       a
2       2       b
3       3       c
str(df2, vec.len=1)
'data.frame': 3 obs. of 2 variables:
 $ numbers: int 1 2 ...
 $ letters: chr "a" ...

# Each element in a list has its own type
list(1:3, letters[1:3])
[[1]]
[1] 1 2 3

[[2]]
[1] "a" "b" "c"
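
To see why strings dominate, it helps to look at the order in which c() coerces mixed types. The sketch below also shows one way to check the class of every column in a data.frame and to turn a factor column back into character. The names df and mixed are only illustrative.

```r
# Coercion inside c(): logical gives way to numeric,
# and both give way to character
class(c(TRUE, 1))
mixed = c(TRUE, 1, "a")
class(mixed)
mixed

# Check the class of every column in a data.frame
df = data.frame(numbers = 1:3, letters = letters[1:3],
                stringsAsFactors = TRUE)
sapply(df, class)

# Convert a factor column back to character strings
df$letters = as.character(df$letters)
sapply(df, class)
```

Passing stringsAsFactors explicitly, as done here, keeps the result the same regardless of which R version runs the code.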

Character encoding

R provides functions to deal with various encoding schemes. The Encoding() function returns the declared encoding of a string, and iconv() converts a string from one encoding to another.

chr = "lá lá"
Encoding(chr)
[1] "UTF-8"
chr = iconv(chr, from = "UTF-8", to = "latin1")
Encoding(chr)
[1] "latin1"
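
One way to see what the conversion actually changes is to look at the underlying bytes. The sketch below is only an illustration: it writes the string with the Unicode escape \u00e1 for á, so that the result does not depend on how the script file itself is encoded, and it uses charToRaw() and enc2utf8(), which are base R functions not covered above.

```r
# "\u00e1" is the Unicode escape for á
chr = "l\u00e1 l\u00e1"
Encoding(chr)
charToRaw(chr)                  # á takes two bytes in UTF-8

# Convert to latin1, where á is stored as a single byte
chr_latin1 = iconv(chr, from = "UTF-8", to = "latin1")
Encoding(chr_latin1)
charToRaw(chr_latin1)

# Same text, different number of bytes
nchar(chr, type = "bytes")
nchar(chr_latin1, type = "bytes")

# enc2utf8() converts the latin1 string back to UTF-8
chr_back = enc2utf8(chr_latin1)
identical(charToRaw(chr_back), charToRaw(chr))
```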

References:

[1] Gaston Sanchez’s ebook on Handling and Processing Strings in R.
[2] R Programming/Text Processing webpage.
