Input an expression and R will print the result immediately.
When an assignment operator (“<-” or “->”) is used, R stores the result without printing it; to see the value, type the variable name or call the print() function.
Comment sign: “#”
> x <- 5
> x ## Or print(x)
[1] 5
The R console works much like a terminal command-line tool.
Basic commands:
More functions about directory and files:
Tab completion works in R as well.
Atomic classes of objects in R:
Numbers in R are generally treated as numeric objects (double-precision real numbers). Add the “L” suffix if you explicitly want an integer, e.g. 1L. The special values Inf (infinity) and NaN (not a number) are also defined in R.
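As a quick check of how R classifies values, the sketch below calls class() on a few literals. The variable names are made up for illustration; the five classes reported are what base R returns for these particular values.

```r
# Illustrative values only; the list and its names are not from the text above.
vals <- list(text = "a", real = 1, whole = 1L, cplx = 1+0i, flag = TRUE)
classes <- sapply(vals, class)
print(classes)

# Without the L suffix a whole number is still stored as numeric (double).
is.integer(1)    # FALSE
is.integer(1L)   # TRUE

# Inf and NaN are ordinary numeric values.
class(Inf)       # "numeric"
1 / 0            # Inf
0 / 0            # NaN
```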
R objects can have attributes such as names, dimnames, dimensions, class and length. They can be accessed with the attributes() function.
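A minimal sketch of attributes() in action, using a small matrix built only for this example. A plain vector without names has no attributes at all, so the call returns NULL for it.

```r
# A 2 x 3 matrix with row and column names (illustrative data).
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
attrs <- attributes(m)
names(attrs)   # the attributes present: dim and dimnames
attrs$dim      # 2 3

# A bare vector carries no attributes.
v <- 1:3
attributes(v)  # NULL
```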
Colon operator “:” is the most common one used to create a sequence.
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
> 15:1
[1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
The [1] at the start of each output above shows that the result is a vector (whose elements all belong to the same class) and that the element right after it is the first element of the vector. If the output wraps onto two lines, as below, each line starts with the index of its first element:
[1] 15 14 13 12 11 10 9 8
[9] 7 6 5 4 3 2 1
The seq() function does similar work. Its advantage is that it can also control the increment and the length of the sequence, e.g.
> seq(1, 5, by = 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
> seq(1, 10, length = 7)
[1] 1.0 2.5 4.0 5.5 7.0 8.5 10.0
The rep() (replicate) function is another way to create a sequence.
> rep(0, times = 10)
[1] 0 0 0 0 0 0 0 0 0 0
> rep(c(0, 1, 2), times = 5)
[1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
> rep(c(0, 1, 2), each = 5)
[1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2
The vector is the most common object in R, and it can only contain objects of the same class. A list is similar to a vector but can contain objects of different classes.
The c() function (combine / concatenate) can be used to create vectors.
> x <- c(0.5, 0.6)
> x <- c("a", "b", "c")
> x <- c(1+0i, 3+4i)
The vector() function works as well.
> x <- vector("numeric", length = 10)
The vector x is then initialized with default values (0 for a numeric vector).
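The default value depends on the mode passed to vector(). A short sketch with a few common modes (the variable names are illustrative):

```r
num_vec  <- vector("numeric", length = 3)    # 0 0 0
lgl_vec  <- vector("logical", length = 3)    # FALSE FALSE FALSE
chr_vec  <- vector("character", length = 3)  # "" "" ""
list_vec <- vector("list", length = 2)       # a list of two NULLs

print(num_vec)
print(lgl_vec)
print(chr_vec)
print(list_vec)
```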
Vectors can be used in arithmetic expressions. Common arithmetic operators include “+”, “-”, “*”, “/” and “^” (power); functions such as sqrt() and abs() are applied element by element as well, e.g.
> z <- c(1, 2, 3)
> z + 100
[1] 101 102 103
> sqrt(z - 1)
[1] 0.000000 1.000000 1.414214
Other functions for vectors include max, min, range (returns c(min, max)), length, sum, prod, mean (returns the average), var (returns the sample variance), sort, etc.
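These summary functions can be tried on a small vector. The data below are made up for the example, and the values in the comments follow from simple arithmetic on those four numbers.

```r
z <- c(4, 1, 3, 2)   # illustrative data

range(z)    # 1 4
length(z)   # 4
sum(z)      # 10
prod(z)     # 24
mean(z)     # 2.5
var(z)      # 5/3, the sample variance (divides by n - 1)
sort(z)     # 1 2 3 4
```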
When two vectors of the same length appear in an arithmetic expression, R performs the operation element by element (vectorized operations).
If they are of different lengths, R recycles the shorter vector until it matches the longer one (note that a single number can be viewed as a vector of length 1). R gives a warning if the longer length is not a multiple of the shorter length, e.g.
> x <- c(1, 2, 3, 4, 5, 6)
> y <- c(1, 10, 1, 10, 1, 10)
> x + y
[1] 2 12 4 14 6 16
> y <- c(1, 10, 100)
> x + y
[1] 2 12 103 5 15 106
> y <- c(1, 10, 100, 1000)
> x + y
[1] 2 12 103 1004 6 16
Warning message:
In x + y : longer object length is not a multiple of shorter object length
Logical vectors:
> x <- c("a", "b", "c", "c", "d", "a")
> u <- x > "a"
> u
[1] FALSE TRUE TRUE TRUE TRUE FALSE
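Comparison and logical operators: >, <, ==, >=, <=, !=, &, |, !, xor()
There are also && and ||, which work on single logical values instead of element by element. Older versions of R silently used only the first element of each operand; since R 4.3.0, an operand of length greater than one is an error.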
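To make the difference concrete, here is a small sketch contrasting the element-wise operators with && on single values. The vectors a and b are invented for the example.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

elementwise_and <- a & b      # TRUE FALSE FALSE
elementwise_or  <- a | b      # TRUE TRUE TRUE
exclusive_or    <- xor(a, b)  # FALSE TRUE TRUE

# && and || are meant for single values, typically inside if().
x <- 5
in_range <- (x > 0) && (x < 10)   # TRUE

# They also short-circuit: the right-hand side is never evaluated here.
safe <- FALSE && stop("never evaluated")   # FALSE
```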
Character vectors can be combined using both c() and paste() functions.
> my_char <- c("My", "name", "is")
> paste(my_char, collapse = " ")
[1] "My name is"
> c(my_char, "Niwatori")
[1] "My" "name" "is" "Niwatori"
> paste("Hello", "world!", sep = " ")
[1] "Hello world!"
> paste(1:3, c("X", "Y", "Z"), sep = "")
[1] "1X" "2Y" "3Z"
When you try to mix objects of different classes in a vector, implicit coercion turns them all into the same class. R picks the most general class present, in the order logical, integer, numeric, complex, character.
> c(1.7, "a")
[1] "1.7" "a"
> c(TRUE, 2)
[1] 1 2
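The direction of implicit coercion can be checked by combining one pair of classes at a time and asking for the class of the result. This is only a sketch; the values are arbitrary.

```r
class(c(TRUE, 2L))     # "integer"
class(c(2L, 3.5))      # "numeric"
class(c(3.5, 1+0i))    # "complex"
class(c(1+0i, "a"))    # "character"

# A logical mixed with a character string becomes the text "TRUE".
mixed <- c(TRUE, "a")
print(mixed)           # "TRUE" "a"
```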
Explicit coercion is done with the as.*() functions.
> x <- 0:4
> as.numeric(x)
[1] 0 1 2 3 4
> as.character(x)
[1] "0" "1" "2" "3" "4"
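Explicit coercion does not always succeed. When a value cannot be converted, R returns NA (and, for as.numeric(), also issues a warning, suppressed here to keep the output clean). The inputs below are illustrative.

```r
x <- c("1", "2", "three")
nums <- suppressWarnings(as.numeric(x))   # 1 2 NA

as.integer(3.9)                       # 3, the fractional part is dropped
flags <- as.logical(c("TRUE", "yes", "F"))   # TRUE NA FALSE
as.logical(0:2)                       # FALSE TRUE TRUE
```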
Lists are similar to vectors except that lists can contain objects of different classes. Each element of a list is stored, and printed, as a separate object.
> list(1, "a", TRUE, 1+4i)
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
Missing values are denoted by NA (Not Available), and the results of undefined mathematical operations by NaN (Not a Number).
NaN occurs if you try to compute 0 / 0 or Inf - Inf, where Inf stands for infinity.
The function is.na() tests whether elements are NA, and is.nan() tests for NaN. A NaN value also counts as NA, but not vice versa.
> x <- c(1, 2, NA, NaN, 3)
> is.na(x)
[1] FALSE FALSE TRUE TRUE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE TRUE FALSE
Note that “x == NA” does NOT behave like “is.na(x)”. Any comparison with NA has an unknown result, so “x == NA” returns NA for every element of x, i.e.
> x == NA
[1] NA NA NA NA NA
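A few base R helpers are handy when checking for missing values. The sketch below reuses the same x; na.rm = TRUE drops both NA and NaN before the mean is computed.

```r
x <- c(1, 2, NA, NaN, 3)

anyNA(x)               # TRUE: at least one missing value
sum(is.na(x))          # 2: NA and NaN both count
which(is.na(x))        # 3 4: positions of the missing values
mean(x, na.rm = TRUE)  # 2: mean of 1, 2 and 3
```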
To remove missing values, logical vectors with is.na() and complete.cases() functions are often used.
> x <- c(1, 2, NA, 4, NA, 6)
> x[!is.na(x)]
[1] 1 2 4 6
> y <- c("a", NA, "c", "d", NA, "f")
> good <- complete.cases(x, y)
> x[good]
[1] 1 4 6
> y[good]
[1] "a" "d" "f"
> myd ## A data frame
  Names First Second Third
1 Alice     1      2     3
2   Bob     2      3     4
3 Carol    NA      4     5
4  Dave     4     NA     6
> good <- complete.cases(myd)
> myd[good, ]
  Names First Second Third
1 Alice     1      2     3
2   Bob     2      3     4
For subsetting vectors, single square bracket operator [] is most commonly used.
> x <- c(1, 2, 3, 4, 5, 5, 5, 5, 5, NA, NA, NA, 6, 7, 8, 9)
> x[2] ## Positive integer index
[1] 2
> x[1:5]
[1] 1 2 3 4 5
> x[c(3, 5, 7, 9, 11)]
[1] 3 5 5 5 NA
> y <- x[!is.na(x)] ## Logical index
> y
[1] 1 2 3 4 5 5 5 5 5 6 7 8 9
> y[y > 5]
[1] 6 7 8 9
> x[!is.na(x) & x > 5]
[1] 6 7 8 9
The which() function returns the indices of the elements for which the expression is TRUE.
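A short sketch of which() and two related functions, which.max() and which.min(). The vector is made up; note that which() skips elements where the comparison is NA.

```r
x <- c(1, 7, NA, 3, 9)   # illustrative data

big <- which(x > 5)   # 2 5 (the NA comparison is ignored)
which.max(x)          # 5, index of the largest value
which.min(x)          # 1, index of the smallest value
x[big]                # 7 9
```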
Be cautious with out-of-range indices: index 0 returns an empty vector, and an index greater than the length of the vector returns NA. Negative indices, however, are meaningful: they exclude the corresponding elements.
> x <- 1:10
> x[c(-2, -7)] ## Negative integer index
[1] 1 3 4 5 6 8 9 10 ## All numbers except x[2] & x[7]
> x[-c(2, 7)] ## Putting the negative sign in front also works
[1] 1 3 4 5 6 8 9 10
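The edge cases of integer indexing can be sketched as follows. The tryCatch() wrapper is only there so that the example keeps running when R rejects a mix of positive and negative indices.

```r
x <- 1:10

x[0]              # integer(0): an empty vector
x[11]             # NA: past the end of the vector
x[c(9, 10, 11)]   # 9 10 NA

# Positive and negative indices cannot be mixed in one call.
res <- tryCatch(x[c(-1, 2)], error = function(e) "error")
print(res)        # "error"
```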
Modifying subsets:
> x <- c(-2:5, rep(NA, 4))
> x
[1] -2 -1 0 1 2 3 4 5 NA NA NA NA
> x[is.na(x)] <- -1
> x
[1] -2 -1 0 1 2 3 4 5 -1 -1 -1 -1
> x[x < 0] <- -x[x < 0] ## Same as x <- abs(x)
> x
[1] 2 1 0 1 2 3 4 5 1 1 1 1
R objects can have names for writing readable code.
Names of vectors can be accessed and set with names() function.
> x <- c(foo = 1, bar = 2, norf = 3)
> x
 foo  bar norf
   1    2    3
> names(x)
[1] "foo" "bar" "norf"
Alternatively, the names can be assigned afterwards:
> x <- c(1, 2, 3)
> names(x) <- c("foo", "bar", "norf")
> x
 foo  bar norf
   1    2    3
Now we can subset the vector through names.
> x["bar"]
bar
  2
> x[c("foo", "bar")]
foo bar
  1   2
Other operators used for extracting subsets of R objects:
Examples of [[]] and $ operators for subsetting lists:
> x <- list(1:4, 0.6)
> x
[[1]]
[1] 1 2 3 4
[[2]]
[1] 0.6
> x[1] ## Returns a list containing an integer vector
[[1]]
[1] 1 2 3 4
> x[[1]] ## Returns just the integer vector
[1] 1 2 3 4
> names(x) <- c("foo", "bar")
> x
$foo
[1] 1 2 3 4
$bar
[1] 0.6
> x$foo ## x$foo == x[["foo"]] == x[[1]]
[1] 1 2 3 4
> x[1:2] ## Returns a list
$foo
[1] 1 2 3 4
$bar
[1] 0.6
Differences between [[]] and $ operators when subsetting by names:
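One way to see the differences is the sketch below, which uses a list invented for this example. It relies on two behaviours of base R: [[]] accepts a name stored in a variable and matches names exactly by default, while $ takes the name literally and allows partial matching.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")   # illustrative list
name <- "foo"

x[[name]]                # 1 2 3 4: the name is computed from a variable
x$name                   # NULL: looks for an element literally called "name"

x$f                      # 1 2 3 4: "f" partially matches "foo"
x[["f"]]                 # NULL: [[]] needs an exact match by default
x[["f", exact = FALSE]]  # 1 2 3 4: partial matching switched on
```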
The [[]] operator can also take an integer vector to extract a single element from nested lists, which is equivalent to applying the operator repeatedly.
> x <- list(a = list(2, 3, 4), b = c(5, 6))
> x[[1]][1]
[[1]]
[1] 2
> x[[1]][[1]]
[1] 2
> x[[c(1, 1)]]
[1] 2
> x$a[[1]]
[1] 2
Matrices are vectors with a dimension attribute, which is an integer vector of length 2 (nrow, ncol). So the first way to create a matrix from a vector is to add a dimension attribute. Note that matrices are filled column-wise.
> m <- 1:10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
Matrices can also be created using the matrix() function.
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
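Since matrix() fills column by column by default, the byrow argument is useful when the data are laid out row by row. A minimal sketch:

```r
# Fill the same values row by row instead of column by column.
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m_row)
m_row[1, ]      # 1 2 3
m_row[2, ]      # 4 5 6

# t() transposes a matrix, swapping rows and columns.
dim(t(m_row))   # 3 2
```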
Matrices can also be created by column-binding or row-binding vectors with the cbind() or rbind() function.
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
Matrices can be subsetted with x[i, j] type indices, where i and j can be missing.
When a single element of a matrix is extracted, it is returned as a vector of length 1 rather than a 1 x 1 matrix. This behavior can be turned off by setting drop = FALSE. The same applies when extracting a single row or a single column.
> x <- matrix(1:6, 2, 3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x[1, 2]
[1] 3
> x[1, ]
[1] 1 3 5
> x[1, , drop = FALSE]
     [,1] [,2] [,3]
[1,]    1    3    5
Vectorized operations work for matrices as well. Note that x * y multiplies the entries of x by the corresponding entries of y, while x %*% y is true matrix multiplication.
> x <- matrix(1:4, 2, 2)
> y <- x
> x * y
     [,1] [,2]
[1,]    1    9
[2,]    4   16
> x %*% y
     [,1] [,2]
[1,]    7   15
[2,]   10   22
Row and column names of a matrix can be set with the dimnames() function; the value must be a list containing the row names and the column names.
> x <- matrix(1:6, nrow = 2, ncol = 3)
> x
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> dimnames(x) <- list(c("r1", "r2"), c("c1", "c2", "c3"))
> x
   c1 c2 c3
r1  1  3  5
r2  2  4  6
Like matrices, data frames store tabular data, but the columns of a data frame can be of different classes while all entries of a matrix share one class.
Data frames have row names and column names, accessible with rownames() and colnames(). Row names default to 1, 2, 3, etc.
> my_matrix <- matrix(1:20, nrow = 4, ncol = 5)
> patients <- c("Bill", "Gina", "Kelly", "Sean")
> cbind(patients, my_matrix)
## Wrong! Implicit coercion from numeric to character
[1,] "Bill"  "1" "5" "9"  "13" "17"
[2,] "Gina"  "2" "6" "10" "14" "18"
[3,] "Kelly" "3" "7" "11" "15" "19"
[4,] "Sean"  "4" "8" "12" "16" "20"
> my_data <- data.frame(patients, my_matrix)
> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
> colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")
> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
Factors represent categorical data. A factor can be thought of as an integer vector in which each integer has a label, stored in the levels attribute.
> x <- factor(c("y", "y", "n", "y", "n"))
> x
[1] y y n y n
Levels: n y
> table(x) ## Show how many objects of each level
x
n y
2 3
> unclass(x) ## Strip the classes out of objects
[1] 2 2 1 2 1
attr(,"levels")
[1] "n" "y"
The order of the levels can be set with the levels argument to factor(). (Assigning to levels() renames the levels rather than reordering them.) The order can matter because the first level is sometimes used as the baseline level, e.g.
> x <- factor(c("y", "y", "n", "y", "n"), levels = c("y", "n"))
> x
[1] y y n y n
Levels: y n
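For a factor that already exists, base R's relevel() moves a chosen level to the front without touching the data. The sketch below also shows what happens when levels() is assigned to directly: the labels are swapped, so the data themselves change.

```r
x <- factor(c("y", "y", "n", "y", "n"))
levels(x)                      # "n" "y" (alphabetical by default)

# relevel() only changes which level comes first.
x2 <- relevel(x, ref = "y")
levels(x2)                     # "y" "n"
as.character(x2)               # "y" "y" "n" "y" "n" (data unchanged)

# Assigning to levels() relabels: every "y" becomes "n" and vice versa.
x3 <- x
levels(x3) <- c("y", "n")
as.character(x3)               # "n" "n" "y" "n" "y"
```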
The most commonly used functions for reading tabular data are read.table() and read.csv(). The two are almost identical, except that by default read.table() splits fields on whitespace and expects no header line, while read.csv() splits on commas and treats the first line as a header.
The read.table() function takes quite a few parameters, many of which have default values. Specifying options such as colClasses and nrows instead of relying on the defaults can make it run faster.
> data <- read.table("foo.txt")
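As a sketch of what specifying the options looks like, the example below writes a tiny whitespace-separated file to a temporary path and reads it back with explicit colClasses and nrows. The file contents and column names are invented for illustration; on a file this small the speed-up is of course not noticeable.

```r
# Create a small sample file (hypothetical data) in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("id score label",
             "1 2.5 a",
             "2 3.5 b",
             "3 4.5 c"), path)

# Tell read.table() the column classes and row count up front,
# so it does not have to guess them from the data.
dat <- read.table(path, header = TRUE,
                  colClasses = c("integer", "numeric", "character"),
                  nrows = 3, comment.char = "")
str(dat)

unlink(path)   # remove the temporary file
```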
The dump() and dput() functions write objects in a textual format that preserves metadata such as classes and names, at some cost in readability and storage space. With a textual format, other users do not have to specify that metadata all over again, and the data are potentially recoverable in case of corruption.
> y <- data.frame(a = 1, b = "a")
> dput(y)
structure(list(a = 1, b = structure(1L, .Label = "a", class = "factor")), .Names = c("a",
"b"), row.names = c(NA, -1L), class = "data.frame")
> dput(y, file = "test.R")
> newy <- dget("test.R")
> newy
a b
1 1 a
dput() and dget() are used to write and read a single object in textual format. dump() and source() do the same job, but they can handle multiple objects at once.
> x <- "foo"
> y <- data.frame(a = 1, b = "a")
> dump(c("x", "y"), file = "test.R")
> rm(x, y) ## Remove variables x and y
> source("test.R")
> x
[1] "foo"
> y
a b
1 1 a
Data are read in through connection interfaces.
The file() function takes a few parameters, of which description (the name of the file) and open (the mode) are the most commonly used. The open modes “r”, “w” and “a” stand for reading, writing and appending; “rb”, “wb” and “ab” do the same in binary mode.
Here are two examples:
> con <- file("foo.txt", "r")
> data <- read.csv(con)
> close(con)
is the same as
> data <- read.csv("foo.txt")
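An explicit connection becomes useful when a file is read piece by piece, because an open connection remembers its position. A minimal sketch, using a temporary file created only for this example:

```r
# Write three lines to a temporary file (illustrative content).
path <- tempfile(fileext = ".txt")
writeLines(c("line one", "line two", "line three"), path)

con <- file(path, "r")          # open for reading
first <- readLines(con, n = 1)  # reads only the first line
rest  <- readLines(con)         # continues where the last read stopped
close(con)                      # always close what you open

print(first)   # "line one"
print(rest)    # "line two" "line three"

unlink(path)   # remove the temporary file
```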
Reading webpages:
> con <- url("http://www.baidu.com/", "r")
> x <- readLines(con)
> head(x)
[1] " " ...
[2] "