[R]-Data structures

Cite: http://adv-r.had.co.nz/Data-structures.html

R's base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they're homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data types most often used in data analysis:

              Homogeneous     Heterogeneous
1d (vector)   Atomic vector   List
2d            Matrix          Data frame
nd            Array           -

Note that R has no 0-dimensional, or scalar types. Individual numbers or strings, which you might think would be scalars, are actually vectors of length one.
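
As a small sketch of this point (the variable names x and s are illustrative only), a single number or string reports a length of one and passes is.vector():

```r
x <- 42
length(x)
#> [1] 1
is.vector(x)
#> [1] TRUE
# wrapping the value in c() changes nothing: it was already a vector
identical(x, c(42))
#> [1] TRUE

s <- "hello"
length(s)   # one string, not five characters
#> [1] 1
nchar(s)    # the character count is a separate question
#> [1] 5
```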

Given an object, the best way to understand what data structures it’s composed of is to use str(), which prints a compact, human-readable description of any R object.
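
For illustration, here is str() applied to a small made-up list (obj and its element names are assumptions for this sketch, not part of the original text):

```r
# a hypothetical object mixing several vector types
obj <- list(id = 1:3, label = "a", flags = c(TRUE, FALSE))
str(obj)
#> List of 3
#>  $ id   : int [1:3] 1 2 3
#>  $ label: chr "a"
#>  $ flags: logi [1:2] TRUE FALSE
```

Each line of the output shows one component with its type, its length, and its first few values.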

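A matrix is simply a two-dimensional array. A plain vector, by contrast, has no dim attribute, so it is not quite the same thing as a one-dimensional array, although the two behave similarly in most situations.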

Vector

The basic data structure in R is the vector. Vectors come in two flavours: atomic vectors and lists. They have three common properties:

  • Type, typeof(), what it is.
  • Length, length(), how many elements it contains.
  • Attributes, attributes(), additional arbitrary metadata.
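
The three properties can be inspected on any vector. A minimal sketch, using a made-up named vector v:

```r
# names are stored as an attribute of the vector
v <- c(a = 1.5, b = 2.5, c = 3.5)
typeof(v)
#> [1] "double"
length(v)
#> [1] 3
attributes(v)
#> $names
#> [1] "a" "b" "c"
```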

Atomic vector

There are four common types of atomic vectors: logical, integer, double (often called numeric), and character. There are two rare types that I will not discuss further: complex and raw. Atomic vectors are usually created with c(), short for combine.
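
As a short sketch, one vector of each common type can be built with c() (the names lgl, int, dbl and chr are chosen only for this example):

```r
lgl <- c(TRUE, FALSE, TRUE)
int <- c(1L, 2L, 3L)      # the L suffix requests integers rather than doubles
dbl <- c(1, 2.5, 4.5)
chr <- c("these are", "some strings")

# confirm the type of each vector
sapply(list(lgl, int, dbl, chr), typeof)
#> [1] "logical"   "integer"   "double"    "character"
```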

Atomic vectors are always flat, even if you nest c()’s:

c(1, c(2, c(3, 4)))
#> [1] 1 2 3 4
# the same as
c(1, 2, 3, 4)
#> [1] 1 2 3 4

Given a vector, you can determine its type with typeof(), or check if it's a specific type with an "is" function:

# x stands for the vector being tested
# is.character(x)
# is.double(x)
# is.integer(x)
# is.logical(x)
# or, more generally
# is.atomic(x)

# examples
int_var <- c(1L, 6L, 10L)
typeof(int_var)
#> [1] "integer"
is.integer(int_var)
#> [1] TRUE
is.atomic(int_var)
#> [1] TRUE

dbl_var <- c(1, 2.5, 4.5)
typeof(dbl_var)
#> [1] "double"
is.double(dbl_var)
#> [1] TRUE
is.atomic(dbl_var)
#> [1] TRUE

is.numeric() is roughly equivalent to is.integer() | is.double(); it returns TRUE for both integer and double vectors:

is.numeric(int_var)
#> [1] TRUE
is.numeric(dbl_var)
#> [1] TRUE

List

You construct lists by using list() instead of c():

x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x)
#> List of 4
#>  $ : int [1:3] 1 2 3
#>  $ : chr "a"
#>  $ : logi [1:3] TRUE FALSE TRUE
#>  $ : num [1:2] 2.3 5.9

Lists are sometimes called recursive vectors, because a list can contain other lists:

x <- list(list(list(list())))
str(x)
#> List of 1
#>  $ :List of 1
#>   ..$ :List of 1
#>   .. ..$ : list()
is.recursive(x)
#> [1] TRUE

c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of list() and c():

x <- list(list(1, 2), c(3, 4))
y <- c(list(1, 2), c(3, 4))
str(x)
#> List of 2
#>  $ :List of 2
#>   ..$ : num 1
#>   ..$ : num 2
#>  $ : num [1:2] 3 4
str(y)
#> List of 4
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4

You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
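
A minimal sketch of unlist() on made-up lists, showing both the flattening and the coercion to the most flexible type present:

```r
# all elements are doubles: the result is a double vector
same_type <- list(1, 2, 3)
unlist(same_type)
#> [1] 1 2 3

# mixed types: everything is coerced to character, as with c()
mixed <- list(1L, 2.5, "a")
unlist(mixed)
#> [1] "1"   "2.5" "a"
typeof(unlist(mixed))
#> [1] "character"

# nested lists are flattened as well
nested <- list(1, list(2, list(3)))
unlist(nested)
#> [1] 1 2 3
```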

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists:

is.list(mtcars)
#> [1] TRUE

mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)
#> [1] TRUE
