The term factor refers to a statistical data type used to store categorical variables.
The function head() shows the first observations of a data frame. Similarly, the function tail() prints out the last observations in your data set.
The function str() shows you the structure of your data set. For a data frame it tells you:
- The total number of observations (e.g. 32 car types)
- The total number of variables (e.g. 11 car features)
- A full list of the variable names (e.g. mpg, cyl, ...)
- The data type of each variable (e.g. num)
- The first observations
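The counts in the examples above (32 observations, 11 variables, columns such as mpg and cyl) match R's built-in mtcars data set, so the sketch below assumes that this is the data being described. The names first_rows and last_rows are only for illustration.

```r
# mtcars ships with base R (package datasets), so no setup is needed.
first_rows <- head(mtcars)      # first 6 observations by default
last_rows  <- tail(mtcars, 3)   # second argument sets the row count: last 3

print(first_rows)
print(last_rows)

# str() prints the number of observations and variables,
# then each variable's name, type, and first few values.
str(mtcars)
```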
Creating a data frame
You construct a data frame with the function data.frame().
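As a minimal sketch, a data frame like the planets_df used later in these notes could be built from five vectors. The column names and values below are assumptions for illustration (diameter and rotation are taken as relative to Earth); only the rings values are confirmed by the notes.

```r
# Illustrative values (assumed), one vector per column.
name     <- c("Mercury", "Venus", "Earth", "Mars",
              "Jupiter", "Saturn", "Uranus", "Neptune")
type     <- c(rep("Terrestrial planet", 4), rep("Gas giant", 4))
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings    <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Each vector becomes one column, named after the vector.
planets_df <- data.frame(name, type, diameter, rotation, rings)

# Check the result: 8 observations of 5 variables.
str(planets_df)
```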
Similar to vectors and matrices, you select elements from a data frame with the help of square brackets [ ]:
my_df[1,2] selects the value at the first row and second column in my_df.
my_df[1:3,2:4] selects rows 1, 2, 3 and columns 2, 3, 4 in my_df.
my_df[1, ] selects all elements of the first row.
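The three selections above can be tried on a small data frame. Here my_df and its columns are made up purely for illustration.

```r
# my_df is a hypothetical data frame with 4 rows and 4 columns.
my_df <- data.frame(
  id    = 1:4,
  score = c(90, 75, 60, 85),
  grade = c("A", "B", "C", "B"),
  pass  = c(TRUE, TRUE, FALSE, TRUE)
)

single_value <- my_df[1, 2]       # row 1, column 2 (score)
block        <- my_df[1:3, 2:4]   # rows 1 to 3, columns 2 to 4
first_row    <- my_df[1, ]        # all columns of row 1

print(single_value)
print(block)
print(first_row)
```

Note that the single-cell selection returns a plain value, while the other two return data frames.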
It is often easier to just make use of the variable name:
planets_df[1:3,"type"]
If you want to select all elements of the variable diameter, for example, both of these will do the trick:
planets_df[,3]
planets_df[,"diameter"]
However, there is a short-cut. If your columns have names, you can use the $ sign:
planets_df$diameter
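A short check that the three spellings return the same column. The planets_df below is a trimmed, assumed version with only four rows and three columns, so that diameter sits in column 3 as in the text.

```r
# Trimmed planets_df with illustrative values (assumed).
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars"),
  type     = rep("Terrestrial planet", 4),
  diameter = c(0.382, 0.949, 1, 0.532)
)

by_position <- planets_df[, 3]            # column picked by index
by_name     <- planets_df[, "diameter"]   # column picked by name
by_dollar   <- planets_df$diameter        # $ short-cut

# All three are the same numeric vector.
print(identical(by_position, by_name))
print(identical(by_name, by_dollar))

# Rows and a named column can be combined.
first_types <- planets_df[1:3, "type"]
print(first_types)
```

Selecting by name keeps working if the column order changes, which is one reason to prefer it over a numeric index.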
You should see the subset() function as a short-cut for selecting the rows that meet a condition, which you would otherwise do with a logical vector inside square brackets.
subset(my_df, subset = some_condition)
The first argument of subset() specifies the data set for which you want a subset. By adding the second argument, you give R the necessary information and conditions to select the correct subset.
The code below gives exactly the same result as selecting rows with the logical vector rings_vector, but this time you don't need rings_vector!
subset(planets_df, subset = rings)
# Select planets with diameter < 1
subset(planets_df, subset = diameter < 1)
If you type rings_vector in the console, you get:
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
Use rings_vector to select the data for the four planets with rings:
planets_df[rings_vector, ]
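To compare the two approaches side by side, here is a sketch that assumes rings_vector is simply the rings column of planets_df. The diameter values are illustrative assumptions; the rings values match the console output shown above.

```r
# Reduced planets_df with illustrative values (assumed).
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Assumption: rings_vector holds the rings column.
rings_vector <- planets_df$rings

# Logical indexing: keep the rows where rings_vector is TRUE.
with_index <- planets_df[rings_vector, ]

# subset() evaluates the condition inside the data frame,
# so the column can be named directly.
with_subset <- subset(planets_df, subset = rings)

# Both approaches pick the same planets.
print(identical(with_index$name, with_subset$name))

# Any logical condition on the columns works the same way.
small_planets <- subset(planets_df, subset = diameter < 1)
print(small_planets)
```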
order() is a function that, when applied to a variable such as a vector, returns the positions of its elements arranged from the smallest value to the largest:
> a <- c(100, 10, 1000)
> order(a)
[1] 2 1 3
10, which is the second element in a, is the smallest element, so 2 comes first in the output of order(a). 100, which is the first element in a, is the second smallest element, so 1 comes second in the output of order(a).
This means we can use the output of order(a) to reshuffle a:
> a[order(a)]
[1] 10 100 1000
You would like to rearrange your data frame such that it starts with the smallest planet and ends with the largest one. In other words, you want to sort it on the diameter column.
# Use order() to create positions
positions <- order(planets_df$diameter)
# Use positions to sort planets_df
planets_df[positions,]
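A runnable version of the sort, using a reduced planets_df with assumed illustrative values. It also shows the decreasing argument of order(), which the notes do not cover, as one way to reverse the sort.

```r
# Reduced planets_df with illustrative values (assumed).
planets_df <- data.frame(
  name     = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
)

# Row positions arranged from smallest to largest diameter.
positions <- order(planets_df$diameter)

# Indexing the rows with those positions sorts the data frame.
smallest_first <- planets_df[positions, ]
print(smallest_first)

# decreasing = TRUE reverses the ordering.
largest_first <- planets_df[order(planets_df$diameter, decreasing = TRUE), ]
print(largest_first)
```

The original row numbers are kept as row names, which makes it easy to see where each planet came from.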
List
- Vectors (one dimensional array): can hold numeric, character or logical values. The elements in a vector all have the same data type.
- Matrices (two dimensional array): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
- Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type.
A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.
You could say that a list is some kind of super data type: you can store practically any piece of information in it!
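As a small sketch of that idea, the list below gathers three unrelated objects. The names my_vector, my_matrix, my_df and my_list are chosen here for illustration.

```r
my_vector <- 1:10                     # a vector
my_matrix <- matrix(1:9, ncol = 3)    # a 3 x 3 matrix
my_df     <- mtcars[1:10, ]           # first 10 rows of the built-in mtcars

# list() keeps each object intact, in the order given.
my_list <- list(my_vector, my_matrix, my_df)

print(length(my_list))          # number of components
str(my_list, max.level = 1)     # one summary line per component
```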
Creating a named list
Just like on your to-do list, you want to avoid not knowing or remembering what the components of your list stand for. That is why you should give names to them:
my_list <- list(name1 = your_comp1, name2 = your_comp2)
This creates a list with components that are named name1, name2, and so on. If you want to name the components of a list after you've created it, you can use the names() function as you did with vectors. The following commands are fully equivalent to the assignment above:
my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")
Selecting elements from a list
One way to select a component is using the numbered position of that component. For example, to "grab" the first component of shining_list you type
shining_list[[1]]
A quick way to check this out is typing it in the console. Important to remember: to select elements from vectors, you use single square brackets: [ ]. Don't mix them up!
You can also refer to the names of the components, with [[ ]] or with the $ sign. Both will select the data frame representing the reviews:
shining_list[["reviews"]] # note the double brackets
shining_list$reviews
Besides selecting components, you often need to select specific elements out of these components. For example, with shining_list[[2]][1] you select from the second component, actors (shining_list[[2]]), the first element ([1]). When you type this in the console, you will see the answer is Jack Nicholson.
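The naming and selection steps can be tried together on a stand-in for shining_list. The notes only confirm that actors is the second component, that its first element is Jack Nicholson, and that reviews is a data frame. The component name moviename, the remaining actors, and the review rows below are assumptions made up for this sketch.

```r
# Assumed components (only "Jack Nicholson" comes from the notes).
actors  <- c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd")
reviews <- data.frame(
  scores  = c(4.5, 4.0, 5.0),                    # hypothetical scores
  sources = c("Source A", "Source B", "Source C")
)

# Name the components at creation time...
shining_list <- list(moviename = "The Shining",
                     actors    = actors,
                     reviews   = reviews)

# ...or afterwards with names().
unnamed <- list("The Shining", actors, reviews)
names(unnamed) <- c("moviename", "actors", "reviews")
print(identical(names(unnamed), names(shining_list)))

# Double brackets or $ return the component itself.
by_position <- shining_list[[1]]
by_name     <- shining_list[["reviews"]]
by_dollar   <- shining_list$reviews

# Single brackets on a list return a shorter list, not the component.
still_a_list <- shining_list[2]

# Component first, then an element inside it.
first_actor <- shining_list[[2]][1]
print(first_actor)
```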