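Categorical variables come in two kinds: nominal (unordered categories) and ordinal (ordered categories). In R both are called factors. The function factor() stores the category values as a vector of integers in the range [1... k] (where k is the number of unique values in the nominal variable), together with an internal vector of character strings (the original values) that is mapped to these integers.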
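For example, suppose we have the vector:
diabetes <- c("type1", "type2", "type1", "type1")
The statement diabetes <- factor(diabetes) stores this vector as (1, 2, 1, 1) and internally associates 1 = type1 and 2 = type2 (the assignment follows alphabetical order). Any analysis run on the vector diabetes then treats it as a nominal variable, and R's statistical functions choose methods appropriate to that level of measurement.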
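The integer storage described above can be inspected directly. The following is a minimal sketch using base R's as.integer() and levels(); the names diabetes.factor, codes, level.labels and recovered are introduced here only for illustration.

```r
# Build the factor from the example vector
diabetes <- c("type1", "type2", "type1", "type1")
diabetes.factor <- factor(diabetes)

# Integer codes that the factor stores internally
codes <- as.integer(diabetes.factor)
print(codes)

# Character labels that the codes map to (sorted alphabetically by default)
level.labels <- levels(diabetes.factor)
print(level.labels)

# Indexing the labels with the codes recovers the original strings
recovered <- level.labels[codes]
print(recovered)
```

Because the codes are just positions in the levels vector, changing the order of the levels changes the codes but not the values the factor displays.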
# Create a factor
gender.vector <- c("Male", "Female", "Female", "Male", "Male")
factor.gender.vector <- factor(gender.vector)
factor.gender.vector
> factor.gender.vector
[1] Male Female Female Male Male
Levels: Female Male
hair.color.vector <- c("Blonde", "Blonde", "Brunette", "Ginger", "Grey", "Brunette")
temperature.vector <- c("High", "Low", "High", "Low", "Medium")
factor.hair.color.vector <- factor(hair.color.vector)
factor.temperature.vector <- factor(temperature.vector, ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))
factor.temperature.vector
factor.hair.color.vector
> factor.temperature.vector
[1] High Low High Low Medium
Levels: Low < Medium < High
> factor.hair.color.vector
[1] Blonde Blonde Brunette Ginger Grey Brunette
Levels: Blonde Brunette Ginger Grey
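To see what the levels argument changes, it can help to compare the default factor with the ordered one. This is a small sketch, assuming the same temperature.vector as above; default.factor and ordered.factor are names made up for this comparison.

```r
temperature.vector <- c("High", "Low", "High", "Low", "Medium")

# Without explicit levels, R sorts the unique values alphabetically
default.factor <- factor(temperature.vector)

# With explicit levels and ordered = TRUE, the codes follow Low < Medium < High
ordered.factor <- factor(temperature.vector, ordered = TRUE,
                         levels = c("Low", "Medium", "High"))

print(levels(default.factor))
print(as.integer(default.factor))
print(is.ordered(default.factor))

print(levels(ordered.factor))
print(as.integer(ordered.factor))
print(is.ordered(ordered.factor))
```

The values shown on screen are the same in both cases; only the underlying codes and the ordering information differ.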
survey.vector <- c("M","F","F","M","M")
factor.survey.vector <- factor( survey.vector )
factor.survey.vector
levels(factor.survey.vector) <- c("Female","Male")
factor.survey.vector
> factor.survey.vector #Print to console
[1] M F F M M
Levels: F M
> factor.survey.vector
[1] Male Female Female Male Male
Levels: Female Male
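Renaming through levels() depends on knowing the current level order (here F, then M); assigning c("Male", "Female") instead would silently swap the labels. One possible alternative, sketched below, is to pass levels and labels to factor() together so the mapping is spelled out. The names factor.survey.labelled and factor.survey.renamed are hypothetical.

```r
survey.vector <- c("M", "F", "F", "M", "M")

# State the levels and their labels side by side so the mapping is explicit
factor.survey.labelled <- factor(survey.vector,
                                 levels = c("F", "M"),
                                 labels = c("Female", "Male"))
print(factor.survey.labelled)

# Renaming via levels<- gives the same values, provided the order is F, M
factor.survey.renamed <- factor(survey.vector)
levels(factor.survey.renamed) <- c("Female", "Male")
same.values <- identical(as.character(factor.survey.labelled),
                         as.character(factor.survey.renamed))
print(same.values)
```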
> survey.vector <- c("M", "F", "F", "M", "M")
> factor.survey.vector <- factor(survey.vector)
> levels(factor.survey.vector) <- c("Female", "Male")
> factor.survey.vector
[1] Male Female Female Male Male
Levels: Female Male
> # Summary of the character vector survey.vector
> summary(survey.vector)
   Length     Class      Mode
        5 character character
> # Summary of the factor factor.survey.vector (counts per level)
> summary(factor.survey.vector)
Female   Male
     2      3
speed.vector <- c("Fast","Slow","Slow","Fast","Ultra-fast")
factor.speed.vector <- factor(speed.vector, ordered = TRUE,
                              levels = c("Slow", "Fast", "Ultra-fast"))
factor.speed.vector
summary(factor.speed.vector)
> factor.speed.vector
[1] Fast Slow Slow Fast Ultra-fast
Levels: Slow < Fast < Ultra-fast
> summary(factor.speed.vector)
      Slow       Fast Ultra-fast
         2          2          1
speed.vector <- c("Fast","Slow","Slow","Fast","Ultra-fast")
speed.factor.vector <- factor(speed.vector, ordered=TRUE,levels=c("Slow","Fast","Ultra-fast") )
speed.factor.vector
# Is data analyst 2 faster than data analyst 5?
compare.them <- speed.factor.vector[2] > speed.factor.vector[5]
compare.them
> speed.factor.vector
[1] Fast Slow Slow Fast Ultra-fast
Levels: Slow < Fast < Ultra-fast
> compare.them
[1] FALSE
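The comparison above works only because the factor is ordered. As a rough sketch of the difference, the code below compares every element of an ordered factor against a level name, then tries the same kind of comparison on an unordered factor, where R warns that the operator is not meaningful and returns NA. The names speed.ordered, speed.unordered, at.least.fast and unordered.result are introduced here for illustration; suppressWarnings() is used only to keep the output quiet.

```r
speed.vector <- c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
speed.ordered <- factor(speed.vector, ordered = TRUE,
                        levels = c("Slow", "Fast", "Ultra-fast"))

# Compare every element against a level name
at.least.fast <- speed.ordered >= "Fast"
print(at.least.fast)

# Unordered factor: ">" is not meaningful, so R warns and yields NA
speed.unordered <- factor(speed.vector)
unordered.result <- suppressWarnings(speed.unordered[2] > speed.unordered[5])
print(unordered.result)
```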