Reposted from https://blog.csdn.net/hsdcc217/article/details/78510087
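In R, a factor represents categorical data: each element is one of a fixed set of discrete values, such as an ID number or a grade. For example, people may be numbered 1, 2, 3, 4 ..., and those numbers can be treated as a factor. Covariate levels such as high, medium, and low are also factors, because each value is a discrete point rather than a quantity. A numeric vector, by contrast, holds values such as 1, 1.1, 1.2 ... that can be used in arithmetic, which factor values cannot. Put simply, a factor is a set of discrete labels, while a numeric vector is a set of quantities. If numbers in your data are meant as categories, convert the vector to a factor after importing the data. From then on R no longer treats the values as numbers, only as "symbols".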
The examples below illustrate this.
> data <- c(1,2,2,3,1,2,3,3,1,2,3,3,1)
> data
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
> fdata <- factor(data)
> fdata
[1] 1 2 2 3 1 2 3 3 1 2 3 3 1
Levels: 1 2 3
> class(fdata)
[1] "factor"
> class(data)
[1] "numeric"
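# factor() converts the original numeric vector into a factor. A factor has levels: the set of distinct values that occur in it, with duplicates removed. The levels are those distinct values, sorted and converted to strings, so the levels are always of type character.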
> levels(fdata)
[1] "1" "2" "3"
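Because the levels are strings and each element is stored internally as an integer code, converting a factor back to numbers needs some care. The sketch below uses made-up values (10, 20, 30) so that the codes and the original values differ. With the 1, 2, 3 data above, the two happen to coincide.

```r
# Hypothetical example values, chosen so the codes differ from the labels
x <- c(10, 20, 20, 30)
f <- factor(x)

# as.integer() returns the internal integer codes, not the original values
codes <- as.integer(f)

# Go through the character labels to recover the original numbers
values <- as.numeric(as.character(f))

# Equivalent idiom: convert the levels once, then index them by code
values2 <- as.numeric(levels(f))[f]

print(codes)   # 1 2 2 3
print(values)  # 10 20 20 30
```

Arithmetic directly on a factor, such as f + 1, returns NA with a warning, which is why the conversion has to be explicit.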
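# When creating a factor, we can pass a labels vector to name the levels. The labels are matched to the sorted levels in order. Continuing the program above: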
> rdata <- factor(data,labels=c("I","II","III"))
> rdata
[1] I II II III I II III III I II III III I
Levels: I II III
> rdata <- factor(data,labels=c("e","ee","eee"))
> rdata
[1] e ee ee eee e ee eee eee e ee eee eee e
Levels: e ee eee
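The labels are assigned to the levels by position, so it is often safer to pass levels and labels together and make the mapping explicit. A minimal sketch, assuming a hypothetical coding where 1 = low, 2 = medium, 3 = high:

```r
# Hypothetical survey codes: 1 = low, 2 = medium, 3 = high
codes <- c(3, 1, 2, 2, 3)

# levels lists the values expected in the data, labels names them in the same order
lvl <- factor(codes, levels = c(1, 2, 3), labels = c("low", "medium", "high"))
print(lvl)
print(levels(lvl))

# Any value not listed in levels becomes NA
partial <- factor(codes, levels = c(1, 2), labels = c("low", "medium"))
print(partial)
```

The last call shows the trade-off: restricting levels silently turns unlisted values into NA, so check the result with is.na() or table() when the data may contain unexpected codes.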
# A factor can define the order of the data. By default, the levels of a character vector are sorted alphabetically:
> mons <- c("March","April","January","November","January", "September","October","September","November","August", "January","November","November","February","May","August", "July","December","August","August","September","November", "February","April")
> mons <- factor(mons)
> mons
[1] March April January November January
[6] September October September November August
[11] January November November February May
[16] August July December August August
[21] September November February April
11 Levels: April August December February ... September
> table(mons)
mons
    April    August  December  February   January
        2         4         1         2         3
     July     March       May  November   October
        1         1         1         5         1
September
        3
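# Months obviously have a natural order, so we can specify that order for the factor. June does not occur in the data, but because it is listed as a level, table() reports it with a count of 0.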
> mons <- factor(mons,levels=c("January","February","March","April","May","June","July","August","September","October","November","December"),ordered=TRUE)
> table(mons)
mons
  January  February     March     April       May
        3         2         1         2         1
     June      July    August September   October
        0         1         4         3         1
 November  December
        5         1
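With ordered=TRUE the factor also supports comparisons, which an unordered factor does not. The sketch below uses a small made-up sample of months and the built-in month.name constant in place of the hand-typed level list. The variable names are illustrative only.

```r
# A small hypothetical sample of months; month.name is built into base R
m <- factor(c("March", "November", "January", "August"),
            levels = month.name, ordered = TRUE)

early  <- m < "June"    # compares positions in the level order
latest <- max(m)        # max() and min() work on ordered factors
sorted <- sort(m)       # sorts by level order, not alphabetically
used   <- droplevels(m) # removes levels that do not occur in the data

print(early)
print(latest)
print(sorted)
print(nlevels(used))
```

droplevels() is useful when the zero-count levels (such as June in the table above) should not appear in a summary.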