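A data frame is a generalization of a matrix:
1 Different columns of a data frame can hold different types
2 All values within a single column share the same type
The data frame will be the data structure you use most often.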
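The two rules above can be checked directly. The following is a minimal sketch with made-up data (the names df, col_types and m are illustrative, not from the original notes); it contrasts a data frame with a matrix, which coerces every cell to a single type.

```r
# Hypothetical example data, used only for illustration
df <- data.frame(
  id    = c(1L, 2L, 3L),          # integer column
  score = c(90.5, 72.0, 88.25),   # numeric column
  name  = c("a", "b", "c"),       # character column
  stringsAsFactors = FALSE
)

# Each column keeps its own type
col_types <- sapply(df, class)
print(col_types)

# A matrix allows only one type, so the numbers are coerced to character
m <- cbind(id = c(1, 2, 3), name = c("a", "b", "c"))
print(class(m[, "id"]))
```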
> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
Access a row
> tradeHistory[1,]
   id     name  buyDate saleDate val buyPrice salePrice profit totalProfit
1 793 华闻传媒 20150921 20150922 100    10.28     10.55  15.94       15.94
Access a column
> tradeHistory[,5]
[1] 100 100 200 100 100 100 100 100 100 200 100 200 100 100 100 100 100 100 300
> patientdata[1:2]
  patientID age
1         1  25
2         2  34
3         3  28
4         4  52
Access by column name
> patientdata[c("age","status")]
  age    status
1  25      Poor
2  34  Improved
3  28 Excellent
4  52      Poor
Access with the dataframe$variable notation
> patientdata$diabetes
[1] Type1 Type2 Type1 Type1
Levels: Type1 Type2
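The Levels: line in this output means the diabetes column is stored as a factor, not as plain text. Before R 4.0.0, data.frame() converted character vectors to factors by default; from R 4.0.0 on the default is stringsAsFactors = FALSE. A minimal sketch of the difference (the names as_factor and as_text are illustrative):

```r
diabetes <- c("Type1", "Type2", "Type1", "Type1")

# Explicitly request factor conversion (the old default behaviour)
as_factor <- data.frame(diabetes, stringsAsFactors = TRUE)

# Keep the column as plain character strings (the default since R 4.0.0)
as_text <- data.frame(diabetes, stringsAsFactors = FALSE)

print(class(as_factor$diabetes))
print(levels(as_factor$diabetes))
print(class(as_text$diabetes))
```

With a recent R version, passing stringsAsFactors = TRUE when building patientdata should reproduce the Levels: lines shown in these transcripts.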
Access with a two-subscript index [row, column]
> patientdata[3,4]
[1] Excellent
Levels: Excellent Improved Poor
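The access forms above do not all return the same kind of object: some give back a data frame, others a bare vector. The sketch below rebuilds a two-column version of patientdata (an abbreviated copy for illustration) and checks what each form returns.

```r
# Abbreviated copy of patientdata, for illustration only
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52)
)

one_col_df  <- patientdata["age"]                  # single brackets: a data frame
one_col_vec <- patientdata[["age"]]                # double brackets: a vector
same_vec    <- patientdata$age                     # $ is equivalent to [["age"]]
dropped     <- patientdata[, "age"]                # [row, column] with one column: a vector
kept_df     <- patientdata[, "age", drop = FALSE]  # drop = FALSE keeps the data frame

print(class(one_col_df))
print(class(one_col_vec))
print(class(kept_df))
```

Use drop = FALSE when later code expects a data frame even if only one column is selected.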
# Create a vector of dates; seq() supplies the offsets in days (0, 15, ..., 990)
dts <- as.Date("20050101", '%Y%m%d') + seq(0, 1000, 15)
# Create a data frame with two columns: Dates and Gas
A <- data.frame(Dates = dts, Gas = 4000 + cumsum(abs(rnorm(length(dts), 100, 30))))
# Add three columns to A: Year, DayOfYear, GasDiff
A <- transform(A,
               Year = format(Dates, '%Y'),
               DayOfYear = as.numeric(format(Dates, '%j')),
               GasDiff = c(diff(Gas), NA))
head(A)
tail(A)
> head(A)
       Dates      Gas Year DayOfYear   GasDiff
1 2005-01-01 4080.554 2005         1  99.97783
2 2005-01-16 4180.532 2005        16  85.28999
3 2005-01-31 4265.822 2005        31  56.98435
4 2005-02-15 4322.807 2005        46 102.43868
5 2005-03-02 4425.245 2005        61 118.03613
6 2005-03-17 4543.281 2005        76 102.31251
> tail(A)
        Dates      Gas Year DayOfYear   GasDiff
62 2007-07-05 10498.67 2007       186  84.40763
63 2007-07-20 10583.08 2007       201  59.70456
64 2007-08-04 10642.78 2007       216 116.91161
65 2007-08-19 10759.69 2007       231 142.73284
66 2007-09-03 10902.43 2007       246 133.92609
67 2007-09-18 11036.35 2007       261        NA
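The NA in the last row of GasDiff comes from how diff() works: it returns one value fewer than its input, so the result has to be padded to the full column length. A minimal sketch with fixed, made-up readings (gas, step, gas_diff and B are illustrative names), which avoids the randomness of rnorm():

```r
# Hypothetical fixed readings so the result is reproducible
gas <- c(4000, 4100, 4250, 4300)

step <- diff(gas)        # differences between neighbours: one element shorter than gas
gas_diff <- c(step, NA)  # pad with NA so the length matches the column again

# The same padding inside transform(), as in the example above
B <- data.frame(Gas = gas)
B <- transform(B, GasDiff = c(diff(Gas), NA))
print(B)
```

Each GasDiff value is the increase from that row to the next, so the final row has no successor and stays NA.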