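Why use R?
Most commercial statistical packages are expensive; spending thousands of dollars on one is not unusual. R is free.
R has excellent graphics. For visualizing complex data, it offers one of the most comprehensive and powerful sets of tools available.
R is a powerful platform for interactive data analysis and exploration.
R makes it simple and direct to implement new statistical methods. It is easy to extend, and the language is a natural fit for quickly programming new methods.
R runs on many platforms, including Windows, UNIX, and Mac OS X, so it will run on practically any computer you are likely to have.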
[Figure: an example of R's plotting capabilities]
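Getting and installing R
R can be downloaded free of charge from CRAN (Comprehensive R Archive Network) at http://cran.r-project.org. Precompiled binaries are available for Linux, Mac OS X, and Windows. Follow the installation instructions for your platform.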
Data structures
Vectors
A vector is a one-dimensional array that stores numeric, character, or logical data.
a <- c(1,2,3,4)
b <- c("one","two","three")
c <- c(TRUE,FALSE)
cat(a,b,c)
Output:
1 2 3 4 one two three TRUE FALSE
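Here a is a numeric vector, b is a character vector, and c is a logical vector. Note that all the data in a single vector must have the same type, or mode (numeric, character, or logical); one vector cannot hold a mix of modes.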
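When values of different modes are passed to c(), R does not raise an error; it converts them to one common type. A minimal sketch of that coercion (the variable names are made up for illustration):

```r
# Mixing modes in c() coerces everything to a single common type.
mixed_num_lgl <- c(1, TRUE, FALSE)   # logical values become numeric
mixed_chr     <- c(1, "two", TRUE)   # everything becomes character

print(class(mixed_num_lgl))  # "numeric"
print(mixed_num_lgl)         # 1 1 0
print(class(mixed_chr))      # "character"
print(mixed_chr)             # "1" "two" "TRUE"
```

So the result is still a vector with one mode, which is why it pays to check class() when combining values.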
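Matrices
A matrix is a two-dimensional array in which every element has the same mode (numeric, character, or logical). Create one with the matrix function.
Look at the help page: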
> help("matrix")
This opens the R Documentation page for matrix. Here is what it says:
matrix {base} R Documentation
Matrices
Description
matrix creates a matrix from the given set of values.
as.matrix attempts to turn its argument into a matrix.
is.matrix tests if its argument is a (strict) matrix.
Usage
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
as.matrix(x, ...)
## S3 method for class 'data.frame'
as.matrix(x, rownames.force = NA, ...)
is.matrix(x)
Arguments
data
an optional data vector (including a list or expression vector). Non-atomic classed R objects are coerced by as.vector and all attributes discarded.
nrow
the desired number of rows.
ncol
the desired number of columns.
byrow
logical. If FALSE (the default) the matrix is filled by columns, otherwise the matrix is filled by rows.
dimnames
A dimnames attribute for the matrix: NULL or a list of length 2 giving the row and column names respectively. An empty list is treated as NULL, and a list of length one as row names. The list can be named, and the list names will be used as names for the dimensions.
x
an R object.
...
additional arguments to be passed to or from methods.
rownames.force
logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
Details
If one of nrow or ncol is not given, an attempt is made to infer it from the length of data and the other parameter. If neither is given, a one-column matrix is returned.
If there are too few elements in data to fill the matrix, then the elements in data are recycled. If data has length zero, NA of an appropriate type is used for atomic vectors (0 for raw vectors) and NULL for lists.
is.matrix returns TRUE if x is a vector and has a "dim" attribute of length 2 and FALSE otherwise. Note that a data.frame is not a matrix by this test. The function is generic: you can write methods to handle specific classes of objects, see InternalMethods.
as.matrix is a generic function. The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise, the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
The default method for as.matrix calls as.vector(x), and hence e.g. coerces factors to character vectors.
When coercing a vector, it produces a one-column matrix, and promotes the names (if any) of the vector to the rownames of the matrix.
is.matrix is a primitive function.
The print method for a matrix gives a rectangular layout with dimnames or indices. For a list matrix, the entries of length not one are printed in the form integer,7 indicating the type and length.
Note
If you just want to convert a vector to a matrix, something like
dim(x) <- c(nx, ny)
dimnames(x) <- list(row_names, col_names)
will avoid duplicating x.
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
data.matrix, which attempts to convert to a numeric matrix.
A matrix is the special case of a two-dimensional array.
Examples
is.matrix(as.matrix(1:10))
!is.matrix(warpbreaks) # data.frame, NOT matrix!
warpbreaks[1:10,]
as.matrix(warpbreaks[1:10,]) # using as.matrix.data.frame(.) method
## Example of setting row and column names
mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
dimnames = list(c("row1", "row2"),
c("C.1", "C.2", "C.3")))
mdat
Usage example
> mdat <- matrix(c(1,2,3, 11,12,13), nrow = 2, ncol = 3, byrow = TRUE,
+ dimnames = list(c("row1", "row2"),
+ c("C.1", "C.2", "C.3")))
> mdat
     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13
> print(mdat)
     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13
Arrays
An array is similar to a matrix, but it can have more than two dimensions. Create one with the array function.
> array(1:3, c(2,4,3))
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    3    2    1
[2,]    2    1    3    2

, , 2

     [,1] [,2] [,3] [,4]
[1,]    3    2    1    3
[2,]    1    3    2    1

, , 3

     [,1] [,2] [,3] [,4]
[1,]    2    1    3    2
[2,]    3    2    1    3
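Data frames
A data frame is a tabular object. Unlike a matrix, each column can hold a different data type. Create one with the data.frame function.
Get into the habit of using help: this time run help("data.frame") and look at the examples. Here is one of them: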
> data.frame(1, 1:10, sample(LETTERS[1:3], 10, replace = TRUE))
   X1 X1.10 sample.LETTERS.1.3...10..replace...TRUE.
1   1     1                                        C
2   1     2                                        A
3   1     3                                        C
4   1     4                                        A
5   1     5                                        C
6   1     6                                        C
7   1     7                                        A
8   1     8                                        A
9   1     9                                        A
10  1    10                                        B
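The column names in that output are generated automatically from the expressions. A small sketch of the more common style, with named columns of different types (the column names and values here are invented for illustration):

```r
# A data frame with explicitly named columns of three different types.
df <- data.frame(
  id     = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep name as character in older R versions too
)

str(df)                 # shows the type of each column
print(df$name)          # one column, accessed by name
print(df[df$passed, ])  # rows where passed is TRUE
```

Naming the columns up front makes later access with $ much easier to read.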
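Factors
Factor labels are always character strings, whether the input vector is numeric, character, or logical. A factor stores the vector together with the distinct values of its elements, which become its labels (levels).
Factors are created with the factor() function. The nlevels function returns the number of levels.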
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)
print(factor_apple)
print(nlevels(factor_apple))
Running the code above produces the following result:
[1] green green yellow red red red green
Levels: green red yellow
[1] 3
Lists
A list is a collection of objects (components), which can include any of the data structures above.
> list(c(2,5,8),52.8,TRUE)
[[1]]
[1] 2 5 8
[[2]]
[1] 52.8
[[3]]
[1] TRUE
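Components can also be given names and then pulled out individually. A short sketch, assuming illustrative component names (values, score, ok) that are not part of the original example:

```r
# A list whose components have names.
item <- list(values = c(2, 5, 8), score = 52.8, ok = TRUE)

print(item[[1]])    # first component by position: the numeric vector
print(item$score)   # component by name
print(item["ok"])   # single brackets return a (shorter) list, not the value
```

The difference between [[ ]] and [ ] is the usual stumbling block: the first returns the component itself, the second returns a list containing it.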
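Variables
An object name is made up of upper- and lower-case letters, the digits 0-9, periods, and underscores. Names are case-sensitive. A name must start with a letter, or with a period that is not followed by a digit; it cannot start with a digit or an underscore.
Assignment
Values can be assigned to variables with the leftward, rightward, and equals operators.
<- <<- = are the leftward assignment operators
-> ->> are the rightward assignment operators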
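A brief sketch of these operators in use. The names x, y, z, counter, and increment are made up for illustration; <<- is shown inside a function because that is where it differs from <-.

```r
x <- 10        # leftward assignment
20 -> y        # rightward assignment
z = x + y      # equals sign, also leftward

counter <- 0
increment <- function() {
  # <<- assigns to counter in the enclosing environment,
  # instead of creating a new local variable.
  counter <<- counter + 1
  invisible(counter)
}
increment()
increment()

print(c(x, y, z))  # 10 20 30
print(counter)     # 2
```

In everyday code <- is the conventional choice; <<- is best kept for cases where changing an outer variable is really intended.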
Other operators
Colon operator :
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
Membership operator %in%
> v1 <- 8
> t <- 1:10
> print(v1 %in% t)
[1] TRUE
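Matrix multiplication %*%
This operator multiplies two matrices. The example multiplies a matrix by its transpose.
> M <- matrix(c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
> product <- M %*% t(M)
> print(product)
     [,1] [,2]
[1,]   65   82
[2,]   82  117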
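A few other points to note
Separate statements with a semicolon ; or a newline.
Use print() or cat() to print the value of a variable.
cat can print several values in one call; pass them separated by commas.
Finding, adding, and removing variables
Use subscripts to look up and add elements.
Use the rm() function to remove a variable.
I will not go into more detail here.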
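For reference, a minimal sketch of those three operations on a vector. The name demo_scores is invented for this example:

```r
demo_scores <- c(90, 85)

print(demo_scores[2])        # look up by subscript: 85
demo_scores[3] <- 77         # assigning one past the end appends an element
print(demo_scores)           # 90 85 77

print(exists("demo_scores")) # TRUE
rm(demo_scores)              # remove the variable
print(exists("demo_scores")) # FALSE
```

ls() lists the variables currently defined, which is handy before and after calling rm().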
Operators
Arithmetic operators, logical operators, all() and any(), bitwise operators
Arithmetic operators
Addition +
> x <- -1:9
> x + 1
[1] 0 1 2 3 4 5 6 7 8 9 10
> x + 2
[1] 1 2 3 4 5 6 7 8 9 10 11
> x * 2 + 3
[1] 1 3 5 7 9 11 13 15 17 19 21
Subtraction -
> v <- c( 2,5.5,6);
> t <- c(8, 3, 4);
> print(v-t);
[1] -6.0 2.5 2.0
Multiplication *
> v <- c( 2,5.5,6);
> t <- c(8, 3, 4);
> print(v * t);
[1] 16.0 16.5 24.0
Try the following operators yourself. When in doubt, use help; for example, help("+") works.
Division /
Remainder (modulo) %%
Integer division %/%
Exponentiation ^
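A quick sketch of these four operators on the same kind of vectors used above. The second vector is called w here, rather than t, only to avoid masking the built-in t() function:

```r
v <- c(2, 5.5, 6)
w <- c(8, 3, 4)

print(v / w)    # element-wise division
print(v %% w)   # remainder after division: 2.0 2.5 2.0
print(w %/% v)  # integer division: 4 0 0
print(v ^ 2)    # each element squared: 4, 30.25, 36
```

Note that %% gives the remainder and %/% gives the whole-number quotient, so for positive numbers w == (w %/% v) * v + (w %% v).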
Logical operators
Greater than >
> v <- c(2,5.5,6,9);
> t <- c(8,2.5,14,9);
> print(v>t);
[1] FALSE TRUE FALSE FALSE
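The other comparison operators work the same way, so I will not go through them one by one:
Less than <
Equal to ==
Less than or equal to <=
Greater than or equal to >=
Not equal to !=
Note
& and | compare the corresponding elements of two vectors and return a vector.
&& and || work on single values only. Older versions of R silently used just the first element of a longer vector; since R 4.3.0, passing a vector longer than one is an error.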
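The following sketch contrasts the element-wise operators with all(), any(), and the single-value forms. The vectors are invented for illustration:

```r
v <- c(TRUE, FALSE, TRUE)
w <- c(TRUE, TRUE, FALSE)

print(v & w)    # element-wise AND: TRUE FALSE FALSE
print(v | w)    # element-wise OR:  TRUE TRUE TRUE
print(all(v))   # FALSE, because not every element is TRUE
print(any(v))   # TRUE, because at least one element is TRUE

x <- 5
print(x > 0 && x < 10)  # TRUE: both single-value conditions hold
print(x < 0 || x > 3)   # TRUE: the second condition holds
```

In an if() condition, && and || are the usual choice, with all() or any() used to reduce a vector to a single value first.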
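Functions
Function definition
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
A function is made up of the following components:
Function name: function_name
Arguments: arg_1, arg_2, ...
Function body: the code between the braces
Return value: the value passed to return(), or the value of the last expression evaluated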
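To make the components concrete, here is a small sketch of a function with two arguments (one with a default value) and an explicit return value. The function name sum_of_powers and its arguments are hypothetical:

```r
# Returns 1^power + 2^power + ... + n^power.
sum_of_powers <- function(n, power = 2) {
  total <- sum(seq_len(n)^power)
  return(total)
}

print(sum_of_powers(3))             # 1 + 4 + 9  = 14
print(sum_of_powers(3, power = 3))  # 1 + 8 + 27 = 36
```

The return() call could be dropped and the last line written as just total; both forms return the same value.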
Built-in functions
seq(), mean(), max(), sum(), paste()
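Each of these can be tried directly at the console. A minimal sketch with made-up inputs:

```r
print(seq(2, 10, by = 2))        # 2 4 6 8 10
print(mean(c(1, 2, 3, 4)))       # 2.5
print(max(3, 9, 4))              # 9
print(sum(1:10))                 # 55
print(paste("R", "is", "free"))  # "R is free"
```

paste() joins its arguments with a space by default; the sep argument changes the separator.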
User-defined functions
new.function <- function(a) {
  for (i in 1:a) {
    b <- i^2
    print(b)
  }
}
Calling the function
new.function(10)
A function can also be defined without arguments and called the same way
new.function <- function() {
  for (i in 10:20) {
    print(i^2)
  }
}
new.function()
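Summary
Why learn R? I might end up moving toward big data, or toward artificial intelligence, or toward algorithms. Life has too many uncertainties; since I want to learn it, I will take it one step at a time. A journey full of hope is better than arriving at the destination.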