1. quit console
> q() or
>quit()
2. help
> help(solve) or
> ?solve
> help.start()
3. help.search() --search the help system
> help.search("solve") or
> ??solve
4. show examples for a function or topic
> example(topic) ---e.g. example(solve)
5. batch-run R code, that is, run the commands stored in a file
> source("commands.R")
6. redirect the output to files
> sink("record.lis") --redirect the console output to the file named "record.lis"
> sink() --restore the output to the console
7. display the objects currently stored in R
> objects() or
>ls()
8. remove objects currently stored in R
> rm(x, y, z, ink, junk, temp, foo, bar)
9. vector assignment or creation
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7) or --c() is a function that builds a vector
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7)) or
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
--manipulation of variables
> 1/x
> y <- c(x, 0, x)
> v <- 2*x + y + 1
10. basic functions
x is a vector
>min(x)
>max(x)
>range(x)
>length(x)
log
exp
sin
cos
tan
sqrt
sum
prod ---product of all elements
var ---sample variance = sum((x-mean(x))^2) / (length(x)-1)
---If the argument to var() is an n-by-p matrix, the value is the p-by-p sample covariance matrix obtained by treating the rows as independent p-variate sample vectors.
sort ---sort the vector in increasing order
pmax --parallel maximum and minimum: like max and min, but operating element by element across several vectors
pmin
sqrt ---can compute the square root of a complex number, e.g. sqrt(-17+0i); sqrt(-17) gives NaN and a warning
11. generating sequences
>1:30
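> n <- 10
> 1:n-1 --same as (1:n)-1, i.e. 0, 1, ..., 9, because : has higher priority than -
> 1:(n-1) --gives 1, 2, ..., 9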
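The difference between the last two lines comes from operator precedence: the colon operator binds more tightly than subtraction. A minimal check, reusing n from above (the names a and b are only for illustration):

```r
n <- 10
a <- 1:n-1     # parsed as (1:n) - 1, so the sequence starts at 0
b <- 1:(n-1)   # parentheses force the subtraction first
print(a)       # ten values, 0 to 9
print(b)       # nine values, 1 to 9
```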
12. generate sequences by seq() function
> seq(-5, 5, by=.2) -> s3
> s4 <- seq(length=51, from=-5, by=.2)
arguments: seq(from, to, by, length.out, along.with); length and along are accepted abbreviations
13. replication function
> s5 <- rep(x, times=5) ---repeat the whole vector five times, end to end
> s6 <- rep(x, each=5) ---repeat each element five times before moving on to the next
14. logical vectors
> temp <- x > 13
or ---- |
and -----&
15. missing values
NA --Not Available
> z <- c(1:3,NA)
> ind <- is.na(z) ----- is.na(var)
-- one cannot use x == NA as a logical expression: NA is not a value but a marker for a missing one, so the comparison itself evaluates to NA. Use is.na() to test for NA.
NaN ----Not a Number
> 0/0
> Inf - Inf
> is.nan(xx)
In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs. Missing values are sometimes printed as <NA> when character vectors are printed without quotes.
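A short sketch of the NA / NaN distinction described above. The vector w is made up for illustration; z is the vector from this section.

```r
z <- c(1:3, NA)
cmp <- z == NA          # every element is NA: the comparison never yields TRUE
ind <- is.na(z)         # TRUE only in the last position

w <- c(0/0, NA, 1)      # one NaN, one NA, one ordinary number
na_w  <- is.na(w)       # TRUE for both the NaN and the NA
nan_w <- is.nan(w)      # TRUE for the NaN only
print(rbind(na_w, nan_w))
```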
16. paste() --concatenating strings element by element
The paste() function takes an arbitrary number of arguments and concatenates them element by element into character strings, recycling shorter arguments as needed.
> labs <- paste(c("X","Y"), 1:10, sep="")
--gives c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
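To see the recycling at work, the example above can be run directly. The collapse argument, which joins the resulting vector into one string, is an extra not mentioned in these notes.

```r
# c("X", "Y") is recycled five times to match the ten numbers
labs <- paste(c("X", "Y"), 1:10, sep = "")
print(labs)

# collapse= glues the ten labels into a single string
one <- paste(c("X", "Y"), 1:10, sep = "", collapse = "+")
print(one)
```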
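17. indexing vectors
--logical vector
> y <- x[!is.na(x)] --copy the non-NA elements of x into y
--- e.g. x[c(TRUE,FALSE,TRUE,FALSE,FALSE,TRUE)]; a 0/1 numeric vector such as c(1,0,1,0,0,1) is treated as integer positions, not as a logical index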
> (x+1)[(!is.na(x)) & x>0] -> z
--vector indexed by positive integers
> x[1:10]
> c("x","y")[rep(c(1,2,2,1), times=4)]
---vector indexed by negative integers, which excludes the corresponding elements
> y <- x[-(1:5)]
--vector indexed by component names, after naming each position
> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]
--applications
> x[is.na(x)] <- 0 --replace missing values by zero
> y[y < 0] <- -y[y < 0] ---take the absolute value of each element; same effect as
> y <- abs(y)
20. variable type transformation: character - integer, integer - character
> z <- 0:9
> digits <- as.character(z)
> d <- as.integer(digits)
21. truncate the size of a vector
> alpha <- alpha[2 * 1:5]
> length(alpha) <- 3
22. attributes() and attr() functions
> attr(z, "dim") <- c(10,10) ---treat z as if it were a 10-by-10 matrix (z must have 100 elements)
23. unclass(obj): temporarily remove the effect of class
> winter
> unclass(winter)
24. factor() function: classification / grouping of elements
levels(): list the distinct values (levels) of a factor
> state <-c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")
> statef <- factor(state)
> statef
> levels(statef)
25. tapply(): apply a function to each group of values in the first argument, with the groups defined by the factor given as the second argument.
> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
59, 46, 58, 43)
> incmeans <- tapply(incomes, statef, mean)
--defining a function
> stderr <- function(x) sqrt(var(x)/length(x)) ---writing function(x) ... -> stderr does not work: the assignment is parsed as part of the function body. Note that this name masks base::stderr().
> incster <- tapply(incomes, statef, stderr)
> incster
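A smaller sketch of the same tapply() pattern, with made-up values so the group results are easy to verify by hand. The names grp, val and se are hypothetical; se is the same standard-error function as above under a name that does not clash with base R.

```r
grp <- factor(c("a", "b", "a", "b", "a"))   # two groups
val <- c(1, 2, 3, 4, 5)                     # group a: 1, 3, 5; group b: 2, 4

means <- tapply(val, grp, mean)             # one mean per level of grp

se <- function(x) sqrt(var(x) / length(x))  # standard error of the mean
ses <- tapply(val, grp, se)

print(means)
print(ses)
```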
26. array from vector: treating a vector as an array
this only works when the length of the vector equals the product of the dimensions; here z must have 1500 elements
> dim(z) <- c(3,5,100)
27. array define:
> x <- array(1:20, dim=c(4,5)) # Generate a 4 by 5 array; syntax: array(data_vector, dim_vector)
> x
> i <- array(c(1:3,3:1), dim=c(3,2))
> i # i is a 3 by 2 index array.
> x[i] # Extract those elements
> x[i] <- 0 # Replace those elements by zeros.
> x
28. matrix ------!!!!!!!!!!!!!!!!!!!!!!!!!
> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1
> Xv[iv] <- 1
> X <- cbind(Xb, Xv)
> N <- crossprod(Xb, Xv)
> N <- table(blocks, varieties)
29. array function
> Z <- array(data_vector, dim_vector)
> Z <- array(h, dim=c(3,4,2))
> Z <- h ; dim(Z) <- c(3,4,2)
> Z <- array(0, c(3,4,2))
> D <- 2*A*B + C + 1
30. outer product -----An important operation on arrays is the outer product. If a and b are two numeric arrays, their outer product is an array whose dimension vector is obtained by concatenating their two dimension vectors (order is important), and whose data vector is obtained by forming all possible products of elements of the data vector of a with those of b.
-- for two plain vectors this is the usual outer product matrix a %*% t(b); for general arrays it is a generalization of it.
> ab <- a %o% b or
> ab <- outer(a, b, "*")
---outer product with an arbitrary function of two variables in place of multiplication
> f <- function(x, y) cos(y)/(1 + x^2)
> z <- outer(x, y, f)
> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(as.numeric(names(fr)), fr, type="h",xlab="Determinant", ylab="Frequency")
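The rule about concatenated dimension vectors can be checked on a small case. This is only an illustration; the arrays a and b are made up.

```r
a <- matrix(1:6, nrow = 2)   # dimension vector c(2, 3)
b <- 1:4                     # plain vector, contributes its length 4
ab <- a %o% b                # same as outer(a, b, "*")

print(dim(ab))               # dimensions of a followed by the length of b
# each entry is the product of one element of a and one element of b
print(ab[2, 3, 4] == a[2, 3] * b[4])
```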
31. generalized transpose of an array --!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
--the perm argument of aperm(a, perm) is a permutation of c(1, 2, ..., k), where k is the number of dimensions of a
> B <- aperm(A, c(2,1)) ---aperm: permutation of the array dimensions
> t(A) --transpose of a matrix; same as aperm(A, c(2,1))
> nrow(A) --number of rows
> ncol(A) --number of columns
32. matrix multiplication
--element-by-element product; A and B must have the same dimensions
> A * B
--matrix product
> A %*% B
> x %*% A %*% x --quadratic form
> crossprod(x,y) --- same as t(x) %*% y
> diag(A) --gives the vector of diagonal entries of matrix A
Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix!
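Because diag() behaves differently depending on its argument, a quick side-by-side sketch may help. The matrix A here is made up for illustration.

```r
A <- matrix(1:9, 3, 3)

dA <- diag(A)         # matrix argument: extract the diagonal as a vector
D2 <- diag(c(2, 3))   # vector argument: build a diagonal matrix
I3 <- diag(3)         # single number k: the k by k identity matrix

print(dA)
print(D2)
print(I3)
```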
33. linear equations & inversion
> b <- A %*% x
> solve(A,b) ---solve the linear system A %*% x = b for x
> solve(A) ---inverse of matrix A
--using solve(A,b) is more efficient and numerically more stable than solve(A) %*% b
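A minimal sketch comparing the two routes on a made-up 2 by 2 system. Both recover x here; on large or ill-conditioned matrices the direct solve is the safer habit.

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)   # small invertible matrix (illustrative)
x <- c(1, -1)
b <- A %*% x                       # right-hand side, a 2 x 1 matrix

x1 <- solve(A, b)                  # solve the system directly
x2 <- solve(A) %*% b               # invert first, then multiply

print(all.equal(x1, x2))
```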
34. eigenvalues and eigenvectors
> ev <- eigen(Sm)
> evals <- eigen(Sm)$values
> eigen(Sm)$vectors
> eigen(Sm)
> evals <- eigen(Sm, only.values = TRUE)$values
35. singular value decomposition (SVD) !!!!!!!!!!!!!!!!!!need to be detailed
> svd(M) --returns components u, d and v
--such that M = U %*% D %*% t(V), where D = diag(d)
---if M is square, the following computes the absolute value of the determinant
> absdetM <- prod(svd(M)$d)
> absdet <- function(M) prod(svd(M)$d)
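Both claims, the reconstruction of M and the absolute determinant, can be verified on a small made-up matrix:

```r
M <- matrix(c(2, 0, 1, 3), 2, 2)   # illustrative 2 x 2 matrix, det(M) = 6
s <- svd(M)                        # list with components d, u and v

recon <- s$u %*% diag(s$d) %*% t(s$v)   # U %*% D %*% t(V)
absdetM <- prod(s$d)                    # product of the singular values

print(all.equal(recon, M))
print(all.equal(absdetM, abs(det(M))))
```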
36. least squares fitting -- lsfit(x,y)
> ans <- lsfit(X, y)
37. QR decomposition --------!!!!!!!!!!!!!!!!!need to be detailed
> Xplus <- qr(X)
> b <- qr.coef(Xplus, y)
> fit <- qr.fitted(Xplus, y)
> res <- qr.resid(Xplus, y)
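A small sketch of what the four QR calls return, using simulated data (the design matrix, coefficients and noise level are made up). The QR route gives the least squares coefficients, and fitted values plus residuals add back up to y.

```r
set.seed(1)                                   # reproducible noise
X <- cbind(1, 1:10)                           # intercept column and one predictor
y <- 3 + 2 * (1:10) + rnorm(10, sd = 0.1)     # roughly y = 3 + 2x

Xplus <- qr(X)                # QR decomposition of the design matrix
b   <- qr.coef(Xplus, y)      # least squares coefficients
fit <- qr.fitted(Xplus, y)    # fitted values X %*% b
res <- qr.resid(Xplus, y)     # residuals y - fit

print(b)
print(all.equal(fit + res, y))
```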
38. binding matrices horizontally (cbind(), column-wise) or vertically (rbind(), row-wise)
> X <- cbind(arg_1, arg_2, arg_3, ...)
> X <- cbind(1, X1, X2)
39. convert an array or other object to a vector
> vec <- as.vector(X)
> vec <- c(X)
40. table
> statefr <- table(statef)
> statefr <- tapply(statef, statef, length)
> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef
> table(incomef,statef)
41. list
An R list is an object consisting of an ordered collection of objects known as its components.
> Lst <- list(name="Fred", wife="Mary", no.children=3,child.ages=c(4,7,9))
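Lst[[1]], Lst[[2]], ... are the individual components of the list: name, wife, ... (Lst[1] with single brackets is a sublist holding the first component, not the component itself)
length(Lst) gives the number of (top level) components it has
> name$component_name --general form for accessing a named component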
Lst$name is the same as Lst[[1]] and is the string "Fred",
Lst$wife is the same as Lst[[2]] and is the string "Mary",
Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.
> x <- "name"; Lst[[x]]
> Lst <- list(name_1=object_1, ..., name_m=object_m)
> Lst[5] <- list(matrix=Mat)
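The single versus double bracket distinction is easy to trip over, so here is a short sketch using the Lst example from above (the names sub, comp and key are only for illustration):

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

sub  <- Lst[1]      # single brackets: a list of length 1
comp <- Lst[[1]]    # double brackets: the component itself, a string

key <- "wife"
by_name <- Lst[[key]]   # double brackets also accept a name held in a variable

print(class(sub))
print(class(comp))
```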
42. concatenating lists
> list.ABC <- c(list.A, list.B, list.C)
43. data frame
A data frame is a list with class "data.frame".
> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
44. attach() & detach(): make naming easier
attach() makes the components of a list or data frame temporarily visible as variables under their component names, without the need to quote the list name explicitly each time.
> attach(lentils)
> u <- v+w
> lentils$u <- v+w
> detach()
> attach(any.old.list)
> search() ---show the search path; compare the output of search() after attach() and after detach()
> ls(2)
> detach("lentils")
> search()
45. read.table(): read from file
> HousePrice <- read.table("houses.data")
> HousePrice <- read.table("houses.data", header=TRUE)
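To try read.table() without a file on disk, the text= argument accepts the file contents as a string. The two data rows below are a made-up stand-in for houses.data, not the real file.

```r
# Hypothetical contents: a header line followed by two records
txt <- "Price Floor Area
52.00 111.0 830
54.75 128.0 710"

HousePrice <- read.table(text = txt, header = TRUE)  # first line gives the names
str(HousePrice)
```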
46. scan() function to read from file
> inp <- scan("input.dat", list("",0,0)) ---the second argument gives the input format: one character field and two numeric fields
> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
> inp <- scan("input.dat", list(id="", x=0, y=0)) ---here the second argument gives both the input format and the component names
> label <- inp$id; x <- inp$x; y <- inp$y
> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
47. accessing built-in datasets
> data() --list the datasets currently available
--loading data from R packages
> data(package="rpart") --list the datasets in package rpart
> data(Puromycin, package="datasets")
48. edit data: edit()
> xnew <- edit(xold)
> xnew <- edit(data.frame())
49. probability distributions
One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) > q), and to simulate from the distribution.
Prefix the distribution name (e.g. norm, t, f, binom) by ‘d’ for the density, ‘p’ for the CDF, ‘q’ for the quantile function and ‘r’ for simulation (random deviates).
> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.01, 2, 7, lower.tail = FALSE)
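The four prefixes can be seen together on the normal distribution. This is a small sketch; the evaluation points and the seed are arbitrary.

```r
d0 <- dnorm(0)        # density of N(0, 1) at 0, equal to 1/sqrt(2*pi)
p  <- pnorm(1.96)     # CDF: P(X <= 1.96)
q  <- qnorm(p)        # quantile function inverts the CDF, giving back 1.96
set.seed(1)           # make the simulation reproducible
r  <- rnorm(5)        # five random deviates

print(c(density = d0, cdf = p, quantile = q))
print(r)
```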
50. examining the distribution of a variable
> attach(faithful)
> summary(eruptions)
> fivenum(eruptions)
> stem(eruptions)
> hist(eruptions)
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw=0.1))
> rug(eruptions) # show the actual data points
#########################skipping this part###################
> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
> shapiro.test(long)
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))
> t.test(A, B)
> var.test(A, B)
> t.test(A, B, var.equal=TRUE)
> wilcox.test(A, B)
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
> ks.test(A, B)
60. grouping, looping and conditional execution
> if (expr_1) expr_2 else expr_3
> for (name in expr_1) expr_2
for (....) {
....
}
## other loops
> repeat expr
> while (condition) expr
61. writing functions
> name <- function(arg_1, arg_2, ...) expression
> twosam <- function(y1, y2) {
n1 <- length(y1); n2 <- length(y2)
yb1 <- mean(y1); yb2 <- mean(y2)
s1 <- var(y1); s2 <- var(y2)
s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
tst
}
> tstat <- twosam(data$male, data$female); tstat
> bslash <- function(X, y) {
> regcoeff <- bslash(Xmat, yvar)
62. define a new operator
> "%!%" <- function(X, y) { ... }
> X %!% y ---defined like a function, but used as a binary operator
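Since the body of %!% is left open above, here is a complete toy operator instead. The name %+% and its string-gluing behaviour are hypothetical, chosen only to show the mechanics.

```r
# Any name of the form %xxx% can be defined as a two-argument function
"%+%" <- function(a, b) paste(a, b, sep = "")

s <- "abc" %+% "def"          # infix use
t2 <- `%+%`("abc", "def")     # the same call written in ordinary function form
print(s)
```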
63. named arguments and defaults: arguments with defaults can be omitted, similar to C++
> fun1 <- function(data, data.frame, graph, limit) {
> ans <- fun1(d, df, TRUE, 20)
> ans <- fun1(d, df, graph=TRUE, limit=20)
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)
> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }
> ans <- fun1(d, df)
> ans <- fun1(d, df, limit=10)
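The fun1 body is not shown in these notes, so the sketch below uses a hypothetical body that just returns its arguments. That is enough to see how positional matching, named matching and defaults interact.

```r
# Hypothetical body: echo the matched arguments back as a list
fun1 <- function(data, data.frame, graph = TRUE, limit = 20) {
  list(data = data, data.frame = data.frame, graph = graph, limit = limit)
}

a1 <- fun1(1, 2)                                # defaults used for graph and limit
a2 <- fun1(1, 2, limit = 10)                    # override one default by name
a3 <- fun1(data.frame = 2, limit = 10, data = 1)  # named arguments in any order

print(identical(a2, a3))
```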
> bdeff <- function(blocks, varieties) {
> temp <- X
> dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))
> temp; rm(temp)
> no.dimnames(X)
64. scope
S> cube(2)
S> n <- 3
S> cube(2)
R> cube(2)
if(amount > total)
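The cube() fragments above appear to come from the lexical scoping example in the R introduction. Assuming the usual definition of cube (it is not included in these notes), the R behaviour can be reproduced as follows:

```r
# Assumed definition: sq() has no arguments and uses n from its enclosing function
cube <- function(n) {
  sq <- function() n * n   # n is looked up in the call to cube(), not globally
  n * sq()
}

n <- 3                     # a global n does not affect the result in R
result <- cube(2)
print(result)              # 2 * (2 * 2)
```

Under lexical scoping the inner function sees the n passed to cube(), so the global n is ignored.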
65. Customization
The location of the site initialization file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system. A second, personal, profile file named .Rprofile can be placed in any directory.
> .First <- function() {
options(prompt="$ ", continue="+\t") # $ is the prompt
options(digits=5, length=999) # custom numbers and printout
x11() # for graphics
par(pch = "+") # plotting character
source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))
# my personal functions
library(MASS) # attach a package
}
> .Last <- function() {
graphics.off() # a small safety measure.
cat(paste(date(),"\nAdios\n")) # Is it time for lunch?
}
> methods(class="data.frame")
> methods(plot)
> coef
> methods(coef)
> getAnywhere("coef.aov")
> getS3method("coef", "aov")
> fitted.model <- lm(formula, data = data.frame)
> fm2 <- lm(y ~ x1 + x2, data = production)
> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
> anova(fitted.model.1, fitted.model.2, ...)
> new.model <- update(old.model, new.formula)
> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6 <- update(fm05, . ~ . + x6)
> smf6 <- update(fm6, sqrt(.) ~ .)
> fmfull <- lm(y ~ . , data = production)
> fitted.model <- glm(formula, family=family.generator, data=data.frame)
> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
> fm <- lm(y ~ x1+x2, data=sales)
> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5),
> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)
> summary(fmp)
> summary(fml)
> ld50 <- function(b) -b[1]/b[2]
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)
> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt),
> nlfit <- glm(y ~ x1 + x2 - 1,
> x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
> y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
> fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)
> lines(spline(xfit, yfit))
> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)
> lines(spline(xfit, yfit))
> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)
> fit
> summary(fit)
> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)
> fn <- function(p)
> out <- nlm(fn, p = c(-50,20), hessian = TRUE)
> sqrt(diag(solve(out$hessian)))
> pairs(X)
> coplot(a ~ b | c)
> coplot(a ~ b | c + d)
> plot(x, y, type="n"); text(x, y, names)
> text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))
> help(plotmath)
> example(plotmath)
> demo(plotmath)
> help(Hershey)
> demo(Hershey)
> help(Japanese)
> demo(Japanese)
> text(locator(1), "Outlier", adj=0)
> plot(x, y)
> identify(x, y)
> oldpar <- par(col=4, lty=2)
> par(oldpar)
> oldpar <- par(no.readonly=TRUE)
> par(oldpar)
> plot(x, y, pch="+")
> legend(locator(1), as.character(0:25), pch = 0:25)
> postscript()
> dev.off()
> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)
> postscript("plot1.eps", horizontal=FALSE, onefile=FALSE,
> library()
> library(boot)
> search()
> loadedNamespaces()
> help.start()
w <- ifelse(Mod(w) > 1, 1/w, w)
R [options] [<infile] [>outfile]
Note that input and output can be redirected in the usual way (using '<' and '>'), but the line length limit of 4095 bytes still applies. Warning and error messages are sent to the error channel (stderr).
q(status=<exit status code>)
Many of these use either Control or Meta characters. Control characters, such as Control-m, are obtained by holding the CTRL key down while you press the m key, and are written as C-m. Meta characters, such as Meta-b, are typed by holding down META and pressing b, and are written as M-b.
The R program keeps a history of the command lines you type, including the erroneous lines, and commands in your history may be recalled, changed if necessary, and re-submitted as new commands. In Emacs-style command-line editing any straight typing you do while in this editing phase causes the characters to be inserted in the command you are editing, displacing any characters to the right of the cursor. In vi mode character insertion mode is started by M-i or M-a, characters are typed and insertion mode is finished by typing a further ESC.
Pressing the RET command at any time causes the command to be re-submitted.
The final RET terminates the command line editing sequence.