Basic Operations in R, Part 01

Working through the basic operations from Yihui Xie's R语言忍者秘笈 (a "ninja's handbook" for the R language)

Chard Liu

January 8, 2019

# Sys.setlocale('LC_ALL', 'C')
# readLines() reads a text file and returns a character vector, one element per line

# The license file (GPL) that ships with R
gpl = readLines(file.path(R.home(), "COPYING"))
head(gpl)  # first few lines of the GPL
## [1] "\t\t    GNU GENERAL PUBLIC LICENSE"                                             
## [2] "\t\t       Version 2, June 1991"                                                
## [3] ""                                                                               
## [4] " Copyright (C) 1989, 1991 Free Software Foundation, Inc."                       
## [5] "                       51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA"
## [6] " Everyone is permitted to copy and distribute verbatim copies"
xie = readLines("https://yihui.name")  # Yihui Xie's homepage
head(xie)  # HTML source
## [1] ""                                                             
## [2] ""                                                       
## [3] "  "                                                                    
## [4] "\t"                       
## [5] "    "                                                
## [6] "    "
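To try readLines() without relying on a network connection or on the R installation directory, here is a minimal sketch. The file contents and the names `tmp`, `txt`, and `first` are made up for illustration.

```r
# Write three lines to a temporary file, then read them back.
tmp = tempfile(fileext = ".txt")
writeLines(c("first line", "", "third line"), tmp)

txt = readLines(tmp)           # one element per line, newlines stripped
length(txt)                    # 3
txt[2] == ""                   # an empty line becomes an empty string

first = readLines(tmp, n = 1)  # n limits how many lines are read
unlink(tmp)                    # remove the temporary file
```

The same call works on a URL or a connection, which is why the homepage example above needs no extra code.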
nchar(gpl[1:10])  # number of characters in each of the first 10 lines of the GPL
##  [1] 32 29  0 56 79 61 58  0 15  0
sum(nchar(gpl))
## [1] 17671
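nchar() is vectorized, which is why it returns one count per line above. The small sketch below, using made-up strings, also shows that characters and bytes are counted separately; the Unicode escapes spell 谢益辉 so that the example does not depend on the encoding of the script file.

```r
x = c("GNU", "", "R 4")
nchar(x)        # 3 0 3: one count per element, an empty string counts as 0
sum(nchar(x))   # 6: total number of characters in the vector

# "\u8c22\u76ca\u8f89" is 谢益辉 written with Unicode escapes (stored as UTF-8)
name = "\u8c22\u76ca\u8f89"
nchar(name)                  # 3 characters
nchar(name, type = "bytes")  # 9 bytes, because each character takes 3 bytes in UTF-8
```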
# strsplit() returns a list; each element is a character vector
strsplit(gpl[4:5], " ")  # split lines 4 and 5
## [[1]]
## [1] ""            "Copyright"   "(C)"         "1989,"       "1991"       
## [6] "Free"        "Software"    "Foundation," "Inc."       
## 
## [[2]]
##  [1] ""           ""           ""           ""           ""          
##  [6] ""           ""           ""           ""           ""          
## [11] ""           ""           ""           ""           ""          
## [16] ""           ""           ""           ""           ""          
## [21] ""           ""           ""           "51"         "Franklin"  
## [26] "St,"        "Fifth"      "Floor,"     "Boston,"    "MA"        
## [31] ""           "02110-1301" ""           "USA"
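The empty strings in the output above come from consecutive spaces: every single space is treated as its own separator. As a sketch on a made-up string `x`, trimming the ends and splitting on runs of spaces avoids them.

```r
x = "  51 Franklin St,  Boston"

single = strsplit(x, " ")[[1]]
single   # "" "" "51" "Franklin" "St," "" "Boston"

# " +" is a regular expression for one or more spaces;
# trimws() removes leading and trailing whitespace first
runs = strsplit(trimws(x), " +")[[1]]
runs     # "51" "Franklin" "St," "Boston"

# strsplit() always returns a list, one element per input string
parts = strsplit(c("a b", "c d e"), " ")
lengths(parts)   # 2 3
```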
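# Splitting on spaces alone is not rigorous: punctuation also separates words,
# so a regular expression is needed.
# In a regular expression, every separator between words can be written as \\W
# (a backslash followed by a capital W; the backslash is doubled inside an R string).
# This pattern matches any non-word character, so splitting on it leaves only the words.
words = unlist(strsplit(gpl, "\\W"))  # unlist() flattens the list into a character vector
words = words[words != ""]  # drop empty strings
# the 10 most frequent words
tail(sort(table(tolower(words))), 10)  # tolower() lowercases, sort() sorts ascending, tail() keeps the last 10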
## 
##    this      is       a program     and     you      or      of      to 
##      49      53      57      71      72      76      77     104     108 
##     the 
##     194
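The same word-count pipeline can be tried on a tiny input where the result is easy to verify by eye. The two sentences in `txt` are made up for illustration; the rest mirrors the steps above.

```r
txt = c("The program is free.", "You can copy the program, or modify it.")

words = unlist(strsplit(txt, "\\W"))  # split on every non-word character
words = words[words != ""]            # ", " yields an empty string, so drop those
length(words)                         # 12 words in total

freq = table(tolower(words))          # lowercase first so "The" and "the" are merged
freq[["the"]]                         # 2
freq[["program"]]                     # 2

# Splitting on runs of non-word characters leaves no empty strings
# in this input, so the filtering step is not needed here
words2 = unlist(strsplit(txt, "\\W+"))
identical(words, words2)              # TRUE
```

Note that "\\W+" still produces a leading empty string when a line starts with a separator, so the filter is the safer habit on real text such as the GPL.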
# Another way to split: extract by position with substr() and substring()
xie[8]
## [1] "    <title>Yihui Xie | 谢益辉</title>"
substr(xie[8], 12, 20)  # extract characters 12 through 20
## [1] "Yihui Xie"
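The comment above mentions substring(), which the example does not show. A short sketch on a made-up string `s` illustrates how it differs from substr(): the end position is optional, and the start and end positions may be vectors.

```r
s = "Hello, world"

substr(s, 1, 5)          # "Hello": start and stop are both required
substring(s, 8)          # "world": stop defaults to the end of the string
substring(s, 1:3, 3:5)   # "Hel" "ell" "llo": vectorized over positions

# Split a string into single characters by position
chars = substring(s, 1:nchar(s), 1:nchar(s))
length(chars)            # 12

# regexpr() finds the position so it does not have to be counted by hand
start = regexpr("world", s)
substr(s, start, start + nchar("world") - 1)   # "world"
```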
# Learn to concatenate
paste(1:3,"a")
## [1] "1 a" "2 a" "3 a"
paste(1:3, "a", sep = "-")  # join each element to "a" with "-": element-wise concatenation
## [1] "1-a" "2-a" "3-a"
paste(letters[1:10], collapse = "~")  # collapse joins all elements of the vector into one string
## [1] "a~b~c~d~e~f~g~h~i~j"
paste(1:3, "a", sep = "-", collapse = "+")  # the classic combination of both
## [1] "1-a+2-a+3-a"
# With sep the result is still a vector, whereas collapse "collapses" the character vector into a single string
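The difference between sep and collapse is easiest to see from the length of the result. The sketch below uses made-up inputs and also shows paste0(), the shorthand for sep = "", and how shorter vectors are recycled.

```r
by_sep = paste(1:3, "a", sep = "-")
length(by_sep)                      # 3: still one string per element

by_collapse = paste(1:3, "a", sep = "-", collapse = "+")
length(by_collapse)                 # 1: everything joined into a single string

paste0("x", 1:3)                    # "x1" "x2" "x3": same as paste(..., sep = "")
paste(c("a", "b"), 1:4, sep = "_")  # "a_1" "b_2" "a_3" "b_4": the shorter vector is recycled
paste(c("a", "b", "c"), collapse = ", ")   # "a, b, c"
```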
Now for something playful, imitating how the experts do it

love = function() cat("I love you\n")  # \n is a newline; the parentheses after function hold the input parameters, and here there are none
say = function(person) {
    love()
    love()
    cat(paste("I love you dear", person, "\n"))
    love()
}
say("somebody")  # sing it out to somebody
## I love you
## I love you
## I love you dear somebody 
## I love you
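As a possible variation on say(), parameters can be given default values so the caller may leave them out. The function name `say_to` and its `times` parameter are hypothetical and not part of the original example.

```r
# A sketch with default arguments: person and times are both optional
say_to = function(person = "somebody", times = 2) {
    for (i in seq_len(times)) cat("I love you\n")
    # cat() separates its arguments with a space, like paste() does
    cat("I love you dear", person, "\n")
}

say_to()                  # uses both defaults
say_to("R", times = 1)    # overrides both

# capture.output() collects what cat() prints, one element per line
out = capture.output(say_to("R", times = 1))
length(out)               # 2
```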
