Reference: https://r-pkgs.org/package-structure-state.html#binary-package
Package structure and state
Package states
An R package can be in one of five states:
- source
- bundled
- binary
- installed
- in-memory
install.packages() and devtools::install_github() both take a package from the source, bundled, or binary state to the installed state, while library() loads an installed package into memory, the in-memory state.
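A minimal sketch of the last transition. It uses the tools package only because it ships with every R installation; any installed package would do. The code first checks that the package is installed, then calls library() and confirms the package is now loaded into memory and attached to the search path.

```r
# "tools" ships with R, so it is always in the installed state.
pkg <- "tools"

# Installed: the package has a directory in one of the libraries.
is_installed <- nzchar(system.file(package = pkg))

# library() loads the namespace into memory and attaches it to the search path.
library(tools)
is_loaded   <- pkg %in% loadedNamespaces()
is_attached <- paste0("package:", pkg) %in% search()

cat("installed:", is_installed, " loaded:", is_loaded,
    " attached:", is_attached, "\n")
```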
Source package
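A source package is just a directory with a specific structure. It is the same layout you get when you start developing your own package: a DESCRIPTION file, an R/ directory holding the .R files that define the functions, and so on.
To browse a package's source, start from its CRAN page (for a Bioconductor package, look for its GitHub repository instead), e.g.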
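To make the structure concrete, here is a sketch that lays out a minimal source package by hand in a temporary directory. The package name mypkg, the function hello() and all DESCRIPTION values are hypothetical and exist only for illustration. In real work, usethis::create_package() or utils::package.skeleton() generate such a skeleton for you.

```r
# Hypothetical package "mypkg", created in a throwaway location.
pkg_dir <- file.path(tempdir(), "mypkg")
unlink(pkg_dir, recursive = TRUE)
dir.create(file.path(pkg_dir, "R"), recursive = TRUE)

# DESCRIPTION: package metadata, one "Field: value" pair per line (DCF format).
writeLines(c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Title: A Minimal Example Package",
  "Description: Illustrates the layout of a source package.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: which functions the package exports to its users.
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

# R/: the .R files that define the package's functions.
writeLines('hello <- function(name = "world") paste0("Hello, ", name, "!")',
           file.path(pkg_dir, "R", "hello.R"))

print(list.files(pkg_dir, recursive = TRUE))
```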
- forcats: https://cran.r-project.org/package=forcats
- readxl: https://cran.r-project.org/package=readxl
One of the links listed on such a page is often the package's public GitHub repository:
- forcats: https://github.com/tidyverse/forcats
- readxl: https://github.com/tidyverse/readxl
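Some authors forget to list this URL, but the source can usually still be found.
For packages that are not developed on a public platform, the source can still be browsed on an unofficial, read-only mirror such as the one maintained by METACRAN. For example: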
- MASS: https://github.com/cran/MASS
- car: https://github.com/cran/car
Bundled package
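A bundled package is a package compressed into a single file. By a convention inherited from Linux, bundles use the .tar.gz extension: the files are first archived with tar (.tar) and then compressed with gzip (.gz). This state exists mainly for transport and is usually only an intermediate form.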
To create a bundle from a package you are developing locally, use devtools::build(). It calls pkgbuild::build(), which ultimately runs R CMD build. For details, see: https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs
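A minimal sketch of what devtools::build() eventually runs. It calls R CMD build directly from R on a throwaway package. The package name mypkg, its DESCRIPTION values, and the helpers write_minimal_pkg() and build_bundle() are hypothetical and exist only for this example. Comparing the DESCRIPTION inside the resulting tarball with the source one shows extra fields, such as Packaged, that R CMD build adds. This is a first sign that building a bundle is more than tar plus gzip.

```r
# Hypothetical helper: write a throwaway source package "mypkg".
write_minimal_pkg <- function(pkg_dir) {
  unlink(pkg_dir, recursive = TRUE)
  dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
  writeLines(c(
    "Package: mypkg",
    "Version: 0.0.1",
    "Title: A Minimal Example Package",
    "Description: Illustrates the states of an R package.",
    "Author: Jane Doe",
    "Maintainer: Jane Doe <jane@example.com>",
    "License: GPL-3"
  ), file.path(pkg_dir, "DESCRIPTION"))
  writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
  writeLines('hello <- function(name = "world") paste0("Hello, ", name, "!")',
             file.path(pkg_dir, "R", "hello.R"))
  invisible(pkg_dir)
}

# Hypothetical helper: run "R CMD build" and return the bundle's path.
# R CMD build writes the tarball into the current working directory.
build_bundle <- function(pkg_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE)
  old_wd <- setwd(out_dir)
  on.exit(setwd(old_wd))
  status <- system2(file.path(R.home("bin"), "R"),
                    c("CMD", "build", shQuote(pkg_dir)))
  if (status != 0) stop("R CMD build failed")
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
  file.path(out_dir,
            paste0(desc[1, "Package"], "_", desc[1, "Version"], ".tar.gz"))
}

pkg_dir <- write_minimal_pkg(file.path(tempdir(), "mypkg"))
bundle  <- build_bundle(pkg_dir, file.path(tempdir(), "bundles"))

# Unpack the bundle and list DESCRIPTION fields that the source version lacks.
ex_dir <- file.path(tempdir(), "unpacked")
untar(bundle, exdir = ex_dir)
bundled_desc <- read.dcf(file.path(ex_dir, "mypkg", "DESCRIPTION"))
source_desc  <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
print(setdiff(colnames(bundled_desc), colnames(source_desc)))
```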
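In practice, however, a bundle is not made by simply running tar and then gzip: building a .tar.gz in R involves several extra operations.
For example, after downloading forcats_0.4.0.tar.gz, unpack it in a terminal: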
tar xvf forcats_0.4.0.tar.gz
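Once unpacked, its structure turns out to be very close to that of a source package. Comparing the key contents of the two states, the main differences between a source package and an uncompressed bundle are: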
- Vignettes have been built, so rendered outputs, such as HTML, appear below inst/doc/ and a vignette index appears in the build/ directory, usually alongside a PDF package manual.
- A local source package might contain temporary files used to save time during development, like compilation artefacts in src/. These are never found in a bundle.
- Any files listed in .Rbuildignore are not included in the bundle. These are typically files that facilitate your development process, but that should be excluded from the distributed product.
.Rbuildignore
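This file works much like .gitignore in version control tools such as Git. It decides which files are carried into downstream forms (such as the bundle) and which are dropped.
Each line of the file is a regular expression, for example: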
^foofactors\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
^README\.Rmd$
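Any file or directory whose path (relative to the package root) matches one of these patterns is left out of the bundle. The ^ and $ anchor each pattern so that it matches only that exact name. Such files matter only during development. Because it is easy to get a regular expression wrong, the safest way to exclude a specific file is: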
usethis::use_build_ignore("notes")
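Writing R Extensions describes .Rbuildignore entries as Perl-like regular expressions. They are matched case-insensitively against file and directory paths relative to the package root. The sketch below mimics that rule with grepl(), so you can test patterns before building. is_ignored() and anchor_pattern() are hypothetical helpers, not part of R or usethis. anchor_pattern() only imitates what, as far as I know, use_build_ignore() writes: the escaped path wrapped in ^ and $.

```r
# Patterns copied from the .Rbuildignore example above.
patterns <- c("^foofactors\\.Rproj$", "^\\.Rproj\\.user$",
              "^LICENSE\\.md$", "^README\\.Rmd$")

# Hypothetical paths, relative to the package root.
files <- c("foofactors.Rproj", ".Rproj.user", "LICENSE.md", "README.Rmd",
           "README.md", "DESCRIPTION", "R/foo.R")

# A path is excluded if it matches ANY pattern (Perl-style, case-insensitive).
is_ignored <- function(paths, patterns) {
  hits <- lapply(patterns, grepl, x = paths, perl = TRUE, ignore.case = TRUE)
  Reduce(`|`, hits, logical(length(paths)))
}

# Hypothetical helper: escape regex metacharacters and anchor with ^...$,
# so that a literal path becomes a pattern matching only that path.
anchor_pattern <- function(path) {
  escaped <- gsub("([.\\\\^$|?*+()\\[\\]{}])", "\\\\\\1", path, perl = TRUE)
  paste0("^", escaped, "$")
}

print(setNames(is_ignored(files, patterns), files))

# An unanchored pattern also catches files it was never meant to.
is_ignored("R/notes_utils.R", "notes")                  # TRUE: accidental match
is_ignored("R/notes_utils.R", anchor_pattern("notes"))  # FALSE
is_ignored("notes", anchor_pattern("notes"))            # TRUE
```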
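In short, this file makes development more convenient. You keep testing and changing things, but some of the intermediate files involved should not be shipped to CRAN.
.Rbuildignore is a way to resolve some of the tension between the practices that support your development process and CRAN’s requirements for submission and distribution.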
Typical intermediate files of this kind include:
- Files that help you generate package contents programmatically. Examples:
  - Using README.Rmd to generate an informative and current README.md.
  - Storing .R scripts to create and update internal or exported data.
- Files that drive package development, checking, and documentation, outside of CRAN’s purview. Examples:
  - Files relating to the RStudio IDE.
  - Using the pkgdown package to generate a website.
  - Configuration files related to continuous integration/deployment and monitoring test coverage.
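Example: a non-exhaustive list of typical .Rbuildignore entries for a tidyverse package. This is for illustration only and a real file may differ. In particular, the trailing comments must not appear in an actual .Rbuildignore, because each line is read as a regular expression.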
^.*\.Rproj$ # Designates the directory as an RStudio Project
^\.Rproj\.user$ # Used by RStudio for temporary files
^README\.Rmd$ # An Rmd file used to generate README.md
^LICENSE\.md$ # Full text of the license
^cran-comments\.md$ # Comments for CRAN submission
^\.travis\.yml$ # Used by Travis-CI for continuous integration testing
^data-raw$ # Code used to create data included in the package
^pkgdown$ # Resources used for the package website
^_pkgdown\.yml$ # Configuration info for the package website
^\.github$ # Contributing guidelines, CoC, issue templates, etc.
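A quick check of why the comments above are for exposition only. A trailing comment becomes part of the regular expression, and because it follows the $ anchor, the line can never match anything. The matching options mirror how R CMD build is documented to read the file. The cleanup line is an assumption-laden convenience: it strips everything from the first whitespace-plus-# onward, so it would break a pattern that itself contains " #".

```r
# The whole line is used as one regular expression.
with_comment    <- "^\\.travis\\.yml$ # Used by Travis-CI for continuous integration testing"
without_comment <- "^\\.travis\\.yml$"

grepl(with_comment,    ".travis.yml", perl = TRUE, ignore.case = TRUE)  # FALSE
grepl(without_comment, ".travis.yml", perl = TRUE, ignore.case = TRUE)  # TRUE

# Strip the explanatory comments to get usable entries
# (assumes no pattern itself contains whitespace followed by "#").
example <- c("^.*\\.Rproj$ # Designates the directory as an RStudio Project",
             "^data-raw$ # Code used to create data included in the package")
clean <- sub("\\s+#.*$", "", example, perl = TRUE)
print(clean)
```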
Binary package
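To share a package with R users who do not have package development tools, you need to provide a binary package. Binary packages are platform-specific, for example Windows or macOS. To make a binary package, use: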
devtools::build(binary = TRUE)
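Usually, though, it is CRAN rather than the package author that first builds and distributes binaries. You submit a package bundle to CRAN, and CRAN builds and publishes the binary packages for you.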
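A minimal sketch of the mechanism underneath. As far as I know, devtools::build(binary = TRUE) delegates to pkgbuild, which runs R CMD INSTALL --build. The code calls that command directly on a throwaway package. The package name mypkg, the helper write_minimal_pkg() and the temporary directories are hypothetical. The archive's extension depends on the platform: .zip on Windows, .tgz on macOS and, on Linux, typically a .tar.gz whose name includes the platform.

```r
# Hypothetical helper: write a throwaway source package "mypkg".
write_minimal_pkg <- function(pkg_dir) {
  unlink(pkg_dir, recursive = TRUE)
  dir.create(file.path(pkg_dir, "R"), recursive = TRUE)
  writeLines(c(
    "Package: mypkg",
    "Version: 0.0.1",
    "Title: A Minimal Example Package",
    "Description: Illustrates the states of an R package.",
    "Author: Jane Doe",
    "Maintainer: Jane Doe <jane@example.com>",
    "License: GPL-3"
  ), file.path(pkg_dir, "DESCRIPTION"))
  writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
  writeLines('hello <- function(name = "world") paste0("Hello, ", name, "!")',
             file.path(pkg_dir, "R", "hello.R"))
  invisible(pkg_dir)
}

pkg_dir <- write_minimal_pkg(file.path(tempdir(), "mypkg"))
lib_dir <- file.path(tempdir(), "lib")
out_dir <- file.path(tempdir(), "binaries")
dir.create(lib_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Install into lib_dir, then pack the installed copy into a binary archive
# written to the current working directory.
old_wd <- setwd(out_dir)
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "INSTALL", "--build",
                    paste0("--library=", shQuote(lib_dir)), shQuote(pkg_dir)))
setwd(old_wd)

binary <- list.files(out_dir, pattern = "^mypkg_0\\.0\\.1")
print(binary)
```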
Installed package
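An installed package is a binary package that has been decompressed into a package library. The reference chapter includes a figure showing some of the ways a package gets installed. In reality, things are considerably more complicated.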
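To see what "decompressed into a package library" looks like, the sketch below locates an installed package and lists its top-level contents. It uses stats only because it ships with every R installation. The exact file list varies by package and platform, so the check is limited to entries every installed package should have, such as DESCRIPTION and the Meta/ directory of parsed metadata that source packages lack.

```r
# Locate the installed copy of a package and list what is inside.
stats_dir <- find.package("stats")
contents  <- list.files(stats_dir)
print(stats_dir)
print(contents)

# Installed packages carry a Meta/ directory that source packages do not have.
has_meta <- dir.exists(file.path(stats_dir, "Meta"))
```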
Package libraries
To list the libraries R is using and the packages installed in each:
# on Windows
.libPaths()
#> [1] "C:/Users/jenny/Documents/R/win-library/3.6"
#> [2] "C:/Program Files/R/R-3.6.0/library"
lapply(.libPaths(), list.dirs, recursive = FALSE, full.names = FALSE)
#> [[1]]
#> [1] "abc" "anytime" "askpass" "assertthat"
#> ...
#> [145] "zeallot"
#>
#> [[2]]
#> [1] "base" "boot" "class" "cluster"
#> [5] "codetools" "compiler" "datasets" "foreign"
#> [9] "graphics" "grDevices" "grid" "KernSmooth"
#> [13] "lattice" "MASS" "Matrix" "methods"
#> [17] "mgcv" "nlme" "nnet" "parallel"
#> [21] "rpart" "spatial" "splines" "stats"
#> [25] "stats4" "survival" "tcltk" "tools"
#> [29] "translations" "utils"
This shows that R keeps two kinds of library:
- A user library
- A system-level or global library
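The first holds packages you add yourself, from CRAN, Bioconductor, and elsewhere. The second holds the core packages that ship with R, such as base. Keeping them separate makes management easier: adding or removing other packages does not interfere with the base packages.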
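The same split can be seen from inside R. .Library holds the path of the system-level library. .libPaths() lists every library R searches, in order, and the system library usually comes last. Packages with priority "base" are the ones that come with R itself. This is a small sketch using only base R.

```r
# System-level library versus all libraries R searches.
system_lib <- normalizePath(.Library)
all_libs   <- normalizePath(.libPaths())
print(all_libs)

# Packages with priority "base" ship with R itself.
base_pkgs <- rownames(installed.packages(lib.loc = .Library, priority = "base"))
print(base_pkgs)
```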
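The paths also reflect R's versioning: the user library above is tied to version 3.6 (win-library/3.6). Upgrading R to a new minor version, e.g. from 3.5 to 3.6, therefore means reinstalling your packages. A patch release, e.g. from 3.6.0 to 3.6.1, does not require it.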
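A sketch for spotting packages that were built under a different minor series than the running R, which are the ones that typically need reinstalling after an upgrade. It relies on the "Built" column that installed.packages() reports. r_series() is a hypothetical helper that returns the "x.y" part of the current R version. For the actual reinstall, update.packages(checkBuilt = TRUE) is the usual tool.

```r
# Hypothetical helper: the running R's "x.y" series, e.g. "4.3" for R 4.3.1.
r_series <- function() {
  minor_major <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
  paste(R.version$major, minor_major, sep = ".")
}

# "Built" holds the R version each installed package was built under.
ip <- installed.packages()
built_series <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", ip[, "Built"])

# Packages built under another x.y series are candidates for reinstalling.
stale <- unique(rownames(ip)[built_series != r_series()])
print(stale)
```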