Notes on Go modules, packages, versioning, and vendoring

A summary of personal experience; not meant as an authoritative reference.

Go Module

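GOPATH, the mechanism Go was originally designed around, was arguably a failure, which is why vendor and modules were later added to patch it. The core problem modules solve is versioning, that is, package management. For the overall design of modules, see Russ Cox's blog series Go & Versioning. Ordinary users can simply use modules without studying the design philosophy behind them.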

A module is a collection of related Go packages that are versioned together as a single unit.

Modules record precise dependency requirements and create reproducible builds.

In real-world work, reproducible builds are the top priority for a package management tool.

go.mod

A module is defined by a tree of Go source files with a go.mod file in the tree’s root directory.

The directory containing go.mod is the root directory of the whole module. To import a package that lives in the module, use the module path as the prefix of the package's full import path.
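As a minimal sketch of that rule: the full import path is the module path from go.mod joined with the package's directory relative to the module root. The module path example.com/hello and the helper importPath are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// importPath joins a module path (the "module" line of go.mod) with a
// package directory given relative to the module root. Import paths
// always use forward slashes, so package path (not filepath) is used.
func importPath(modulePath, relDir string) string {
	return path.Join(modulePath, relDir)
}

func main() {
	// example.com/hello is a hypothetical module path.
	fmt.Println(importPath("example.com/hello", "internal/greet"))
	// A package in the module root has the module path itself as its import path.
	fmt.Println(importPath("example.com/hello", "."))
}
```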

// Once the go.mod file exists, no additional steps are required:
// go commands like 'go build', 'go test', or even 'go list' will automatically
// add new dependencies as needed to satisfy imports.

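Another benefit of go.mod is that it lowers the cost of go get. go get does two things: it downloads packages and it resolves dependencies. Without go.mod, go get has to download the entire packages and then partially compile all the .go files, reading their import statements, in order to resolve dependencies. With go.mod the requirements are recorded in one file, which makes this much cheaper.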

go mod init

go mod init handles compatibility reasonably well: it can automatically convert the configuration files of older package management tools.

var Converters = map[string]func(string, []byte) (*modfile.File, error){
    "GLOCKFILE":          ParseGLOCKFILE,
    "Godeps/Godeps.json": ParseGodepsJSON,
    "Gopkg.lock":         ParseGopkgLock,
    "dependencies.tsv":   ParseDependenciesTSV,
    "glide.lock":         ParseGlideLock,
    "vendor.conf":        ParseVendorConf,
    "vendor.yml":         ParseVendorYML,
    "vendor/manifest":    ParseVendorManifest,
    "vendor/vendor.json": ParseVendorJSON,
}

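Another neat feature is that go mod init tries to read VCS metadata in order to work out a suitable module path automatically.

This path really matters. On a code intelligence project I worked on, Go projects had to be cloned into a local directory outside GOPATH (GOPATH forces source code to live under GOPATH/src), and since Go modules are the future, every project was forcibly converted to a module with go mod init. Because go mod init can construct the correct module path from VCS metadata (correct in most cases), the project could be parsed; only with a correct module path does parsing succeed, and only with the AST produced by the parser does code intelligence become possible.

The code with which go mod init tries to construct the module path is findModulePath() in go/internal/modload/init.go, and the actual process is fairly involved. Roughly, it consists of the following steps: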

  • Recursively walk all files looking for import comments.
  • If that does not settle it, try to obtain the path from Godeps/Godeps.json or vendor/vendor.json.
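The first step above can be sketched as follows. This is not the code from findModulePath(); it is a simplified illustration that recognizes only the line-comment form of an import comment (package name // import "path") with a regular expression. The helper name importComment and the sample path are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// importCommentRE matches a package clause followed by a line-comment
// import comment, for example:
//
//	package yaml // import "example.com/user/yaml"
//
// The block-comment form (/* import "..." */) is deliberately ignored
// in this sketch.
var importCommentRE = regexp.MustCompile(`(?m)^package[ \t]+\w+[ \t]+//[ \t]*import[ \t]+"([^"]+)"`)

// importComment returns the import path declared by an import comment
// in src, if there is one.
func importComment(src string) (string, bool) {
	m := importCommentRE.FindStringSubmatch(src)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	src := "// Package yaml is a sample.\npackage yaml // import \"example.com/user/yaml\"\n"
	if p, ok := importComment(src); ok {
		fmt.Println("import comment:", p)
	}
}
```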

Can I control when go.mod gets updated and when the go tools use the network to satisfy dependencies?

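This is a question from the wiki, which I have pasted here as is. Similar problems come up often in practice: for security reasons a machine may not be allowed to reach the external network, or you may want a project to keep working as far as possible without downloading its dependencies. In module mode, the go command automatically tries to use the network to update or fetch new dependencies.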

Some teams will want to disallow the go tooling from touching the network at certain points, or will want greater control regarding when the go tooling updates go.mod, how dependencies are obtained, and how vendoring is used.

The go tooling provides a fair amount of flexibility to adjust or disable these default behaviors, including via -mod=readonly, -mod=vendor, GOFLAGS, GOPROXY=off, GOPROXY=file:///filesystem/path, go mod vendor, and go mod download.
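A minimal sketch of combining two of these knobs from Go code: the command below is built with GOPROXY=off and GOFLAGS=-mod=readonly in its environment, so it may neither download modules nor rewrite go.mod. The helper name offlineGoCommand is hypothetical; the sketch only constructs the command and leaves running it to the caller.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// offlineGoCommand builds (but does not run) a go command whose
// environment forbids network downloads and go.mod updates.
func offlineGoCommand(args ...string) *exec.Cmd {
	cmd := exec.Command("go", args...)
	// Entries appended last take precedence over inherited duplicates.
	cmd.Env = append(os.Environ(),
		"GOPROXY=off",
		"GOFLAGS=-mod=readonly",
	)
	return cmd
}

func main() {
	cmd := offlineGoCommand("list", "-m", "all")
	fmt.Println(cmd.Args)
	// To actually run it: out, err := cmd.CombinedOutput()
}
```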

go-module-knobs covers these details well.

Multi-Module Repositories

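The code intelligence project I mentioned had to convert every old project into a module, and early in the project this cost me 80% of my effort to cover 10% of the cases. (I was young and naive back then; I should have talked more with my lead, zhao fuyao, about the intended result and the hours it would take, and I am grateful for his patience.) It is hard to predict what shape a repo will have: some directories may use an older vendor layout while others have a go.mod, so you have to work out how to build go.mod correctly. The process is complicated, and I wrote a lot of junk code trying to cover every scenario.

Go modules support nested modules: the go command searches upward through parent directories to find the applicable go.mod.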

package

What interested me most at the time was how Go decides where to download a package from.

import "github.com/kylelemons/go-gypsy/yaml"
             ^         ^          ^     ^
             |         |          |     `-- Package name
             |         |          `-------- Project name
             |         `------------------- Author's handle
             `----------------------------- Hosting site

Note: the example above comes from PackagePublishing.

The import string consists of the following four parts:

  • Hosting site
  • Author’s handle
  • Project name
  • Package name

From this description we can see that a Go package is stored on some hosting site.
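A small sketch of that four-part split, assuming a GitHub-style layout of host/author/project followed by an optional package path. The type importParts and the function splitImportPath are hypothetical names, and other hosting sites do not necessarily follow this layout.

```go
package main

import (
	"fmt"
	"strings"
)

// importParts holds the four parts of a GitHub-style import path.
type importParts struct {
	Host, Author, Project, Package string
}

// splitImportPath splits "host/author/project[/package...]".
// Package is empty when the import path names the project root.
func splitImportPath(p string) (importParts, bool) {
	parts := strings.SplitN(p, "/", 4)
	if len(parts) < 3 {
		return importParts{}, false
	}
	ip := importParts{Host: parts[0], Author: parts[1], Project: parts[2]}
	if len(parts) == 4 {
		ip.Package = parts[3]
	}
	return ip, true
}

func main() {
	ip, ok := splitImportPath("github.com/kylelemons/go-gypsy/yaml")
	fmt.Println(ok, ip.Host, ip.Author, ip.Project, ip.Package)
}
```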

// Certain import paths also
// describe how to obtain the source code for the package using
// a revision control system.
//
// A few common code hosting sites have special syntax:
//
// 	Bitbucket (Git, Mercurial)
//
// 		import "bitbucket.org/user/project"
// 		import "bitbucket.org/user/project/sub/directory"
//
// 	GitHub (Git)
//
// 		import "github.com/user/project"
// 		import "github.com/user/project/sub/directory"
//
// 	Launchpad (Bazaar)
//
// 		import "launchpad.net/project"
// 		import "launchpad.net/project/series"
// 		import "launchpad.net/project/series/sub/directory"
//
// 		import "launchpad.net/~user/project/branch"
// 		import "launchpad.net/~user/project/branch/sub/directory"
//
// 	IBM DevOps Services (Git)
//
// 		import "hub.jazz.net/git/user/project"
// 		import "hub.jazz.net/git/user/project/sub/directory"
//
// For code hosted on other servers, import paths may either be qualified
// with the version control type, or the go tool can dynamically fetch
// the import path over https/http and discover where the code resides
// from a <meta> tag in the HTML.
//
// To declare the code location, an import path of the form
//
// 	repository.vcs/path
//
// specifies the given repository, with or without the .vcs suffix,
// using the named version control system, and then the path inside
// that repository. The supported version control systems are:
//
// 	Bazaar      .bzr
// 	Fossil      .fossil
// 	Git         .git
// 	Mercurial   .hg
// 	Subversion  .svn
//
// For example,
//
// 	import "example.org/user/foo.hg"
//
// denotes the root directory of the Mercurial repository at
// example.org/user/foo or foo.hg, and
//
// 	import "example.org/repo.git/foo/bar"
//
// denotes the foo/bar directory of the Git repository at
// example.org/repo or repo.git.
//
// When a version control system supports multiple protocols,
// each is tried in turn when downloading. For example, a Git
// download tries https://, then git+ssh://.

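Note: the comments above are excerpted from go/src/cmd/go/alldocs.go.

For common hosting sites such as github.com or bitbucket.org, the go command recognizes the version control system automatically. For other servers, a <meta> tag has to be added to the HTML page so that the go command can recognize it.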

// The meta tag has the form:
//
// 	<meta name="go-import" content="import-prefix vcs repo-root">

// ...

// The vcs is one of "bzr", "fossil", "git", "hg", "svn".
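The content attribute of a go-import meta tag carries three space-separated fields: the import prefix, the vcs, and the repository root. A minimal sketch of parsing that value follows; the type goImport, the function parseGoImport, and the sample URL are illustrative assumptions, and fetching and scanning the HTML itself is out of scope here.

```go
package main

import (
	"fmt"
	"strings"
)

// goImport is the parsed content of a go-import meta tag.
type goImport struct {
	Prefix, VCS, RepoRoot string
}

// parseGoImport splits the content attribute into its three fields.
func parseGoImport(content string) (goImport, error) {
	f := strings.Fields(content)
	if len(f) != 3 {
		return goImport{}, fmt.Errorf("go-import content has %d fields, want 3", len(f))
	}
	return goImport{Prefix: f[0], VCS: f[1], RepoRoot: f[2]}, nil
}

func main() {
	// Sample values for illustration only.
	gi, err := parseGoImport("example.org git https://code.example.org/r/p/exproj")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", gi)
}
```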

When a package in module mode is hosted on a code hosting site, things work much as they do in the ordinary mode:

// When using modules, downloaded packages are stored in the module cache.
// (See 'go help module-get' and 'go help goproxy'.)
//
// When using modules, an additional variant of the go-import meta tag is
// recognized and is preferred over those listing version control systems.
// That variant uses "mod" as the vcs in the content value, as in:
//
// 	<meta name="go-import" content="example.org mod https://code.org/moduleproxy">
//
// This tag means to fetch modules with paths beginning with example.org
// from the module proxy available at the URL https://code.org/moduleproxy.
// See 'go help goproxy' for details about the proxy protocol.

build list

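Another important concept I ran into in that earlier work is the build list, which is also one of Go's strengths: in module mode, any go command automatically downloads the dependency versions required by the main module in order to satisfy the build. Code intelligence needs complete type information before it can provide symbol definitions, hover, or jump-to-definition. The build list is therefore effectively the set of dependencies, and it has to be handled carefully.

"Building" the build list also requires collecting operating system and architecture information.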

// The set of modules providing packages to builds is called the "build list".
// The build list initially contains only the main module. Then the go command
// adds to the list the exact module versions required by modules already
// on the list, recursively, until there is nothing left to add to the list.
// If multiple versions of a particular module are added to the list,
// then at the end only the latest version (according to semantic version
// ordering) is kept for use in the build.
//
// The 'go list' command provides information about the main module
// and the build list. For example:
//
// 	go list -m              # print path of main module
// 	go list -m -f={{.Dir}}  # print root directory of main module
// 	go list -m all          # print build list
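The last step described in the quoted text, keeping only the latest version when a module was added several times, can be sketched as below. This is a deliberate simplification: it handles plain vMAJOR.MINOR.PATCH versions only (no pre-release or build metadata) and does not walk the requirement graph. The module paths and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// requirement is one (module path, version) pair added to the list.
type requirement struct {
	Path, Version string
}

// parseVersion parses "vMAJOR.MINOR.PATCH" into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unsupported version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unsupported version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a orders before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// selectVersions keeps the highest version seen for each module path
// and returns the result sorted by path.
func selectVersions(reqs []requirement) ([]requirement, error) {
	best := map[string][3]int{}
	text := map[string]string{}
	for _, r := range reqs {
		v, err := parseVersion(r.Version)
		if err != nil {
			return nil, err
		}
		if cur, ok := best[r.Path]; !ok || less(cur, v) {
			best[r.Path] = v
			text[r.Path] = r.Version
		}
	}
	out := make([]requirement, 0, len(text))
	for p, v := range text {
		out = append(out, requirement{Path: p, Version: v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out, nil
}

func main() {
	list, err := selectVersions([]requirement{
		{"example.com/a", "v1.2.0"},
		{"example.com/b", "v1.0.0"},
		{"example.com/a", "v1.10.0"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range list {
		fmt.Println(r.Path, r.Version)
	}
}
```

Versions are compared numerically per component, so v1.10.0 is later than v1.2.0 even though it sorts earlier as a string.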

GOPROXY

GOPROXY also came out of module mode. It is an environment variable that specifies the URL from which modules are downloaded.

// 	GOPROXY
// 		URL of Go module proxy. See 'go help modules'.

// ...

// Module downloading and verification
//
// The go command can fetch modules from a proxy or connect to source control
// servers directly, according to the setting of the GOPROXY environment
// variable (see 'go help env'). The default setting for GOPROXY is
// "https://proxy.golang.org,direct", which means to try the
// Go module mirror run by Google and fall back to a direct connection
// if the proxy reports that it does not have the module (HTTP error 404 or 410).
// See https://proxy.golang.org/privacy for the service's privacy policy.
// If GOPROXY is set to the string "direct", downloads use a direct connection
// to source control servers. Setting GOPROXY to "off" disallows downloading
// modules from any source. Otherwise, GOPROXY is expected to be a comma-separated
// list of the URLs of module proxies, in which case the go command will fetch
// modules from those proxies. For each request, the go command tries each proxy
// in sequence, only moving to the next if the current proxy returns a 404 or 410
// HTTP response. The string "direct" may appear in the proxy list,
// to cause a direct connection to be attempted at that point in the search.
// Any proxies listed after "direct" are never consulted.
//
// The GOPRIVATE and GONOPROXY environment variables allow bypassing
// the proxy for selected modules. See 'go help module-private' for details.
//
// No matter the source of the modules, the go command checks downloads against
// known checksums, to detect unexpected changes in the content of any specific
// module version from one day to the next. This check first consults the current
// module's go.sum file but falls back to the Go checksum database, controlled by
// the GOSUMDB and GONOSUMDB environment variables. See 'go help module-auth'
// for details.
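A sketch of how the comma-separated GOPROXY value described above turns into a lookup order: entries are tried in sequence, nothing after "direct" is consulted, and the value "off" means no source at all. The function name proxyOrder and the example.com URLs are assumptions; the sketch follows only the rules in the quoted text and does not model the 404/410 fallback itself.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyOrder returns the sources the go command would consult, in
// order, for a given GOPROXY value. It returns nil for "off".
func proxyOrder(goproxy string) []string {
	if strings.TrimSpace(goproxy) == "off" {
		return nil
	}
	var order []string
	for _, p := range strings.Split(goproxy, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		order = append(order, p)
		if p == "direct" {
			// Proxies listed after "direct" are never consulted.
			break
		}
	}
	return order
}

func main() {
	fmt.Println(proxyOrder("https://proxy.golang.org,direct"))
	fmt.Println(proxyOrder("https://a.example.com,direct,https://b.example.com"))
	fmt.Println(len(proxyOrder("off")))
}
```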

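GOPROXY defaults to https://proxy.golang.org,direct, which of course cannot be reached from mainland China. Setting GOPROXY=off forbids the go command from using the network to download modules. When packages are downloaded through GOPROXY, they are checked against go.sum for safety. Security and stability matter a great deal to developers, and the doubts about modules come from the same place; recall the left-pad incident in the Node.js ecosystem.

For a more detailed look at the Go Module Proxy, I strongly recommend Katie Hockman's talk GopherCon 2019: Katie Hockman - Go Module Proxy: Life of a Query and the Sourcegraph blog post GopherCon 2019 - Go Module Proxy: Life of a query.

The Go Module Proxy has the following three basic properties: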

  • Reproducible builds. Modules
  • Persistent dependencies. Mirror
  • Trustworthy fetches. Checksum Database


Note: the original figure comes from GopherCon 2019: Katie Hockman - Go Module Proxy: Life of a Query.

go list

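go list is the key command for resolving a Go project's dependencies. In tools, the function that wraps the call to go list for callers is golistDriver(). Unlike other commands, go list prints the build list information according to the options it is given.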

// The 'go list' command provides information about the main module
// and the build list.

References:
Modules
go/src/cmd/go/alldocs.go

To learn

Merkle Tree: both blockchains and the go.sum database use this technique
HTTPS
