Velodrome
velodrome.k8s.io/
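A dashboard, monitoring and metrics for Kubernetes Developer Productivity. It is a set of components for monitoring developer productivity. It should be fairly generic and could be adapted to other repos with small changes.
The code that talks to GitHub would also be fairly easy to reuse in a GitHub robot.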
Architecture
- Grafana stack: the frontend (built entirely from open-source components; the repo only stores their configuration)
  - InfluxDB: save precalculated metrics
  - Prometheus: save poll-based metrics
  - Grafana: display graphs based on these metrics
  - nginx: proxy all of these services behind a single URL
- SQL base: contains a copy of the issues, events, and PRs in GitHub repositories. It is used for calculating statistics about developer productivity.
  - Fetcher: fetches GitHub data and stores it in a SQL database (the main Go code lives here; it calls the GitHub SDK to pull the data)
  - SQL Proxy: SQL Proxy deployment to Cloud SQL (configuration only)
  - Transform: transforms the SQL (GitHub) database into valuable metrics
- Other monitoring tools
  - token-counter: monitors the RateLimit usage of your GitHub tokens
Fetcher
Resources used:
- Issues (including pull-requests)
- Events (associated to issues)
- Comments (regular comments and review comments)
These resources are used to:
- Compute average time-to-resolution for an issue/pull-request
- Compute time between label creation/removal: lgtm'd, merged
- Break down the results by specific flags (size, priority, ...)
// ClientInterface describes what a client should be able to do
type ClientInterface interface {
	RepositoryName() string
	FetchIssues(last time.Time, c chan *github.Issue)
	FetchIssueEvents(issueID int, last *int, c chan *github.IssueEvent)
	FetchIssueComments(issueID int, last time.Time, c chan *github.IssueComment)
	FetchPullComments(issueID int, last time.Time, c chan *github.PullRequestComment)
}
Flow
// 1. Entry point
// main -> cobra.Command(root) -> runProgram -> UpdateIssues

// 2. UpdateIssues: test-infra/velodrome/fetcher/issues.go
// Call client.FetchIssues; the issues come back through a channel
go client.FetchIssues(latest, c)
for issue := range c {
	// 2.1 Build the ORM model for the issue (issueOrm)
	NewIssue(..)
	UpdateComments(*issue.Number, issueOrm.IsPR, db, client)
	// and find if we have new events
	UpdateIssueEvents(*issue.Number, db, client)
}

// 2.2 UpdateComments: test-infra/velodrome/fetcher/comments.go
func UpdateComments(issueID int, pullRequest bool, db *gorm.DB, client ClientInterface) {
	latest := findLatestCommentUpdate(issueID, db, client.RepositoryName())
	updateIssueComments(issueID, latest, db, client)
	if pullRequest {
		updatePullComments(issueID, latest, db, client)
	}
}

func updateIssueComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
	c := make(chan *github.IssueComment, 200)
	go client.FetchIssueComments(issueID, latest, c)
	for comment := range c {
		commentOrm, err := NewIssueComment(issueID, comment, client.RepositoryName())
		...
	}
}

func updatePullComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
	c := make(chan *github.PullRequestComment, 200)
	go client.FetchPullComments(issueID, latest, c)
	for comment := range c {
		commentOrm, err := NewPullComment(issueID, comment, client.RepositoryName())
		...
	}
}

// 2.3 UpdateIssueEvents: test-infra/velodrome/fetcher/issue-events.go
func UpdateIssueEvents(issueID int, db *gorm.DB, client ClientInterface) {
	...
	c := make(chan *github.IssueEvent, 500)
	go client.FetchIssueEvents(issueID, latest, c)
	for event := range c {
		eventOrm, err := NewIssueEvent(event, issueID, client.RepositoryName())
		...
	}
}
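The `for ... range c` loops above only terminate if the Fetch* goroutine closes its channel once it has sent everything. A minimal sketch of that producer/consumer contract, assuming a hypothetical `Comment` type and pre-loaded "pages" in place of real GitHub API calls:

```go
package main

import "fmt"

// Comment is a hypothetical, simplified stand-in for *github.IssueComment.
type Comment struct {
	ID int
}

// fetchComments plays the role of a ClientInterface.Fetch* method: it sends every
// result (here taken from pre-loaded pages) into c and closes c when done.
// Without the close, the consumer's range loop would block forever.
func fetchComments(pages [][]Comment, c chan<- *Comment) {
	defer close(c)
	for _, page := range pages {
		for i := range page {
			c <- &page[i]
		}
	}
}

// updateComments has the same shape as updateIssueComments: start the producer
// in a goroutine, then drain the buffered channel and store each item.
func updateComments(pages [][]Comment) []int {
	c := make(chan *Comment, 200)
	go fetchComments(pages, c)
	var stored []int
	for comment := range c {
		stored = append(stored, comment.ID) // the real fetcher writes an ORM row here instead
	}
	return stored
}

func main() {
	pages := [][]Comment{{{ID: 1}, {ID: 2}}, {{ID: 3}}}
	fmt.Println(updateComments(pages)) // [1 2 3]
}
```

The buffer (200 or 500 in the fetcher) only lets the producer run ahead of the database writes; correctness comes from the `close`.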
token-counter
transform
GitHub data in SQL --> transform --> metrics
func (config *transformConfig) run(plugin plugins.Plugin) error {
	...
	// Turn issue/comment data into points destined for InfluxDB
	go Dispatch(plugin, influxdb, fetcher.IssuesChannel,
		fetcher.EventsCommentsChannel)

	ticker := time.Tick(time.Hour / time.Duration(config.frequency))
	for {
		// Fetch new events from MySQL, push them to plugins
		if err := fetcher.Fetch(mysqldb); err != nil {
			return err
		}
		// Push the accumulated batch of points to InfluxDB in one go
		if err := influxdb.PushBatchPoints(); err != nil {
			return err
		}
		if config.once {
			break
		}
		// Wait for the next tick, so the loop runs at most config.frequency times per hour
		<-ticker
	}
	return nil
}
// Dispatch receives channels to each type of events, and dispatch them to each plugins.
func Dispatch(plugin plugins.Plugin, DB *InfluxDB, issues chan sql.Issue, eventsCommentsChannel chan interface{}) {
	for {
		var points []plugins.Point
		select {
		case issue, ok := <-issues:
			if !ok {
				return
			}
			points = plugin.ReceiveIssue(issue)
		case event, ok := <-eventsCommentsChannel:
			if !ok {
				return
			}
			switch event := event.(type) {
			case sql.IssueEvent:
				points = plugin.ReceiveIssueEvent(event)
			case sql.Comment:
				points = plugin.ReceiveComment(event)
			default:
				glog.Fatal("Received invalid object: ", event)
			}
		}
		for _, point := range points {
			if err := DB.Push(point.Tags, point.Values, point.Date); err != nil {
				glog.Fatal("Failed to push point: ", err)
			}
		}
	}
}
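Dispatch returns as soon as either input channel is closed, so anything still queued on the other channel is not processed. A minimal sketch of a variant that drains both, using the standard trick that a nil channel is never selected. The types are hypothetical simplifications of `sql.Issue`, `sql.IssueEvent` and `sql.Comment`, and counting replaces the InfluxDB push:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for sql.Issue, sql.IssueEvent and sql.Comment.
type Issue struct{ ID string }
type IssueEvent struct{ Event string }
type Comment struct{ Body string }

// dispatch routes items by type, like Dispatch above, but sets a closed channel
// to nil and returns only when both inputs are exhausted.
func dispatch(issues <-chan Issue, eventsComments <-chan interface{}) map[string]int {
	counts := map[string]int{}
	for issues != nil || eventsComments != nil {
		select {
		case _, ok := <-issues:
			if !ok {
				issues = nil // stop selecting on this channel
				continue
			}
			counts["issue"]++
		case item, ok := <-eventsComments:
			if !ok {
				eventsComments = nil
				continue
			}
			switch item.(type) {
			case IssueEvent:
				counts["event"]++
			case Comment:
				counts["comment"]++
			default:
				counts["invalid"]++ // the original calls glog.Fatal here
			}
		}
	}
	return counts
}

func main() {
	issues := make(chan Issue, 1)
	events := make(chan interface{}, 3)
	issues <- Issue{ID: "1"}
	close(issues)
	events <- IssueEvent{Event: "labeled"}
	events <- Comment{Body: "/lgtm"}
	events <- IssueEvent{Event: "closed"}
	close(events)
	fmt.Println(dispatch(issues, events)) // map[comment:1 event:2 issue:1]
}
```

Whether the early return matters in Velodrome depends on how the fetcher closes its channels, which the excerpt does not show.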
plugin
A plugin must implement the Plugin interface:
type Plugin interface {
	ReceiveIssue(sql.Issue) []Point
	ReceiveComment(sql.Comment) []Point
	ReceiveIssueEvent(sql.IssueEvent) []Point
}
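To show what implementing the interface involves, here is a minimal sketch of a hypothetical plugin that emits one point per comment carrying the running comment count for that issue. The input types and `Point` fields are simplified assumptions modeled on the `DB.Push(point.Tags, point.Values, point.Date)` call above, not Velodrome's actual definitions.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, simplified versions of sql.Issue, sql.Comment, sql.IssueEvent and plugins.Point.
type Issue struct{ ID string }
type IssueEvent struct{ IssueID string }
type Comment struct {
	IssueID   string
	CreatedAt time.Time
}
type Point struct {
	Tags   map[string]string
	Values map[string]interface{}
	Date   time.Time
}

// Plugin has the same three methods as the interface above.
type Plugin interface {
	ReceiveIssue(Issue) []Point
	ReceiveComment(Comment) []Point
	ReceiveIssueEvent(IssueEvent) []Point
}

// commentCounter emits, for every comment, one point holding the running
// comment count of that issue. It ignores issues and events.
type commentCounter struct {
	perIssue map[string]int
}

func (c *commentCounter) ReceiveIssue(Issue) []Point           { return nil }
func (c *commentCounter) ReceiveIssueEvent(IssueEvent) []Point { return nil }

func (c *commentCounter) ReceiveComment(comment Comment) []Point {
	if c.perIssue == nil {
		c.perIssue = map[string]int{}
	}
	c.perIssue[comment.IssueID]++
	return []Point{{
		Tags:   map[string]string{"issue": comment.IssueID},
		Values: map[string]interface{}{"comments": c.perIssue[comment.IssueID]},
		Date:   comment.CreatedAt,
	}}
}

// Compile-time check that commentCounter satisfies Plugin.
var _ Plugin = (*commentCounter)(nil)

func main() {
	var p Plugin = &commentCounter{}
	now := time.Now()
	p.ReceiveComment(Comment{IssueID: "42", CreatedAt: now})
	points := p.ReceiveComment(Comment{IssueID: "42", CreatedAt: now})
	fmt.Println(points[0].Values["comments"]) // 2
}
```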
Entry point: root.AddCommand(plugins.NewCountPlugin(config.run))
The resulting plugin is authorFilter, the outermost wrapper (test-infra/velodrome/transform/plugins/count.go):
// test-infra/velodrome/transform/plugins/count.go
// Several plugins are wrapped into a single one
func NewCountPlugin(runner func(Plugin) error) *cobra.Command {
	stateCounter := &StatePlugin{}
	eventCounter := &EventCounterPlugin{}
	commentsAsEvents := NewFakeCommentPluginWrapper(eventCounter)
	commentCounter := &CommentCounterPlugin{}
	authorLoggable := NewMultiplexerPluginWrapper(
		commentsAsEvents,
		commentCounter,
	)
	authorLogged := NewAuthorLoggerPluginWrapper(authorLoggable)
	fullMultiplex := NewMultiplexerPluginWrapper(authorLogged, stateCounter)
	fakeOpen := NewFakeOpenPluginWrapper(fullMultiplex)
	typeFilter := NewTypeFilterWrapperPlugin(fakeOpen)
	authorFilter := NewAuthorFilterPluginWrapper(typeFilter)
	...
}
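The wrappers work because each one is itself a Plugin, so they nest freely. A minimal sketch of the multiplexer idea, assuming it forwards every input to each wrapped plugin and concatenates their points; `multiplexer`, `newMultiplexer` and the toy `tagger` plugin are hypothetical names, not the code behind NewMultiplexerPluginWrapper.

```go
package main

import "fmt"

// Hypothetical, minimal input and output types.
type Issue struct{ ID string }
type Comment struct{ IssueID string }
type IssueEvent struct{ IssueID string }
type Point struct{ Tags map[string]string }

type Plugin interface {
	ReceiveIssue(Issue) []Point
	ReceiveComment(Comment) []Point
	ReceiveIssueEvent(IssueEvent) []Point
}

// multiplexer forwards every input to each wrapped plugin and concatenates the
// points they return. Because it is itself a Plugin, it can be wrapped again.
type multiplexer struct{ plugins []Plugin }

func newMultiplexer(plugins ...Plugin) Plugin { return &multiplexer{plugins: plugins} }

func (m *multiplexer) ReceiveIssue(i Issue) []Point {
	var points []Point
	for _, p := range m.plugins {
		points = append(points, p.ReceiveIssue(i)...)
	}
	return points
}

func (m *multiplexer) ReceiveComment(c Comment) []Point {
	var points []Point
	for _, p := range m.plugins {
		points = append(points, p.ReceiveComment(c)...)
	}
	return points
}

func (m *multiplexer) ReceiveIssueEvent(e IssueEvent) []Point {
	var points []Point
	for _, p := range m.plugins {
		points = append(points, p.ReceiveIssueEvent(e)...)
	}
	return points
}

// tagger is a toy plugin that emits one point per input, tagged with its own name.
type tagger string

func (t tagger) point() []Point { return []Point{{Tags: map[string]string{"plugin": string(t)}}} }

func (t tagger) ReceiveIssue(Issue) []Point           { return t.point() }
func (t tagger) ReceiveComment(Comment) []Point       { return t.point() }
func (t tagger) ReceiveIssueEvent(IssueEvent) []Point { return t.point() }

func main() {
	// One multiplexer inside another, loosely like fullMultiplex in count.go.
	inner := newMultiplexer(tagger("comments"), tagger("counter"))
	outer := newMultiplexer(inner, tagger("state"))
	for _, p := range outer.ReceiveComment(Comment{IssueID: "1"}) {
		fmt.Println(p.Tags["plugin"]) // comments, counter, state (one per line)
	}
}
```

Filters such as typeFilter or authorFilter can follow the same pattern: wrap one Plugin and decide whether to forward each input.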
triage
A metrics page whose code is very simple but whose UI is rich. I haven't yet figured out exactly what it is for.
It is the "Kubernetes Aggregated Failures" page.
testgrid
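The frontend and backend of testgrid.k8s.io. It shows statistics for Jenkins test runs as a grid, which is very intuitive.
The frontend is configurable; config.yaml lists all the tests.
For example, 1.6-1.7-kubectl-skew is one dashboard. It has several tabs, each of which is a test group, such as gce-1.6-1-7-cvm.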
The testgrid site is accessible at testgrid.k8s.io. The site is configured by config.yaml.
Updates to the config are automatically tested and pushed to production.
Testgrid is composed of:
- A list of test groups that contain results for a job over time.
- A list of dashboards that are composed of tabs that display a test group
- A list of dashboard groups of related dashboards.
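The three-level structure above can be pictured as plain data types. This is a minimal sketch with hypothetical Go types whose field names are illustrative only (they are not the real config.yaml schema), plus one example of a consistency check that automated config testing could plausibly run.

```go
package main

import "fmt"

// Hypothetical types for the three testgrid concepts above.
type TestGroup struct {
	Name string // results of one job over time, e.g. "gce-1.6-1-7-cvm"
}

type DashboardTab struct {
	Name          string
	TestGroupName string // the test group this tab displays
}

type Dashboard struct {
	Name string // e.g. "1.6-1.7-kubectl-skew"
	Tabs []DashboardTab
}

type DashboardGroup struct {
	Name           string
	DashboardNames []string
}

type Config struct {
	TestGroups      []TestGroup
	Dashboards      []Dashboard
	DashboardGroups []DashboardGroup
}

// danglingTabs reports "dashboard/tab" for every tab whose test group is missing.
func danglingTabs(cfg Config) []string {
	known := map[string]bool{}
	for _, tg := range cfg.TestGroups {
		known[tg.Name] = true
	}
	var problems []string
	for _, d := range cfg.Dashboards {
		for _, tab := range d.Tabs {
			if !known[tab.TestGroupName] {
				problems = append(problems, d.Name+"/"+tab.Name)
			}
		}
	}
	return problems
}

func main() {
	cfg := Config{
		TestGroups: []TestGroup{{Name: "gce-1.6-1-7-cvm"}},
		Dashboards: []Dashboard{{
			Name: "1.6-1.7-kubectl-skew",
			Tabs: []DashboardTab{
				{Name: "gce-1.6-1-7-cvm", TestGroupName: "gce-1.6-1-7-cvm"},
				{Name: "typo", TestGroupName: "gce-1.6-1-7-cmv"},
			},
		}},
		DashboardGroups: []DashboardGroup{{Name: "skew", DashboardNames: []string{"1.6-1.7-kubectl-skew"}}},
	}
	fmt.Println(danglingTabs(cfg)) // [1.6-1.7-kubectl-skew/typo]
}
```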
scenarios
Test scripts, written in Python, that invoke k8s.io/kubernetes/test/....go
Test jobs are composed of two things:
1) A scenario to test
2) Configuration options for the scenario.
Three example scenarios are:
- Unit tests
- Node e2e tests
- e2e tests
Example configurations are:
- Parallel tests on gce
- Build all platforms
The assumption is that each scenario will be called a variety of times with
different configuration options. For example at the time of this writing there
are over 300 e2e jobs, each run with a slightly different set of options.
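The "scenario plus configuration" split is what lets a few scenarios turn into hundreds of jobs. A minimal sketch of that expansion as a simple cross product, assuming every configuration applies to every scenario (in practice only some pairs make sense). The scenario names and option strings are hypothetical placeholders, not real job definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is one (scenario, configuration) pairing.
type Job struct {
	Name     string
	Scenario string
	Options  []string
}

// expand builds one job per scenario/configuration pair. Configuration names
// are sorted so the output order is deterministic.
func expand(scenarios []string, configs map[string][]string) []Job {
	names := make([]string, 0, len(configs))
	for name := range configs {
		names = append(names, name)
	}
	sort.Strings(names)
	var jobs []Job
	for _, s := range scenarios {
		for _, c := range names {
			jobs = append(jobs, Job{Name: s + "-" + c, Scenario: s, Options: configs[c]})
		}
	}
	return jobs
}

func main() {
	jobs := expand(
		[]string{"e2e", "node-e2e"},
		map[string][]string{
			"gce-parallel": {"PROVIDER=gce", "PARALLEL=true"},
			"build-all":    {"PLATFORMS=all"},
		},
	)
	for _, j := range jobs {
		fmt.Println(j.Name)
	}
	// e2e-build-all
	// e2e-gce-parallel
	// node-e2e-build-all
	// node-e2e-gce-parallel
}
```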
robots
issue-creator
A source is where issues originate. For example, FlakyJobReporter files issues for flaky Jenkins jobs.
queue health
This app monitors the submit queue and produces the chart at submit-queue.k8s.io/#/e2e.
In other words, it generates the charts for submit-queue.k8s.io, for example https://storage.go…
It does this with two components:
- a poller, which polls the current state of the queue and appends it to a historical log.
- a grapher, which gets the historical log and renders it into charts.
prow
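The most interesting app in this repo. It can serve as a robot that handles GitHub commands, and its plugin design is worth studying.
Its components include: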
- cmd/hook is the most important piece. It is a stateless server that listens for GitHub webhooks and dispatches them to the appropriate handlers.
- cmd/plank is the controller that manages jobs running in k8s pods.
- cmd/jenkins-operator is the controller that manages jobs running in Jenkins.
- cmd/sinker cleans up old jobs and pods.
- cmd/splice regularly schedules batch jobs.
- cmd/deck presents a nice view of recent jobs.
- cmd/phony sends fake webhooks.
- cmd/tot vends incrementing build numbers.
- cmd/horologium starts periodic jobs when necessary.
- cmd/mkpj creates ProwJobs.
config
deck
The frontend and backend of prow.k8s.io/, showing rece… prow jobs (third party resource).
hook
The core: it listens for GitHub webhooks and dispatches them, mostly to plugins.
k8s Bot Commands
k8s-ci-robot and k8s-merge-robot understand several commands. They should all be uttered on their own line, and they are case-sensitive.
Command | Implemented By | Who can run it | Description |
---|---|---|---|
/approve | mungegithub approvers | owners | approve all the files for which you are an approver |
/approve no-issue | mungegithub approvers | owners | approve when a PR doesn't have an associated issue |
/approve cancel | mungegithub approvers | owners | removes your approval on this pull-request |
/area [label1 label2 ...] | prow label | anyone | adds an area/<> label(s) if it exists |
/remove-area [label1 label2 ...] | prow label | anyone | removes an area/<> label(s) if it exists |
/assign [@userA @userB @etc] | prow assign | anyone | Assigns specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
/unassign [@userA @userB @etc] | prow assign | anyone | Unassigns specified people (or yourself if no one is specified). Target must already be assigned. |
/cc [@userA @userB @etc] | prow assign | anyone | Request review from specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
/uncc [@userA @userB @etc] | prow assign | anyone | Dismiss review request for specified people (or yourself if no one is specified). Target must already have had a review requested. |
/close | prow close | authors and assignees | closes the issue/PR |
/reopen | prow reopen | authors and assignees | reopens a closed issue/PR |
/hold | prow hold | anyone | adds the do-not-merge/hold label |
/hold cancel | prow hold | anyone | removes the do-not-merge/hold label |
/joke | prow yuks | anyone | tells a bad joke, sometimes |
/kind [label1 label2 ...] | prow label | anyone | adds a kind/<> label(s) if it exists |
/remove-kind [label1 label2 ...] | prow label | anyone | removes a kind/<> label(s) if it exists |
/lgtm | prow lgtm | assignees | adds the lgtm label |
/lgtm cancel | prow lgtm | authors and assignees | removes the lgtm label |
/ok-to-test | prow trigger | kubernetes org members | allows the PR author to /test all |
/test all, /test | prow trigger | anyone on trusted PRs | runs tests defined in config.yaml |
/retest | prow trigger | anyone on trusted PRs | reruns failed tests |
/priority [label1 label2 ...] | prow label | anyone | adds a priority/<> label(s) if it exists |
/remove-priority [label1 label2 ...] | prow label | anyone | removes a priority/<> label(s) if it exists |
/sig [label1 label2 ...] | prow label | anyone | adds a sig/<> label(s) if it exists |
@kubernetes/sig- | prow label | kubernetes org members | adds the corresponding sig label |
/remove-sig [label1 label2 ...] | prow label | anyone | removes a sig/<> label(s) if it exists |
/release-note | prow releasenote | authors and kubernetes org members | adds the release-note label |
/release-note-action-required | prow releasenote | authors and kubernetes org members | adds the release-note-action-required label |
/release-note-none | prow releasenote | authors and kubernetes org members | adds the release-note-none label |
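"On their own line" and "case-sensitive" translate directly into regular-expression flags. A minimal sketch for /hold and /hold cancel, assuming a hypothetical pattern `holdRe` (not prow's actual regex): `(?m)` makes `^` and `$` match at each line boundary, and leaving out `(?i)` keeps the match case-sensitive.

```go
package main

import (
	"fmt"
	"regexp"
)

// holdRe is a hypothetical pattern: the command must occupy a whole line,
// and "/HOLD" does not match because there is no (?i) flag.
var holdRe = regexp.MustCompile(`(?m)^/hold( +cancel)?[ \t]*$`)

// holdAction returns "add", "remove", or "" when the comment holds no command.
// If a comment contains several, the last one wins (an assumption of this sketch).
func holdAction(body string) string {
	matches := holdRe.FindAllStringSubmatch(body, -1)
	if len(matches) == 0 {
		return ""
	}
	if matches[len(matches)-1][1] != "" {
		return "remove"
	}
	return "add"
}

func main() {
	for _, body := range []string{"/hold", "LGTM\n/hold cancel\n", "/HOLD", "please /hold"} {
		fmt.Printf("%q -> %q\n", body, holdAction(body))
	}
}
```

Using `[ \t]` rather than `\s` after the command keeps a match from spilling onto the next line.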
// An example of handing an event to plugins: k8s.io/test-infra/prow/hook/events.go
func (s *Server) handleGenericComment(ce *github.GenericCommentEvent, log *logrus.Entry) {
	// Every handler enabled for this repo runs in its own goroutine
	for p, h := range s.Plugins.GenericCommentHandlers(ce.Repo.Owner.Login, ce.Repo.Name) {
		go func(p string, h plugins.GenericCommentHandler) {
			pc := s.Plugins.PluginClient
			pc.Logger = log.WithField("plugin", p)
			pc.Config = s.ConfigAgent.Config()
			pc.PluginConfig = s.Plugins.Config()
			if err := h(pc, *ce); err != nil {
				pc.Logger.WithError(err).Error("Error handling GenericCommentEvent.")
			}
		}(p, h)
	}
}
// The enabled plugins: k8s.io/test-infra/prow/hook/plugins.go
// Each blank import runs that plugin package's init(), which registers its handlers
import (
	_ "k8s.io/test-infra/prow/plugins/assign"
	_ "k8s.io/test-infra/prow/plugins/cla"
	_ "k8s.io/test-infra/prow/plugins/close"
	_ "k8s.io/test-infra/prow/plugins/golint"
	_ "k8s.io/test-infra/prow/plugins/heart"
	_ "k8s.io/test-infra/prow/plugins/hold"
	_ "k8s.io/test-infra/prow/plugins/label"
	_ "k8s.io/test-infra/prow/plugins/lgtm"
	_ "k8s.io/test-infra/prow/plugins/releasenote"
	_ "k8s.io/test-infra/prow/plugins/reopen"
	_ "k8s.io/test-infra/prow/plugins/shrug"
	_ "k8s.io/test-infra/prow/plugins/size"
	_ "k8s.io/test-infra/prow/plugins/slackevents"
	_ "k8s.io/test-infra/prow/plugins/trigger"
	_ "k8s.io/test-infra/prow/plugins/updateconfig"
	_ "k8s.io/test-infra/prow/plugins/wip"
	_ "k8s.io/test-infra/prow/plugins/yuks"
)
plugin-lgtm
// Plugin types: the plugins package keeps one handler registry per event type,
// and a plugin basically implements one or more of these handlers
var (
	genericCommentHandlers     = map[string]GenericCommentHandler{}
	issueHandlers              = map[string]IssueHandler{}
	issueCommentHandlers       = map[string]IssueCommentHandler{}
	pullRequestHandlers        = map[string]PullRequestHandler{}
	pushEventHandlers          = map[string]PushEventHandler{}
	reviewEventHandlers        = map[string]ReviewEventHandler{}
	reviewCommentEventHandlers = map[string]ReviewCommentEventHandler{}
	statusEventHandlers        = map[string]StatusEventHandler{}
)

// k8s.io/test-infra/prow/plugins/lgtm/lgtm.go
// lgtm, for example, mainly checks whether a "/lgtm (cancel)" command is valid,
// assigns the issue, adds or removes the lgtm label, and posts a comment if something is wrong
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterReviewEventHandler(pluginName, handleReview)
	plugins.RegisterReviewCommentEventHandler(pluginName, handleReviewComment)
}
plugin-assign
// Handles /(un)assign xx and /(un)cc, and assigns issues to xxx
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterIssueHandler(pluginName, handleIssue)
	plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
}
plugin-golint
// Checks out the code with git and runs golint on the changed files
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIC)
}
plugin-heart
I haven't figured out what this one does.
plugin-hold
Adds or removes the do-not-merge/hold label.
plugin-label
The most heavily used plugin. It implements many of the commands, such as area, sig, kind... It matches them with this regex:
(?m)^/(area|priority|kind|sig)\s*(.*)$
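A minimal sketch of applying that regex: each matching line yields a prefix (group 1) and a space-separated list of names (group 2), which combine into labels such as `kind/bug`. `requestedLabels` is a hypothetical helper, and the plugin's check that the label actually exists in the repo is left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelRe is the pattern quoted above.
var labelRe = regexp.MustCompile(`(?m)^/(area|priority|kind|sig)\s*(.*)$`)

// requestedLabels turns every matching command line into "prefix/name" labels,
// e.g. "/kind bug" -> "kind/bug".
func requestedLabels(body string) []string {
	var labels []string
	for _, m := range labelRe.FindAllStringSubmatch(body, -1) {
		for _, name := range strings.Fields(m[2]) {
			labels = append(labels, m[1]+"/"+name)
		}
	}
	return labels
}

func main() {
	fmt.Println(requestedLabels("/kind bug\n/sig node apps\nthanks!")) // [kind/bug sig/node sig/apps]
}
```

Commands like /remove-kind do not match this pattern, since the character after the slash must begin one of the four prefixes.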
plugin-trigger
The commands this plugin handles are important and somewhat special: they trigger ProwJobs.
// Handles /retest and /ok-to-test
func init() {
	plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
	plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
	plugins.RegisterPushEventHandler(pluginName, handlePush)
}

// Handling of PR events: k8s.io/test-infra/prow/plugins/trigger/pr.go
func handlePR(c client, trustedOrg string, pr github.PullRequestEvent) error {
	author := pr.PullRequest.User.Login
	switch pr.Action {
	case github.PullRequestActionOpened:
		// if the author is an org member -> buildAll
		// otherwise -> welcome message
	case github.PullRequestActionReopened, github.PullRequestActionSynchronize:
		// if trusted -> buildAll
	case github.PullRequestActionLabeled:
		// When a PR is LGTMd, if it is untrusted then build it once.
	}
	return nil
}

// buildAll -> CreateProwJob
// Handling of a /retest comment:
// collect the failed contexts from the PR's statuses -> the matching presubmits -> ProwJobs
// See the GitHub status API: https://developer.github.com/v3/repos/statuses/
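A minimal sketch of the /retest selection step: keep the presubmits whose status context is currently `failure` or `error` (two of GitHub's four status states, alongside `pending` and `success`). The `Status` and `Presubmit` types are hypothetical reductions, and the sketch assumes the input holds only the latest status per context, as GitHub's combined-status endpoint returns.

```go
package main

import "fmt"

// Status is a hypothetical subset of a GitHub commit status.
type Status struct {
	Context string
	State   string // "error", "failure", "pending" or "success"
}

// Presubmit is a hypothetical, reduced presubmit job definition.
type Presubmit struct {
	Name    string
	Context string // the status context the job reports to
}

// jobsToRetest returns the presubmits whose context is currently failure or error.
func jobsToRetest(statuses []Status, presubmits []Presubmit) []string {
	failed := map[string]bool{}
	for _, s := range statuses {
		if s.State == "failure" || s.State == "error" {
			failed[s.Context] = true
		}
	}
	var names []string
	for _, p := range presubmits {
		if failed[p.Context] {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	statuses := []Status{
		{"unit", "success"}, {"e2e", "failure"}, {"verify", "error"}, {"bazel", "pending"},
	}
	presubmits := []Presubmit{
		{"pull-unit", "unit"}, {"pull-e2e", "e2e"}, {"pull-verify", "verify"}, {"pull-bazel", "bazel"},
	}
	fmt.Println(jobsToRetest(statuses, presubmits)) // [pull-e2e pull-verify]
}
```

Each selected presubmit would then become a ProwJob, the same end point as buildAll.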
horologium
starts periodic jobs when necessary.
jenkins-operator
The controller that manages jobs running in Jenkins. It essentially syncs the state of each Jenkins job into the status of the corresponding ProwJob.
mkpj
A command for creating ProwJobs by hand.
plank
The controller that manages jobs running in k8s pods. It syncs each pod's state into the status of the corresponding ProwJob.
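The core of "sync pod state into ProwJob status" is a mapping from pod phase to job state, plus writing back only when something changed. A minimal sketch: the pod phases are the real Kubernetes phase names, but the `ProwJobState` strings and the mapping itself are hypothetical illustrations, not plank's actual logic.

```go
package main

import "fmt"

// PodPhase uses the Kubernetes pod phase names.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

// ProwJobState values here are illustrative strings.
type ProwJobState string

const (
	PendingState ProwJobState = "pending"
	SuccessState ProwJobState = "success"
	FailureState ProwJobState = "failure"
	ErrorState   ProwJobState = "error"
)

// stateForPhase maps a pod phase to the state recorded on the ProwJob.
func stateForPhase(phase PodPhase) ProwJobState {
	switch phase {
	case PodPending, PodRunning:
		return PendingState
	case PodSucceeded:
		return SuccessState
	case PodFailed:
		return FailureState
	default: // Unknown or anything unexpected
		return ErrorState
	}
}

// syncState returns the new job state and whether it differs from the current
// one, so a controller would only write back to the API server when needed.
func syncState(current ProwJobState, phase PodPhase) (ProwJobState, bool) {
	next := stateForPhase(phase)
	return next, next != current
}

func main() {
	state, changed := syncState(PendingState, PodSucceeded)
	fmt.Println(state, changed) // success true
}
```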
sinker
cleans up old jobs and pods.
splice
regularly schedules batch jobs.
tide
todo
tot
todo
Planter
Planter is a container + wrapper script for your bazel builds.
It will run a docker container as the current user that can run bazel builds in your $PWD.
mungegithub
A deprecated system that predates prow.
metrics
todo
maintenance/migratestatus
Status Context Migrator
The migratestatus tool is a maintenance utility used to safely switch a repo from one status context to another.
For example, if there is a context named "CI tests" that needs to be moved to "CI tests v2", this tool can be used to copy every "CI tests" status into a "CI tests v2" status context and then mark every "CI tests" context as passing and retired. This ensures that no PRs are ever passing when they shouldn't be and doesn't block PRs that should be passing. The copy and retire phases can be run separately or together at once in move mode.
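A minimal sketch of move mode for a single commit's statuses, keyed by context: copy the old context's state to the new one, then retire the old context as passing. The `Status` type, the retirement description text, and leaving an already-present new context untouched are all assumptions of this sketch, not the tool's actual behavior.

```go
package main

import "fmt"

// Status is a hypothetical, reduced commit status.
type Status struct {
	State       string // "pending", "success", "failure" or "error"
	Description string
}

// move copies statuses[from] to statuses[to] and then marks "from" as passing.
func move(statuses map[string]Status, from, to string) {
	old, ok := statuses[from]
	if !ok {
		return
	}
	// Copy phase: a PR failing "from" now also fails "to", so nothing passes that shouldn't.
	if _, exists := statuses[to]; !exists {
		statuses[to] = Status{State: old.State, Description: old.Description}
	}
	// Retire phase: "from" is marked success so it no longer blocks PRs.
	statuses[from] = Status{State: "success", Description: "Context retired; status moved to " + to + "."}
}

func main() {
	statuses := map[string]Status{"CI tests": {State: "failure", Description: "2 tests failed"}}
	move(statuses, "CI tests", "CI tests v2")
	fmt.Println(statuses["CI tests"].State, statuses["CI tests v2"].State) // success failure
}
```

Copying before retiring is what keeps the guarantee in the paragraph above: at no point does a failing PR end up with only passing contexts.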
LogExporter
LogExporter is a tool that runs post-test on our kubernetes test clusters.
It does the job of computing the set of logfiles to be exported (based on the
node type (master/node), cloud provider, and the node's system services), and
then actually exports them to the GCS path provided to it.
In short, it uploads the log files to GCS.
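A minimal sketch of the "compute the set of logfiles" step: derive a deduplicated, sorted list from the node type, cloud provider, and system services. Every file name and rule below is a hypothetical placeholder rather than LogExporter's real list, and the upload to GCS is not shown.

```go
package main

import (
	"fmt"
	"sort"
)

// logFiles returns a sorted, deduplicated list of log files to export.
func logFiles(nodeType, provider string, services []string) []string {
	set := map[string]bool{"kern.log": true}
	if nodeType == "master" {
		set["kube-apiserver.log"] = true
		set["kube-scheduler.log"] = true
	} else {
		set["kube-proxy.log"] = true
	}
	if provider == "gce" {
		set["startupscript.log"] = true
	}
	for _, svc := range services {
		set[svc+".log"] = true // one file per system service (assumed naming)
	}
	files := make([]string, 0, len(set))
	for f := range set {
		files = append(files, f)
	}
	sort.Strings(files)
	return files
}

func main() {
	fmt.Println(logFiles("node", "gce", []string{"kubelet", "docker"}))
	// [docker.log kern.log kube-proxy.log kubelet.log startupscript.log]
}
```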
label_sync
todo
kubetest
Kubetest is the interface for launching and running e2e tests.
See github.com/kubernetes/…
KETTLE
Kubernetes Extract Tests/Transform/Load Engine
This collects test results scattered across a variety of GCS buckets,
stores them in a local SQLite database, and outputs newline-delimited JSON files
for import into BigQuery.
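Newline-delimited JSON simply means one complete JSON object per line, which is a format BigQuery can load. A minimal sketch in Go (for consistency with the rest of this post), where the `Build` row is a hypothetical reduction and Kettle's real schema differs:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Build is a hypothetical, reduced test-result row.
type Build struct {
	Path    string `json:"path"`
	Passed  bool   `json:"passed"`
	Elapsed int    `json:"elapsed"` // seconds
}

// writeNDJSON encodes one JSON object per line. json.Encoder.Encode appends a
// newline after each value, which is exactly the newline-delimited format.
func writeNDJSON(builds []Build) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, b := range builds {
		if err := enc.Encode(b); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := writeNDJSON([]Build{
		{Path: "gs://example-bucket/logs/job/1", Passed: true, Elapsed: 60},
		{Path: "gs://example-bucket/logs/job/2", Passed: false, Elapsed: 75},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```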
jenkins
Deprecated; see github.com/kubernetes/…
Jenkins jobs are no longer created directly; ProwJobs are used instead.
gubernator
The frontend at k8s-gubernator.appspot.com/, probably the most important frontend in test-infra. The links in PR status checks, such as k8s-gubernator.appspot.com/build/kuber…, also point here.