kubernetes test-infra


Velodrome

velodrome.k8s.io/
A dashboard, monitoring and metrics system for Kubernetes developer productivity. It is a set of components used to monitor developer productivity. It is fairly general, and with small changes it could be used for other repos.

The code that talks to GitHub would also be fairly easy to reuse in a GitHub robot.

Architecture

  • Grafana stack: the front end (built entirely from open-source components; the directory only holds their configuration)
    • InfluxDB: save precalculated metrics
    • Prometheus: save poll-based metrics
    • Grafana: display graphs based on these metrics
    • nginx: proxy all of these services in a single URL
  • SQL base: containing a copy of the issues, events, and PRs in Github repositories. It is used for calculating statistics about developer productivity.
    • Fetcher: fetches Github data and stores in a SQL database (the main Go code lives here; it calls the GitHub SDK to pull the data)
    • SQL Proxy: SQL Proxy deployment to Cloud SQL (configuration only)
    • Transform: Transform SQL (Github db) into valuable metrics
  • Other monitoring tools
    • token-counter: Monitors RateLimit usage of your GitHub tokens

Fetcher

Resources used

  • Issues (including pull-requests)
  • Events (associated to issues)
  • Comments (regular comments and review comments)

These resources are used to:

  • Compute average time-to-resolution for an issue/pull-request
  • Compute time between label creation/removal: lgtm'd, merged
  • Break down results by specific flags (size, priority, ...)
// ClientInterface describes what a client should be able to do
type ClientInterface interface {
    RepositoryName() string
    FetchIssues(last time.Time, c chan *github.Issue)
    FetchIssueEvents(issueID int, last *int, c chan *github.IssueEvent)
    FetchIssueComments(issueID int, last time.Time, c chan *github.IssueComment)
    FetchPullComments(issueID int, last time.Time, c chan *github.PullRequestComment)
}

Flow

// 1. Entry point
// main -> cobra.Command (root) -> runProgram -> UpdateIssues

// 2. UpdateIssues: test-infra/velodrome/fetcher/issues.go
// Calls client.FetchIssues; issues are passed back over channel c
go client.FetchIssues(latest, c)
for issue := range c {
    // 2.1 Convert the issue into its ORM model (issueOrm) and save it
    NewIssue(..)

    // find if we have new comments
    UpdateComments(*issue.Number, issueOrm.IsPR, db, client)
    // and find if we have new events
    UpdateIssueEvents(*issue.Number, db, client)
}

// 2.2 UpdateComments test-infra/velodrome/fetcher/comments.go
func UpdateComments(issueID int, pullRequest bool, db *gorm.DB, client ClientInterface) {
    latest := findLatestCommentUpdate(issueID, db, client.RepositoryName())

    updateIssueComments(issueID, latest, db, client)
    if pullRequest {
        updatePullComments(issueID, latest, db, client)
    }
}

func updateIssueComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
    c := make(chan *github.IssueComment, 200)
    go client.FetchIssueComments(issueID, latest, c)
    for comment := range c {
        commentOrm, err := NewIssueComment(issueID, comment, client.RepositoryName())
        ...
    }
}

func updatePullComments(issueID int, latest time.Time, db *gorm.DB, client ClientInterface) {
    c := make(chan *github.PullRequestComment, 200)

    go client.FetchPullComments(issueID, latest, c)

    for comment := range c {
        commentOrm, err := NewPullComment(issueID, comment, client.RepositoryName())
        ...
    }
}


// 2.3 UpdateIssueEvents test-infra/velodrome/fetcher/issue-events.go
func UpdateIssueEvents(issueID int, db *gorm.DB, client ClientInterface) {
    ...
    c := make(chan *github.IssueEvent, 500)

    go client.FetchIssueEvents(issueID, latest, c)
    for event := range c {
        eventOrm, err := NewIssueEvent(event, issueID, client.RepositoryName())
        ...
    }
}

token-counter

transform

GitHub data in SQL --> transform --> metrics

func (config *transformConfig) run(plugin plugins.Plugin) error {
    ...

    // Turn issue and comment data into points -> InfluxDB
    go Dispatch(plugin, influxdb, fetcher.IssuesChannel,
        fetcher.EventsCommentsChannel)

    ticker := time.Tick(time.Hour / time.Duration(config.frequency))
    for {
        // Fetch new events from MySQL, push it to plugins
        if err := fetcher.Fetch(mysqldb); err != nil {
            return err
        }
        // Push the processed points to InfluxDB as one batch
        if err := influxdb.PushBatchPoints(); err != nil {
            return err
        }

        if config.once {
            break
        }
        // Wait for the next tick; this sets the minimum interval between runs
        <-ticker
    }
    return nil
}

// Dispatch receives channels to each type of events, and dispatch them to each plugins.
func Dispatch(plugin plugins.Plugin, DB *InfluxDB, issues chan sql.Issue, eventsCommentsChannel chan interface{}) {
    for {
        var points []plugins.Point
        select {
        case issue, ok := <-issues:
            if !ok {
                return
            }
            points = plugin.ReceiveIssue(issue)
        case event, ok := <-eventsCommentsChannel:
            if !ok {
                return
            }
            switch event := event.(type) {
            case sql.IssueEvent:
                points = plugin.ReceiveIssueEvent(event)
            case sql.Comment:
                points = plugin.ReceiveComment(event)
            default:
                glog.Fatal("Received invalid object: ", event)
            }
        }

        for _, point := range points {
            if err := DB.Push(point.Tags, point.Values, point.Date); err != nil {
                glog.Fatal("Failed to push point: ", err)
            }
        }
    }
}

plugin

A plugin must implement the Plugin interface:

type Plugin interface {
    ReceiveIssue(sql.Issue) []Point
    ReceiveComment(sql.Comment) []Point
    ReceiveIssueEvent(sql.IssueEvent) []Point
}
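A minimal plugin sketch, under simplifying assumptions. The `Issue`, `Comment`, `IssueEvent`, and `Point` types below are hypothetical stand-ins for `sql.Issue`, `sql.Comment`, `sql.IssueEvent`, and `plugins.Point`; only the `Tags`, `Values`, and `Date` fields that `Dispatch` uses are modeled. The hypothetical `CommentPerAuthor` plugin keeps a running comment count per author and emits one point per comment.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, simplified stand-ins for the velodrome types.
type Issue struct{ ID string }
type Comment struct {
	Author    string
	CreatedAt time.Time
}
type IssueEvent struct {
	Event     string
	CreatedAt time.Time
}
type Point struct {
	Tags   map[string]string
	Values map[string]interface{}
	Date   time.Time
}

// Plugin mirrors the shape of the interface above.
type Plugin interface {
	ReceiveIssue(Issue) []Point
	ReceiveComment(Comment) []Point
	ReceiveIssueEvent(IssueEvent) []Point
}

// CommentPerAuthor counts comments per author.
type CommentPerAuthor struct {
	counts map[string]int
}

func NewCommentPerAuthor() *CommentPerAuthor {
	return &CommentPerAuthor{counts: map[string]int{}}
}

// Issues and events are ignored by this plugin.
func (p *CommentPerAuthor) ReceiveIssue(Issue) []Point           { return nil }
func (p *CommentPerAuthor) ReceiveIssueEvent(IssueEvent) []Point { return nil }

// ReceiveComment bumps the author's count and emits it as a point.
func (p *CommentPerAuthor) ReceiveComment(c Comment) []Point {
	p.counts[c.Author]++
	return []Point{{
		Tags:   map[string]string{"author": c.Author},
		Values: map[string]interface{}{"comments": p.counts[c.Author]},
		Date:   c.CreatedAt,
	}}
}

// Compile-time check that CommentPerAuthor satisfies Plugin.
var _ Plugin = (*CommentPerAuthor)(nil)

func main() {
	p := NewCommentPerAuthor()
	now := time.Now()
	p.ReceiveComment(Comment{Author: "alice", CreatedAt: now})
	pts := p.ReceiveComment(Comment{Author: "alice", CreatedAt: now})
	fmt.Println(pts[0].Values["comments"]) // 2
}
```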

Entry point: root.AddCommand(plugins.NewCountPlugin(config.run))
The plugin passed to the runner is authorFilter (see test-infra/velodrome/transform/plugins/count.go):

// test-infra/velodrome/transform/plugins/count.go
// Several plugins are wrapped into a single one
func NewCountPlugin(runner func(Plugin) error) *cobra.Command {
    stateCounter := &StatePlugin{}
    eventCounter := &EventCounterPlugin{}
    commentsAsEvents := NewFakeCommentPluginWrapper(eventCounter)
    commentCounter := &CommentCounterPlugin{}
    authorLoggable := NewMultiplexerPluginWrapper(
        commentsAsEvents,
        commentCounter,
    )
    authorLogged := NewAuthorLoggerPluginWrapper(authorLoggable)
    fullMultiplex := NewMultiplexerPluginWrapper(authorLogged, stateCounter)

    fakeOpen := NewFakeOpenPluginWrapper(fullMultiplex)
    typeFilter := NewTypeFilterWrapperPlugin(fakeOpen)
    authorFilter := NewAuthorFilterPluginWrapper(typeFilter)
    ...
}
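The wrappers compose plugins much like middleware. As a hedged illustration, and not the actual `NewMultiplexerPluginWrapper` code, a multiplexer can forward each object to every wrapped plugin and concatenate the resulting points. The `Point` type and the one-method `IssuePlugin` interface here are hypothetical simplifications.

```go
package main

import "fmt"

// Point is a hypothetical, simplified stand-in for plugins.Point.
type Point struct{ Name string }

// IssuePlugin is a hypothetical one-method slice of the Plugin interface.
type IssuePlugin interface {
	ReceiveIssue(issueID int) []Point
}

// constPlugin always returns a single point named after itself.
type constPlugin string

func (c constPlugin) ReceiveIssue(int) []Point { return []Point{{Name: string(c)}} }

// multiplexer forwards each issue to all wrapped plugins, in order, and
// concatenates their points.
type multiplexer struct{ plugins []IssuePlugin }

func newMultiplexer(plugins ...IssuePlugin) *multiplexer {
	return &multiplexer{plugins: plugins}
}

func (m *multiplexer) ReceiveIssue(issueID int) []Point {
	var points []Point
	for _, p := range m.plugins {
		points = append(points, p.ReceiveIssue(issueID)...)
	}
	return points
}

func main() {
	// Wrappers nest: a multiplexer is itself an IssuePlugin.
	inner := newMultiplexer(constPlugin("a"), constPlugin("b"))
	outer := newMultiplexer(inner, constPlugin("c"))
	fmt.Println(outer.ReceiveIssue(1)) // [{a} {b} {c}]
}
```

Because every wrapper satisfies the same interface, `NewCountPlugin` can stack filters and multiplexers in any order and hand the outermost one to the runner.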

triage

A metrics page whose code is very simple but whose UI is quite rich; I haven't yet figured out what it is for.
=> Kubernetes Aggregated Failures

testgrid

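The front end and back end of testgrid.k8s.io. It shows metrics for Jenkins tests laid out as a grid, which is very intuitive.
The front end is configurable: config.yaml lists all the tests.
For example, 1.6-1.7-kubectl-skew is one dashboard. It has several tabs, and each tab is a test group, such as gce-1.6-1-7-cvm.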

The testgrid site is accessible at testgrid.k8s.io. The site is
configured by config.yaml.
Updates to the config are automatically tested and pushed to production.

Testgrid is composed of:

  • A list of test groups that contain results for a job over time.
  • A list of dashboards that are composed of tabs that display a test group
  • A list of dashboard groups of related dashboards.
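The three-level structure can be modeled roughly as shown below. These Go types and the `validate` helper are hypothetical; they are not the real testgrid config schema. The check ensures that every dashboard tab points at an existing test group and every dashboard group at an existing dashboard.

```go
package main

import "fmt"

// Hypothetical, simplified model of the testgrid configuration.
type TestGroup struct{ Name string }

type DashboardTab struct {
	Name          string
	TestGroupName string // the test group this tab displays
}

type Dashboard struct {
	Name string
	Tabs []DashboardTab
}

type DashboardGroup struct {
	Name           string
	DashboardNames []string
}

type Config struct {
	TestGroups      []TestGroup
	Dashboards      []Dashboard
	DashboardGroups []DashboardGroup
}

// validate reports the first dangling reference it finds, if any.
func validate(cfg Config) error {
	groups := map[string]bool{}
	for _, tg := range cfg.TestGroups {
		groups[tg.Name] = true
	}
	dashboards := map[string]bool{}
	for _, d := range cfg.Dashboards {
		dashboards[d.Name] = true
		for _, tab := range d.Tabs {
			if !groups[tab.TestGroupName] {
				return fmt.Errorf("dashboard %q tab %q: unknown test group %q", d.Name, tab.Name, tab.TestGroupName)
			}
		}
	}
	for _, dg := range cfg.DashboardGroups {
		for _, name := range dg.DashboardNames {
			if !dashboards[name] {
				return fmt.Errorf("dashboard group %q: unknown dashboard %q", dg.Name, name)
			}
		}
	}
	return nil
}

func main() {
	cfg := Config{
		TestGroups: []TestGroup{{Name: "gce-1.6-1-7-cvm"}},
		Dashboards: []Dashboard{{
			Name: "1.6-1.7-kubectl-skew",
			Tabs: []DashboardTab{{Name: "gce-1.6-1-7-cvm", TestGroupName: "gce-1.6-1-7-cvm"}},
		}},
	}
	fmt.Println(validate(cfg)) // <nil>
}
```

A check like this is the kind of thing that makes "updates to the config are automatically tested" cheap: a broken reference fails before anything is pushed.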

scenarios

Test scripts: Python scripts that invoke k8s.io/kubernetes/test/....go

Test jobs are composed of two things:
1) A scenario to test
2) Configuration options for the scenario.

Three example scenarios are:

  • Unit tests
  • Node e2e tests
  • e2e tests

Example configurations are:

  • Parallel tests on gce
  • Build all platforms

The assumption is that each scenario will be called a variety of times with
different configuration options. For example at the time of this writing there
are over 300 e2e jobs, each run with a slightly different set of options.

robots

issue-creator

A source is where the created issues come from. For example, FlakyJobReporter creates issues for flaky Jenkins jobs.

queue health

This app monitors the submit queue and produces the chart at submit-queue.k8s.io/#/e2e.
In other words, it generates the charts for submit-queue.k8s.io, for example this one: https://storage.go…

It does this with two components:

  • a poller, which polls the current state of the queue and appends it to a
    historical log.
  • a grapher, which gets the historical log and renders it into charts.

prow

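This is the most interesting app in the repo. It can serve as a robot that handles GitHub commands, and its plugin design is worth studying.
Its components include: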

  • cmd/hook is the most important piece. It is a stateless server that listens
    for GitHub webhooks and dispatches them to the appropriate handlers.
  • cmd/plank is the controller that manages jobs running in k8s pods.
  • cmd/jenkins-operator is the controller that manages jobs running in Jenkins.
  • cmd/sinker cleans up old jobs and pods.
  • cmd/splice regularly schedules batch jobs.
  • cmd/deck presents a nice view of recent jobs.
  • cmd/phony sends fake webhooks.
  • cmd/tot vends incrementing build numbers.
  • cmd/horologium starts periodic jobs when necessary.
  • cmd/mkpj creates ProwJobs.

config

deck

The front end and back end of prow.k8s.io/, which displays rece… prow jobs (a third party resource).

hook

The core component: it listens for GitHub webhooks and dispatches them, mostly to plugins.

k8s Bot Commands

k8s-ci-robot and k8s-merge-robot understand several commands. They should all be uttered on their own line, and they are case-sensitive.

| Command | Implemented By | Who can run it | Description |
| --- | --- | --- | --- |
| /approve | mungegithub approvers | owners | approve all the files for which you are an approver |
| /approve no-issue | mungegithub approvers | owners | approve when a PR doesn't have an associated issue |
| /approve cancel | mungegithub approvers | owners | removes your approval on this pull-request |
| /area [label1 label2 ...] | prow label | anyone | adds an area/<> label(s) if it exists |
| /remove-area [label1 label2 ...] | prow label | anyone | removes an area/<> label(s) if it exists |
| /assign [@userA @userB @etc] | prow assign | anyone | Assigns specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
| /unassign [@userA @userB @etc] | prow assign | anyone | Unassigns specified people (or yourself if no one is specified). Target must already be assigned. |
| /cc [@userA @userB @etc] | prow assign | anyone | Request review from specified people (or yourself if no one is specified). Target must be a kubernetes org member. |
| /uncc [@userA @userB @etc] | prow assign | anyone | Dismiss review request for specified people (or yourself if no one is specified). Target must already have had a review requested. |
| /close | prow close | authors and assignees | closes the issue/PR |
| /reopen | prow reopen | authors and assignees | reopens a closed issue/PR |
| /hold | prow hold | anyone | adds the do-not-merge/hold label |
| /hold cancel | prow hold | anyone | removes the do-not-merge/hold label |
| /joke | prow yuks | anyone | tells a bad joke, sometimes |
| /kind [label1 label2 ...] | prow label | anyone | adds a kind/<> label(s) if it exists |
| /remove-kind [label1 label2 ...] | prow label | anyone | removes a kind/<> label(s) if it exists |
| /lgtm | prow lgtm | assignees | adds the lgtm label |
| /lgtm cancel | prow lgtm | authors and assignees | removes the lgtm label |
| /ok-to-test | prow trigger | kubernetes org members | allows the PR author to /test all |
| /test all, /test | prow trigger | anyone on trusted PRs | runs tests defined in config.yaml |
| /retest | prow trigger | anyone on trusted PRs | reruns failed tests |
| /priority [label1 label2 ...] | prow label | anyone | adds a priority/<> label(s) if it exists |
| /remove-priority [label1 label2 ...] | prow label | anyone | removes a priority/<> label(s) if it exists |
| /sig [label1 label2 ...] | prow label | anyone | adds a sig/<> label(s) if it exists |
| @kubernetes/sig- | prow label | kubernetes org members | adds the corresponding sig label |
| /remove-sig [label1 label2 ...] | prow label | anyone | removes a sig/<> label(s) if it exists |
| /release-note | prow releasenote | authors and kubernetes org members | adds the release-note label |
| /release-note-action-required | prow releasenote | authors and kubernetes org members | adds the release-note-action-required label |
| /release-note-none | prow releasenote | authors and kubernetes org members | adds the release-note-none label |
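Because commands must sit on their own line and are case-sensitive, a line-anchored regular expression in multiline mode is a natural way to detect them. Here is a hedged sketch for `/hold` and `/hold cancel`. The pattern and the `holdAction` helper are illustrative assumptions, not the hold plugin's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?m) makes ^ and $ match at line boundaries, so the command must be on its
// own line. There is no (?i) flag, so matching is case-sensitive. The trailing
// \s* also absorbs a "\r" from Windows-style line endings.
var holdRe = regexp.MustCompile(`(?m)^/hold( cancel)?\s*$`)

// holdAction returns "add" or "remove" for the do-not-merge/hold label, or ""
// if the comment contains no /hold command. The last command in the comment wins.
func holdAction(comment string) string {
	matches := holdRe.FindAllStringSubmatch(comment, -1)
	if len(matches) == 0 {
		return ""
	}
	last := matches[len(matches)-1]
	if last[1] != "" {
		return "remove"
	}
	return "add"
}

func main() {
	fmt.Println(holdAction("LGTM overall.\n/hold\n")) // add
	fmt.Println(holdAction("/hold cancel"))           // remove
	fmt.Printf("%q\n", holdAction("please /hold"))    // "" (not on its own line)
}
```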

// An example of handing an event to plugins: k8s.io/test-infra/prow/hook/events.go
func (s *Server) handleGenericComment(ce *github.GenericCommentEvent, log *logrus.Entry) {
    for p, h := range s.Plugins.GenericCommentHandlers(ce.Repo.Owner.Login, ce.Repo.Name) {
        go func(p string, h plugins.GenericCommentHandler) {
            pc := s.Plugins.PluginClient
            pc.Logger = log.WithField("plugin", p)
            pc.Config = s.ConfigAgent.Config()
            pc.PluginConfig = s.Plugins.Config()
            if err := h(pc, *ce); err != nil {
                pc.Logger.WithError(err).Error("Error handling GenericCommentEvent.")
            }
        }(p, h)
    }
}


// List of enabled plugins: k8s.io/test-infra/prow/hook/plugins.go

import (
    _ "k8s.io/test-infra/prow/plugins/assign"
    _ "k8s.io/test-infra/prow/plugins/cla"
    _ "k8s.io/test-infra/prow/plugins/close"
    _ "k8s.io/test-infra/prow/plugins/golint"
    _ "k8s.io/test-infra/prow/plugins/heart"
    _ "k8s.io/test-infra/prow/plugins/hold"
    _ "k8s.io/test-infra/prow/plugins/label"
    _ "k8s.io/test-infra/prow/plugins/lgtm"
    _ "k8s.io/test-infra/prow/plugins/releasenote"
    _ "k8s.io/test-infra/prow/plugins/reopen"
    _ "k8s.io/test-infra/prow/plugins/shrug"
    _ "k8s.io/test-infra/prow/plugins/size"
    _ "k8s.io/test-infra/prow/plugins/slackevents"
    _ "k8s.io/test-infra/prow/plugins/trigger"
    _ "k8s.io/test-infra/prow/plugins/updateconfig"
    _ "k8s.io/test-infra/prow/plugins/wip"
    _ "k8s.io/test-infra/prow/plugins/yuks"
)

plugin-lgtm

// k8s.io/test-infra/prow/plugins/plugins.go
// Plugin types: a plugin basically implements one or more of the handlers below
var (
    genericCommentHandlers     = map[string]GenericCommentHandler{}
    issueHandlers              = map[string]IssueHandler{}
    issueCommentHandlers       = map[string]IssueCommentHandler{}
    pullRequestHandlers        = map[string]PullRequestHandler{}
    pushEventHandlers          = map[string]PushEventHandler{}
    reviewEventHandlers        = map[string]ReviewEventHandler{}
    reviewCommentEventHandlers = map[string]ReviewCommentEventHandler{}
    statusEventHandlers        = map[string]StatusEventHandler{}
)

// k8s.io/test-infra/prow/plugins/lgtm/lgtm.go
// lgtm, for example, mainly checks whether a "/lgtm" or "/lgtm cancel" command is valid,
// assigns the issue, adds or removes the lgtm label, and posts a comment if something is wrong
func init() {
    plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
    plugins.RegisterReviewEventHandler(pluginName, handleReview)
    plugins.RegisterReviewCommentEventHandler(pluginName, handleReviewComment)
}
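The blank imports in hook/plugins.go work because each plugin package registers its handlers from an `init()` function like the one above, and Go runs `init()` when the package is imported. Below is a minimal single-file sketch of that registry pattern. The `IssueCommentHandler` type and `RegisterIssueCommentHandler` function here are hypothetical simplifications; prow's real versions live in its plugins package and may have different signatures.

```go
package main

import (
	"fmt"
	"sort"
)

// IssueCommentHandler is a hypothetical, simplified handler type.
type IssueCommentHandler func(body string) error

// issueCommentHandlers maps plugin name -> handler.
var issueCommentHandlers = map[string]IssueCommentHandler{}

// RegisterIssueCommentHandler records a handler under the plugin's name.
func RegisterIssueCommentHandler(name string, fn IssueCommentHandler) {
	issueCommentHandlers[name] = fn
}

// In prow each plugin lives in its own package. Here, two init functions in
// one file stand in for two plugin packages; Go runs both before main.
func init() {
	RegisterIssueCommentHandler("lgtm", func(string) error { return nil })
}

func init() {
	RegisterIssueCommentHandler("assign", func(string) error { return nil })
}

// registeredPlugins lists the registered plugin names in sorted order.
func registeredPlugins() []string {
	names := make([]string, 0, len(issueCommentHandlers))
	for name := range issueCommentHandlers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(registeredPlugins()) // [assign lgtm]
}
```

The design choice this illustrates: enabling a plugin in hook only requires adding one `_` import, and the hook server never needs to know each plugin's handler functions by name.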

plugin-assign

// Handles /(un)assign xx and /(un)cc; assigns the issue to xxx
func init() {
    plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
    plugins.RegisterIssueHandler(pluginName, handleIssue)
    plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
}

plugin-golint

// Checks out the code with git and runs golint on the changed code
func init() {
    plugins.RegisterIssueCommentHandler(pluginName, handleIC)
}

plugin-heart

I haven't figured out what this one does yet.

plugin-hold

Adds or removes the do-not-merge/hold label (/hold and /hold cancel).

plugin-label

The most heavily used plugin. It implements many of the commands, such as area, sig, kind, ...

(?m)^/(area|priority|kind|sig)\s*(.*)$

plugin-trigger

The commands this plugin handles are important and somewhat special: they trigger ProwJobs.

// Handles /retest and /ok-to-test

func init() {
    plugins.RegisterIssueCommentHandler(pluginName, handleIssueComment)
    plugins.RegisterPullRequestHandler(pluginName, handlePullRequest)
    plugins.RegisterPushEventHandler(pluginName, handlePush)
}

// Handling of PR events: k8s.io/test-infra/prow/plugins/trigger/pr.go
func handlePR(c client, trustedOrg string, pr github.PullRequestEvent) error {
    author := pr.PullRequest.User.Login
    switch pr.Action {
    case github.PullRequestActionOpened:
        // if author is a member of trustedOrg -> buildAll
        // otherwise post a welcome message
    case github.PullRequestActionReopened, github.PullRequestActionSynchronize:
        // if trusted -> buildAll
    case github.PullRequestActionLabeled:
        // When a PR is LGTMd, if it is untrusted then build it once.
    }
    return nil
}

// buildAll -> CreateProwJob


// Handling of the /retest comment:
// collect the failed contexts from the commit statuses -> the matching presubmit structs -> ProwJobs
// See the GitHub status API: https://developer.github.com/v3/repos/statuses/
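The trigger logic above can be restated as two small decision functions. This is a hedged sketch, not prow's code. The action strings are GitHub's `pull_request` webhook actions, and `Status` mirrors the `context` and `state` fields of the status API linked above. Everything else (`decide`, `jobsToRetest`, `Presubmit`, the boolean inputs, and the example job names) is hypothetical.

```go
package main

import "fmt"

// decide restates the comments in handlePR. authorIsMember means the author
// is in trustedOrg; trusted means the PR is already considered trusted.
func decide(action, label string, authorIsMember, trusted bool) string {
	switch action {
	case "opened":
		if authorIsMember {
			return "buildAll"
		}
		return "welcome"
	case "reopened", "synchronize":
		if trusted {
			return "buildAll"
		}
	case "labeled":
		// An untrusted PR is built once it receives the lgtm label.
		if label == "lgtm" && !trusted {
			return "buildAll"
		}
	}
	return "nothing"
}

// Status mirrors two fields of a GitHub commit status.
type Status struct {
	Context string
	State   string // "pending", "success", "error" or "failure"
}

// Presubmit is a hypothetical, simplified presubmit job definition.
type Presubmit struct {
	Name    string
	Context string // status context the job reports to
}

// jobsToRetest returns the presubmits whose context is in the "error" or
// "failure" state, i.e. the jobs that /retest would rerun.
func jobsToRetest(presubmits []Presubmit, statuses []Status) []string {
	failed := map[string]bool{}
	for _, s := range statuses {
		if s.State == "error" || s.State == "failure" {
			failed[s.Context] = true
		}
	}
	var jobs []string
	for _, p := range presubmits {
		if failed[p.Context] {
			jobs = append(jobs, p.Name)
		}
	}
	return jobs
}

func main() {
	fmt.Println(decide("opened", "", false, false))      // welcome
	fmt.Println(decide("labeled", "lgtm", false, false)) // buildAll

	presubmits := []Presubmit{{Name: "unit", Context: "unit"}, {Name: "e2e", Context: "e2e"}}
	statuses := []Status{{Context: "unit", State: "success"}, {Context: "e2e", State: "failure"}}
	fmt.Println(jobsToRetest(presubmits, statuses)) // [e2e]
}
```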

horologium

starts periodic jobs when necessary.

jenkins-operator

The controller that manages jobs running in Jenkins. It basically syncs the state of each Jenkins job into the status of the corresponding ProwJob.

mkpj

A command-line tool for creating ProwJobs manually.

plank

The controller that manages jobs running in k8s pods. It syncs each pod's state into the status of its ProwJob.
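At its core, the sync maps each pod's phase onto a ProwJob state. The pod phases below are the real Kubernetes values (`Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`). The ProwJob state names and the `stateForPod` helper are a simplified assumption, not plank's code. jenkins-operator follows the same idea, with Jenkins build results in place of pod phases.

```go
package main

import "fmt"

// ProwJobState values used in this sketch (assumed names).
type ProwJobState string

const (
	PendingState ProwJobState = "pending"
	SuccessState ProwJobState = "success"
	FailureState ProwJobState = "failure"
	ErrorState   ProwJobState = "error"
)

// stateForPod maps a Kubernetes pod phase to a ProwJob state.
func stateForPod(phase string) ProwJobState {
	switch phase {
	case "Pending", "Running":
		return PendingState // the job has not finished yet
	case "Succeeded":
		return SuccessState
	case "Failed":
		return FailureState
	default: // "Unknown" or anything unexpected
		return ErrorState
	}
}

func main() {
	for _, phase := range []string{"Running", "Succeeded", "Failed", "Unknown"} {
		fmt.Println(phase, "->", stateForPod(phase))
	}
}
```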

sinker

cleans up old jobs and pods.

splice

regularly schedules batch jobs.

tide

todo

tot

todo

Planter

Planter is a container + wrapper script for your bazel builds.
It will run a docker container as the current user that can run bazel builds
in your $PWD.

mungegithub

A deprecated system that predates prow.

metrics

todo

maintenance/migratestatus

Status Context Migrator
The migratestatus tool is a maintenance utility used to safely switch a repo from one status context to another.
For example, if there is a context named "CI tests" that needs to be moved to "CI tests v2", this tool can be used to copy every "CI tests" status into a "CI tests v2" status context and then mark every "CI tests" context as passing and retired. This ensures that no PRs are ever passing when they shouldn't be and doesn't block PRs that should be passing. The copy and retire phases can be run separately, or together in move mode.
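In move mode the tool copies first and retires second. Below is a hedged in-memory sketch of that per-PR transformation. The `Status` type, the `migrate` function, and the retired-status description text are hypothetical; a real run would read and write statuses through the GitHub status API instead of a slice.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for one commit status on a PR head.
type Status struct {
	Context     string
	State       string
	Description string
}

// migrate copies the state of oldCtx into newCtx, then retires oldCtx by
// marking it as passing. Other statuses are left unchanged, and the input
// slice is not modified.
func migrate(statuses []Status, oldCtx, newCtx string) []Status {
	out := append([]Status(nil), statuses...)
	for i, s := range statuses {
		if s.Context != oldCtx {
			continue
		}
		// Copy phase: the new context starts with the old context's state,
		// so a failing PR stays failing.
		out = append(out, Status{Context: newCtx, State: s.State, Description: s.Description})
		// Retire phase: the old context no longer blocks the PR.
		out[i] = Status{Context: oldCtx, State: "success", Description: "Context retired. Status moved to " + newCtx + "."}
	}
	return out
}

func main() {
	prStatuses := []Status{{Context: "CI tests", State: "failure", Description: "2 tests failed"}}
	for _, s := range migrate(prStatuses, "CI tests", "CI tests v2") {
		fmt.Printf("%s: %s\n", s.Context, s.State)
	}
	// CI tests: success
	// CI tests v2: failure
}
```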

LogExporter

LogExporter is a tool that runs post-test on our kubernetes test clusters.
It does the job of computing the set of logfiles to be exported (based on the
node type (master/node), cloud provider, and the node's system services), and
then actually exports them to the GCS path provided to it.

In short: it uploads the log files to GCS.

label_sync

todo

kubetest

Kubetest is the interface for launching and running e2e tests.

See github.com/kubernetes/…

KETTLE

Kubernetes Extract Tests/Transform/Load Engine

This collects test results scattered across a variety of GCS buckets,
stores them in a local SQLite database, and outputs newline-delimited JSON files
for import into BigQuery.

jenkins

Deprecated; see github.com/kubernetes/…

Jenkins jobs are no longer created directly; ProwJobs are used instead.

gubernator

The k8s-gubernator.appspot.com/ front end, probably the most important front end in test-infra. The links in status checks, such as k8s-gubernator.appspot.com/build/kuber…, also point here.

gcsweb

Reposted from: https://juejin.im/post/5a1141a7f265da43284072de
