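First, for how k8s scheduling works, see my previous article:

k8s Study Notes (kube-scheduler)

This article looks at where the k8s scheduler falls short. k8s already handles memory and CPU allocation quite well, but if you want more, finer-grained allocation, you have to write your own scheduler.

A scheduler needs information to make its decisions. This time we will build a daemon that exposes the information we care about, for our own scheduler to consume.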

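Since k8s and its client libraries are written in Go, Go is the natural choice for this daemon as well, so we first need to get familiar with the dependencies involved.

Since Go 1.11, the Go team has been pushing Go Modules hard. There has been a lot of heated discussion about it, and how to use Go Modules in a Go project is also something we need to understand.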

Cron

https://godoc.org/github.com/robfig/cron

import "github.com/robfig/cron"

Cron is an excellent library for running scheduled tasks. If we want to collect Pod data at regular intervals, we can't do without it.

c := cron.New()

c.AddFunc("30 * * * *", func() { fmt.Println("Every hour on the half hour") })

c.AddFunc("30 3-6,20-23 * * *", func() { fmt.Println(".. in the range 3-6am, 8-11pm") })

c.AddFunc("CRON_TZ=Asia/Tokyo 30 04 * * *", func() { fmt.Println("Runs at 04:30 Tokyo time every day") })

c.AddFunc("@hourly", func() { fmt.Println("Every hour, starting an hour from now") })

c.AddFunc("@every 1h30m", func() { fmt.Println("Every hour thirty, starting an hour thirty from now") })

c.Start()
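The robfig/cron documentation describes the argument of "@every" as a duration string accepted by time.ParseDuration. The sketch below uses only the standard library to show what "@every 1h30m" amounts to. The helper everyInterval is a hypothetical name for illustration and is not part of the cron package.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// everyInterval extracts the interval from an "@every <duration>" spec.
// Hypothetical helper for illustration; not part of robfig/cron.
func everyInterval(spec string) (time.Duration, error) {
	const prefix = "@every "
	if !strings.HasPrefix(spec, prefix) {
		return 0, fmt.Errorf("not an @every spec: %q", spec)
	}
	// The remainder uses the same syntax as time.ParseDuration, e.g. "1h30m".
	return time.ParseDuration(strings.TrimPrefix(spec, prefix))
}

func main() {
	d, err := everyInterval("@every 1h30m")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("interval in minutes:", d.Minutes())
}
```

For a GPU collector, a short interval such as "@every 5s" would follow the same rule.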

The field syntax is worth a closer look:

Field name   | Mandatory? | Allowed values  | Allowed special characters
------------ | ---------- | --------------- | --------------------------
Minutes      | Yes        | 0-59            | * / , -
Hours        | Yes        | 0-23            | * / , -
Day of month | Yes        | 1-31            | * / , - ?
Month        | Yes        | 1-12 or JAN-DEC | * / , -
Day of week  | Yes        | 0-6 or SUN-SAT  | * / , - ?

Here is the official explanation:

Asterisk ( * )

The asterisk indicates that the cron expression will match for all values of the field; e.g., using an asterisk in the 4th field (month) would indicate every month.

Slash ( / )

Slashes are used to describe increments of ranges. For example 3-59/15 in the 1st field (minutes) would indicate the 3rd minute of the hour and every 15 minutes thereafter. The form "*/..." is equivalent to the form "first-last/...", that is, an increment over the largest possible range of the field. The form "N/..." is accepted as meaning "N-MAX/...", that is, starting at N, use the increment until the end of that specific range. It does not wrap around.

Comma ( , )

Commas are used to separate items of a list. For example, using "MON,WED,FRI" in the 5th field (day of week) would mean Mondays, Wednesdays and Fridays.

Hyphen ( - )

Hyphens are used to define ranges. For example, 9-17 would indicate every hour between 9am and 5pm inclusive.

Question mark ( ? )

Question mark may be used instead of '*' for leaving either day-of-month or day-of-week blank.
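To make the rules for the special characters concrete, here is a minimal sketch that expands a single numeric field into the values it matches. It is written against the explanation above using only the standard library. expandField is a hypothetical helper, not robfig/cron's parser, and it does not handle names such as SUN or JAN.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandField lists the values matched by one numeric cron field whose
// allowed range is [fieldMin, fieldMax]. It handles "*", "?", "N", "A-B",
// "*/step", "N/step", "A-B/step" and comma-separated lists of those.
// Hypothetical helper for illustration only.
func expandField(field string, fieldMin, fieldMax int) ([]int, error) {
	seen := make(map[int]bool)
	for _, item := range strings.Split(field, ",") {
		parts := strings.SplitN(item, "/", 2) // range part and optional step
		lo, hi := fieldMin, fieldMax          // "*" and "?" cover the whole range
		if r := parts[0]; r != "*" && r != "?" {
			bounds := strings.SplitN(r, "-", 2)
			var err error
			if lo, err = strconv.Atoi(bounds[0]); err != nil {
				return nil, fmt.Errorf("bad value %q: %w", bounds[0], err)
			}
			hi = lo
			if len(bounds) == 2 {
				if hi, err = strconv.Atoi(bounds[1]); err != nil {
					return nil, fmt.Errorf("bad value %q: %w", bounds[1], err)
				}
			} else if len(parts) == 2 {
				hi = fieldMax // "N/step" means "N-MAX/step"
			}
		}
		step := 1
		if len(parts) == 2 {
			var err error
			if step, err = strconv.Atoi(parts[1]); err != nil || step <= 0 {
				return nil, fmt.Errorf("bad step %q", parts[1])
			}
		}
		if lo < fieldMin || hi > fieldMax || lo > hi {
			return nil, fmt.Errorf("out of range: %q", item)
		}
		for v := lo; v <= hi; v += step { // stops at hi, no wrap-around
			seen[v] = true
		}
	}
	values := make([]int, 0, len(seen))
	for v := fieldMin; v <= fieldMax; v++ {
		if seen[v] {
			values = append(values, v)
		}
	}
	return values, nil
}

func main() {
	minutes, err := expandField("3-59/15", 0, 59)
	fmt.Println("minutes 3-59/15:", minutes, err)
	hours, err := expandField("3-6,20-23", 0, 23)
	fmt.Println("hours 3-6,20-23:", hours, err)
}
```

With the minutes range 0-59, "3-59/15" expands to 3, 18, 33 and 48, which matches the description of the slash above.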

The following articles are also useful references:

https://www.jianshu.com/p/fd3dda663953

https://www.cnblogs.com/jiangz222/p/12345566.html

Zap

https://github.com/uber-go/zap

import "go.uber.org/zap"

Zap is a high-performance logging library with many excellent features. See:

https://www.liwenzhou.com/posts/Go/zap/

Nvml

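NVML (NVIDIA Management Library) is NVIDIA's interface for querying GPU information. It is the library that the NVIDIA System Management Interface (nvidia-smi) is built on, and Go bindings are available.

For details, see: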

https://github.com/NVIDIA/gpu-monitoring-tools/blob/master/bindings/go/samples/nvml/deviceInfo/main.go

Client-go

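client-go is the official Go client for the Kubernetes API. With its in-cluster configuration, an application running inside a container can query Pod information from the API server.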

https://github.com/kubernetes/client-go

Example:

https://github.com/kubernetes/client-go/tree/master/examples/in-cluster-client-configuration

With this groundwork in place, we can start writing our own GPU scheduler.

The full code is here:

https://github.com/hyc3z/Omaticaya

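Core idea:

Use Cron to create a polling job that runs at a fixed interval, refreshes the cached GPU information, converts it into node tags (labels), and applies them to the node. The conversion and tagging step is fast and finishes in about 10 ms. The polling step is slower because of NVML, taking anywhere from 100 ms to 500 ms, so the data is still not very close to real time.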
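The conversion step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the Omaticaya repository. The GPUInfo struct, its field names, the labelsFor and sanitizeLabelValue helpers, and the "example.com/" key prefix are all hypothetical. The only Kubernetes rule it relies on is that a label value may contain at most 63 characters from letters, digits, '-', '_' and '.', and must start and end with a letter or digit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GPUInfo is a hypothetical cached snapshot of one device. A real
// collector would fill it from NVML; the field names are assumptions.
type GPUInfo struct {
	Index       int
	Model       string
	MemFreeMiB  uint64
	UtilPercent uint
}

// sanitizeLabelValue replaces characters that are not allowed in a
// Kubernetes label value, truncates to 63 characters, and trims
// non-alphanumeric characters from both ends.
func sanitizeLabelValue(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z',
			r >= '0' && r <= '9', r == '-', r == '_', r == '.':
			b.WriteRune(r)
		default:
			b.WriteByte('-')
		}
	}
	v := b.String()
	if len(v) > 63 {
		v = v[:63]
	}
	return strings.Trim(v, "-_.")
}

// labelsFor flattens the cached snapshots into a label map.
// "example.com/" is a placeholder prefix, not a real convention.
func labelsFor(gpus []GPUInfo) map[string]string {
	labels := map[string]string{
		"example.com/gpu-count": strconv.Itoa(len(gpus)),
	}
	for _, g := range gpus {
		prefix := fmt.Sprintf("example.com/gpu-%d-", g.Index)
		labels[prefix+"model"] = sanitizeLabelValue(g.Model)
		labels[prefix+"mem-free-mib"] = strconv.FormatUint(g.MemFreeMiB, 10)
		labels[prefix+"util-percent"] = strconv.FormatUint(uint64(g.UtilPercent), 10)
	}
	return labels
}

func main() {
	cached := []GPUInfo{
		{Index: 0, Model: "GeForce RTX 2080 Ti", MemFreeMiB: 10240, UtilPercent: 37},
	}
	fmt.Println(labelsFor(cached))
}
```

Because this step is pure string handling on cached data, it is consistent with the observation that conversion is cheap compared with the NVML polling itself.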

Initial implementation: cron + NVML for collecting information + zap for real-time logging + k8s client-go for updating the tags
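For the last stage of that pipeline, one possible approach (an assumption here, not necessarily what Omaticaya does) is to send the labels to the API server as a JSON merge patch on the Node object. The sketch below only builds the patch body with the standard library. labelPatch is a hypothetical helper and the label key is a placeholder. With client-go, a body like this would typically be passed to the Nodes().Patch call together with types.MergePatchType.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// labelPatch builds a JSON merge patch body that sets the given labels
// under metadata.labels. Hypothetical helper for illustration.
func labelPatch(labels map[string]string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels": labels,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := labelPatch(map[string]string{"example.com/gpu-count": "2"})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(body))
}
```

A merge patch touches only the listed keys, so labels set by other components on the same node are left alone.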

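Future plans: DCGM + Prometheus + InfluxDB for storing the information, GlusterFS for persistence

The first version implements the original idea. Along the way I also hit a pitfall: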

vgo/Go modules

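I also read a lot about vgo and found that Go's package management has taken quite a few detours. Today vgo (now Go modules) is the Go package manager that Russ Cox personally put on the throne, but is it actually pleasant to use?

The first problem is that the module name must match its source location. That is, if your package is published under github.com/username/packagename, your local module name must also be github.com/username/xxx. For GitHub users this is fairly tolerable, but for users of GitLab and other platforms it is a real pain.

The second problem is version dependencies. When calling k8s.io/client-go, I found that the k8s interfaces did not match. It turned out that the latest tag on the k8s side was kubernetes-1.18.0, while go.mod only accepts version numbers that start with v (v1.0.0 and so on), and entering this tag directly produced an error. Truly painful. In the end I had to run commands by hand to track down the dependencies and how they relate, so work that vgo should have done was left for the programmer to clean up. (client-go also publishes v0.x.y tags that mirror the kubernetes-1.x.y ones, so requiring k8s.io/client-go v0.18.0 selects the same release.)

The third problem is importing local modules. In module mode, imports are not resolved through GOPATH, so a local module also has to be imported by its full path on the hosting platform (a replace directive in go.mod can map that path to a local directory).

As a newcomer to Go, I have never used Go dep or vendoring, which I hear is another, community-driven approach to package management. Whether vgo is any good is a matter of opinion.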

Docker build

After running go build ., we use docker build -t to build the image, which requires writing a Dockerfile:

FROM nvidia/cuda:10.1-base

WORKDIR /

COPY Omaticaya /usr/local/bin

CMD ["Omaticaya"]

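Then run docker push to upload the image so that k8s can use it.

The current version already runs and collects GPU information, but its performance still needs optimization.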

