WaltonWang

Kubernetes Job Controller Source Code Analysis

Author: [email protected], WaltonWang@csdn

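Abstract: For most users, the official documentation is enough to learn and use Jobs. But if you are the obsessive type, you will inevitably wonder how the Job Controller and the Deployment Controller differ in managing Pods, beyond things like RestartPolicy. The real motivation is that I have recently been working on a TensorFlow on Kubernetes project and want to map the worker tasks of distributed TensorFlow onto Jobs, so that resources are reclaimed automatically once training finishes. This post walks through the Job Controller code to analyze its main internal flow.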

Implementation Flowchart

New JobController

type JobController struct {
    kubeClient clientset.Interface
    podControl controller.PodControlInterface

    // To allow injection of updateJobStatus for testing.
    updateHandler func(job *batch.Job) error
    syncHandler   func(jobKey string) (bool, error)
    // podStoreSynced returns true if the pod store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    podStoreSynced cache.InformerSynced
    // jobStoreSynced returns true if the job store has been synced at least once.
    // Added as a member to the struct to allow injection for testing.
    jobStoreSynced cache.InformerSynced

    // A TTLCache of pod creates/deletes each rc expects to see
    expectations controller.ControllerExpectationsInterface

    // A store of jobs
    jobLister batchv1listers.JobLister

    // A store of pods, populated by the podController
    podStore corelisters.PodLister

    // Jobs that need to be updated
    queue workqueue.RateLimitingInterface

    recorder record.EventRecorder
}


func NewJobController(podInformer coreinformers.PodInformer, jobInformer batchinformers.JobInformer, kubeClient clientset.Interface) *JobController {
    eventBroadcaster := record.NewBroadcaster()
    eventBroadcaster.StartLogging(glog.Infof)
    // TODO: remove the wrapper when every clients have moved to use the clientset.
    eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: v1core.New(kubeClient.CoreV1().RESTClient()).Events("")})

    if kubeClient != nil && kubeClient.CoreV1().RESTClient().GetRateLimiter() != nil {
        metrics.RegisterMetricAndTrackRateLimiterUsage("job_controller", kubeClient.CoreV1().RESTClient().GetRateLimiter())
    }

    jm := &JobController{
        kubeClient: kubeClient,
        podControl: controller.RealPodControl{
            KubeClient: kubeClient,
            Recorder:   eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "job-controller"}),
        },
        expectations: controller.NewControllerExpectations(),
        queue:        workqueue.NewNamedRateLimitingQueue(workqueue.NewItemExponentialFailureRateLimiter(DefaultJobBackOff, MaxJobBackOff), "job"),
        recorder:     eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "job-controller"}),
    }

    jobInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    jm.enqueueController,
        UpdateFunc: jm.updateJob,
        DeleteFunc: jm.enqueueController,
    })
    jm.jobLister = jobInformer.Lister()
    jm.jobStoreSynced = jobInformer.Informer().HasSynced

    podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    jm.addPod,
        UpdateFunc: jm.updatePod,
        DeleteFunc: jm.deletePod,
    })
    jm.podStore = podInformer.Lister()
    jm.podStoreSynced = podInformer.Informer().HasSynced

    jm.updateHandler = jm.updateJobStatus
    jm.syncHandler = jm.syncJob

    return jm
}

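Construct the JobController and initialize its data, such as the rate-limited queue;
Watch Pod and Job objects;
Register the add/update/delete event handlers on podInformer;
Register the add/update/delete event handlers on jobInformer;
Set updateHandler to updateJobStatus, which updates the Job status;
Set syncHandler to syncJob, which processes the Jobs in the queue.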
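The registration steps above can be illustrated with a small, standard-library-only model. This is a sketch, not the controller's code: keyQueue, pod, job, controller, onJobEvent and onPodEvent are hypothetical names, and since the bodies of addPod/updatePod/deletePod are not shown in this post, the sketch assumes they resolve a pod to its owning Job and enqueue that Job's namespace/name key.

```go
package main

import "sync"

// keyQueue is a toy stand-in for a work queue: a key that is already
// waiting to be processed is not added a second time.
type keyQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless it is already pending.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest pending key, or false if the queue is empty.
func (q *keyQueue) Get() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many keys are pending.
func (q *keyQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.order)
}

// job and pod are hypothetical, trimmed-down objects for illustration only.
type job struct{ Namespace, Name string }

type pod struct {
	Namespace string
	Name      string
	OwnerJob  string // name of the controlling Job, "" if the pod has none
}

// controller holds only the queue; the real JobController holds much more.
type controller struct{ queue *keyQueue }

// onJobEvent models the job add/update/delete handlers: enqueue the job's key.
func (c *controller) onJobEvent(j job) {
	c.queue.Add(j.Namespace + "/" + j.Name)
}

// onPodEvent models the pod handlers (assumed behavior): enqueue the owning job.
func (c *controller) onPodEvent(p pod) {
	if p.OwnerJob == "" {
		return // pods without a controlling Job are ignored
	}
	c.queue.Add(p.Namespace + "/" + p.OwnerJob)
}
```

The point of the design is that both kinds of events collapse into one kind of work item, the Job key, so a burst of pod events for the same Job results in a single sync.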

JobController Run

// Run the main goroutine responsible for watching and syncing jobs.
func (jm *JobController) Run(workers int, stopCh <-chan struct{}) {
    defer utilruntime.HandleCrash()
    defer jm.queue.ShutDown()

    glog.Infof("Starting job controller")
    defer glog.Infof("Shutting down job controller")

    if !controller.WaitForCacheSync("job", stopCh, jm.podStoreSynced, jm.jobStoreSynced) {
        return
    }

    for i := 0; i < workers; i++ {
        go wait.Until(jm.worker, time.Second, stopCh)
    }

    <-stopCh
}

// worker runs a worker thread that just dequeues items, processes them, and marks them done.
// It enforces that the syncHandler is never invoked concurrently with the same key.
func (jm *JobController) worker() {
    for jm.processNextWorkItem() {
    }
}

func (jm *JobController) processNextWorkItem() bool {
    key, quit := jm.queue.Get()
    if quit {
        return false
    }
    defer jm.queue.Done(key)

    forget, err := jm.syncHandler(key.(string))
    if err == nil {
        if forget {
            jm.queue.Forget(key)
        }
        return true
    }

    utilruntime.HandleError(fmt.Errorf("Error syncing job: %v", err))
    jm.queue.AddRateLimited(key)

    return true
}

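WaitForCacheSync waits for the JobController's pod and job caches to sync;
Start as many goroutines as the workers argument says (5 in the setup described here), each running worker; whenever worker returns, wait.Until waits 1s and runs it again, in a loop;
Each worker gets a job key from the queue and calls syncJob on it. If syncJob returns no error and forget is true, the key is forgotten (that is, the rate limiter stops tracking it); if syncJob returns an error, the key is added back to the queue with rate limiting and waits for the next sync.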
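The forget-or-requeue decision in processNextWorkItem can be modeled as follows. This is a simplified sketch using only the standard library: failureBackoff is a hypothetical stand-in for the queue's per-item exponential failure rate limiter, and the 10s base and 360s cap are assumed values for DefaultJobBackOff and MaxJobBackOff, which the post references but does not define.

```go
package main

import (
	"errors"
	"time"
)

// Assumed values; the post does not show these constants.
const (
	defaultJobBackOff = 10 * time.Second
	maxJobBackOff     = 360 * time.Second
)

// errSyncFailed is a sample error a sync handler might return.
var errSyncFailed = errors.New("sync failed")

// failureBackoff tracks how often each key failed and derives a delay
// of base * 2^failures, capped at max.
type failureBackoff struct {
	base, max time.Duration
	failures  map[string]int
}

func newFailureBackoff(base, max time.Duration) *failureBackoff {
	return &failureBackoff{base: base, max: max, failures: make(map[string]int)}
}

// When returns the delay for the next retry of key and records the failure.
func (b *failureBackoff) When(key string) time.Duration {
	n := b.failures[key]
	b.failures[key] = n + 1
	d := b.base
	for i := 0; i < n; i++ {
		d *= 2
		if d >= b.max {
			return b.max
		}
	}
	if d > b.max {
		return b.max
	}
	return d
}

// Forget resets the failure count, so the next failure starts at base again.
func (b *failureBackoff) Forget(key string) { delete(b.failures, key) }

// NumRequeues reports how many failures are currently recorded for key.
func (b *failureBackoff) NumRequeues(key string) int { return b.failures[key] }

// processOne mirrors the decision in processNextWorkItem: on success the key
// is optionally forgotten; on error it is requeued after a backoff delay.
func processOne(key string, sync func(string) (bool, error), b *failureBackoff) (time.Duration, bool) {
	forget, err := sync(key)
	if err == nil {
		if forget {
			b.Forget(key)
		}
		return 0, false
	}
	return b.When(key), true
}
```

Under these assumed values, repeated failures of the same key are retried after 10s, 20s, 40s and so on, up to the 360s cap. The failure count is also what syncJob later reads through NumRequeues to compare against BackoffLimit.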

syncJob

// syncJob will sync the job with the given key if it has had its expectations fulfilled, meaning
// it did not expect to see any more of its pods created or deleted. This function is not meant to be invoked
// concurrently with the same key.
func (jm *JobController) syncJob(key string) (bool, error) {
    startTime := time.Now()
    defer func() {
        glog.V(4).Infof("Finished syncing job %q (%v)", key, time.Now().Sub(startTime))
    }()

    ns, name, err := cache.SplitMetaNamespaceKey(key)
    if err != nil {
        return false, err
    }
    if len(ns) == 0 || len(name) == 0 {
        return false, fmt.Errorf("invalid job key %q: either namespace or name is missing", key)
    }
    sharedJob, err := jm.jobLister.Jobs(ns).Get(name)
    if err != nil {
        if errors.IsNotFound(err) {
            glog.V(4).Infof("Job has been deleted: %v", key)
            jm.expectations.DeleteExpectations(key)
            return true, nil
        }
        return false, err
    }
    job := *sharedJob

    // if job was finished previously, we don't want to redo the termination
    if IsJobFinished(&job) {
        return true, nil
    }

    // retrieve the previous number of retry
    previousRetry := jm.queue.NumRequeues(key)

    // Check the expectations of the job before counting active pods, otherwise a new pod can sneak in
    // and update the expectations after we've retrieved active pods from the store. If a new pod enters
    // the store after we've checked the expectation, the job sync is just deferred till the next relist.
    jobNeedsSync := jm.expectations.SatisfiedExpectations(key)

    pods, err := jm.getPodsForJob(&job)
    if err != nil {
        return false, err
    }

    activePods := controller.FilterActivePods(pods)
    active := int32(len(activePods))
    succeeded, failed := getStatus(pods)
    conditions := len(job.Status.Conditions)
    // job first start
    if job.Status.StartTime == nil {
        now := metav1.Now()
        job.Status.StartTime = &now
        // enqueue a sync to check if job past ActiveDeadlineSeconds
        if job.Spec.ActiveDeadlineSeconds != nil {
            glog.V(4).Infof("Job %s have ActiveDeadlineSeconds will sync after %d seconds",
                key, *job.Spec.ActiveDeadlineSeconds)
            jm.queue.AddAfter(key, time.Duration(*job.Spec.ActiveDeadlineSeconds)*time.Second)
        }
    }

    var manageJobErr error
    jobFailed := false
    var failureReason string
    var failureMessage string

    jobHaveNewFailure := failed > job.Status.Failed

    // check if the number of failed jobs increased since the last syncJob
    if jobHaveNewFailure && (int32(previousRetry)+1 > *job.Spec.BackoffLimit) {
        jobFailed = true
        failureReason = "BackoffLimitExceeded"
        failureMessage = "Job has reach the specified backoff limit"
    } else if pastActiveDeadline(&job) {
        jobFailed = true
        failureReason = "DeadlineExceeded"
        failureMessage = "Job was active longer than specified deadline"
    }

    if jobFailed {
        errCh := make(chan error, active)
        jm.deleteJobPods(&job, activePods, errCh)
        select {
        case manageJobErr = <-errCh:
            if manageJobErr != nil {
                break
            }
        default:
        }

        // update status values accordingly
        failed += active
        active = 0
        job.Status.Conditions = append(job.Status.Conditions, newCondition(batch.JobFailed, failureReason, failureMessage))
        jm.recorder.Event(&job, v1.EventTypeWarning, failureReason, failureMessage)
    } else {
        if jobNeedsSync && job.DeletionTimestamp == nil {
            active, manageJobErr = jm.manageJob(activePods, succeeded, &job)
        }
        completions := succeeded
        complete := false
        if job.Spec.Completions == nil {
            // This type of job is complete when any pod exits with success.
            // Each pod is capable of
            // determining whether or not the entire Job is done.  Subsequent pods are
            // not expected to fail, but if they do, the failure is ignored.  Once any
            // pod succeeds, the controller waits for remaining pods to finish, and
            // then the job is complete.
            if succeeded > 0 && active == 0 {
                complete = true
            }
        } else {
            // Job specifies a number of completions.  This type of job signals
            // success by having that number of successes.  Since we do not
            // start more pods than there are remaining completions, there should
            // not be any remaining active pods once this count is reached.
            if completions >= *job.Spec.Completions {
                complete = true
                if active > 0 {
                    jm.recorder.Event(&job, v1.EventTypeWarning, "TooManyActivePods", "Too many active pods running after completion count reached")
                }
                if completions > *job.Spec.Completions {
                    jm.recorder.Event(&job, v1.EventTypeWarning, "TooManySucceededPods", "Too many succeeded pods running after completion count reached")
                }
            }
        }
        if complete {
            job.Status.Conditions = append(job.Status.Conditions, newCondition(batch.JobComplete, "", ""))
            now := metav1.Now()
            job.Status.CompletionTime = &now
        }
    }

    forget := false
    // no need to update the job if the status hasn't changed since last time
    if job.Status.Active != active || job.Status.Succeeded != succeeded || job.Status.Failed != failed || len(job.Status.Conditions) != conditions {
        job.Status.Active = active
        job.Status.Succeeded = succeeded
        job.Status.Failed = failed

        if err := jm.updateHandler(&job); err != nil {
            return false, err
        }

        if jobHaveNewFailure && !IsJobFinished(&job) {
            // returning an error will re-enqueue Job after the backoff period
            return false, fmt.Errorf("failed pod(s) detected for job key %q", key)
        }

        forget = true
    }

    return forget, manageJobErr
}

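Look up the Job in the Indexer. If it does not exist, delete its entry from expectations and return true, ending the flow. Otherwise continue.
Check the JobCondition (Complete or Failed) to see whether the Job is finished. If it is, return true and end the flow. Otherwise continue.
Call SatisfiedExpectations. If the pending adds and dels in ControlleeExpectations are both <= 0, or the expectations have not been updated for more than 5 minutes, it returns jobNeedsSync=true, meaning a manageJob pass is needed.
Get all pods managed by the job, filter out activePods, and count the active, succeeded, and failed pods. If failed > job.Status.Failed, the job has newly failed Pods, so jobHaveNewFailure is true.
For jobs starting for the first time (StartTime == nil), set StartTime. If ActiveDeadlineSeconds is not nil, add the job back to the queue after ActiveDeadlineSeconds so it is synced again.
If jobHaveNewFailure is true and the retry count recorded by the queue for this job, plus 1, is greater than job.Spec.BackoffLimit (default 6), the job is marked jobFailed with reason BackoffLimitExceeded. Otherwise, if the time elapsed since the job's StartTime is >= ActiveDeadlineSeconds, the job is marked jobFailed with reason DeadlineExceeded.
If jobFailed, delete all the activePods filtered earlier, concurrently, waiting with a sync.WaitGroup. Then set failed += active, active = 0, and append a Failed condition.
If the job has not failed, jobNeedsSync is true, and the job's DeletionTimestamp is nil (it is not marked for deletion), call manageJob to add or delete the job's pods according to its rules.
If the job has not failed and job.Spec.Completions is nil, this type of job is complete when any pod exits with success. So if succeeded > 0 && active == 0, the job is completed.
If the job has not failed and job.Spec.Completions is not nil, this type of job signals success by having that number of successes. So if succeeded >= job.Spec.Completions, the job is completed.
If the job is completed, append a Complete condition and set CompletionTime.
Finally, if the status changed, invoke updateJobStatus to update the job status in etcd. If the update fails, return false together with the error, and the job is added to the queue again. If jobHaveNewFailure is true and the Job conditions show the Job is not finished, return false together with an error, and the job is requeued after the backoff period.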
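The failure and completion checks in the steps above boil down to a few comparisons, sketched below as a pure function. jobState, observed, verdict and decide are hypothetical names introduced for illustration; the real syncJob also deletes pods, calls manageJob (which can change active), and updates status, all of which this sketch leaves out.

```go
package main

// jobState holds the spec and status fields the decision depends on.
type jobState struct {
	Completions  *int32 // nil means "any success completes the job"
	BackoffLimit int32
	PrevFailed   int32 // job.Status.Failed as recorded by the previous sync
}

// observed holds what the current sync counted.
type observed struct {
	Succeeded, Failed, Active int32
	PreviousRetry             int  // queue.NumRequeues(key)
	PastDeadline              bool // result of pastActiveDeadline(job)
}

type verdict int

const (
	running verdict = iota
	complete
	backoffLimitExceeded
	deadlineExceeded
)

// decide applies the checks in the same order as syncJob: backoff limit first,
// then deadline, then the two completion rules.
func decide(s jobState, o observed) verdict {
	newFailure := o.Failed > s.PrevFailed
	if newFailure && int32(o.PreviousRetry)+1 > s.BackoffLimit {
		return backoffLimitExceeded
	}
	if o.PastDeadline {
		return deadlineExceeded
	}
	if s.Completions == nil {
		// Any success completes the job once no pods are still active.
		if o.Succeeded > 0 && o.Active == 0 {
			return complete
		}
		return running
	}
	// A fixed completion count: done once enough pods have succeeded.
	if o.Succeeded >= *s.Completions {
		return complete
	}
	return running
}
```

With BackoffLimit at 6, a new failure on the sync where the queue already records 6 requeues yields backoffLimitExceeded (6+1 > 6), while the same failure with 5 requeues leaves the job running.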

manageJob

// manageJob is the core method responsible for managing the number of running
// pods according to what is specified in the job.Spec.
// Does NOT modify <activePods>.
func (jm *JobController) manageJob(activePods []*v1.Pod, succeeded int32, job *batch.Job) (int32, error) {
    var activeLock sync.Mutex
    active := int32(len(activePods))
    parallelism := *job.Spec.Parallelism
    jobKey, err := controller.KeyFunc(job)
    if err != nil {
        utilruntime.HandleError(fmt.Errorf("Couldn't get key for job %#v: %v", job, err))
        return 0, nil
    }

    var errCh chan error
    if active > parallelism {
        diff := active - parallelism
        errCh = make(chan error, diff)
        jm.expectations.ExpectDeletions(jobKey, int(diff))
        glog.V(4).Infof("Too many pods running job %q, need %d, deleting %d", jobKey, parallelism, diff)
        // Sort the pods in the order such that not-ready < ready, unscheduled
        // < scheduled, and pending < running. This ensures that we delete pods
        // in the earlier stages whenever possible.
        sort.Sort(controller.ActivePods(activePods))

        active -= diff
        wait := sync.WaitGroup{}
        wait.Add(int(diff))
        for i := int32(0); i < diff; i++ {
            go func(ix int32) {
                defer wait.Done()
                if err := jm.podControl.DeletePod(job.Namespace, activePods[ix].Name, job); err != nil {
                    defer utilruntime.HandleError(err)
                    // Decrement the expected number of deletes because the informer won't observe this deletion
                    glog.V(2).Infof("Failed to delete %v, decrementing expectations for job %q/%q", activePods[ix].Name, job.Namespace, job.Name)
                    jm.expectations.DeletionObserved(jobKey)
                    activeLock.Lock()
                    active++
                    activeLock.Unlock()
                    errCh <- err
                }
            }(i)
        }
        wait.Wait()

    } else if active < parallelism {
        wantActive := int32(0)
        if job.Spec.Completions == nil {
            // Job does not specify a number of completions.  Therefore, number active
            // should be equal to parallelism, unless the job has seen at least
            // once success, in which leave whatever is running, running.
            if succeeded > 0 {
                wantActive = active
            } else {
                wantActive = parallelism
            }
        } else {
            // Job specifies a specific number of completions.  Therefore, number
            // active should not ever exceed number of remaining completions.
            wantActive = *job.Spec.Completions - succeeded
            if wantActive > parallelism {
                wantActive = parallelism
            }
        }
        diff := wantActive - active
        if diff < 0 {
            utilruntime.HandleError(fmt.Errorf("More active than wanted: job %q, want %d, have %d", jobKey, wantActive, active))
            diff = 0
        }
        jm.expectations.ExpectCreations(jobKey, int(diff))
        errCh = make(chan error, diff)
        glog.V(4).Infof("Too few pods running job %q, need %d, creating %d", jobKey, wantActive, diff)

        active += diff
        wait := sync.WaitGroup{}

        // Batch the pod creates. Batch sizes start at SlowStartInitialBatchSize
        // and double with each successful iteration in a kind of "slow start".
        // This handles attempts to start large numbers of pods that would
        // likely all fail with the same error. For example a project with a
        // low quota that attempts to create a large number of pods will be
        // prevented from spamming the API service with the pod create requests
        // after one of its pods fails.  Conveniently, this also prevents the
        // event spam that those failures would generate.
        for batchSize := int32(integer.IntMin(int(diff), controller.SlowStartInitialBatchSize)); diff > 0; batchSize = integer.Int32Min(2*batchSize, diff) {
            errorCount := len(errCh)
            wait.Add(int(batchSize))
            for i := int32(0); i < batchSize; i++ {
                go func() {
                    defer wait.Done()
                    err := jm.podControl.CreatePodsWithControllerRef(job.Namespace, &job.Spec.Template, job, metav1.NewControllerRef(job, controllerKind))
                    if err != nil && errors.IsTimeout(err) {
                        // Pod is created but its initialization has timed out.
                        // If the initialization is successful eventually, the
                        // controller will observe the creation via the informer.
                        // If the initialization fails, or if the pod keeps
                        // uninitialized for a long time, the informer will not
                        // receive any update, and the controller will create a new
                        // pod when the expectation expires.
                        return
                    }
                    if err != nil {
                        defer utilruntime.HandleError(err)
                        // Decrement the expected number of creates because the informer won't observe this pod
                        glog.V(2).Infof("Failed creation, decrementing expectations for job %q/%q", job.Namespace, job.Name)
                        jm.expectations.CreationObserved(jobKey)
                        activeLock.Lock()
                        active--
                        activeLock.Unlock()
                        errCh <- err
                    }
                }()
            }
            wait.Wait()
            // any skipped pods that we never attempted to start shouldn't be expected.
            skippedPods := diff - batchSize
            if errorCount < len(errCh) && skippedPods > 0 {
                glog.V(2).Infof("Slow-start failure. Skipping creation of %d pods, decrementing expectations for job %q/%q", skippedPods, job.Namespace, job.Name)
                active -= skippedPods
                for i := int32(0); i < skippedPods; i++ {
                    // Decrement the expected number of creates because the informer won't observe this pod
                    jm.expectations.CreationObserved(jobKey)
                }
                // The skipped pods will be retried later. The next controller resync will
                // retry the slow start process.
                break
            }
            diff -= batchSize
        }
    }

    select {
    case err := <-errCh:
        // all errors have been reported before, we only need to inform the controller that there was an error and it should re-try this job once more next time.
        if err != nil {
            return active, err
        }
    default:
    }

    return active, nil
}

If active > job.Spec.Parallelism, scale down:
- Compute diff, the difference between active and parallelism, and set the job's dels in ControllerExpectations to diff, meaning diff pods are to be deleted.
- Sort the Pods in activePods so that not-ready < ready, unscheduled < scheduled, pending < running, which ensures that pods in earlier stages are deleted first.
- Update active (subtract diff), then delete those Pods concurrently, waiting with a sync.WaitGroup. If deleting a Pod fails, add 1 back to active and subtract 1 from dels in expectations.
- Return active.
If active < job.Spec.Parallelism, scale up:
- If job.Spec.Completions is nil and succeeded > 0, diff is 0; if job.Spec.Completions is nil and succeeded == 0, diff is parallelism - active; if job.Spec.Completions is not nil, diff is min(job.Spec.Completions - succeeded, parallelism) - active;
- Set the job's adds in ControllerExpectations to diff, meaning diff pods are to be created.
- Update active (add diff), then create the Pods in batches, waiting on each batch with a sync.WaitGroup. The first batch creates 1 pod (SlowStartInitialBatchSize = 1 is hard-coded), the second 2, then 4, 8, 16 and so on, never exceeding the remaining diff. After each batch, diff is reduced by the batch size. If any creation in a batch fails, active and the adds in expectations are adjusted, and the remaining batches that were never started are skipped.
If active == job.Spec.Parallelism, return active.
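The scale-up arithmetic can be checked in isolation with the sketch below. podsToCreate and slowStartBatches are hypothetical helpers that only compute numbers; they create no pods and touch no expectations, and they assume the happy path in which no batch fails.

```go
package main

// min32 returns the smaller of two int32 values.
func min32(a, b int32) int32 {
	if a < b {
		return a
	}
	return b
}

// podsToCreate computes diff for the scale-up branch (active < parallelism).
func podsToCreate(parallelism int32, completions *int32, succeeded, active int32) int32 {
	var wantActive int32
	if completions == nil {
		if succeeded > 0 {
			wantActive = active // leave whatever is running, running
		} else {
			wantActive = parallelism
		}
	} else {
		// Never run more pods than there are completions left.
		wantActive = min32(*completions-succeeded, parallelism)
	}
	diff := wantActive - active
	if diff < 0 {
		diff = 0
	}
	return diff
}

// slowStartBatches lists the batch sizes used to create diff pods: start at
// initial, double after every batch, and never exceed what is left.
func slowStartBatches(diff, initial int32) []int32 {
	var batches []int32
	for batch := min32(diff, initial); diff > 0; batch = min32(2*batch, diff) {
		batches = append(batches, batch)
		diff -= batch
	}
	return batches
}
```

For example, creating 10 pods with an initial batch size of 1 proceeds in batches of 1, 2, 4 and 3. A job with parallelism 3, completions 5 and 4 successes already recorded needs only 1 more pod.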

Summary

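For how Jobs work and how to configure them, read the official documentation, Jobs - Run to Completion, which explains the use of .spec.completions, .spec.parallelism, and .spec.activeDeadlineSeconds. It does not, however, make clear how things actually work internally, and that is what this post tries to do.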
