[kubeflow] controller-runtime Source Code Walkthrough

[TODO] Restructure this article's narrative around the official controller-runtime documentation.

In the previous article, [kubeflow] 从零搭建training-operator项目, we built a bare-bones training-operator project from scratch; the only piece left unfinished was the controller's Reconcile logic. This time we take TFJob's Reconcile function as the entry point and investigate how training-operator actually works. Before doing that, we need to understand how controller-runtime itself works.

controller-runtime Source Code Analysis

controller-runtime is a controller framework maintained by the community. Together with tooling such as kubebuilder, it lets developers focus solely on implementing the Reconcile function, which is very convenient. The figures below do not depict controller-runtime itself, but they come close: a Worker corresponds to a reconciler, which consumes reconcile.Requests popped from the work queue, and "Readonly" refers to the listers such as podLister and serviceLister.

[Figure 1]

[Figure 2]
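
To make that division of labor concrete, here is a minimal sketch of what a controller-runtime user typically writes: only the Reconcile body is custom, while the cache, informers, work queue, and workers are all handled by the framework. The PodReconciler name and the Pod target are illustrative assumptions, not anything from training-operator.

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// PodReconciler is a hypothetical reconciler, used only for illustration.
type PodReconciler struct {
	client.Client
}

// Reconcile is the only piece of logic the user must supply.
func (r *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var pod corev1.Pod
	if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
		// The object may already be gone; that is not an error for us.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// ... compare the pod's actual state with the desired state and fix drift ...
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	// The builder wires up Watch/Start for us under the hood.
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Pod{}).
		Complete(&PodReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}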

Controller

The controller-runtime version discussed below is v0.15.0; other versions may differ slightly. After downloading training-operator, run go mod tidy to fetch the dependencies. Open the project in VS Code, find the controller-related code, and Ctrl+click to jump into the source. My analysis of controller-runtime below largely follows the article operator:controller-runtime 原理之控制器, so the overall flow is much the same.

Let's start with the definition of the Controller interface, in controller-runtime@v0.15.0/pkg/controller/controller.go. The core is these four methods. Note that reconcile.Reconciler is itself an interface containing a single Reconcile() definition, which is what the user implements.

// Controller implements a Kubernetes API.  A Controller manages a work queue fed reconcile.Requests
// from source.Sources.  Work is performed through the reconcile.Reconciler for each enqueued item.
// Work typically is reads and writes Kubernetes objects to make the system state match the state specified
// in the object Spec.
type Controller interface {
	// Reconciler is called to reconcile an object by Namespace/Name
	reconcile.Reconciler

	// Watch takes events provided by a Source and uses the EventHandler to
	// enqueue reconcile.Requests in response to the events.
	//
	// Watch may be provided one or more Predicates to filter events before
	// they are given to the EventHandler.  Events will be passed to the
	// EventHandler if all provided Predicates evaluate to true.
	Watch(src source.Source, eventhandler handler.EventHandler, predicates ...predicate.Predicate) error

	// Start starts the controller.  Start blocks until the context is closed or a
	// controller has an error starting.
	Start(ctx context.Context) error

	// GetLogger returns this controller logger prefilled with basic information.
	GetLogger() logr.Logger
}

Now the concrete implementation, in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. MakeQueue initializes the rate-limited work queue Queue. The Do field is the reconciler, which will eventually run the user's Reconcile code. mu is a mutex that synchronizes the controller's setup, so Watch and Start cannot race. Started records whether the controller is running. startWatches stores all the watchDescription objects; each watchDescription bundles a src, a handler, and predicates.

// Controller implements controller.Controller.
type Controller struct {
	// Name is used to uniquely identify a Controller in tracing, logging and monitoring.  Name is required.
	Name string

	// MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.
	MaxConcurrentReconciles int

	// Reconciler is a function that can be called at any time with the Name / Namespace of an object and
	// ensures that the state of the system matches the state specified in the object.
	// Defaults to the DefaultReconcileFunc.
	Do reconcile.Reconciler

	// MakeQueue constructs the queue for this controller once the controller is ready to start.
	// This exists because the standard Kubernetes workqueues start themselves immediately, which
	// leads to goroutine leaks if something calls controller.New repeatedly.
	MakeQueue func() workqueue.RateLimitingInterface

	// Queue is an listeningQueue that listens for events from Informers and adds object keys to
	// the Queue for processing
	Queue workqueue.RateLimitingInterface

	// mu is used to synchronize Controller setup
	mu sync.Mutex

	// Started is true if the Controller has been Started
	Started bool

	// ctx is the context that was passed to Start() and used when starting watches.
	//
	// According to the docs, contexts should not be stored in a struct: https://golang.org/pkg/context,
	// while we usually always strive to follow best practices, we consider this a legacy case and it should
	// undergo a major refactoring and redesign to allow for context to not be stored in a struct.
	ctx context.Context

	// CacheSyncTimeout refers to the time limit set on waiting for cache to sync
	// Defaults to 2 minutes if not set.
	CacheSyncTimeout time.Duration

	// startWatches maintains a list of sources, handlers, and predicates to start when the controller is started.
	startWatches []watchDescription

	// LogConstructor is used to construct a logger to then log messages to users during reconciliation,
	// or for example when a watch is started.
	// Note: LogConstructor has to be able to handle nil requests as we are also using it
	// outside the context of a reconciliation.
	LogConstructor func(request *reconcile.Request) logr.Logger

	// RecoverPanic indicates whether the panic caused by reconcile should be recovered.
	RecoverPanic *bool

	// LeaderElected indicates whether the controller is leader elected or always running.
	LeaderElected *bool
}

// watchDescription contains all the information necessary to start a watch.
type watchDescription struct {
	src        source.Source
	handler    handler.EventHandler
	predicates []predicate.Predicate
}

Controller.Watch

Now look at the Watch implementation, in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. If the controller has not started yet, the watch is merely recorded in startWatches; otherwise Watch delegates to Source.Start. What is src? It is the object we want to observe: we want to be notified of create/update/delete events on src and have the eventHandler react accordingly. Note that the work queue Queue is not yet initialized when Watch runs, because src.Start only uses the Cache to initialize an informer and register event handlers, as we will see shortly.

// Watch implements controller.Controller.
func (c *Controller) Watch(src source.Source, evthdler handler.EventHandler, prct ...predicate.Predicate) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Controller hasn't started yet, store the watches locally and return.
	//
	// These watches are going to be held on the controller struct until the manager or user calls Start(...).
	if !c.Started {
		c.startWatches = append(c.startWatches, watchDescription{src: src, handler: evthdler, predicates: prct})
		return nil
	}

	c.LogConstructor(nil).Info("Starting EventSource", "source", src)
	return src.Start(c.ctx, evthdler, c.Queue, prct...)
}

Source is also an interface; its definition is in controller-runtime@v0.15.0/pkg/source/source.go, and it contains a single Start function.

// Source is a source of events (e.g. Create, Update, Delete operations on Kubernetes Objects, Webhook callbacks, etc)
// which should be processed by event.EventHandlers to enqueue reconcile.Requests.
//
// * Use Kind for events originating in the cluster (e.g. Pod Create, Pod Update, Deployment Update).
//
// * Use Channel for events originating outside the cluster (e.g. GitHub Webhook callback, Polling external urls).
//
// Users may build their own Source implementations.
type Source interface {
	// Start is internal and should be called only by the Controller to register an EventHandler with the Informer
	// to enqueue reconcile.Requests.
	Start(context.Context, handler.EventHandler, workqueue.RateLimitingInterface, ...predicate.Predicate) error
}
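
As the comment says, Kind serves events originating inside the cluster, while Channel serves external ones. As a quick sketch of the latter (assuming the v0.15 API; the events channel and the controller variable c are made up), an external event can be fed into the same machinery like this:

// Assumed imports: controller-runtime's pkg/source, pkg/handler, pkg/event.
// Events pushed into this channel flow through the same EventHandler path
// and end up as reconcile.Requests on the controller's work queue.
events := make(chan event.GenericEvent)

// c is a controller.Controller that has already been created.
if err := c.Watch(
	&source.Channel{Source: events},
	&handler.EnqueueRequestForObject{},
); err != nil {
	return err
}

// Somewhere else, e.g. in a webhook callback:
// events <- event.GenericEvent{Object: someClientObject}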

Next, let's see how training-operator calls Watch, in pkg/controller.v1/tensorflow/tfjob_controller.go. Note that:

  • source.Kind(...) constructs a Kind struct as the argument; Kind is an implementation of the Source interface. kubeflowv1.TFJob{} is the resource we care about, and Kind wraps it.
  • handler.EnqueueRequestForObject{} is a concrete implementation of the EventHandler interface, with Create, Delete, Update, and other methods; more on it later.
  • predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()} is a user-supplied predicate that decides whether an event is worth enqueuing at all; also covered later.
	// using onOwnerCreateFunc is easier to set defaults
	if err = c.Watch(source.Kind(mgr.GetCache(), &kubeflowv1.TFJob{}), &handler.EnqueueRequestForObject{},
		predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()},
	); err != nil {
		return err
	}

Now the Kind implementation, in controller-runtime@v0.15.0/pkg/internal/source/kind.go. Type is the concrete type we want to watch, and Kind wraps it, adding a Cache (supplied by the manager) that can hand out informers. After all, Kind has to implement a function as important as Start, so it naturally needs the right tooling.

// Kind is used to provide a source of events originating inside the cluster from Watches (e.g. Pod Create).
type Kind struct {
	// Type is the type of object to watch.  e.g. &v1.Pod{}
	Type client.Object

	// Cache used to watch APIs
	Cache cache.Cache

	// started may contain an error if one was encountered during startup. If its closed and does not
	// contain an error, startup and syncing finished.
	started     chan error
	startCancel func()
}

Putting the pieces together, Controller.Watch ultimately calls Kind.Start, whose implementation is below. The core is calling i.AddEventHandler to watch for changes to the resource and dispatch them to the handler. Anyone who has used informers will recognize AddEventHandler; note that the informer's add/update/delete callbacks fire after the fact, i.e. by the time they run, the resource has already been created, updated, or deleted. i is an informer, obtained via Kind.Cache.GetInformer.

// Start is internal and should be called only by the Controller to register an EventHandler with the Informer
// to enqueue reconcile.Requests.
func (ks *Kind) Start(ctx context.Context, handler handler.EventHandler, queue workqueue.RateLimitingInterface,
	// ...

	// cache.GetInformer will block until its context is cancelled if the cache was already started and it can not
	// sync that informer (most commonly due to RBAC issues).
	ctx, ks.startCancel = context.WithCancel(ctx)
	ks.started = make(chan error)
	go func() {
		var (
			i       cache.Informer
			lastErr error
		)

		// ...
		i, lastErr = ks.Cache.GetInformer(ctx, ks.Type)
		// ...

		_, err := i.AddEventHandler(NewEventHandler(ctx, queue, handler, prct).HandlerFuncs())
		// ...
	}()

	return nil
}
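
The same two steps can be reproduced by hand against the manager's cache. A sketch under assumptions (the Pod type and handler body are illustrative; toolscache is "k8s.io/client-go/tools/cache"); note that in v0.15 AddEventHandler returns a registration handle plus an error, which is why the source above discards the first return value:

// Obtain (and lazily create) the shared informer for Pods from the cache.
inf, err := mgr.GetCache().GetInformer(ctx, &corev1.Pod{})
if err != nil {
	return err
}
// Register plain client-go style callbacks; they fire after the fact.
if _, err := inf.AddEventHandler(toolscache.ResourceEventHandlerFuncs{
	AddFunc: func(obj interface{}) {
		// the object already exists in the cluster at this point
	},
}); err != nil {
	return err
}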

NewEventHandler constructs an EventHandler whose members are an event-processing handler, the rate-limited work queue, and a set of predicates. Careful: this EventHandler is a struct, while its handler.EventHandler member is an interface; despite the identical names they are different things, so don't mix them up. handler is the handler.EnqueueRequestForObject{} mentioned earlier, and predicates is the predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()} mentioned earlier; both are supplied by the user. OnAdd first runs the predicates, and only if all of them pass does it call the handler's Create function, which pushes a reconcile.Request onto the queue. OnUpdate and OnDelete follow the same logic.

// EventHandler adapts a handler.EventHandler interface to a cache.ResourceEventHandler interface.
type EventHandler struct {
	// ctx stores the context that created the event handler
	// that is used to propagate cancellation signals to each handler function.
	ctx context.Context

	handler    handler.EventHandler
	queue      workqueue.RateLimitingInterface
	predicates []predicate.Predicate
}

// HandlerFuncs converts EventHandler to a ResourceEventHandlerFuncs
// TODO: switch to ResourceEventHandlerDetailedFuncs with client-go 1.27
func (e *EventHandler) HandlerFuncs() cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc:    e.OnAdd,
		UpdateFunc: e.OnUpdate,
		DeleteFunc: e.OnDelete,
	}
}

// OnAdd creates CreateEvent and calls Create on EventHandler.
func (e *EventHandler) OnAdd(obj interface{}) {
	c := event.CreateEvent{}

	// ...

	for _, p := range e.predicates {
		if !p.Create(c) {
			return
		}
	}

	// Invoke create handler
	ctx, cancel := context.WithCancel(e.ctx)
	defer cancel()
	e.handler.Create(ctx, c, e.queue)
}

Now see how EnqueueRequestForObject implements Create, in controller-runtime@v0.15.0/pkg/handler/enqueue.go. The logic is as simple as the name suggests: take the object's own namespace and name, wrap them in a reconcile.Request, and push it onto the work queue. Its counterpart, enqueueRequestForOwner, instead enqueues the namespace and name of the object's owner. Watches on pods and services need enqueueRequestForOwner, because pods and services carry an ownerReference field that points at their owner (the TFJob); the watch on TFJob itself uses EnqueueRequestForObject. That is exactly how tfjob_controller.go uses them.

// EnqueueRequestForObject enqueues a Request containing the Name and Namespace of the object that is the source of the Event.
// (e.g. the created / deleted / updated objects Name and Namespace).  handler.EnqueueRequestForObject is used by almost all
// Controllers that have associated Resources (e.g. CRDs) to reconcile the associated Resource.
type EnqueueRequestForObject struct{}

// Create implements EventHandler.
func (e *EnqueueRequestForObject) Create(ctx context.Context, evt event.CreateEvent, q workqueue.RateLimitingInterface) {
	if evt.Object == nil {
		enqueueLog.Error(nil, "CreateEvent received with no metadata", "event", evt)
		return
	}
	q.Add(reconcile.Request{NamespacedName: types.NamespacedName{
		Name:      evt.Object.GetName(),
		Namespace: evt.Object.GetNamespace(),
	}})
}
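
For completeness, here is a sketch of the owner-based variant. In v0.15, handler.EnqueueRequestForOwner is a constructor function; the call below mirrors the pattern training-operator uses for its Pod and Service watches, but the exact arguments are my assumptions, not a copy of its code:

// Watch Pods, but enqueue the owning TFJob's namespace/name instead of the Pod's.
if err = c.Watch(
	source.Kind(mgr.GetCache(), &corev1.Pod{}),
	handler.EnqueueRequestForOwner(
		mgr.GetScheme(), mgr.GetRESTMapper(),
		&kubeflowv1.TFJob{},
		handler.OnlyControllerOwner(), // only follow the controller ownerReference
	),
); err != nil {
	return err
}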

Now look at the predicate training-operator supplies, predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()}. The logic is straightforward: the predicate always returns true, but when the object is a TFJob it additionally applies scheme defaults and updates the job's conditions. Since the informer only notifies after the resource has already been created, the status can immediately be marked JobCreated.

// onOwnerCreateFunc modify creation condition.
func (r *TFJobReconciler) onOwnerCreateFunc() func(event.CreateEvent) bool {
	return func(e event.CreateEvent) bool {
		tfJob, ok := e.Object.(*kubeflowv1.TFJob)
		if !ok {
			return true
		}

		r.Scheme.Default(tfJob)
		msg := fmt.Sprintf("TFJob %s is created.", e.Object.GetName())
		logrus.Info(msg)
		trainingoperatorcommon.CreatedJobsCounterInc(tfJob.Namespace, r.GetFrameworkName())
		commonutil.UpdateJobConditions(&tfJob.Status, kubeflowv1.JobCreated, corev1.ConditionTrue, commonutil.NewReason(kubeflowv1.TFJobKind, commonutil.JobCreatedReason), msg)
		return true
	}
}
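
Predicates are the natural place for cheap event filtering. A common example (a sketch, not from training-operator): drop Update events in which the spec did not change, by comparing metadata.generation, which the API server bumps only on spec changes:

// Only let Update events through when the spec (generation) changed.
specChanged := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
	},
}
// controller-runtime ships an equivalent built-in: predicate.GenerationChangedPredicate{}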

Controller.Start

That wraps up Watch: in one sentence, it registers informers. Now for Start, located in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. It first takes the lock to synchronize setup and rejects a second Start of the same controller. Queue is initialized via MakeQueue (it was not initialized during Watch). Then src.Start is invoked for every entry in startWatches; for watches registered before the controller started this is the first and only call, since Controller.Watch only calls src.Start itself when the controller is already running. Finally, up to MaxConcurrentReconciles goroutines run processNextWorkItem concurrently, consuming reconcile.Requests from the Queue.

// Start implements controller.Controller.
func (c *Controller) Start(ctx context.Context) error {
	// use an IIFE to get proper lock handling
	// but lock outside to get proper handling of the queue shutdown
	c.mu.Lock()
	if c.Started {
		return errors.New("controller was started more than once. This is likely to be caused by being added to a manager multiple times")
	}

	c.initMetrics()

	// Set the internal context.
	c.ctx = ctx

	c.Queue = c.MakeQueue()
	go func() {
		<-ctx.Done()
		c.Queue.ShutDown()
	}()

	wg := &sync.WaitGroup{}
	err := func() error {
		defer c.mu.Unlock()

		// TODO(pwittrock): Reconsider HandleCrash
		defer utilruntime.HandleCrash()

		// NB(directxman12): launch the sources *before* trying to wait for the
		// caches to sync so that they have a chance to register their intended
		// caches.
		for _, watch := range c.startWatches {
			c.LogConstructor(nil).Info("Starting EventSource", "source", fmt.Sprintf("%s", watch.src))

			if err := watch.src.Start(ctx, watch.handler, c.Queue, watch.predicates...); err != nil {
				return err
			}
		}

		// Start the SharedIndexInformer factories to begin populating the SharedIndexInformer caches
		c.LogConstructor(nil).Info("Starting Controller")

		for _, watch := range c.startWatches {
			syncingSource, ok := watch.src.(source.SyncingSource)
			if !ok {
				continue
			}

			if err := func() error {
				// use a context with timeout for launching sources and syncing caches.
				sourceStartCtx, cancel := context.WithTimeout(ctx, c.CacheSyncTimeout)
				defer cancel()

				// WaitForSync waits for a definitive timeout, and returns if there
				// is an error or a timeout
				if err := syncingSource.WaitForSync(sourceStartCtx); err != nil {
					err := fmt.Errorf("failed to wait for %s caches to sync: %w", c.Name, err)
					c.LogConstructor(nil).Error(err, "Could not wait for Cache to sync")
					return err
				}

				return nil
			}(); err != nil {
				return err
			}
		}

		// All the watches have been started, we can reset the local slice.
		//
		// We should never hold watches more than necessary, each watch source can hold a backing cache,
		// which won't be garbage collected if we hold a reference to it.
		c.startWatches = nil

		// Launch workers to process resources
		c.LogConstructor(nil).Info("Starting workers", "worker count", c.MaxConcurrentReconciles)
		wg.Add(c.MaxConcurrentReconciles)
		for i := 0; i < c.MaxConcurrentReconciles; i++ {
			go func() {
				defer wg.Done()
				// Run a worker thread that just dequeues items, processes them, and marks them done.
				// It enforces that the reconcileHandler is never invoked concurrently with the same object.
				for c.processNextWorkItem(ctx) {
				}
			}()
		}

		c.Started = true
		return nil
	}()
	if err != nil {
		return err
	}

	<-ctx.Done()
	c.LogConstructor(nil).Info("Shutdown signal received, waiting for all workers to finish")
	wg.Wait()
	c.LogConstructor(nil).Info("All workers finished")
	return nil
}
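
The worker count is usually set when building the controller: MaxConcurrentReconciles goes into controller.Options. A minimal sketch (the value 4 is arbitrary, and r is assumed to be your reconciler):

// import "sigs.k8s.io/controller-runtime/pkg/controller"

// Run up to 4 reconcile workers concurrently; the workqueue still guarantees
// that the same key is never processed by two workers at once.
err := ctrl.NewControllerManagedBy(mgr).
	For(&kubeflowv1.TFJob{}).
	WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
	Complete(r)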

The code of processNextWorkItem is below; it essentially delegates to reconcileHandler.

// processNextWorkItem will read a single work item off the workqueue and
// attempt to process it, by calling the reconcileHandler.
func (c *Controller) processNextWorkItem(ctx context.Context) bool {
	obj, shutdown := c.Queue.Get()
	if shutdown {
		// Stop working
		return false
	}

	// We call Done here so the workqueue knows we have finished
	// processing this item. We also must remember to call Forget if we
	// do not want this work item being re-queued. For example, we do
	// not call Forget if a transient error occurs, instead the item is
	// put back on the workqueue and attempted again after a back-off
	// period.
	defer c.Queue.Done(obj)

	ctrlmetrics.ActiveWorkers.WithLabelValues(c.Name).Add(1)
	defer ctrlmetrics.ActiveWorkers.WithLabelValues(c.Name).Add(-1)

	c.reconcileHandler(ctx, obj)
	return true
}

The code of reconcileHandler is below; it in turn calls the Reconcile function.

func (c *Controller) reconcileHandler(ctx context.Context, obj interface{}) {
	// Update metrics after processing each item
	reconcileStartTS := time.Now()
	defer func() {
		c.updateMetrics(time.Since(reconcileStartTS))
	}()

	// Make sure that the object is a valid request.
	req, ok := obj.(reconcile.Request)
	// ...

	log := c.LogConstructor(&req)
	reconcileID := uuid.NewUUID()

	log = log.WithValues("reconcileID", reconcileID)
	ctx = logf.IntoContext(ctx, log)
	ctx = addReconcileID(ctx, reconcileID)

	// RunInformersAndControllers the syncHandler, passing it the Namespace/Name string of the
	// resource to be synced.
	result, err := c.Reconcile(ctx, req)
	// ...
}
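
The elided part after c.Reconcile decides what happens to the request based on (result, err). Treat the following as a rough paraphrase of the v0.15 behavior rather than the verbatim source (metrics calls omitted):

// Sketch of the elided branches in reconcileHandler:
switch {
case err != nil:
	c.Queue.AddRateLimited(req) // transient failure: retry with backoff
case result.RequeueAfter > 0:
	// Forget resets the rate limiter for this key, then requeue after the delay.
	c.Queue.Forget(obj)
	c.Queue.AddAfter(req, result.RequeueAfter)
case result.Requeue:
	c.Queue.AddRateLimited(req)
default:
	c.Queue.Forget(obj) // success: drop the key from the rate limiter
}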

Finally, c.Do.Reconcile(ctx, req) is the user-written Reconcile function.

// Reconcile implements reconcile.Reconciler.
func (c *Controller) Reconcile(ctx context.Context, req reconcile.Request) (_ reconcile.Result, err error) {
	defer func() {
		if r := recover(); r != nil {
			if c.RecoverPanic != nil && *c.RecoverPanic {
				for _, fn := range utilruntime.PanicHandlers {
					fn(r)
				}
				err = fmt.Errorf("panic: %v [recovered]", r)
				return
			}

			log := logf.FromContext(ctx)
			log.Info(fmt.Sprintf("Observed a panic in reconciler: %v", r))
			panic(r)
		}
	}()
	return c.Do.Reconcile(ctx, req)
}
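
From the user's side, the returned (Result, error) pair is the contract that drives the queue behavior sketched above. A schematic example (not training-operator's actual body):

func (r *TFJobReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// return ctrl.Result{}, err            -> retried with rate-limited backoff
	// return ctrl.Result{Requeue: true}    -> retried with rate-limited backoff
	// return ctrl.Result{RequeueAfter: t}  -> re-enqueued after duration t
	// return ctrl.Result{}, nil            -> done; the rate limiter forgets the key
	return ctrl.Result{}, nil
}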
