(This article is based on Ansible 2.7.)
The forks option is Ansible's built-in way to run tasks in parallel. Its default can be set in the configuration file, overridden with a command-line option when running ansible, or assigned programmatically when calling the Ansible API.
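For the API case, the value is usually carried by the options object handed to TaskQueueManager. Below is a minimal sketch in the style of the Ansible 2.x Python API; the field list and the setup of inventory, variable_manager and loader are abbreviated, so treat the names here as illustrative rather than a complete program.
# In ansible.cfg the same default would be set as "forks = 10" under [defaults],
# and on the command line as "ansible-playbook -f 10 site.yml".
from collections import namedtuple

Options = namedtuple('Options', ['connection', 'module_path', 'forks',
                                 'become', 'become_method', 'become_user',
                                 'check', 'diff'])
options = Options(connection='smart', module_path=None, forks=10,
                  become=None, become_method=None, become_user=None,
                  check=False, diff=False)

# tqm = TaskQueueManager(inventory=inventory, variable_manager=variable_manager,
#                        loader=loader, options=options, passwords=dict())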
The forks option is received and processed in lib/ansible/cli/__init__.py, lines 442-444:
if fork_opts:
    parser.add_option('-f', '--forks', dest='forks', default=C.DEFAULT_FORKS, type='int',
                      help="specify number of parallel processes to use (default=%s)" % C.DEFAULT_FORKS)
Lines 389-391 show that the value must not be less than 1:
if fork_opts:
    if op.forks < 1:
        self.parser.error("The number of processes (--forks) must be >= 1")
The help text describes the option as the "number of parallel processes to use", with a default of C.DEFAULT_FORKS. We can pass the -f option when launching ansible to override that default.
So how does Ansible use this value at runtime? We know that Ansible runs tasks through the TaskQueueManager class (see "Ansible Source Code Analysis: Ansible's Execution Process"), so creating the worker processes should also be TaskQueueManager's job. Let's look at its run method:
lib/ansible/executor/task_queue_manager.py, lines 220-299:
def run(self, play):
    '''
    Iterates over the roles/tasks in a play, using the given (or default)
    strategy for queueing tasks. The default is the linear strategy, which
    operates like classic Ansible by keeping all hosts in lock-step with
    a given task (meaning no hosts move on to the next task until all hosts
    are done with the current task).
    '''

    if not self._callbacks_loaded:
        self.load_callbacks()

    all_vars = self._variable_manager.get_vars(play=play)
    warn_if_reserved(all_vars)
    templar = Templar(loader=self._loader, variables=all_vars)

    new_play = play.copy()
    new_play.post_validate(templar)
    new_play.handlers = new_play.compile_roles_handlers() + new_play.handlers

    self.hostvars = HostVars(
        inventory=self._inventory,
        variable_manager=self._variable_manager,
        loader=self._loader,
    )

    play_context = PlayContext(new_play, self._options, self.passwords, self._connection_lockfile.fileno())
    for callback_plugin in self._callback_plugins:
        if hasattr(callback_plugin, 'set_play_context'):
            callback_plugin.set_play_context(play_context)

    self.send_callback('v2_playbook_on_play_start', new_play)

    # initialize the shared dictionary containing the notified handlers
    self._initialize_notified_handlers(new_play)

    # build the iterator
    iterator = PlayIterator(
        inventory=self._inventory,
        play=new_play,
        play_context=play_context,
        variable_manager=self._variable_manager,
        all_vars=all_vars,
        start_at_done=self._start_at_done,
    )

    # adjust to # of workers to configured forks or size of batch, whatever is lower
    self._initialize_processes(min(self._options.forks, iterator.batch_size))

    # load the specified strategy (or the default linear one)
    strategy = strategy_loader.get(new_play.strategy, self)
    if strategy is None:
        raise AnsibleError("Invalid play strategy specified: %s" % new_play.strategy, obj=play._ds)

    # Because the TQM may survive multiple play runs, we start by marking
    # any hosts as failed in the iterator here which may have been marked
    # as failed in previous runs. Then we clear the internal list of failed
    # hosts so we know what failed this round.
    for host_name in self._failed_hosts.keys():
        host = self._inventory.get_host(host_name)
        iterator.mark_host_failed(host)

    self.clear_failed_hosts()

    # during initialization, the PlayContext will clear the start_at_task
    # field to signal that a matching task was found, so check that here
    # and remember it so we don't try to skip tasks on future plays
    if getattr(self._options, 'start_at_task', None) is not None and play_context.start_at_task is None:
        self._start_at_done = True

    # and run the play using the strategy and cleanup on way out
    play_return = strategy.run(iterator, play_context)

    # now re-save the hosts that failed from the iterator to our internal list
    for host_name in iterator.get_failed_hosts():
        self._failed_hosts[host_name] = True

    strategy.cleanup()
    self._cleanup_processes()
    return play_return
Lines 266-267 of this method:
# adjust to # of workers to configured forks or size of batch, whatever is lower
self._initialize_processes(min(self._options.forks, iterator.batch_size))
show that the number of worker processes created is the smaller of the batch size (the number of hosts targeted in the current batch) and the forks value given in the options.
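As a concrete illustration (the numbers here are made up): with the default forks value of 5 and a play whose current batch contains only three hosts, just three worker slots are prepared.
forks = 5          # from -f / C.DEFAULT_FORKS
batch_size = 3     # hosts in the current serial batch
num_workers = min(forks, batch_size)   # -> 3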
The _initialize_processes method itself, however, merely builds a list of empty worker slots (lines 113-117):
def _initialize_processes(self, num):
    self._workers = []

    for i in range(num):
        self._workers.append(None)
It does not create any processes at this point.
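Continuing the illustrative numbers above, _initialize_processes(3) leaves the TaskQueueManager with nothing but three empty placeholders:
# equivalent result of the loop above; no process has been started yet
workers = [None] * 3    # -> [None, None, None]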
Then we notice, at lines 290-291, that running the play is handed off to the strategy:
# and run the play using the strategy and cleanup on way out
play_return = strategy.run(iterator, play_context)
And the strategy object is created from self, i.e. the TaskQueueManager instance (lines 269-270):
# load the specified strategy (or the default linear one)
strategy = strategy_loader.get(new_play.strategy, self)
So we can look into the strategy to see how it works.
The default strategy is linear (lib/ansible/plugins/strategy/linear.py), but it references the workers only once, to process result data, which is clearly not what we are looking for.
Searching the strategy base class StrategyBase instead, we find the _queue_task method, which the linear strategy's run method calls:
lib/ansible/plugins/strategy/__init__.py, lines 279-336:
def _queue_task(self, host, task, task_vars, play_context):
    ''' handles queueing the task up to be sent to a worker '''

    display.debug("entering _queue_task() for %s/%s" % (host.name, task.action))

    # Add a write lock for tasks.
    # Maybe this should be added somewhere further up the call stack but
    # this is the earliest in the code where we have task (1) extracted
    # into its own variable and (2) there's only a single code path
    # leading to the module being run. This is called by three
    # functions: __init__.py::_do_handler_run(), linear.py::run(), and
    # free.py::run() so we'd have to add to all three to do it there.
    # The next common higher level is __init__.py::run() and that has
    # tasks inside of play_iterator so we'd have to extract them to do it
    # there.
    if task.action not in action_write_locks.action_write_locks:
        display.debug('Creating lock for %s' % task.action)
        action_write_locks.action_write_locks[task.action] = Lock()

    # and then queue the new task
    try:
        # create a dummy object with plugin loaders set as an easier
        # way to share them with the forked processes
        shared_loader_obj = SharedPluginLoaderObj()

        queued = False
        starting_worker = self._cur_worker
        while True:
            worker_prc = self._workers[self._cur_worker]
            if worker_prc is None or not worker_prc.is_alive():
                self._queued_task_cache[(host.name, task._uuid)] = {
                    'host': host,
                    'task': task,
                    'task_vars': task_vars,
                    'play_context': play_context
                }

                worker_prc = WorkerProcess(self._final_q, task_vars, host, task, play_context, self._loader, self._variable_manager, shared_loader_obj)
                self._workers[self._cur_worker] = worker_prc
                worker_prc.start()
                display.debug("worker is %d (out of %d available)" % (self._cur_worker + 1, len(self._workers)))
                queued = True

            self._cur_worker += 1
            if self._cur_worker >= len(self._workers):
                self._cur_worker = 0

            if queued:
                break
            elif self._cur_worker == starting_worker:
                time.sleep(0.0001)

        self._pending_results += 1
    except (EOFError, IOError, AssertionError) as e:
        # most likely an abort
        display.debug("got an error while queuing: %s" % e)
        return
    display.debug("exiting _queue_task() for %s/%s" % (host.name, task.action))
Here, self._cur_worker is a rotating index: each pass through the loop examines one slot (starting a WorkerProcess there if the slot is free) and then advances the index by one; when the index reaches the length of _workers it wraps back to zero, and the scan keeps going until the task has been handed to a worker. In effect this creates a pool of child processes: until every task has been assigned, whenever a child process exits, a new one is started in its slot to run the next task.
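To see the slot-scanning loop in isolation, here is a simplified standalone sketch of the same round-robin idea. WorkerProcess is replaced with a plain multiprocessing.Process and the task with a dummy function, so this only illustrates the mechanism and is not Ansible code.
import time
from multiprocessing import Process


def run_task(name):
    # stand-in for executing a module against one host
    time.sleep(0.1)


class SlotPool:
    def __init__(self, num_slots):
        self._workers = [None] * num_slots   # empty slots, as in _initialize_processes()
        self._cur_worker = 0

    def queue(self, name):
        queued = False
        starting_worker = self._cur_worker
        while True:
            prc = self._workers[self._cur_worker]
            if prc is None or not prc.is_alive():
                # slot is free (never used, or its process has exited): reuse it
                prc = Process(target=run_task, args=(name,))
                self._workers[self._cur_worker] = prc
                prc.start()
                queued = True
            # advance the round-robin index, wrapping at the end of the list
            self._cur_worker = (self._cur_worker + 1) % len(self._workers)
            if queued:
                break
            elif self._cur_worker == starting_worker:
                # a full scan found no free slot: wait briefly, then scan again
                time.sleep(0.0001)


if __name__ == '__main__':
    pool = SlotPool(2)             # e.g. forks (or batch size) of 2
    for i in range(5):
        pool.queue('task-%d' % i)  # later tasks wait until a slot frees up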