Celery--Worker
Preparation:
Installation
pip install celery
easy_install celery
When using Redis as the broker you also need to install celery-with-redis; RabbitMQ is the more commonly used broker.
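On newer Celery versions the Redis dependencies can also be installed through the bundle/extras syntax instead of a separate package (an alternative not mentioned in the original text):
pip install celery[redis]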
Getting started:
Usage
Start a worker
Quick start: celery -A proj.task worker --loglevel=info
Explanation: -A specifies the application to use; its argument is the location of the Celery instance in the project, i.e. where celery_app = Celery() is defined.
worker tells Celery to start the worker component, so this command starts one worker (a minimal project layout is sketched below).
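A minimal sketch of what -A points at (the module path proj/task.py and the add task are illustrative assumptions, not taken from a real project):

# proj/task.py -- the module passed to -A as proj.task
from celery import Celery

# the Celery instance that -A resolves; broker/backend URLs are placeholders
celery_app = Celery('proj',
                    broker='redis://127.0.0.1:6379/0',
                    backend='redis://127.0.0.1:6379/1')

@celery_app.task
def add(x, y):
    # a trivial task so the worker has something to consume
    return x + y

With this layout, celery -A proj.task worker --loglevel=info imports proj.task, finds the celery_app instance, and starts a worker connected to the configured broker.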
There are many more options:
Run celery worker --help to see them; to see the options of the celery command itself, run celery --help.
The full option list is reproduced at the end of this article.
Internal analysis:
When a worker starts, it establishes a connection to the broker (a long-lived TCP connection). When data needs to be transferred, channels are created on that connection; a single connection can carry multiple channels. The worker then fetches tasks from the broker's queues and consumes them, which is the classic producer/consumer pattern.
The worker itself consists of four parts: task_pool, consumer, scheduler and mediator. task_pool holds the worker (pool) processes: when a worker is started with a concurrency option, the child processes are kept here. Celery's default concurrency model is prefork, i.e. multiprocessing; prefork is essentially a lightly adapted multiprocessing.Pool under a new name, and the difference from an ordinary multiprocessing pool is that task_pool only keeps the running worker processes. consumer is the consumer: it receives messages from the broker and turns each message into an instance of celery.worker.request.Request. At the appropriate moment the request is wrapped into a Task, the class generated from a function decorated with app_celery.task(), so a custom task function can use this request to read key information about the current execution. That covers task_pool and consumer.
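To make the Request-to-Task hand-off concrete, here is a small sketch (the task name and the printed fields are illustrative; bind=True is the standard way to expose the request inside the task function):

@celery_app.task(bind=True)
def inspect_request(self, x):
    # self.request is the per-execution context built from the broker message;
    # it carries the metadata the consumer attached when it created the Request.
    print(self.request.id)             # task id
    print(self.request.hostname)       # worker that picked the task up
    print(self.request.delivery_info)  # exchange / routing key the message used
    return x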
Next, the worker maintains two data structures that run in parallel: the 'ETA schedule' and the ready queue.
Ready queue: tasks that need to run immediately; when they reach the worker they are put on the ready queue, waiting for the consumer to execute them.
ETA schedule: tasks that carry an eta parameter or a rate_limit parameter.
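A short sketch of how calls land in one structure or the other (reusing the assumed add task): a plain call goes to the ready queue, while eta/countdown calls sit on the ETA schedule first.

from datetime import datetime, timedelta, timezone

# no timing information: placed straight on the ready queue
add.delay(2, 3)

# held on the ETA schedule for about 30 seconds, then moved to the ready queue
add.apply_async((2, 3), countdown=30)

# same effect, expressed as an absolute eta timestamp
add.apply_async((2, 3), eta=datetime.now(timezone.utc) + timedelta(seconds=30))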
To be continued.
Appendix:
Options for celery worker:
Usage: celery worker [options]
Start worker instance.
Examples::
celery worker --app=proj -l info
celery worker -A proj -l info -Q hipri,lopri
celery worker -A proj --concurrency=4
celery worker -A proj --concurrency=1000 -P eventlet
celery worker --autoscale=10,0
Options:
-A APP, --app=APP app instance to use (e.g. module.attr_name)
-b BROKER, --broker=BROKER
url to broker. default is 'amqp://guest@localhost//'
--loader=LOADER name of custom loader class to use.
--config=CONFIG Name of the configuration module
--workdir=WORKING_DIRECTORY
Optional directory to change to after detaching.
-C, --no-color
-q, --quiet
-c CONCURRENCY, --concurrency=CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL_CLS, --pool=POOL_CLS
Pool implementation: prefork (default), eventlet,
gevent, solo or threads.
--purge, --discard Purges all waiting tasks before the daemon is started.
**WARNING**: This is unrecoverable, and the tasks will
be deleted from the messaging server.
-l LOGLEVEL, --loglevel=LOGLEVEL
Logging level, choose between DEBUG, INFO, WARNING,
ERROR, CRITICAL, or FATAL.
-n HOSTNAME, --hostname=HOSTNAME
Set custom hostname, e.g. 'w1.%h'. Expands: %h
(hostname), %n (name) and %d, (domain).
-B, --beat Also run the celery beat periodic task scheduler.
Please note that there must only be one instance of
this service.
-s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
Path to the schedule database if running with the -B
option. Defaults to celerybeat-schedule. The extension
".db" may be appended to the filename.
--scheduler=SCHEDULER_CLS
Scheduler class to use. Default is
celery.beat.PersistentScheduler
-S STATE_DB, --statedb=STATE_DB
Path to the state database. The extension '.db' may be
appended to the filename. Default: None
-E, --events Send events that can be captured by monitors like
celery events, celerymon, and others.
--time-limit=TASK_TIME_LIMIT
Enables a hard time limit (in seconds int/float) for
tasks.
--soft-time-limit=TASK_SOFT_TIME_LIMIT
Enables a soft time limit (in seconds int/float) for
tasks.
--maxtasksperchild=MAX_TASKS_PER_CHILD
Maximum number of tasks a pool worker can execute
before it's terminated and replaced by a new worker.
-Q QUEUES, --queues=QUEUES
List of queues to enable for this worker, separated by
comma. By default all configured queues are enabled.
Example: -Q video,image
-X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
-I INCLUDE, --include=INCLUDE
Comma separated list of additional modules to import.
Example: -I foo.tasks,bar.tasks
--autoscale=AUTOSCALE
Enable autoscaling by providing max_concurrency,
min_concurrency. Example:: --autoscale=10,3 (always
keep 3 processes, but grow to 10 if necessary)
--autoreload Enable autoreloading.
--no-execv Don't do execv after multiprocessing child fork.
--without-gossip Do not subscribe to other workers events.
--without-mingle Do not synchronize with other workers at startup.
--without-heartbeat Do not send event heartbeats.
--heartbeat-interval=HEARTBEAT_INTERVAL
Interval in seconds at which to send worker heartbeat
-O OPTIMIZATION       Apply optimization profile. Supported: default, fair
-D, --detach
-f LOGFILE, --logfile=LOGFILE
Path to log file. If no logfile is specified, stderr
is used.
--pidfile=PIDFILE Optional file used to store the process pid. The
program will not start if this file already exists and
the pid is still alive.
--uid=UID User id, or user name of the user to run as after
detaching.
--gid=GID Group id, or group name of the main group to change to
after detaching.
--umask=UMASK Effective umask (in octal) of the process after
detaching. Inherits the umask of the parent process
by default.
--executable=EXECUTABLE
Executable to use for the detached process.
--version show program's version number and exit
-h, --help show this help message and exit
Set the timezone:
CELERY_TIMEZONE = 'Asia/Shanghai'
Enable UTC:
CELERY_ENABLE_UTC = True
Limit how often tasks may run.
The following limits the add function in the tasks module to 10 executions per second:
CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
Or limit the rate of every task:
CELERY_ANNOTATIONS = {'*': {'rate_limit': '10/s'}}
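The same limit can also be declared on the task decorator itself instead of through CELERY_ANNOTATIONS (a sketch, reusing the assumed add task):

@celery_app.task(rate_limit='10/s')
def add(x, y):
    return x + y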
A callback can also be set to run when a task fails:
def my_on_failure(self, exc, task_id, args, kwargs, einfo):
    print('task failed')

CELERY_ANNOTATIONS = {'*': {'on_failure': my_on_failure}}
Number of concurrent worker processes, the same value the -c command-line option sets.
More processes is not always better; enough to keep tasks from piling up, plus some headroom for new tasks, is sufficient:
CELERYD_CONCURRENCY = 20
How many tasks the worker prefetches from redis at a time (the default is 4):
CELERYD_PREFETCH_MULTIPLIER = 4
A pool process is recycled after executing this many tasks; a fairly large value is recommended:
CELERYD_MAX_TASKS_PER_CHILD = 200
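Settings like these are usually gathered in a configuration module and loaded onto the app with config_from_object; a minimal sketch (the module name celeryconfig is an assumption):

# celeryconfig.py
CELERYD_CONCURRENCY = 20
CELERYD_PREFETCH_MULTIPLIER = 4
CELERYD_MAX_TASKS_PER_CHILD = 200

# in the module that creates the app
celery_app.config_from_object('celeryconfig')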
Use redis as the task queue (broker).
URL format: redis://user:password@host:port/db_number
BROKER_URL = 'redis://127.0.0.1:6379/0'
Expiry time for task results:
CELERY_TASK_RESULT_EXPIRES = 1200
Run-time limit for a single task; the task is killed if it exceeds it:
CELERYD_TASK_TIME_LIMIT = 60
Store task results in redis (no result backend is used by default):
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/1'
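With a result backend configured, the AsyncResult returned by delay()/apply_async() can be polled or waited on (a sketch reusing the assumed add task):

result = add.delay(2, 3)
print(result.status)           # PENDING / STARTED / SUCCESS ...
print(result.get(timeout=10))  # blocks until the result (5) is available in redis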
Serialization: tasks are serialized with 'pickle', results with 'json'.
Task serialization format:
CELERY_TASK_SERIALIZER = 'pickle'
Result serialization format:
CELERY_RESULT_SERIALIZER = 'json'
The serializer can also be set directly on the Celery object:
app = Celery('tasks', broker='...', task_serializer='yaml')
Disable rate limits:
CELERY_DISABLE_RATE_LIMITS = True
Broker address (in Celery 4.x and later the lowercase broker_url is the preferred spelling; BROKER_URL is the older form):
BROKER_URL = 'redis://127.0.0.1:6379/0'
Where to store results:
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/1'
Task serialization format:
CELERY_TASK_SERIALIZER = 'msgpack'
Result serialization format:
CELERY_RESULT_SERIALIZER = 'msgpack'
Content types the worker will accept:
CELERY_ACCEPT_CONTENT = ['msgpack']
Expiry time for task results:
CELERY_TASK_RESULT_EXPIRES = 24 * 60 * 60
Acknowledge the message only after the task has finished executing (slight performance cost):
CELERY_ACKS_LATE = True
Message compression: zlib or bzip2; messages are sent uncompressed by default:
CELERY_MESSAGE_COMPRESSION = 'zlib'
Hard limit on task run time: here a task must finish within 5 seconds, otherwise the pool process executing it is killed and replaced:
CELERYD_TASK_TIME_LIMIT = 5
Worker concurrency; defaults to the number of CPU cores, same as the -c option:
CELERYD_CONCURRENCY = 4
Number of tasks the worker prefetches from the broker at a time:
CELERYD_PREFETCH_MULTIPLIER = 4
A pool process is recycled after executing this many tasks (unlimited by default):
CELERYD_MAX_TASKS_PER_CHILD = 40
Default queue name. Messages that do not match any other queue end up here; if nothing is configured at all, everything goes to the default queue:
CELERY_DEFAULT_QUEUE = "default"
Detailed queue configuration:
CELERY_QUEUES = {
    "default": {            # the default queue named above
        "exchange": "default",
        "exchange_type": "direct",
        "routing_key": "default"
    },
    "topicqueue": {          # a topic queue: any routing key matching topic.# lands here
        "routing_key": "topic.#",
        "exchange": "topic_exchange",
        "exchange_type": "topic",
    },
    "task_eeg": {            # a fanout exchange
        "exchange": "tasks",
        "exchange_type": "fanout",
        "binding_key": "tasks",
    },
}
Alternatively, it can be configured with Queue objects plus a route table, as below:
# Queue configuration (settings.py)
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('default',
          Exchange('default'),
          routing_key='default'),
    Queue('for_task_collect',
          Exchange('for_task_collect'),
          routing_key='for_task_collect'),
    Queue('for_task_compute',
          Exchange('for_task_compute'),
          routing_key='for_task_compute'),
)
# Routes (which task goes into which queue)
CELERY_ROUTES = {
'umonitor.tasks.multiple_thread_metric_collector':
{
'queue': 'for_task_collect',
'routing_key': 'for_task_collect'
},
'compute.tasks.multiple_thread_metric_aggregate':
{
'queue': 'for_task_compute',
'routing_key': 'for_task_compute'
},
'compute.tasks.test':
{
'queue': 'for_task_compute',
'routing_key': 'for_task_compute'
},
}
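Once the queues and routes above are in place, workers are typically started per queue, and a call can also override the route table explicitly; a sketch (the worker commands and some_task are illustrative, the queue names come from the configuration above):

# start one worker per queue:
#   celery worker -A proj -l info -Q for_task_collect
#   celery worker -A proj -l info -Q for_task_compute

# CELERY_ROUTES normally picks the queue, but it can be overridden per call;
# some_task stands in for any registered task here
some_task.apply_async(queue='for_task_compute', routing_key='for_task_compute')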