Celery: Worker Options and Common Configuration Explained

Parameter details

Celery Worker

Preparation:

Installation

pip install celery 

easy_install celery 

When using Redis as the broker, the celery-with-redis package also needs to be installed; RabbitMQ is the more commonly used broker.
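On recent Celery releases the Redis dependencies can instead be pulled in through the bundle extra (celery-with-redis is only needed on very old versions):

pip install "celery[redis]"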

Getting started:

Usage

Starting a worker

The simplest form: celery -A proj.task worker --loglevel=info

Explanation: -A (--app) points at the application to use; its argument is the location of the Celery instance in the project, i.e. where celery_app = Celery() is defined.

worker tells Celery to start a worker for that application; running the command above therefore starts one worker.
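As a concrete illustration, a minimal layout might look like this (the module path proj/task.py, the broker URL, and the add task are assumptions made for the example, not taken from a real project):

# proj/task.py -- minimal sketch; module path and broker URL are assumed for the example
from celery import Celery

celery_app = Celery('proj', broker='redis://127.0.0.1:6379/0')

@celery_app.task
def add(x, y):
    return x + y

With this layout, celery -A proj.task worker --loglevel=info locates celery_app and starts consuming.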

There are many more options:

        Run celery worker --help to see them; for the top-level celery options, run celery --help.

        The full option list is reproduced at the end of this article.

Internals:

        When a worker starts, it establishes a long-lived TCP connection with the broker; whenever data needs to be transferred, channels are created on top of it, and a single connection can carry multiple channels. The worker then pulls tasks from the broker's queues and consumes them, a typical producer/consumer pattern.

        The worker itself is made up of four main parts: task_pool, consumer, scheduler and mediator. task_pool keeps track of the pool processes: when a worker is started with a concurrency option, the child processes it spawns are held here. Celery's default concurrency model is prefork, i.e. multiprocessing; prefork is essentially a lightly adapted multiprocessing.Pool under a new name, and the difference from an ordinary process pool is that task_pool merely tracks the running pool processes. The consumer receives messages from the broker and converts each one into an instance of celery.worker.request.Request; at the appropriate point it wraps the request into a Task, the class generated from a function decorated with app_celery.task(), which is why a custom task function can access this request and read key information from it. That covers task_pool and the consumer.
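For example, a task declared with bind=True can read the execution context that the consumer attached to it (a minimal sketch using the celery_app from the example above; the fields printed are only a few of those available):

# Sketch: a bound task can inspect its execution context via self.request.
@celery_app.task(bind=True)
def inspect_request(self):
    print(self.request.id)        # unique task id
    print(self.request.retries)   # how many times this task has been retried
    print(self.request.hostname)  # name of the worker executing the task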

        Next, the worker maintains two data structures that operate in parallel: the ETA timetable and the ready queue.

        Ready queue: tasks that should run immediately; when such a task reaches the worker it is placed in the ready queue, waiting for the consumer to execute it.

        ETA timetable: tasks that carry an eta/countdown argument or are subject to a rate_limit; they are held here until they are due to run.
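For example, the following calls (reusing the hypothetical add task from the sketch above) carry an ETA and therefore go through the ETA timetable instead of the ready queue:

from datetime import datetime, timedelta, timezone

# Run roughly 30 seconds from now -- handled via the ETA timetable.
add.apply_async((2, 3), countdown=30)

# Equivalent call with an explicit UTC eta.
add.apply_async((2, 3), eta=datetime.now(timezone.utc) + timedelta(seconds=30))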

        (To be continued.)

Appendix:

        celery worker options:

Usage: celery worker [options]
 
Start worker instance.
 
Examples::
 
    celery worker --app=proj -l info
    celery worker -A proj -l info -Q hipri,lopri
 
    celery worker -A proj --concurrency=4
    celery worker -A proj --concurrency=1000 -P eventlet
 
    celery worker --autoscale=10,0
 
Options:
  -A APP, --app=APP     app instance to use (e.g. module.attr_name)
  -b BROKER, --broker=BROKER
                        url to broker.  default is 'amqp://guest@localhost//'
  --loader=LOADER       name of custom loader class to use.
  --config=CONFIG       Name of the configuration module
  --workdir=WORKING_DIRECTORY
                        Optional directory to change to after detaching.
  -C, --no-color
  -q, --quiet
  -c CONCURRENCY, --concurrency=CONCURRENCY
                        Number of child processes processing the queue. The
                        default is the number of CPUs available on your
                        system.
  -P POOL_CLS, --pool=POOL_CLS
                        Pool implementation: prefork (default), eventlet,
                        gevent, solo or threads.
  --purge, --discard    Purges all waiting tasks before the daemon is started.
                        **WARNING**: This is unrecoverable, and the tasks will
                        be deleted from the messaging server.
  -l LOGLEVEL, --loglevel=LOGLEVEL
                        Logging level, choose between DEBUG, INFO, WARNING,
                        ERROR, CRITICAL, or FATAL.
  -n HOSTNAME, --hostname=HOSTNAME
                        Set custom hostname, e.g. 'w1.%h'. Expands: %h
                        (hostname), %n (name) and %d, (domain).
  -B, --beat            Also run the celery beat periodic task scheduler.
                        Please note that there must only be one instance of
                        this service.
  -s SCHEDULE_FILENAME, --schedule=SCHEDULE_FILENAME
                        Path to the schedule database if running with the -B
                        option. Defaults to celerybeat-schedule. The extension
                        ".db" may be appended to the filename.
  --scheduler=SCHEDULER_CLS
                        Scheduler class to use. Default is
                        celery.beat.PersistentScheduler
  -S STATE_DB, --statedb=STATE_DB
                        Path to the state database. The extension '.db' may be
                        appended to the filename. Default: None
  -E, --events          Send events that can be captured by monitors like
                        celery events, celerymon, and others.
  --time-limit=TASK_TIME_LIMIT
                        Enables a hard time limit (in seconds int/float) for
                        tasks.
  --soft-time-limit=TASK_SOFT_TIME_LIMIT
                        Enables a soft time limit (in seconds int/float) for
                        tasks.
  --maxtasksperchild=MAX_TASKS_PER_CHILD
                        Maximum number of tasks a pool worker can execute
                        before it's terminated and replaced by a new worker.
  -Q QUEUES, --queues=QUEUES
                        List of queues to enable for this worker, separated by
                        comma. By default all configured queues are enabled.
                        Example: -Q video,image
  -X EXCLUDE_QUEUES, --exclude-queues=EXCLUDE_QUEUES
  -I INCLUDE, --include=INCLUDE
                        Comma separated list of additional modules to import.
                        Example: -I foo.tasks,bar.tasks
  --autoscale=AUTOSCALE
                        Enable autoscaling by providing max_concurrency,
                        min_concurrency. Example:: --autoscale=10,3 (always
                        keep 3 processes, but grow to 10 if necessary)
  --autoreload          Enable autoreloading.
  --no-execv            Don't do execv after multiprocessing child fork.
  --without-gossip      Do not subscribe to other workers events.
  --without-mingle      Do not synchronize with other workers at startup.
  --without-heartbeat   Do not send event heartbeats.
  --heartbeat-interval=HEARTBEAT_INTERVAL
                        Interval in seconds at which to send worker heartbeat
  -O OPTIMIZATION       Apply optimization profile.  Supported: default, fair
  -D, --detach
  -f LOGFILE, --logfile=LOGFILE
                        Path to log file. If no logfile is specified, stderr
                        is used.
  --pidfile=PIDFILE     Optional file used to store the process pid. The
                        program will not start if this file already exists and
                        the pid is still alive.
  --uid=UID             User id, or user name of the user to run as after
                        detaching.
  --gid=GID             Group id, or group name of the main group to change to
                        after detaching.
  --umask=UMASK         Effective umask (in octal) of the process after
                        detaching.  Inherits the umask of the parent process
                        by default.
  --executable=EXECUTABLE
                        Executable to use for the detached process.
  --version             show program's version number and exit
  -h, --help            show this help message and exit

 

Common configuration options

Set the time zone
CELERY_TIMEZONE = 'Asia/Shanghai'
Keep internal timestamps in UTC
CELERY_ENABLE_UTC = True


Limit how often a task may run
The following limits the add function in the tasks module to at most 10 executions per second
CELERY_ANNOTATIONS = {'tasks.add': {'rate_limit': '10/s'}}
Or apply the same rate limit to every task
CELERY_ANNOTATIONS = {'*': {'rate_limit': '10/s'}}
Annotations can also set a function to be called when a task fails

def my_on_failure(self, exc, task_id, args, kwargs, einfo):
    print('task failed')

CELERY_ANNOTATIONS = {'*': {'on_failure': my_on_failure}}

 

Number of concurrent worker processes, the same value the -c command-line option sets
More processes is not automatically better: enough to keep tasks from piling up, plus some headroom for new tasks, is sufficient
CELERYD_CONCURRENCY = 20

 

How many tasks each worker process prefetches from the broker at a time; the default is 4
CELERYD_PREFETCH_MULTIPLIER = 4

 

How many tasks a pool process executes before it is terminated and replaced; a fairly large value is recommended
CELERYD_MAX_TASKS_PER_CHILD = 200

 

Use Redis as the broker (task queue)
URL format: scheme://user:password@host:port/db
BROKER_URL = 'redis://127.0.0.1:6379/0'

 

How long task results are kept before they expire, in seconds
CELERY_TASK_RESULT_EXPIRES = 1200
Hard time limit for a single task, in seconds; the process running it is killed if exceeded
CELERYD_TASK_TIME_LIMIT = 60

 

Store task results in Redis (no result backend is used by default)
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/1'

 

Example: serialize task messages with pickle and task results with json
Task (message) serialization format
CELERY_TASK_SERIALIZER = 'pickle'
Result serialization format
CELERY_RESULT_SERIALIZER = 'json'
The serializer can also be set directly on the Celery object
app = Celery('tasks', broker='...', task_serializer='yaml')


Disable all rate limits
CELERY_DISABLE_RATE_LIMITS = True

 

A fairly typical configuration file:

Broker URL. In Celery 4.x+ the new lowercase setting name is broker_url; BROKER_URL is the older uppercase form (in a Django project configured with namespace='CELERY' it is written CELERY_BROKER_URL)
BROKER_URL = 'redis://127.0.0.1:6379/0'
Where task results are stored
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/1'
 

Task (message) serialization format
CELERY_TASK_SERIALIZER = 'msgpack'
Result serialization format
CELERY_RESULT_SERIALIZER = 'msgpack'
Content types the worker will accept
CELERY_ACCEPT_CONTENT = ['msgpack']
 

How long task results are kept before they expire, in seconds
CELERY_TASK_RESULT_EXPIRES = 24 * 60 * 60
 

Acknowledge tasks after they have been executed rather than just before execution; this has a slight performance cost
CELERY_ACKS_LATE = True
 

Message compression: zlib or bzip2; by default messages are sent uncompressed
CELERY_MESSAGE_COMPRESSION = 'zlib'
 

Hard time limit for a task
If a task does not finish within 5 seconds, the pool process running it is killed and replaced by a new one
CELERYD_TASK_TIME_LIMIT = 5
 

Worker concurrency; defaults to the number of CPU cores, the same value the -c option sets
CELERYD_CONCURRENCY = 4
 

How many tasks each worker process prefetches from the broker at a time
CELERYD_PREFETCH_MULTIPLIER = 4
 

How many tasks a pool process executes before it is terminated and replaced; unlimited by default
CELERYD_MAX_TASKS_PER_CHILD = 40
 

Name of the default queue: a message that does not match any other queue is placed here; if nothing else is configured, everything is sent to the default queue
CELERY_DEFAULT_QUEUE = "default"
Detailed queue settings

CELERY_QUEUES = {
    "default": {  # the default queue named above
        "exchange": "default",
        "exchange_type": "direct",
        "routing_key": "default"
    },
    "topicqueue": {  # a topic queue: any routing key matching "topic.#" is delivered here
        "routing_key": "topic.#",
        "exchange": "topic_exchange",
        "exchange_type": "topic",
    },
    "task_eeg": {  # a fanout exchange
        "exchange": "tasks",
        "exchange_type": "fanout",
        "binding_key": "tasks",
    },
}

Alternatively, queues and routing can be configured like this:

# Queue configuration (settings.py)
from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('default', 
        Exchange('default'), 
        routing_key='default'),
    Queue('for_task_collect', 
        Exchange('for_task_collect'), 
        routing_key='for_task_collect'),
    Queue('for_task_compute', 
        Exchange('for_task_compute'), 
        routing_key='for_task_compute'),
)
# Routing: which task goes into which queue
CELERY_ROUTES = {
    'umonitor.tasks.multiple_thread_metric_collector': 
    {
        'queue': 'for_task_collect', 
        'routing_key': 'for_task_collect'
    },
    'compute.tasks.multiple_thread_metric_aggregate': 
    {
        'queue': 'for_task_compute', 
        'routing_key': 'for_task_compute'
    },
    'compute.tasks.test': 
    {
         'queue': 'for_task_compute',
         'routing_key': 'for_task_compute'
    },
}
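With this routing in place, a task is delivered to its configured queue when called normally, and the queue can also be overridden per call. A sketch reusing the task names from the routes above (the import path is simply taken from that config and may differ in a real project):

from umonitor.tasks import multiple_thread_metric_collector

# Routed to 'for_task_collect' by CELERY_ROUTES.
multiple_thread_metric_collector.delay()

# The target queue can also be chosen explicitly per call.
multiple_thread_metric_collector.apply_async(queue='for_task_compute')

A worker can then be restricted to one of these queues, for example: celery -A proj worker -Q for_task_collect -l info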

 
