Reposted from: http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
Table of Contents
1. Introduction
1.1. What is a resource agent?
1.2. Who or what uses a resource agent?
1.3. Which language is a resource agent written in?
2. API definitions
2.1. Environment variables
2.2. Actions
2.3. Timeouts
2.4. Metadata
3. Return codes
3.1. OCF_SUCCESS (0)
3.2. OCF_ERR_GENERIC (1)
3.3. OCF_ERR_ARGS (2)
3.4. OCF_ERR_UNIMPLEMENTED (3)
3.5. OCF_ERR_PERM (4)
3.6. OCF_ERR_INSTALLED (5)
3.7. OCF_ERR_CONFIGURED (6)
3.8. OCF_NOT_RUNNING (7)
3.9. OCF_RUNNING_MASTER (8)
3.10. OCF_FAILED_MASTER (9)
4. Resource agent structure
4.1. Resource agent interpreter
4.2. Author and license information
4.3. Initialization
4.4. Functions implementing resource agent actions
4.5. Execution block
5. Resource agent actions
5.1. start action
5.2. stop action
5.3. monitor action
5.4. validate-all action
5.5. meta-data action
5.6. promote action
5.7. demote action
5.8. migrate_to action
5.9. migrate_from action
5.10. notify action
6. Script variables
6.1. $OCF_ROOT
6.2. $OCF_FUNCTIONS_DIR
6.3. $OCF_RESOURCE_INSTANCE
6.4. $__OCF_ACTION
6.5. $__SCRIPT_NAME
6.6. $HA_RSCTMP
7. Convenience functions
7.1. Logging: ocf_log
7.2. Testing for binaries: have_binary and check_binary
7.3. Executing commands and capturing their output: ocf_run
7.4. Locks: ocf_take_lock and ocf_release_lock_on_exit
7.5. Testing for numerical values: ocf_is_decimal
7.6. Testing for boolean values: ocf_is_true
7.7. Pseudo resources: ha_pseudo_resource
8. Conventions
8.1. Well-known parameter names
8.2. Parameter defaults
8.3. Honoring PATH for binaries
9. Special considerations
9.1. Licensing
9.2. Locale settings
9.3. Testing for running processes
9.4. Specifying a master preference
10. Testing resource agents
10.1. Testing with ocf-tester
10.2. Testing with ocft
11. Installing and packaging resource agents
11.1. Installing resource agents
11.2. Packaging resource agents
11.3. Submitting resource agents
11.4. Maintaining resource agents
This document is to serve as a guide and reference for all developers, maintainers, and contributors working on OCF (Open Cluster Framework) compliant cluster resource agents. It explains the anatomy and general functionality of a resource agent, illustrates the resource agent API, and provides valuable hints and tips to resource agent authors.
A resource agent is an executable that manages a cluster resource. No formal definition of a cluster resource exists, other than "anything a cluster manages is a resource." Cluster resources can be as diverse as IP addresses, file systems, database services, and entire virtual machines — to name just a few examples.
Any Open Cluster Framework (OCF) compliant cluster management application is capable of managing resources using the resource agents described in this document. At the time of writing, two OCF compliant cluster management applications exist for the Linux platform:
An OCF compliant resource agent can be implemented in any programming language. The API is not language specific. However, most resource agents are implemented as shell scripts, which is why this guide primarily uses example code written in shell language.
A resource agent receives all configuration information about the resource it manages via environment variables. The name of each of these environment variables is the name of the resource parameter, prefixed with OCF_RESKEY_. For example, if the resource has an ip parameter set to 192.168.1.1, then the resource agent will have access to an environment variable OCF_RESKEY_ip holding that value.
For any resource parameter that is not required to be set by the user (that is, its parameter definition in the resource agent metadata does not specify required="true"), the resource agent must provide a sensible default. By convention, the resource agent uses a variable named OCF_RESKEY_<parametername>_default that holds this default.
In addition, the cluster manager may also support meta resource parameters. These do not apply directly to the resource configuration, but rather specify how the cluster resource manager is expected to manage the resource. For example, the Pacemaker cluster manager uses the target-role meta parameter to specify whether the resource should be started or stopped.
Meta parameters are passed into the resource agent in the OCF_RESKEY_CRM_meta_ namespace, with any hyphens converted to underscores. Thus, the target-role attribute maps to an environment variable named OCF_RESKEY_CRM_meta_target_role.
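The naming rules above can be sketched in a few lines of shell. This is only an illustration: the default value 12 and the helper name meta_var_name are assumptions made for this example and are not part of the OCF API or of any shipped library.

```shell
#!/bin/sh
# Default for the example "eggs" parameter. The value 12 is an
# arbitrary choice made for this illustration.
OCF_RESKEY_eggs_default="12"

# Use the default only when the cluster manager did not pass a value.
: "${OCF_RESKEY_eggs=${OCF_RESKEY_eggs_default}}"

# Hypothetical helper: derive the name of the environment variable that
# carries a given meta parameter (hyphens become underscores).
meta_var_name() {
    printf 'OCF_RESKEY_CRM_meta_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

echo "eggs=${OCF_RESKEY_eggs}"
meta_var_name "target-role"
```

The `: "${VAR=default}"` idiom assigns only when the variable is unset, so a value supplied by the cluster manager always takes precedence over the agent's default.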
Any resource agent must support one command-line argument which specifies the action the resource agent is about to execute. The following actions must be supported by any resource agent:
- start: starts the resource.
- stop: shuts down the resource.
- monitor: queries the resource for its state.
- meta-data: dumps the resource agent metadata.
In addition, resource agents may optionally support the following actions:
- promote: turns a resource into the Master role (Master/Slave resources only).
- demote: turns a resource into the Slave role (Master/Slave resources only).
- migrate_to and migrate_from: implement live migration of resources.
- validate-all: validates a resource’s configuration.
- usage or help: displays a usage message when the resource agent is invoked from the command line, rather than by the cluster manager.
- status: historical (deprecated) synonym for monitor.
Action timeouts are enforced outside the resource agent proper. It is the cluster manager’s responsibility to monitor how long a resource agent action has been running, and terminate it if it does not meet its completion deadline. Thus, resource agents need not themselves check for any timeout expiry.
Resource agents can, however, advise the user of sensible timeout values (which, when correctly set, will be duly enforced by the cluster manager). See the following section for details on how a resource agent advertises its suggested timeouts.
Every resource agent must describe its own purpose and supported parameters in a set of XML metadata. This metadata is used by cluster management applications for on-line help, and resource agent man pages are generated from it as well. The following is a fictitious set of metadata from an imaginary resource agent:
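<?xml version="1.0"?>
<resource-agent name="foobar" version="0.1">
  <version>0.1</version>
  <longdesc lang="en">
This is a fictitious example resource agent written for the
OCF Resource Agent Developers Guide.
  </longdesc>
  <shortdesc lang="en">Example resource agent
  for budding OCF RA developers</shortdesc>
  <parameters>
    <parameter name="eggs">
      <longdesc lang="en">
      Number of eggs, an example numeric parameter
      </longdesc>
      <shortdesc lang="en">Number of eggs</shortdesc>
      <content type="integer"/>
    </parameter>
    <parameter name="superfrobnicate">
      <longdesc lang="en">
      Enable superfrobnication, an example boolean parameter
      </longdesc>
      <shortdesc lang="en">Enable superfrobnication</shortdesc>
      <content type="boolean"/>
    </parameter>
    <parameter name="datadir">
      <longdesc lang="en">
      Data directory, an example string parameter
      </longdesc>
      <shortdesc lang="en">Data directory</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    ...
  </actions>
</resource-agent>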
The resource-agent element, of which there must only be one per resource agent, defines the resource agent name and version.
The longdesc and shortdesc elements in resource-agent provide a long and short description of the resource agent’s functionality. While shortdesc is a one-line description of what the resource agent does and is usually used in terse listings, longdesc should give a full-blown description of the resource agent in as much detail as possible.
The parameters element describes the resource agent parameters, and should hold any number of parameter children, one for each parameter that the resource agent supports.
Every parameter should, like the resource-agent as a whole, come with a shortdesc and a longdesc, and also a content child that describes the parameter’s expected content. parameter supports the following attributes:
- required indicates whether setting the parameter is mandatory (required="1") or optional (required="0").
- unique (allowed values: 0 or 1) indicates that a specific value must be unique across the cluster, for this parameter of this particular resource type. For example, a highly available floating IP address is declared unique, as that one IP address should run only once throughout the cluster, avoiding duplicates.
On the content element, there may be two attributes:
- type describes the parameter type (string, integer, or boolean). If unset, type defaults to string.
- default specifies the parameter’s default value.
The actions list defines the actions that the resource agent advertises as supported.
Every action should list its own timeout value. This is a hint to the user as to what minimal timeout should be configured for the action. It caters for the fact that some resources are quick to start and stop (IP addresses or filesystems, for example), while others may take several minutes to do so (such as databases).
In addition, recurring actions (such as monitor) should also specify a recommended minimum interval, which is the time between two consecutive invocations of the same action. Like timeout, this value does not constitute a default; it is merely a hint to the user as to which action interval to configure, at minimum.
For any invocation, resource agents must exit with a defined return code that informs the caller of the outcome of the invoked action. The return codes are explained in detail in the following subsections.
OCF_SUCCESS (0)
The action completed successfully. This is the expected return code for any successful start, stop, promote, demote, migrate_from, migrate_to, meta-data, help, and usage action.
For monitor (and its deprecated alias, status), however, a modified convention applies:
- For primitive (stateless) resources, OCF_SUCCESS from monitor means that the resource is running. Non-running and gracefully shut-down resources must instead return OCF_NOT_RUNNING.
- For master/slave (stateful) resources, OCF_SUCCESS from monitor means that the resource is running in Slave mode. Resources running in Master mode must instead return OCF_RUNNING_MASTER, and gracefully shut-down resources must instead return OCF_NOT_RUNNING.
OCF_ERR_GENERIC (1)
The action returned a generic error. A resource agent should use this exit code only when none of the more specific error codes, defined below, accurately describes the problem.
The cluster resource manager interprets this exit code as a soft error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with OCF_ERR_GENERIC in place, usually by restarting the resource on the same node.
OCF_ERR_ARGS (2)
The resource agent was invoked with incorrect arguments. This is a safety net "can’t happen" error which the resource agent should only return when invoked with, for example, an incorrect number of command line arguments.
Note: The resource agent should not return this error when instructed to perform an action that it does not support. Instead, under those circumstances, it should return OCF_ERR_UNIMPLEMENTED.
OCF_ERR_UNIMPLEMENTED (3)
The resource agent was instructed to execute an action that the agent does not implement.
Not all resource agent actions are mandatory. promote, demote, migrate_to, migrate_from, and notify are all optional actions which the resource agent may or may not implement. When a non-stateful resource agent is misconfigured as a master/slave resource, for example, then the resource agent should alert the user about this misconfiguration by returning OCF_ERR_UNIMPLEMENTED on the promote and demote actions.
OCF_ERR_PERM (4)
The action failed due to insufficient permissions. This may be due to the agent not being able to open a certain file, to listen on a specific socket, to write to a directory, or similar.
The cluster resource manager interprets this exit code as a hard error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the permission problem may not exist).
OCF_ERR_INSTALLED (5)
The action failed because a required component is missing on the node where the action was executed. This may be due to a required binary not being executable, or a vital configuration file being unreadable.
The cluster resource manager interprets this exit code as a hard error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the required files or binaries may be present).
OCF_ERR_CONFIGURED (6)
The action failed because the user misconfigured the resource. For example, the user may have configured an alphanumeric string for a parameter that really should be an integer.
The cluster resource manager interprets this exit code as a fatal error. Since this is a configuration error that is present cluster-wide, it would make no sense to recover such a resource on a different node, let alone in-place. When a resource fails with this error, the cluster manager will attempt to shut down the resource, and wait for administrator intervention.
OCF_NOT_RUNNING (7)
The resource was found not to be running. This is an exit code that may be returned by the monitor action exclusively. Note that this implies that the resource has either gracefully shut down, or has never been started.
If the resource is not running due to an error condition, the monitor action should instead return one of the OCF_ERR_ exit codes or OCF_FAILED_MASTER.
OCF_RUNNING_MASTER (8)
The resource was found to be running in the Master role. This applies only to stateful (Master/Slave) resources, and only to their monitor action.
Note that there is no specific exit code for "running in slave mode". This is because there is no functional distinction between a primitive resource running normally, and a stateful resource running as a slave. The monitor action of a stateful resource running normally in the Slave role should simply return OCF_SUCCESS.
OCF_FAILED_MASTER (9)
The resource was found to have failed in the Master role. This applies only to stateful (Master/Slave) resources, and only to their monitor action.
The cluster resource manager interprets this exit code as a soft error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with $OCF_FAILED_MASTER in place, usually by demoting, stopping, starting and then promoting the resource on the same node.
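To summarize how the error codes in this section are classified, here is a small sketch. The numeric values are the ones listed above; in a real agent they become available once the shell function library has been sourced, and they are set here only so that the example runs on its own. The helper rc_error_class is hypothetical and exists purely for illustration.

```shell
#!/bin/sh
# Return code values as listed in this section, repeated here so the
# sketch is self-contained.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3
OCF_ERR_PERM=4
OCF_ERR_INSTALLED=5
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Hypothetical helper (not part of any library): print the error class
# that this section assigns to a return code.
rc_error_class() {
    case "$1" in
        "$OCF_ERR_GENERIC"|"$OCF_FAILED_MASTER") echo "soft" ;;
        "$OCF_ERR_PERM"|"$OCF_ERR_INSTALLED")    echo "hard" ;;
        "$OCF_ERR_CONFIGURED")                   echo "fatal" ;;
        *)                                       echo "unclassified" ;;
    esac
}

rc_error_class "$OCF_ERR_INSTALLED"
```

Codes for which this section names no error class (such as OCF_SUCCESS or OCF_NOT_RUNNING) fall through to "unclassified" in this sketch.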
A typical (shell-based) resource agent contains standard structural items, in the order listed in this section. The section describes the expected behavior of a resource agent with respect to the various actions it supports, using a fictitious resource agent named foobar as an example.
Any resource agent implemented as a script must specify its interpreter using standard "shebang" (#!) header syntax.
#!/bin/sh
If a resource agent is written in shell, specifying the generic shell interpreter (#!/bin/sh) is generally preferred, though not required. Resource agents declared as /bin/sh compatible must not use constructs native to a specific shell (such as, for example, ${!variable} syntax native to bash). It is advisable to occasionally run such resource agents through a sanitization utility such as checkbashisms.
It is considered a regression to introduce a patch that will make a previously sh compatible resource agent suitable only for bash, ksh, or any other non-generic shell. It is, however, perfectly acceptable for a new resource agent to explicitly define a specific shell, such as /bin/bash, as its interpreter.
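As an illustration of replacing a bash-only construct, the sketch below reads a variable by name using eval, which is available in any POSIX shell, instead of the ${!variable} syntax. The helper name get_indirect and the example value are assumptions made for this illustration. Because eval executes its argument, such a helper should only ever be given trusted variable names.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the variable whose name is
# passed as $1. Portable replacement for bash's ${!name} expansion.
# Only call this with trusted names, since it relies on eval.
get_indirect() {
    eval "printf '%s\n' \"\${$1}\""
}

OCF_RESKEY_datadir="/var/lib/foobar"   # example value, assumed
name="OCF_RESKEY_datadir"

get_indirect "$name"
```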
The resource agent should contain a comment listing the resource agent author(s) and/or copyright holder(s), and stating the license that applies to the resource agent:
#
# Resource Agent for managing foobar resources.
#
# License: GNU General Public License (GPL)
# (c) 2008-2010 John Doe, Jane Roe,
# and Linux-HA contributors
When a resource agent refers to a license for which multiple versions exist, it is assumed that the current version applies.
Any shell resource agent should source the ocf-shellfuncs function library. With the syntax below, this is done in terms of $OCF_FUNCTIONS_DIR, which (for testing purposes, and also for generating documentation) may be overridden from the command line.
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
What follows next are the functions implementing the resource agent’s advertised actions. The individual actions are described in detail in Section 5, “Resource agent actions”.
This is the part of the resource agent that actually executes when the resource agent is invoked. It typically follows a fairly standard structure:
# Make sure meta-data and usage always succeed
case $__OCF_ACTION in
meta-data)      foobar_meta_data
                exit $OCF_SUCCESS
                ;;
usage|help)     foobar_usage
                exit $OCF_SUCCESS
                ;;
esac
# Anything other than meta-data and usage must pass validation
foobar_validate_all || exit $?
# Translate each action into the appropriate function call
case $__OCF_ACTION in
start)          foobar_start;;
stop)           foobar_stop;;
status|monitor) foobar_monitor;;
promote)        foobar_promote;;
demote)         foobar_demote;;
reload)         ocf_log info "Reloading..."
                foobar_start
                ;;
validate-all)   ;;
*)              foobar_usage
                exit $OCF_ERR_UNIMPLEMENTED
                ;;
esac
rc=$?
# The resource agent may optionally log a debug message
ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
exit $rc
Each action is typically implemented in a separate function or method in the resource agent. By convention, these are usually named <agent>_<action>, so the function implementing the start action in foobar would be named foobar_start().
As a general rule, whenever the resource agent encounters an error that it is not able to recover, it is permitted to immediately exit, throw an exception, or otherwise cease execution. Examples for this include configuration issues, missing binaries, permission problems, etc. It is not necessary to pass these errors up the call stack.
It is the cluster manager’s responsibility to initiate the appropriate recovery action based on the user’s configuration. The resource agent should not guess at said configuration.
start action
When invoked with the start action, the resource agent must start the resource if it is not yet running. This means that the agent must verify the resource’s configuration, query its state, and then start it only if it is not running. A common way of doing this would be to invoke the validate_all and monitor functions first, as in the following example:
foobar_start() {
    # exit immediately if configuration is not valid
    foobar_validate_all || exit $?
    # if resource is already running, bail out early
    if foobar_monitor; then
        ocf_log info "Resource is already running"
        return $OCF_SUCCESS
    fi
    # actually start up the resource here (make sure to immediately
    # exit with an $OCF_ERR_ error code if anything goes seriously
    # wrong)
    ...
    # After the resource has been started, check whether it started up
    # correctly. If the resource starts asynchronously, the agent may
    # spin on the monitor function here -- if the resource does not
    # start up within the defined timeout, the cluster manager will
    # consider the start action failed
    while ! foobar_monitor; do
        ocf_log debug "Resource has not started yet, waiting"
        sleep 1
    done
    # only return $OCF_SUCCESS if _everything_ succeeded as expected
    return $OCF_SUCCESS
}
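The polling loop in foobar_start has no deadline of its own, because the cluster manager enforces the action timeout from outside. The sketch below isolates that pattern so it can be run without a cluster. The names wait_until and fake_monitor are hypothetical, and fake_monitor merely stands in for a real monitor function by succeeding on its third call.

```shell
#!/bin/sh
# Hypothetical helper: run the given check once per second until it
# succeeds. There is deliberately no deadline here; in a cluster, the
# cluster manager terminates the action when its timeout expires.
wait_until() {
    while ! "$@"; do
        sleep 1
    done
}

# Stand-in for a monitor function: succeeds on the third call.
attempts=0
fake_monitor() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

wait_until fake_monitor
echo "monitor succeeded after $attempts attempts"
```

Run from an interactive shell rather than from the cluster manager, such a loop never ends if the check never succeeds, so it is worth interrupting manually while testing.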
stop action
When invoked with the stop action, the resource agent must stop the resource, if it is running. This means that the agent must verify the resource configuration, query its state, and then stop it only if it is currently running. A common way of doing this would be to invoke the validate_all and monitor functions first. It is important to understand that stop is a force operation: the resource agent must do everything in its power to shut down the resource, short of rebooting the node or shutting it off. Consider the following example:
foobar_stop() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
foobar_monitor
rc=$?
case "$rc" in
"$OCF_SUCCESS")
# Currently running. Normal, expected behavior.
ocf_log debug "Resource is currently running"
;;
"$OCF_RUNNING_MASTER")
# Running as a Master. Need to demote before stopping.
ocf_log info "Resource is currently running as Master"
foobar_demote || \
ocf_log warn "Demote failed, trying to stop anyway"
;;
"$OCF_NOT_RUNNING")
# Currently not running. Nothing to do.
ocf_log info "Resource is already stopped"
return $OCF_SUCCESS
;;
esac
# actually shut down the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
...
# After the resource has been stopped, check whether it shut down
# correctly. If the resource stops asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# shut down within the defined timeout, the cluster manager will
# consider the stop action failed
while foobar_monitor; do
ocf_log debug "Resource has not stopped yet, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
Note: The expected exit code for a successful stop operation is $OCF_SUCCESS, not $OCF_NOT_RUNNING.
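monitor action
The monitor action queries the current status of a resource. It must discern between three different states:
- the resource is currently running (return $OCF_SUCCESS);
- the resource is stopped gracefully (return $OCF_NOT_RUNNING);
- the resource has run into a problem and must be considered failed (return the appropriate $OCF_ERR_ code to indicate the nature of the problem).
foobar_monitor() {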
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
ocf_run frobnicate --test
# This example assumes the following exit code convention
# for frobnicate:
# 0: running, and fully caught up with master
# 1: gracefully stopped
# any other: error
case "$?" in
0)
rc=$OCF_SUCCESS
ocf_log debug "Resource is running"
;;
1)
rc=$OCF_NOT_RUNNING
ocf_log debug "Resource is not running"
;;
*)
ocf_log err "Resource has failed"
exit $OCF_ERR_GENERIC
esac
return $rc
}
Stateful (master/slave) resource agents may use a more elaborate monitoring scheme where they can provide "hints" to the cluster manager identifying which instance is best suited to assume the Master role. Section 9.4, “Specifying a master preference” explains the details.
validate-all action
The validate-all action tests for correct resource agent configuration and a working environment. validate-all should exit with one of the following return codes:
- $OCF_SUCCESS: all is well, the configuration is valid and usable.
- $OCF_ERR_CONFIGURED: the user has misconfigured the resource.
- $OCF_ERR_INSTALLED: the resource has possibly been configured correctly, but a vital component is missing on the node where validate-all is being executed.
- $OCF_ERR_PERM: the resource is configured correctly and is not missing any required components, but is suffering from a permission issue (such as not being able to create a necessary file).
validate-all is usually wrapped in a function that is not only called when explicitly invoking the corresponding action, but also, as a sanity check, from just about any other function. Therefore, the resource agent author must keep in mind that the function may be invoked during the start, stop, and monitor operations, and also during probes.
Probes pose a separate challenge for validation. During a probe (when the cluster manager may expect the resource not to be running on the node where the probe is executed), some required components may be expected to not be available on the affected node. For example, this includes any shared data on storage devices not available for reading during the probe. The validate-all function may thus need to treat probes specially, using the ocf_is_probe convenience function:
foobar_validate_all() {
    # Test for configuration errors first
    if ! ocf_is_decimal "$OCF_RESKEY_eggs"; then
        ocf_log err "eggs is not numeric!"
        exit $OCF_ERR_CONFIGURED
    fi
    # Test for required binaries
    check_binary frobnicate
    # Check for data directory (this may be on shared storage, so
    # disable this test during probes)
    if ! ocf_is_probe; then
        if ! [ -d "$OCF_RESKEY_datadir" ]; then
            ocf_log err "$OCF_RESKEY_datadir does not exist or is not a directory!"
            exit $OCF_ERR_INSTALLED
        fi
    fi
    return $OCF_SUCCESS
}
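To make the notion of a probe more concrete, the following sketch shows one way such a test can be expressed. It rests on an assumption that is not stated in this guide: that the cluster manager runs a probe as a monitor action whose interval meta parameter is 0. The function name is_probe_sketch is hypothetical; real agents should keep calling ocf_is_probe from the function library.

```shell
#!/bin/sh
# Hypothetical stand-in for a probe test. Assumption: a probe is a
# monitor action invoked with an interval of 0.
is_probe_sketch() {
    [ "$__OCF_ACTION" = "monitor" ] && \
        [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

# Simulate the environment the cluster manager would set up.
__OCF_ACTION="monitor"
OCF_RESKEY_CRM_meta_interval="0"

if is_probe_sketch; then
    echo "probe: skip checks that need shared storage"
else
    echo "regular invocation: run all checks"
fi
```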
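meta-data action
The meta-data action dumps the resource agent metadata to standard output. The output must follow the metadata format as specified in Section 2.4, “Metadata”.
foobar_meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<resource-agent name="foobar" version="0.1">
  <version>0.1</version>
  ...
</resource-agent>
EOF
}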
promote action
The promote action is optional. It must only be supported by stateful resource agents, which means agents that discern between two distinct roles: Master and Slave. Slave is functionally identical to the Started state in a stateless resource agent. Thus, while a regular (stateless) resource agent only needs to implement start and stop, a stateful resource agent must also support the promote action to be able to make a transition between the Started (Slave) and Master roles.
foobar_promote() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# test the resource's current state
foobar_monitor
rc=$?
case "$rc" in
"$OCF_SUCCESS")
# Running as slave. Normal, expected behavior.
ocf_log debug "Resource is currently running as Slave"
;;
"$OCF_RUNNING_MASTER")
# Already a master. Unexpected, but not a problem.
ocf_log info "Resource is already running as Master"
return $OCF_SUCCESS
;;
"$OCF_NOT_RUNNING")
# Currently not running. Need to start before promoting.
ocf_log info "Resource is currently not running"
foobar_start
;;
*)
# Failed resource. Let the cluster manager recover.
ocf_log err "Unexpected error, cannot promote"
exit $rc
;;
esac
# actually promote the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --master-mode || exit $OCF_ERR_GENERIC
# After the resource has been promoted, check whether the
# promotion worked. If the resource promotion is asynchronous, the
# agent may spin on the monitor function here -- if the resource
# does not assume the Master role within the defined timeout, the
# cluster manager will consider the promote action failed.
while true; do
foobar_monitor
if [ $? -eq $OCF_RUNNING_MASTER ]; then
ocf_log debug "Resource promoted"
break
else
ocf_log debug "Resource still awaiting promotion"
sleep 1
fi
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
demote action
The demote action is optional. It must only be supported by stateful resource agents, which means agents that discern between two distinct roles: Master and Slave. Slave is functionally identical to the Started state in a stateless resource agent. Thus, while a regular (stateless) resource agent only needs to implement start and stop, a stateful resource agent must also support the demote action to be able to make a transition between the Master and Started (Slave) roles.
foobar_demote() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# test the resource's current state
foobar_monitor
rc=$?
case "$rc" in
"$OCF_RUNNING_MASTER")
# Running as master. Normal, expected behavior.
ocf_log debug "Resource is currently running as Master"
;;
"$OCF_SUCCESS")
# Already running as slave. Nothing to do.
ocf_log debug "Resource is currently running as Slave"
return $OCF_SUCCESS
;;
"$OCF_NOT_RUNNING")
# Currently not running. Getting a demote action
# in this state is unexpected. Exit with an error
# and let the cluster manager recover.
ocf_log err "Resource is currently not running"
exit $OCF_ERR_GENERIC
;;
*)
# Failed resource. Let the cluster manager recover.
ocf_log err "Unexpected error, cannot demote"
exit $rc
;;
esac
# actually demote the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --unset-master-mode || exit $OCF_ERR_GENERIC
# After the resource has been demoted, check whether the
# demotion worked. If the resource demotion is asynchronous, the
# agent may spin on the monitor function here -- if the resource
# does not assume the Slave role within the defined timeout, the
# cluster manager will consider the demote action failed.
while true; do
foobar_monitor
if [ $? -eq $OCF_RUNNING_MASTER ]; then
ocf_log debug "Resource still awaiting demotion"
sleep 1
else
ocf_log debug "Resource demoted"
break
fi
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
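migrate_to action
The migrate_to action can serve one of two purposes:
- Initiate a native push type migration for the resource, that is, instruct the resource to move from the node it is currently running on to a specific destination node. The resource agent learns its destination node from the $OCF_RESKEY_CRM_meta_migrate_target environment variable.
- Freeze the resource in a freeze/thaw type migration.
The example below illustrates a push type migration: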
foobar_migrate_to() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is not running, bail out early
if ! foobar_monitor; then
ocf_log err "Resource is not running"
exit $OCF_ERR_GENERIC
fi
# actually migrate the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --migrate \
--dest="$OCF_RESKEY_CRM_meta_migrate_target" \
|| exit $OCF_ERR_GENERIC
...
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
In contrast, a freeze/thaw type migration may implement its freeze operation like this:
foobar_migrate_to() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is not running, bail out early
if ! foobar_monitor; then
ocf_log err "Resource is not running"
exit $OCF_ERR_GENERIC
fi
# actually freeze the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --freeze || exit $OCF_ERR_GENERIC
...
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
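migrate_from action
The migrate_from action can serve one of two purposes:
- Complete a native push type migration for the resource, that is, check whether the migration has succeeded and the resource is running on the local node. The resource agent learns its migration source from the $OCF_RESKEY_CRM_meta_migrate_source environment variable.
- Thaw the resource in a freeze/thaw type migration.
The example below illustrates a push type migration: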
foobar_migrate_from() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# After the resource has been migrated, check whether it resumed
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# run within the defined timeout, the cluster manager will
# consider the migrate_from action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not yet migrated, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
In contrast, a freeze/thaw type migration may implement its thaw operation like this:
foobar_migrate_from() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# actually thaw the resource here (make sure to immediately
# exit with an $OCF_ERR_ error code if anything goes seriously
# wrong)
ocf_run frobnicate --thaw || exit $OCF_ERR_GENERIC
# After the resource has been migrated, check whether it resumed
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# run within the defined timeout, the cluster manager will
# consider the migrate_from action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not yet migrated, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
notify action
With notifications, instances of clones (and of master/slave resources, which are an extended kind of clones) can inform each other about their state. When notifications are enabled, any action on any instance of a clone carries a pre and post notification. Then, the cluster manager invokes the notify operation on all clone instances. For notify operations, additional environment variables are passed into the resource agent during execution:
- $OCF_RESKEY_CRM_meta_notify_type: the notification type (pre or post)
- $OCF_RESKEY_CRM_meta_notify_operation: the operation (action) that the notification is about (start, stop, promote, demote etc.)
- $OCF_RESKEY_CRM_meta_notify_start_uname: node name of the node where the resource is being started (start notifications only)
- $OCF_RESKEY_CRM_meta_notify_stop_uname: node name of the node where the resource is being stopped (stop notifications only)
- $OCF_RESKEY_CRM_meta_notify_master_uname: node name of the node where the resource currently is in the Master role
- $OCF_RESKEY_CRM_meta_notify_promote_uname: node name of the node where the resource currently is being promoted to the Master role (promote notifications only)
- $OCF_RESKEY_CRM_meta_notify_demote_uname: node name of the node where the resource currently is being demoted to the Slave role (demote notifications only)
Notifications come in particularly handy for master/slave resources using a "pull" scheme, where the master is a publisher and the slave a subscriber. Since the master is obviously only available as such when a promotion has occurred, the slaves can use a "pre-promote" notification to configure themselves to subscribe to the right publisher.
Likewise, the subscribers may want to unsubscribe from the publisher after it has relinquished its master status, and a "post-demote" notification can be used for that purpose.
Consider the example below to illustrate the concept.
foobar_notify() {
local type_op
type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
ocf_log debug "Received $type_op notification."
case "$type_op" in
'pre-promote')
ocf_run frobnicate --slave-mode \
--master=$OCF_RESKEY_CRM_meta_notify_promote_uname \
|| exit $OCF_ERR_GENERIC
;;
'post-demote')
ocf_run frobnicate --unset-slave-mode || exit $OCF_ERR_GENERIC
;;
esac
return $OCF_SUCCESS
}
6. Script variables
This section outlines variables typically available to resource agents, primarily for convenience purposes. For additional variables available while the agent is being executed, refer to Section 2.1, “Environment variables” and Section 3, “Return codes”.
6.1. $OCF_ROOT
The root of the OCF resource agent hierarchy. This should never be changed by a resource agent. This is usually /usr/lib/ocf.
6.2. $OCF_FUNCTIONS_DIR
The directory where the resource agent shell function library, .ocf-shellfuncs, resides. This is usually defined in terms of $OCF_ROOT and should never be changed by a resource agent. This variable may, however, be overridden from the command line while testing a new or modified resource agent.
6.3. $OCF_RESOURCE_INSTANCE
The resource instance name. For primitive (non-clone, non-stateful) resources, this is simply the resource name. For clones and stateful resources, this is the primitive name, followed by a colon and the clone instance number (such as p_foobar:0).
6.4. $__OCF_ACTION
The currently invoked action. This is exactly the first command-line argument that the cluster manager specifies when it invokes the resource agent.
6.5. $__SCRIPT_NAME
The name of the resource agent. This is exactly the base name of the resource agent script, with leading directory names removed.
6.6. $HA_RSCTMP
A temporary directory for use by resource agents. The system startup sequence (on any LSB compliant Linux distribution) guarantees that this directory is emptied on system startup, so this directory will not contain any stale data after a node reboot.
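As a rough illustration of how these variables combine, the sketch below derives a per-instance state file path from $HA_RSCTMP and $OCF_RESOURCE_INSTANCE. The function name foobar_state_file and the fallback values are assumptions made so the snippet runs outside a cluster; they are not part of the shell function library.

```shell
#!/bin/sh
# Fallbacks are assumptions for running this sketch outside a cluster;
# under a cluster manager both variables are already set.
: "${HA_RSCTMP:=/tmp}"
: "${OCF_RESOURCE_INSTANCE:=p_foobar:0}"

# Hypothetical helper: build a state file path that is unique per
# resource instance and lives in the reboot-cleaned temp directory.
foobar_state_file() {
    # Clone instance names contain a colon (p_foobar:0); map it to an
    # underscore to keep the file name simple.
    printf '%s/foobar-%s.state\n' "$HA_RSCTMP" \
        "$(printf '%s' "$OCF_RESOURCE_INSTANCE" | tr ':' '_')"
}

foobar_state_file
```

Because the path lives under $HA_RSCTMP, a file created there would not survive a node reboot, which is usually what a state marker wants.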
7. Convenience functions
7.1. Logging: ocf_log
Resource agents should use the ocf_log function for logging purposes. This convenient logging wrapper is invoked as follows:
ocf_log <severity> "Log message"
It supports the following severity levels:
debug: for debugging messages. Most logging configurations suppress this level by default.
info: for informational messages about the agent’s behavior or status.
warn: for warnings. This is for any messages which reflect unexpected behavior that does not constitute an unrecoverable error.
err: for errors. As a general rule, this logging level should only be used immediately prior to an exit with the appropriate error code.
crit: for critical errors. As with err, this logging level should not be used unless the resource agent also exits with an error code. Very rarely used.
7.2. Testing for binaries: have_binary and check_binary
A resource agent may need to test for the availability of a specific executable. The have_binary convenience function comes in handy here:
if ! have_binary frobnicate; then
ocf_log warn "Missing frobnicate binary, frobnication disabled!"
fi
If a missing binary is a fatal problem for the resource, then the check_binary function should be used:
check_binary frobnicate
Using check_binary is a shorthand method for testing for the existence (and executability) of the specified binary, and exiting with $OCF_ERR_INSTALLED if it cannot be found or executed.
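For readers without the shell function library at hand, the behavior described above can be approximated with the POSIX command -v builtin. The names my_have_binary and my_check_binary are hypothetical stand-ins, not the library's implementation, and the fallback value for $OCF_ERR_INSTALLED is taken from the return code listed in Section 3.6.

```shell
#!/bin/sh
# Assumption: fall back to the documented value if the return code
# variable has not been set by .ocf-shellfuncs.
: "${OCF_ERR_INSTALLED:=5}"

# Hypothetical stand-in for have_binary: succeeds if the shell can
# resolve the name. Note that command -v also accepts shell builtins
# and functions, not only executables found via PATH.
my_have_binary() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical stand-in for check_binary: a missing binary is fatal.
my_check_binary() {
    if ! my_have_binary "$1"; then
        echo "Setup problem: couldn't find command: $1" >&2
        exit "$OCF_ERR_INSTALLED"
    fi
}

my_have_binary sh && echo "sh is available"
```

The split mirrors the two use cases in the text: the first function only reports, the second ends the agent with an installation error.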
7.3. Executing commands and capturing their output: ocf_run
Whenever a resource agent needs to execute a command and capture its output, it should use the ocf_run convenience function, invoked as in this example:
ocf_run frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
With the command specified above, the resource agent will invoke frobnicate --spam=eggs and capture its output and exit code. If the exit code is nonzero (indicating an error), ocf_run logs the command output with the err logging severity, and the resource agent subsequently exits. If the exit code is zero (indicating success), any command output will be logged with the info logging severity.
If the resource agent wishes to ignore the output of a successful command execution, it can use the -q flag with ocf_run. In the example below, ocf_run will only log output if the command exit code is nonzero.
ocf_run -q frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
Finally, if the resource agent wants to log the output of a command with a nonzero exit code with a severity other than error, it may do so by adding the -info or -warn option to ocf_run:
ocf_run -warn frobnicate --spam=eggs
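The capture-and-log behavior described in this section can be sketched with a small stand-in. The names my_run and log are hypothetical, the -q, -info and -warn options are deliberately left out, and the real function logs through ocf_log instead of printing.

```shell
#!/bin/sh
# Hypothetical logger: prints where a real agent would call ocf_log.
log() { echo "$1: $2"; }

# Hypothetical, simplified stand-in for ocf_run.
my_run() {
    # Run the command with its arguments, capturing stdout and stderr.
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        log err "command failed: $* (rc=$rc): $output"
    elif [ -n "$output" ]; then
        log info "$output"
    fi
    # Hand the command's exit code back to the caller.
    return "$rc"
}

my_run echo "frobnicated"
my_run ls /nonexistent/path || echo "caller sees the failure"
```

Returning the original exit code is what lets callers chain the wrapper with || exit, as in the examples above.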
7.4. Locks: ocf_take_lock and ocf_release_lock_on_exit
Occasionally, there may be different resources of the same type in a cluster configuration that should not execute actions in parallel. When a resource agent needs to guard against parallel execution on the same machine, it can use the ocf_take_lock and ocf_release_lock_on_exit convenience functions:
LOCKFILE=${HA_RSCTMP}/foobar
ocf_release_lock_on_exit $LOCKFILE
foobar_start() {
...
ocf_take_lock $LOCKFILE
...
}
ocf_take_lock attempts to acquire the designated $LOCKFILE. When it is unavailable, it sleeps a random amount of time between 0 and 1 seconds, and retries. ocf_release_lock_on_exit releases the lock file when the agent exits (for any reason).
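The acquire, retry and release-on-exit pattern can be illustrated without the library. The sketch below swaps in a different technique: it uses a lock directory, because mkdir is atomic, and it gives up after a fixed number of attempts. The names my_take_lock and my_release_lock_on_exit are hypothetical, and none of this is how ocf_take_lock is actually implemented.

```shell
#!/bin/sh
# Scratch directory so the demo is self-contained (assumption); a real
# agent would use a fixed path under $HA_RSCTMP so that all instances
# contend for the same lock.
WORKDIR=$(mktemp -d) || exit 1
LOCK="$WORKDIR/foobar.lock"

# Hypothetical stand-in for ocf_take_lock. Only one caller can create
# the directory; everyone else sleeps and retries, up to $2 attempts.
my_take_lock() {
    tries=${2:-30}
    until mkdir "$1" 2>/dev/null; do
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
}

# Hypothetical stand-in for ocf_release_lock_on_exit: the EXIT trap
# fires however the script terminates, and also removes the scratch dir.
my_release_lock_on_exit() {
    trap "rmdir '$1' 2>/dev/null; rmdir '$WORKDIR' 2>/dev/null" EXIT
}

my_release_lock_on_exit "$LOCK"
my_take_lock "$LOCK" && echo "lock acquired"
```

Registering the release before taking the lock, as the example above does, means the lock cannot be left behind if the agent dies between the two steps.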
7.5. Testing for numerical values: ocf_is_decimal
Specifically for parameter validation, it can be helpful to test whether a given value is numeric. The ocf_is_decimal function exists for that purpose:
foobar_validate_all() {
if ! ocf_is_decimal $OCF_RESKEY_eggs; then
ocf_log err "eggs is not numeric!"
exit $OCF_ERR_CONFIGURED
fi
...
}
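A check of this kind can be written with a plain case pattern. The function below is a hypothetical stand-in named my_is_decimal, shown only to make the accepted inputs concrete; it assumes that "numeric" means a non-empty string of decimal digits.

```shell
#!/bin/sh
# Hypothetical stand-in for ocf_is_decimal: accept only non-empty
# strings made up entirely of the digits 0-9.
my_is_decimal() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

for value in 42 4x2 ""; do
    if my_is_decimal "$value"; then
        echo "'$value' is numeric"
    else
        echo "'$value' is not numeric"
    fi
done
```

Under this definition, signed values such as -5 and fractions such as 1.5 are rejected, so parameters that allow them need a separate check.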
7.6. Testing for boolean values: ocf_is_true
When a resource agent defines a boolean parameter, the value for this parameter may be specified by the user as 0/1, true/false, or on/off. Since it is tedious to test for all these values from within the resource agent, the agent should instead use the ocf_is_true convenience function:
if ocf_is_true $OCF_RESKEY_superfrobnicate; then
ocf_run frobnicate --super
fi
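To show what such a test has to cover, here is a hypothetical stand-in named my_is_true. It handles only the spellings listed above (the library function may accept more), and the default for superfrobnicate is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for ocf_is_true, limited to the values
# mentioned in the text.
my_is_true() {
    case "$1" in
        1|true|on) return 0 ;;
        *)         return 1 ;;   # 0, false, off, empty, anything else
    esac
}

# Default assumed for illustration only.
: "${OCF_RESKEY_superfrobnicate:=off}"

if my_is_true "$OCF_RESKEY_superfrobnicate"; then
    echo "superfrobnication enabled"
else
    echo "superfrobnication disabled"
fi
```

In this sketch an empty or unset value counts as false, because it falls through to the catch-all branch.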
Note | |
---|---|
If |
7.7. Pseudo resources: ha_pseudo_resource
"Pseudo resources" are those where the resource agent does not actually start or stop something akin to a runnable process, but merely executes a single action and then needs some way of tracking whether that action has been executed or not. The portblock resource agent is an example of this.
Resource agents for pseudo resources can use a convenience function, ha_pseudo_resource, which makes use of tracking files to keep tabs on the status of a resource. If foobar was designed to manage a pseudo resource, then its start action could look like this:
foobar_start() {
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
# if resource is already running, bail out early
if foobar_monitor; then
ocf_log info "Resource is already running"
return $OCF_SUCCESS
fi
# start the pseudo resource
ha_pseudo_resource ${OCF_RESOURCE_INSTANCE} start
# After the resource has been started, check whether it started up
# correctly. If the resource starts asynchronously, the agent may
# spin on the monitor function here -- if the resource does not
# start up within the defined timeout, the cluster manager will
# consider the start action failed
while ! foobar_monitor; do
ocf_log debug "Resource has not started yet, waiting"
sleep 1
done
# only return $OCF_SUCCESS if _everything_ succeeded as expected
return $OCF_SUCCESS
}
8. Conventions
This section contains a collection of conventions that have emerged in the resource agent repositories over the years. Following these conventions is by no means mandatory for resource agent authors, but it is a good idea based on the Principle of Least Surprise: resource agents following these conventions will be easier to understand, review, and use than those that do not.
8.1. Well-known parameter names
Several parameter names are supported by a number of resource agents. For new resource agents, following these examples is generally a good idea:
binary: the name of a binary that principally manages the resource, such as a server daemon
config: the full path to a configuration file
pid: the full path to a file holding a process ID (PID)
log: the full path to a log file
socket: the full path to a UNIX socket that the resource manages
ip: an IP address that a daemon binds to
port: a TCP or UDP port that a daemon binds to
Needless to say, resource agents should only implement any of these parameters if they are sensible to use in the agent’s context.
8.2. Parameter defaults
Defaults for resource agent parameters should be set by initializing variables with the suffix _default:
# Defaults
OCF_RESKEY_superfrobnicate_default=0
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
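The ": ${VAR=default}" idiom used above assigns the default only when the variable is unset. The short demonstration below walks through three cases to make that behavior visible; it uses nothing beyond standard shell parameter expansion.

```shell
#!/bin/sh
OCF_RESKEY_superfrobnicate_default=0

# Case 1: the user did not configure the parameter, so the default applies.
unset OCF_RESKEY_superfrobnicate
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
echo "unset      -> $OCF_RESKEY_superfrobnicate"

# Case 2: a user-supplied value is left untouched.
OCF_RESKEY_superfrobnicate=1
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
echo "set to 1   -> $OCF_RESKEY_superfrobnicate"

# Case 3: "=" (unlike ":=") also preserves an explicitly empty value.
OCF_RESKEY_superfrobnicate=
: ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}
echo "set, empty -> '$OCF_RESKEY_superfrobnicate'"
```

The third case matters for validation: a parameter that is set but empty keeps its empty value, so the agent still needs to check for it.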
Note | |
---|---|
The resource agent should make sure that it sets a default for any parameter not marked as |
8.3. Honoring PATH for binaries
When a resource agent supports a parameter designed to hold the name of a binary (such as a daemon, or a client utility for querying status), then that parameter should honor the PATH environment variable. Do not supply full paths. Thus, the following approach:
# Good example -- do it this way
OCF_RESKEY_frobnicate_default="frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
is much preferred over specifying a full path, as shown here:
# Bad example -- avoid if you can
OCF_RESKEY_frobnicate_default="/usr/local/sbin/frobnicate"
: ${OCF_RESKEY_frobnicate="${OCF_RESKEY_frobnicate_default}"}
This rule holds for defaults, as well.
9. Special considerations
9.1. Licensing
Whenever possible, resource agent contributors are encouraged to use the GNU General Public License (GPL), version 2 and later, for any new resource agents. The shell functions library does not strictly mandate this, however, as it is licensed under the GNU Lesser General Public License (LGPL), version 2.1 and later (so it can be used by non-GPL agents).
The resource agent must explicitly state its own license in the agent source code.
9.2. Locale settings
When sourcing .ocf-shellfuncs as explained in Section 4.3, “Initialization”, any resource agent automatically sets LANG and LC_ALL to the C locale. Resource agents can thus expect to always operate in the C locale, and need not reset LANG or any of the LC_ environment variables themselves.
9.3. Testing for running processes
For testing whether a particular process (with a known process ID) is currently running, a frequently found method is to send it a 0 signal and catch errors, similar to this example:
if kill -s 0 `cat $daemon_pid_file`; then
ocf_log debug "Process is currently running"
else
ocf_log warn "Process is dead, removing pid file"
rm -f $daemon_pid_file
fi
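The same test can be wrapped in a helper that also copes with a missing or malformed PID file. The function name pid_file_is_running and the scratch file are hypothetical, introduced only for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID stored in the given file
# belongs to a live process that we are allowed to signal.
pid_file_is_running() {
    [ -r "$1" ] || return 1            # no readable pid file
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a number
    esac
    kill -s 0 "$pid" 2>/dev/null       # signal 0: existence check only
}

# Demonstration with this shell's own PID in a scratch file.
demo_pid_file=$(mktemp) || exit 1
echo "$$" > "$demo_pid_file"
if pid_file_is_running "$demo_pid_file"; then
    echo "Process is currently running"
fi
rm -f "$demo_pid_file"
```

Validating the file contents first avoids handing an empty or garbage argument to kill.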
9.4. Specifying a master preference
Stateful (master/slave) resources must set their own master preference; they can thus provide hints to the cluster manager about which instance is best to promote to the Master role.
For this purpose, crm_master comes in handy. This convenience wrapper around crm_attribute sets a node attribute named master-$OCF_RESOURCE_INSTANCE for the node it is being executed on, and fills this attribute with the specified value. The cluster manager is then expected to translate this into a promotion score for the corresponding instance, and base its promotion preference on that score.
Stateful resource agents typically execute crm_master during the monitor and/or notify action.
The following example assumes that the foobar resource agent can test the application’s status by executing a binary that returns certain exit codes based on whether the resource is running and fully caught up with the master, gracefully stopped, running but lagging behind the master, or failed.
foobar_monitor() {
local rc
# exit immediately if configuration is not valid
foobar_validate_all || exit $?
ocf_run frobnicate --test
# This example assumes the following exit code convention
# for frobnicate:
# 0: running, and fully caught up with master
# 1: gracefully stopped
# 2: running, but lagging behind master
# any other: error
case "$?" in
0)
rc=$OCF_SUCCESS
ocf_log debug "Resource is running"
# Set a high master preference. The current master
# will always get this, plus 1. Any current slaves
# will get a high preference so that if the master
# fails, they are next in line to take over.
crm_master -l reboot -v 100
;;
1)
rc=$OCF_NOT_RUNNING
ocf_log debug "Resource is not running"
# Remove the master preference for this node
crm_master -l reboot -D
;;
2)
rc=$OCF_SUCCESS
ocf_log debug "Resource is lagging behind master"
# Set a low master preference: if the master fails
# right now, and there is another slave that does
# not lag behind the master, its higher master
# preference will win and that slave will become
# the new master
crm_master -l reboot -v 5
;;
*)
ocf_log err "Resource has failed"
exit $OCF_ERR_GENERIC
esac
return $rc
}
10. Testing resource agents
This section discusses automated testing for resource agents. Testing is a vital aspect of development; it is crucial both for creating new resource agents, and for modifying existing ones.
10.1. Testing with ocf-tester
The resource agents repository (and hence, any installed resource agents package) contains a utility named ocf-tester. This shell script allows you to conveniently and easily test the functionality of your resource agent.
ocf-tester is commonly invoked, as root, like this:
ocf-tester -n <name> [-o <param>=<value> ... ] <resource agent>
<name> is an arbitrary resource name.
You may set any number of <param>=<value> pairs with the -o option, corresponding to any resource parameters you wish to set for testing.
<resource agent> is the full path to your resource agent.
When invoked, ocf-tester executes all mandatory actions and enforces action behavior as explained in Section 5, “Resource agent actions”.
It also tests for optional actions. Optional actions must behave as expected when advertised, but do not cause ocf-tester to flag an error if not implemented.
For example, you could run ocf-tester on the foobar resource agent as follows:
# ocf-tester -n foobartest \
-o superfrobnicate=true \
-o datadir=/tmp \
/home/johndoe/ra-dev/foobar
Beginning tests for /home/johndoe/ra-dev/foobar...
* Your agent does not support the notify action (optional)
* Your agent does not support the reload action (optional)
/home/johndoe/ra-dev/foobar passed all tests
10.2. Testing with ocft
ocft is a testing tool for resource agents. The main difference to ocf-tester is that ocft can automate creating complex testing environments. That includes package installation and arbitrary shell scripting.
10.2.1. ocft components
ocft consists of the following components:
A test case generator (/usr/sbin/ocft): generates shell scripts from test case configuration files
Configuration files (/usr/share/resource-agents/ocft/configs/): a configuration file contains environment setup and test cases for one resource agent
The generated test scripts are stored in /var/lib/resource-agents/ocft/cases/, but normally there is no need to inspect them
10.2.2. Customizing the testing environment
ocft modifies the runtime environment of the resource agent either by changing environment variables (through the interface defined by OCF) or by running ad-hoc shell scripts which can for instance change permissions of a file or unmount a file system.
10.2.3. How to test
You need to know the software (resource) you want to test. Draw a sketch of all interesting scenarios, with all expected and unexpected conditions and how the resource agent should react to them. Then you need to encode these conditions and the expected outcomes as ocft test cases. Running ocft is then simple:
# ocft make
# ocft test
The first subcommand generates the scripts for your test cases whereas the second runs them and checks the outcome.
10.2.4. ocft configuration file syntax
There are four top level options, each of which can contain one or more sub-options.
CONFIG (top level option)
This option is global and influences every test case.
AgentRoot (sub-option)
AgentRoot /usr/lib/ocf/resource.d/xxx
Normally, we assume that the resource agent lives under the heartbeat provider. Use AgentRoot to test an agent that is distributed by another vendor.
InstallPackage (sub-option)
InstallPackage package [package2 [...]]
Install packages necessary for testing. The installation is skipped if the packages have already been installed.
HangTimeout (sub-option)
HangTimeout secs
The maximum time allowed for a single RA action. If this timer expires, the action is considered failed.
SETUP-AGENT (top level option)
SETUP-AGENT
bash commands
If the RA needs to be initialized before testing, you can put bash code here for that purpose. The initialization is done only once. If you need to reinitialize, delete the /tmp/.[AGENT_NAME]_set stamp file.
CASE (top level option)
CASE "description"
This is the main building block of the test suite. Each test case is described in one CASE top level option.
One case consists of several suboptions, typically followed by the RunAgent suboption.
Var (sub-option)
Var VARIABLE=value
Sets an environment variable for the resource agent. These are usually of the form OCF_RESKEY_xxx. Note that there must be no blanks on either side of the "=".
Unvar (sub-option)
Unvar VARIABLE [VARIABLE2 [...]]
Removes the environment variable.
Include (sub-option)
Include macro_name
Includes the statements in macro_name. See below for a description of CASE-BLOCK.
Bash (sub-option)
Bash bash_codes
This option sets up the OS environment; you can insert arbitrary Bash code here to customize the system. Take care not to cause unrecoverable changes to the system.
BashAtExit (sub-option)
BashAtExit bash_codes
This option restores the OS environment so that the next test case can run correctly. Of course, you can also use the Bash option for that. However, if an error occurs along the way, the script quits immediately without running your recovery code. For that situation, use BashAtExit, which restores the system environment before the script quits.
RunAgent (sub-option)
RunAgent cmd [ret_value]
This option runs the resource agent. cmd is the action passed to the resource agent, such as start, status, or stop. The second parameter is optional. If given, the value actually returned by the resource agent is compared with the expected value; a mismatch indicates a bug.
It is also possible to execute a suboption on a remote host instead of locally. The protocol used is ssh and the command is run in the background. Just add the @<host> suffix to the suboption name. For instance:
Bash@<host> date
would run the date program on <host>.
NB: Not clear how can ssh be automated as we don’t know in advance the environment. Perhaps use "well-known" host names such as "node2"? Also, if the command runs in the background, it’s not clear how is the exit code checked. Finally, does Var@node make sense? Or is the current environment somehow copied over? We probably need an example here.
Need examples in general.
CASE-BLOCK (top level option)
CASE-BLOCK macro_name
The CASE-BLOCK option defines a macro which can be Included in any CASE. All CASE suboptions are valid in CASE-BLOCK.
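To tie the options above together, the fragment below sketches what a small configuration file for the foobar agent might look like. It uses only the option names described in this section. The macro names, the timeout value, the indentation, and the use of symbolic return code names as the expected value are assumptions, so check them against the configuration files shipped with your ocft version before relying on them.

```
# Hypothetical ocft configuration for the foobar agent
CONFIG
	HangTimeout 20

CASE-BLOCK required_args
	Var OCF_RESKEY_superfrobnicate=true

CASE-BLOCK prepare
	Include required_args
	RunAgent stop

CASE "normal start"
	Include prepare
	RunAgent start OCF_SUCCESS

CASE "monitor while stopped"
	Include prepare
	RunAgent monitor OCF_NOT_RUNNING
```

Each CASE includes the prepare macro first, so every test begins from a stopped resource with the same parameters set.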
11. Installing and packaging resource agents
This section discusses what to do with your resource agent once it is done and tested: where to install it, and how to include it in either your own application package or in the Linux-HA resource agents repository.
11.1. Installing resource agents
If you choose to include your resource agent in your own project, make sure it installs into the correct location. Resource agents should install into the /usr/lib/ocf/resource.d/<provider> directory, where <provider> is the name of your project or any other name you wish to identify the resource agent with.
For example, if your foobar resource agent is being packaged as part of a project named fortytwo, then the correct full path to your resource agent would be /usr/lib/ocf/resource.d/fortytwo/foobar. Make sure your resource agent installs with 0755 (-rwxr-xr-x) permission bits.
When installed this way, OCF-compliant cluster resource managers will be able to properly identify, parse, and execute your resource agent. The Pacemaker cluster manager, for example, would map the above-mentioned installation path to the ocf:fortytwo:foobar resource type identifier.
11.2. Packaging resource agents
When you package resource agents as part of your own project, you should apply the considerations outlined in this section.
Note | |
---|---|
If you instead prefer to submit your resource agent to the Linux-HA resource agents repository, see Section 11.3, “Submitting resource agents” for information on doing so. |
11.2.1. RPM packaging
It is recommended to put your OCF resource agent(s) in an RPM sub-package, with the name <toppackage>-resource-agents. Ensure that the package owns its provider directory, and depends on the upstream resource-agents package which lays out the directory hierarchy and provides convenience shell functions. An example RPM spec snippet is given below:
%package resource-agents
Summary: OCF resource agent for Foobar
Group: System Environment/Base
Requires: %{name} = %{version}-%{release}, resource-agents
%description resource-agents
This package contains the OCF-compliant resource agents for Foobar.
%files resource-agents
%defattr(755,root,root,-)
%dir %{_prefix}/lib/ocf/resource.d/fortytwo
%{_prefix}/lib/ocf/resource.d/fortytwo/foobar
11.2.2. Debian packaging
For Debian packages, as for RPMs, it is recommended to create a separate package holding your resource agents, which then should depend on the cluster-agents package.
Note | |
---|---|
This section assumes that you are packaging with |
An example debian/control snippet is given below:
Package: foobar-cluster-agents
Priority: extra
Architecture: all
Depends: cluster-agents
Description: OCF-compliant resource agents for Foobar
You will also create a separate .install file. Sticking with the example of installing the foobar resource agent as a sub-package of fortytwo, the debian/fortytwo-cluster-agents.install file could consist of the following content:
usr/lib/ocf/resource.d/fortytwo/foobar
11.3. Submitting resource agents
If you choose not to bundle your resource agent with your own package, but instead wish to submit it to the upstream resource agent repository hosted on the ClusterLabs repository on GitHub, please follow the steps outlined in this section.
Create a working copy (a Git clone) of the upstream repository with the following command:
git clone git://github.com/ClusterLabs/resource-agents
Then, copy your resource agent into the heartbeat subdirectory:
cd resource-agents/heartbeat
cp /path/to/your/local/copy/of/foobar .
chmod 0755 foobar
cd ..
Next, modify the Makefile.am file in resource-agents/heartbeat and add your new resource agent to the ocf_SCRIPTS list. This will make sure the agent is properly installed.
Lastly, open Makefile.am in resource-agents/doc/man and add ocf_heartbeat_<name>.7 to the man_MANS variable. This will automatically generate a resource agent manual page from its metadata, and then install that man page into the correct location.
Now, add your new resource agents, and the two modifications to the Makefiles, to your changeset:
git add heartbeat/foobar
git add heartbeat/Makefile.am
git add doc/man/Makefile.am
git commit
In your commit message, be sure to include a meaningful description, for example:
High: foobar: new resource agent
This new resource agent adds functionality to manage a foobar service.
It supports being configured as a primitive or as a master/slave set,
and also optionally supports superfrobnication.
Now the patch set is good for review on the mailing list:
git send-email [email protected]
git send-email will now roll all local commits not in the upstream repository into a nicely formatted email, and submit that to the mailing list. Please consult man git-send-email for details on configuring and using git send-email.
Once your new resource agent has been accepted for merging, one of the upstream developers will push your patch into the upstream repository. At that point, you can update your checkout from upstream, and remove your own patch set.
git reset --hard origin/master
git pull
11.4. Maintaining resource agents
If you maintain a specific resource agent, or you are making repeated contributions to the codebase, it’s usually a good idea to maintain your own fork of the ClusterLabs/resource-agents repository on GitHub.
To do so, fork the resource-agents repository.
As you work on resource agents, please commit early, and commit often. You can always fold commits later with git rebase -i.
Once you have made a number of changes that you would like others to review, push them to your GitHub fork and send a post to the linux-ha-dev mailing list pointing people to it.
After the review is done, fix up your tree with any requested changes, and then issue a pull request. There are two ways of doing so:
Use the git request-pull utility to get a pre-populated email skeleton summarizing your changesets. Add any information you see fit, and send it to the list. It is a good idea to prefix your email subject with [GIT PULL] so upstream maintainers can pick the message out easily.