A brief introduction to XenStore: Format and Interface

About

This document describes the format of the entries in XenStore, what they are used for, and how third-party applications should use XenStore as a management interface.

Overview

XenStore is a hierarchical namespace (similar to sysfs or Open Firmware) which is shared between domains. The interdomain communication primitives exposed by Xen are very low-level (virtual IRQ and shared memory). XenStore is implemented on top of these primitives and provides some higher level operations (read a key, write a key, enumerate a directory, notify when a key changes value).

XenStore is a database, hosted by domain 0, that supports transactions and atomic operations. It is accessible through a Unix domain socket in Domain-0, a kernel-level API, or an ioctl interface via /proc/xen/xenbus. XenStore should always be accessed through the functions defined in <xs.h>. XenStore is used to store information about the domains during their execution and as a mechanism for creating and controlling Domain-U devices.

XenBus is the in-kernel API used by virtual IO drivers to interact with XenStore.

General Format

There are three main paths in XenStore:

  • /vm - stores configuration information about a domain

  • /local/domain - stores information about the domain on the local node (domid, etc.)

  • /tool - stores information for the various tools

/vm

The /vm path stores configuration information for a domain. This information doesn't change and is indexed by the domain's UUID. A /vm entry contains the following information:

  • ssidref - ssid reference for domain

  • uuid - uuid of the domain (somewhat redundant)

  • on_reboot - the action to take on a domain reboot request (destroy or restart)

  • on_poweroff - the action to take on a domain halt request (destroy or restart)

  • on_crash - the action to take on a domain crash (destroy or restart)

  • vcpus - the number of allocated vcpus for the domain

  • memory - the amount of memory (in megabytes) for the domain. Note: appears to sometimes be empty for domain-0

  • vcpu_avail - the number of active vcpus for the domain (vcpus - number of disabled vcpus)

  • name - the name of the domain

/vm/<uuid>/image/

The image path is only available for Domain-Us and contains:

  • ostype - identifies the builder type (linux or vmx)

  • kernel - path to kernel on domain-0

  • cmdline - command line to pass to domain-U kernel

  • ramdisk - path to ramdisk on domain-0

/local

The /local path currently contains only one directory, /local/domain, which is indexed by domain id. It contains the running domain information. There are two storage areas because during migration the uuid doesn't change but the domain id does. The /local/domain directory can be created and populated before the migration is finalized, which enables localhost=>localhost migration.
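The split between the two trees can be illustrated with a pair of path helpers. This is a minimal sketch in plain Python: the function names `vm_path` and `domain_path`, the sample UUID, and the domain ids are made up for illustration and are not part of any Xen library.

```python
def vm_path(uuid):
    """Path of the configuration entry, keyed by the domain's UUID."""
    return "/vm/%s" % uuid


def domain_path(domid):
    """Path of the running-domain entry, keyed by the domain id."""
    return "/local/domain/%d" % domid


# Hypothetical domain: the UUID stays fixed while the domain id changes
# across a migration, so only the /local/domain path moves.
uuid = "00000000-0000-0000-0000-000000000001"
before = (vm_path(uuid), domain_path(3))
after = (vm_path(uuid), domain_path(7))
```

Because the /vm path is the same before and after, a tool that tracks a domain by UUID does not need to care which domain id it currently has.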

/local/domain/<domid>

This path contains:

  • cpu_time - xend start time (this is only around for domain-0)

  • handle - private handle for xend

  • name - see /vm

  • on_reboot - see /vm

  • on_poweroff - see /vm

  • on_crash - see /vm

  • vm - the path to the VM directory for the domain

  • domid - the domain id (somewhat redundant)

  • running - indicates that the domain is currently running

  • memory/ - a directory for memory information

    • target - target memory size for the domain (in kilobytes)

  • cpu - the current CPU the domain is pinned to (empty for domain-0?)

  • cpu_weight - the weight assigned to the domain

  • vcpu_avail - a bitmap telling the domain whether it may use a given VCPU

  • online_vcpus - how many vcpus are currently online

  • vcpus - the total number of vcpus allocated to the domain

  • console/ - a directory for console information

    • ring-ref - the grant table reference of the console ring queue

    • port - the event channel being used for the console ring queue (local port)

    • tty - the current tty the console data is being exposed on

    • limit - the limit (in bytes) of console data to buffer

  • backend/ - a directory containing all backends the domain hosts

    • vbd/ - a directory containing vbd backends

      • <domid>/ - a directory containing vbd's for domid

        • <virtual-device>/ - a directory for a particular virtual-device on domid

          • frontend-id - domain id of frontend

          • frontend - the path to the frontend domain

          • physical-device - backend device number

          • sector-size - backend sector size

          • sectors - backend number of sectors

          • info - device information flags. 1=cdrom, 2=removable, 4=read-only

          • domain - name of frontend domain

          • params - parameters for device

          • type - the type of the device

          • dev - frontend virtual device (as given by the user)

          • node - backend device node (output from block creation script)

          • hotplug-status - connected or error (output from block creation script)

          • state - communication state across XenBus to the frontend. 0=unknown, 1=initialising, 2=init. wait, 3=initialised, 4=connected, 5=closing, 6=closed

    • vif/ - a directory containing vif backends

      • <domid>/ - a directory containing vif's for domid

        • <vif number>/ - a directory for each vif

          • frontend-id - the domain id of the frontend

          • frontend - the path to the frontend

          • mac - the mac address of the vif

          • bridge - the bridge the vif is connected to

          • handle - the handle of the vif

          • script - the script used to create/stop the vif

          • domain - the name of the frontend

          • hotplug-status - connected or error (output from block creation script)

          • state - communication state across XenBus to the frontend. 0=unknown, 1=initialising, 2=init. wait, 3=initialised, 4=connected, 5=closing, 6=closed

  • device/ - a directory containing the frontend devices for the domain

    • vbd/ - a directory containing vbd frontend devices for the domain

      • <virtual-device>/ - a directory containing the vbd frontend for virtual-device

        • virtual-device - the device number of the frontend device

        • device-type - the device type ("disk", "cdrom", "floppy")

        • backend-id - the domain id of the backend

        • backend - the path of the backend in the store (/local/domain path)

        • ring-ref - the grant table reference for the block request ring queue

        • event-channel - the event channel used for the block request ring queue

        • state - communication state across XenBus to the backend. 0=unknown, 1=initialising, 2=init. wait, 3=initialised, 4=connected, 5=closing, 6=closed

    • vif/ - a directory containing vif frontend devices for the domain

      • <id>/ - a directory for vif id frontend device for the domain

        • backend-id - the backend domain id

        • mac - the mac address of the vif

        • handle - the internal vif handle

        • backend - a path to the backend's store entry

        • tx-ring-ref - the grant table reference for the transmission ring queue

        • rx-ring-ref - the grant table reference for the receiving ring queue

        • event-channel - the event channel used for the two ring queues

        • state - communication state across XenBus to the backend. 0=unknown, 1=initialising, 2=init. wait, 3=initialised, 4=connected, 5=closing, 6=closed

  • device-misc/ - miscellaneous information for devices

    • vif/ - miscellaneous information for vif devices

      • nextDeviceID - the next device id to use

  • store/ - per-domain information for the store

    • port - the event channel used for the store ring queue

    • ring-ref - the grant table reference used for the store's communication channel

  • image - private xend information
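Several of the entries above hold small encoded values (the `state` numbers, the vbd `info` flags, the `vcpu_avail` bitmap). The sketch below decodes them in plain Python. It assumes the values arrive as decimal strings and that bit i of `vcpu_avail` corresponds to VCPU i; neither assumption is stated in this document, and the function names are invented for the example.

```python
# XenBus state values as listed in the `state` entries above.
XENBUS_STATES = {
    0: "unknown",
    1: "initialising",
    2: "init. wait",
    3: "initialised",
    4: "connected",
    5: "closing",
    6: "closed",
}

# Bit flags for the vbd backend `info` entry.
VBD_INFO_FLAGS = ((1, "cdrom"), (2, "removable"), (4, "read-only"))


def decode_state(value):
    """Map a state value read from the store to its name."""
    return XENBUS_STATES.get(int(value), "unknown")


def decode_vbd_info(value):
    """Return the names of the flags set in a vbd `info` value."""
    bits = int(value)
    return [name for flag, name in VBD_INFO_FLAGS if bits & flag]


def usable_vcpus(vcpu_avail, vcpus):
    """List the VCPU indices whose bit is set (assumes bit i = VCPU i)."""
    bitmap = int(vcpu_avail)
    return [i for i in range(vcpus) if bitmap & (1 << i)]
```

For instance, under these assumptions an `info` value of "5" would describe a read-only cdrom, and a `vcpu_avail` of "5" with four allocated vcpus would allow VCPUs 0 and 2.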

Interacting with the XenStore

The XenStore interface provides transaction-based reads and writes to points in the XenStore hierarchy. Watches can be set at points in the hierarchy, and an individual watch will be triggered when anything at or below that point in the hierarchy changes. A watch is registered with a callback function and a "token". The "token" can be a pointer to any piece of data. The callback function is invoked with the path of the changed node and the "token".

The interface is centered around the idea of a central polling loop that reads fired watches, each providing the path, callback, and token, and invokes the callback.
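The watch semantics described here can be modelled with a toy in-memory store. This is only an illustration of the "at or below" matching and of the polling loop; `ToyStore` and its methods are hypothetical and do not correspond to the real XenStore API.

```python
class ToyStore:
    """In-memory stand-in for the store; not the real XenStore API."""

    def __init__(self):
        self.data = {}
        self.watches = []   # (watched_path, callback, token)
        self.pending = []   # (changed_path, callback, token)

    def watch(self, path, callback, token):
        self.watches.append((path, callback, token))

    def write(self, path, value):
        self.data[path] = value
        for wpath, callback, token in self.watches:
            # Fire for the watched node itself or anything below it.
            if path == wpath or path.startswith(wpath.rstrip("/") + "/"):
                self.pending.append((path, callback, token))

    def poll(self):
        """One pass of the central loop: deliver queued watch events."""
        events, self.pending = self.pending, []
        for path, callback, token in events:
            callback(path, token)
        return len(events)


seen = []
store = ToyStore()
store.watch("/local/domain/1/device",
            lambda path, token: seen.append((path, token)), "mytoken")
store.write("/local/domain/1/device/vif/0/state", "4")  # below the watch
store.write("/local/domain/1/name", "guest")            # outside the watch
delivered = store.poll()
```

Only the first write lands under the watched directory, so a single event is queued and the callback receives the changed path together with the token it was registered with.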

API Usage Examples

These code snippets should provide a helpful starting point.

C

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <xs.h>

/* Minimal error handler for this example: report and exit. */
static void error(void)
{
    perror("xenstore example");
    exit(EXIT_FAILURE);
}

int main(void)
{
    struct xs_handle *xs;
    xs_transaction_t th;
    unsigned int domid = 0; /* example value; use the domain you care about */
    char *path;
    int fd;
    fd_set set;
    bool ok;
    char **vec;
    unsigned int num_strings;
    char *buf;
    unsigned int len;

    /* Get a connection to the daemon */
    xs = xs_daemon_open();
    if (xs == NULL) error();

    /* Get the local domain path */
    path = xs_get_domain_path(xs, domid);
    if (path == NULL) error();

    /* Make space for our node on the path */
    path = realloc(path, strlen(path) + strlen("/mynode") + 1);
    if (path == NULL) error();
    strcat(path, "/mynode");

    /* Create a watch on /local/domain/%d/mynode. */
    ok = xs_watch(xs, path, "mytoken");
    if (!ok) error();

    /* We are notified of read availability on the watch via the
     * file descriptor. */
    fd = xs_fileno(xs);
    while (1)
    {
        /* TODO (TimPost), show a simpler example with poll()
         * in a modular style, using a simple callback. Most
         * people think 'inotify' when they see 'watches'. */
        FD_ZERO(&set);
        FD_SET(fd, &set);

        /* Block until data is available (a NULL timeout avoids
         * spinning, which a zero timeout would do). */
        if (select(fd + 1, &set, NULL, NULL, NULL) > 0
            && FD_ISSET(fd, &set))
        {
            /* num_strings will be set to the number of elements in vec
             * (typically, 2 - the watched path and the token) */
            vec = xs_read_watch(xs, &num_strings);
            if (!vec) error();
            printf("vec contents: %s|%s\n", vec[XS_WATCH_PATH],
                   vec[XS_WATCH_TOKEN]);

            /* Prepare a transaction and do a read. */
            th = xs_transaction_start(xs);
            buf = xs_read(xs, th, vec[XS_WATCH_PATH], &len);
            xs_transaction_end(xs, th, false); /* false = commit */
            if (buf)
            {
                printf("buflen: %u\nbuf: %s\n", len, buf);
                free(buf);
            }

            /* Prepare a transaction and do a write. Note that this
             * changes the watched node, so the watch fires again. */
            th = xs_transaction_start(xs);
            ok = xs_write(xs, th, path, "somestuff", strlen("somestuff"));
            if (!xs_transaction_end(xs, th, false) || !ok) error();

            free(vec);
        }
    }

    /* Cleanup (not reached here, because the loop above never exits).
     * xs_daemon_close() also closes the descriptor from xs_fileno(). */
    xs_daemon_close(xs);
    free(path);
    return 0;
}

Python

# xsutil provides access to xshandle(), which allows you to use something closer to the C-style API;
# however, it does not support polling in the same manner.
from xen.xend.xenstore.xsutil import *
# xswatch provides a callback interface for the watches. A similar interface exists for C within xenbus.
from xen.xend.xenstore.xswatch import *
# log is xend's logger, used in the callback below.
from xen.xend.XendLogging import log

xs = xshandle()  # From xsutil
path = xs.get_domain_path() + "/mynode"

# Watch functions take the path as the first argument;
# all other arguments that are passed via the xswatch are also included.
def watch_func(path, xs):
    # Read the data
    th = xs.transaction_start()
    buf = xs.read(th, path)
    xs.transaction_end(th)
    log.info("Got %s" % buf)
    # Write back
    th = xs.transaction_start()
    xs.write(th, path, "somestuff")
    xs.transaction_end(th)

# Register the callback; the extra argument (xs) is passed through to watch_func.
mywatch = xswatch(path, watch_func, xs)

You can use direct Read/Write or gather calls via xstransact.

By default the python xsutil.xshandle() is a shared global handle. xswatch uses this handle with a blocking read_watch call. Because the read_watch function is protected by a per-handle mutex, multiple calls will be interleaved and you probably do not want this behavior. If you would like a blocking mechanism, you might consider introducing a semaphore in the callback function that can be used to block code execution. You need to be sure to handle failure cases and not block indefinitely. For instance, the "@releaseDomain" watch will be triggered on domain destruction for watches within the /local/domain/* trees.
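The semaphore idea can be sketched as follows. This is stand-alone Python 3 using only the standard `threading` module, not the xend modules; `NodeWaiter` and its methods are hypothetical names, and a timer thread stands in for the watch thread that would normally invoke the callback.

```python
import threading


class NodeWaiter:
    """Lets the main path block until a watch callback reports a value."""

    def __init__(self):
        self._sem = threading.Semaphore(0)
        self.value = None

    def callback(self, path, value):
        # In real use this would run on the watch thread.
        self.value = value
        self._sem.release()

    def wait(self, timeout):
        # Returns the value, or None if nothing arrived in time.
        if self._sem.acquire(timeout=timeout):
            return self.value
        return None


waiter = NodeWaiter()
# Simulate the watch firing shortly after we start waiting.
timer = threading.Timer(0.05, waiter.callback,
                        args=("/local/domain/1/mynode", "somestuff"))
timer.start()
result = waiter.wait(timeout=2.0)
timer.join()

# With no callback at all, the wait gives up instead of blocking forever.
timed_out = NodeWaiter().wait(timeout=0.05)
```

The timeout is what covers the failure cases mentioned above: if the domain goes away and the expected node never appears, the waiting code regains control and can decide what to do next.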

It is also possible -- currently indirectly -- to get a fresh XenStore handle within Python and block on read_watch in the main execution path. This may be necessary if you want to block waiting for a XenStore node value in a code path initiated by an xswatch callback.

N.B.: Changes subject to http://wiki.xensource.com/xenwiki/XenStoreReference
