http://www.mirantis.com/blog/bare-metal-provisioning-with-openstack-cloud/
Many people refer to ‘cloud’ and ‘virtualization’ in the same breath, and from there assume that the cloud is all about managing the virtual machines that run on your hypervisor. Currently, OpenStack supports virtual machine management through a number of hypervisors, the most widespread being KVM and Xen.
As it turns out, in certain circumstances virtualization is not optimal, for example when there are substantial performance requirements (e.g., I/O and CPU) that are incompatible with the overhead of virtualization. However, it’s still very convenient to use OpenStack features such as instance management, image management, authentication services and so forth for IaaS use cases that require provisioning on bare metal. To address these cases, we implemented a driver for OpenStack compute, Nova, that supports bare-metal provisioning.
When we undertook our first bare-metal provisioning implementation, there was already code from USC/ISI that supported bare-metal provisioning on Tilera hardware. We weren’t going to target Tilera hardware, but the other parts of that bare-metal implementation were quite useful. NTT Docomo also had code that supported a more generic scheme using PXE boot and an IPMI-based power manager, but it took some time to open source, so we had to start developing the generic backend before the NTT Docomo code became available.
A blueprint on bare-metal provisioning can be found on the OpenStack Wiki here: General Bare Metal Provisioning Framework.
Our driver implements the standard interface for an OpenStack hypervisor driver, with the difference that it doesn’t actually talk to any hypervisor. Instead, it manages a pool of physical nodes. Each physical node can host only one “Virtual” (sorry for the pun) Machine (VM) instance. When a new provisioning request arrives, the driver chooses a physical host from the pool to place this VM on, and the VM stays there until it is destroyed. The operator can add, remove, and modify the physical nodes in the pool.
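To make the pool idea concrete, here is a minimal sketch in Python. It is not the driver's actual code: the class and field names (`PhysicalNode`, `NodePool`, `spawn`, `destroy`) are hypothetical, and the first-fit selection rule is an assumption, since the post does not say how the driver picks a host.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PhysicalNode:
    """One bare-metal machine in the pool (hypothetical record)."""
    node_id: str
    cpus: int
    ram_mb: int
    hdd_gb: int
    instance_id: Optional[str] = None  # None means the node is free


class NodePool:
    """Tracks which physical node hosts which instance (at most one each)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, PhysicalNode] = {}

    def add_node(self, node: PhysicalNode) -> None:
        self._nodes[node.node_id] = node

    def remove_node(self, node_id: str) -> None:
        node = self._nodes[node_id]
        if node.instance_id is not None:
            raise RuntimeError(f"node {node_id} is still in use")
        del self._nodes[node_id]

    def spawn(self, instance_id: str, cpus: int, ram_mb: int,
              hdd_gb: int) -> PhysicalNode:
        """Bind the instance to the first free node that is large enough.

        First-fit is an assumption made for this sketch.
        """
        for node in self._nodes.values():
            if (node.instance_id is None and node.cpus >= cpus
                    and node.ram_mb >= ram_mb and node.hdd_gb >= hdd_gb):
                node.instance_id = instance_id
                return node
        raise LookupError("no free node satisfies the request")

    def destroy(self, instance_id: str) -> None:
        """Release the node that hosts the given instance."""
        for node in self._nodes.values():
            if node.instance_id == instance_id:
                node.instance_id = None
                return
        raise KeyError(instance_id)
```

The point of the sketch is the one-instance-per-node invariant: a node is either free or bound to exactly one instance, and it only becomes available again after `destroy`.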
[Figure: bare-metal provisioning architecture]
The main components related to the bare-metal provisioning support are:
- nova-compute with the bare-metal driver: The bare-metal driver itself consists of several components.
- dnsmasq: A netboot environment for instance provisioning.
- nova-baremetal-agent: The agent that runs on bootstrap-linux (see the next bullet) and executes the provisioning tasks spawned by the bare-metal driver.
- bootstrap-linux: A tiny Linux image that is booted over the network and performs basic initialization. It is based on Tiny Core Linux and contains a basic set of packages, such as Python to run nova-baremetal-agent (which is implemented in Python) and curl to download an image from Glance. Additionally, it contains an init script that downloads nova-baremetal-agent using curl and executes it.
- nova-baremetal-service: A service that is responsible for orchestrating the provisioning tasks (the tasks themselves are applied by nova-baremetal-agent directly to the bare-metal server it is running on).
Let’s see what each component actually does in the course of provisioning a new VM (i.e., when you call nova boot). I’ll skip the details of the request up to the point where it reaches nova-compute and the spawn request is handed to our bare-metal driver.
The following diagram illustrates this workflow:
[Figure: bare-metal provisioning flow]
- [...] dnsmasq.
- [...] nova-baremetal-service (which provides a REST interface for that).
- nova-baremetal-agent polls the nova-baremetal-service REST service for tasks.
- nova-baremetal-service sees a task for this node and sends a response with the task, which includes a URL for the image in Glance and the authentication token needed to fetch it.
- nova-baremetal-agent fetches the image from the URL specified in the task, writes it to the hard drive with ‘dd’, and then informs nova-baremetal-service that it’s done with the task.
- Once nova-baremetal-service is notified about task completion, it informs the driver that it’s time to reboot the node.
A typical configuration for the compute will look like this:

    ...
    --connection_type=baremetal  # baremetal support
    --baremetal_driver=generic  # target generic hardware, i.e. IPMI management and PXE boot
    --networkmgr_driver=nova.virt.baremetal.networkmgr.juniper.JuniperNetworkManager  # use Juniper network manager
    --powermanager_driver=nova.virt.baremetal.powermgr.freeipmi.FreeIPMIPowerManager  # use FreeIPMI-based power management
    ...

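Returning to the provisioning flow described earlier, the exchange between the agent and the service can be sketched as follows. This is an illustration only: the real nova-baremetal-service exposes a REST interface, whereas this sketch replaces it with an in-memory object, and every name here (`Task`, `TaskService`, `run_agent_once`, the `fetch` and `write_disk` callables) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Task:
    image_url: str   # where the agent downloads the image from (Glance in the post)
    auth_token: str  # token the agent presents when downloading


class TaskService:
    """In-memory stand-in for nova-baremetal-service (assumption: no HTTP here)."""

    def __init__(self, on_done: Callable[[str], None]) -> None:
        self._pending: Dict[str, Task] = {}
        self._on_done = on_done  # e.g. tells the driver to reboot the node

    def submit(self, node_id: str, task: Task) -> None:
        self._pending[node_id] = task

    def poll(self, node_id: str) -> Optional[Task]:
        """Return the task queued for this node, or None if there is none."""
        return self._pending.get(node_id)

    def complete(self, node_id: str) -> None:
        del self._pending[node_id]
        self._on_done(node_id)


def run_agent_once(node_id: str,
                   service: TaskService,
                   fetch: Callable[[str, str], bytes],
                   write_disk: Callable[[bytes], None]) -> bool:
    """One polling iteration of the agent; returns True if a task was handled."""
    task = service.poll(node_id)
    if task is None:
        return False
    image = fetch(task.image_url, task.auth_token)  # download the image
    write_disk(image)                               # the post uses 'dd' for this
    service.complete(node_id)                       # report completion
    return True
```

Passing `fetch` and `write_disk` in as callables keeps the sketch free of network and disk access, so the control flow (poll, fetch, write, report) can be exercised on its own.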
But before the system becomes useful, we have to register switches and nodes. Information about them is stored in the database. We have created an extension for the OpenStack REST API to manage these objects, along with two CLI clients for it: nova-baremetal-switchmanager and nova-baremetal-nodemanager. Let’s use them to show how to add new switches and nodes.
Switches can be added using a command like this:

    nova-baremetal-switchmanager add <ip> <user> <passwd> <driver> <description>

You have to specify the IP address of the switch, credentials for the manager user, which switch driver to use, and an optional description.
nova-baremetal-switchmanager also supports other essential commands like list and delete. Once we have at least one switch, we can start adding nodes:

    nova-baremetal-nodemanager add <ip> <mac_addr> <cpus> <ram> <hdd> <ipmihost> <ipmiuser> <ipmipass> <switchid> <switchport>

As you can see, it takes a few more arguments: the IP address of the node, the MAC address of its first network interface (used to identify the node), the number of CPUs, the amount of RAM in MB, the HDD capacity in GB, the IPMI information, the ID of the switch it’s connected to, and the name of the port on that switch.
After successful execution of this command, the specified node will be added to the pool. With nova-baremetal-nodemanager you can also list and remove nodes in the pool with the list and delete commands, respectively.
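When registering many nodes, it can help to build the command line from a structured record instead of typing the positional arguments by hand. The sketch below only assembles the argument list in the order shown above and does not run anything. The `NodeSpec` class, its field names, and the sample values are hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeSpec:
    """Arguments of `nova-baremetal-nodemanager add`, in positional order."""
    ip: str
    mac_addr: str     # MAC of the first network interface, identifies the node
    cpus: int
    ram_mb: int
    hdd_gb: int
    ipmi_host: str
    ipmi_user: str
    ipmi_pass: str
    switch_id: str
    switch_port: str

    def add_command(self) -> List[str]:
        """Return the argv list for registering this node."""
        return [
            "nova-baremetal-nodemanager", "add",
            self.ip, self.mac_addr,
            str(self.cpus), str(self.ram_mb), str(self.hdd_gb),
            self.ipmi_host, self.ipmi_user, self.ipmi_pass,
            self.switch_id, self.switch_port,
        ]


# Sample values are made up for illustration.
node = NodeSpec(
    ip="10.0.0.21", mac_addr="00:11:22:33:44:55",
    cpus=8, ram_mb=16384, hdd_gb=500,
    ipmi_host="10.0.1.21", ipmi_user="admin", ipmi_pass="secret",
    switch_id="1", switch_port="ge-0/0/1",
)
argv = node.add_command()
print(shlex.join(argv))  # shell-quoted form, suitable for a log or a script
```

Keeping the arguments as a list (rather than one string) means the same `argv` can be passed directly to `subprocess.run` without shell quoting issues.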
Bare metal has proved to be a useful and stable feature for our customers. It has other specific features, such as networking management and image preparation, that we will cover in upcoming posts.