Cloud Foundry BOSH Introduction

Cloud Foundry is the first open source PaaS in the industry. It supports multiple frameworks, multiple services and multiple cloud providers. BOSH was originally created in the context of the Cloud Foundry project. Nevertheless, it is a general tool chain for the deployment and lifecycle management of large-scale distributed services. In the next few blog posts, I will walk you through the process of installing the Cloud Foundry platform using BOSH.

Cloud Foundry contains a number of components. The most important ones are the Cloud Controller, NATS, the Router, the Health Manager and the DEA. An introduction to these components can be found here: http://blog.cloudfoundry.com/2011/04/19/cloud-foundry-open-paas-deep-dive/ . The components are designed so that the system is horizontally scalable: a Cloud Foundry instance can run one or more copies of each component to handle the load placed on the cloud, and the components can be deployed across multiple nodes.

BOSH is the tool we use to deploy the components of Cloud Foundry onto distributed nodes. (In a virtualized environment, we use the term “node” interchangeably with “virtual machine” or VM.) Before we move on to the details of a real deployment, let’s briefly look at how BOSH works when it deploys a system. We strongly suggest you read the official BOSH documentation here: https://github.com/cloudfoundry/oss-docs/blob/master/bosh/documentation/documentation.md

BOSH is a recursive acronym for BOSH Outer SHell. In contrast to the “Outer Shell”, the system being deployed and managed by BOSH is called the “Inner Shell”. The diagram below illustrates a simplified model of BOSH.

BOSH can be thought of as a server, or a robot, that orchestrates the deployment process of a distributed system. Users interact with it through the BOSH Command Line Interface (CLI), which is a Ruby tool. Before BOSH starts to deploy a system, it needs three prerequisites: a stemcell, a release (the software to be installed), and a deployment manifest. Let’s look at these three items in more detail.

 Stemcells: In a cloud platform, VMs are usually cloned from a template. A stemcell is a VM template containing a standard Ubuntu distribution. A BOSH agent is also embedded in the template so that BOSH can take control of the VMs cloned from the stemcell. The name “stemcell” comes from the biological term “stem cells”, which refers to undifferentiated cells that can later grow into diverse cell types. Similarly, VMs created from a BOSH stemcell are identical at the beginning. After creation, the VMs are configured with different CPU, memory, storage and network settings, and have different software packages installed. Hence, VMs built from the same stemcell template behave differently.
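To make the stemcell idea concrete, here is a minimal Python sketch. It is not BOSH code: the `VM` class, the `clone_from_stemcell` helper, the stemcell name and the job names are all hypothetical and exist only to illustrate that VMs start out identical and are differentiated afterwards.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class VM:
    """Hypothetical model of a VM cloned from a stemcell (illustration only)."""
    stemcell: str
    cpus: int = 1
    memory_mb: int = 1024
    job: Optional[str] = None


def clone_from_stemcell(stemcell: str, count: int) -> List[VM]:
    """Return `count` VMs that are identical copies of the same template."""
    return [VM(stemcell=stemcell) for _ in range(count)]


# Freshly cloned VMs are indistinguishable from one another.
pool = clone_from_stemcell("example-ubuntu-stemcell", 2)

# Differentiation: each VM gets its own resources and its own job.
router_vm = replace(pool[0], cpus=2, memory_mb=2048, job="router")
dea_vm = replace(pool[1], cpus=4, memory_mb=8192, job="dea")
```

Both differentiated VMs still record the same stemcell, which mirrors the point above: the template is shared, the resulting machines are not.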

 Releases: A release contains the collections of software bits and configurations that will be installed onto the target system. Each VM is deployed with a collection of software, which is called a job. Configurations are usually templates that contain parameters such as IP address, port number, user name, password and domain name. These parameters are replaced at deploy time by the properties defined in a deployment manifest file.

 Deployments: A deployment is what turns a static release into runnable software on VMs. A deployment manifest defines the actual values of the parameters needed by a deployment. During the deployment process, BOSH substitutes these values for the parameters in the release and runs the software with the planned configuration.
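The substitution of manifest properties into release templates can be sketched in a few lines of Python. This only illustrates the idea and does not reflect BOSH's own template mechanism; the template text, the property names and the values below are made up for the example.

```python
from string import Template

# A configuration template as it might ship inside a release (hypothetical).
config_template = Template(
    "listen_address: $ip\n"
    "listen_port: $port\n"
    "user: $user\n"
)

# Properties as they might be defined in a deployment manifest (hypothetical).
manifest_properties = {"ip": "10.0.0.5", "port": 8080, "user": "admin"}

# At deploy time, the placeholders are replaced by the manifest values.
# substitute() raises KeyError if the manifest omits a required property.
rendered = config_template.substitute(manifest_properties)
print(rendered)
```

The same release can therefore be deployed to different environments simply by supplying a different set of properties.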

When the above three items are ready, they are uploaded to BOSH with the BOSH CLI tool. After that, a BOSH installation of a distributed system typically has the following major steps:

1) If some packages in the release require compilation, BOSH first creates a few temporary VMs (worker VMs) to compile them. After compiling the packages, BOSH destroys the worker VMs and stores the binaries in its internal blobstore.

2) BOSH creates a pool of VMs, which will be the nodes the release is deployed on. These VMs are cloned from the stemcell and have a BOSH agent installed.

3) For each job of the release, BOSH picks a VM from the pool and updates its configuration according to the deployment manifest. The configuration may include the IP address, the persistent disk size, and so on.

4) When the reconfiguration of the VM is completed, BOSH sends commands to the agent inside the VM. The commands tell the agent to install software packages. During the installation, the agent may download packages from BOSH and install them. When the installation finishes, the agent runs the start script to launch the job of the VM.

5) BOSH repeats steps 3 and 4 until all jobs are deployed and launched. The jobs can be deployed simultaneously or sequentially; the value “max_in_flight” in the manifest file controls this behavior. When it is 1, the jobs are deployed one by one, which is useful on a slow system to avoid timeouts caused by resource congestion. When it is greater than 1, jobs are deployed in parallel.
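The effect of “max_in_flight” described in step 5 can be sketched in Python with a bounded worker pool. This is only a model of the behavior as described here, not BOSH's actual scheduler; `deploy_job` and `deploy_all` are hypothetical helpers, and `deploy_job` merely stands in for steps 3 and 4.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def deploy_job(job_name: str) -> str:
    """Stand-in for steps 3 and 4: configure a VM, install and start the job."""
    return f"{job_name}: running"


def deploy_all(jobs: List[str], max_in_flight: int = 1) -> List[str]:
    """Deploy every job, with at most `max_in_flight` deployments at a time.

    With max_in_flight=1 the pool has a single worker, so jobs are deployed
    one by one; larger values allow that many deployments to run in parallel.
    Results are returned in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as executor:
        return list(executor.map(deploy_job, jobs))


jobs = ["nats", "router", "cloud_controller", "health_manager", "dea"]
sequential = deploy_all(jobs, max_in_flight=1)  # one job at a time
parallel = deploy_all(jobs, max_in_flight=3)    # up to three jobs at once
```

Either setting produces the same end state; the value only trades deployment speed against the load placed on the underlying infrastructure.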

More on deploying Cloud Foundry on vSphere using BOSH: