The Inspiration Behind Open Source Project Harbor

About Project Harbor

Project Harbor is an enterprise-class registry server that stores and distributes Docker images. Harbor extends the open source Docker Distribution by adding the functionalities usually required by an enterprise, such as security, identity and replication.

The Inspiration Behind Project Harbor

When I attended container meetups and conferences in early 2014, I often heard people complaining about container image management challenges. They usually created various hacks or workarounds to solve their problems. When I saw pain points like these, I had a gut feeling there would be a great opportunity to create a solution addressing these challenges. Shortly after these discussions, we started a side project for managing container images. And that’s where Project Harbor began.

Why We Chose to Open Source Project Harbor

Originally, we dogfooded our project within the VMware China R&D Center. We used Harbor in a few internal projects and received positive feedback from our teams. In March 2016, we ultimately decided to open source Project Harbor on Github for larger adoption and more feedback.

How We Landed On the Name “Harbor”

We chose a name related to containers. Harbor is a place where containers are loaded on or unloaded from ships. Moreover, the word “Harbor” is simple and can be easily pronounced and remembered, making it a strong choice for project promotion.

The People Behind Project Harbor

At the beginning, only about six people were involved with the project—mostly engineers and interns in our Advanced Technology Center (ATC) team at VMware China R&D. Gradually, community users started to join forces with us, collaborating to help improve the project. Currently, there are approximately 50 contributors, and about two-thirds are outside of VMware.

Project Harbor Momentum

Since Project Harbor was open sourced last year, it has gained significant traction in terms of adoption and new contributors. I think Project Harbor has seen substantial momentum due to several factors. First, it addressed real pain points and solved many problems for container users. Second, Harbor is open source and has an open community mindset; our actions reflect constant user feedback and suggestions to ensure improvement. We also work with partners in the ecosystem to build products or create solutions using Harbor. Third, we promoted Harbor through social media channels such as WeChat, blogs and Twitter.

I look forward to sharing more about Project Harbor, answering any community questions and sharing my thoughts on other open source projects in the future.

This blog was originally posted at

Project Harbor Reached Milestone of 2000 Stars on Github

About a year ago, I gave the first star to an open source project we created. In less than 13 months, the project has reached an exciting milestone of 2000 stars! This project is called Harbor, an enterprise-class registry server.

People from different countries starred Project Harbor on Github

Back in early 2014, when I attended Docker meetups and container conferences, I often heard people complaining about the challenges of managing container images. They usually created all kinds of hacks or workarounds to solve their own problems. When I saw pain points like these, my gut feeling told me that there must be a great opportunity to do something about it.

We then started to work on a side project to help people manage images effectively. This project became the prototype of Harbor. It was used by a few project teams and turned out to be quite helpful. In March 2016, we decided to open source it on GitHub for larger adoption. Since then, Project Harbor has taken off and has been gaining more and more traction. We listened to feedback from the community and kept improving it. Community developers were enthusiastic, and they contributed code, tools, documentation and even translations into multiple languages. Two-thirds of the contributors were actually from outside of VMware.

Gradually, Harbor has become one of the most popular open source registries and is widely used by people in the container space. VMware has also integrated Harbor into two products: vSphere Integrated Containers and Photon Platform. Many users run Harbor in production, including one of the largest internet companies in China. Other companies have also forked Harbor and used it in their own products. Below are some statistics of Project Harbor.

Project Harbor Statistics

The current version of Harbor provides some important features to enterprise users, such as RBAC (Role Based Access Control), LDAP/AD authentication, remote image replication and a management portal. In the coming release, Harbor will add new features like Notary and a new admin UI.

One of my favorite features of Harbor: Remote replication (synchronization) of images

While we are celebrating this milestone of Harbor, it certainly serves as a new starting point for us. Thanks to everyone who contributed to Harbor’s success. Your continuous support definitely motivates us to make Harbor the best home for your container images!

Survey based on user community, 53 responses


Private Docker Registry Harbor Achieves HA based on Virtual SAN

Recently, VMware released the Docker Volume Driver for vSphere 1.0 beta, which enables a Docker host to create volumes directly on a vSphere datastore (Virtual SAN, VMFS, NFS, etc.). The volumes can be directly mounted into Docker containers, which solves the problem of storing persistent data of Docker containers. The Docker Volume Driver for vSphere not only simplifies storage configuration but also allows volumes to be associated with the Storage Policy Based Management (SPBM) of vSphere. For example, an administrator can set the Failures To Tolerate (FTT) or Stripe Width (SW) of the data volume. Volumes with SPBM can achieve a higher data protection level and better performance. The Docker Volume Driver for vSphere is an open-source project. It is downloadable at .

This blog walks through the steps of creating data volumes in VMware Virtual SAN (VSAN). As an example of a containerized application, the open source Harbor Registry is used to describe the usage of data volumes provisioned by VSAN, through which Harbor Registry achieves a higher data protection level and high availability (HA).

A little more background about Harbor Registry: it is another open-source project by VMware. A registry is one of the necessary components of a container’s build-ship-run lifecycle. Harbor helps users set up an enterprise private Docker registry service rapidly. Furthermore, it also provides enhanced features usually required by enterprises such as graphical user interface (GUI), role based access control, AD/LDAP integration and image replication. Harbor’s Github repo: .

The architecture of the system is illustrated in the above figure. Three ESXi hosts form a VSAN cluster. A Harbor registry VM is running on one of the hosts. In addition, three external Docker volumes are created in the VSAN cluster and used for storing persistent data in Harbor. This cluster provides consolidated storage from the local disks of each host. It can tolerate the failure of one physical host and still preserve data integrity and accessibility.

The configuration process is discussed as follows.
1.    First, set up a Virtual SAN cluster with 3 ESXi hosts. A Photon OS VM is installed on one of the ESXi servers as a Docker host. Of course, other Linux distributions like Ubuntu can be used as well, as long as they can run Docker Engine and Docker Compose.

2.    On the release page of the Docker Volume Driver for vSphere project, download the plugin for ESXi hosts and for VMs respectively. For example, for 1.0 beta, the file names are:

3.    On each of the ESXi hosts, use the following commands to install the plugin (SSH of ESXi host must be enabled). After installation, no reboot is required.

# esxcli software vib install -d "/" \
--no-sig-check -f
Installation Result
Message: Operation finished successfully.
Reboot Required: false
VIBs Installed: VMWare_bootbank_esx-vmdkops-service_1.0.0-0.0.1
VIBs Removed:
VIBs Skipped:

4.    On the Photon VM, install the RPM package. For Debian-based distributions, install the corresponding deb package.

# rpm -ivh docker-volume-vsphere-1.0.beta-1.x86_64.rpm
Preparing...                              ##################### [100%]
Updating / installing...
1:docker-volume-vsphere-0:1.0.beta-############################ [100%]
File: '/proc/1/exe' -> '/usr/lib/systemd/systemd'
Created symlink from /etc/systemd/system/\
docker-volume-vsphere.service to /usr/lib/systemd/system/

5.    After the ESXi plugin is installed, a management script is generated at /usr/lib/vmware/vmdkops/bin/ This script helps administrators manage the data volumes. For example, an administrator can create different storage policies. In Virtual SAN, the default storage policy has a Stripe Width setting of 1 (SW=1). We will create a new policy with SW=2 as an example.
To do this, just SSH into any of the ESXi hosts and run this command:

# /usr/lib/vmware/vmdkops/bin/ policy \
create --name SW=2 --content '(("stripeWidth" i2))'

The parameter ‘SW=2’ is the name of the policy. The key point here is to set the content of the policy, which is ‘(("stripeWidth" i2))’ in this example. Other settings are the same as the Virtual SAN policy parameters. The possible parameters and their description are as follows:

6.    Now Docker volumes can be created on the Docker host (the Photon OS VM). As an example, we first create two volumes with the default storage policy and then create another volume with the newly created ‘SW=2’ policy.

# docker volume create --driver=vmdk --name=vsanvol1 -o size=50gb
# docker volume create --driver=vmdk --name=vsanvol2 -o size=20gb
# docker volume create --driver=vmdk --name=vsanvol3 -o size=20gb \
-o vsan-policy-name=SW=2
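
When several volumes have to be created repeatedly, it may help to generate the commands from a small table. The following is only a sketch: it assembles the same three command lines shown above and does not run them. The helper name `volume_create_command` and the `VOLUMES` list are made up for illustration; only the flags that appear in the commands above are used.

```python
# Sketch: build the argv lists for the `docker volume create` commands above.
# Nothing is executed here; pass a list to subprocess.run() on a Docker host
# that has the vSphere volume plugin installed if you want to run it.

def volume_create_command(name, size, policy=None):
    """Return the argument list for creating one vmdk-backed volume."""
    cmd = [
        "docker", "volume", "create",
        "--driver=vmdk",
        f"--name={name}",
        "-o", f"size={size}",
    ]
    if policy is not None:
        # Optional storage policy, e.g. the 'SW=2' policy created earlier.
        cmd += ["-o", f"vsan-policy-name={policy}"]
    return cmd

# (name, size, policy) for the three volumes used in this walkthrough.
VOLUMES = [
    ("vsanvol1", "50gb", None),
    ("vsanvol2", "20gb", None),
    ("vsanvol3", "20gb", "SW=2"),
]

commands = [volume_create_command(*volume) for volume in VOLUMES]

for command in commands:
    print(" ".join(command))
```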

By specifying the ‘--driver=vmdk’ parameter, the external volume is created in the vSphere datastore. The volume is created in the same datastore where the Photon OS VM resides. In this example the Photon OS VM is stored in Virtual SAN, and so are the Docker volumes. These volumes are stored in the form of VMDK files. Note that the volumes are not yet mounted to any VM, so if we navigate to the vSphere Web Client, we cannot find any information about these newly created volumes on the VM’s page.

However, we can find them in the dockvols directory in the Virtual SAN datastore.

In subsequent sections, we are able to find the VMDKs through the VM’s page when the volumes are mounted to running containers.

7.    On the Photon OS VM, download the Harbor Registry source code. Before installing Harbor, we need to modify the harbor/Deploy/docker-compose.yml configuration file in order to use the newly created external volumes. We can then install Harbor by following the official Harbor installation guide.

Open the docker-compose.yml file. Find the ‘registry’ section and change these lines:

  - /data/registry:/storage
  - ./config/registry/:/etc/registry/

to:

  - vsanvol1:/storage
  - ./config/registry/:/etc/registry/

vsanvol1 is the external volume we just created.
Next, look for the ‘mysql’ section and change this line:

  - /data/database:/var/lib/mysql

to:

  - vsanvol2:/var/lib/mysql

Similarly, vsanvol2 is another volume we just created.
Next, look for the ‘jobservice’ section and change these lines:

  - /data/job_logs:/var/log/jobs
  - ./config/jobservice/app.conf:/etc/jobservice/app.conf

to:

  - vsanvol3:/var/log/jobs
  - ./config/jobservice/app.conf:/etc/jobservice/app.conf

Similarly, vsanvol3 is another volume we just created.
At the end of the file, add the following lines:

volumes:
  vsanvol1:
    external: true
  vsanvol2:
    external: true
  vsanvol3:
    external: true

These lines indicate that these volumes have already been created and do not need to be created by Docker again. Keep the other configurations in docker-compose.yml unchanged. Then install Harbor according to the official guide and bring up the Harbor registry service.

8.    After Harbor is running, we can check the vSphere Web Client and confirm that these 3 external volumes are indeed mounted to the Photon OS VM. They are mounted as ‘Hard Disk 2’, ‘Hard Disk 3’ and ‘Hard Disk 4’ in the VM respectively. In this beta version, there seem to be some bugs in displaying the storage policy. For example, the storage policies for these VMDKs are displayed as ‘None’, although ‘Hard Disk 3’ was created with the ‘SW=2’ policy and the other two VMDKs were created with the default storage policy. The screenshot below shows the storage policy of ‘Hard Disk 4’.

There may be a problem where Virtual SAN cannot correctly identify the storage policy created by the Docker Volume Driver for vSphere. This problem should be solved in a newer version.

9.    Let’s upload two images to test whether any data is lost when a host fails.

10.    Enable vSphere HA on this Virtual SAN cluster, with default HA settings. Then identify the ESXi host on which the Photon OS VM is running.

11.    Power off that physical host. Wait for a while until HA restarts the VM, and then check the state of the Photon OS VM. The VM has been restarted on another healthy host. The original external volumes are mounted to the restarted VM. Because a host of the VSAN cluster is powered off, for each VMDK there will be a component shown as ‘absent’. However, with the default storage policy Virtual SAN can tolerate the failure of one host, so the data is still accessible.

12.    After the Photon VM is restarted, check the status of Harbor. All the services and containers are running as normal.

13.    Check the Harbor UI. The 2 images we uploaded before are still intact. This indicates that there is no data loss.

When vSphere HA restarted the Harbor VM on another healthy host, all the containers of Harbor were also restarted. They are connected to the same original volumes.

This blog introduces an example of achieving Harbor registry HA by leveraging Virtual SAN and vSphere HA. Since Harbor is a multi-container application, this approach can also be applied to other container-based applications.


Working with Harbor Registry REST API via Swagger

Swagger is one of the most popular RESTful API tools. It contains an entire set of code libraries, editors, code generators and so on, and can be used for API description, definition, generation and visualization. For details about Swagger, see the Swagger website, where you can download its source code and integrate it with your project.

Harbor is an enterprise-class private registry server initiated by VMware. Harbor also offers a RESTful API, which provides easy integration with other container management platforms. This article describes how to use the Swagger tools embedded in Harbor to test the RESTful APIs.

First, let’s take a look at how Swagger creates descriptions and definitions for a RESTful API. Swagger provides an online WYSIWYG editor. Users can enter Swagger-compatible YAML or JSON input in the left pane of the editor, and the result is shown in the right pane. If there are any input errors, the editor shows alerts with suggested corrections, which is very convenient. Refer to the Swagger documentation for instructions on writing definition files that are compatible with Swagger. The editor also supports downloading the completed YAML to the local system or converting it to JSON format. It can even auto-generate a mock server or client.

Swagger Embedded in Harbor

Core functions of Harbor are implemented through RESTful API. A set of API rules that can be visualized was documented in Swagger during the development process and is provided for users as part of the project.

The Harbor Project provides two methods to let users present or control the RESTful API with Swagger.

The first is the “static” method, which only uses Swagger as a tool for presentation and review. Users only have to locate the swagger.yaml file in the directory docs/ of Project Harbor, open it in an editor, select all, copy, and paste it into the code pane on the left of the Swagger online editor. The right pane will display a visualization of the Harbor RESTful API document page for review and reference.

The second method is the “dynamic” method, which involves deploying Swagger UI and the Harbor REST services on the same server. Users can use Swagger to control and test Harbor RESTful APIs. This method may change data in the database, so it is not recommended for production systems. Deployment procedures are illustrated in the figure below:

Under the directory docs/ of the Harbor Project source code, there is a script file which can help users carry out “dynamic” deployment. The following provides instructions on related steps. For detailed information, please refer to the file docs/

(1) Change the SERVER_IP value in the script file, set it to the IP address of the host system of currently deployed Harbor system, save changes and execute the script. The script will download the Swagger software package accordingly and decompress it to the directory of static resources of Harbor Project vendors; copy the swagger.yaml files under docs/ to the Harbor Project static resource directory resources/yaml; change/replace URL contents according to the SERVER_IP provided by the user.

(2) Switch to the Deploy directory, change the file named “docker-compose.yml”, mount the newly-added Swagger static resource directory onto Harbor UI Docker container through Volumes, letting SwaggerUI deploy together with Harbor UI after starting up, to provide external access.

(3) Use the docker-compose command to re-create Project Harbor, clear all content left on the server, restart the newly created Project Harbor image.

The figure below shows a screenshot of a deployed Swagger UI page.

RESTful API Authentication

When calling the Harbor RESTful API through Swagger UI, please be aware of “login status” issues, because some APIs require session information. There are two ways to set up a session.

Method 1: Open the Harbor UI with a browser (Note: Make sure that the IP address in the Harbor UI URL is the same as the value provided for SERVER_IP when deploying Swagger UI), complete the registration (if using it for the first time) and log in. Then open a new tab in the same browser and enter the Swagger UI address below. This ensures that the Harbor RESTful API is called while the user is logged in.


Method 2: Harbor RESTful API supports Basic Authentication mode. However, Swagger currently does not allow the input of usernames and passwords on its interface, so access becomes inconvenient. Those who are interested can follow this link and try to make Swagger accessible in Basic Authentication mode. Of course, the user can also use the below command to access API. In this way, the user does not have to log in to Harbor’s UI in order to test the API.

curl -u <username: password>
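
For readers who prefer a script over curl, the same Basic Authentication request can be prepared in a few lines of Python. This is a minimal sketch, not part of Harbor: the URL `http://harbor.example.com/api/projects` and the credentials are placeholders chosen for illustration, and the request is only built here, not sent.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a request that carries HTTP Basic Authentication credentials."""
    # Basic auth is "username:password", base64-encoded, in the Authorization header.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder endpoint and credentials; replace them with your own deployment's values.
request = basic_auth_request("http://harbor.example.com/api/projects", "admin", "secret")
header = request.get_header("Authorization")
print(header)

# To actually send the request against a running Harbor instance:
#     with urllib.request.urlopen(request) as response:
#         print(response.status, response.read())
```

Because the credentials are only base64-encoded and not encrypted, this kind of access is best kept to test systems or to connections protected by HTTPS.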


Architecture of Harbor: An Open Source Enterprise-class Registry Server

About Project Harbor

VMware has initiated an enterprise-class registry called Project Harbor, which helps users rapidly build a private enterprise-class registry service. It extends the open source Docker Distribution by adding the functionality usually required by an enterprise, such as a management UI, Role Based Access Control (RBAC), AD/LDAP integration, image replication and auditing. The project has received over 1100 stars and been forked over 290 times since it was released 6 months ago. This article introduces the main modules of Project Harbor and describes the operational principles behind Harbor.



As depicted in the above diagram, Harbor comprises 6 components:

Proxy: Components of Harbor, such as the registry, UI and token services, are all behind a reverse proxy. The proxy forwards requests from browsers and Docker clients to various backend services.

Registry: Responsible for storing Docker images and processing Docker push/pull commands. As Harbor needs to enforce access control to images, the Registry will direct clients to a token service to obtain a valid token for each pull or push request.

Core services: Harbor’s core functions, which mainly provide the following services:

  • UI: a graphical user interface to help users manage images on the Registry
  • Webhook: Webhook is a mechanism configured in the Registry so that image status changes in the Registry can be propagated to the Webhook endpoint of Harbor. Harbor uses the webhook to update logs, initiate replications, and perform some other functions.
  • Token service: Responsible for issuing a token for every docker push/pull command according to a user’s role in a project. If there is no token in a request sent from a Docker client, the Registry will redirect the request to the token service.

Database: Stores the metadata of projects, users, roles, replication policies and images.

Job services: Used for image replication. Local images can be replicated (synchronized) to other Harbor instances.

Log collector: Responsible for collecting logs of other modules in a single place.


Each component of Harbor is wrapped as a Docker container. Naturally, Harbor is deployed by Docker Compose.

In the source code, the Docker Compose template used to deploy Harbor is located at /Deploy/docker-compose.yml. Opening this template file reveals the 6 container components making up Harbor:

proxy: Reverse-proxy formed by the Nginx Server.

registry: Container instance created from the official image of Docker distribution.

ui: Core services within the architecture. This container is the main part of Project Harbor.

mysql: Database container created from the official MySQL image.

job services: Replicates images to a remote registry via state machines. Image deletion can also be synchronized to a remote Harbor instance.

log: Container that runs rsyslogd, used for collecting logs from other containers through the log-driver mode.

These containers are linked via DNS service discovery in Docker, so each container can be reached by its name. For the end user, only the service port of the proxy (Nginx) needs to be exposed.

The following two examples of Docker commands illustrate the interaction between Harbor’s components.

docker login

Suppose Harbor is deployed on a host. A user runs the docker login command against that host’s IP address to send a login request to Harbor:

$ docker login

After the user enters the required credentials, the Docker client sends an HTTP GET request to the registry endpoint on that host. The different containers of Harbor process it according to the following steps:

(a) First, this request is received by the proxy container listening on port 80. Nginx in the container forwards the request to the Registry container at the backend.

(b) The Registry container has been configured for token-based authentication, so it returns an error code 401, notifying the Docker client to obtain a valid token from a specified URL. In Harbor, this URL points to the token service of Core Services;

(c) When the Docker client receives this error code, it sends a request to the token service URL, embedding username and password in the request header according to basic authentication of HTTP specification;

(d) After this request is sent to the proxy container via port 80, Nginx again forwards the request to the UI container according to pre-configured rules. When the token service within the UI container receives the request, it decodes the request and obtains the username and password;

(e) After getting the username and password, the token service authenticates the user against the data in the MySQL database. When the token service is configured for LDAP/AD authentication, it authenticates against the external LDAP/AD server instead. After a successful authentication, the token service returns an HTTP status code that indicates success. The HTTP response body contains a token signed with a private key.

At this point, one docker login process has been completed. The Docker client saves the encoded username/password from step (c) locally in a hidden file.
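
To make steps (b) and (c) more concrete, the sketch below parses the kind of challenge a token-protected registry returns with its 401 response, in the `WWW-Authenticate: Bearer ...` form used by Docker registry token authentication. The realm URL and service name in the sample are invented for illustration and are not taken from a real Harbor deployment; the function name is also hypothetical.

```python
import re


def parse_bearer_challenge(header_value):
    """Split a 'Bearer key="value",...' challenge into a dict of its parameters."""
    scheme, _, params = header_value.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported authentication scheme: {scheme!r}")
    # Each parameter has the form key="value"; collect them all.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


# Hypothetical value of the WWW-Authenticate header sent along with the 401.
sample_challenge = (
    'Bearer realm="http://harbor.example.com/service/token",'
    'service="harbor-registry"'
)

challenge = parse_bearer_challenge(sample_challenge)

# The client would now request a token from challenge["realm"], sending the
# username and password with HTTP basic authentication as in step (c).
print(challenge["realm"])
print(challenge["service"])
```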

docker push

(We have omitted proxy forwarding steps. The figure above illustrates communication between different components during the docker push process)

After the user logs in successfully, a Docker image is sent to Harbor via a docker push command:

# docker push

(a) Firstly, the docker client repeats the process similar to login by sending the request to the registry, and then gets back the URL of the token service;

(b) Subsequently, when contacting the token service, the Docker client provides additional information to apply for a token of the push operation on the image (library/hello-world);

(c) After receiving the request forwarded by Nginx, the token service queries the database to look up the user’s role and permission to push the image. If the user has the proper permission, it encodes the information of the push operation, signs it with a private key, and returns the resulting token to the Docker client;

(d) After the Docker client gets the token, it sends a push request to the registry with a header containing the token. Once the Registry receives the request, it decodes the token with the public key and validates its content. The public key corresponds to the private key of the token service. If the registry finds the token valid for pushing the image, the image transferring process begins.
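
The issue-then-verify idea in steps (c) and (d) can be illustrated with a toy example. Please note the substitution: Harbor's token service signs with a private key and the registry verifies with the matching public key, whereas this sketch uses a shared-secret HMAC from the Python standard library only to stay self-contained. The token layout, the function names and the secret are all invented for illustration and do not match the real token format.

```python
import base64
import hashlib
import hmac
import json

# Stand-in for the signing key. Harbor uses a private/public key pair instead.
SECRET = b"demo-only-secret"


def issue_token(user, repository, actions):
    """Token service side: encode the granted access and sign it."""
    claims = {"sub": user, "repository": repository, "actions": sorted(actions)}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_token(token, repository, action):
    """Registry side: check the signature, then check the requested access."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False  # payload was altered or signed with another key
    claims = json.loads(payload)
    return claims["repository"] == repository and action in claims["actions"]


token = issue_token("alice", "library/hello-world", ["push", "pull"])
print(verify_token(token, "library/hello-world", "push"))
print(verify_token(token, "library/other-image", "push"))
```

The design point carried over from the text is that the registry never consults the database itself: it trusts whatever access the signed token describes, as long as the signature checks out.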

For more information about enterprise registry Harbor, take a look at Github:

Building Cloud Foundry on vSphere using BOSH Part 4

Installing Cloud Foundry

In previous blogs, we set up a micro BOSH and a BOSH. We are now ready to start our installation of Cloud Foundry. First things first: we create a resource plan for our deployment.

As we are writing this document, a complete installation of Cloud Foundry contains about 34 distinct jobs (VMs). Some of the jobs are core components and at least one instance must be installed, such as Cloud Controller, NATS and DEAs. Some jobs should have multiple instances depending on the actual need, such as DEAs and routers. Some jobs are optional, such as service gateways and service nodes. Therefore, before we install Cloud Foundry, we should decide which components are included in a deployment. Once we have a list of components we want to deploy, we can plan for the resources needed by each job. Typically, this includes IP address, CPU, memory and storage. Below is an example of a deployment plan.

Job                        Instances  IP           Memory  CPU  Disk(GB)  Required?
debian_nfs_server          1          xx.xx.xx.xx  2GB     2    16        required
nats                       1          xx.xx.xx.xx  1GB     1    8         required
ccdb_postgres              1          xx.xx.xx.xx  1GB     1    8         required
uaadb                      1          xx.xx.xx.xx  1GB     1    8         required
vcap_redis                 1          xx.xx.xx.xx  1GB     1    8         required
uaa                        1          xx.xx.xx.xx  1GB     1    8         required
acmdb                      1          xx.xx.xx.xx  1GB     1    8         required
acm                        1          xx.xx.xx.xx  1GB     1    8         required
cloud_controller           1          xx.xx.xx.xx  2GB     2    16        required
stager                     1          xx.xx.xx.xx  1GB     1    8         required
router                     2          xx.xx.xx.xx  512MB   1    8         required
health_manager             1          xx.xx.xx.xx  1GB     1    8         required
dea                        2          xx.xx.xx.xx  2GB     2    16        required
mysql_node(*)              1          xx.xx.xx.xx  1GB     1    8         optional
mysql_gateway              1          xx.xx.xx.xx  1GB     1    8         optional
mongodb_node               1          xx.xx.xx.xx  1GB     1    8         optional
mongodb_gateway            1          xx.xx.xx.xx  1GB     1    8         optional
redis_node                 1          xx.xx.xx.xx  1GB     1    8         optional
redis_gateway              1          xx.xx.xx.xx  1GB     1    8         optional
rabbit_node                1          xx.xx.xx.xx  1GB     1    8         optional
rabbit_gateway             1          xx.xx.xx.xx  1GB     1    8         optional
postgresql_node            1          xx.xx.xx.xx  1GB     1    8         optional
postgresql_gateway         1          xx.xx.xx.xx  1GB     1    8         optional
vblob_node                 1          xx.xx.xx.xx  1GB     1    8         optional
vblob_gateway              1          xx.xx.xx.xx  1GB     1    8         optional
backup_manager             1          xx.xx.xx.xx  1GB     1    8         optional
service_utilities          1          xx.xx.xx.xx  1GB     1    8         optional
serialization_data_server  1          xx.xx.xx.xx  1GB     1    8         optional
services_nfs               1          xx.xx.xx.xx  1GB     1    8         optional
syslog_aggregator          1          xx.xx.xx.xx  1GB     1    8         optional
services_redis             1          xx.xx.xx.xx  1GB     1    8         optional
opentsdb                   1          xx.xx.xx.xx  1GB     1    8         optional
collector                  1          xx.xx.xx.xx  1GB     1    8         optional
dashboard                  1          xx.xx.xx.xx  1GB     1    8         optional
Total:                     36                      39GB    40   320

From the above table, we can come up with required resource pools:

Pool Name  Size  Configuration                   Jobs
small      30    RAM: 1GB, CPU: 1, DISK: 8GB     nats, ccdb_postgres, uaadb, vcap_redis, uaa, acmdb, acm,
                                                 stager, health_manager, mysql_node, mysql_gateway,
                                                 mongodb_node, mongodb_gateway, redis_node, redis_gateway,
                                                 rabbit_node, rabbit_gateway, postgresql_node,
                                                 postgresql_gateway, vblob_node, vblob_gateway,
                                                 backup_manager, service_utilities, collector, dashboard,
                                                 serialization_data_server, services_nfs,
                                                 syslog_aggregator, services_redis, opentsdb
medium     4     RAM: 2GB, CPU: 2, DISK: 16GB    debian_nfs_server, cloud_controller, dea
router     2     RAM: 512MB, CPU: 1, DISK: 8GB   router
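
The pool sizes and the totals in the plan can be double-checked with a short script. This is just a sketch in Python with the numbers copied from the deployment plan above; the data layout and function names are our own and have nothing to do with the BOSH manifest format.

```python
# Resource specification of each pool, as in the resource pool table.
POOLS = {
    "small": {"ram_mb": 1024, "cpu": 1, "disk_gb": 8},
    "medium": {"ram_mb": 2048, "cpu": 2, "disk_gb": 16},
    "router": {"ram_mb": 512, "cpu": 1, "disk_gb": 8},
}

# Jobs from the deployment plan that use the small pool, one instance each.
SMALL_JOBS = """
nats ccdb_postgres uaadb vcap_redis uaa acmdb acm stager health_manager
mysql_node mysql_gateway mongodb_node mongodb_gateway redis_node redis_gateway
rabbit_node rabbit_gateway postgresql_node postgresql_gateway vblob_node
vblob_gateway backup_manager service_utilities serialization_data_server
services_nfs syslog_aggregator services_redis opentsdb collector dashboard
""".split()

# job name -> (pool, number of instances)
JOBS = {
    "debian_nfs_server": ("medium", 1),
    "cloud_controller": ("medium", 1),
    "dea": ("medium", 2),
    "router": ("router", 2),
}
JOBS.update({job: ("small", 1) for job in SMALL_JOBS})


def pool_sizes(jobs):
    """Number of VMs each resource pool has to provide."""
    sizes = {}
    for pool, instances in jobs.values():
        sizes[pool] = sizes.get(pool, 0) + instances
    return sizes


def totals(jobs, pools):
    """Total instances, memory, CPU and disk over all jobs."""
    result = {"instances": 0, "ram_mb": 0, "cpu": 0, "disk_gb": 0}
    for pool, instances in jobs.values():
        spec = pools[pool]
        result["instances"] += instances
        result["ram_mb"] += instances * spec["ram_mb"]
        result["cpu"] += instances * spec["cpu"]
        result["disk_gb"] += instances * spec["disk_gb"]
    return result


print(pool_sizes(JOBS))
print(totals(JOBS, POOLS))
```

Running it reproduces the figures of the plan: 36 instances, 39936 MB (39GB) of memory, 40 CPUs and 320GB of disk, with pool sizes of 30, 4 and 2.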

From the above two tables, we can start to modify the manifest file. We name the manifest file cf.yml. The following sections explain the fields in detail.

This is the cloud foundry deployment name. We can name it arbitrarily.

The director uuid is the uuid of bosh director we just deployed in Part III. We can retrieve this value by command:

$ bosh status

The release name should be the same as the name you entered when creating the cf-release. The version was generated automatically when the release was created.

compilation, update, networks, resource_pools
These fields are similar to those in bosh.yml file. Refer to the previous part for more information.

Jobs are the components of cloud foundry. Each job runs on a virtual machine. Jobs are described as below.

debian_nfs_server, services_nfs: these two jobs are used as nfs server in Cloud Foundry. As they serve as file servers, we should make sure that the “persistent_disk” property indeed exists.

syslog_aggregator: this job is used to collect system logs and store them in the database.

nats: NATS is the message bus of Cloud Foundry. It’s a core component in Cloud Foundry.

opentsdb: this is a database that stores the log information. Since it is a database, it also requires a “persistent_disk” property.

collector: this job collects system information and stores it in databases.

dashboard: this is a web based tool for monitoring and reporting of Cloud Foundry Platform.

cloud_controller, ccdb: cloud_controller controls all the Cloud Foundry components. “ccdb” is the database for cloud controller. “persistent_disk” property is required in ccdb.

uaa, uaadb: uaa is used for user authentication and authorization. uaadb is the database that stores the user information. “persistent_disk” property is required for uaadb.

vcap_redis, services_redis: these two jobs are used to store the internal key-value pairs for Cloud Foundry.

acm, acmdb: acm is short for Access Control Manager. The ACM is a service that allows cloud foundry components to implement access control features. “acmdb” is the database for acm. “acmdb” also requires a “persistent_disk” property.

stager: stager is a job that packs the source code and all the required packages of user’s application. When staging is completed, the app is passed to dea for execution.

router: router is used to route user’s request to proper destination in Cloud Foundry.

health_manager, health_manager_next: health_manager is the job that monitors the health status of all users’ apps. health_manager_next is the next-generation version of health_manager. They will be co-existing for some time.

dea: “dea” is short for droplet execution agent. All users’ apps are executed in dea.

mysql_node, mysql_gateway, mongodb_node, mongodb_gateway, redis_node, redis_gateway, rabbit_node, rabbit_gateway, postgresql_node, postgresql_gateway, vblob_node, vblob_gateway: these jobs are all services that Cloud Foundry supplies. Each service has a node that provisions resources. The corresponding gateway lies between the cloud_controller and a service node and it acts as the gateway for each service.

backup_manager: used to back up users’ data and databases.

service_utilities: utilities for service management.

serialization_data_server: a server used to serialize data in Cloud Foundry.


The properties section is another important part of the cf.yml file. Note that the IP addresses in this section must be in sync with those in the jobs field. You should also replace the passwords and tokens with your own secure values.

domain: this is the domain name for users’ access. We should also create a DNS server to resolve the domain to the load balancer’s IP address. In our example, we set the domain name to cf.local, so users can use vmc target when pushing apps.

cc.srv_api_uri: this property usually takes the format of http://api.<yourdomain>. For example, if we set the domain to cf.local, the srv_api_uri would be

cc.password: this password must have at least 16 characters.

cc.allow_registration: if it is true, users can register an account by using the vmc command. Set this to false to disable this behavior.

cc.admins: a list of admin users. Admin users can register through the vmc command even if the allow_registration flag is set to false.

Most of the ‘nfs_server’ entries in the properties should be set to the IP address of the ‘services_nfs’ job.

mysql_node.production: If it is true, the memory of mysql_node must be at least 4GB. In an experimental environment, we can set it to false so that the memory of mysql_node can be set to less than 4GB.
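The two constraints above (the minimum length of cc.password and the memory requirement tied to mysql_node.production) can be checked before deploying. The following is a minimal sketch, assuming the properties have already been loaded into a nested dictionary; the function name check_properties and the mysql_node_memory_mb argument are hypothetical and not part of BOSH or Cloud Foundry.

```python
def check_properties(properties, mysql_node_memory_mb):
    """Return a list of problems found in a (hypothetical) properties dict."""
    problems = []

    # cc.password must have at least 16 characters.
    password = properties.get("cc", {}).get("password", "")
    if len(password) < 16:
        problems.append("cc.password must have at least 16 characters")

    # If mysql_node.production is true, mysql_node needs at least 4GB of memory.
    production = properties.get("mysql_node", {}).get("production", False)
    if production and mysql_node_memory_mb < 4096:
        problems.append("mysql_node.production is true but memory is below 4GB")

    return problems


# Illustrative values only; they are not taken from a real manifest.
sample = {
    "cc": {"password": "short", "allow_registration": False},
    "mysql_node": {"production": True},
}
for problem in check_properties(sample, mysql_node_memory_mb=1024):
    print(problem)
```

In an experimental environment, setting mysql_node.production to false makes the second check pass with a smaller memory size.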

Because the yml file may evolve with new releases of Cloud Foundry, the bosh command provides an option to validate the yml file. Type “bosh help” to see the usage and explanation of “bosh diff”:

$ bosh diff [<template_file>]

This command compares your current deployment manifest against the specified deployment manifest template. It helps you keep your deployment configuration file up to date. A dev template can be found in the deployments repos.

For example, you can run the following command to compare your yml file to the template file. First, cd into the directory where your cf.yml file and the template file reside, and then run:

$ bosh diff dev-template.erb

This command helps you find mistakes in the cf.yml file. If some fields are missing, the command fills them in automatically. If there is a spelling mistake or another error, the command reports a syntax error.
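To illustrate the idea of comparing a manifest against a template, here is a minimal sketch that lists the fields present in a template but missing from a manifest. It is not how bosh diff is implemented; the function name missing_keys and the sample dictionaries are assumptions made for illustration.

```python
def missing_keys(manifest, template, prefix=""):
    """Return dotted paths that exist in the template but not in the manifest."""
    missing = []
    for key, template_value in template.items():
        path = f"{prefix}{key}"
        if key not in manifest:
            missing.append(path)
        elif isinstance(template_value, dict) and isinstance(manifest[key], dict):
            # Recurse into nested sections such as properties.cc
            missing.extend(missing_keys(manifest[key], template_value, path + "."))
    return missing


# Hypothetical template and manifest fragments.
template = {"properties": {"domain": None, "cc": {"password": None, "admins": None}}}
manifest = {"properties": {"domain": "cf.local", "cc": {"password": "a-16-char-secret"}}}
print(missing_keys(manifest, template))
```

The sketch only reports missing fields; it does not fill them in or check spelling.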

You can download a sample yml file from here:

When the manifest file is complete, we can start to install Cloud Foundry.
1) In Part III, we cloned the CF repository from Gerrit with:

$ gerrit clone ssh://<your username>

2) Go to the directory and create a CF release.

$ cd cf-release
$ bosh create release

This will download all the packages, blob data and other resources needed. It will take several minutes depending on your network speed.

1. If you have edited code in cf-release, you may have to add the --force option to bosh create release.
2. It is important to have a direct internet connection when running this command.
3. If your network is slow or you do not have a direct connection to the internet, you may want to do this in a better environment. You can create the release on a machine with a good internet connection using the --with-tarball option, and then copy the generated tarball back to the system you want.

If nothing goes wrong, you can see a summary of this release like this:

Generating manifest...
Writing manifest...
Release summary
| Name                      | Version | Notes | Fingerprint                              |
| sqlite                    | 3       |       | e3e9b61f8cdc2610480c2aa841e89dc0bb1dc9c9 |
| ruby                      | 6       |       | b35a5da6c214d9a97fdd664931bf64987053ad4c |
… …
| debian_nfs_server         | 3       |       | c1dc860ed6ab2bee68172be996f64f9587e9ac0d |
| Name                      | Version  | Notes       | Fingerprint                              |
| redis_node                | 19       |             | 61098860eaa8cfb5dc438255ebe28db74a1feabc |
| rabbit_gateway            | 13       |             | ddcc0901ded1e45fdc4f318ed4ba8d8ccca51f7f |
… …
| debian_nfs_server         | 7        |             | 234fabc1a2c5dfbfd7c932882d947fed331b8295 |
| memcached_gateway         | 4        |             | 998623d86733a633c789849721e00d85cc3ebc20 |

Jobs affected by changes in this release
| Name             | Version  |
… …
| cloud_controller | 45.1-dev |

Release version: 95.10-dev
Release manifest: /home/boshcli/cf-release/dev_releases/cf-def-95.10-dev.yml

As you can see, the dev_releases directory contains the release manifest yml file (and a tarball file, if the --with-tarball option is on).

3) Target the BOSH CLI to the BOSH director. If you don’t remember the director’s IP, you can find it in your BOSH deployment manifest from Part III.

$ bosh target
Target set to `bosh_director ( Ver: 0.5.1 (release:abb3e7a4 bosh:2c69ee8c)

4) Upload the cf-release by referring to the generated manifest file, e.g. cf-def-95.10-dev.yml in our example.

$ bosh upload release cf-def-95.10-dev.yml

This step copies packages and jobs, builds them into a tarball, and then verifies the release to make sure files and dependencies are right. After verification, it uploads the release and creates new jobs. Finally, you see a message telling you the release was uploaded:

Task 317 done
Started               2012-10-28 05:35:43 UTC
Finished              2012-10-28 05:36:44 UTC
Duration              00:01:01
Release uploaded

You can verify your release by:

$ bosh releases

You can see all the newly uploaded releases in the listing:

| Name   | Versions                                             |
| cf-def | 95.1-dev, 95.8-dev, 95.9-dev, 95.10-dev              |
| cflab  | 92.1-dev                                             |

5) Now that we have uploaded the release and stemcell (the same stemcell as in Part III) and the manifest is ready, set the deployment to the manifest:

$ bosh deployment cf-dev.yml
Deployment set to `/home/boshcli/cf-dev.yml'

We can deploy Cloud Foundry now:

$ bosh deploy

This will create VMs for the jobs, compile the packages and install dependencies. It will take several minutes depending on the server’s hardware condition. You can see output like:

Preparing deployment
binding deployment (00:00:00)
binding releases (00:00:01)
… …
Preparing package compilation
… …
Compiling packages
… …
Preparing DNS
binding DNS (00:00:00)
Creating bound missing VMs
… …
Binding instance VMs
… …
Preparing configuration
binding configuration (00:00:03)
… …
Creating job cloud_controller
cloud_controller/0 (canary) (00:02:45)
… …
Done                    1/1 00:08:41

Task 318 done

Started               2012-10-28 05:37:52 UTC
Finished              2012-10-28 05:49:43 UTC
Duration              00:11:51
Deployed `cf-dev.yml' to `bosh_director'

To check your deployment, you can use this command:

$ bosh deployments
| Name     |
| cf.local |
Deployments total: 1

You can also verify the running status of every VM:

$ bosh vms
| Job/index                 | State   | Resource Pool | IPs         |
| acm/0                     | running | small         | |
| acmdb/0                   | running | small         | |
| cloud_controller/0        | running | medium        | |
… …
VMs total: 40

At this point, Cloud Foundry has been completely installed. If you cannot wait to verify the installation, you can use the vmc command to target the IP address of one of the routers and deploy a test web app on it (see the subsequent section). Because there is no DNS available, you need to have at least these two lines in the hosts file of the vmc client machine and of the machine running a browser to test the web app:

<router’s IP address>
<router’s IP address>  <youtestapp>

If the above test works, your Cloud Foundry instance is working. The last thing is to take care of the load balancer and DNS. These two components are not part of Cloud Foundry. However, they need to be set up properly in a production environment, so we briefly discuss how to set them up.

You can deploy either a hardware or software load balancer (LB) to distribute the load evenly to multiple instances of router components. In our sample deployment we have two routers. For a software LB, you can use Stingray Traffic Manager.  It can be downloaded from here:

A DNS server is needed to resolve the domain of your Cloud Foundry instance. Basically, the DNS server resolves a wildcard name like * to the IP address of the load balancer. If you do not have an LB, you can set up DNS rotation to resolve the domain to the routers in a round-robin fashion.

When the LB and DNS are set up properly, you can start to deploy apps on your instance.
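The round-robin behavior described above can be sketched in a few lines. This is only an illustration of the idea, not a DNS server: the class name WildcardResolver is hypothetical, the domain cf.local is the example domain used in this article, and the two router addresses are placeholders from a documentation address range.

```python
from itertools import cycle


class WildcardResolver:
    """Resolve any name under a domain to a set of addresses in rotation."""

    def __init__(self, domain, addresses):
        self._suffix = "." + domain
        self._addresses = cycle(addresses)

    def resolve(self, hostname):
        # Only names under the wildcard domain are answered.
        if hostname.endswith(self._suffix):
            return next(self._addresses)
        return None


# Two routers, as in the sample deployment; the IPs are placeholders.
resolver = WildcardResolver("cf.local", ["192.0.2.11", "192.0.2.12"])
for name in ["api.cf.local", "hello.cf.local", "api.cf.local"]:
    print(name, "->", resolver.resolve(name))
```

Each lookup returns the next router in turn, which spreads requests across routers without a load balancer, although it does not detect a failed router.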

Cloud Foundry has a command-line tool known as VMC. It can perform most operations on Cloud Foundry, such as configuring your applications, deploying them to Cloud Foundry and monitoring the status of your apps. To install VMC, you must first install Ruby and RubyGems (a Ruby package manager) on the computer on which you want to run VMC. Currently Ruby 1.8.7 and 1.9.2 are supported. After that, you can install VMC with the command below:

$ sudo gem install vmc

Now, set the target to your Cloud Foundry instance. The URL should look like the following example:

$ vmc target

Log in with the admin user’s credentials, which are specified in the deployment manifest:

$ vmc login

Initially, you will be asked to set a password for your account. After logging in, you can get information about your Cloud Foundry instance:

$ vmc info

Now, let’s create and deploy a simple hello world Sinatra application to verify the instance.

$ mkdir ~/hello
$ cd ~/hello

Create a Ruby file called hello.rb with the following contents:

require 'sinatra'

get '/' do
  "Hello from Cloud Foundry"
end

Save this file. We are now ready to upload the application:

$ vmc push

Complete the prompts as shown below:

Would you like to deploy from the current directory? [Yn]:
Application Name: hello
Detected a Sinatra Application, is this correct? [Yn]:
Application Deployed URL []:
Memory reservation (128M, 256M, 512M, 1G, 2G) [128M]:
How many instances? [1]:
Bind existing services to 'hello'? [yN]:
Create services to bind to 'hello'? [yN]:
Would you like to save this configuration? [yN]:

After a while, you will see the output:

Creating Application: OK
Uploading Application:
Checking for available resources: OK
Packing application: OK
Uploading (0K): OK
Push Status: OK
Staging Application 'hello': OK
Starting Application 'hello': OK

Now, go visit the application’s URL: in your browser. If you can see the text, your application has been successfully deployed.

Congratulations, your Cloud Foundry instance has been completely set up. It is functionally identical to the


Building Cloud Foundry on vSphere using BOSH Part 3

Installing micro BOSH and BOSH

Once the BOSH CLI is installed, we can start the installation of micro BOSH. As mentioned before, micro BOSH can be considered a miniature of BOSH. While a standard BOSH has its components spread across 6 VMs, a micro BOSH contains all components in a single VM. It can be easily set up and is usually used to deploy small releases, such as BOSH. In this sense, BOSH is deployed by itself. As the BOSH team puts it, this is referred to as “Inception”.

The steps below are based on the official BOSH documentation, with more implementation details added.

1) In the BOSH CLI VM, install the BOSH Deployer ruby gem.

$ gem install bosh_deployer

Once you have installed the deployer, you will see some extra commands appear after typing bosh on your command line.

$ bosh help
micro deployment [<name>]      Choose micro deployment to work with
micro status                   Display micro BOSH deployment status
micro deployments              Show the list of deployments
micro deploy <stemcell>        Deploy a micro BOSH instance to the currently selected deployment
--update                       update existing instance
micro delete                   Delete micro BOSH instance (including persistent disk)
micro agent <args>             Send agent messages
micro apply <spec>             Apply spec

NOTE: The bosh micro commands must be run within a micro BOSH deployment directory

2) In vCenter, under the view Home->Inventory->VMs and Templates, make sure the folders for virtual machines and templates are already created (see Part II). These folders are used in the deployment configuration.

3) From the view Home->Inventory->Datastores, choose the NFSdatastore datastore we created and browse it.

Right-click on the root folder and create a subfolder for storing virtual machines. In this example, we name it “boshdeployer”. This folder name will be the value of the “disk_path” parameter in our deployment manifest.
NOTE: If you do not have a shared NFS storage, you may use the local disks of the hypervisors as datastore. (However, please be aware that local disks are only recommended for an experimental system.) You can name the datastores as “localstore1” for host 1, “localstore2” for host 2, and so on. Later in the manifest file, you can use a wildcard pattern like “localstore*” to specify the datastore of all hosts. The “boshdeployer” folder should be created on all local datastores.

4) Download public stemcell

$ mkdir -p ~/stemcells
$ cd ~/stemcells
$ bosh public stemcells

The output looks like this:

| Name                            | Url                                                   |
| bosh-stemcell-0.5.2.tgz         | |
| bosh-stemcell-aws-0.5.1.tgz     | |
| bosh-stemcell-vsphere-0.6.4.tgz | |
| micro-bosh-stemcell-0.1.0.tgz   | |
To download use 'bosh download public stemcell <stemcell_name>'.For full url use --full.

Download the stemcell of micro BOSH using the command below:

$ bosh download public stemcell micro-bosh-stemcell-0.1.0.tgz

NOTE: The stemcell is 400-500MB in size. It may take a long time to download on a slow network. In this case, you can download it using any tool (e.g. the Firefox browser) which can resume an interrupted transfer. Use the --full argument to display the full URL for download.

5) Configure your deployment (.yml) file, and save it under a folder with the same name defined in your .yml file. In our example, it is “micro01”.

$ cd ~
$ mkdir deployments
$ cd deployments
$ mkdir micro01

In the yml file, there is a section about vCenter. Enter the names of the folders we created in Part II. The “disk_path” should be the folder we just created in the datastore (NFSdatastore). The value of datastore_pattern and persistent_datastore_pattern is the shared datastore name (NFSdatastore). If you use local disks, this could be a wildcard string like “localstore*”.

- name: vDataCenter
  vm_folder: vm_folder
  template_folder: template
  disk_path: boshdeployer
  datastore_pattern: NFSdatastore
  persistent_datastore_pattern: NFSdatastore
  allow_mixed_datastores: true

Here is a link to a sample yml file for micro BOSH:

6) Set the micro BOSH deployment using:

$ cd ~/deployments
$ bosh micro deployment micro01

Deployment set to '~/deployments/micro01/micro_bosh.yml'

Then deploy micro BOSH with the stemcell downloaded in step 4:

$ bosh micro deploy ~/stemcells/micro-bosh-stemcell-0.1.0.tgz

If everything goes well, micro BOSH will be deployed within a few minutes. You can check the deployment status with this command:

$ bosh micro deployments

You will see your micro BOSH deployment listed:

| Name    | VM name                                 | Stemcell name                           |
| micro01 | vm-a51a9ba4-8e8f-4b69-ace2-8f2d190cb5c3 | sc-689a8c4e-63a6-421b-ba1a-400287d8d805 |

Installing BOSH

Now that micro BOSH is ready, we can use it to deploy BOSH, which is a distributed system with 6 VMs. As mentioned in the previous section, we need three items: a stemcell as the VM template, a BOSH release as the software to be deployed, and a deployment manifest file for deployment-specific definitions. Let’s work on them one by one.

1) First, we target our BOSH CLI to the director of the micro BOSH. The BOSH director can be thought of as the controller or orchestrator of BOSH. All BOSH CLI commands are sent to the director for execution. The IP address of the director is defined in the yml file we used to create micro BOSH. The default credentials of the BOSH director are admin/admin. In our example, we use the commands below to target micro BOSH and authenticate:

$ bosh target
$ bosh login

2) Next, we download the BOSH stemcell and upload it to micro BOSH. This step is similar to downloading the stemcell of micro BOSH. The only difference is that we choose the stemcell for BOSH instead of micro BOSH.

$ cd ~/stemcells
$ bosh public stemcells

A list of stemcells is displayed; choose the latest stemcell to download:

$ bosh download public stemcell bosh-stemcell-vsphere-0.6.4.tgz
$ bosh upload stemcell bosh-stemcell-vsphere-0.6.4.tgz

If you created a Gerrit account in Part II, skip steps 3-7.

3) Sign up for the Cloud Foundry Gerrit server at
4) Set up your ssh public key (accept all defaults)

$ ssh-keygen -t rsa

Copy your key from ~/.ssh/ into your Gerrit account
5) Create and upload your public SSH key in your Gerrit account profile
6) Set your name and email

$ git config --global user.name "Firstname Lastname"
$ git config --global user.email "<your email address>"

7) Install our gerrit-cli gem
8) Clone the release code from the Cloud Foundry repositories using Gerrit. The commands below get the code of BOSH and Cloud Foundry, respectively.

$ gerrit clone ssh://<yourusername>
$ gerrit clone ssh://<yourusername>

We then create our own BOSH release:

$ cd bosh-release
$ ./update
$ bosh create release  --with-tarball

If there are local code conflicts, you can add the “--force” option:

$ bosh create release  --with-tarball --force

This step may take some time to complete depending on the speed of your network. It first downloads binaries from a blob server. It then builds the packages and generates manifest files. The command’s output looks like this:

Syncing blobs…
Building DEV release
Please enter development release name: bosh-dev1
Building packages
Generating manifest...
Copying jobs...

Finally, when the release is created, you will see something like the output below. Notice that the last two lines indicate the manifest file and the release tarball.

Generated /home/boshcli/bosh-release/dev_releases/bosh-dev1-6.1-dev.tgz

Release summary
| Name           | Version | Notes | Fingerprint                              |
| nginx          | 1       |       | 1f01e008099b9baa06d9a4ad1fcccc55840cf56e |
| ruby           | 1       |       | c79b76fcb9bdda122ad2c52c3604ba950b482681 |

| Name           | Version | Notes       | Fingerprint                              |
| micro_aws      | 1.1-dev | new version | fadbedb276f918beba514c69c9e77978eadb65ac |
| redis          | 2       |             | 3d5767e5deec579dee61f2c131982a97975d405e |

Release version: 6.1-dev
Release manifest: /home/boshcli/bosh-release/dev_releases/bosh-dev1-6.1-dev.yml
Release tarball (88.8M): /home/boshcli/bosh-release/dev_releases/bosh-dev1-6.1-dev.tgz

9) Upload the created release to micro BOSH’s director.

$ bosh upload release dev_releases/bosh-dev1-6.1-dev.tgz

10) Configure the BOSH deployment manifest. First, we get the director’s UUID by running:

$ bosh status
Updating director data... done
Target         micro01 ( Ver: 0.4 (00000000)
UUID           7d72eb71-9a98-4081-9857-ad7c7ff4ee33
User           admin
Deployment     /home/boshcli/bosh-dev1.yml

Now we move to the trickiest part of this installation: modifying the deployment manifest file. Since most BOSH deployment errors are caused by improper settings in the manifest file, we explain this in more detail.

To get started, let’s get the manifest template from here:

Since the official BOSH document provides the specification of the manifest file, we assume you have gone through it before reading this article. We won’t go into every detail of this file; instead, we discuss some important items in the manifest file.


Below is an example of the network section.

networks:            #define networks
- name: default
- reserved:        #ips you don’t want to allocate
- -
static:          #ips you will use
- -
cloud_properties: #the same network as all other vms.
  name: VM Network

static: contains the IP addresses of BOSH VMs.
reserved: IP addresses BOSH should not use. It is very important to exclude any IP addresses which have been assigned to other devices on the same network, for example, storage devices, network devices, micro BOSH and the vCenter host. During the installation, micro BOSH may spin up some temporary VMs (worker VMs) for compilation. If we do not specify the reserved addresses, these temporary VMs may have IP address conflicts with existing devices or hosts.
cloud_properties: name is the network name we defined in vSphere (see Part II).
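Since overlapping static and reserved addresses are an easy mistake to make, a quick check before deploying can help. The sketch below uses Python’s standard ipaddress module; the function names and all addresses are hypothetical placeholders, not values from the sample manifest.

```python
from ipaddress import ip_address


def in_range(address, first, last):
    """True if address lies within the inclusive range first..last."""
    return ip_address(first) <= ip_address(address) <= ip_address(last)


def conflicting_static_ips(static_ips, reserved_ranges):
    """Return the static IPs that fall inside any reserved range."""
    return [
        ip for ip in static_ips
        if any(in_range(ip, first, last) for first, last in reserved_ranges)
    ]


# Placeholder addresses from a documentation range.
reserved = [("192.0.2.1", "192.0.2.20")]
static = ["192.0.2.15", "192.0.2.30", "192.0.2.31"]
print(conflicting_static_ips(static, reserved))
```

An empty result means none of the planned static addresses collide with the reserved ranges.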

Resource Pool

This section defines the configuration (CPU, memory, disk and network) of VMs used by jobs. Usually, jobs of an application vary in resource consumption. For example, some jobs require more memory than others, while some jobs need more vCPUs for compute-intensive tasks. Based on the actual requirements, we should create one or more resource pools. One thing to note is that the sizes of all pools should add up to the total number of job instances defined in the manifest file. When deploying BOSH, since there are 6 VMs (6 jobs) altogether, the sizes of all pools should add up to 6.

In our manifest file, we have 3 resource pools:

| Pool Name | Size | Configuration                  | Jobs                        |
| small     | 3    | RAM: 512MB, CPU: 1, DISK: 2GB  | nats, redis, health_monitor |
| medium    | 2    | RAM: 1GB, CPU: 1, DISK: 8GB    | postgres, blobstore         |
| director  | 1    | RAM: 2GB, CPU: 2, DISK: 8GB    | director                    |


The compilation section defines the worker VMs created for package compiling. In a system with limited resources, we should reduce the number of concurrent worker VMs to ensure a successful compilation. In our example, we define 4 worker VMs.


The update section contains a very useful parameter: max_in_flight. It tells BOSH the maximum number of jobs that can be installed in parallel. On a slow system, try to reduce this number. If you set this number to 1, jobs are deployed sequentially. For a BOSH deployment, we recommend setting this number to 1 to ensure BOSH can be installed successfully.


There are six jobs in the BOSH release. Each job occupies a VM. Depending on the nature of a job and its resource consumption, we allocate jobs to various resource pools. One thing to note is that we need to assign persistent disks to three jobs: postgres, director and blobstore. Without persistent disks, these jobs will not work properly because their local disks fill up very quickly.

It is a good idea to fill in a spreadsheet like the one below to plan your deployment. Based on the spreadsheet, you can modify the deployment manifest.

| Job            | Resource_pool | IP |
| nats           | small         |    |
| postgres       | medium        |    |
| redis          | small         |    |
| director       | director      |    |
| blobstore      | medium        |    |
| health_monitor | small         |    |
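The planning spreadsheet can be cross-checked against the resource pool sizes with a short script. This is a minimal sketch, assuming one instance per job as in this BOSH deployment; the function name pool_mismatches is hypothetical, while the pool sizes and job assignments come from the two tables above.

```python
from collections import Counter

# Declared pool sizes and the job-to-pool plan from the tables above.
pool_sizes = {"small": 3, "medium": 2, "director": 1}
job_pools = {
    "nats": "small",
    "postgres": "medium",
    "redis": "small",
    "director": "director",
    "blobstore": "medium",
    "health_monitor": "small",
}


def pool_mismatches(pool_sizes, job_pools):
    """Return {pool: (declared_size, jobs_assigned)} for pools that disagree."""
    assigned = Counter(job_pools.values())
    mismatches = {}
    for pool in sorted(set(pool_sizes) | set(assigned)):
        declared = pool_sizes.get(pool, 0)
        if declared != assigned.get(pool, 0):
            mismatches[pool] = (declared, assigned.get(pool, 0))
    return mismatches


print(pool_mismatches(pool_sizes, job_pools))
print(sum(pool_sizes.values()) == len(job_pools))
```

An empty dictionary means every pool has exactly as many VMs as jobs assigned to it, and the sizes add up to the six jobs.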

Based on the above table, we created a sample deployment manifest. You can download it from here:

11) After updating the deployment manifest file, we can start the actual deployment with the commands below:

$ bosh deployment bosh-dev1.yml
$ bosh deploy

This may take some time to finish depending on your network condition and available hardware resources. You can also check out the vCenter console to see VMs being created, configured and destroyed.

Preparing deployment
Compiling packages
Binding instance VMs
postgres/0 (00:00:01)
director/0 (00:00:01)
redis/0 (00:00:01)
blobstore/0 (00:00:01)
nats/0 (00:00:01)
health_monitor/0 (00:00:01)
Done                    6/6 00:00:01

Updating job nats
nats/0 (canary) (00:01:14)
Done                    1/1 00:01:14
Updating job director

director/0 (canary) (00:01:10)
Done                    1/1 00:01:10

If everything goes well, you will eventually see something like this:

Task 14 done
Started               2012-08-12 03:32:24 UTC
Finished              2012-08-12 03:52:24 UTC
Duration              00:20:00
Deployed `bosh-dev1.yml' to `micro01'

This means you have successfully deployed BOSH. You can see your deployment by running:

$ bosh deployments
| Name  |
| bosh1 |

You can check the status of all virtual machines by running:

$ bosh vms

If nothing goes wrong, you will see the status of the VMs like this:

| Job/index        | State   | Resource Pool | IPs          |
| blobstore/0      | running | medium        | |
| director/0       | running | director      | |
| health_monitor/0 | running | small         | |
| nats/0           | running | small         | |
| postgres/0       | running | medium        | |
| redis/0          | running | small         | |
VMs total: 6
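With many VMs, scanning this table by eye gets tedious. The sketch below parses text in the same pipe-separated layout and reports any job whose state is not “running”. It is a hypothetical helper, not part of the BOSH CLI, and the sample rows (including the non-running state and the IPs) are made up for illustration.

```python
def vms_not_running(table_text):
    """Return (job, state) pairs for rows whose state is not 'running'."""
    problems = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip summary lines such as "VMs total: 6"
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "Job/index":
            continue  # skip the header row
        job, state = cells[0], cells[1]
        if state != "running":
            problems.append((job, state))
    return problems


# Made-up sample in the same layout as the output above.
sample_output = """\
| Job/index        | State   | Resource Pool | IPs          |
| blobstore/0      | running | medium        | 192.0.2.21   |
| director/0       | failing | director      | 192.0.2.22   |
VMs total: 2
"""
print(vms_not_running(sample_output))
```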


Building Cloud Foundry on vSphere using BOSH Part I

First, let’s discuss the hardware and software prerequisites of a Cloud Foundry installation.

1) 64-bit Ubuntu 10.04 LTS, preferably in ISO format.
2) vSphere V4.1 or V5.x
3) vSphere client
4) vCenter (installed on a Windows 2008 R2 64bit or a Windows 2003 server, physical or virtual machine)

Provided all nodes are VMs, the table below shows the number of VMs required:

|                        | # of nodes | OS      | Can be physical machine        |
| BOSH CLI               | 1          | Ubuntu  | Y                              |
| vCenter+vSphere Client | 1          | Win2008 | Y, can be split into two nodes |
| micro BOSH             | 1          | Ubuntu  | N                              |
| BOSH                   | 6          | Ubuntu  | N                              |
| Cloud Foundry          | 34         | Ubuntu  | N, see notes                   |
| Total                  | 43         |         |                                |

It should be noted that the number of Cloud Foundry nodes in the above table is the minimum number required. This number may vary depending on the actual scale of a Cloud Foundry deployment. There are generally two rules to consider when choosing the hardware configuration:

1) The total number of vCPUs should not be more than twice the total number of physical cores. In a production system, this ratio should preferably be below 2.
2) The total memory of all VMs should be less than the physical memory of all hypervisors.

Here is an example hardware configuration, provided each VM has 4 GB of memory and 1 vCPU (assuming the vCPU : CPU ratio is 1):

6 x servers, each has 8 CPU cores and 32GB RAM.
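The two sizing rules can be checked with simple arithmetic. The sketch below applies them to this example, assuming all 43 nodes from the table are VMs with 4 GB of memory and 1 vCPU each; the function name check_capacity is hypothetical.

```python
def check_capacity(vm_count, vm_memory_gb, vcpus_per_vm,
                   servers, cores_per_server, ram_per_server_gb):
    """Apply the two sizing rules and return the intermediate totals."""
    total_vcpus = vm_count * vcpus_per_vm
    total_cores = servers * cores_per_server
    total_vm_memory_gb = vm_count * vm_memory_gb
    total_ram_gb = servers * ram_per_server_gb
    return {
        "total_vcpus": total_vcpus,
        "total_cores": total_cores,
        "total_vm_memory_gb": total_vm_memory_gb,
        "total_ram_gb": total_ram_gb,
        # Rule 1: vCPUs should not exceed twice the physical cores.
        "vcpu_rule_ok": total_vcpus <= 2 * total_cores,
        # Rule 2: VM memory should be less than the physical memory.
        "memory_rule_ok": total_vm_memory_gb < total_ram_gb,
    }


result = check_capacity(vm_count=43, vm_memory_gb=4, vcpus_per_vm=1,
                        servers=6, cores_per_server=8, ram_per_server_gb=32)
for key, value in result.items():
    print(key, value)
```

For this configuration the VMs need 172 GB of memory against 192 GB of physical memory, and 43 vCPUs against 48 physical cores, so both rules hold.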

For an experimental system, we saw a successful deployment on a single server with the below configuration (assume each VM has 256 MB memory):

1 x server, 8 CPU cores and 16GB RAM

Besides the servers, storage is another key factor in a cloud platform. The storage should preferably have 200GB or more of usable space to keep the images of all VMs. Fast shared storage is recommended in a production system. NFS is the most commonly used protocol to share storage between the hypervisors. In an experimental environment, a Linux-based NFS server can be used in lieu of dedicated storage. Though local disks on hypervisors may work in a POC-type environment, they are generally not recommended for any production system.

The last thing we should plan for is the network. In a lab environment, we could simply place all nodes on the same network. However, in a production system, components of Cloud Foundry should be properly allocated to various VLANs for security and management purposes. In this article series, we are not going into the details of networking. For illustration purposes, we will have five VLANs during our deployment:

| VLAN             | Nodes                             |
| Management VLAN  | Hypervisors and NFS storage       |
| CF-internal VLAN | BOSH VMs and VMs of Cloud Foundry |
| CF VLAN          | BOSH VMs and VMs of Cloud Foundry |
| Service VLAN     | For LB, dual-home routers         |
| Public VLAN      | For LB, incoming requests         |

The installation of a Cloud Foundry instance has the following four parts:

1) Install the BOSH CLI tool in an Ubuntu 10.04 OS. This could be either a physical or virtual machine.
2) Install the micro BOSH. The micro BOSH is a VM that contains all components of BOSH. It has the same functions as a standard BOSH. However, it has limited disk space to store multiple releases. The purpose of having a micro BOSH is to install BOSH, which is a distributed system itself.
3) Install BOSH using micro BOSH. BOSH usually consists of 6 VMs, with each component residing on its own node. One of the nodes, called blobstore, has a large disk which can hold larger releases.
4) Install the Cloud Foundry instance by BOSH.


Cloud Foundry BOSH Introduction

Cloud Foundry is the first open source PaaS in the industry. It supports multiple frameworks, multiple services and multiple cloud providers. BOSH was originally created in the context of the Cloud Foundry project. Nevertheless, it is a general tool chain for deployment and lifecycle management of large-scale distributed services. In the next few blog posts, I will walk you through the process of installing the Cloud Foundry platform using BOSH.

Cloud Foundry contains a number of components. The most important ones are Cloud Controller, NATS, Router, Health Manager and DEA. An introduction of these components can be found here: . The components are designed in a way that makes the system horizontally scalable. This means a Cloud Foundry instance can have one or more copies of each component to meet the load needed by a cloud. The components can be deployed in a distributed fashion across multiple nodes.

BOSH is the tool we use to deploy the components of Cloud Foundry onto distributed nodes. (In a virtualized environment, we use the term “node” interchangeably with “virtual machine” or VM.) Before we move to the details of a real deployment, let’s briefly introduce how BOSH works when it deploys a system. We strongly suggest you read the official BOSH documentation.

BOSH is a recursive acronym for BOSH Outer SHell. In contrast to the “Outer Shell”, the system being deployed and managed by BOSH is called the “Inner Shell”. The diagram below illustrates a simplified model of BOSH.

BOSH can be considered a server or a robot which orchestrates the deployment process of a distributed system. A Ruby tool, the BOSH Command Line Interface (CLI), is used to interact with BOSH. Before BOSH starts to deploy a system, it needs three prerequisites: a stemcell, a release (the software to be installed), and a deployment manifest. Let’s look at these three items in more detail.

Stemcells: In a cloud platform, VMs are usually cloned from a template. A stemcell is a VM template containing a standard Ubuntu distribution. A BOSH agent is also embedded in the template so that BOSH can take control of VMs cloned from the stemcell. The name “stemcell” originates from the biological term “stem cells”, which refers to undifferentiated cells that are able to grow into diverse cell types later. Similarly, VMs created from a BOSH stemcell are identical at the beginning. After inception, VMs are configured with different CPU/memory/storage/network settings and installed with different software packages. Hence, VMs built from the same stemcell template behave differently.

 Releases: A release contains collections of software bits and configurations which will be installed onto the target system. Each VM is deployed with a collection of software, which is called a job. Configurations are usually templates which contain parameters such as IP address, port number, user name, password, domain name. These parameters will be replaced at deploy time by the properties defined in a deployment manifest file.

 Deployments: A deployment is something that turns a static release into runnable software on VMs. A Deployment Manifest defines the actual values of parameters needed by a deployment. During a deployment process, BOSH substitutes the parameters in the release and makes the software run on the configuration as planned.
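The substitution of template parameters by manifest properties can be illustrated with a small example. This sketch uses Python’s string.Template as a stand-in for the templating used in real BOSH releases; the template text, the property names and the values are all hypothetical.

```python
from string import Template

# A made-up configuration template with parameters to be filled at deploy time.
config_template = Template("nats://$user:$password@$address:$port")


def render(template, properties):
    """Replace every parameter in the template; fail if a property is missing."""
    return template.substitute(properties)


# Hypothetical properties, as they might appear in a deployment manifest.
properties = {
    "user": "nats",
    "password": "example-secret",
    "address": "192.0.2.10",
    "port": 4222,
}
print(render(config_template, properties))
```

Because substitute raises KeyError when a property is missing, a manifest that omits a required value is caught at render time instead of producing a broken configuration.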

When the above 3 items are ready, they are uploaded to BOSH by the BOSH CLI tool. After that, a BOSH installation of a distributed system typically has the following major steps:

1) If some packages in the release require compilation, BOSH first creates a few temporary VMs (worker VMs) to compile them. After compiling the packages, BOSH destroys the worker VMs and stores the binaries in its internal blobstore.

2) BOSH creates a pool of VMs which will be the nodes the release is deployed on. These VMs are cloned from the stemcell with a BOSH agent installed.

3) For each job of the release, BOSH picks a VM from the pool and updates its configuration according to the Deployment Manifest. The configuration may include IP address, persistent disk size etc.

4) When the reconfiguration of the VM is completed, BOSH sends commands to the agent inside each VM. The commands tell the agent to install software packages. During the installation, the agent may download packages from BOSH and install them. When the installation finishes, the agent runs the start script to launch the job on the VM.

5) BOSH repeats steps 3-4 until all jobs are deployed and launched. The jobs can be deployed simultaneously or sequentially. The value of “max_in_flight” in the manifest file controls this behavior. When it is 1, the jobs are deployed one by one. This value is useful on a slow system to avoid timeouts caused by resource congestion. When it is greater than 1, jobs are deployed in parallel.
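The effect of max_in_flight, as described in step 5, can be pictured as splitting the job list into batches. The sketch below is a simplified model under that description only; real BOSH scheduling is more involved, and the function name deployment_batches and the job list are illustrative assumptions.

```python
def deployment_batches(jobs, max_in_flight):
    """Group jobs into batches of at most max_in_flight, deployed one batch at a time."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    return [jobs[i:i + max_in_flight] for i in range(0, len(jobs), max_in_flight)]


# Example job names; any list of jobs works the same way.
jobs = ["nats", "postgres", "redis", "director", "blobstore", "health_monitor"]

print(deployment_batches(jobs, max_in_flight=1))  # one job per batch: sequential
print(deployment_batches(jobs, max_in_flight=4))  # up to four jobs in parallel
```

With max_in_flight set to 1 there are as many batches as jobs, which is slower but puts the least load on a constrained system.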
