Recently I was reading the book Mining the Social Web, and I was greatly impressed by both the code and how easy it was to set up my environment.

Here is the book's code repository, with instructions on how to set up the virtual machine.

The machine is set up with Vagrant, a virtualization wrapper (and more), and Chef, a configuration management framework.

I'm used to automating things with shell scripts and Fabric. I've also used Vagrant for some of my development work and for trying out crazy new things without the risk of breaking or polluting my OS.

Vagrant is an amazing tool. It solves the big "works on my machine" problem by making virtualization easy for developers. In a few minutes you can create a virtual machine that mirrors your production environment and start developing. And guess what? You can share those machines as boxes with your team members. No more project configuration taking DAYS and struggling with cryptic problems every time.

Vagrant boxes are configured in a Vagrantfile, using a special Ruby API. But you don't even have to know Ruby, because it's so well documented and there are so many examples on the internet.

Easy to set up, easy to learn and use. In the past I had some minor configuration problems on Sabayon Linux, but since I switched to a more mainstream distribution, Ubuntu 14.04, it has worked smoothly.

However, until recently I hadn't used Vagrant with a proper provisioner. In Vagrant lingo, configuration management (CM) tools are called provisioners.

The excellent book I mentioned earlier inspired me to dive into automated provisioning with Vagrant and set it up for one of my projects, called "Hypergraph".

I decided to use Ansible instead of Chef, because it is simpler but still powerful. Here you can read a comparison between the two, or why Ansible is much better than shell scripts.

For me, Ansible is much simpler and cleaner than shell scripts. And if I had to scale my app... no problem. There is another really big advantage of using it: Ansible modules are designed to be idempotent. You can run them as many times as you want; if something is already done, it won't be executed again. No more mess from running scripts too many times.
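To make the idempotence point concrete, here is a minimal sketch. It is not taken from the Hypergraph playbook; the package and the directory are only examples, and it uses the old-style `sudo: yes` syntax that appears later in this post (newer Ansible releases spell it `become: yes`). Each task describes a desired state instead of a command to run:

```yaml
# Illustrative tasks only: "git" and "/code" are example values.
- name: Make sure git is installed
  sudo: yes
  apt: pkg=git state=present

- name: Make sure the code directory exists
  sudo: yes
  file: path=/code state=directory
```

On a fresh machine the first run should report both tasks as `changed`; later runs report `ok` and leave the machine alone.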

So... how did my automation go?

If you are impatient, take a look.

It went exceptionally well. Ansible was easier to learn than I expected, and I didn't have any major problems installing things. The most troublesome part was automating the start of long-running processes, such as the IPython notebook server. But I eventually found a good and easy way to do it with Upstart.

How to do it?

Install all the things:

  • virtualbox
  • vagrant
  • ansible

On Ubuntu:

$ sudo apt-get install virtualbox
$ sudo apt-get install vagrant
$ sudo apt-get install ansible

Choose a proper box. If you are not sure what a box is, read about them in the Vagrant boxes documentation. You can check out the list of boxes or Vagrant Cloud.

I chose Ubuntu 14.04 because it's an LTS release and the same distro I use every day on my laptop.

  config.vm.provider :virtualbox do |vb, override|
      override.vm.box = "trusty"
      override.vm.box_url = "link to box"
      vb.memory = 2048
  end

Set up the network, synced folders, and machine properties (hostname, RAM).

Adding ansible provisioning is as simple as:

config.vm.provision "ansible" do |ansible|
    ansible.playbook = "provision/vagrant.yml"
end

It was super easy. You can see the full Vagrantfile here.

With Ansible, I:

  • install basic packages needed for Python development
  • install packages needed for IPython and matplotlib
  • pull the Hypergraph sources from GitHub
  • create a virtualenv and install all Python dependencies
  • create an Upstart job
  • start the IPython notebook server on a private network

To run the IPython notebook I used Upstart. What is it? Upstart is an event-based replacement for the /sbin/init daemon, which handles:

  • starting of tasks and services during boot
  • stopping them during shutdown
  • supervising them while the system is running.

Why Upstart? Because it was already there: Ubuntu ships with Upstart, and writing Upstart jobs is really easy.

This is my script for running IPython Notebook server.

description "IPython Notebook"

stop on runlevel [06]

# respawn if the process dies, but give up after 10 respawns within 5 seconds
respawn
respawn limit 10 5

chdir /code
exec /env/bin/ipython3 notebook\
 --no-browser --ip=

Ansible is configured in YAML. It's almost too easy to be true.

Here is an example of my ansible tasks:

- name: Install IPython dependencies
  sudo: yes
  apt: pkg={{ item }} state=installed update_cache=yes
  with_items: ipython_packages

- name: Pull sources from the repository.
  git: repo="link to repo" dest="/code/"
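The remaining steps from the list above (virtualenv, Upstart job, starting the server) could be sketched in the same style. This is a sketch under assumptions rather than a copy of the real playbook: the `requirements.txt` location, the template file name and the job name `ipython-notebook` are hypothetical. Only `/code` and `/env` come from the Upstart script above.

```yaml
# Hypothetical file and job names; only /code and /env appear in the post.
- name: Create virtualenv and install Python dependencies
  # /env sits at the filesystem root, so creating it needs root.
  # Selecting Python 3 for the virtualenv is left out here, because the
  # right option depends on the Ansible version in use.
  sudo: yes
  pip: requirements=/code/requirements.txt virtualenv=/env

- name: Install the Upstart job
  sudo: yes
  template: src=ipython-notebook.conf.j2 dest=/etc/init/ipython-notebook.conf

- name: Start the IPython notebook server
  sudo: yes
  service: name=ipython-notebook state=started
```

Upstart derives the job name from the file name in `/etc/init`, which is why the `service` task refers to `ipython-notebook`. The `template` module is a guess; a plain `copy` would do if the job file contains no variables.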

How to use it to set up my project?

To try out my project you only have to do this:

$ git clone\
$ cd hypergraph-provisioning && mkdir hypergraph
$ vagrant up

Wait for it, wait for it... it might take several minutes to install all the dependencies and compile the scientific libraries such as scipy.

Voilà! Open your browser and look at your shiny IPython notebook server :).