I am happy to announce that we recently open-sourced under the BSD license, the tools I’ve developed and used for my research at UCSB, to automatically deploy and configure in high availability mode a full big data stack in the cloud. The tools automate the deployment and configuration of Apache Mesos cluster manager and Spark, Hadoop, Storm, Kafka, and Hama data processing engines in Eucalyptus private cloud. High Availability mode is also supported (A full functioning Zookeeper cluster, and Mesos masters/ secondary masters are setup automatically).
These tools have been severely tested with Mesos, Spark, Map-Reduce, and Storm for the specific versions specified on the readme file. They also provide the option to deploy a Spark standalone cluster on Eucalyptus if you don’t need Apache Mesos. The only prerequisite is that you have a running Eucalyptus cloud and root access to your cluster. Everything else is very easily configurable on the scripts and you only need to run a simple command with arguments and wait until everything is done for you!
If you want to use on Amazon EC2 you will need to change the connector (Notice though that if you only care about Spark/ Mesos deployment on EC2 a better starting point might be this github repo instead). Similarly, if you need to use with more recent versions you’ll need to modify a couple of lines on the configuration files. I’ll try to support any reasonable requests but in general you are on your own 🙂