# terraform-spark #

## installation ##

Use pip!

```
pip install git+ssh://git@git-aws.internal.justin.tv/ids/terraform-spark.git
```


To submit jobs or connect in a REPL (the `sparkadmin submit` and
`sparkadmin shell` commands), you'll also need Spark installed locally:

```
brew install apache-spark
```
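
If you want to confirm that the install worked before trying `sparkadmin shell` or `sparkadmin submit`, a quick check along these lines may help. It is only a sketch: it assumes `sparkadmin` relies on the stock `spark-shell` and `spark-submit` launchers being on your `PATH`, which this README does not spell out, and `have_spark` is a made-up helper name.

```shell
# Check that the Spark launchers are reachable on PATH.
# Assumption: sparkadmin shells out to spark-shell and spark-submit.
have_spark() {
    missing=0
    for bin in spark-shell spark-submit; do
        if ! command -v "$bin" >/dev/null 2>&1; then
            echo "missing: $bin" >&2
            missing=1
        fi
    done
    return "$missing"
}

if have_spark; then
    echo "spark found"
else
    echo "spark not found; try: brew install apache-spark"
fi
```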

## commands ##

Clusters are identified by name. The name is used as a prefix for the
cluster's AWS resources and identifies the cluster in a Terraform
statefile. By default, statefiles are stored in the working directory
from which `sparkadmin` is invoked.
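
Because statefiles land in whatever directory you happen to be in, it is easy to lose track of a cluster by running `sparkadmin` from somewhere else. One way around that is a small wrapper that always passes the same `--dir`. This is a sketch, not part of the tool: `statefile_dir`, `sparkadmin_pinned`, and the fallback path `~/.sparkadmin/statefiles` are all hypothetical, and setting `SPARK_STATEFILES` on its own may well be enough.

```shell
# Resolve one fixed directory for statefiles and make sure it exists.
# The fallback path is a hypothetical choice, not a sparkadmin default.
statefile_dir() {
    dir="${SPARK_STATEFILES:-$HOME/.sparkadmin/statefiles}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Run sparkadmin with the same --dir no matter what the current directory is.
sparkadmin_pinned() {
    sparkadmin --dir "$(statefile_dir)" "$@"
}
```

With that in place, `sparkadmin_pinned list` should show the same clusters from any directory.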

### example ###

```shell
# set auth-related environment variables
$ export AWS_ACCESS_KEY_ID=<take a wild guess>
$ export AWS_SECRET_ACCESS_KEY=<guess again>

# set name of keypair to use when turning on new instances
$ export SPARK_KEYPAIR=Dev/spencer

# set path to keypair for SSH access when provisioning
$ export SPARK_SSH_KEY=~/.ssh/aws/Devspencer.pem

# Create a cluster with 5 workers
$ sparkadmin create examplecluster -n 5
...<lots and lots of terraform gobbledygook redacted>...

# Connect to the fresh new cluster
$ sparkadmin shell examplecluster
scala>

# play with the scala console
scala> print("hello world");
hello world

# exit and destroy the cluster
scala> :quit
$ sparkadmin destroy -d examplecluster
destroying cluster examplecluster
Do you really want to destroy?
  Terraform will delete all your managed infrastructure.
  There is no undo. Only 'yes' will be accepted to confirm.

Enter a value: yes

# get on with your life
$ exit
```

### usage ###

    Usage: sparkadmin [OPTIONS] COMMAND [ARGS]...

    Options:
      --dir DIRECTORY  directory to store state files (default is current working
                       directory)
      -q, --quiet      silence debug output
      --help           Show this message and exit.

    Commands:
      create   launch a new spark cluster
      destroy  destroy a spark cluster
      list     list clusters with statefiles to stdout
      plan     print terraform's plan for the given inputs
      resize   resize an existing spark cluster
      shell    connect to a spark shell
      submit   submit a job to a cluster
      webui    open the web UI administration page for this...
      whereis  get the URI of the spark master for a cluster...


### environment variables ###

If you don't want to pass in your SSH key and keypair name all the
time, you can set environment variables:

- `SPARK_KEYPAIR`: the AWS keypair name to use when provisioning new
  cluster nodes
- `SPARK_SSH_KEY`: the path to the SSH private key corresponding to
  that keypair, used when SSHing onto new cluster nodes for provisioning
- `SPARK_STATEFILES`: a directory to store cluster statefiles in
  instead of the current working directory
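
If you go the environment-variable route, a pre-flight check can catch a forgotten export before Terraform starts doing anything. The sketch below is not part of `sparkadmin`; `check_spark_env` is a hypothetical helper, and it assumes you also supply the AWS credentials through the environment, as in the example above.

```shell
# Report every variable that is unset or empty; return non-zero if any is.
check_spark_env() {
    status=0
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY SPARK_KEYPAIR SPARK_SSH_KEY; do
        # Indirect lookup of the variable named by $name (names come from the fixed list above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            status=1
        fi
    done
    # The SSH key variable should point at a readable file.
    if [ -n "${SPARK_SSH_KEY:-}" ] && [ ! -r "$SPARK_SSH_KEY" ]; then
        echo "not readable: $SPARK_SSH_KEY" >&2
        status=1
    fi
    return "$status"
}

if check_spark_env; then
    echo "environment looks complete"
fi
```

You could then chain it in front of a real command, for example `check_spark_env && sparkadmin create examplecluster -n 5`.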


### more details ###

`sparkadmin --help` lists the available commands.
`sparkadmin <command> --help` describes a particular command.
