## Login Failures ##

This directory houses scripts that count SSH login failures by
source IP, target user, and target host. They run on Spark, pulling
data from S3.
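
As a rough illustration of the counting logic, here is a minimal plain-Python sketch that runs without Spark. It is not the code in this directory: the log format (standard OpenSSH `Failed password` syslog lines), the regex, the names `parse_failure` and `count_failures`, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed OpenSSH syslog format; the real scripts may parse their input differently.
FAILED_LOGIN = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\ssshd\[\d+\]:\s"
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)


def parse_failure(line):
    """Return (source_ip, target_user, target_host), or None for any other line."""
    match = FAILED_LOGIN.match(line)
    if match is None:
        return None
    return (match.group("ip"), match.group("user"), match.group("host"))


def count_failures(lines):
    """Count failed logins per (source_ip, target_user, target_host) key."""
    parsed = (parse_failure(line) for line in lines)
    return Counter(key for key in parsed if key is not None)


# Hypothetical sample input, made up for this example.
SAMPLE_LINES = [
    "Jan 10 12:00:01 web1 sshd[1234]: Failed password for root from 10.0.0.1 port 5555 ssh2",
    "Jan 10 12:00:03 web1 sshd[1234]: Failed password for root from 10.0.0.1 port 5556 ssh2",
    "Jan 10 12:00:05 web2 sshd[4321]: Failed password for invalid user admin from 10.0.0.2 port 6000 ssh2",
    "Jan 10 12:00:07 web2 sshd[4321]: Accepted publickey for deploy from 10.0.0.3 port 6001 ssh2",
]

if __name__ == "__main__":
    for key, count in count_failures(SAMPLE_LINES).most_common():
        print(key, count)
```

On Spark, the same shape would typically become a map over the log lines followed by a per-key reduction (for example `reduceByKey` on an RDD), so the counting is spread across the cluster.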

These scripts are mainly didactic: the same job is written in both
Scala and Python to show how to write Spark jobs in each language.

*Try to use Scala if you can*. Scala is *much* more performant than
Python. These two scripts are nearly identical, but the Python
version is about *7x slower*. That's roughly the difference between
a 15 minute job and a 2 hour job. We also found that building the job
with the same Scala version as the one on the cluster slightly
improved performance; at time of writing, that was Scala 2.10.

### Running the Python script ###

If you've got a Spark master node at `$SPARK_MASTER`, you'd do this:

    scp ./python/sshd_login_failures.py root@$SPARK_MASTER:~/
    ssh root@$SPARK_MASTER "/root/spark/bin/spark-submit sshd_login_failures.py <aws_access_key_id> <aws_secret_key>"


### Running the Scala job ###

If you've got a Spark master node at `$SPARK_MASTER`, you'd do this:

    cd scala
    sbt package
    scp ./target/scala-2.10/ssh-login-analysis-in-scala_2.10-1.0.jar root@$SPARK_MASTER:~/
    ssh root@$SPARK_MASTER \
        "/root/spark/bin/spark-submit \
        --class \"SSHDLoginFailures\" \
        ssh-login-analysis-in-scala_2.10-1.0.jar \
            <aws_access_key_id> \
            <aws_secret_key>"


### Questions? ###
Email spencer@twitch.tv for explanations of anything going on in here.
