Running datacollection to post to Clio
======================================

**This version of datacollection has been forked to set log_min_duration_statement to 1ms.**


**Run directly on box:**
```/usr/bin/stdbuf -oL -eL /var/lib/postgresql/datacollection.sh ${schema} ${time_type} ${collection_time} $::fqdn ${database_name} | logger -p local2.info -t datacollection```

###### Examples of variables:
* schema = justintv_dev
* time_type = m (minutes), t (hours)
* collection_time = 1
* $::fqdn = fqdn of database
* database_name = sitedbprod
* database_type = master or replica
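
As a rough illustration of how those values slot into the command, here is a minimal dry-run sketch that only prints the resulting command line instead of running it. The hostname `db.example.com` and the helper name `datacollection_cmd` are made up for this example; on a real box Puppet supplies `$::fqdn`.

```shell
#!/bin/sh
# Example values taken from the list above.
schema="justintv_dev"
time_type="m"              # m = minutes
collection_time="1"
fqdn="db.example.com"      # hypothetical placeholder for $::fqdn
database_name="sitedbprod"

# Hypothetical helper: print the command line rather than executing it (dry run).
datacollection_cmd() {
    printf '%s %s %s %s %s %s\n' \
        /var/lib/postgresql/datacollection.sh \
        "$schema" "$time_type" "$collection_time" "$fqdn" "$database_name"
}

datacollection_cmd
```

Once the printed line looks right, the same arguments can be passed to the real script as shown in the command above.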

**Run via Puppet:**

Create a branch and set the run_datacollection variable to true:

```$run_datacollection = true```

Then run Puppet from that branch.

healthcheck
===========

There are two steps to a healthcheck:

1. collecting the data from the server.
2. processing it on a separate server to generate graphs and reports.

Step 1 is handled by the code in datacollection/, and step 2 by the code in Tuneup/.
Each has its own README.md. Briefly, here's the end-to-end process for using them
together:


How to Run
----------

1. Copy datacollection/datacollection.py and datacollection/datacollection_startup.py to the machine where you want to gather data. You'll run these scripts as the postgres user, so put them in the ~postgres directory and make sure they are owned by postgres.

1. Follow the instructions in datacollection/README.md to run the collection:

    ```/var/lib/postgresql/datacollection.sh ${schema} ${time_type} ${collection_time} $::fqdn ${database_name}```

1. At the end, datacollection.py will tell you where it placed its logs. Unless you're CPU-starved, it's a good idea to compress them before copying them to the box where you will run Tuneup:

    ```cd YOUR_LOG_DIR; for FILE in datacollection-*.csv; do echo "$FILE"; nice gzip "$FILE"; done```

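1. Copy the datacollection-*.gz files to the Tuneup box. Recommended command (current OpenSSH releases no longer support the arcfour cipher, so drop the -e option if ssh rejects it):

    ```nice rsync -vaP -e 'ssh -c arcfour' datacollection-*.gz YOUR_USERNAME@YOUR_TUNEUP_HOSTNAME:/YOUR_DIRECTORY```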


1. Wait for Tuneup to complete.

1. Download the output to your computer, so you can view the graphs. Skip the sar-related files, the monitor directory, and especially the log directory, because they're big. If you have rsync on your computer, you can do this as follows:

    ```rsync -va --exclude monitor --exclude '*sar*' --exclude 'logs' YOUR_TUNEUP_HOSTNAME:/YOUR_TUNEUP_DIRECTORY/OUTPUT .```
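
The compression step in the list above can also be wrapped in a small function, which avoids an error when no CSV files are present. This is only a sketch: the function name `compress_logs` is invented here, and it assumes `gzip` and `nice` are installed on the database host.

```shell
#!/bin/sh
# compress_logs DIR: gzip every datacollection-*.csv file in DIR at low priority.
compress_logs() {
    log_dir="$1"
    for file in "$log_dir"/datacollection-*.csv; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        echo "$file"
        nice gzip "$file"
    done
}
```

Call it as `compress_logs YOUR_LOG_DIR`; each file is replaced by a `.csv.gz` file with the same base name.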


Postgres Logs Not Showing Up
----------------------------

1. Log into the data-dev box, located on twitch-web-aws.
2. In the ~/postgres/healthcheck directory, remove the analysis.pid file.
3. The data-dev box will then start analyzing the files in the SQS queue.
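
Before deleting the pid file, it may be worth confirming that no analysis process is still alive. The sketch below is an optional safeguard, not part of the documented procedure: it assumes analysis.pid contains a single numeric PID (not confirmed by this README), and the function name `remove_stale_pid` is invented for illustration.

```shell
#!/bin/sh
# remove_stale_pid FILE: delete FILE only if the PID it names is not running.
# Assumption: FILE holds one numeric PID; this is a guess about analysis.pid.
# Note: kill -0 also fails when you lack permission to signal the process,
# so run this as the user that owns the analysis process.
remove_stale_pid() {
    pid_file="$1"
    if [ ! -f "$pid_file" ]; then
        echo "no pid file: $pid_file"
        return 0
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; leaving $pid_file in place"
        return 1
    fi
    rm -f "$pid_file"
    echo "removed stale $pid_file"
}
```

For example, `remove_stale_pid ~/postgres/healthcheck/analysis.pid` removes the file only when the recorded process has exited.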

