DATABASE HEALTH CHECK DATA COLLECTION
=====================================

These scripts help you collect data so that PostgreSQL Experts can do a full analysis on your current query load, slow queries, resource usage, and database statistics.

PREREQUISITES
==============

Some preparation is required before you collect data. The script checks all of the items below when it starts, so if you are
not confident about any of them, please watch the output at the beginning of the run carefully.

It is strongly recommended that you run the script using Python 2.6 or 2.7. The datacollection script has not been tested with other Python versions.

In addition, the script will fail if it detects that any of the following requirements is not met.

* You must have logging_collector turned "on" in your PostgreSQL configuration. Changing this setting requires a restart of the
PostgreSQL server, so plan accordingly.

* psql and pg_dump must already be in the $PATH for the postgres user.

* Make sure the partition for the activity logs for postgres (generally either PGDATA or /var/log) has several gigabytes
of space available. Sanity-check that this path exists and is writeable by the postgres user.

* The postgres user must be able to rewrite the postgresql.conf file.

* The utility "sar" must be installed.  On Linux, this is part of the "sysstat" package.

* On the PostgreSQL server box, it is desirable to have screen installed (or some other means of disconnecting from, and
later coming back to, persistent connections). This allows you to reconnect and manage the collection process as it proceeds.
The script will not fail if you do not have it.
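
The checks in the list above can be approximated by hand before starting a run. The following is only a sketch, not the code
datacollection.py itself uses: the function names, the 2 GB threshold standing in for "several gigabytes", and the paths you
pass in are all assumptions. It avoids Python 3-only features so that it should also run on Python 2.7, and it does not check
logging_collector.

```python
import os


def find_in_path(program, path=None):
    """Return the full path of `program` if it is executable on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def free_gigabytes(directory):
    """Return the space available to the current user on the partition, in GiB."""
    stats = os.statvfs(directory)
    return stats.f_bavail * stats.f_frsize / float(1024 ** 3)


def preflight(log_dir, conf_file, min_free_gb=2.0):
    """Return a list of problems found; an empty list means every check passed.

    min_free_gb is an assumed threshold, not a value taken from the script.
    """
    problems = []
    for program in ("psql", "pg_dump", "sar"):
        if find_in_path(program) is None:
            problems.append("%s is not in $PATH" % program)
    if not os.path.isdir(log_dir):
        problems.append("%s does not exist" % log_dir)
    elif not os.access(log_dir, os.W_OK):
        problems.append("%s is not writeable" % log_dir)
    elif free_gigabytes(log_dir) < min_free_gb:
        problems.append("%s has less than %.1f GB free" % (log_dir, min_free_gb))
    if not os.access(conf_file, os.W_OK):
        problems.append("%s cannot be rewritten" % conf_file)
    return problems
```

Run it as the postgres user, since the results depend on that user's $PATH and file permissions. Pass in your own log
directory and postgresql.conf path and print whatever preflight() returns.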

Note that running the data collection will temporarily change what and how you log database traffic.  If you have system
monitoring which is watching the database activity logs, this may trigger alerts.

RUNNING THE SCRIPT
==================

There is a helper script called datacollection_startup.py. It is purely optional; it prompts you for settings in order to build the
command line (without extra flags) that calls the main script.

Copy the file or files into a directory that the postgres user can access and write to.

From that directory, as the postgres user, run:

	python datacollection_startup.py

(This is an optional file and does not need to be copied over.)

After you answer some questions about your settings, this will give you the line to copy and paste to start the script. Always spot-check this line before running it.


If you would prefer to build the command manually, run the following:

	python datacollection.py -d DATABASENAME -c CONFIGFILE -t/-m TIME

Where DATABASENAME is the name of your production database (see below for multiple databases), CONFIGFILE is the full path to the postgresql.conf file, and
TIME is the length of the data collection run in hours (-t) or minutes (-m).
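
As an illustration of how the pieces of that command fit together, here is a minimal sketch that assembles the invocation as an
argument list. The function name build_command is hypothetical, and the rule that exactly one of -t or -m is given is an
assumption drawn from the "-t/-m TIME" notation above, not something confirmed from the script.

```python
def build_command(database, config_file, hours=None, minutes=None):
    """Return the datacollection.py invocation as a list of arguments."""
    # Assumption: the run length is given either in hours or in minutes, not both.
    if (hours is None) == (minutes is None):
        raise ValueError("give exactly one of hours (-t) or minutes (-m)")
    command = ["python", "datacollection.py", "-d", database, "-c", config_file]
    if hours is not None:
        command += ["-t", str(hours)]
    else:
        command += ["-m", str(minutes)]
    return command
```

Passing the resulting list to subprocess.call() runs the script without going through a shell, so a database name made of
asterisks is not expanded as a filename pattern. If you paste the command into a shell instead, quote such a name.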

If the server has more than one database, type ** as the database name. (This looks at all the databases on the server except
template0, template1 and postgres.)

If you have more than one database and need to customize which databases will be included, use "*+" as the database name.
Then create an OUTPUT directory and, inside it, a file called OUTPUT/check_databases.txt with each database name on its own line.
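
If you would rather generate that file than type it by hand, something like the following sketch should work. The function
name write_database_list is hypothetical, and it is an assumption that the script accepts a trailing newline after the last
name.

```python
import os


def write_database_list(names, output_dir="OUTPUT"):
    """Create output_dir if needed and write one database name per line."""
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    path = os.path.join(output_dir, "check_databases.txt")
    with open(path, "w") as handle:
        for name in names:
            # Strip stray whitespace so each line holds only the database name.
            handle.write(name.strip() + "\n")
    return path
```

For example, write_database_list(["sales", "inventory"]), where the two names are placeholders for your own databases. Run it
from the directory that holds datacollection.py so that OUTPUT/ ends up where the script is expected to look for it.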

The following are extra, optional flags that may be added.
==========================================================

-p PASSWORD

This option allows you to add a password. If your postgres installation requires a password, use this option.

-o or --oldpg

If the PostgreSQL version you are running this on is older than 9.1, please add the -o or --oldpg flag when running the script.

-r

Use this option if a reset is needed.

--clear

Use this option if you are running the program a second time and want to discard the data created by the last run.
This will remove any OUTPUT/ data created (except for the check_databases.txt file) and move any sar data into the OUTPUT/
directory.

Finishing Up
============

While the script is running, monitor it for error messages and for any effect on production database usage. If you need
to abort, press CTRL-C.

Further documentation on the script is available using the --help command switch.

You can scp or rsync the OUTPUT/ directory off the server. Please remember to go into the log directory, compress the datacollection-*.csv
files, and scp or rsync those as well. Once we safely have all this data, you may delete it from your server.
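
One way to do the compression step is sketched below. The function name compress_collection_logs is hypothetical, and you
must supply your own log directory. The sketch leaves the original .csv files in place, so delete them yourself once the
transfer has been confirmed.

```python
import glob
import gzip
import os
import shutil


def compress_collection_logs(log_dir):
    """Gzip every datacollection-*.csv in log_dir and return the new file names."""
    compressed = []
    pattern = os.path.join(log_dir, "datacollection-*.csv")
    for csv_path in sorted(glob.glob(pattern)):
        gz_path = csv_path + ".gz"
        with open(csv_path, "rb") as source:
            target = gzip.open(gz_path, "wb")
            try:
                # Stream the file in chunks so large logs are not loaded into memory.
                shutil.copyfileobj(source, target)
            finally:
                target.close()
        compressed.append(gz_path)
    return compressed
```

The returned list gives the .gz files to copy off the server with scp or rsync alongside the OUTPUT/ directory.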

Please see the README-output.md file to see what files are created by this script.

License
=======

Datacollection.py and its associated help scripts and documentation are Copyright 2014 PostgreSQL Experts Inc., all rights reserved.  Current PostgreSQL Experts clients are granted a limited, revokable license to use datacollection.py on their own servers.
