Debugging
=========

Each section in this document describes a symptom that you may be observing. The content of each section hints at common causes and suggests tools for investigating the issue.

Connections failing for unknown reasons
---------------------------------------
- A common cause is connection pool exhaustion.
- For HAProxy connection pools, check the HAProxy stats endpoint.
  The "Sessions" group is the interesting one, particularly the "Cur" and "Max" columns.
  - haproxy-tmi: http://{hostname}:2001/stats
  - haproxy-backend: http://{hostname}:2015/stats
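
As a rough sketch of checking the "Sessions" numbers from a terminal: HAProxy can serve its stats as CSV when `;csv` is appended to the stats URI, and in that output `scur`, `smax`, and `slim` are the 5th, 6th, and 7th columns. The function name `haproxy_sessions_near_limit` and the 80% default threshold are illustrative assumptions, and the usage line assumes CSV export is reachable on the ports listed above.

```shell
# Print every proxy/server whose current session count is at or above
# a given percentage of its session limit.
# Reads HAProxy CSV stats on stdin.
# CSV columns (1-based): 1=pxname 2=svname 5=scur 6=smax 7=slim
haproxy_sessions_near_limit() {
  threshold_pct="${1:-80}"   # hypothetical default threshold
  awk -F, -v pct="$threshold_pct" '
    /^#/ { next }            # skip the CSV header line
    $7 != "" && $7 > 0 && ($5 * 100) >= ($7 * pct) {
      printf "%s/%s cur=%s max=%s limit=%s\n", $1, $2, $5, $6, $7
    }'
}

# Usage ({hostname} is a placeholder, as in the URLs above):
#   curl -s "http://{hostname}:2001/stats;csv" | haproxy_sessions_near_limit 80
```

Rows without a session limit are skipped, so an empty result means nothing with a limit is close to it, not that every pool is healthy.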

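- Linux identifies each TCP connection by its (source-ip, source-port, dest-ip, dest-port) tuple, and ports are 16-bit,
  so at most 65535 connections can exist from one source IP to a given (dest-ip, dest-port). The usable limit is often lower,
  because outgoing connections only use the ephemeral port range (`net.ipv4.ip_local_port_range`). Sometimes this gets exhausted.
  - `netstat -tn | grep -E ":{port}[^0-9]" | wc -l` (this may double-count, since it includes both incoming and outgoing connections)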

- Check the state of a process's TCP connections: `sudo lsof -p {pid} | grep -i tcp`
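
To get a count that does not mix incoming and outgoing connections, one option is to match the port only in the "Foreign Address" column of the `netstat -tn` output and group the result by TCP state. This is a minimal sketch; the function name `count_conns_to_port` is hypothetical, and it assumes the usual Linux `netstat` column layout (local address in column 4, foreign address in column 5, state in column 6).

```shell
# Count TCP connections whose remote (foreign) port is the given port,
# grouped by connection state. Reads `netstat -tn` output on stdin.
# Assumed columns: $4=local address, $5=foreign address, $6=state
count_conns_to_port() {
  port="$1"
  awk -v port="$port" '
    # only tcp/tcp6 rows whose foreign address ends in ":<port>"
    $1 ~ /^tcp/ && $5 ~ (":" port "$") { count[$6]++ }
    END { for (state in count) print state, count[state] }' | sort
}

# Usage:
#   netstat -tn | count_conns_to_port {port}
# The ephemeral port range to compare against:
#   cat /proc/sys/net/ipv4/ip_local_port_range
```

A large `TIME_WAIT` or `ESTABLISHED` count that approaches the size of the ephemeral port range suggests that source ports toward that destination are running out.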

Slow response times for one of our HTTP services
------------------------------------------------
- Use Graphite to verify the slow response times and narrow your investigation by endpoint and time period.
- If you observe that our service itself is not slow, the cause may be DNS performance: the external service does an internal DNS lookup before calling us, and internal lookups are not cached.
  - To verify and measure DNS latency, SSH into the external service that is calling us and curl our host:
    - `curl --trace-time -v http://tmi.twitch.tv/`
    - Look at the timestamps logged before the connection to our server is established
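
As an alternative to reading the `--trace-time` output by eye, curl's `-w` option can print the DNS lookup, connect, and total times directly through its `time_namelookup`, `time_connect`, and `time_total` variables. The sketch below is one way to use it; the helper names `curl_timings` and `dns_share` are hypothetical.

```shell
# Print DNS lookup, TCP connect, and total time (in seconds) for one request.
curl_timings() {
  curl -s -o /dev/null \
    -w 'dns=%{time_namelookup} connect=%{time_connect} total=%{time_total}\n' "$1"
}

# Report what share of the total time was spent in the DNS lookup.
# Reads one line in the format printed by curl_timings on stdin.
dns_share() {
  awk '{
    split($1, d, "=")   # d[2] = DNS lookup time
    split($3, t, "=")   # t[2] = total time
    if (t[2] > 0)
      printf "dns=%.3fs total=%.3fs dns_share=%d%%\n", d[2], t[2], (d[2] * 100) / t[2]
  }'
}

# Usage (run from the external service that is calling us):
#   curl_timings http://tmi.twitch.tv/ | dns_share
```

Running it several times in a row helps separate a one-off slow lookup from consistently slow DNS.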
