# Video on demand

*See also: [web/vod](/web/vod.md)*

A **video on demand** (commonly **VOD** or **VoD**) is a piece of [video](README.md) that isn't [live](live.md). Usually it was live at one point, and was stored as a past broadcast or highlight for later playback.

## HLS VOD Storage

### Provisioning
In the [video/swift_management][1] (git.internal warning) repo there are two main scripts used for provisioning, `prep_devices.py` and `add_devices.py`.

`prep_devices.py` preps new devices and formats them. Formatting can also be used to wipe drives. A typical usage might look like:

```bash
python prep_devices.py --prep --drive-file object.list swift-object-025.sfo01.justin.tv swift-object-026.sfo01.justin.tv
```

The `--prep` flag runs `swift-drive-prep` before formatting the drives. All drives on the listed hosts that are enumerated in `object.list` are formatted. `object.list` contains a newline-delimited set of drives to format, for example:

    sdc
    sdd
    sde
    sdf
  
`swift-drive-prep` only needs to be run the first time a drive is formatted.
 
After a drive is formatted, it can be added to a ring with `add_devices.py`:

```bash
python add_devices.py --master-node swift-proxy-001.sfo01.justin.tv --ring-type object --drive-list object.list swift-object-025.sfo01.justin.tv swift-object-026.sfo01.justin.tv
```

`--master-node` specifies the host on which the swift master API is running, `--ring-type` specifies the ring that you're adding to, and `--drive-list` takes the same drive file as above. This adds all the newly formatted drives to the object ring, with zone being a hash of their rack name and weight being the size of the device in GB. Devices added this way start at weight 0 and slowly build up to their final weight, so that writes stay relatively evenly balanced between new and old devices.

To remove a device from a ring, send a request like:

    DELETE http://swift-proxy-001.sfo01.justin.tv:8077/swift/object/7?instant=true

This deletes the device with id 7 in the object ring, and does so instantaneously. If you want a gradual drain of a device, leave the `instant` parameter off.
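
As a rough sketch, the same request can be built from Python with the standard library. The URL pattern `/swift/{ring_type}/{device_id}` is only inferred from the single example above, and `build_remove_request` is a hypothetical helper, not part of `swift_management`.

```python
import urllib.parse
import urllib.request

# Host and port are copied from the example above; adjust for your cluster.
MASTER = "http://swift-proxy-001.sfo01.justin.tv:8077"


def build_remove_request(ring_type, device_id, instant=False):
    """Build (but do not send) the DELETE request that removes a device.

    ASSUMPTION: the path layout is inferred from the one documented example.
    """
    url = f"{MASTER}/swift/{ring_type}/{device_id}"
    if instant:
        # Leaving this parameter off asks for a gradual drain instead.
        url += "?" + urllib.parse.urlencode({"instant": "true"})
    return urllib.request.Request(url, method="DELETE")
```

Building the request separately from sending it makes it easy to print and double-check the URL first; `urllib.request.urlopen(build_remove_request("object", 7))` would then start a gradual drain of device 7.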

[1]: https://git.xarth.tv/video/swift_management

### Stats and Capacity Planning
Dashboard: http://graphite.internal.justin.tv/dashboard#swift_dashboard

### Monitoring
In addition to standard node/service checks, we will have the following checks (more to be added):

```bash
python check_graphite_data.py --warn=:1 --crit=:10 --host=graphite.internal.justin.tv --metric='stats.counters.vod_pusher.lost_chunk.count'
```

Checks to see if there have been any chunks lost.

```bash
python check_graphite_data.py --warn=:1000 --crit=:3000 --host=graphite.internal.justin.tv --metric='stats.timers.swift.proxy-server.object.PUT.201.timing.upper_90'
```

Checks to see that response times on PUT requests are within acceptable thresholds.

When things go wrong with the swift cluster, the culprit can be any number of things:
* Services being down on any of the proxy, object, or account nodes.
* Timeouts on any of the layers.  The most common timeout is the object server trying to talk to the container server.
* A large number of devices being dead or unmounted on a node causing the object server to be slow.
* Memcached being down on a proxy server.
* Application error.

In general, the logs for each server can be found in `/var/log/jtv/{service}.log`, where `service` is something like `object_server`, `proxy_server`, etc.

The swift logs are in general very readable and contain information about whether or not requests timed out and which upstream servers are being talked to.
 
## Architecture

### Swift
We use a standard swift architecture with each rack being a zone, with n=3 redundancy. Weights are set to be equal to device size in GB. Currently we have 16 proxy boxes, 8 container boxes and 24 object boxes. Machines have hostnames of the format `swift-{role}-%03d.sfo01`. Devices are formatted as XFS. We run the standard disk auditor to automatically unmount bad drives. All VODs are stored under the `system` account.

### Past Broadcasts
To obtain the container name for any broadcast, we hash the directory name of the live broadcast, which is of the form `{channel}_{broadcast_id}_{job_id}`, to a 4 digit hex string. The container name is `vod_{digest}`. The directory structure under the container is exactly the same as live, with the main subdirectory being the live directory name and each format being a subdirectory of that.

As the live broadcast is recorded, each format's chunks are concatenated into buffers which are written to swift periodically in bundles of some multiplication factor (currently 15). A manifest is written that uses URL parameters to reference byte ranges in these super-chunks to present smaller chunks to the player. The URL parameter byte offsets are interpreted by a varnish layer sitting in front of the swift proxies, which converts them into `Range` headers that swift understands. Varnish also converts 206 responses from swift back into 200s.

A thumbnail directory is also created with up to (currently) 4 thumbnails randomly distributed throughout the broadcast.
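
The naming and byte-range scheme above can be illustrated with a minimal sketch. The actual hash function is not stated here, so SHA-256 truncated to 4 hex digits is purely an assumption, and all three helper names are hypothetical.

```python
import hashlib


def container_name(live_dir):
    """Map a live directory name to a container name of the form vod_{digest}.

    ASSUMPTION: the real hash function is not documented here; SHA-256
    truncated to 4 hex digits is used only for illustration.
    """
    digest = hashlib.sha256(live_dir.encode("utf-8")).hexdigest()[:4]
    return f"vod_{digest}"


def chunk_byte_ranges(chunk_sizes):
    """Return the inclusive (start, end) byte offsets of each chunk
    after the chunks are concatenated into one super-chunk."""
    ranges, offset = [], 0
    for size in chunk_sizes:
        ranges.append((offset, offset + size - 1))
        offset += size
    return ranges


def range_header(start, end):
    """Format the HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"
```

For example, `chunk_byte_ranges([100, 50, 25])` gives `[(0, 99), (100, 149), (150, 174)]`, and the second chunk would be fetched with `Range: bytes=100-149`. A 4 digit hex digest spreads broadcasts over at most 65536 containers.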

### Highlights
The highlight worker can be found under `vod_highlight_worker.py` in the video/workers repository.
Highlights are processed by the VOD highlight worker. An external service (currently Rails) creates a new VOD row with attributes copied from the parent archive, except with type set to highlight and offset and duration set to the highlight's offset and duration. The id of the new VOD is passed through rabbit to the worker, which pings the service API for the details of the highlight and generates highlight manifests of the form `highlight-{vod_id}.m3u8` (where `vod_id` is the id of the new highlight VOD). A manifest is created for each format and uploaded to the same directory as the parent archive.
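
To make the manifest step concrete, here is a simplified sketch of cutting a highlight playlist out of a parent archive. It is not the worker's actual code: it selects whole segments only, ignores the byte-range URL parameters used by the real manifests, and the function names and `(duration, uri)` input shape are assumptions.

```python
import math


def highlight_manifest(segments, offset, duration):
    """Build a VOD playlist from the parent segments overlapping the highlight.

    segments: list of (segment_duration_seconds, uri) from the parent manifest.
    offset, duration: the highlight's start and length in seconds.
    """
    end = offset + duration
    selected, t = [], 0.0
    for seg_duration, uri in segments:
        seg_start, seg_end = t, t + seg_duration
        # Keep any segment that overlaps the window [offset, end).
        if seg_end > offset and seg_start < end:
            selected.append((seg_duration, uri))
        t = seg_end

    # The target duration must be at least the longest segment, rounded up.
    target = math.ceil(max((d for d, _ in selected), default=0))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for seg_duration, uri in selected:
        lines.append(f"#EXTINF:{seg_duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def highlight_manifest_name(vod_id):
    """Name of the manifest uploaded next to the parent archive."""
    return f"highlight-{vod_id}.m3u8"
```

Because the highlight manifest only references segments that already exist in the parent archive, creating a highlight in this style does not copy any video data.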

### Deletion
The deletion worker can be found in the [video/workers][2] (git.internal warning) repository at [`vod_deletion_worker.py`][3] (git.internal warning).

The deletion worker takes a list of VOD ids to delete. Which VODs to delete is completely managed by an external service (Rails). To delete a VOD, we ping the service API to get the swift subdirectory to remove, then proceed according to the VOD type:

[2]: https://git.xarth.tv/video/workers
[3]: https://git.xarth.tv/video/workers/blob/master/vod_deletion_worker.py

#### Archives
We delete the archive manifests and the chunks which are not represented in at least one highlight manifest.  If there are no highlight manifests, we delete the thumbnails as well.

#### Highlights
We delete the corresponding highlight manifests and then, if the archive manifest is no longer present, we do a sweep of archive deletion.
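
The archive rule amounts to a set difference between the stored chunks and the chunks referenced by any highlight. A minimal sketch, assuming the caller has already parsed each highlight manifest into a set of chunk names (both function names are hypothetical):

```python
def archive_chunks_to_delete(archive_chunks, highlight_manifests):
    """Return archive chunks not referenced by any highlight manifest.

    archive_chunks: iterable of chunk names stored for the archive.
    highlight_manifests: mapping of manifest name -> set of chunk names
        that manifest references.
    """
    referenced = set()
    for chunks in highlight_manifests.values():
        referenced.update(chunks)
    return sorted(set(archive_chunks) - referenced)


def should_delete_thumbnails(highlight_manifests):
    """Thumbnails are removed only when no highlight manifests remain."""
    return not highlight_manifests
```

With chunks `0.ts` through `3.ts` and one highlight referencing `1.ts` and `2.ts`, only `0.ts` and `3.ts` are deleted and the thumbnails are kept. Once that highlight is deleted too, the sweep removes everything that is left.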
