# Object Level Encryption

OLE is a Go library that encrypts and decrypts objects using envelope encryption, with data keys generated by AWS KMS under a CMK. OLE encodes encrypted objects using protobuf so that the metadata required for decryption is stored with the data.

OLE reuses data keys for envelope encryption and caches plaintext data keys in memory, local to each process, to avoid calling KMS on every encrypt/decrypt call. To benefit from cached data keys, use the same encryption context for all data that may be encrypted with the same data key.
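
To illustrate why the encryption context has to match for a cached data key to be reused, here is a minimal sketch of deriving a deterministic cache key from an encryption context. The `contextKey` helper is hypothetical and is not OLE's implementation; OLE's real cache key derivation may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contextKey derives a deterministic cache key from an encryption context.
// Hypothetical helper for illustration only; not part of OLE's API.
func contextKey(ctx map[string]string) string {
	names := make([]string, 0, len(ctx))
	for name := range ctx {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort first

	h := sha256.New()
	for _, name := range names {
		// Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
		fmt.Fprintf(h, "%d:%s%d:%s", len(name), name, len(ctx[name]), ctx[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := map[string]string{"Message": "ApplaudEvent", "MessageField": "AnonymousUser"}
	b := map[string]string{"MessageField": "AnonymousUser", "Message": "ApplaudEvent"}
	c := map[string]string{"Message": "ApplaudEvent", "MessageField": "ToUserID"}

	fmt.Println(contextKey(a) == contextKey(b)) // same pairs: a cached key could be reused
	fmt.Println(contextKey(a) == contextKey(c)) // different pairs: a different key is needed
}
```

Any difference in the context, even in a single value, produces a different cache key, so contexts that vary per object would defeat a cache built this way.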

OLE uses [envelope encryption](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#enveloping).

OLE does not manage CMKs, KMS authorization grants, or IAM access. Authorization to decrypt encrypted data keys is provided by KMS grants and the encryption context as constructed by the publisher (the service encrypting the data).

Service owners are responsible for managing their own KMS CMK and grants, and for passing the correct encryption context and CMK configuration to OLE.

## Contributing to OLE

Take a look at the wiki article for getting started with Amazon build systems
and contributing to OLE:

https://wiki.twitch.com/display/SSE/Contributing+to+OLE

## Using OLE

Take a peek at
[`example_kms_test.go`](https://code.amazon.com/packages/TwitchOLE/blobs/mainline/--/ole/example_kms_test.go)
for an example on how to initialize the client and use it to encrypt and
decrypt an object.

* `ole.KMSClient` manages the key caches and should be instantiated once per
  process and reused.

* `Read()` and `Write()` calls on each encryptor instance are not thread-safe.

* Users should call `NewEncryptor()` and `NewDecryptor()` for each object to
  encrypt/decrypt.

* Each encryptor/decryptor should only be written to / read from once.

## Encryption Context

Encryption contexts are used in OLE to specify what keys are used to
encrypt/decrypt objects. When using the same encryption context to encrypt
objects, OLE will try to reuse keys cached in memory.

Encryption contexts also provide granular permissions to decrypt objects by
using [Grant Constraints with AWS
KMS](https://docs.aws.amazon.com/en_pv/kms/latest/developerguide/grants.html#grant-constraints).
For example, an object with many fields that have different levels of
classification may look like this:

```
// Field numbers are illustrative.
message ApplaudEvent {
  sint64 Pixels = 1;
  string TransactionID = 2;
  string ToUserID = 3;
  string PublicMessage = 4;
  oneof from {
    User User = 5; // public applaud event
    User AnonymousUser = 6; // anonymous applaud event
  }
}
```

One consumer may have access to decrypt the `AnonymousUser` and `ToUserID` fields
while another consumer only has access to decrypt the `ToUserID` field. This
can be accomplished by including a key in the encryption context like this:

```
{
  "Message": "ApplaudEvent",
  "MessageField": "AnonymousUser"
}
```

Combine this encryption context with a grant constraint that specifies the
same `Message` and `MessageField` values.
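
To make the matching rules concrete, the sketch below mimics locally how the two KMS grant constraint types, `EncryptionContextSubset` and `EncryptionContextEquals`, relate a constraint to the encryption context of a request. KMS performs this evaluation itself; the function names are hypothetical, and the exact string comparison used here is an assumption of the sketch.

```go
package main

import "fmt"

// subsetMatches mimics EncryptionContextSubset: every pair in the constraint
// must appear, with the same value, in the request's encryption context.
func subsetMatches(constraint, requestCtx map[string]string) bool {
	for name, want := range constraint {
		if got, ok := requestCtx[name]; !ok || got != want {
			return false
		}
	}
	return true
}

// equalsMatches mimics EncryptionContextEquals: the request's encryption
// context must contain exactly the pairs in the constraint and nothing else.
func equalsMatches(constraint, requestCtx map[string]string) bool {
	return len(constraint) == len(requestCtx) && subsetMatches(constraint, requestCtx)
}

func main() {
	constraint := map[string]string{
		"Message":      "ApplaudEvent",
		"MessageField": "AnonymousUser",
	}
	// Same pairs plus one extra pair (the extra pair is made up for the example).
	withExtra := map[string]string{
		"Message":      "ApplaudEvent",
		"MessageField": "AnonymousUser",
		"Region":       "us-west-2",
	}
	otherField := map[string]string{
		"Message":      "ApplaudEvent",
		"MessageField": "ToUserID",
	}

	fmt.Println(subsetMatches(constraint, withExtra))  // true
	fmt.Println(equalsMatches(constraint, withExtra))  // false
	fmt.Println(subsetMatches(constraint, otherField)) // false
}
```

A consumer whose grant is constrained to `MessageField=AnonymousUser` therefore cannot decrypt data keys that were generated with a different `MessageField` value.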

## Key management

Users of OLE should grant `kms:Decrypt` access to consumers and
`kms:GenerateDataKey` access to producers via IAM, and use [Grant
Constraints](https://docs.aws.amazon.com/en_pv/kms/latest/developerguide/grants.html#grant-constraints)
to establish granular access to individual objects.

Below is an example policy statement that allows decrypting blobs whose encryption context includes `MessageField=AnonymousUser`:

```
{
  "Sid": "Enable IAM Consumer Decrypt",
  "Effect": "Allow",
  "Action": "kms:Decrypt",
  "Resource": "arn:aws:kms:us-west-2:505345203727:key/SOME-ID",
  "Condition": {
    "ForAnyValue:StringEquals": {
      "kms:EncryptionContext:MessageField": "AnonymousUser"
    }
  }
}
```

## Metrics

OLE allows service owners to bring their own telemetry reporter. The client can be configured using `ole.KMSOleClientConfig`.
The OLE library emits the following metrics.

### Encryption Cache Metrics
* OLECacheEncryptionKeyHit - key was found in the cache and used
* OLECacheEncryptionKeyMiss - key was not found in the cache, KMS GenerateDataKey was called to generate the DEK
* OLECacheEncryptionKeyTimeEviction - keys were evicted from cache due to time eviction policy
* OLECacheEncryptionKeyUsageEviction - keys were evicted from cache due to usage policy

### Decryption Cache Metrics
* OLECacheDecryptionKeyHit - key was found in the cache and used
* OLECacheDecryptionKeyMiss - key was not found in the cache, KMS Decrypt call was made to get DEK
* OLECacheDecryptionKeyTimeEviction - keys were evicted from cache due to time eviction policy
* OLECacheDecryptionKeyUsageEviction - keys were evicted from cache due to usage policy
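
As one way to consume these counters, the sketch below collects them in memory and derives a cache hit ratio from the hit and miss metrics. The `counterReporter` type and its `Report` method are hypothetical: the reporter interface that OLE actually expects is not shown in this README and may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// counterReporter is a hypothetical in-memory reporter that sums counter
// metrics by name. It is not OLE's reporter interface.
type counterReporter struct {
	mu     sync.Mutex
	counts map[string]int64
}

func newCounterReporter() *counterReporter {
	return &counterReporter{counts: make(map[string]int64)}
}

// Report adds delta to the named counter; safe for concurrent use.
func (r *counterReporter) Report(name string, delta int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[name] += delta
}

// hitRatio returns hits / (hits + misses), or 0 when nothing was recorded.
func (r *counterReporter) hitRatio(hitMetric, missMetric string) float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	hits, misses := r.counts[hitMetric], r.counts[missMetric]
	if hits+misses == 0 {
		return 0
	}
	return float64(hits) / float64(hits+misses)
}

func main() {
	r := newCounterReporter()
	// Simulate three cache hits and one miss on the decryption path.
	r.Report("OLECacheDecryptionKeyHit", 3)
	r.Report("OLECacheDecryptionKeyMiss", 1)

	fmt.Printf("%.2f\n", r.hitRatio("OLECacheDecryptionKeyHit", "OLECacheDecryptionKeyMiss"))
}
```

A low hit ratio usually means that each miss is paying for a KMS call, so it is a useful first signal when checking whether encryption contexts are being reused as intended.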

## Release

This section details the validation required for any release of Twitch OLE. All of
this is done automatically via the [TwitchOLE Pipeline].

### Change Request Validation

CRs (think GitHub Pull Requests) are hosted on GitFarm as the package
[TwitchOLE]. All CRs must:

- Have 80% test coverage.
- Be approved by a member of [twitch-secdev].
- Be approved by a member of [twitch-ole-stakeholders].

### Production Release

All changes to [TwitchOLE] are staged for release to [Twitch/live].
[Twitch/live] represents the live dependencies that the entire Twitch
organization is pulling in. This version is also published immediately to
[Github Enterprise] via the [Fulton GHE AutoSync] on merge to Twitch/live.
Before being merged to Twitch/live, the release must pass the following in the
[TwitchOLE Pipeline].

#### Integration Tests

[Integration tests] that exercise basic integrated functionality must pass.

#### Regression Tests

[TwitchOLE] has a [regression test cluster] that continuously encrypts and
decrypts plaintext, using behaviors based on **common customer usage patterns**.
TwitchOLE uses these behaviors as continuous integration validation for
every release to ensure these key customer use cases do not see a performance
regression.

##### Pattern: Encryption Heavy Payload

A service that primarily encrypts data with low entropy in the [DEKs]
encountered.

- New encryption contexts are never introduced.
- Decryption is checked every 1000 requests.
- There will be heavy hash collision in encryption keys.
- Data keys will expire after 1 minute to validate cache implementation.

##### Pattern: Decryption Heavy Payload w/ Predictable Encryption Actor

A service that primarily decrypts data with low entropy in the [DEKs]. This
means this service mostly decrypts data from a small number of encryption actors.

- New encryption contexts are never introduced.
- There will be heavy hash collision in decryption keys.
- Payloads are re-decrypted 1000 times.
- Data keys will expire after 1 minute to validate cache implementation.

##### Pattern: Decryption Heavy Payload w/ Unpredictable Encryption Actor

A service that primarily decrypts data with high entropy in [DEKs]. Typical use
case is a worker processing encrypted payloads from many publishers.

- Max cache size is set to 1000.
- A batch of 1000 encryption contexts is used, with contexts chosen at random.
- All DEKs are pre-warmed before benchmark.
- Payloads are re-decrypted 1000 times.
- Data keys will expire after 1 minute to validate cache implementation.
- This will see cache evictions over time due to the max cache size limit.

##### Behavior Validation

Metrics from that cluster can be seen on the [TwitchOLEIntegration Dashboard] -
only available via WPA2.

After deployment of a release candidate to this cluster,
each usage pattern must meet the following criteria after 30 minutes of
stabilization:

- Success RPS is greater than 20K per vCPU average for the past 15 minutes.
- Error RPS is 0 for the past 30 minutes.
- Average CPU utilization is greater than 95%.
- Average memory utilization is less than 5%.
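
The criteria above can be expressed as a simple check. This is only a sketch of the thresholds as listed, not the pipeline's actual approval logic; the `patternStats` type and its field names are hypothetical.

```go
package main

import "fmt"

// patternStats holds averaged measurements for one usage pattern.
// Hypothetical type; the fields mirror the criteria listed above.
type patternStats struct {
	SuccessRPSPerVCPU float64 // average over the past 15 minutes
	ErrorRPS          float64 // over the past 30 minutes
	AvgCPUPercent     float64
	AvgMemoryPercent  float64
}

// violations returns one message per unmet criterion; empty means pass.
func violations(s patternStats) []string {
	var out []string
	if s.SuccessRPSPerVCPU <= 20000 {
		out = append(out, "success RPS per vCPU must be greater than 20K")
	}
	if s.ErrorRPS != 0 {
		out = append(out, "error RPS must be 0")
	}
	if s.AvgCPUPercent <= 95 {
		out = append(out, "average CPU utilization must be greater than 95%")
	}
	if s.AvgMemoryPercent >= 5 {
		out = append(out, "average memory utilization must be less than 5%")
	}
	return out
}

func main() {
	// Made-up sample values for illustration.
	good := patternStats{SuccessRPSPerVCPU: 25000, ErrorRPS: 0, AvgCPUPercent: 97.5, AvgMemoryPercent: 2.1}
	bad := patternStats{SuccessRPSPerVCPU: 18000, ErrorRPS: 0.2, AvgCPUPercent: 97.5, AvgMemoryPercent: 2.1}

	fmt.Println(len(violations(good))) // 0
	for _, v := range violations(bad) {
		fmt.Println(v)
	}
}
```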

[DEKs]: https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#data-keys
[Github Enterprise]: https://git-aws.internal.justin.tv
[Twitch/live]: https://code.amazon.com/version-sets/Twitch/live
[Integration tests]: https://code.amazon.com/packages/TwitchOLEIntegration/blobs/mainline/--/cmd/TwitchOLEIntegration/twitcholeintegration_integration_test.go
[TwitchOLE Pipeline]: https://pipelines.amazon.com/pipelines/TwitchOLEIntegration
[TwitchOLE]: https://code.amazon.com/packages/TwitchOLE
[TwitchOLEIntegration Dashboard]: https://cw-dashboards.aka.amazon.com/cloudwatch/dashboardInternal?accountId=574099051984&name=TwitchOLEIntegration-beta#dashboards:name=TwitchOLEIntegration-beta
[twitch-secdev]: https://permissions.amazon.com/a/team/twitch-secdev
[twitch-ole-stakeholders]: https://permissions.amazon.com/a/team/twitch-ole-stakeholders
[regression test cluster]: https://code.amazon.com/packages/TwitchOLEIntegration/blobs/c1d7596e91bacae09cb967ecb15918783a57f878/--/cmd/TwitchOLEIntegration/benchmark_command.go
[Fulton GHE AutoSync]: https://docs.fulton.twitch.a2z.com/docs/gitfarm.html#syncing-from-gitfarm-to-github-enterprise
