# D8A Migration

The D8A Migration toolset is a group of experimental tools developed for moving data
 out of SiteDB and into a microservice's satellite database.  This library
 includes both code generation tools and a self-contained replay application.
 
## Usage

D8A Migration operates by marking up your data-backend's interface with code
 generation comments.  The goal is to record all calls to the interface, as well
 as responses and runtimes, then later replay those calls against a different
 instance of the interface.  We can compare runtimes and return values, and
 record stats to graphite. D8A Migration also provides a mirroring tool for
 realtime comparison between two instances of the interface.
 
To use D8A Migration, first install it together with clipperhouse's `gen`
 library.  In your project's directory, enter the following:

```
go get github.com/clipperhouse/gen
go get code.justin.tv/d8a/migration
```

Once you're done, find your data backend's interface (or create
 an interface to wrap your data backend struct).  Navigate to the folder that
 contains the interface and enter the following:

 ```
 gen add code.justin.tv/d8a/migration/gen
 go install
 ```

This will build your project and attempt to add a new file, `_gen.go`, to your
 codebase.  `gen` needs this file to know which generators you're interested in.
 However, `gen` adds two generators by default that we don't need, so open
 `_gen.go` and remove both `clipperhouse` packages from the import statement.
 
`go install` is necessary because `gen` uses `go/importer`, which works from
 cached binaries, to find information about the types that your interface
 uses.
 
 You'll then want to add the following to your interface:
 
```go
//To add a producer that records calls to the interface:
// +gen migrationproducer
//To add a consumer that replays calls to the interface:
// +gen migrationconsumer
//To mirror calls to a secondary instance in realtime:
// +gen migrationsecondary
//To do all three (only one +gen line is permitted per type):
// +gen migrationproducer migrationconsumer migrationsecondary
type MyBackend interface {
    SomeReadCall() error
    SomeWriteCall(thing int) error
}
```

Depending on what stage of your migration you are on, you can use any
 combination of these generators.
 
After this is done, run `gen` from the command line to generate the necessary
 code files.  When the interface changes, you'll need to rerun `gen`.  Gen
 doesn't work like `go generate` - it's not intended to be run during build
 and is instead intended to be managed and inspected by the developer directly.

## Provisioning

To write to and replay from a Kinesis stream, you'll need to provision
 it in AWS.  This is easy to do with terraform:

```terraform
variable "aws_secret_key" {}
variable "aws_access_key" {}

module "kinesis" {
    source = "git::git+ssh://git-aws.internal.justin.tv/d8a/migration.git//terraform/replay-kinesis.20160607"

    aws_access_key = "${var.aws_access_key}"
    aws_secret_key = "${var.aws_secret_key}"
    environment = "staging"
    stream_name = "test-migration-stream"
}
```

You'll need to use the `stream_name` you chose later, in order to initialize your producer
 and to run the replay process.  You may find it helpful to version the stream name, for instance,
 with a date.  This will allow you to create new streams later if your first attempt at replay fails.

The above code will produce a kinesis stream that is rated for up to 1,000 requests per second.  Your
 service may receive more than that, or if you make multiple calls to your backend interface per request,
 your RPS capacity may be lower than advertised.  In order to handle more throughput, you can add an
 additional field to the kinesis module:

```terraform
variable "aws_secret_key" {}
variable "aws_access_key" {}

module "kinesis" {
    source = "git::git+ssh://git-aws.internal.justin.tv/d8a/migration.git//terraform/replay-kinesis.20160607"

    aws_access_key = "${var.aws_access_key}"
    aws_secret_key = "${var.aws_secret_key}"
    environment = "staging"
    stream_name = "test-migration-stream"

    shard_count = "4"
}
```

The shard count should be *at least* the peak write RPS of your service divided by 1,000 (the rate a
 single shard is rated for, as noted above), rounded up.  If your backend interface can receive more
 than one call per request, you should multiply the shard count by the average calls per request,
 rounded up.  Conversely, if your service receives a significant amount of traffic that does not make
 calls to the backend interface, that traffic can be ignored.
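
The sizing rule can be written down as a small calculation. This is a sketch under stated assumptions: `shardCount` and its parameter names are made up for illustration, and the per-shard rate is left as a parameter. The 1,000 used in `main` is taken from the single-shard rating quoted earlier in this section.

```go
package main

import (
	"fmt"
	"math"
)

// shardCount estimates how many shards a stream needs (illustrative only).
// peakRPS is the peak request rate that actually reaches the backend
// interface, callsPerRequest is the average number of interface calls made
// per request, and perShardRPS is the rate one shard is assumed to sustain.
func shardCount(peakRPS, callsPerRequest, perShardRPS float64) int {
	if peakRPS <= 0 || callsPerRequest <= 0 || perShardRPS <= 0 {
		return 1 // a stream always has at least one shard
	}
	// Shards needed for one call per request, rounded up.
	base := math.Ceil(peakRPS / perShardRPS)
	// Scale by the average calls per request and round up again.
	return int(math.Ceil(base * callsPerRequest))
}

func main() {
	// 2,500 requests/s at peak, 1.5 backend calls per request on average,
	// and an assumed 1,000 calls/s per shard.
	fmt.Println(shardCount(2500, 1.5, 1000))
}
```

The result is a lower bound for `shard_count`; rounding up at both steps errs on the side of extra capacity.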

The terraform code above creates a kinesis stream that you can access from your local development
 box.  When it comes time to push to production, however, you'll also need to grant access to your
 app boxes so that they can actually write to the stream.  To do this, set `add_access_roles` and add
 an `access_roles` field listing the roles granted to your app boxes during provisioning:

```terraform
variable "aws_secret_key" {}
variable "aws_access_key" {}

module "kinesis" {
    source = "git::git+ssh://git-aws.internal.justin.tv/d8a/migration.git//terraform/replay-kinesis.20160607"

    aws_access_key = "${var.aws_access_key}"
    aws_secret_key = "${var.aws_secret_key}"
    environment = "staging"
    stream_name = "test-migration-stream"

    add_access_roles = 1
    access_roles = "${aws_iam_role.production.name},${aws_iam_role.canary.name}"
}
```

This will add a policy to the given roles which grants access to your new stream. 

## PartitionKeyBuilder

Kinesis uses a concept of partition keys to break down messages into shards.  For 
 this reason, two interface calls which are assigned the same partition key are
 guaranteed to be replayed FIFO.  Two interface calls with different partition keys
 may be replayed out of order.  In order to allow developers to find a good balance
 between replay speed and replay consistency, consumers of the migration toolset are
 responsible for assigning their own partition keys to individual calls.

You can accomplish this by implementing the PartitionKeyBuilder interface:

```go
type PartitionKeyBuilder interface {
	PartitionKey(method string, params map[string]interface{}, results []interface{}) string
}
```

Using the method being called, parameters, and results, you can generate a partition
 key which meets your requirements.  If any single partition key receives more than a
 few hundred events per second, that partition may act as a bottleneck, preventing timely
 replay of your traffic.  In order to prevent this, make sure that you are partitioning
 intelligently.  On the other hand, if your traffic is light enough (< 1000rps) to only require one
 shard, you should feel free to put all traffic onto a single partition key.

Examples:

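```go
import "strconv"

// lowThroughputKeyBuilder puts every call on one partition key, so all calls
// replay in order. Only suitable when a single shard is enough.
type lowThroughputKeyBuilder struct{}

func (b *lowThroughputKeyBuilder) PartitionKey(method string, params map[string]interface{}, results []interface{}) string {
    return "partitionkey"
}

// highThroughputKeyBuilder partitions by the "vodId" parameter, so calls for
// the same vod replay in order while different vods can replay in parallel.
type highThroughputKeyBuilder struct{}

func (b *highThroughputKeyBuilder) PartitionKey(method string, params map[string]interface{}, results []interface{}) string {
    val, ok := params["vodId"]
    if !ok {
        return "no_id"
    }
    id, ok := val.(int)
    if !ok {
        return "no_id"
    }

    return strconv.Itoa(id)
}
```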
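
Before relying on a key builder, it can help to check a sample of traffic for keys that would exceed the "few hundred events per second" range mentioned above. The following is a minimal sketch, not part of the toolset: `call`, `hotKeys`, and `vodKeyBuilder` are hypothetical names, and the sample is assumed to cover roughly one second of traffic.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// PartitionKeyBuilder is copied from the interface shown above.
type PartitionKeyBuilder interface {
	PartitionKey(method string, params map[string]interface{}, results []interface{}) string
}

// call is a hypothetical record of one interface call, used only by this check.
type call struct {
	method string
	params map[string]interface{}
}

// hotKeys returns, in sorted order, every partition key that the builder
// assigns to more than limit of the sampled calls.
func hotKeys(builder PartitionKeyBuilder, sample []call, limit int) []string {
	counts := make(map[string]int)
	for _, c := range sample {
		counts[builder.PartitionKey(c.method, c.params, nil)]++
	}
	var hot []string
	for key, n := range counts {
		if n > limit {
			hot = append(hot, key)
		}
	}
	sort.Strings(hot)
	return hot
}

// vodKeyBuilder mirrors highThroughputKeyBuilder from the examples above.
type vodKeyBuilder struct{}

func (vodKeyBuilder) PartitionKey(method string, params map[string]interface{}, results []interface{}) string {
	id, ok := params["vodId"].(int)
	if !ok {
		return "no_id"
	}
	return strconv.Itoa(id)
}

func main() {
	// 300 calls for vod 7, plus one call each for vods 100 through 149.
	var sample []call
	for i := 0; i < 300; i++ {
		sample = append(sample, call{"SomeWriteCall", map[string]interface{}{"vodId": 7}})
	}
	for i := 100; i < 150; i++ {
		sample = append(sample, call{"SomeWriteCall", map[string]interface{}{"vodId": i}})
	}
	fmt.Println(hotKeys(vodKeyBuilder{}, sample, 200))
}
```

A key that shows up here is a candidate for finer partitioning, provided the calls that share it do not all need to replay in order.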

## ComparePreprocessor

After a method call is mirrored or replayed, the original and new return values are both converted
 to JSON and compared field by field.  Sometimes, return values from the primary and the replayer/secondary
 differ in foreseeable ways.  By implementing a ComparePreprocessor, you can modify or eliminate comparisons
 for some fields.

Custom ComparePreprocessors are not usually necessary.  You only need to implement one when the default
 behavior, a field-by-field comparison of the serialized JSON of your return values, will not produce the
 results you want.
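
To make the default behavior concrete, here is a rough sketch of a field-by-field comparison over serialized JSON. It is not the toolset's implementation: `toFields`, `differingFields`, and the `vod` type are hypothetical, and the sketch assumes each return value serializes to a JSON object.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// toFields round-trips a value through JSON and returns its top-level fields.
// It assumes the value serializes to a JSON object (a struct or a map).
func toFields(v interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var fields map[string]interface{}
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

// differingFields returns the sorted names of the top-level fields whose
// serialized values differ between the old and new return values.
func differingFields(oldVal, newVal interface{}) ([]string, error) {
	oldFields, err := toFields(oldVal)
	if err != nil {
		return nil, err
	}
	newFields, err := toFields(newVal)
	if err != nil {
		return nil, err
	}
	var diffs []string
	for name, ov := range oldFields {
		if nv, ok := newFields[name]; !ok || !reflect.DeepEqual(ov, nv) {
			diffs = append(diffs, name)
		}
	}
	for name := range newFields {
		if _, ok := oldFields[name]; !ok {
			diffs = append(diffs, name)
		}
	}
	sort.Strings(diffs)
	return diffs, nil
}

// vod is a hypothetical return type used only for this example.
type vod struct {
	ID        int
	Title     string
	UpdatedAt string
}

func main() {
	primary := vod{ID: 1, Title: "stream", UpdatedAt: "2016-06-07T10:00:00Z"}
	secondary := vod{ID: 1, Title: "stream", UpdatedAt: "2016-06-07T10:00:03Z"}
	diffs, err := differingFields(primary, secondary)
	if err != nil {
		fmt.Println("compare failed:", err)
		return
	}
	fmt.Println(diffs)
}
```

A field like `UpdatedAt`, which legitimately differs between two backends, is the kind of foreseeable difference a ComparePreprocessor is meant to handle.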

Replay and secondary initialization methods all accept a ComparePreprocessor instance.  You can pass in `nil` for
 this value to simply use the field-by-field comparison.  If you'd like to implement a new ComparePreprocessor,
 it's easy.  Here is an example:

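```go
import "fmt"

// examplePreprocessor is a placeholder name for your ComparePreprocessor type.
type examplePreprocessor struct{}

func (p *examplePreprocessor) Preprocess(methodName string, resultIndex int, fixture *PreprocessFixture) (bool, error) {
    // Only the first return value of MethodToChange gets special handling.
    if methodName != "MethodToChange" || resultIndex != 0 {
        return true, nil
    }

    // Pull out the old/new pair for the field we want to compare ourselves.
    pair := fixture.ExtractValues("ValueToChange")[0]
    if (pair.Old == nil) != (pair.New == nil) {
        return false, nil
    }

    oldText, ok := pair.Old.(string)
    if !ok {
        return false, fmt.Errorf("unexpected non-string old value")
    }
    newText, ok := pair.New.(string)
    if !ok {
        return false, fmt.Errorf("unexpected non-string new value")
    }

    // Compare only the first byte of each value; guard against empty strings.
    if oldText == "" || newText == "" {
        return oldText == newText, nil
    }
    return newText[0] == oldText[0], nil
}
```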

The key is the `fixture.ExtractValues` method.  It accepts any number of strings,
 which together form an XPath-like path to the field you would like to extract, and
 returns an array of the values that match the path.  Each parameter passed to
 `ExtractValues` can be a field name, map key, or array index.  For an array, use the
 parameter `*` to indicate all indices, or a numeral to indicate a specific
 one:

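```go
// examplePreprocessor is a placeholder name for your ComparePreprocessor type.
func (p *examplePreprocessor) Preprocess(methodName string, resultIndex int, fixture *PreprocessFixture) (bool, error) {
    if methodName == "SomeMethod" && resultIndex == 0 {
        for _, pair := range fixture.ExtractValues("SomeArray", "*", "TheField") {
            _ = pair // Do nothing with the extracted values
        }
    }

    return true, nil
}
```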

`ExtractValues` does more than let you compare the two values: it also removes the
 extracted fields from the two JSON responses, so the default comparison logic will not
 compare them.  This lets you either change the comparison logic for a field or eliminate
 the field from the comparison entirely.

## Consuming the Generated Code
 
Once the types are generated, you can use them to wrap your live backend instance
 to add the migration functionality.  For instance:
 
```go
  realBackend := //Make backend
  statsdHostPort := //statsd host:port
  environment := "staging"
  partitionKeyBuilder := &highThroughputKeyBuilder{}
  awsProfile := ""
  region := "us-west-2"
  repo := "web/service"
  streamName := //stream name
  realBackend = CreateMyBackendProducer(realBackend, statsdHostPort, environment, repo, partitionKeyBuilder, awsProfile, region, streamName)
```

The code above calls the generated `CreateMyBackendProducer` function to wrap
 the "real" backend with code that records every backend call to kinesis.  The `awsProfile` parameter
 should be the name of a local aws-cli profile, `"default"` to use the local aws-cli credentials, or
 `""` to use the role of the local machine.  The empty string is the best option for staging and production
 boxes in AWS, while a named profile is best for tests run on a developer's local machine.

The secondary migration function works very similarly:
 
```go
  realBackend = CreateMyBackendSecondary(realBackend, statsdHostPort, environment, repo, map[string]MyBackend{
      "Secondary1": secondary1,
      "Secondary2": secondary2,
  })
```

The replayer works differently, because it is meant to run as a dedicated
 application.  Rather than wrapping your live backend, you pass a generated consumer to the
 package `code.justin.tv/d8a/migration/replay`, which takes care of the complexities of replay.
 
```go
package main

import (
    "log"

    "code.justin.tv/d8a/migration/replay"
)

func main() {
    realBackend := //Create your backend
    statsdHostPort := //statsd host:port
    environment := "staging"
    awsProfile := "twitch-web-aws"
    region := "us-west-2"
    repo := "web/service"
    streamName := //Stream name
    consumerGroupName := "test1"
    replayConsumer := CreateMyBackendConsumer(realBackend)
    err := replay.ExecuteReplay(replayConsumer, statsdHostPort, environment, repo, awsProfile, region, streamName, consumerGroupName)
    if err != nil {
        // main cannot return a value; log.Fatal logs the error and exits with status 1.
        log.Fatal(err)
    }
}
```

When replaying, parameters originally sent to the backend are json-serialized and sent again to the
 replay calls.  The one exception is context objects.  A fresh context object is used and context
 objects are not serialized to Kinesis.

The `consumerGroupName` parameter is an arbitrary key that allows you to save your replay progress
 and resume from the same point in the future.  Changing the name restarts replay from the
 beginning of the stream; keeping it the same between runs picks up where you left
 off.  At this time, it is not possible for multiple machines using the same consumer group name to
 share load.  Re-provisioning the kinesis stream will also re-provision the DynamoDB table where progress
 is stored.

## Diagnosis

D8A Migration writes errors and problems to your log output.  Perhaps more
 helpfully, D8A maintains a "D8A Migration" grafana dashboard that
 your generated code writes to.  From this dashboard, you can see outbound
 writes from your producers, comparative timings for both your consumers and mirrored
 transactions, serious errors in your secondary (which do not hurt the health
 of your application), replay progress, and other statistics.  Ideally, you'll
 be able to assess the status of your migration at a glance through this dashboard.

You can find the dashboard at https://grafana.internal.justin.tv/dashboard/db/d8a-migration
 
## Non-Catastrophic Errors

Some applications use error objects to indicate 400, 422, and similar errors.  In these cases,
 you'll likely want to verify that the secondary or replay receives the same
 results, rather than bailing or logging a serious error whenever the backend
 interface returns an error.
 
In these cases, you can add the `errcheck` tag to the generator directive:

```go
// +gen migrationproducer:"errcheck" migrationconsumer:"errcheck" migrationsecondary:"errcheck"
type MyBackend interface {
    SomeReadCall() error
    SomeWriteCall(thing int) error
}
```

This will modify the generated constructor to accept an `ErrorCheck` object:
 
```go
type ErrorCheck interface {
	SeriousError(err error) bool
}
```

If a backend method returns an error, it will only be treated as a serious failure if
 `SeriousError` returns true.  Otherwise, it will be treated as ordinary data to be
 compared to the replay or secondary value.
 
## Ignoring Non-Data Methods

Sometimes, a backend interface contains methods that shouldn't be recorded or replayed.
 These can include healthcheck methods or methods that serve your service's architecture rather
 than the data backend.  You don't want these methods to be replayed or mirrored; they
 should just pass through to the "real" underlying backend.

You can accomplish this by separating the backend interface into a single data interface and one
 or more non-data interfaces, then embedding them together to make up the backend.  Then you can use
 the `backend` tag to specify the data interface.  Only methods on the data interface will be
 recorded, replayed, and mirrored.  Additionally, the `migrationsecondary` constructor will accept
 instances of the full backend interface as secondaries for maximum flexibility.

```go
type DataBackend interface {
    SomeReadCall() error
    SomeWriteCall(thing int) error
}

type NonDataBackend interface {
    GetName() string
    SetName(newName string) error
    IsHealthy() bool
}

// +gen migrationproducer:"backend[DataBackend]" migrationconsumer:"backend[DataBackend]" migrationsecondary:"errcheck backend[DataBackend]"
type MyBackend interface {
    DataBackend
    NonDataBackend
}
```
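
To show what "passthrough" means here, the following hand-written wrapper records only the data methods and lets everything else fall through to the wrapped backend. It is a sketch of the idea, not the generated code: `recordingBackend` and `fakeBackend` are hypothetical, and the recording is just an in-memory list of method names.

```go
package main

import "fmt"

type DataBackend interface {
	SomeReadCall() error
	SomeWriteCall(thing int) error
}

type NonDataBackend interface {
	GetName() string
	SetName(newName string) error
	IsHealthy() bool
}

type MyBackend interface {
	DataBackend
	NonDataBackend
}

// recordingBackend is a hand-written illustration. Embedding MyBackend means
// the non-data methods pass straight through to the wrapped backend, while
// the two data methods below are intercepted and recorded first.
type recordingBackend struct {
	MyBackend
	recorded []string // names of the data calls seen so far
}

func (r *recordingBackend) SomeReadCall() error {
	r.recorded = append(r.recorded, "SomeReadCall")
	return r.MyBackend.SomeReadCall()
}

func (r *recordingBackend) SomeWriteCall(thing int) error {
	r.recorded = append(r.recorded, "SomeWriteCall")
	return r.MyBackend.SomeWriteCall(thing)
}

// fakeBackend is a stand-in for a real backend.
type fakeBackend struct{ name string }

func (f *fakeBackend) SomeReadCall() error           { return nil }
func (f *fakeBackend) SomeWriteCall(thing int) error { return nil }
func (f *fakeBackend) GetName() string               { return f.name }
func (f *fakeBackend) SetName(newName string) error  { f.name = newName; return nil }
func (f *fakeBackend) IsHealthy() bool               { return true }

func main() {
	wrapped := &recordingBackend{MyBackend: &fakeBackend{name: "primary"}}
	var backend MyBackend = wrapped

	backend.IsHealthy()           // passthrough, not recorded
	_ = backend.SomeWriteCall(42) // recorded
	fmt.Println(wrapped.recorded)
}
```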
