# E2ML Discovery

The design of E2ML requires every message stream to fan out from a single authoritative source to ensure that all messages are delivered in their original order.  Client writer and subscription requests must all reach this single source of truth, at least indirectly, and to make that possible E2ML's [pathfinder](../../services/pathfinder/README.md) cluster includes a system that maps each requested stream address to a single instance of the [source](../../services/source/README.md) server.  This system is housed in the `stream/discovery` library.

## Hosts and Available Scopes

All instances of the [audience](audience.md) library register themselves as a [host](../../libs/discovery/broker/pick/host.go) that can serve sessions for addresses. As part of initialization, each service sends the discovery system a list of the [scopes](address.md) it can supply. In practice, all of the servers E2ML currently uses are designed to serve the wildcard scope, making any service instance eligible for any address.  If E2ML is later adapted for more purposes, it would make sense to convert the [source](../../services/source/README.md) server to explicitly serve `ext@1` and `loadtest@1` and to assign other scopes to appropriate back ends.
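A minimal sketch of the scope check, assuming a host advertises a plain list of scope strings and that the wildcard scope is spelled `*`. `HostInfo`, `Serves`, and `wildcardScope` are hypothetical names for illustration, not the `libs/discovery` API.

```go
package main

import "fmt"

// HostInfo is a hypothetical stand-in for what a host reports to the
// discovery system at startup; the real type lives in libs/discovery.
type HostInfo struct {
	ID     string
	Scopes []string // top level scopes the host is willing to serve
}

// wildcardScope is an assumed spelling for "serve any scope".
const wildcardScope = "*"

// Serves reports whether the host advertised the given scope, either
// explicitly or through the wildcard.
func (h HostInfo) Serves(scope string) bool {
	for _, s := range h.Scopes {
		if s == wildcardScope || s == scope {
			return true
		}
	}
	return false
}

func main() {
	// Today: every server advertises the wildcard.
	today := HostInfo{ID: "source-1", Scopes: []string{wildcardScope}}
	// Possible future: the source server lists its scopes explicitly.
	future := HostInfo{ID: "source-2", Scopes: []string{"ext@1", "loadtest@1"}}

	fmt.Println(today.Serves("ext@1"), today.Serves("other@2"))   // true true
	fmt.Println(future.Serves("ext@1"), future.Serves("other@2")) // true false
}
```

Listing scopes explicitly is what would let other back ends own scopes such as `other@2` without the source server competing for them.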

## Sources vs Relays

In addition to reporting the list of supported scopes, hosts indicate their current status using a set of flags.  One important flag is `Source`, which indicates that the server is meant to be the origin for the data it serves.  When this flag is set, all data for a stream should eventually flow through that server.  When it is not set (e.g. for the [threshold](../../services/threshold/README.md) service), the host will accept connections but is expected to forward requests to an actual data source.  There is no limit on the number of relay hops in a data path, and it is possible to register relays and sources for the same data stream in the same load balanced instance, though care should be taken not to create a cycle in which two relays forward to each other indefinitely.  The current production system uses two discovery services arranged in tiers: [greeter](../../services/greeter/README.md) resolves addresses only to relays (threshold instances), and those threshold relays query [pathfinder](../../services/pathfinder/README.md) to resolve the address to a data source.

## Pick Lists and Handshake

When the discovery system is asked to handle a client request for an address, it consults a dictionary keyed by address to find a `channel` structure that houses a `pick.List`, which performs a load weighted random choice among the existing hosts serving that address.  If the channel is found and the list is non-empty, the discovery system asks the chosen host for a session ticket to verify that it is available.  The ticket is returned to the requestor and later used to form a direct connection to the host, so that normal message flow doesn't need to pass through the discovery system.

If a channel isn't found or its list is empty, the system looks at the [ancestors](address.md) of the address to find a host that claims to serve a scope that includes the new address.  If the chosen host must be a data source, this search takes the form of a [PAXOS election](../../libs/discovery/broker/election) to safely choose a single source across the entire cluster.  After a host is chosen, it is asked to allocate a [history](history.md) for the stream, and on success the host is added to the pick list for future queries.
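The ancestor search can be sketched as a walk from the most specific address up to its top level scope. This assumes a slash-separated address format and a `*` key for wildcard hosts, neither of which is confirmed here (see [address.md](address.md) for the real rules). When a data source is required and several candidates come back, the real system runs an election, which this sketch omits. `ancestors` and `findCandidates` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// ancestors returns the address itself followed by each parent, ending at
// the top level scope. The slash-separated format is an assumption.
func ancestors(addr string) []string {
	out := []string{addr}
	for {
		i := strings.LastIndex(addr, "/")
		if i < 0 {
			return out
		}
		addr = addr[:i]
		out = append(out, addr)
	}
}

// findCandidates walks from the most specific ancestor to the least and
// returns the hosts registered for the first scope that has any. The "*"
// key is a hypothetical fallback for hosts that serve every scope.
func findCandidates(scopes map[string][]string, addr string) []string {
	for _, a := range append(ancestors(addr), "*") {
		if hosts := scopes[a]; len(hosts) > 0 {
			return hosts
		}
	}
	return nil
}

func main() {
	fmt.Println(ancestors("ext@1/team/stream"))
	// [ext@1/team/stream ext@1/team ext@1]

	scopes := map[string][]string{
		"ext@1": {"source-1"},
		"*":     {"source-2"},
	}
	fmt.Println(findCandidates(scopes, "ext@1/team/stream")) // [source-1]
	fmt.Println(findCandidates(scopes, "other@2/x"))         // [source-2]
}
```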

## Host Lifecycle

When hosts initialize, they send a signal saying they are available and announcing the top level scopes they are willing to serve.  As part of the graceful shutdown process, hosts update their status with the `Draining` flag, which removes them from pick list eligibility and ensures that they aren't chosen for new sessions or elected for new addresses.  They should explicitly report when they are done serving addresses during shutdown, but the system also automatically removes them from pick lists when their connection closes.
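A hypothetical, single-threaded model of how these lifecycle signals could affect pick lists: a draining host stays listed until it reports done but is skipped for new sessions, an explicit done report removes it from one address, and connection closure removes it everywhere. `Registry` and its methods are illustrative names, not the library API.

```go
package main

import (
	"fmt"
	"sort"
)

// Registry is a hypothetical, single-threaded model of how host lifecycle
// signals affect pick lists. A real broker would need locking.
type Registry struct {
	draining map[string]bool            // hosts that have set Draining
	lists    map[string]map[string]bool // address -> hosts serving it
}

func NewRegistry() *Registry {
	return &Registry{
		draining: map[string]bool{},
		lists:    map[string]map[string]bool{},
	}
}

// Serve records that host now serves addr (e.g. after an election).
func (r *Registry) Serve(addr, host string) {
	if r.lists[addr] == nil {
		r.lists[addr] = map[string]bool{}
	}
	r.lists[addr][host] = true
}

// Drain marks host as shutting down. It stays in pick lists until it
// reports done, but Eligible skips it for new sessions.
func (r *Registry) Drain(host string) { r.draining[host] = true }

// Done is the explicit report that host has stopped serving addr.
// Deleting from a nil map is a no-op, so unknown addresses are safe.
func (r *Registry) Done(addr, host string) { delete(r.lists[addr], host) }

// Closed removes host from every pick list, as happens automatically
// when its discovery connection closes.
func (r *Registry) Closed(host string) {
	for _, set := range r.lists {
		delete(set, host)
	}
	delete(r.draining, host)
}

// Eligible returns, in sorted order, the hosts that may be chosen for a
// new session on addr.
func (r *Registry) Eligible(addr string) []string {
	var out []string
	for h := range r.lists[addr] {
		if !r.draining[h] {
			out = append(out, h)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	r := NewRegistry()
	r.Serve("ext@1/a", "relay-1")
	r.Serve("ext@1/a", "relay-2")

	r.Drain("relay-1")
	fmt.Println(r.Eligible("ext@1/a")) // [relay-2]

	r.Closed("relay-2")
	fmt.Println(r.Eligible("ext@1/a")) // []
}
```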

## Collisions

If a host drops its connection to the discovery server, it doesn't stop serving existing message sessions. This allows a temporary network interruption to have minimal impact on connected clients. The discovery cluster, however, detects the closure and may elect a new source while the current one is disconnected, to minimize the impact of an outright service crash.  If a new election takes place before the previous host heals its discovery connection, a pick list may end up containing two different sources, which illegally partitions message flow.  This is referred to as a collision in E2ML metrics and code.  When this condition is detected, the discovery system attempts to resolve it automatically by telling the redundant hosts to stop serving the address. Requests to those hosts for that address then begin returning an `address_moved` error, indicating that the discovery system must be consulted to find the updated host.
