# TwitchFICLoadTest
Load testing tool for the MLFS Feature Ingestion Client (FIC).

## Framework

This load test tool uses an AWS Glue job to ingest features into the OFS. It uses test features that
are registered in the [feature registry](https://code.amazon.com/packages/TwitchVXFeatureRegistry/trees/mainline) and DynamoDB (DDB) tables to store the feature values.

The load job reads entity IDs and feature values from a dataset in S3 and writes them to the DDB tables using
FIC's `batch_write_features` method. Currently, the `load_job.py` script does not support ingesting features through FIC's `put_features` method.

Some observations from the initial load test can be found in this [report](https://docs.google.com/document/d/1iGFCNbRTbYaYv1aU9Nnc91mQZz-kDYqKWy71-bMeXtU/edit?usp=sharing).

## How to run this tool

- `load_job.py` is the script that the Glue job runs. It first reads the feature configurations of the dataset from `feature_config.csv` and then ingests data into the DDB tables (`FICVerificationFeaturesTable`, `FICMultipleEntitiesVerificationFeaturesTable` and `fic_load_test_3`)
    - If you've changed `load_job.py`, upload the updated script so that the Glue job runs the new code:
        - `AWS_PROFILE=<profile-name> make upload_script`
    - To run a test with a different set of features, update the `feature_config.csv` file. The first line of the file lists the S3 paths for the dataset. Each remaining line describes one feature to ingest, in the format `feature_id@version,data_type,data_shape`.
        - The following features are already registered and should be ready to ingest:
            1. test_avg_ccu_7_days@0
            2. test_avg_ccu_7_days@1
            3. fic_loadtest_1_1@0
            4. fic_loadtest_1_2@0
            5. fic_loadtest_1_3@0
            6. fic_loadtest_1_4@0
            7. fic_loadtest_1_5@0
            8. test_prev_5_queries@0
            9. fic_loadtest_2_1@0
            10. fic_loadtest_2_2@0
            11. fic_loadtest_2_3@0
            12. fic_loadtest_3_1@0
    - If you've changed `feature_config.csv`, upload the updated file to S3:
        - `AWS_PROFILE=<profile-name> make upload_feature_config`

- The Glue job and its associated resources are created using CDK. After adding or modifying resources in the CDK code, deploy the changes:
   - `AWS_PROFILE=<profile-name> make deploy_glue_job NumWorkers=<number of glue job workers>`

- To start a run of the Glue job:
    - `AWS_PROFILE=<profile-name> make start_glue_load_job` 
   
Finally, you can check the ingestion time in the [AWS console](https://tiny.amazon.com/gc0zecz8/IsenLink), on the execution time tab.
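
As a quick sanity check before uploading a modified `feature_config.csv`, the layout described above (first line of S3 paths, then one `feature_id@version,data_type,data_shape` entry per line) can be parsed locally. The following is a minimal sketch, not the parsing logic used by `load_job.py`; the `FeatureSpec` class and `parse_feature_config` function are hypothetical names, and the `float` / `1` values in the sample are placeholders rather than the real data types and shapes of the registered features.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureSpec:
    """One feature line from feature_config.csv (hypothetical helper type)."""
    feature_id: str
    version: int
    data_type: str
    data_shape: str  # kept as a raw string; its exact syntax is not assumed here


def parse_feature_config(text: str) -> Tuple[List[str], List[FeatureSpec]]:
    """Split the config into (S3 paths, feature specs), raising on malformed lines."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Drop blank or whitespace-only lines.
    rows = [row for row in reader if any(field.strip() for field in row)]
    if not rows:
        raise ValueError("feature config is empty")

    # First line: comma-separated S3 paths for the dataset.
    s3_paths = [path.strip() for path in rows[0] if path.strip()]

    # Remaining lines: feature_id@version,data_type,data_shape
    features = []
    for row in rows[1:]:
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row}")
        name, data_type, data_shape = (field.strip() for field in row)
        feature_id, sep, version = name.partition("@")
        if not sep or not version.isdigit():
            raise ValueError(f"expected feature_id@version, got: {name}")
        features.append(FeatureSpec(feature_id, int(version), data_type, data_shape))
    return s3_paths, features


# Placeholder content: the data_type and data_shape values are illustrative only.
SAMPLE_CONFIG = (
    "s3://fic-load-test-data/test-100M_7/test-50M_1/test-10M_1/\n"
    "test_avg_ccu_7_days@0,float,1\n"
    "fic_loadtest_1_1@0,float,1\n"
)

if __name__ == "__main__":
    paths, specs = parse_feature_config(SAMPLE_CONFIG)
    print(paths)
    for spec in specs:
        print(spec)
```

A malformed line (for example, a feature without `@version`) raises a `ValueError` here, which is cheaper to catch locally than after a Glue run has started.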
   
## Dataset

The dataset exists in the S3 bucket `fic-load-test-data`. The data folder `test-100M_7` holds the dataset for all the above registered features in a hive-like directory structure. 
```
--test-100M_7
  |--test-50M_1
  |   |--test-10M_1
  |   |   |--test-1M_1
  |   |   |   |--test-500k_1
  |   |   |      |--sample_data_0.csv
  |   |   |      |--sample_data_1.csv
  |   |   |   |--test-500k_2
  |   |   |      |--sample_data_0.csv
  |   |   |      |--sample_data_1.csv
  |   |   |--test-1M_2
  |   |   |   |--test-500k_1
  |   |   |      |-- ..
  |   |   |   |--test-500k_2
  |   |   |      |-- ..
  |   |   |--test-1M_3
  |   |   |   ..
  |   |   |   ..
  |   |   |--test-1M_10
  |   |--test-10M_2
  |   |   ..
  |   |   ..
  |   |--test-10M_5
  |--test-50M_2
  |   |--test-10M_1
  |   |..
  |   .. 
```        
To test different loads, change the first line of `feature_config.csv` to a different S3 path (or a comma-separated list of paths) in the directory tree.
For example, to ingest 20M rows, replace it with: `s3://fic-load-test-data/test-100M_7/test-50M_1/test-10M_1/, s3://fic-load-test-data/test-100M_7/test-50M_1/test-10M_2/`.
This takes the 10M rows from each directory and simulates a 20M-row load against the Feature Ingestion Client.
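
For other load sizes, the first line of the config can be generated rather than typed by hand. The sketch below is one possible way to do that and is not part of this package: `s3_paths_for_rows` and `config_first_line` are hypothetical helpers. It assumes that each `test-10M_<k>` directory holds 10M rows, and that the tree contains two `test-50M_<j>` directories with five `test-10M_<k>` directories each (inferred from the partially elided listing above, so verify against the bucket before relying on it).

```python
from typing import List

DATASET_ROOT = "s3://fic-load-test-data/test-100M_7"
ROWS_PER_10M_DIR = 10_000_000
NUM_50M_DIRS = 2          # assumption: test-50M_1 and test-50M_2
NUM_10M_DIRS_PER_50M = 5  # assumption: test-10M_1 .. test-10M_5


def s3_paths_for_rows(target_rows: int) -> List[str]:
    """Return the test-10M_<k> directories needed to reach target_rows."""
    if target_rows <= 0 or target_rows % ROWS_PER_10M_DIR != 0:
        raise ValueError("target_rows must be a positive multiple of 10M")
    needed = target_rows // ROWS_PER_10M_DIR
    if needed > NUM_50M_DIRS * NUM_10M_DIRS_PER_50M:
        raise ValueError("target_rows exceeds the assumed 100M-row dataset")

    paths = []
    for i in range(needed):
        # Fill test-50M_1 first, then move on to test-50M_2.
        idx_50m = i // NUM_10M_DIRS_PER_50M + 1
        idx_10m = i % NUM_10M_DIRS_PER_50M + 1
        paths.append(f"{DATASET_ROOT}/test-50M_{idx_50m}/test-10M_{idx_10m}/")
    return paths


def config_first_line(target_rows: int) -> str:
    """Format the paths the same way as the 20M-row example above."""
    return ", ".join(s3_paths_for_rows(target_rows))


if __name__ == "__main__":
    print(config_first_line(20_000_000))
```

Calling `config_first_line(20_000_000)` reproduces the two-path line from the 20M-row example. Loads finer than 10M rows would need the `test-1M_<k>` or `test-500k_<k>` directories, which this sketch does not cover.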