# Toloka-kit usage examples
## _Data collection, markup, aggregation, and other applications_

Why it may be useful:
- Easily reuse projects by just copying and pasting code. No need to configure parameters in the interface over and over again.
- Train your ML models and run your data labeling projects in the same environment.
- Take advantage of open-source code that anyone can use and contribute to.

## Table of contents

| Example | Abstract | Keywords |
| ------ | ------ | ------ |
| [Learn the basics](https://github.com/Toloka/toloka-kit/tree/main/examples/0.getting_started/0.learn_the_basics) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/0.getting_started/0.learn_the_basics/learn_the_basics.ipynb) | The very first example explains the basics of working with Toloka and toloka-kit, using a project that classifies images of cats and dogs. |```Getting Started```, ```Classification```|
| | **Computer Vision** |
| [Image collection](https://github.com/Toloka/toloka-kit/tree/main/examples/1.computer_vision/image_collection) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/1.computer_vision/image_collection/image_collection.ipynb) | The goal of this project is to collect a dataset of images of cats and dogs. Performers are asked to take a photo of their pet and specify its species. |```CV```, ```Classification```, ```Collecting```, ```Dataset```|
| [Image classification](https://github.com/Toloka/toloka-kit/tree/main/examples/1.computer_vision/image_classification) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/1.computer_vision/image_classification/image_classification.ipynb) | An example of binary image classification on a dataset of cats and dogs. We ask performers to look at each picture and decide which animal is in it. |```CV```, ```Classification```|
| [Object detection](https://github.com/Toloka/toloka-kit/tree/main/examples/1.computer_vision/object_detection) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/1.computer_vision/object_detection/object_detection.ipynb) | An example of the classic problem of annotating images to train object detection algorithms. In real-world tasks, annotation is usually done with polygons. We chose rectangular bounding boxes instead to simplify the task, which reduces costs and speeds things up. |```CV```, ```Segmentation```, ```Detection```, ```Bounding boxes```, ```Street```, ```Traffic sign```, ```Verification project```|
| [HTR image gathering](https://github.com/tardis-forever/Handwriting-gathering-with-Toloka) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tardis-forever/Handwriting-gathering-with-Toloka/blob/main/handwriting-gathering.ipynb) | An example of a simple pipeline for gathering images of handwriting. The resulting dataset can be used to train and evaluate handwritten text recognition (HTR) models. | ```CV```, ```HTR```, ```Texts```, ```Verification project```, ```Collecting```, ```Dataset```|
| [Blood cells classification](https://github.com/oleg-cat/blood-test/blob/main/blood-test.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oleg-cat/blood-test/blob/main/blood-test.ipynb) | In this project, we show Toloka performers an image of a blood cell along with brief instructions, and ask them to choose which type of white blood cell they see. | ```CV```, ```Classification```, ```Medicine```, ```Benchmark```|
| [Video collection](https://github.com/Toloka/toloka-kit/tree/main/examples/1.computer_vision/video_collection) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/1.computer_vision/video_collection/video_collection.ipynb) | The goal is to collect a set of video recordings in which people show gestures resembling popular emojis. We provide several emoji combinations and ask Tolokers to record a video that matches them and meets certain recording quality criteria. |```CV```, ```Video```, ```Collecting```, ```Dataset```|
| [Text recognition](https://github.com/Toloka/toloka-kit/tree/main/examples/1.computer_vision/text_recognition) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/1.computer_vision/text_recognition/text_recognition.ipynb) | We have a set of water meter images and need to get each meter's reading. We ask performers to look at the images and write down the digits on each water meter. |```CV```, ```OCR```|
| | **NLP** |
| [Text classification](https://github.com/Toloka/toloka-kit/tree/main/examples/5.nlp/text_classification) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/5.nlp/text_classification/text_classification.ipynb) | We have a set of news article headlines and need to classify each one as clickbait or not clickbait. | ```NLP```, ```Classification```, ```Texts```|
| [Question answering on SQuAD](https://github.com/Toloka/toloka-kit/tree/main/examples/SQUAD2.0) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/SQUAD2.0/SQUAD2.0_processing.ipynb) | Question answering on the SQuAD 2.0 dataset, one of the most popular tasks in natural language processing. Human performers collect answers to the questions, and the answers are then validated. | ```NLP```, ```Question answering```, ```Texts```, ```Benchmark```, ```Verification project```|
| [Sentiment analysis](https://github.com/Toloka/toloka-kit/tree/main/examples/5.nlp/sentiment_analysis) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/5.nlp/sentiment_analysis/sentiment_analysis.ipynb) | We have a set of customer reviews, and we need to classify them as “Positive” or “Negative”. We ask performers to read a review and decide which category it belongs to. | ```NLP```, ```Classification```, ```Texts```|
| [Intent classification](https://github.com/Toloka/toloka-kit/tree/main/examples/5.nlp/intent_classification) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/5.nlp/intent_classification/intent_classification.ipynb) | We have a list of search queries related to travel and dining, each with an unknown class and category. We need to determine which class each query belongs to and then assign it to one of several categories within that class. | ```NLP```, ```Intent```, ```Classification```, ```Texts```|
| | **Audio analysis** |
| [Audio collection](https://github.com/Toloka/toloka-kit/tree/main/examples/3.audio_analysis/audio_collection) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/3.audio_analysis/audio_collection/audio_collection.ipynb) | We have a set of texts, and we need to get voice recordings of these texts. We ask performers to read the texts aloud and record themselves. Recordings like these are used for training voice assistants. |```ASR```, ```TTS```,  ```Collecting```, ```Dataset```|
| [Audio classification](https://github.com/Toloka/toloka-kit/tree/main/examples/3.audio_analysis/audio_classification) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/3.audio_analysis/audio_classification/audio_classification.ipynb) | We have a set of voice recordings from different people and need to classify them by the speaker’s gender. We ask performers to listen to the recordings and decide whether a man or a woman is speaking. |```ASR```, ```TTS```, ```Classification```|
| [ASR/TTS based on Wikipedia articles](https://github.com/noath/asr-datasets-pipeline) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/noath/asr-datasets-pipeline/blob/main/ASR_pipeline.ipynb) | This example contains a complete speech data collection pipeline, from extracting raw texts to labeling and validating speech recordings. | ```ASR```, ```TTS```, ```Texts```, ```Verification project```, ```Audio samples collection```|
| [Audio transcription](https://github.com/Toloka/toloka-kit/tree/main/examples/3.audio_analysis/audio_transcription) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/3.audio_analysis/audio_transcription/audio_transcription.ipynb) | We have a set of audio recordings and need to obtain a transcription of each one. We ask performers to listen to the recordings and type what they hear. |```ASR```, ```Transcription```, ```Pipeline```, ```Post-acceptance```|
| | **Ranking** |
| [Side-by-side image comparison](https://github.com/Toloka/toloka-kit/tree/main/examples/4.ranking/side_by_side_image_comparision) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/4.ranking/side_by_side_image_comparision/side_by_side_comparision.ipynb) | We have a set of 6 icons and need to find out which icon people prefer most. We show performers two icons at a time and ask them to choose the one they prefer. Then we aggregate these results to obtain the top icon. |```Ranking```, ```Side-by-side```|
| | **Spatial Crowdsourcing** |
| [Simplest Spatial Crowdsourcing](https://github.com/Toloka/toloka-kit/tree/main/examples/2.spatial_crowdsourcing/0.simplest_example) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/2.spatial_crowdsourcing/0.simplest_example/spatial_crowdsourcing.ipynb) | In this example, we collect pictures of Moscow metro entrances. The example can also be reused for production tasks such as monitoring the state of objects or checking the presence of an organization or another physical object. |```Spatial Crowdsourcing```, ```Outdoor monitoring```, ```Collecting```|
| | **Survey** |
| [Simplest survey](https://github.com/Toloka/toloka-kit/tree/main/examples/7.survey/simplest_survey) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/7.survey/simplest_survey/simplest_survey.ipynb) | The goal is to collect information about how people manage stress and whether they are ready to use a meditation app for it. The survey asks about stress levels and stress management, meditation practices, and users' habits concerning paid apps. |```Survey```, ```Collecting```|
| | **Pipelines** |
| [Simple Toloka+ML pipeline on Prefect](https://github.com/Toloka/toloka-kit/tree/main/examples/9.toloka_and_ml_on_prefect) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/9.toloka_and_ml_on_prefect/example.ipynb) | This example illustrates how crowdsourcing using Toloka can be made easier and cheaper by integrating an ML model. Furthermore, it shows how to run the whole project in the cloud using Prefect, which makes workflow orchestration much simpler. |```Prefect```, ```ML```, ```Autohelper```|
| [Building streaming pipelines in Toloka](https://github.com/Toloka/toloka-kit/tree/main/examples/6.streaming_pipelines) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/6.streaming_pipelines/streaming_pipelines.ipynb) | The task is to find products in an online store that match a given image and arrange the results by relevance. In this example, we combine three different Toloka projects into one pipeline. |```Pipeline```, ```Collecting```, ```Dataset```|
| | **Relevance** |
| [Search relevance](https://github.com/Toloka/toloka-kit/tree/main/examples/8.search_relevance) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/8.search_relevance/search_relevance.ipynb) | We have a set of search queries and products on a website. We need to determine the extent to which each query is relevant to the corresponding product on the website. We ask performers to look at the search query and the product image from the website and rate the relevance level. |```Relevance```|
| [Ad relevance](https://github.com/oleg-cat/checkadv/blob/main/checkadv.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oleg-cat/checkadv/blob/main/checkadv.ipynb) | In this example, we explore webpages containing ads and their descriptions. The project runs on the new Toloka ready-to-go solutions (App Services). |```Relevance```|
| | **Benchmarks** |
| [Image classification](https://github.com/Toloka/toloka-kit/tree/main/examples/benchmarks/image_classification_cinic10.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/benchmarks/image_classification_cinic10.ipynb) | Image classification on CINIC-10 with a minimal configuration that achieves the reported quality: 88% accuracy on the test set. |```Benchmark```, ```CV```, ```Classification```|
| [Text classification](https://github.com/Toloka/toloka-kit/tree/main/examples/benchmarks/text_classification_imdb.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/benchmarks/text_classification_imdb.ipynb) | Text classification on IMDB movie reviews with a minimal configuration that achieves the reported quality: 89% accuracy on the test set. |```Benchmark```, ```NLP```, ```Classification```|
| | **Metrics** |
| [Jupyter dashboard](https://github.com/Toloka/toloka-kit/tree/main/examples/metrics/jupyter_dashboard.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/metrics/jupyter_dashboard.ipynb) | An example of using a Jupyter dashboard to collect and display Toloka metrics inside a Jupyter notebook. | ```Metrics```, ```Visualization```|
| [Graphite](https://github.com/Toloka/toloka-kit/tree/main/examples/metrics/graphite.ipynb) <br/><br/> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Toloka/toloka-kit/blob/main/examples/metrics/graphite.ipynb) | A `MetricCollector` usage example. In this notebook, you will learn how to collect Toloka metrics and simultaneously send them to a Graphite metrics server. | ```Metrics```, ```Logging```, ```Graphite```|
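
The Side-by-side image comparison example above aggregates pairwise choices to find the top icon. As a rough illustration of that aggregation step, here is a minimal Python sketch that simply counts how many comparisons each item won. It is not necessarily what the notebook does: the `(left, right, winner)` input format and the `top_by_wins` function are hypothetical. Model-based pairwise methods such as Bradley-Terry, which the Crowd-Kit library implements, usually cope better with sparse or noisy comparisons.

```python
from collections import Counter


def top_by_wins(comparisons):
    """Rank items by the number of pairwise comparisons each one won.

    `comparisons` is an iterable of (left, right, winner) tuples, where
    `winner` is either `left` or `right`. This input format is a
    hypothetical simplification of exported Toloka results.
    """
    wins = Counter()
    seen = set()
    for left, right, winner in comparisons:
        if winner not in (left, right):
            raise ValueError(f"winner {winner!r} is not one of the compared items")
        seen.update((left, right))
        wins[winner] += 1
    # Sort by win count (descending), breaking ties by name for determinism.
    # Items that never won are still ranked, with zero wins.
    ranking = sorted(seen, key=lambda item: (-wins[item], item))
    return ranking, wins


# Hypothetical performer choices between three icons.
comparisons = [
    ("icon_a", "icon_b", "icon_a"),
    ("icon_a", "icon_c", "icon_c"),
    ("icon_b", "icon_c", "icon_c"),
    ("icon_b", "icon_c", "icon_c"),
]
ranking, wins = top_by_wins(comparisons)
print(ranking[0])  # icon_c
```

Win counting ignores which opponents an item beat, so it works best when every pair is compared roughly the same number of times.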


# Need more examples?
If you have an example of data labeling using toloka-kit, do not hesitate to share it. Add a link to your GitHub repository and a description to this table via a [pull request](https://github.com/Toloka/toloka-kit/pulls).

Ideally, a great example should contain the following aspects:
- Problem statement;
- How to set up a project;
- Where to get the data for the example;
- What to pay attention to when writing instructions;
- How to set up quality control;
- What final quality was achieved;
- Visualization of the obtained results.

You can also ask questions or request a new example using [issues](https://github.com/Toloka/toloka-kit/issues).
