# Use cases¶

## Introduction¶

Once in a project, you can go to the “use cases” page from the side navigation and start creating new use cases or exploring existing ones.

Depending on your problem and on the data type you have, several training possibilities are available in the platform:

| Training type / Data type | Tabular | Timeseries | Images | Definition | Example |
|---|---|---|---|---|---|
| Regression | Yes | Yes | Yes | Prediction of a quantitative feature | 2.39 / 3.98 / 18.39 |
| Classification | Yes | No | Yes | Prediction of a binary qualitative feature | « Yes » / « No » |
| Multi Classification | Yes | No | Yes | Prediction of a qualitative feature whose cardinality is > 2 | « Victory » / « Defeat » / « Tie game » |
| Object Detection | No | No | Yes | Detection of 1 to n objects per image, plus their location | Is there a car in this image? If so, where? |
| Text Similarity | Yes | No | No | Estimate the degree of similarity between two texts, i.e. find texts that are similar in context and meaning to your queries | « a tool for screws » should lead to a screwdriver description |

Then, for each data type, you will have to choose between several use case types, each requiring a specific configuration.

## Create a new use case¶

In order to create a new use case using the interface, three possibilities are available:

- In the use case menu, by clicking on the “new usecase” button at the top right of the screen
- In the dataset list, by clicking on the “actions” button of an entry and selecting “create usecase”
- On a dataset page, by clicking on the “actions” button and selecting “create usecase” in the menu

Then you will land on the new use case page, where you will have to choose the data type and the training type matching your problem.

As each training type requires a specific configuration, all the information needed to start training a use case is explained in each training type's dedicated chapter.

## Versioning of a use case¶

In the Prevision.IO platform you can create multiple versions of a use case, allowing you to search for the best-performing training and to deploy and switch any model from any version of the same use case.

In order to do that, several possibilities are available:

- From the use case list, by clicking on the “action” button of an entry and selecting “new version”
- On a use case page, by clicking on the “action” button and selecting “new version”
- From the versions menu of a use case, by clicking on the “action” button of an entry and selecting “new version”

Then, you will be redirected to the “new usecase” page, but with limited options: you cannot change the data type or the training type between versions.

## Duplication of a use case¶

In order to duplicate a use case, there are two options:

- By using the “action” button on the right side of the use case list
- By using the “action” button at the top right of any use case page and selecting “duplicate usecase”

By doing this, the new use case screen will appear, pre-filled with the configuration of the duplicated use case.

## Model pages¶

Each model page is specific to the data type/training type chosen for the use case training. Screens and functionality for each training type are explained in the following sections. You can access a model page in two ways:

- By clicking on a graph entry from the general use case page
- By clicking on a list entry under the “models” entry of the top navigation bar

Then you will land on the selected model page, split into different parts depending on the training type.

### Tabular use cases - general information¶

For each kind of tabular training type, the model's general information is displayed at the top of the screen. The following sections are available:

- Model information : information about the trained model, such as the selected metric and the model score
- Hyperparameters : downloadable list of the hyperparameters applied to this model during training
- Selected feature engineerings (for regression, classification & multi-classification) : the feature engineerings applied during training
- Preprocessings (for text similarity use cases) : the list of preprocessings applied to textual features

Please note that for the following use case types, the general information part differs from the others:

- Image detection use cases : no feature engineering
- Text similarity use cases : preprocessings are displayed instead of feature engineerings

### Model page - Graphical analysis¶

In order to better understand the selected model, several graphical analyses are displayed on a model page. Depending on the nature of the use case, the displayed graphs change. Here is an overview of the displayed analyses depending on the use case type.

| Graph / Use case type | Tabular regression | Tabular classification | Tabular multi-classification | Tabular text similarity | Time series regression | Image regression | Image classification | Image multi-classification | Image detection |
|---|---|---|---|---|---|---|---|---|---|
| Scatter plot graph | Yes | No | No | No | Yes | Yes | No | No | No |
| Residual errors distribution | Yes | No | No | No | Yes | Yes | No | No | No |
| Score table (textual) | Yes | No | No | No | Yes | Yes | No | No | No |
| Score table (overall) | No | No | Yes | No | No | No | No | Yes | No |
| Cost matrix | No | Yes | No | No | No | No | Yes | No | No |
| Density chart | No | Yes | No | No | No | No | Yes | No | No |
| Confusion matrix | No | Yes | Yes | No | No | No | Yes | Yes | No |
| Score table (by class) | No | Yes | Yes | No | No | No | Yes | Yes | No |
| Gain chart | No | Yes | No | No | No | No | Yes | No | No |
| Decision chart | No | Yes | No | No | No | No | Yes | No | No |
| Lift per bin | No | Yes | No | No | No | No | Yes | No | No |
| Cumulated lift | No | Yes | No | No | No | No | Yes | No | No |
| ROC curve | No | Yes | Yes | No | No | No | Yes | Yes | No |
| Accuracy VS K results | No | No | No | Yes | No | No | No | No | No |

### Model page - graph explanations¶

Then the feature graphs are displayed (not for text similarity), allowing you to see the influence of features on the selected model. Two graphs are accessible through the two feature tabs:

- Feature importance : graph showing you the importance of the dataset features. By clicking on the chart, you will be redirected to the dedicated feature page.
- Feature engineering importance : showing you the importance of selected feature engineering.

Please note that the feature importance graph also takes the feature engineering importance into account. For example, if a feature has little influence on the model by itself but gains a great influence after feature engineering, this will be reflected in the feature importance graph.

- Scatter plot graph : This graph illustrates the actual values versus the values predicted by the model. A powerful model gathers the point cloud around the orange line.

- Residual errors distribution : This graph illustrates the dispersion of errors, i.e. residuals. A successful model displays centered and symmetric residues around 0.

- Score table (textual) : among the displayed metrics, we have:
  - The mean square error (MSE)
  - The root mean square error (RMSE)
  - The mean absolute error (MAE)
  - The coefficient of determination (R2)
  - The mean absolute percentage error (MAPE)
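As an illustration, these regression metrics can be computed directly from the actual and predicted values. This is a minimal plain-Python sketch of the standard formulas, not the platform's implementation:

```python
import math

def regression_scores(y_true, y_pred):
    """Standard regression metrics computed from actuals and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e ** 2 for e in errors) / ss_tot
    # MAPE assumes no actual value is zero
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2, "MAPE": mape}

scores = regression_scores([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 4.0, 8.0])
```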

Please note that you can download every graph displayed in the interface by clicking on the top right button of each graph and selecting the format you want.

- Slider : For a binary classification, some graphs and scores may vary according to a probability threshold: values above it are considered positive and values below it negative. This is the case for:
  - The scores
  - The confusion matrix
  - The cost matrix

Thus, you can define the optimal threshold according to your preferences. By default, the threshold corresponds to the one that maximizes the F1-Score. Should you change the position of the threshold, you can click on the « back to optimal » link to position the cursor back to the probability that maximizes the F1-Score.
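Conceptually, finding the F1-optimal threshold amounts to sweeping the candidate thresholds and keeping the one with the highest F1-Score. A minimal sketch of that idea (illustrative only, not the platform's implementation):

```python
def f1_at_threshold(y_true, proba, threshold):
    """F1-Score when probabilities >= threshold are classified as positive."""
    pred = [1 if p >= threshold else 0 for p in proba]
    tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def optimal_threshold(y_true, proba):
    """Try each predicted probability as a threshold; keep the F1-maximizing one."""
    candidates = sorted(set(proba))
    return max(candidates, key=lambda th: f1_at_threshold(y_true, proba, th))

best = optimal_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```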

- Cost matrix : Provided that you can quantify the gains or losses associated with true positives, false positives, false negatives, and true negatives, the cost matrix works as an estimator of the average gain for a prediction made by your classifier. In the case explained below, each prediction yields an average of €2.83.

The matrix is initiated with default values that can be freely modified.
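The average-gain estimate is simply the gain of each of the four outcomes weighted by how often it occurs. A sketch with hypothetical counts and gain values (the amounts below are made up for illustration, not taken from the interface):

```python
def average_gain(counts, gains):
    """Average gain per prediction: per-outcome gains from the cost matrix,
    weighted by the observed outcome counts."""
    total = sum(counts.values())
    return sum(counts[k] * gains[k] for k in counts) / total

# Hypothetical values: e.g. each true positive earns 10, each false positive
# costs 5, each false negative costs 2, true negatives are neutral.
counts = {"TP": 30, "FP": 10, "FN": 5, "TN": 55}
gains = {"TP": 10.0, "FP": -5.0, "FN": -2.0, "TN": 0.0}
avg = average_gain(counts, gains)  # (300 - 50 - 10 + 0) / 100 = 2.4
```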

- Density chart : The density graph allows you to understand the density of positives and negatives among the predictions. The more efficient your classifier is, the more the 2 density curves are disjointed and centered around 0 and 1.

- Confusion matrix : The confusion matrix helps to understand the distribution of true positives, false positives, true negatives and false negatives according to the probability threshold. The boxes in the matrix are darker for large quantities and lighter for small quantities.

Ideally, most classified individuals should be located on the diagonal of your matrix.
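To make the structure of the matrix concrete, here is a small sketch that builds a confusion matrix from actual and predicted labels (rows are actual classes, columns are predicted classes); this is illustrative code, not the platform's implementation:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (actual, predicted) pair.
    Rows = actual class, columns = predicted class."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

cm = confusion_matrix(["cat", "dog", "dog", "cat"],
                      ["cat", "dog", "cat", "cat"],
                      labels=["cat", "dog"])
```

Correctly classified individuals land on the diagonal (`cm[i][i]`), which is why a good model shows a dark diagonal.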

- Score table (graphical) : among the displayed metrics, we have:
  - Accuracy: the sum of true positives and true negatives divided by the number of individuals
  - F1-Score: harmonic mean of the precision and the recall
  - Precision: true positives divided by the sum of true positives and false positives (i.e. all predicted positives)
  - Recall: true positives divided by the sum of true positives and false negatives (i.e. all actual positives)
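These four definitions translate directly into code. A minimal sketch computing them from the confusion-matrix counts:

```python
def classification_scores(tp, fp, fn, tn):
    """Accuracy, F1-Score, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # among predicted positives
    recall = tp / (tp + fn)             # among actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "F1-Score": f1,
            "Precision": precision, "Recall": recall}

scores = classification_scores(tp=8, fp=2, fn=4, tn=6)
```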

- Gain chart : The gain graph allows you to quickly visualize the optimal threshold to select in order to maximise the gain as defined in the cost matrix.

- Decision chart : The decision graph allows you to quickly visualize all the proposed metrics, regardless of the probability threshold. Thus, you can visualize at what point the maximum of each metric is reached, making it easier to choose your selection threshold.

It should be noted that the dashed line curve illustrates the expected gain per prediction. It is therefore directly linked to the cost matrix and will be updated if you change the gain of one of the 4 possible cases in the matrix.

- Lift per bin : The predictions are sorted in descending order and the lift of each decile (bin) is indicated in the graph. Example: a lift of 4 means that there are 4 times more positives in the considered decile than on average in the population.

The orange horizontal line shows a lift at 1.
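The per-bin lift can be sketched as follows: rank the individuals by predicted probability, cut them into bins, and divide each bin's positive rate by the overall positive rate (illustrative code, assuming the population splits evenly into bins):

```python
def lift_per_bin(y_true, proba, bins=10):
    """Lift of each bin: positive rate inside the bin (individuals ranked by
    descending probability) divided by the overall positive rate."""
    overall_rate = sum(y_true) / len(y_true)
    ranked = [t for _, t in sorted(zip(proba, y_true), key=lambda pair: -pair[0])]
    size = len(ranked) // bins
    lifts = []
    for b in range(bins):
        chunk = ranked[b * size:(b + 1) * size]
        lifts.append((sum(chunk) / len(chunk)) / overall_rate)
    return lifts
```

With a perfect ranking and a 50% positive rate, the top bin has a lift of 2 (twice the average) and the bottom bin a lift of 0.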

- Cumulated lift : The objective of this curve is to measure what proportion of the positives can be achieved by targeting only a subsample of the population. It therefore illustrates the proportion of positives according to the proportion of the selected sub-population.

A diagonal line (orange) illustrates a random pattern (= x % of the positives are obtained by randomly drawing x % of the population). A segmented line (blue) illustrates a perfect model (= 100% of positives are obtained by targeting only the population’s positive rate).
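One point of the cumulated lift curve answers: "if I target only the top x % of the population (by predicted probability), what share of all positives do I capture?". A minimal sketch of that computation (illustrative, not the platform's implementation):

```python
def cumulative_gain(y_true, proba, fraction):
    """Share of all positives captured when targeting only the top `fraction`
    of the population, ranked by descending predicted probability."""
    ranked = [t for _, t in sorted(zip(proba, y_true), key=lambda pair: -pair[0])]
    top = ranked[:round(fraction * len(ranked))]
    return sum(top) / sum(y_true)
```

A random model gives `cumulative_gain ≈ fraction`; a perfect model reaches 1.0 as soon as `fraction` equals the population's positive rate.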

- ROC curve : The ROC curve illustrates the overall performance of the classifier (more info: https://en.wikipedia.org/wiki/Receiver_operating_characteristic). The more the curve appears linear, the closer the quality of the classifier is to a random process. The more the curve tends towards the upper left side, the closer the quality of your classifier is to perfection.
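Each point of the ROC curve is a (false positive rate, true positive rate) pair obtained at one probability threshold. A compact sketch that enumerates those points (illustrative code; a perfect classifier jumps straight to the upper-left corner (0, 1)):

```python
def roc_points(y_true, proba):
    """(FPR, TPR) points of the ROC curve, one per candidate threshold,
    from the most to the least restrictive threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for th in sorted(set(proba), reverse=True):
        tp = sum(1 for t, p in zip(y_true, proba) if t == 1 and p >= th)
        fp = sum(1 for t, p in zip(y_true, proba) if t == 0 and p >= th)
        points.append((fp / neg, tp / pos))
    return points

curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```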

- Accuracy VS K results : this graph shows the evolution of the accuracy and of the MRR (mean reciprocal rank) for several values of K results
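One common way to compute these two retrieval metrics, assuming one relevant item per query (an illustrative sketch, not the platform's implementation): Accuracy@K is the share of queries whose relevant item appears in the top K results, and MRR@K averages the reciprocal rank of that item (0 when it is absent from the top K).

```python
def accuracy_and_mrr_at_k(ranked_results, relevant, k):
    """ranked_results: per-query list of result ids, best first.
    relevant: the single relevant id expected for each query."""
    hits, reciprocal_ranks = 0, []
    for results, target in zip(ranked_results, relevant):
        top = results[:k]
        if target in top:
            hits += 1
            reciprocal_ranks.append(1 / (top.index(target) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(ranked_results)
    return hits / n, sum(reciprocal_ranks) / n

acc, mrr = accuracy_and_mrr_at_k(
    ranked_results=[["a", "b", "c"], ["b", "a", "c"]],
    relevant=["a", "c"],
    k=2,
)
```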