RapidMiner Classification (Part 5): Cross Validation

  • Published: 9 Sep 2024
  • In this lesson on classification, we introduce the cross-validation method of model evaluation in RapidMiner Studio. Cross-validation gives a much more realistic view of model performance: the available data is split into k parts, or folds, and the model is trained and tested k times, each time using k-1 folds for training and the remaining fold for validation; the average performance over all k runs is then reported. The video also mentions the Leave-One-Out method of validation, in which a single observation is held out for testing while the rest of the data is used to build the model; this is impractical for large data sets, such as the 3000 workers compensation claims used here to predict the likelihood of claim subrogation, i.e. recovery of the insurance payout due to claim irregularities. At the end of the lesson we explore different ways of improving model accuracy, first by varying the k-NN model parameter (the number of neighbors "k") and then by replacing the k-NN model with Gradient Boosted Trees.
    This video is best to watch as part of the series on classification:
    * • RapidMiner Classificat...
    The data for this lesson has appeared in a number of text-mining tutorials. In this video, however, it is used to predict various aspects of workers compensation claims based on structured variables only. The data for the video can be obtained from:
    * jacobcybulski.c...
    * jacobcybulski.c...
    The original source of the data does not seem to be available online anymore.
    Videos in data analytics and data visualization by Jacob Cybulski, jacobcybulski.com.
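    The cross-validation workflow described above can be sketched outside RapidMiner as well. The snippet below is a minimal illustration in Python with scikit-learn (an assumption: the video itself uses RapidMiner's visual operators, not code), using a synthetic binary target as a stand-in for the workers compensation claims data, which is not bundled here.

    ```python
    # Sketch of 10-fold cross-validation, varying k-NN's "k" and then
    # swapping in Gradient Boosted Trees, mirroring the steps in the video.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic binary target standing in for "claim subrogated: yes/no".
    X, y = make_classification(n_samples=3000, n_features=10, random_state=42)

    # 10-fold CV: the data is split into k=10 folds; each fold serves once
    # as the validation set while the remaining 9 folds train the model.
    for k in (1, 5, 15):  # vary the k-NN neighbourhood size
        scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
        print(f"k-NN (k={k}): mean accuracy {scores.mean():.3f}")

    # Replace k-NN with Gradient Boosted Trees, as the video does.
    gbt = GradientBoostingClassifier(random_state=42)
    gbt_scores = cross_val_score(gbt, X, y, cv=10)
    print(f"GBT: mean accuracy {gbt_scores.mean():.3f}")
    ```

    The averaged score over the 10 folds is the "model average performance" the lesson refers to; Leave-One-Out would correspond to cv equal to the number of rows, which is why it is avoided for 3000 observations.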
