Prediction Model for Tax Assessments Using Data Mining | Anthony Sampa | Talks | Data Mining Series

  • Published: 26 May 2024
  • Anthony Willa Sampa gave a talk titled "Development of a Prediction Model for Tax Assessments Using Data Mining and Machine Learning Tools" on April 15, 2024, as part of his Master of Science in Computer Science public defence.
    Anthony Willa Sampa [1] was a Master of Science in Computer Science student at The University of Zambia [2] and was supervised by Assoc. Prof. Jackson Phiri [3].
    Video Timeline
    00:00:00 Presentation Overview
    Title of Talk
    Development of a Prediction Model for Tax Assessments Using Data Mining and Machine Learning Tools
    Abstract
    Tax administration remains an integral part of a country’s economic growth. Most tax administrations around the world face similar challenges in the tax collection process, the most common of which is compliance. It is therefore important to detect as many revenue leakages as possible in order to increase overall collection. In this research we reviewed the tax audit and assessment process, which attempts to detect revenue leakages caused by under-declarations, fraud and declaration errors. Using supervised learning, we developed a machine learning model that identifies declaration audit and assessment selections likely to lead to significantly high revenue collection from these leakages. Because audit selection methods generate large volumes of audit cases, some of which yield very little revenue once audited, it is important to intelligently separate cases likely to lead to small, insignificant collections from cases likely to yield significant revenue relative to the resources used to perform the audits and assessments. We developed three models, using the Random Forest, AdaBoost and Support Vector Machine (SVM) algorithms, with training and test data extracted from the tax administration system. To find the best-performing model, we evaluated all three with four techniques: the score method library, the confusion matrix, the ROC curve and the logarithmic loss. In the score method evaluation, the Random Forest model produced the highest score, 0.835975, followed by AdaBoost with 0.829875, while SVM performed worst with 0.81555. In the confusion matrix evaluation, the Random Forest model again produced the highest overall score, 4.96, while SVM and AdaBoost each scored 4.91.
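The abstract frames audit selection as binary classification: a case either yields significant revenue relative to the audit effort or it does not. It does not say how that label was derived, so the sketch below assumes a purely hypothetical rule (revenue assessed at least `min_yield_ratio` times the audit cost, default 3.0) and hypothetical field names (`AuditCase`, `revenue_assessed`, `audit_cost`). It also shows the two simplest evaluation views mentioned above: accuracy, which is what a scikit-learn classifier's `score` method returns (assuming that is the "score method" meant), and a 2x2 confusion matrix. It is an illustration, not the author's implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AuditCase:
    """One completed audit or assessment case (hypothetical fields)."""
    case_id: str
    revenue_assessed: float  # additional tax assessed by the audit
    audit_cost: float        # resources spent performing the audit


def label_case(case: AuditCase, min_yield_ratio: float = 3.0) -> int:
    """Hypothetical labelling rule: 1 ('significant') if the revenue assessed
    is at least min_yield_ratio times the audit cost, else 0."""
    return int(case.revenue_assessed >= min_yield_ratio * case.audit_cost)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are actual labels, columns are
    predicted labels (the layout scikit-learn uses for labels 0 and 1)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the actual label."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)


if __name__ == "__main__":
    cases = [
        AuditCase("A", 9000.0, 2000.0),  # 4.5x cost -> significant
        AuditCase("B", 1500.0, 2000.0),  # 0.75x cost -> insignificant
        AuditCase("C", 6000.0, 2000.0),  # exactly 3x cost -> significant
        AuditCase("D", 500.0, 1000.0),   # 0.5x cost -> insignificant
    ]
    y_true = [label_case(c) for c in cases]
    y_pred = [1, 1, 1, 0]  # hypothetical predictions from some trained classifier
    print(y_true)                            # [1, 0, 1, 0]
    print(confusion_matrix(y_true, y_pred))  # [[1, 1], [0, 2]]
    print(accuracy(y_true, y_pred))          # 0.75
```

In a real pipeline the features would come from the declaration data and the labels from past audit outcomes; the threshold would be a business decision about what yield justifies an audit.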
    The AUC of the ROC curve was 0.914593597161 for Random Forest, 0.908772518228 for AdaBoost and 0.89976136873 for SVM. The logarithmic loss was 5.667887808845081 for the Random Forest model, 5.856129877308906 for AdaBoost and 6.42512989166967 for SVM; since a lower logarithmic loss is better, this metric also ranks Random Forest first, followed by AdaBoost. Overall, the experiments showed that the Random Forest (RF) model would help the revenue authority increase revenue collections effectively with the same amount of resources.
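The ROC AUC and logarithmic loss figures are computed from a model's predicted probabilities rather than its hard labels. The pure-Python sketch below shows both metrics; the function names are illustrative (scikit-learn's `roc_auc_score` and `log_loss` in `sklearn.metrics` provide the same metrics), AUC is computed through its rank interpretation, and the clipping constant `eps = 1e-15` is an assumption of this sketch.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative one,
    with ties counted as one half. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy. Probabilities are clipped to [eps, 1 - eps]
    so a confident wrong prediction costs about -ln(eps) instead of infinity."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    probs = [0.9, 0.3, 0.6, 0.2]  # hypothetical P(significant) for each case
    print(roc_auc(y_true, probs))             # 1.0: every positive outranks every negative
    print(round(log_loss(y_true, probs), 4))  # 0.299
```

Log loss is meant for probabilities. If hard 0/1 predictions are passed instead, each misclassified case costs -ln(eps), about 34.5 for eps = 1e-15, so the loss becomes roughly 34.5 times the error rate; by comparison, always predicting 0.5 gives ln 2 ≈ 0.693.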
    [1] / anthony.sampa
    [2] unza.zm
    [3] scholar.google.co.za/citation...
