A Comparative Analysis Study: Performance Evaluation of Classification Algorithms on Limited Datasets
Keywords:
classification, small data, multi-algorithm, machine learning

Abstract
This study evaluates and compares the performance of six classification algorithms under conditions of limited data: Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, and k-Nearest Neighbors (k-NN), tested on small-scale public datasets from different domains. Experiments were conducted with 5-fold cross-validation, and evaluation was based on the confusion matrix to measure accuracy, precision, recall, F1-score, AUC, and model-building time. The comparison of accuracy, precision, and recall shows that Naive Bayes consistently achieved the best performance, with a value of 0.896. Timing tests show that Naive Bayes built its model fastest, while Random Forest was slowest. The AUC results indicate that Naive Bayes performed best, followed by k-NN, while SVM performed worst. On F1-score, Random Forest and Naive Bayes performed best, while Decision Tree performed worst. Naive Bayes's strong showing is attributed to its ease of implementation, computational speed, and its ability to work well with large, medium, and limited data, as well as with many features. Nevertheless, practitioners should choose the algorithm best suited to their data. In addition, cross-validation proved to provide more reliable performance estimates. These findings offer practical recommendations for researchers and practitioners selecting effective classification algorithms for small-scale datasets, and highlight the importance of validation techniques and data preprocessing in improving model generalization under data-limited conditions.
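The evaluation protocol described above can be sketched with scikit-learn. This is a minimal illustration, not the paper's actual code: the abstract does not name the datasets used, so scikit-learn's small wine dataset (178 samples) stands in as an example of limited data, and default hyperparameters are assumed throughout.

```python
# Sketch of the paper's protocol: six classifiers compared with 5-fold
# cross-validation on a small dataset, reporting accuracy, macro F1,
# and model-building (fit) time. Dataset and hyperparameters are
# illustrative assumptions, not taken from the paper.
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)  # stand-in small dataset (178 samples)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Macro averaging handles the multi-class labels; fit_time captures the
# model-building time the paper also compares.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

results = {}
for name, clf in classifiers.items():
    cv = cross_validate(clf, X, y, cv=5, scoring=scoring)
    results[name] = {
        "accuracy": cv["test_accuracy"].mean(),
        "f1": cv["test_f1_macro"].mean(),
        "fit_time": cv["fit_time"].mean(),
    }

for name, r in sorted(results.items(), key=lambda kv: -kv[1]["accuracy"]):
    print(f"{name:13s} acc={r['accuracy']:.3f} f1={r['f1']:.3f} "
          f"fit={r['fit_time'] * 1000:.1f} ms")
```

Computing AUC for the multi-class case would additionally require probability estimates (e.g. `scoring="roc_auc_ovr"` with `SVC(probability=True)`), which is omitted here for brevity.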


