Comparison of Information Gain and Chi-Square Selection Features For Performance Improvement of Naive Bayes Algorithm On Determining Students With No PIP Recipients at SMKN 1 Brebes

  • Magus Sarasnomo Master of Informatics Engineering, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia
  • Muljono Muljono Master of Informatics Engineering, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia
  • M. Arief Soeleman Master of Informatics Engineering, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia
Keywords: Information Gain; Chi-Square; Naïve Bayes Algorithm; PIP

Abstract

All Smart Indonesia Program (PIP) policies, implemented through the Smart Indonesia Card (KIP), are issued by the government under the auspices of the Ministry of Education and Culture (Kemendikbud) through the National Team for the Acceleration of Poverty Reduction (TNP2K). The program aims to help students from poor families obtain a proper education, to prevent children from dropping out of school, and to meet their school needs. Students can use the assistance for school-related expenses such as transportation to school, school supplies, and pocket money. This study compares Information Gain and Chi-Square feature selection as ways to improve the performance of the Naive Bayes algorithm in determining which poor students at SMKN 1 Brebes are PIP recipients. It measures the accuracy of Naive Bayes alone and with each feature selection method, compares those accuracies, and identifies the attributes that affect accuracy. Relevant research data were collected from the literature and from primary and secondary sources: the primary data were gathered with a questionnaire, and the secondary data were obtained from document files. The data are student records from SMKN 1 Brebes for 2021. The initial collection yielded 703 records, but not all of them were used, because the records first had to pass through several stages of initial data processing (data preparation). Naive Bayes alone achieved an accuracy of 90.31% with an AUC of 0.967; after adding Information Gain feature selection, the accuracy became 90.88% with an AUC of 0.970.
Adding Information Gain feature selection therefore helps improve the classification performance of the Naive Bayes algorithm, although the accuracy is still not optimal. Adding Chi-Square feature selection to the same Naive Bayes baseline (accuracy 90.31%, AUC 0.967) also improves classification performance and gives the same result: an accuracy of 90.88% with an AUC of 0.970. In both cases the accuracy rises from 90.31% to 90.88% and the AUC from 0.967 to 0.970. Information Gain and Chi-Square feature selection thus increase the accuracy of the Naive Bayes algorithm by the same 0.57 percentage points, although the resulting accuracy is still less than optimal.
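To make the two feature selection scores concrete, the following is a minimal sketch of how Information Gain and the Pearson chi-square statistic can be computed for categorical attributes against a class label. It is not the study's implementation: the attribute names (`parent_income`, `class_group`), the eight toy records, and the helper function names are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the expected entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, n_f in feature_counts.items():
        for c, n_c in label_counts.items():
            expected = n_f * n_c / total  # count expected under independence
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy records (NOT the SMKN 1 Brebes data): label = PIP recipient?
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
features = {
    # Perfectly separates the two classes in this toy set.
    "parent_income": ["low", "low", "low", "low", "high", "high", "high", "high"],
    # Independent of the class in this toy set.
    "class_group": ["A", "B", "A", "B", "A", "B", "A", "B"],
}

ig_scores = {name: information_gain(col, labels) for name, col in features.items()}
chi_scores = {name: chi_square(col, labels) for name, col in features.items()}

# Rank attributes from most to least informative under each score.
ig_ranking = sorted(features, key=ig_scores.get, reverse=True)
chi_ranking = sorted(features, key=chi_scores.get, reverse=True)

print("Information Gain:", ig_scores, "ranking:", ig_ranking)
print("Chi-square:", chi_scores, "ranking:", chi_ranking)
```

In this toy set both scores rank the attributes in the same order, which is one plausible reason two selection methods can keep the same attributes and lead to identical classifier accuracy. A feature selection step would keep the top-ranked attributes and train Naive Bayes on those alone.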

Published
2022-04-05