Syntax Literate: Jurnal Ilmiah Indonesia pï¿½ISSN: 2541-0849

e-ISSN: 2548-1398

Vol. 7, No. 3, Maret 2022

GRADIENT ANALYSIS IN IMPLEMENTATION OF B-TREE INDEXING IN REPORTING ANNUAL TAX DATABASE

Samidi, Shofinurdin, Andra Setiadi, Danar Darmawan, Dika Andharu

Universitas Budi Luhur, Indonesia

Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

A query is a syntax or command used in a database system to access and display data. Queries can be used to make data interact with each other. To display query results in the database, of course, requires execution time which is usually denoted in seconds. Execution time is directly proportional to the amount of data to be displayed and the level of complexity of the database. To speed up query execution time, the term database optimization is known. One of the database optimization methods is to use the b-tree indexing technique the database. This study aims to compare the execution time of databases that have not been indexed or that have been indexed with data objects in the MySQL database MPN-Info application at the Jakarta Palmerah Tax Office (KPP Pratama Jakarta Palmerah). This application was developed to supervise taxpayers which is used in almost all tax service offices throughout Indonesia. The data used is active employee status taxpayer data registered until 2020 with the reporting of Annual Personal Income Tax Returns in 2019 and 2020. The study uses an experimental method by applying a select query, then the execution time is recorded well for unindexed and indexed databases. The gradient method will be used for compare the results of the two. The results obtained are after taking the population of taxpayers registered employees up to year In 2020 at KPP Pratama Jakarta Palmerah, 31,238 data were compared with the annual reporting data, the average execution time before indexing was 255.585 seconds and the average time after indexing was 1.341 seconds and the gradient value before indexing was 0.0211 and after indexing was 0.0001. This proves that the indexing technique has a significant impact in accelerating the query execution process.

Keywords: queries; mysql; b-tree indexing; gradient; database optimization

Introduction

Today there are many applications that use databases, either free or paid, such as MySQL, Postgree SQL, Oracles and others. One of the functions of the database is to group data and simplify the process of identifying data. The database will display data according to user requests with a fast process using Database Management Systems (DBMS) software. Another function is to facilitate access, edit, add, delete and store data.

Among the applications that use the database is the MPN-info (State Revenue Module) application. MPN-Info is an application developed by Ichdyan Thalasa and Yogi Iskandar and used by Account Representative (Tax Supervisor) in supervising taxpayers. This application is used in almost all tax service offices throughout Indonesia. The desktop-based MPN-info application is deployed using C++ language while the database uses MySQL.

Taxpayer supervision is carried out, among others, by supervising taxpayer reporting compliance. This reporting compliance data can be retrieved from the mpn-info database by executing a select query that pairs the masterfile table and the annual_report table. When tested on the query turns out to take a long time. To retrieve a data value of 100 records takes 18.5 seconds and for 30,000 data records it takes 650 seconds. This causes dissatisfaction and the question arises whether the execution time can be accelerated or not.

Some theories state that using an index, namely giving an identifier to the data in the database can streamline the search process, including (Elmasri & Navathe, 2010) explains that index used in the database can speed up data retrieval when running queries. Balasubramanian et al (Balasubramanian & Sabharwal, 2013) also argue that information retrieval can be made more efficient by using indexes to provide quick access to databases. Furthermore, Guzun et al (Guzun & Canahuate, 2016) suggested that to support query efficient mechanism is needed right indexing.

In the indexing technique there are also several methods that can be used, one of which is the B-Tree method. B-Tree is a one-dimensional indexing method and is a nested hierarchical indexing method for data access from peripheral data stores. B-Tree is a tree data structure where each leaf has the same height. B-Tree was first created by Rudolf Bayer and Ed McCreight in 1972. B-Tree was made possible to store a lot of data in one node, the number of subtrees can also be very large. For this reason, B-Tree is very suitable for use in managing data on disk (Mushofan, 2014).

Several studies that have discussed the use of indexing include: Putra et al (Putra, Darwiyanto, & Gozali, 2015) who implemented B-Tree to perform full text indexing on electronic documents. In his research, Putra et al conducted indexing of text documents. Then Dongoran et al (Dongoran, Saleh, & Gozali, 2015) also conducted research related to indexing. In their research, Dongoran et al discusses the application of indexing to graph databases. Graph database itself is a database representation using a different graph with a general relational database. Next Huda Ayesh Mashaan Alrashidi (Alrashidi & Farhan, 2011) on his research six index techniques, namely B-tree, reverse, organization, clustered, non-clustered and bitmap on Oracle and MS SQL Server. Then Ammar et al (A. Ammar, M. Zainuri Sarringan, S. A-mostafa, A. Mustapha, 2020) in their research discussed the differences in the application of indexing with the B-Tree and Hash Map methods using the MySQL and PostgreSQL platforms. Aminudin et al (Mostafa, 2020) also conducts research related to indexing. In their research, Aminudin et al implements B-Tree indexing for PostgreSQL databases. Amminudin et al recommend further investigation such as improved algorithm of existing B-Tree where B-Tree tends to have longer time against large amount of data.

The studies mentioned above have implemented indexing techniques but none have compared the slope/ gradient value to query execution time with different data both before indexing and after indexing. As for this study, besides comparing the difference in the average execution time before and after indexing with the B-Tree indexing technique, it also analyzes the value of the slope/trend of the relationship between the amount of data and its execution time using gradient analysis.

Gradient analysis is a method for comparing trend patterns from several locations by observing the level of slope of the line connecting the two variables. The gradient or direction coefficient (m) is a constant that indicates the level of slope of a line (Handajani, 2009). Look at the following picture:

Figure 1

Gradient

The commonly used mathematical formula is y = mx + c, where m, c ∈ R, c are constants (intercepts), where m represents the gradient (slope) of the coefficient of the straight line.

The purpose of this study is to prove the database indexing theory by comparing the query execution speed in the mpn-info database before indexing with after indexing, and to see the difference in the slope/trend of execution speed for querying data with different amounts using the gradient analysis method.

Research Method

This study uses an experimental method by applying a select query to the MPN-Info application database and then recording the execution time for both unindexed and deindexed databases for later analysis.

The scope of this research is to retrieve data from the MySQL database in the MPN-Info application. The query used is a query to find taxpayer data on the status of active employees registered in 2020 and earlier with annual SPT reporting 2019 and 2020, the query is run to generate results record 100, 500, 1000, 5000, 10000, 15000, 20000, 25000 and 30000 data, the index technique used is the index with B-Tree, the method used to compare the level of slope is the gradient analysis method with the mathematical formula y = mx + c.

This research was conducted using a macbook pro mid 2012 laptop with specifications for Intel Core i5-3210M CPU @ 2.50GHz, 16 GB RAM, Windows 10 Pro 64 bit OS, while the tools used are SQLYog Professional for GUI databases, Microsoft Excel for visualization, Jupyter notebook for visualization and modelling. The steps taken in this study after understanding the problem and literature review are data collection, query creation, query result analysis before indexing, indexing, query analysis after indexing and gradient analysis.

Figure 1

Research Framework

Discussion

A. Data Collection

This study uses the MySQL database MPN-Info application at KPP Pratama Jakarta Palmerah (Jakarta Palmerah Tax Office). Data retrieval is taken by backuping manually using the SQLYog application.

B. Query Creation

The selection query to retrieve individual taxpayer data on employee status registered up to 2020 and annual tax returns for the 2019 and 2020 tax years is by juxtaposing the taxpayer masterfile table and the annual report transaction table.

Figure 3

Query Selection

Furthermore, from the query execution, the results can be seen in the following figure:

Figure 4

Query Execution Results

C. Analysis of Query Results before indexing

Sequentially by executing queries for databases that have not applied indexing to different amounts of data, the execution time can be seen in the graphic image as follows:

Figure 5

Graph of Query Results Before Indexing

Based on the graphic image, we can see that the more data displayed, the more query execution time is required.

D. Giving Index

Giving an index on the masterfile table with the name tax_id_idx_masterfile with the command:

Giving an index to the annual_report table with the name tax_id_idx_report with the command:

E. Query Analysis After Indexing

After indexing the database, the query execution is carried out and the results can be seen in the graphic image:

Figure 6

Graph of Query Results After Indexing

Based on Figure 6, we can see that the more data displayed, the more query execution time is needed, this looks the same as the query execution results before indexing, but for more details it can be seen the average execution time of both in the following table and graph:

Table 1

Query execution time

*Amount of data retrieved*		*Time (seconds)*
*(records)*	*After indexing*		*Before indexing*
100		0.015	18.49
500		0.057	19,265
1000		0.082	39,511
5000		0.328	103
10000		1,267	197
15000		1,791	316
20000		2,466	418
25000		3.010	539
30000		3.052	650
Average		1.341	255.585

Figure 7

Comparison Graph of Average

Query Execution Time

It can be seen in table 1 and figure 6 that the average execution time before indexing is 255.585 seconds and the average execution time after indexing is 1.341 seconds. There is a significant time difference of 254.244 seconds.

F. Testing With Gradient Analysis

As previously explained, the query execution time before and after indexing increases when more data is to be displayed. To test the increase in both times, the researcher chose the gradient analysis method to test it. The researcher uses a linear regression mathematical model approach to find the gradient value. By using scatter plot library In the python programming language, the results of the distribution of data and gradient lines are as follows:

Figure 8

Distribution of Data and Gradient

Lines Before Indexing

Figure 9

Distribution of Data and Gradient

Lines After Indexing

From the two images it is not clear the difference between the two, the researcher then uses Microsoft excel to display the two linear lines in one graph to see the difference as shown in the picture:

Figure 10

Gradient Comparison

From the figure 10 the equation of the mathematical formula before indexing can be taken Y = 0.0211X + 5.9271 and after indexing Y = 0.0001X + 0.0032. It can be seen that the gradient and constant values after indexing are smaller than before indexing with a significant difference, namely 0.021 for gradient and 5.9239 for constant.

Conclusion

From the results of this study, it was found that using indexing in the database can speed up query execution time compared to not using indexing, with an average execution time difference of 254.244 seconds with average breakdown 1.341 seconds after indexing and average 255.585 seconds before indexing. In this study also obtained a significant difference in the results of the gradient 0.021 of the query data before and after using indexing where the mathematical formula is obtained before indexing: Y = 0.0211X + 5.9271 and after indexing : Y = 0.0001X + 0.0032. This proves that the indexing technique has a significant impact in accelerating the query execution process Researchers suggest for management and developers of MPN-info applications to use the B-Tree indexing technique so that the query execution process will be faster, and for further research to compare the B-Tree indexing method with other methods.

BIBLIOGRAFI

A. Ammar, M. Zainuri Sarringan, S. A-mostafa, A. Mustapha, and S. Hamad Khaleefah. (2020). Analyzing the Effect of Data Size Variation on the Performance of B-Tree and Hash Map Indexing in MySQL. 7(12). Google Scholar

Alrashidi, Huda Ayesh Mashaan, & Farhan, Hazim A. (2011). A Comparative Study of Indexing Techniques for Relational Database Management Systems. Middle East University. Google Scholar

Balasubramanian, Meiyalagan, & Sabharwal, Rohit. (2013, July 16). Dynamic integrated database index management. Google Patents. Google Scholar

Dongoran, Emir Septian Sori, Saleh, W. Kemas Rahmat, & Gozali, Alfian Akbar. (2015). Analysis and implementation of graph indexing for graph database using GraphGrep algorithm. 2015 3rd International Conference on Information and Communication Technology (ICoICT), 59ï¿½64. IEEE. Google Scholar

Elmasri, Ramez, & Navathe, Shamkant B. (2010). Fundamentals of Databases. Addison-Wesley. Google Scholar

Guzun, Gheorghi, & Canahuate, Guadalupe. (2016). Hybrid query optimization for hard-to-compress bit-vectors. The VLDB Journal, 25(3), 339ï¿½354. Google Scholar

Handajani, Mudjiastuti. (2009). Analisis Gradien Kepadatan Penduduk dan Konsumsi BBM. Jurnal Teknik Sipil Dan Perencanaan, 11(2), 141ï¿½148. Google Scholar

Mostafa, Salama A. (2020). A Case Study on B-Tree Database Indexing Technique. Journal of Soft Computing and Data Mining, 1(1), 27ï¿½35. Google Scholar

Mushofan, A. (2014). ï¿½B-trees and their application in databases,.ï¿½ Google Scholar

Putra, Diken Pradana, Darwiyanto, Eko, & Gozali, Alfian Akbar. (2015). Implementasi Fulltext Indexing pada Dokumen Elektronik dengan Algoritma B-Tree. EProceedings of Engineering, 2(1). Google Scholar

Copyright holder:

Samidi, Shofinurdin, Andra Setiadi, Danar Darmawan, Dika Andharu (2022)

First publication right:

Syntax Literate: Jurnal Ilmiah Indonesia

This article is licensed under: