Post-Processing for discretization

1. Introduction

Bioinformatics and data mining rely on data analysis schemes. Many analysis methods, such as entropy-based ones, assume that the input data take discrete values, so continuous data must be discretized before analysis can begin. Many discretization algorithms have been proposed, and they discretize a given dataset attribute by attribute. Such methods assume that the attributes are independent of each other; in reality, the attributes interact and influence the results of the analysis as a group, not individually. In this paper we propose a post-processing method that can improve the quality of discretization. After the normal discretization process, we adjust the boundary points of each attribute, evaluate the group effect of each adjusted point, and replace the original boundary point with the adjusted one only if the adjustment has a positive effect. Empirical experiments show that the adjusted dataset improves classification accuracy.
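The adjust-evaluate-update loop described above can be illustrated with a minimal sketch. This is not the code in post.discretizeLib.R and does not reproduce its API: it is written in Python, and the function names (`discretize`, `joint_score`, `post_process`), the fixed shift size `step`, and the scoring rule are all assumptions made for illustration. In particular, the "group effect" is approximated here by how well the joint combination of bin codes across all attributes predicts the class label on the training data; the paper's actual criterion may differ.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(rows, cuts):
    """Replace each continuous value by the index of its interval.

    cuts[j] is the sorted list of boundary points for attribute j.
    """
    return [tuple(bisect_right(cuts[j], v) for j, v in enumerate(row))
            for row in rows]


def joint_score(rows, labels, cuts):
    """Assumed group-level score (hypothetical, not the paper's criterion):
    fraction of rows whose label equals the majority label among rows
    sharing the same combination of bin codes over ALL attributes."""
    groups = defaultdict(Counter)
    for codes, label in zip(discretize(rows, cuts), labels):
        groups[codes][label] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)


def post_process(rows, labels, cuts, step=0.5, max_passes=10):
    """Greedily shift one boundary point at a time by -step or +step and
    keep a shift only if the joint score strictly improves."""
    cuts = [list(c) for c in cuts]  # work on a copy
    best = joint_score(rows, labels, cuts)
    for _ in range(max_passes):
        improved = False
        for j in range(len(cuts)):
            for k in range(len(cuts[j])):
                original = cuts[j][k]
                for delta in (-step, step):
                    cuts[j][k] = original + delta
                    if cuts[j] != sorted(cuts[j]):
                        # The shift crossed a neighbouring boundary: undo it.
                        cuts[j][k] = original
                        continue
                    score = joint_score(rows, labels, cuts)
                    if score > best:
                        # Positive group effect: accept the adjusted point.
                        best, original, improved = score, cuts[j][k], True
                    else:
                        cuts[j][k] = original
        if not improved:
            break
    return cuts, best


# Toy data: two continuous attributes, two classes.
rows = [(1.0, 5.0), (2.2, 5.0), (2.8, 5.0), (3.5, 5.0)]
labels = ["a", "a", "b", "b"]
initial_cuts = [[2.0], [10.0]]  # e.g. produced by any per-attribute discretizer

adjusted_cuts, score = post_process(rows, labels, initial_cuts)
print(adjusted_cuts, score)
```

Because this sketch scores each candidate on the same data it adjusts, it can overfit; a held-out or cross-validated score would be a more cautious choice for real use.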

2. Supplementary Materials

post.discretizeLib.R : functions for post-processing a discretization

test.post.discretizeLib.R : test code for post.discretizeLib.R

3. Reference
(to be added)