[ DAMC-MC ]

1. Introduction

DAMC-MC is a method that combines feature selection with a classification scheme. It can be applied to any classification algorithm. The divide-and-merge technique separates a problem into multiple sub-problems, solves each sub-problem on its own, and then merges the separate solutions to make a final decision.

2. Method

Steps of DAMC-MC classification:

  • Prepare class-adjusted datasets from the original dataset (the left side of Figure 1)
  • Apply feature selection to make feature-selected datasets (the right side of Figure 1)
  • Calculate the support degree of each class using the new datasets (Figures 2 and 3)
  • Merge the support degrees and decide the class of the unclassified instance
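
The last step above can be illustrated with a small sketch. It assumes that the earlier steps produce one support degree per class and that merging simply picks the class with the largest support degree. The class name MergeSketch, the method mergeSupportDegrees, and the argmax rule itself are illustrative assumptions; they are not taken from DamClassifiers.java, and the actual merge rule may differ.

```java
import java.util.Arrays;

// Illustrative sketch only: the names and the argmax merge rule are
// assumptions, not the code in DamClassifiers.java.
class MergeSketch {

    // Returns the index of the class with the largest support degree.
    // Ties are resolved in favor of the lowest class index.
    static int mergeSupportDegrees(double[] supportDegrees) {
        if (supportDegrees == null || supportDegrees.length == 0) {
            throw new IllegalArgumentException("at least one support degree is required");
        }
        int best = 0;
        for (int c = 1; c < supportDegrees.length; c++) {
            if (supportDegrees[c] > supportDegrees[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical support degrees for classes 0, 1 and 2.
        double[] support = {0.20, 0.65, 0.15};
        System.out.println("support degrees: " + Arrays.toString(support));
        System.out.println("predicted class: " + mergeSupportDegrees(support));
    }
}
```

Class indices start at 0 here so that they line up with the class labels required in the dataset files.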


Figure 1. Prepare class-adjusted datasets from the original dataset



Figure 2. Calculating support degree for each class and merge process


Figure 3. An example of calculating support degrees


3. Programs

Folder description:

- DamClassifiers.java
- FeatureRanking.java
- FeatureScoring.java
- kfoldTestDAMC.java
- kfoldTest.java
- parameter.txt - user-defined text file
- runDAMC.bat - executable batch file

Note
    • The parameter.txt file contains user-defined variables.
      Note: put the number after the colon on each line.

      line 1: NO_OF_FOLD: [number of folds]
      line 2: NO_OF_REPEAT: [number of repeats of the k-fold test]
      line 3: CLASSIFIER: [1 = KNN, 2 = KStar, 3 = SVM, 4 = NaiveBayes, 5 = LogisticRegression]
      line 4: FEATURE_SELECTION: [1 = Relief, 2 = GainRatio, 3 = FS-SVM, 4 = FS-KNN]
      line 5: SELECTED_FEATURE: [number of selected features]
      line 6: KNN: [number of nearest neighbors for KNN classification]
    • The runDAMC.bat file is an executable batch file.

      Format of Execution:
      java kfoldTestDAMC [input data file]
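
For readers who want to check a parameter.txt file before running a test, the following is a minimal sketch of how lines of the form "NAME: number" could be read. It assumes that each line of the file literally has this form; the class ParameterSketch and its parse method are hypothetical helpers, not part of the distributed programs, and the values in the example are placeholders.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a hypothetical reader for parameter.txt,
// assuming every non-blank line has the form "NAME: number".
class ParameterSketch {

    // Parses "NAME: number" lines into a map that keeps the file order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> params = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing colon: " + line);
            }
            String name = line.substring(0, colon).trim();
            // Integer.parseInt throws NumberFormatException if the text
            // after the colon is not a whole number.
            int value = Integer.parseInt(line.substring(colon + 1).trim());
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        List<String> example = Arrays.asList(
                "NO_OF_FOLD: 10",
                "NO_OF_REPEAT: 5",
                "CLASSIFIER: 1",        // 1 selects KNN in the list above
                "FEATURE_SELECTION: 1", // 1 selects Relief in the list above
                "SELECTED_FEATURE: 20",
                "KNN: 3");
        System.out.println(parse(example));
    }
}
```

To check a real file, the lines can be loaded with java.nio.file.Files.readAllLines(java.nio.file.Paths.get("parameter.txt")) and passed to parse.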
  • normal: contains the Normal classification and feature selection methods

- Classifiers.java
- FeatureRanking.java
- FeatureScoring.java
- kfoldTestNormal.java
- parameter.txt - user-defined text file
- run.bat - executable batch file

Both the DAMC-MC and Normal methods use these packages. Put them into C -> Program Files -> Java -> JRE -> lib -> ext.
  • dataset: contains 10 datasets: 5 gene expression datasets and 5 other biological datasets

    Gene expression datasets: GDS1027, GDS2545, GDS2546, GDS2547, GDS3715
    Other biological datasets: DLBCL, Lung, MLL, multi_tissues, SRBCT

Note
  1. Each dataset should be in .csv format.
  2. Only numerical values are allowed in a dataset.
  3. The first column of a dataset should hold the class (category) labels, which should be consecutive integers beginning with 0 (0, 1, 2, 3, ...).
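
The three rules above can be checked before a dataset is used. The sketch below is one possible check, assuming comma-separated rows without a header row; the class DatasetCheckSketch and its validate method are hypothetical and not part of the distributed programs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: a hypothetical pre-check of the dataset rules,
// assuming comma-separated rows and no header row.
class DatasetCheckSketch {

    // Returns null if the rows satisfy the rules, otherwise a message
    // describing the first problem found.
    static String validate(List<String> rows) {
        TreeSet<Integer> labels = new TreeSet<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] cells = rows.get(i).split(",");
            // Rule 2: every cell must parse as a number.
            for (String cell : cells) {
                try {
                    Double.parseDouble(cell.trim());
                } catch (NumberFormatException e) {
                    return "row " + (i + 1) + ": non-numeric value '" + cell + "'";
                }
            }
            // Rule 3: the first column must be a non-negative integer label.
            double label = Double.parseDouble(cells[0].trim());
            if (label != Math.rint(label) || label < 0) {
                return "row " + (i + 1) + ": class label must be a non-negative integer";
            }
            labels.add((int) label);
        }
        if (labels.isEmpty()) {
            return "no data rows";
        }
        // Labels must be exactly 0, 1, ..., k-1 with no gaps.
        if (labels.first() != 0 || labels.last() != labels.size() - 1) {
            return "class labels must be 0, 1, 2, ... without gaps";
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical rows: class label first, then feature values.
        List<String> rows = Arrays.asList("0,1.5,2.0", "1,0.3,4.1", "2,9,8");
        String problem = validate(rows);
        System.out.println(problem == null ? "dataset looks valid" : problem);
    }
}
```

The check only covers the stated rules; it does not verify that every row has the same number of columns.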

4. Results of Classification test using Normal and DAMC-MC methods

    The numbers 5, 10, 20, 30, and 40 in the tables denote the number of selected features for the test. The 'Normal' column means the normal feature selection method, which does not apply the DAMC approach.

*** Contact email: sejongoh @ dankook . ac . kr