BIO SOFTWARE

Developed by the BIO-IT Lab.
Dankook University, Korea

AmRMR
 Minimum Redundancy - Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by measuring the redundancy among features and their relevance to the class vector. mRMR uses mutual information as the measure of both redundancy and relevance. In this study, we propose a method that improves mRMR feature selection by using Pearson's correlation coefficient as the redundancy measure and the R-value as the relevance measure. We selected features from various datasets with both the original mRMR and the proposed method, and performed classification tests. The results show that the proposed method significantly improved classification accuracy in many cases.
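
The greedy selection step can be sketched in Python. This is a minimal illustration under stated assumptions, not the lab's implementation: relevance scores are passed in precomputed (in the proposed method they would be derived from the R-value, for example as 1 - R of each feature, which is our assumption), redundancy is the mean absolute Pearson correlation with the features already selected, and candidates are scored with the difference form relevance - redundancy. The function names `pearson` and `mrmr_select` are hypothetical.

```python
def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # treat a constant feature as uncorrelated
    return cov / (sx * sy)


def mrmr_select(columns, relevance, k):
    """Greedily pick k feature indices by relevance minus redundancy.

    columns   -- list of feature columns, each a list of numbers
    relevance -- one score per feature, higher is better (assumed here to be
                 derived from the R-value, e.g. 1 - R, in the proposed method)
    """
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]  # first pick: relevance only
        # Redundancy: mean |Pearson r| against the features already chosen.
        redundancy = sum(abs(pearson(columns[j], columns[s]))
                         for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with `cols = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [1, 0, 1, 0, 1, 0]]`, the second feature is a rescaled copy of the first, so `mrmr_select(cols, [0.9, 0.85, 0.6], 2)` skips it and returns `[0, 2]` even though its relevance is higher than that of the third feature.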

R-value
 The quality of a dataset has a profound effect on classification accuracy, so a method to evaluate that quality is clearly needed. The R-value is a new dataset evaluation method based on the ratio of overlapping areas among the categories (classes) in a dataset. A high R-value indicates that the dataset contains wide overlapping areas among its classes, so classification accuracy on that dataset may be low. The R-value can help in understanding the characteristics of a dataset, in the feature selection process, and in designing new classifiers.
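
The overlap idea behind the R-value can be illustrated with a short Python sketch. It is a simplified, whole-dataset reading of the measure and an assumption on our part: an instance is counted as overlapping when more than theta of its k nearest neighbours (Euclidean distance) belong to another class, and the R-value is the fraction of overlapping instances. The published R(D) is defined over pairs of classes and may differ in detail; the function name `r_value` and the default values of k and theta are illustrative only.

```python
import math


def r_value(points, labels, k=7, theta=3):
    """Fraction of instances lying in class-overlap regions (simplified)."""
    n = len(points)
    overlapping = 0
    for i in range(n):
        # The k other instances closest to instance i (Euclidean distance).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        # Count neighbours that belong to a different class.
        foreign = sum(1 for j in neighbours if labels[j] != labels[i])
        if foreign > theta:
            overlapping += 1
    return overlapping / n
```

With k=3 and theta=1, two well-separated clusters of four points each give an R-value of 0.0, while two classes interleaved along a line (labels alternating at x = 0, 1, ..., 7) give 1.0, matching the intuition that a high R-value signals heavy class overlap.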

RFS
 We propose a new, efficient feature selection method based on the R-value. The original R-value was designed to evaluate an entire dataset, but we found that a modified R(D) can also be applied to feature selection. The R-value-based feature selection (RFS) method scores the overlapping areas of each candidate feature and then selects the features with low R-values. The idea is simple but powerful for feature selection.
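
A minimal sketch of the RFS scoring step, assuming that the modified R(D) can be approximated by computing the overlap measure on each feature alone (one-dimensional k-nearest neighbours, counting an instance as overlapping when more than theta of its k neighbours belong to another class). The names `feature_r_value` and `rfs_select` and the parameters k and theta are hypothetical; this is an illustration, not the published RFS implementation.

```python
def feature_r_value(values, labels, k=7, theta=3):
    """Overlap score of a single feature (assumed 1-D form of the R-value)."""
    n = len(values)
    overlapping = 0
    for i in range(n):
        # The k other instances closest to instance i on this feature alone.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: abs(values[j] - values[i]))[:k]
        foreign = sum(1 for j in nearest if labels[j] != labels[i])
        if foreign > theta:
            overlapping += 1
    return overlapping / n


def rfs_select(columns, labels, n_select, k=7, theta=3):
    """Return the indices of the n_select features with the lowest scores."""
    scores = [feature_r_value(col, labels, k, theta) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j])[:n_select]
```

A feature whose values separate the classes (for example `[0, 1, 2, 3, 10, 11, 12, 13]` with labels `[0, 0, 0, 0, 1, 1, 1, 1]`) scores 0.0 with k=3 and theta=1 and is ranked ahead of a feature whose values interleave the two classes, which scores 1.0.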

Concave Hull
 The convex hull is the boundary of the minimal convex set containing a given nonempty finite set of points in the plane. The concave hull approach is more advanced and captures the shape of a dataset's surface more closely. It can improve accuracy in machine learning applications. Whereas previous research proposed concave hull algorithms for 2-dimensional datasets, our new concave hull algorithm works in n dimensions. Additionally, our concaveness measure and graph can be used to obtain information about the geometric boundary.

UniPrimer
Primer design for comparative analysis of primate genomes.

DAMC-MC
 Classification is one of the most important techniques in machine learning and computational biology. Various successful classification schemes have been proposed for datasets that have binary classes and a few features. If a dataset has multiple classes and a huge number of features, like microarray data, classification accuracy may be low even when feature selection is applied to reduce the dimensionality of the dataset. Here we introduce our new classification algorithm, "DAMC-MC", which stands for "Divide-and-Merge Classification for Multi-Class datasets".

Spinal Cord

CBFS
A high-performance feature selection algorithm based on feature clearness.

AGM
 We propose a new method, AGM (artificial gene making), to improve classification accuracy. The role of an artificial gene is to leave space between the different classes in a gene selection result. Artificial genes reduce ambiguous or congested areas among classes, which leads to improved classification accuracy.

boostMDR
 A boosting method that reduces the execution time of multifactor dimensionality reduction (MDR) by using pre-evaluation measurements to remove gene sets with low interaction before applying MDR to the remaining sets.
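
The workflow of a cheap pre-evaluation followed by MDR on the surviving sets can be sketched for SNP pairs in Python. Everything specific here is an assumption: genotypes are coded 0/1/2, status is 1 for cases and 0 for controls, both classes are present, a genotype cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the pre-evaluation score is a hypothetical stand-in (mutual information between the pair's joint genotype and status). The actual boostMDR pre-evaluation measurements are not reproduced here, and a real MDR run would add cross-validation, which is what makes pruning worthwhile. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mdr_accuracy(snp_a, snp_b, status):
    """Balanced accuracy of the MDR high/low-risk model for one SNP pair."""
    cases, controls = Counter(), Counter()
    for cell, s in zip(zip(snp_a, snp_b), status):
        if s == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    ratio = n_cases / n_controls  # overall case:control ratio (threshold)
    # A genotype cell is high-risk when its case:control ratio exceeds the threshold.
    high = {c for c in set(cases) | set(controls) if cases[c] > ratio * controls[c]}
    true_pos = sum(cases[c] for c in high)
    true_neg = sum(n for c, n in controls.items() if c not in high)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def joint_mutual_information(snp_a, snp_b, status):
    """Hypothetical pre-evaluation score: I(joint genotype; status) in nats."""
    n = len(status)
    joint = Counter(zip(snp_a, snp_b, status))
    cells = Counter(zip(snp_a, snp_b))
    classes = Counter(status)
    return sum(c / n * math.log(c * n / (cells[(a, b)] * classes[s]))
               for (a, b, s), c in joint.items())


def boost_mdr(snps, status, keep, pre_score=joint_mutual_information):
    """Return the best SNP pair after discarding all but `keep` pre-scored pairs."""
    pairs = list(combinations(range(len(snps)), 2))
    # Cheap pre-evaluation: rank every pair and keep only the top `keep`.
    pairs.sort(key=lambda p: pre_score(snps[p[0]], snps[p[1]], status), reverse=True)
    survivors = pairs[:keep]
    # Expensive step: run MDR only on the surviving pairs.
    return max(survivors, key=lambda p: mdr_accuracy(snps[p[0]], snps[p[1]], status))
```

For an XOR-style interaction where status = (snp0 + snp1) mod 2, with `snp0 = [0, 0, 1, 1, 0, 0, 1, 1]`, `snp1 = [0, 1, 0, 1, 0, 1, 0, 1]` and an unrelated `snp2 = [0, 0, 0, 0, 1, 1, 1, 1]`, `boost_mdr([snp0, snp1, snp2], status, keep=1)` keeps only the pair `(0, 1)`, whose MDR balanced accuracy is 1.0.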

postDiscretization

haploFinder

Find Biomarker
 A data mining approach for finding biomarker genes in microarray datasets.