Finding Biomarker

[ Feature Selection Library for Biomarker Identification ]

1. Introduction
One important application of gene expression data is finding significant genes associated with a specific disease, such as lung cancer. These significant genes may be used as biomarkers to diagnose the disease. A data mining approach is useful for finding biomarkers and performing diagnosis, and feature selection and classification schemes are widely used for this purpose. In this study, we propose a data mining approach for finding biomarker genes from microarray datasets. The proposed approach consists of procedural steps for finding and testing biomarker genes. We also implement the proposed method as an R library for general use. We test 12 microarray datasets with the implemented library and suggest the best biomarker genes and a suitable predictor (classifier) for each of them. One of the identified biomarker genes is compared with a list of differentially expressed genes, including biological network analysis.

1. Process of disease diagnosis using a biomarker

2. Overall procedure for finding biomarkers

3. Usage

Format of input dataset:

The dataset should be in .csv format. In the data table, rows are samples and columns are genes. The first column gives the phenotype of each sample for the target disease. The first row may contain gene names or gene IDs.
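As a rough illustration of this layout, the following R sketch writes a tiny made-up table to a temporary .csv file and reads it back, separating the phenotype column from the gene columns. The gene names, phenotype labels, and values are hypothetical, and the code is not taken from the library itself.

```r
# Toy table in the layout described above (all names and values are made up):
# first column = phenotype, remaining columns = one gene each, rows = samples.
toy <- data.frame(
  phenotype = c("disease", "disease", "control", "control"),
  GENE_A = c(5.1, 4.8, 2.0, 2.3),
  GENE_B = c(1.2, 1.0, 1.1, 1.3)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file: the header row holds gene names, the first column the phenotype.
dat <- read.csv(path, header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
labels <- factor(dat[[1]])                  # class label of each sample
expr <- as.matrix(dat[, -1, drop = FALSE])  # samples x genes expression matrix

# Basic sanity check: one label per sample.
stopifnot(nrow(expr) == length(labels))
```

Checking a dataset this way before running the library can catch a transposed table (genes in rows) or a missing phenotype column early.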

Execution

- Modify main.R to specify the file name of the dataset and the parameter settings
- Choose which algorithms to test through the parameter settings
- Execute main.R

Result

- View the RESULT_TOTAL.csv file in the 'result' directory
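The result file can also be inspected from within R. This is a minimal sketch that assumes the 'result' directory sits under the current working directory; the columns of RESULT_TOTAL.csv are not documented here, so the code only prints whatever structure it finds.

```r
# Assumed location: 'result' directory relative to the current working directory.
result_path <- file.path("result", "RESULT_TOTAL.csv")

if (file.exists(result_path)) {
  res <- read.csv(result_path, stringsAsFactors = FALSE)
  str(res)          # column names and types as written by main.R
  print(head(res))  # first few rows
} else {
  message("No result file found at ", result_path, "; run main.R first.")
}
```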

4. Download R source code

published_code_20150121.zip

Sample Dataset

ALL.csv