

Identification of a small number of highly predictive genes for disease status

Dr. Md. Rafiul Hassan
University of Melbourne, Australia

May 14, 2009
JEC 3117, 4:00 p.m. to 5:00 p.m.
Refreshments at 3:30 p.m.


Current microarray data can have up to approximately 20,000 dimensions, and this number is expected to approach one million (the SNP Array 6.0 has more than 946,000 probes for detecting copy number variation). In other words, microarray capacity is growing into the hundreds of thousands of dimensions. Such high dimensionality leads to the 'curse of dimensionality': the number of instances needed to model the data distribution grows exponentially with the number of dimensions. For example, a dataset with 100 features, each taking two values, has 2^100 (about 1.3 x 10^30) possible value combinations, so representing the underlying distribution would require on the order of 2^100 instances, which is highly impractical. Fortunately, some of these features are independent, and many do not help discriminate between classes. In this talk we describe robust feature selection techniques based on the area under the receiver operating characteristic (ROC) curve (AUC) for effectively classifying disease status from gene expression data.
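To make the idea of AUC-based feature selection concrete, here is a minimal sketch of a generic single-gene AUC filter. It is not the speaker's method, whose details the abstract does not give. It assumes a binary disease label (1 = disease, 0 = control) and scores each gene's expression values directly by their AUC. Genes whose AUC lies far from 0.5 in either direction are treated as the most discriminative. The function names `auc_single_feature` and `select_top_genes_by_auc` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def auc_single_feature(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one gene used directly as a classification score.

    This is the probability that a randomly chosen positive (label 1) sample
    has a higher value than a randomly chosen negative (label 0) sample, with
    ties counted as 1/2 (the Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_top_genes_by_auc(
    X: List[List[float]], labels: Sequence[int], k: int
) -> List[Tuple[int, float]]:
    """Return (gene_index, auc) for the k genes whose AUC is farthest from 0.5.

    X holds one row per sample and one column per gene. An AUC near 1.0
    (up-regulated in disease) or near 0.0 (down-regulated) both indicate a
    discriminative gene, so genes are ranked by |AUC - 0.5|.
    """
    n_genes = len(X[0])
    scored = []
    for g in range(n_genes):
        column = [row[g] for row in X]
        auc = auc_single_feature(column, labels)
        scored.append((g, auc, abs(auc - 0.5)))
    # Stable sort: genes with equal scores keep their original order.
    scored.sort(key=lambda t: t[2], reverse=True)
    return [(g, auc) for g, auc, _ in scored[:k]]


# Hypothetical toy data: 4 samples x 3 genes.
X_toy = [
    [0.1, 5.0, 2.0],
    [0.2, 4.0, 2.1],
    [0.9, 1.0, 2.0],
    [0.8, 2.0, 1.9],
]
labels_toy = [0, 0, 1, 1]
print(select_top_genes_by_auc(X_toy, labels_toy, 2))  # → [(0, 1.0), (1, 0.0)]
```

Gene 0 separates the classes perfectly (AUC 1.0), and gene 1 does so in the opposite direction (AUC 0.0). Gene 2 (AUC 0.125) is the weakest, so it is dropped. The pairwise loop costs O(n_pos * n_neg) per gene. A rank-based computation of the same statistic is cheaper for large sample sizes, and a filter like this ignores interactions between genes.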


Dr. Md. Rafiul Hassan is a research fellow in the Department of Computer Science and Software Engineering at the University of Melbourne, Australia. He received a PhD from the University of Melbourne in 2007 and a B.Sc. (Engg.) in Electronics and Computer Science from Shah Jalal University of Science and Technology, Bangladesh, in 2000. His research interests include data mining, support vector machines, feature selection and receiver operating characteristic (ROC) curves, with a particular focus on developing machine learning tools for classifying gene expression data. Before joining the University of Melbourne, Dr. Hassan was an Assistant Professor in the Department of Computer Science and Engineering at Shah Jalal University of Science and Technology, Bangladesh. He is currently involved in research and development projects on the effective classification of bioinformatics data. He has authored around 20 papers in recognized international journals and conferences. He is a member of the Australian Society of Operations Research (ASOR), the IEEE and the IEEE Computer Society, and serves on several program committees of international conferences. He also reviews for several well-known journals, including BMC Cancer, Information Sciences, Digital Signal Processing, and Computer Communications.

Hosted by: Dr. Mohammed Zaki (x6340)

Last updated: May 1, 2009