Research

My research focuses on building algorithms that analyze biological data with the help of biochemical networks. Over the last decade, advances in the technologies used to detect, identify, and quantify biological molecules have enabled life scientists to obtain biological data quickly and cheaply. Sequencing the first genome took 13 years and ~$1b; today it takes a day and ~$1k. This unprecedented progress, together with the inevitable interest in understanding the biology of genomes and genetic diseases, has led to the accumulation of vast amounts of information. Consequently, the biosciences now face the problem of "big data". Today, the bottleneck in biological research is the lack of computational power and of algorithms that can efficiently analyze the data and make discoveries.
The central dogma of molecular biology dictates the information flow DNA --> RNA --> Protein --> Metabolite. These layers introduce roughly 20k, 100k, 1m, and 3k variables, respectively. With this many variables, the search space for even basic pattern discovery is clearly intractable. Biological networks come in handy here: they associate biologically relevant variables, serve as an abstraction for modeling, and prune the search space significantly. I design machine learning algorithms that incorporate (1) biological datasets (e.g., gene expression and metabolite data) and (2) biological networks (e.g., genome-scale reconstructed metabolic networks and protein interaction networks) to predict compounds, genes, and motifs that are biomarkers for a disease.
The National Human Genome Research Institute projects that the main research focus of this decade will be understanding the biology of genetic diseases, and that the focus of the next will be advancing the science of medicine. Thus, the time is right for knowledge discovery on genetic diseases. I am also working on web-based software systems that enable researchers from all around the world to use the algorithms I develop.
Although computational biology is my main focus, I have also done research in databases and data mining, such as predicting information leakage in privacy-preserving database sharing and estimating information gain in secure multi-party computation over distributed databases. Beyond computational biology, I am interested in any application of machine learning.