Research Interests

Our research focuses on developing computational tools, based on machine learning and especially deep learning, for genomic and transcriptomic analysis of long-read sequencing data. We are particularly interested in Nanopore sequencing data analysis.

Modification detection

Base modifications, such as 5-methylcytosine (5mC) and N6-methyladenine (6mA), are widespread in the human genome and transcriptome and play fundamental roles in biological processes and in human diseases, including cancers and neurological diseases. However, traditional techniques have limited power to detect base modifications accurately, which makes it difficult, if not impossible, to study the relationship between base modifications and disease pathology at the genomic or transcriptomic scale. Nanopore sequencing enables large-scale detection of base modifications through analysis of Nanopore electrical signals. We are developing deep-learning methods to decipher modifications from these signals.

Long-read data analysis

Long-read sequencing is developing quickly, providing longer reads at decreasing cost. Longer reads benefit the detection of structural variants in genomes and of full-length isoforms in transcriptomes. We are thus interested in developing computational tools to (i) detect different types of variants in genomic DNA and (ii) detect variants in isoforms (gene fusions and alternative splicing).