Understanding the rules of life

Bioscience for an integrated understanding of health

Category: Standard Studentships

Development of a method for the inference of protein function and application to Mycobacterium tuberculosis

Project No. 2263

Primary Supervisor

Prof Mark Wass – University of Kent

Co-Supervisor(s)

Prof Martin Michaelis – University of Kent

Dr Simon Waddell – University of Southampton

Summary

With the genomes of many species now sequenced, databases hold millions of genes and their protein products. For example, UniProt, the international database of proteins, contains more than 200 million proteins and this number continues to increase rapidly. However, we do not know the function of the vast majority of proteins: fewer than 1% of the proteins in UniProt have an experimentally verified function. This gap in knowledge has driven the development of computational methods for function prediction, and such methods will form the initial focus of this project.

We have recently developed a new approach that combines many different bioinformatics methods to infer protein functions, and we have applied it successfully to the minimal bacterial genome. That method relied on manual combination of data to make predictions, so in this project we will fully automate the approach using the latest machine learning and deep learning methods. This will enable us to apply it on a large scale to the many millions of functionally uncharacterised proteins.

We will apply the resulting method towards the discovery of new drugs for the major human bacterial pathogen Mycobacterium tuberculosis (M.tb). Around 10 million people fall ill with tuberculosis (TB) every year and more than a million die; despite gains in TB control in recent years, this toll is set to increase due to the impact of COVID-19 on health systems worldwide. Treatment for tuberculosis requires four drugs for six months, and novel drug discovery efforts to reduce the length of therapy and counter drug resistance are hampered by a lack of druggable pathways to target. The M.tb genome contains ~4,200 genes, a third of which have no predicted function. We will combine our functional prediction tools with RNA-seq datasets derived from M.tb in the murine lung to reveal new insight into host-pathogen interactions in vivo and to discover new pathways for drug discovery.