Machine Learning for Small Molecules

Small molecules play a pivotal role in many life-science applications, including biomedicine and drug discovery, environmental sciences, and biotechnology. At the same time, the emergence of large open datasets is fuelling the development of new machine learning technologies. I will discuss the generic machine learning task of predicting the compatibility score F(x,y) of a pair of objects x and y or, more generally, of a set of multiple interacting objects. Machine learning tasks such as structured output prediction, link prediction in networks, and multivariate association analysis in paired datasets fall under this umbrella.
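
As a purely illustrative sketch of the compatibility-score view, one might score a pair with a bilinear form F(x,y) = x^T W y and predict by picking the highest-scoring candidate y for a given x. The bilinear form, the function names compatibility and predict, and the toy numbers are assumptions made for this example; they are not taken from the cited works, which use richer models.

```python
from typing import Sequence

Vector = Sequence[float]


def compatibility(x: Vector, y: Vector, W: Sequence[Vector]) -> float:
    """Score a pair with F(x, y) = x^T W y (an assumed, illustrative form)."""
    return sum(
        x[i] * W[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def predict(x: Vector, candidates: Sequence[Vector], W: Sequence[Vector]) -> int:
    """Return the index of the candidate y that maximizes F(x, y)."""
    scores = [compatibility(x, y, W) for y in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy data (hypothetical): x could stand for a measured spectrum and each
# candidate y for a feature vector of a candidate molecule.
W = [[1.0, 0.0],
     [0.0, 2.0]]
x = [1.0, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

best = predict(x, candidates, W)
print(best, compatibility(x, candidates[best], W))
```

In this reading, structured output prediction corresponds to taking the argmax over a set of candidate outputs, while link prediction corresponds to thresholding or ranking F(x,y) over pairs of nodes.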

 

I will show how this generic task manifests in applications with small molecules, including small molecule identification from mass spectrometric data (Bach et al. 2022, Brogat-Motte et al. 2022), biomarker discovery (Huusari et al. 2021, Uurtio et al. 2019) and drug combination prediction (Wang et al. 2021).

 

References:

 

Bach, E., Schymanski, E.L. and Rousu, J., 2022. Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nature Machine Intelligence, 4(12), pp.1224-1237.

 

Brogat-Motte, L., Flamary, R., Brouard, C., Rousu, J. and d’Alché-Buc, F., 2022, June. Learning to predict graphs with fused Gromov-Wasserstein barycenters. In International Conference on Machine Learning (pp. 2321-2335). PMLR.

 

Huusari, R., Bhadra, S., Capponi, C., Kadri, H. and Rousu, J., 2021. Learning primal-dual sparse kernel machines. arXiv e-prints, pp.arXiv-2108.

 

Uurtio, V., Bhadra, S. and Rousu, J., 2019, May. Large-scale sparse kernel canonical correlation analysis. In International Conference on Machine Learning (pp. 6383-6391). PMLR.

 

Wang, T., Szedmak, S., Wang, H., Aittokallio, T., Pahikkala, T., Cichonska, A. and Rousu, J., 2021. Modeling drug combination effects via latent tensor reconstruction. Bioinformatics, 37(Supplement_1), pp.i93-i101.