High-throughput sequencing and related omics technologies are now routinely applied across a wide range of research topics in biology and medicine, generating massive datasets (genotyping, transcriptomics, proteomics, metabolomics, metagenomics). Such data are generally imbalanced in dimension (a large number of variables and a small number of samples) and multi-class, which calls for dedicated data-driven learning methods. During my scientific career, I have developed and applied several approaches inspired by machine learning, probabilistic models, and multi-objective optimisation to address different biological problems, including protein function annotation, metagenomics, and immunoinformatics, in organisms ranging from bacteria to humans. My work focuses in particular on feature selection, dimensionality reduction, clustering, and classification of high-dimensional and heterogeneous multi-omic data.