Two machine learning frameworks developed at Topazium, GFPrint™ and PredLung™, have been used to extract clinically relevant information from data on cancer patients, in particular lung cancer patients, that may serve to design better therapies, generate virtual biomarker signatures and predict patient outcomes. Three case studies are presented, focused on i) the discovery of potential novel targets and biomarkers for epidermoid non-small cell lung cancer; ii) the identification of super-responders to current lung cancer therapies; and iii) the creation of molecular signatures in cancer patients that can be exploited for personalized medicine.
The machine learning frameworks (MLFs) developed by Topazium, GFPrint™ and PredLung™, have proved efficient in exploiting biological information from cancer patients to obtain useful insights, such as discovering potential novel targets and biomarkers of poor prognosis, identifying super-responders to therapies based on hematological parameters, or defining molecular signatures that may enable personalized therapies. Both follow a similar strategy: first, functional features are extracted from high-dimensional data to create a synthetic representation of each patient; second, the extracted embedding is analyzed with different unsupervised models to capture nonlinear relationships in the data and ultimately reveal distinct clusters of patients with different clinical outcomes; finally, the lower-dimensional representations are mapped back to the original data to identify the critical features that influence the clinical outcome, such as the aggregated mutational burden or blood test parameters.
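The three-step strategy described above (embed, cluster, map back) can be illustrated with a minimal sketch. This is not the GFPrint™ or PredLung™ implementation, whose internals are not described here: it assumes a linear PCA embedding and a plain k-means clustering as simple stand-ins for the feature extraction and unsupervised models, and it runs on synthetic data. All function names, feature names and parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def embed(X, n_components=2):
    """Step 1 (stand-in): standardize features and project onto top principal components."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_components].T

def kmeans(Z, k=2, n_iter=100):
    """Step 2 (stand-in): Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen center
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rank_features(X, labels, names):
    """Step 3: map clusters back to the original features.

    Each feature is scored by the spread of its per-cluster means,
    scaled by its overall standard deviation.
    """
    means = np.array([X[labels == j].mean(axis=0) for j in np.unique(labels)])
    score = (means.max(axis=0) - means.min(axis=0)) / (X.std(axis=0) + 1e-9)
    order = np.argsort(score)[::-1]
    return [(names[i], float(score[i])) for i in order]

# Synthetic cohort (hypothetical): 60 patients, 6 features.
# The second half of the cohort is shifted on the first four features.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 6))
X[30:, :4] += 6.0
truth = np.repeat([0, 1], 30)
names = ["signal_0", "signal_1", "signal_2", "signal_3", "noise_0", "noise_1"]

Z = embed(X, n_components=2)             # patient embedding
labels = kmeans(Z, k=2)                  # patient clusters
ranking = rank_features(X, labels, names)  # features driving the separation

for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a real analysis the clusters would then be compared on clinical endpoints (for example, survival), and the top-ranked features would be treated as hypotheses to be confirmed experimentally rather than as established biomarkers.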
These tools have been tested and validated with public datasets that have important limitations. The data in the TCGA dataset come from different sources, and the associated clinical information may have been processed in many different ways; therefore, their interoperability is not fully guaranteed. Data from the SQUIRE trial are not affected by this drawback because, even though they were generated by different researchers, they are owned by a single sponsor and are thus expected to have been processed and curated under homogeneous specifications. However, it would have been desirable to expand the number of patients further, e.g. by including patients from other arms of the trial whose data were not available. Finally, the in vitro data obtained with tumor cells correspond to the inhibitory effect observed at a single compound concentration, a very limited observation: full concentration-response data would have been more reliable and more informative about the effect and pharmacology of the compounds. Nonetheless, despite these limitations, both MLFs have performed well in revealing important attributes that had so far gone unnoticed, and this has served to generate sound hypotheses whose real value must be confirmed with experimental work. Therefore, we believe that these machine learning frameworks are useful tools that will help foster research in lung cancer, ultimately aiming to enhance patients’ health and wellbeing.