Machine learning (ML) tools are well suited to mining vast amounts of clinical and genetic information in order to identify genetic biomarkers of worse survival and potential new molecular targets. This study investigated the ability of one such tool to identify genetic biomarkers associated with a higher risk of mortality in breast cancer, biomarkers that may eventually become novel molecular targets for pharmacological intervention.
MATERIAL & METHODS
Whole-exome sequencing data and clinical information from 945 breast tumours were obtained from the Broad Institute of MIT and Harvard. Data were downloaded from http://gdac.broadinstitute.org. Sequencing data from each tumour were encoded using a proprietary encryption system that generated, for each sample, a vector of 10,240 positions and 2% sparsity containing all the mutations present in that sample. This vector collection was then input into an ML framework (MLF) to identify subgroups of patients based on their genetic similarities. The resultant subpopulations were compared for overall survival at 7 years using the Kaplan-Meier method and the log-rank test. Differences were considered significant if the p-value was < .05. Genetic markers contributing significantly to the differences between subpopulations were identified, and the affected biological pathways were assigned using the KEGG pathway database.
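For readers unfamiliar with the survival comparison named above, the following is a minimal sketch of a Kaplan-Meier estimate and a two-group log-rank test in plain Python. It is not the study's code: the function names, the input layout (parallel lists of follow-up times and event flags) and the toy values in the usage note are assumptions made purely for illustration.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (e.g. years)
    events -- 1 if the patient died at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p) on 1 degree of freedom."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            # Hypergeometric variance of the death count in group A at time t
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = obs_minus_exp ** 2 / var if var > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p
```

With hypothetical inputs such as `logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)`, the returned p-value would be compared against the .05 threshold stated above. In practice an established survival-analysis package would normally be preferred over a hand-written test.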
RESULTS
The MLF identified two different subpopulations: SP0 (n=358) and SP1 (n=587). Stratification analysis demonstrated no association between either subpopulation and clinical (age, race, ethnicity and stage), pathological (histotype and pTNM) or molecular-subtype variables. Patients in SP1 had a higher risk of death at 7 years compared to those in SP0 (hazard ratio (HR): 1.5; 95% confidence interval (CI): 1.01-2.25; p=.04). Patients from SP1 presented a higher tumour mutation burden (SP1: 5520 vs. SP0: 1491 mutations; p<.001) and a selective contribution from the following KEGG pathways: PI3K/Akt, calcium, oxytocin and Rap1 signalling, focal adhesion, regulation of actin cytoskeleton, axon guidance and protein digestion. Of those, only the PI3K/Akt (hsa04151; HR: 1.63; 95% CI: 1.07-2.48; p=.02), axon guidance (hsa04360; HR: 1.89; 95% CI: 1.16-3.08; p=.01) and regulation of actin cytoskeleton (hsa04810; HR: 2.17; 95% CI: 1.38-3.41; p=.0008) pathways were related to a higher risk of death at 7 years. When patients harbouring mutations in genes associated with these 3 pathways were excluded from the analysis, the unfavourable survival of SP1 patients was lost, corroborating the role of these 3 pathways in their worse prognosis. Genes mutated in the PI3K/Akt pathway were COL4A2, ERBB3, IGF1R, COL6A2, PPP2R5B, PDGFRA, NTRK2, FGFR2, IL7R, PDGFB and CSF1R. Genes mutated in the regulation of actin cytoskeleton pathway were APC, DOCK1, SSH3, PDGFRA, HRAS, INSRR, ITGB5, MYLK2 and ACTN1. Genes mutated in the axon guidance pathway were BMPR2, SMO, PLXNA4, L1CAM, SSH3, PLXNB2 and EFNA3.
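The exclusion step in the sensitivity analysis above can be sketched as a simple filter. This is an illustrative assumption about how such a filter might look, not the study's implementation: the gene lists are those reported above, while the function name, the patient identifiers and the `mutations_by_patient` layout are hypothetical.

```python
# Genes reported as mutated in each of the three prognostic KEGG pathways.
PATHWAY_GENES = {
    "hsa04151": {  # PI3K/Akt signalling
        "COL4A2", "ERBB3", "IGF1R", "COL6A2", "PPP2R5B", "PDGFRA",
        "NTRK2", "FGFR2", "IL7R", "PDGFB", "CSF1R",
    },
    "hsa04810": {  # regulation of actin cytoskeleton
        "APC", "DOCK1", "SSH3", "PDGFRA", "HRAS", "INSRR",
        "ITGB5", "MYLK2", "ACTN1",
    },
    "hsa04360": {  # axon guidance
        "BMPR2", "SMO", "PLXNA4", "L1CAM", "SSH3", "PLXNB2", "EFNA3",
    },
}


def exclude_pathway_carriers(mutations_by_patient, pathways=PATHWAY_GENES):
    """Keep only patients with no mutated gene in any of the given pathways.

    mutations_by_patient -- hypothetical mapping: patient id -> set of mutated genes
    """
    excluded_genes = set().union(*pathways.values())
    return {
        patient: genes
        for patient, genes in mutations_by_patient.items()
        if not (set(genes) & excluded_genes)
    }


# Hypothetical toy cohort, for illustration only.
cohort = {
    "P1": {"TP53"},
    "P2": {"PDGFRA", "TP53"},  # carries a PI3K/Akt and actin cytoskeleton gene
    "P3": {"SMO"},             # carries an axon guidance gene
    "P4": set(),
}
remaining = exclude_pathway_carriers(cohort)
```

The survival comparison between SP0 and SP1 would then be repeated on the remaining patients only.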
CONCLUSION AND FUTURE DIRECTIONS
Our MLF has identified several genes involved in PI3K/Akt signalling and in cell adhesion and migration as being critically related to impaired survival. This methodology should be validated in other genetic datasets. Several of these genes should be investigated as potential new drug discovery targets.