Predicting response in cancer patients using machine learning models

Sanjay Rathee received his PhD in Computer Engineering from the Indian Institute of Technology Mandi. He is a postdoctoral researcher in the Bioinformatics Hub (led by Dr. Francesca Buffa), a collaborative bioinformatics research service recently launched by the Institute for Radiation Oncology. The Hub’s mission is to provide collaborative support for bioinformatics projects and ad-hoc training for biologists and clinical fellows. Its expertise ranges from experimental design to high-throughput Next Generation Sequencing (NGS) analysis, single-cell RNA-seq analysis, and microarray analysis, as well as other common bioinformatics analysis techniques.

Clinical decision making for colorectal cancer (CRC) patients is based on a relatively small number of clinical and pathological hallmarks. Developments in the cost-effectiveness and robustness of genome-wide molecular phenotyping greatly expand the number of features that can feed into decision making. Rathee and fellow researchers aim to develop and validate multi-gene models, based on gene expression, which can predict treatment response for CRC patients undergoing chemotherapy, radiotherapy, and oxaliplatin treatment using data from the S:CORT consortium. If successful, this would allow treatment to be individually tailored, so that patients receive the most effective treatment in the first instance. By looking at concordant predictions between different methods, Sanjay addresses the challenge of discovering decision-making genes with minimum noise, ensuring that models are general and reliable. Several steps are used to generate predictions:

  1. Probes carrying low signals in most of the samples are filtered out.
  2. Differentially expressed genes are discovered using Limma (PMID:25605792), an empirical Bayes approach, and SAMR (PMID:11309499), a permutation-based approach.
  3. Models of increasing complexity are used to discover decision-making genes. Specifically, instead of committing to a single machine learning method for feature selection, Rathee’s approach iterates over several: general linear and logistic regression, elastic net, lasso, random forest and support vector machines. Each model is then checked for accuracy, specificity and sensitivity.
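
The steps above can be illustrated with a minimal Python sketch. It is not the S:CORT implementation: the function names, the signal threshold, the "selected by at least N methods" voting rule and all toy values are assumptions made for illustration, and the Limma and SAMR step is not reproduced here because both are R packages. The sketch shows low-signal probe filtering (step 1), one possible way of keeping only genes on which several selection methods agree, and the accuracy, sensitivity and specificity checks mentioned in step 3.

```python
from collections import Counter


def filter_low_signal_probes(expression, threshold, min_fraction=0.5):
    """Keep probes whose signal exceeds `threshold` in at least
    `min_fraction` of the samples (both cut-offs are assumptions)."""
    kept = {}
    for probe, values in expression.items():
        n_expressed = sum(1 for v in values if v > threshold)
        if n_expressed >= min_fraction * len(values):
            kept[probe] = values
    return kept


def concordant_genes(selections, min_methods):
    """Return genes chosen by at least `min_methods` selection methods.

    `selections` maps a method name to the genes that method selected.
    """
    votes = Counter(g for genes in selections.values() for g in set(genes))
    return sorted(g for g, n in votes.items() if n >= min_methods)


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels
    (1 = responder, 0 = non-responder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical toy data: probe IDs, gene names and values are made up.
expression = {
    "probe_1": [8.1, 7.9, 8.4, 7.7],
    "probe_2": [2.0, 2.2, 6.5, 1.9],  # low in most samples, so dropped
    "probe_3": [6.8, 7.2, 3.1, 6.9],
}
selections = {
    "lasso": ["GENE_A", "GENE_B"],
    "elastic_net": ["GENE_A", "GENE_B", "GENE_C"],
    "random_forest": ["GENE_A", "GENE_C", "GENE_D"],
}

print(sorted(filter_low_signal_probes(expression, threshold=5.0)))
print(concordant_genes(selections, min_methods=2))
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                             [1, 1, 0, 0, 0, 1, 0, 1]))
```

Reporting sensitivity and specificity alongside accuracy matters when responders and non-responders are unevenly represented, since accuracy alone can look high for a model that rarely predicts the minority class.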

In the first analysis of S:CORT samples (131 rectal cancer samples) from patients treated with radiotherapy and capecitabine, the model gives an accuracy of 89.68%. Whilst initial accuracy is quite high on retrospective cohorts, the results need to be validated in external, independent cohorts. If validated, the model could predict response for a new patient based on a pre-treatment biopsy. Finding out whether a treatment is likely to be effective could greatly reduce the number of unnecessary interventions and associated side effects (and cost) whilst also increasing the chance that the chosen intervention is an effective treatment strategy.

Across several projects, Sanjay is collaborating with a number of researchers, including Dimitris Voukantsis and Naveen Prasad within the Oxford Bioinformatics Hub, and Prof. Tim Maughan (Department of Oncology) and the S:CORT team. Sanjay’s work is funded by CRUK and the MRC.
