Leveraging Random Forest Machine Learning for Subgroup Classification of Medulloblastoma

Bo Lan

Authors

Bo Lan, United World College of South East Asia, Singapore

Abstract

Medulloblastoma (MB), the most prevalent malignant brain tumour in children, requires precise diagnostic methods because of its heterogeneous molecular subgroups. This study uses Random Forest machine learning models to classify medulloblastoma subgroups from DNA methylation and gene expression data. Using the Gene Expression Omnibus dataset GSE85218, which comprises 763 primary MB samples, the study applies variance threshold feature selection as a preprocessing step. Models were evaluated on precision, recall, F1 score, and accuracy, with the highest performance observed in models trained on the top 1% most variable features of the combined DNA methylation and gene expression data. However, all models performed similarly, suggesting that only targeted gene expression and DNA methylation data are required for an accurate diagnosis. Gene Set Enrichment Analysis (GSEA) identified significant pathways related to neural processes, underscoring the tumour’s impact on neural development and function. Candidate biomarkers were drawn from the features the model ranked as most important, including possible new biomarkers for subgroup diagnosis.
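The variance-based preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the selection was implemented, so everything here is an assumption: the function names `top_variance_features` and `select_columns` are hypothetical, the data is synthetic rather than GSE85218, and keeping a fixed top fraction of features by variance is only one plausible reading of "top 1%". The Random Forest classifier itself is not shown; in practice it would be trained on the reduced matrix with a library such as scikit-learn.

```python
import random
from statistics import pvariance


def top_variance_features(rows, fraction=0.01):
    """Return indices of the most variable columns, highest variance first.

    rows: list of samples, each a list of numeric feature values.
    fraction: share of features to keep (0.01 = top 1%); at least one is kept.
    """
    n_features = len(rows[0])
    # Population variance of each column (feature) across all samples.
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    n_keep = max(1, round(n_features * fraction))
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return ranked[:n_keep]


def select_columns(rows, indices):
    """Keep only the chosen feature columns, in the given order."""
    return [[row[j] for j in indices] for row in rows]


# Synthetic stand-in data (NOT the study's data): 50 samples x 200 features.
# Two columns are given a much larger spread so they dominate the ranking.
rng = random.Random(0)
n_samples, n_features = 50, 200
high_variance_columns = {7, 42}
data = [
    [rng.gauss(0, 10.0 if j in high_variance_columns else 1.0)
     for j in range(n_features)]
    for _ in range(n_samples)
]

kept = top_variance_features(data, fraction=0.01)  # 1% of 200 -> 2 features
reduced = select_columns(data, kept)
```

Here `reduced` holds the same 50 samples restricted to the two most variable columns. The same pattern would apply to a concatenated methylation and expression matrix, where the reduced feature set is what a classifier would then be trained and evaluated on.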
