Leveraging Random Forest Machine Learning for Subgroup Classification of Medulloblastoma

Authors

  • Bo Lan, United World College of South East Asia, Singapore

Abstract

Medulloblastoma (MB), the most prevalent malignant brain tumour in children, requires precise diagnostic methods because of its heterogeneous molecular subgroups. This study uses Random Forest machine learning algorithms to classify medulloblastoma subgroups by analysing DNA methylation and gene expression data. Using the Gene Expression Omnibus dataset GSE85218, which comprises 763 primary MB samples, the study applies variance-threshold feature selection during preprocessing. Models were evaluated on precision, recall, F1 score, and accuracy, with the highest performance observed in models trained on the combined DNA methylation and gene expression features in the top 1% by variance. However, all models performed similarly, which suggests that only targeted gene expression and DNA methylation data are required for an accurate diagnosis. Gene Set Enrichment Analysis (GSEA) identified significant pathways related to neural processes, underscoring the tumour’s impact on neural development and function. Candidate biomarkers were drawn from the features the ML model ranked as most important, including possible new biomarkers for subgroup diagnosis.
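
The workflow summarised in the abstract (variance-based feature selection, a Random Forest classifier, and evaluation by precision, recall, F1 score, and accuracy) can be illustrated with a minimal scikit-learn sketch. This is not the study's code: the data are synthetic stand-ins for GSE85218, and the helper `top_variance_features`, the four placeholder classes, the 25% test split, macro averaging, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def top_variance_features(X, fraction=0.01):
    """Hypothetical helper: indices of the `fraction` most variable columns."""
    variances = X.var(axis=0)
    k = max(1, int(round(fraction * X.shape[1])))
    return np.argsort(variances)[::-1][:k]


# Synthetic stand-in for a samples-by-features matrix (assumption: the real
# input would be combined methylation and expression values per sample).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 200, 1000, 4
y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :10] += 3.0 * y[:, None]  # ten informative, high-variance features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Select features on the training split only, so the test split does not
# influence which features are kept.
selected = top_variance_features(X_train, fraction=0.01)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, selected], y_train)
y_pred = model.predict(X_test[:, selected])

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)

# Rank the selected features by Random Forest importance, most important
# first; such a ranking is one way to shortlist candidate biomarkers.
ranked = selected[np.argsort(model.feature_importances_)[::-1]]

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print("top features by importance:", ranked[:5])
```

In this sketch the top 1% of 1,000 features is 10 columns. With real methylation or expression matrices the same fraction would keep far more features, and the choice of averaging (macro here) affects how subgroup imbalance shows up in the reported scores.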

Published

2024-12-30

How to Cite

Bo Lan. (2024). Leveraging Random Forest Machine Learning for Subgroup Classification of Medulloblastoma. Journal of Research in Social Science and Humanities, 3(12), 41–52. Retrieved from https://www.pioneerpublisher.com/jrssh/article/view/1137

Section

Articles