Volume 7, Issue 1

Random Forest Classifier for Diagnosis of Breast Cancer in African Women

Babafemi Macaulay1, Soji Akande2, Olusola Olabanjo3, Bolu Akinnuwesi4, and Benjamin Aribisala5
1Lagos State University, Nigeria, 2Lagos State University Teaching Hospital, Nigeria, 3Lagos State University, Nigeria, 4Lagos State University, Nigeria, and 5Lagos State University, Nigeria


Introduction: Breast cancer is the leading cause of cancer-related mortality among women globally, and it is documented that breast cancer accounts for 15% of all cancers in women. Diagnosing and treating breast cancer at its earliest stage remains the only way to improve outcomes and reduce mortality, so early and accurate diagnosis is important. Early detection among women in Sub-Saharan Africa (SSA) is very challenging: low knowledge of breast cancer, lack of awareness of early detection and treatment, treatment cost, poor perception of breast cancer, and socio-cultural factors such as beliefs, traditions and fears all affect the health-seeking behaviour of African women. Despite this, there has been limited research on computational approaches to the diagnosis of breast cancer in SSA.

Aim: We propose a novel diagnosis model for African women using the Random Forest (RF) machine learning technique.

Methods: Study data comprised technical indicators for breast cancer diagnosis, collected from breast cancer patients attending the oncology clinic at Lagos State University Teaching Hospital. A total of 180 subjects were studied, of whom 90 were confirmed cases of breast cancer and 90 were benign. Nine diagnostic parameters were included: clump thickness, marginal adhesion, uniformity of cell size, uniformity of cell shape, single epithelial cell, bare nuclei, bland chromatin, normal nucleoli and mitosis. Principal Component Analysis (PCA) was used for feature selection and an RF model was used for classification.

Results: The RF model gave an accuracy of 98.23%, a sensitivity of 95.24%, a specificity of 100.00% and an area under the curve (AUC) of 98%.

Conclusion: The proposed Random Forest model shows good potential for classifying breast cancer in African women. Adoption of computational diagnosis approaches in SSA could lead to earlier diagnosis and a reduction in mortality.
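
For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity and specificity are conventionally computed from a binary classifier's predictions, with malignant coded as 1 and benign as 0. The labels, the `ConfusionCounts` class and the helper functions are hypothetical and chosen only for illustration; they are not the study's data or code, and the numbers they produce are unrelated to the results reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome tallies for a binary classifier (1 = malignant, 0 = benign)."""
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign
    fp: int  # benign cases predicted malignant


def count_outcomes(y_true, y_pred):
    """Tally true/false positives and negatives from paired labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    return ConfusionCounts(tp, fn, tn, fp)


def accuracy(c):
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


def sensitivity(c):
    """Fraction of malignant cases correctly identified (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of benign cases correctly identified (true negative rate)."""
    return c.tn / (c.tn + c.fp)


# Hypothetical labels for 40 held-out cases (NOT the study's data):
# 20 malignant, of which 2 are missed; 20 benign, of which 1 is a false alarm.
y_true = [1] * 20 + [0] * 20
y_pred = [1] * 18 + [0] * 2 + [0] * 19 + [1] * 1

counts = count_outcomes(y_true, y_pred)
print(f"accuracy:    {accuracy(counts):.4f}")
print(f"sensitivity: {sensitivity(counts):.4f}")
print(f"specificity: {specificity(counts):.4f}")
```

In a diagnostic setting, sensitivity reflects how many cancers are caught and specificity how many benign cases are spared a false alarm, which is why both are usually reported alongside overall accuracy.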

Keywords: Breast Cancer, Random Forest, Machine Learning, African Women, Diagnosis, Feature Extraction
