Optimized Breast Cancer Classification Using PCA-LASSO Feature Selection and Ensemble Learning Strategies With Optuna Optimization

Project Code: TCPGPY1976

Objective

This project proposes an optimized breast cancer classification system combining PCA and LASSO for feature selection, with classifiers like Random Forest, SVM, XGBoost, ANN, and Decision Tree. Models are fine-tuned using GridSearchCV, RandomizedSearchCV, and Optuna. The best-performing model is deployed via a Flask-based web interface for real-time diagnosis by uploading test inputs. This system enables accurate, fast, and accessible breast cancer detection, supporting clinical decision-making through intelligent pattern recognition and user-friendly automation.
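The feature-selection and classification stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's bundled Wisconsin diagnostic dataset as a stand-in for the project's data, applies LASSO selection before PCA (the source does not state the order), and picks `alpha=0.01`, a 95% variance threshold, and a Random Forest classifier purely for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Wisconsin diagnostic dataset stands in for the
# project's own dataset, which is not included in this description.
X, y = load_breast_cancer(return_X_y=True)

pipeline = Pipeline([
    # Standardize so the L1 penalty and PCA treat all features on one scale.
    ("scale", StandardScaler()),
    # LASSO selection: keep features whose L1-penalized coefficient is non-zero.
    # alpha=0.01 is an illustrative value, not a tuned one.
    ("lasso_select", SelectFromModel(Lasso(alpha=0.01))),
    # PCA: keep enough components to explain 95% of the remaining variance.
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    # Any of the listed classifiers could go here; Random Forest is one choice.
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# 3-fold cross-validation, matching the fold count given in the abstract.
scores = cross_val_score(pipeline, X, y, cv=3, scoring="accuracy")
print(f"3-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Wrapping the steps in a single `Pipeline` refits the scaler, the LASSO selector, and PCA on each training fold, so no information from the held-out fold leaks into feature selection.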

Abstract

This project presents an optimized breast cancer classification system that integrates dimensionality reduction, feature selection, and ensemble learning models. The existing framework uses Principal Component Analysis (PCA) and LASSO for feature refinement, followed by traditional classifiers (Random Forest, SVM, Gradient Boosting, and Logistic Regression), all tuned with GridSearchCV, RandomizedSearchCV, and Optuna under 3-fold cross-validation. To improve performance, the proposed system adds Artificial Neural Network (ANN), XGBoost, and Decision Tree classifiers. The model with the highest accuracy is selected for deployment. A web interface built with Flask, HTML, CSS, and JavaScript lets users upload test inputs and receive a real-time prediction of breast cancer status (malignant or benign). By combining machine learning algorithms with web technologies, the system aims to provide accurate, fast, and accessible diagnosis support, aiding clinical decision-making and early detection through automation and intelligent pattern recognition.

Keywords:
Breast Cancer, PCA, LASSO, Ensemble Learning, XGBoost, ANN, Decision Tree, Optuna, Classification, Flask Web App, Feature Selection, GridSearchCV, Kaggle Dataset, Malignant, Benign

NOTE: Please do not submit this to the college without the consent of our team. This abstract varies based on student requirements.

Block Diagram

Specifications

SOFTWARE REQUIREMENTS

Operating System: Windows 7/8/10

Front-end Technologies: HTML, CSS, Bootstrap & JS

Programming Language: Python

Libraries: Flask, Pandas, Scikit-learn, NumPy, Seaborn, Matplotlib

IDE/Workbench: VS Code

Technology: Python 3.8+

Server Deployment: XAMPP Server

Database: MySQL

 

HARDWARE REQUIREMENTS

Processor - Intel Core i5 (or comparable Intel processor)

RAM - 8 GB minimum

Hard Disk - 128 GB or more

Keyboard - Standard Windows keyboard

Mouse - Two- or three-button mouse

Monitor - Any

Demo Video
