Lung Cancer Prediction using Machine Learning

Project Code: TCMAPY1516

Objective

The objective of this project is to develop an effective image classification system for lung cancer prediction by combining deep learning with traditional machine learning techniques. The system extracts relevant features from images using a pre-trained MobileNet model and classifies them with several ensemble strategies: SVM + Random Forest, Random Forest + Logistic Regression, SVM + Logistic Regression, and a combined SVM + Random Forest + Logistic Regression ensemble. Using soft voting to aggregate predictions and StratifiedKFold for cross-validation, the project evaluates and compares these classifier combinations in terms of accuracy, precision, recall, F1-score, and confusion matrices.
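The comparison described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is available; it is not the project's own code. Randomly generated features from `make_classification` stand in for the MobileNet feature vectors, and the helper names (`make_base_models`, `evaluate_combinations`), the hyperparameters, and the use of macro-averaged precision, recall, and F1 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The four ensembles compared in the project, keyed by member short names.
COMBINATIONS = {
    "SVM + RF": ["svm", "rf"],
    "RF + LR": ["rf", "lr"],
    "SVM + LR": ["svm", "lr"],
    "SVM + RF + LR": ["svm", "rf", "lr"],
}


def make_base_models(seed=0):
    # Hypothetical hyperparameters; SVC needs probability=True for soft voting.
    return {
        "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        "rf": RandomForestClassifier(n_estimators=100, random_state=seed),
        "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }


def evaluate_combinations(X, y, n_splits=5, seed=0):
    """Cross-validate each soft-voting ensemble with StratifiedKFold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    labels = np.unique(y)
    base = make_base_models(seed)
    results = {}
    for name, members in COMBINATIONS.items():
        # VotingClassifier clones its estimators on fit, so sharing `base` is safe.
        ensemble = VotingClassifier(
            estimators=[(m, base[m]) for m in members], voting="soft"
        )
        scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
        cm = np.zeros((len(labels), len(labels)), dtype=int)
        for train_idx, test_idx in skf.split(X, y):
            ensemble.fit(X[train_idx], y[train_idx])
            y_true, y_pred = y[test_idx], ensemble.predict(X[test_idx])
            scores["accuracy"].append(accuracy_score(y_true, y_pred))
            # Macro averaging weights every class equally (also works for >2 classes).
            scores["precision"].append(
                precision_score(y_true, y_pred, average="macro", zero_division=0))
            scores["recall"].append(
                recall_score(y_true, y_pred, average="macro", zero_division=0))
            scores["f1"].append(f1_score(y_true, y_pred, average="macro", zero_division=0))
            # Summing per-fold matrices gives one matrix covering every sample once.
            cm += confusion_matrix(y_true, y_pred, labels=labels)
        summary = {k: float(np.mean(v)) for k, v in scores.items()}
        summary["confusion_matrix"] = cm
        results[name] = summary
    return results


if __name__ == "__main__":
    # Synthetic stand-in for MobileNet features: 300 samples, 64 features, 2 classes.
    X, y = make_classification(n_samples=300, n_features=64, n_informative=10,
                               random_state=0)
    for name, metrics in evaluate_combinations(X, y).items():
        print(name, {k: round(v, 3) for k, v in metrics.items() if k != "confusion_matrix"})
        print(metrics["confusion_matrix"])
```

SVM and Logistic Regression are wrapped with `StandardScaler` because both are sensitive to feature scale, while Random Forest is not. To use real image features, replace `X` with the pooled MobileNet vectors (MobileNet with global average pooling produces 1024 values per image) and `y` with the class labels.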

Abstract

This research proposes a hybrid strategy that combines deep learning and conventional machine learning methods for image classification. First, images are preprocessed with Keras's ImageDataGenerator, which resizes them to 224x224 pixels and rescales pixel values to the range [0, 1]. A pre-trained MobileNet model without its top (classification) layer is used as a feature extractor, and a Global Average Pooling layer converts its output feature maps into a flat feature vector. These vectors are then classified with several classifier combinations: SVM + Random Forest, Random Forest + Logistic Regression, SVM + Logistic Regression, and a full ensemble of SVM + Random Forest + Logistic Regression. Each ensemble aggregates predictions by soft voting, averaging the class probabilities produced by its member classifiers and selecting the class with the highest average. StratifiedKFold cross-validation preserves the class proportions in every fold, which keeps the evaluation reliable even when the classes are imbalanced. Model performance is assessed on the test and validation sets using accuracy, precision, recall, F1-score, and the confusion matrix, so that the classifier combinations can be compared directly.

Keywords: Image classification, MobileNet, feature extraction, ensemble learning, Support Vector Machine, Random Forest, Logistic Regression, StratifiedKFold, cross-validation, soft voting.
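The soft-voting rule described in the abstract can be checked by hand. The sketch below assumes NumPy and uses made-up `predict_proba` outputs for two images and two classes; the probabilities, the class indices, and the `soft_vote` helper are hypothetical and only illustrate the averaging step.

```python
import numpy as np


def soft_vote(probas):
    """Average per-class probabilities from several classifiers, then take the argmax."""
    stacked = np.stack(probas)          # shape: (n_classifiers, n_samples, n_classes)
    mean_proba = stacked.mean(axis=0)   # shape: (n_samples, n_classes)
    return mean_proba, mean_proba.argmax(axis=1)


# Hypothetical predict_proba outputs for 2 images and 2 classes (class 0, class 1).
svm_p = np.array([[0.60, 0.40], [0.30, 0.70]])
rf_p = np.array([[0.20, 0.80], [0.45, 0.55]])
lr_p = np.array([[0.55, 0.45], [0.40, 0.60]])

mean_proba, labels = soft_vote([svm_p, rf_p, lr_p])
print(mean_proba)  # averaged probabilities: first row is [0.45, 0.55]
print(labels)      # [1 1]
```

For the first image, SVM and Logistic Regression each lean slightly toward class 0, while Random Forest is confident in class 1. The averaged probabilities are [0.45, 0.55], so soft voting predicts class 1, whereas a hard (majority) vote would have predicted class 0. Soft voting therefore lets one confident classifier outweigh two uncertain ones.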

NOTE: Please do not submit this abstract to your college without consulting our team. The abstract can be tailored to each student's requirements.


Specifications

SOFTWARE REQUIREMENTS

Operating System          :  Windows 7/8/10

Front End                 :  HTML, CSS, Bootstrap & JavaScript

Programming Language      :  Python

Libraries                 :  Flask, PyTorch (torch), TensorFlow, scikit-learn, Pandas, mysql.connector

IDE/Workbench             :  VS Code

Server Deployment         :  XAMPP Server

Database                  :  MySQL

HARDWARE REQUIREMENTS

Processor                 :  Intel Core i3 or equivalent

RAM                       :  8 GB (minimum)

Hard Disk                 :  128 GB

Keyboard                  :  Standard Windows keyboard

Mouse                     :  Two- or three-button mouse

Monitor                   :  Any
