This project aims to develop a machine learning framework that predicts associations between genes and diseases using models such as Random Forest and XGBoost. It categorizes diseases into 'Disease', 'Group', and 'Phenotype', and uses data preprocessing to improve prediction accuracy. A Flask web application lets users submit data and explore gene-disease associations.
The project "Analysis for Disease Gene Association Using Machine Learning" aims to analyze and predict associations between genes and diseases using machine learning techniques. With the growing availability of genetic data, understanding how specific genes relate to diseases has become vital in biomedical research. This study uses a dataset that includes gene-specific information such as the Disease Specificity Index (DSI) and Disease Pleiotropy Index (DPI), alongside disease features such as semantic type and classification.
The project implements four machine learning algorithms: Random Forest, XGBoost, LightGBM, and K-Nearest Neighbors (KNN). Each predicts one of three output classes: Disease, Group, or Phenotype. The Random Forest model achieved the highest accuracy (97.81%) and was deployed with Flask for real-time predictions. Preprocessing included filling missing values, label encoding, and KMeans clustering, which grouped diseases into broader categories based on their similarities and further supported prediction.
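The preprocessing and training steps described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`DSI`, `DPI`, `semantic_type`, `disease_type`) are hypothetical, and the choice of features fed to KMeans and of three clusters is an assumption. Only the Random Forest model is shown.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the gene-disease dataset (column names are hypothetical).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "DSI": rng.uniform(0.2, 1.0, n),
    "DPI": rng.uniform(0.0, 1.0, n),
    "semantic_type": rng.choice(
        ["Disease or Syndrome", "Neoplastic Process", "Finding"], n
    ),
    "disease_type": rng.choice(["disease", "group", "phenotype"], n),
})
# Knock out a few values so there is something to fill.
df.loc[rng.choice(n, 20, replace=False), "DSI"] = np.nan

# 1. Fill missing values (median imputation is an assumption).
df["DSI"] = df["DSI"].fillna(df["DSI"].median())

# 2. Label-encode the categorical feature and the target.
semantic_encoder = LabelEncoder()
df["semantic_type_enc"] = semantic_encoder.fit_transform(df["semantic_type"])
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(df["disease_type"])

# 3. Cluster records with KMeans and use the cluster id as an extra feature.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(df[["DSI", "DPI"]])

# 4. Train a Random Forest and score it on a held-out split.
X = df[["DSI", "DPI", "semantic_type_enc", "cluster"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Labels here are random, so this score says nothing about the real dataset.
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Swapping in XGBoost, LightGBM, or KNN would only change step 4, since all four expose the same `fit`/`predict` interface through their scikit-learn-style classifiers.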
The project demonstrates the potential of machine learning in advancing genomic research by providing insights into gene-disease associations. It offers a practical tool for researchers to explore genetic links to diseases efficiently.
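A prediction endpoint for such a tool might look roughly like the sketch below. The route name `/predict`, the JSON field names, and the tiny placeholder model are all assumptions made for illustration; the project's real application would load its trained Random Forest model and use its own input form.

```python
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["DSI", "DPI"]                   # hypothetical input fields
CLASSES = ["Disease", "Group", "Phenotype"]  # output classes named in the abstract

# Placeholder model fitted on a few made-up rows so the sketch is runnable.
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.9], [0.8, 0.2], [0.5, 0.4], [0.1, 0.8]],
    [0, 1, 2, 0, 1, 2],
)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return the predicted class for one JSON record of feature values."""
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        row = [[float(payload[name]) for name in FEATURES]]
    except (TypeError, ValueError):
        return jsonify({"error": "feature values must be numeric"}), 400
    label = CLASSES[int(model.predict(row)[0])]
    return jsonify({"prediction": label})


# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"DSI": 0.9, "DPI": 0.1})
print(response.status_code, response.get_json())
```

Validating the request before calling the model keeps malformed input from surfacing as a server error; during development the app could be served with `app.run()`.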
Keywords: gene-disease association, machine learning, clustering, Flask, Random Forest, disease classification, phenotype prediction, gene prediction, bioinformatics, data preprocessing

H/W CONFIGURATION:
Processor - Intel Core i3
Hard Disk - 160GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
RAM - 8GB
S/W CONFIGURATION:
• Operating System : Windows 7/8/10
• Front End : HTML, CSS, Bootstrap & JS
• Programming Language : Python
• Libraries : Flask, Pandas, MySQL Connector, TensorFlow, Keras
• IDE/Workbench : VS Code
• Technology : Python 3.8+
• Server Deployment : XAMPP Server