Novel XGBoost Tuned Machine Learning Model for Software Bug Prediction

Project Code: TCMAPY205

Objective

In this project, we verify the effectiveness of the XGBoost algorithm in detecting bugs in software and compare it with traditional machine learning algorithms such as logistic regression, decision trees, random forest, and AdaBoost.
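
As a rough illustration of the baseline comparison, the sketch below scores the four traditional classifiers with 5-fold cross-validation using scikit-learn. The data is a synthetic, imbalanced stand-in generated with `make_classification`, not one of the NASA datasets, and the F1 metric, fold count, and model settings are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a defect dataset: about 20% "defective" modules.
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.8, 0.2], random_state=0
)

# Baseline classifiers named in the objective (settings are illustrative).
baselines = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Mean F1 over 5 folds for each baseline.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in baselines.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean F1 = {score:.3f}")
```

A tuned boosting model would be added to the same dictionary so that every model is scored on identical folds.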

Abstract

Software bug prediction has become a vital activity during software development and maintenance. Defect prediction in the early stages of the software development life cycle is a crucial part of the quality assurance process and has been studied broadly over the last two decades.

Early prediction of defective modules in software under development helps the development team use the available resources efficiently and deliver a high-quality software product in limited time. Machine learning is an effective way to identify defective modules; it works by extracting hidden patterns among software attributes.

In this project, several machine learning classification techniques are used to predict software defects in the NASA datasets JM1, CM1, KC2 and PC3. A new model is proposed by tuning the existing XGBoost model through its parameters n_estimators, learning_rate, max_depth, and subsample. The results were compared with state-of-the-art models, and our model outperformed them on all four datasets.
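
The tuning step can be sketched as a small grid search over the four parameters. This example uses scikit-learn's `GradientBoostingClassifier` as a stand-in because it accepts the same four parameter names; with the `xgboost` package installed, `xgboost.XGBClassifier` could be substituted as the estimator. The grid values, the F1 scoring, and the synthetic data are all assumptions for illustration, not the project's actual settings or datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for a defect dataset (not JM1/CM1/KC2/PC3).
X, y = make_classification(
    n_samples=400, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grid over the four tuned parameters.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "subsample": [0.8, 1.0],
}

# Stand-in estimator; XGBClassifier takes the same four parameter names.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_f1 = search.score(X_test, y_test)  # F1 of the refit best model

print("best parameters:", best_params)
print(f"held-out F1: {test_f1:.3f}")
```

Keeping a held-out test split separate from the cross-validated search avoids reporting a score on data that influenced the parameter choice.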

Keywords: Machine Learning, Dataset, Supervised Learning, Random Forest, XGBoost, AdaBoost, Decision Tree.

NOTE: Please do not submit this to your college without our team's consent. The abstract varies based on student requirements.

Block Diagram

Specifications

HARDWARE SPECIFICATIONS:

Processor: Intel i3
RAM: 4 GB (min)
Hard Disk: 128 GB
Keyboard: Standard Windows Keyboard
Mouse: Two or Three Button Mouse
Monitor: Any

SOFTWARE SPECIFICATIONS:

Operating System: Windows 7+
Server-side Script: Python 3.6+
IDE: PyCharm
Libraries Used: Pandas, NumPy, sklearn, Flask, NLTK, TensorFlow.
Datasets: JM1, CM1, KC2 and PC3.

Learning Outcomes

Uses of Supervised Learning.
Importance of classification.
Scope of XGBoost.
Use of Boosting techniques.
Importance of Jupyter Notebook.
How ensemble models work.
How boosting and bagging benefit simple ensemble techniques.
How gradient boosting enhances a model's performance.
Process of debugging code.
The problem with imbalanced datasets.
Benefits of SMOTE technique.
Input and output modules.
How to test the project based on user inputs and observe the output.
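
The SMOTE outcome listed above can be illustrated with a minimal pure-Python sketch of its core idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the toy points, and the choice of `k` are hypothetical; a real project would more likely use a maintained implementation such as the one in the imbalanced-learn package.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features per sample (illustrative values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)

for point in new_points:
    print(point)
```

Because every synthetic point lies between two existing minority samples, oversampling should be applied to the training split only, so that no interpolated copies of test samples leak into training.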

Project Development Skills:

Problem analyzing skills.
Problem solving skills.
Creativity and imagination skills.
Programming skills.
Deployment.
Testing skills.
Debugging skills.
Project presentation skills.
Thesis writing skills.
