This project develops a machine learning system that detects malware in PDF files using several classification algorithms. Its goals are high detection accuracy, interpretable predictions, and real-time threat mitigation.
PDF files are widely used for sharing documents, and that popularity also makes them an attractive target for malware attacks. This project, titled "PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis," aims to develop and evaluate machine learning models that detect malware in PDF files. Using a Kaggle dataset of labeled malicious and benign PDFs, the project will apply several algorithms: Random Forest, C5.0, J48, Support Vector Machine (SVM), AdaBoost, Deep Neural Network (DNN), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). The primary focus is high detection accuracy, combined with explainability that shows how each model reaches its decisions. In doing so, the project aims to strengthen cybersecurity defenses by identifying and mitigating threats embedded in PDF documents.
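The abstract lists KNN among the candidate models and stresses explainability. The sketch below shows, without any third-party libraries, how those two ideas can fit together: a K-Nearest Neighbors classifier plus permutation importance, a model-agnostic explainability technique that measures how much accuracy drops when one feature's values are shuffled. The feature names, the tiny synthetic dataset, and the choice of permutation importance are all assumptions for illustration; none of them come from the project's Kaggle dataset or its actual pipeline.

```python
import math
import random
from collections import Counter

# Hypothetical count features, for illustration only; the real Kaggle
# dataset's columns may differ.
FEATURES = ["js_count", "openaction_count", "embedded_files", "page_count"]

# Tiny synthetic dataset (1 = malicious, 0 = benign); not real data.
TRAIN_X = [
    [0, 0, 0, 5], [0, 0, 0, 12], [0, 0, 1, 3], [0, 0, 0, 8],
    [3, 1, 1, 1], [5, 1, 0, 1], [2, 1, 1, 2], [4, 1, 2, 1],
]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]
TEST_X = [[0, 0, 0, 6], [4, 1, 1, 1], [0, 0, 0, 10], [3, 1, 0, 2]]
TEST_Y = [0, 1, 0, 1]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples the KNN classifier labels correctly."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_y)


def permutation_importance(train_x, train_y, test_x, test_y, k=3, seed=0):
    """Return baseline accuracy and, per feature, the accuracy drop
    observed after shuffling that feature's column in the test set."""
    rng = random.Random(seed)
    baseline = accuracy(train_x, train_y, test_x, test_y, k)
    drops = []
    for j in range(len(test_x[0])):
        column = [row[j] for row in test_x]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(test_x, column)]
        drops.append(baseline - accuracy(train_x, train_y, shuffled, test_y, k))
    return baseline, drops


BASELINE, IMPORTANCES = permutation_importance(TRAIN_X, TRAIN_Y, TEST_X, TEST_Y)
print(f"baseline accuracy: {BASELINE:.2f}")
for name, score in zip(FEATURES, IMPORTANCES):
    print(f"{name}: accuracy drop {score:+.2f}")
```

A larger accuracy drop suggests the model leans more heavily on that feature. In a fuller pipeline, scikit-learn's `KNeighborsClassifier` and `sklearn.inspection.permutation_importance` cover the same ground, and features would usually be scaled before a distance-based model such as KNN is trained.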
Keywords: PDF malware detection, machine learning, Random Forest, SVM, DNN, explainability, cybersecurity, malicious PDF, classification algorithms, Kaggle dataset.
NOTE: Please do not submit this abstract to your college without consulting our team. The abstract can be adapted to individual student requirements.

Hardware Requirements:
Operating system : Windows 7 or later
RAM : 8 GB
Hard disk or SSD : More than 500 GB
Processor : Intel Core (3rd generation or newer) or AMD Ryzen
Software Requirements:
Software : Python 3.10 or higher
IDE : Visual Studio Code
Framework : Flask