Across the Spectrum In-Depth Review AI-Based Models for Phishing Detection

Project Code: TCPGPY1827

Objective

This project evaluates and compares AI-based models for phishing detection on structured data, with a focus on the effectiveness of CatBoost and Neural Oblivious Decision Ensembles (NODE). The key goals are to benchmark Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), LightGBM, CatBoost, and NODE; to identify the limitations of traditional classifiers; and to propose CatBoost and NODE for their strengths in handling categorical features and complex patterns. The models will be tested on a phishing dataset using accuracy, precision, recall, and F1-score, offering insights for real-world cybersecurity applications.
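As a rough illustration of the four evaluation metrics named above, the following minimal sketch computes them from binary labels. The function name `binary_metrics`, the `Metrics` container, and the convention that label 1 means "phishing" are assumptions made for this example; they are not taken from the project itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Metrics:
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    `positive` is the label treated as the phishing class (assumed to be 1 here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)

    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; not results from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

In phishing detection, recall tracks how many phishing samples are caught and precision tracks how many alerts are genuine, which is why F1 is usually reported alongside plain accuracy.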

Abstract

Phishing attacks remain a significant threat to cybersecurity, exploiting user trust to steal sensitive data such as credentials, financial information, and personal identifiers. As these attacks grow in sophistication and frequency, traditional rule-based detection techniques prove inadequate. This study presents a comprehensive review and evaluation of AI-based models for phishing detection, focusing on six prominent machine learning and ensemble techniques: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), CatBoost, LightGBM, and Neural Oblivious Decision Ensembles (NODE). Each model is assessed based on accuracy, precision, recall, F1-score, and computational efficiency using a benchmark phishing dataset.

While all models demonstrate competent performance, the NODE and CatBoost algorithms exhibit superior capabilities in capturing the complex non-linear relationships inherent in phishing data. CatBoost, with its gradient boosting framework and native support for categorical features, achieves high classification performance with minimal preprocessing. NODE, a neural network architecture designed for tabular data, builds on differentiable oblivious decision trees with learned feature selection to outperform traditional deep learning models on structured datasets. Our comparative analysis reveals that NODE not only matches or exceeds tree-based ensemble models in accuracy but also provides enhanced generalization and interpretability, making it highly suitable for real-time phishing detection systems.
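To make the term "oblivious tree" concrete, here is a minimal sketch of a single hard (non-differentiable) oblivious decision tree. Every node at a given depth tests the same feature against the same threshold, so a sample's leaf is just the binary number formed by its test outcomes. NODE replaces these hard tests with differentiable ones, and CatBoost grows symmetric trees of this shape by default; the sketch shows only the shared tree structure, not either library's implementation. The class name, the feature layout (`url_length`, `num_dots`), and all thresholds and leaf values are hypothetical.

```python
from typing import List, Sequence, Tuple


class ObliviousTree:
    """A depth-d tree in which level i applies one shared (feature, threshold) test."""

    def __init__(self, splits: Sequence[Tuple[int, float]], leaf_values: Sequence[float]):
        # A depth-d oblivious tree has exactly 2**d leaves.
        if len(leaf_values) != 2 ** len(splits):
            raise ValueError("need 2**depth leaf values")
        self.splits: List[Tuple[int, float]] = list(splits)
        self.leaf_values: List[float] = list(leaf_values)

    def leaf_index(self, x: Sequence[float]) -> int:
        """Concatenate the per-level test outcomes into a binary leaf index."""
        index = 0
        for feature, threshold in self.splits:
            bit = 1 if x[feature] > threshold else 0
            index = (index << 1) | bit
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


if __name__ == "__main__":
    # Hypothetical features: x[0] = url_length, x[1] = num_dots.
    tree = ObliviousTree(
        splits=[(0, 54.0), (1, 3.0)],          # level 0 and level 1 tests
        leaf_values=[0.05, 0.40, 0.55, 0.95],  # illustrative phishing scores
    )
    print(tree.predict([80.0, 5.0]))  # long URL with many dots
    print(tree.predict([20.0, 1.0]))  # short URL with few dots
```

Because every level shares one test, evaluating the tree needs no branching on node identity, which is one reason this structure suits fast inference.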

This review benchmarks widely used models and highlights the potential of NODE and CatBoost as robust alternatives to conventional approaches. The insights derived aim to guide cybersecurity practitioners and researchers in selecting and deploying effective AI-driven solutions for phishing mitigation in dynamic digital environments.

Keywords: Phishing Detection, Machine Learning, CatBoost, NODE, Random Forest, SVM, KNN, LightGBM.

NOTE: Please do not submit this to the college without the consent of our team. This abstract varies based on student requirements.