The objective of this project is to develop an efficient text classification system that covers four datasets and combines deep learning models with binary classification heads. The system applies the Spectral Binary Head (SBH) and Wavelet Binary Head (WBH) with a two-stage training strategy to the AG News dataset for topic classification. It also uses a frozen BERT model for the Papluca dataset, a fine-tuned BERT model for DBpedia, and a BiLSTM with an attention mechanism for IMDb sentiment analysis. The project aims for high accuracy, reduced parameter overhead, efficient model deployment, and real-time prediction through a Flask-based web application.
This project develops a text classification system that integrates multiple classification architectures within a web-based interface. Four benchmark datasets are used: AG News for four-class topic classification, Papluca for language identification across twenty languages, IMDb for binary sentiment analysis, and DBpedia for fourteen-class topic classification. The implemented models include the Spectral Binary Head (SBH) and the Wavelet Binary Head (WBH), which apply periodic functions to binarize the classification weights while maintaining gradient flow via straight-through estimation. Both heads follow a two-stage training strategy in which a full-precision warm-up MLP is trained first, followed by the binary head. For comparison, a frozen BERT backbone with multi-layer perceptron heads is applied to Papluca, a fine-tuned BERT model is used for DBpedia, and a bidirectional LSTM with an attention mechanism is built for IMDb. The evaluation metrics are accuracy, macro-F1, micro-F1, precision, recall, confusion matrices, and ROC-AUC where applicable. Experimental results show that WBH achieves 88.83% accuracy on AG News, the enhanced BERT head reaches 97.71% on Papluca, the BiLSTM with attention attains 89.80% on IMDb, and the fine-tuned BERT model obtains 99.17% on DBpedia. A Flask-based web application provides user registration, login, text input, model selection, prediction display, and logout. All trained models are saved as PyTorch state dictionaries alongside their vocabulary and configuration files, so they can be reloaded later for single-text prediction. The results indicate that periodic binary classification heads offer competitive performance with reduced parameter overhead compared to their full-precision equivalents.
Keywords: text classification, periodic binary head, spectral binary head, wavelet binary head, two-stage training, BERT, frozen backbone, fine-tuning, BiLSTM, attention mechanism, AG News, Papluca, IMDb, DBpedia, Flask
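The idea of a periodic binary head with straight-through estimation can be illustrated with a minimal PyTorch sketch. It is not the project's implementation: the abstract does not specify the periodic function, so sign(sin(omega * w)) is assumed here, and the names PeriodicBinarize, PeriodicBinaryHead, and omega are hypothetical. In the forward pass the latent real-valued weights are mapped to {-1, +1}; in the backward pass the non-differentiable sign is skipped and the gradient of the sine is used instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PeriodicBinarize(torch.autograd.Function):
    """Assumed binarizer: forward sign(sin(omega * w)), backward via the sine only."""

    @staticmethod
    def forward(ctx, weight, omega):
        ctx.save_for_backward(weight)
        ctx.omega = omega
        s = torch.sin(omega * weight)
        # Map to {-1, +1}; a sine of exactly zero is treated as +1.
        return torch.where(s >= 0, torch.ones_like(s), -torch.ones_like(s))

    @staticmethod
    def backward(ctx, grad_output):
        (weight,) = ctx.saved_tensors
        # Straight-through estimation: treat sign() as the identity, so the
        # gradient is d/dw sin(omega * w) = omega * cos(omega * w).
        grad_weight = grad_output * ctx.omega * torch.cos(ctx.omega * weight)
        return grad_weight, None  # no gradient for the omega argument


class PeriodicBinaryHead(nn.Module):
    """Linear classification head whose weights are binarized on the fly."""

    def __init__(self, in_features, num_classes, omega=1.0):
        super().__init__()
        # Latent full-precision weights; only their binarized form is used.
        self.latent = nn.Parameter(torch.randn(num_classes, in_features))
        self.bias = nn.Parameter(torch.zeros(num_classes))
        self.omega = omega

    def forward(self, x):
        w_bin = PeriodicBinarize.apply(self.latent, self.omega)
        return F.linear(x, w_bin, self.bias)


# Toy usage: 8 feature vectors of size 16, 4 classes (as in AG News).
torch.manual_seed(0)
head = PeriodicBinaryHead(in_features=16, num_classes=4)
features = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
logits = head(features)
loss = F.cross_entropy(logits, labels)
loss.backward()  # gradients reach head.latent despite the binarization
```

Under the two-stage strategy described in the abstract, a head of this kind would be trained after the full-precision warm-up MLP; how the two stages are connected is not shown here.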

The following hardware specifications are recommended for developing, training, and running the text classification system (including periodic binary heads, BERT models, and BiLSTM with attention). Training on large datasets (e.g., DBpedia with 560,000 samples) benefits from a GPU, but CPU‑only execution is possible for inference and smaller datasets.
Component | Minimum Requirement | Recommended Requirement
Processor | Intel Core i3 (10th gen or newer) or AMD equivalent | Intel Core i7 / i9 (12th gen+) or AMD Ryzen 7/9
RAM | 8 GB | 16 GB or higher
Hard Disk | 160 GB (SSD preferred) | 512 GB NVMe SSD
GPU (Optional) | None (CPU only) | NVIDIA GPU with at least 8 GB VRAM (e.g., GTX 1070, RTX 2070, RTX 3060, Tesla P100)
Keyboard | Standard Windows keyboard | Standard USB/wireless keyboard
Mouse | Two or three button mouse | Optical mouse
Monitor | SVGA (1024×768) | Full HD (1920×1080) or higher
The software requirements for the project (text classification using SBH, WBH, BERT, and BiLSTM with a web interface) are listed below.
Category | Requirement
Operating System | Windows 10/11, Linux (Ubuntu 20.04+), or macOS (11+)
Front-end Languages | HTML5, CSS3, JavaScript
Back-end Language | Python 3.8 or higher
Web Framework | Flask 2.0+
Deep Learning Framework | PyTorch 1.10+ (CUDA support optional)
Machine Learning Libraries | scikit-learn 1.2+
NLP & Transformers | Hugging Face Transformers 4.30+, Datasets 2.14+
Data Handling | pandas, numpy
Visualisation | matplotlib, seaborn
Tokenisation | BertTokenizer (monolingual & multilingual)
Development Environment | VS Code, Jupyter Notebook, or Kaggle Notebook
Database | Not required (user credentials stored in an in-memory dictionary or a simple JSON file; sessions managed by Flask)
Version Control | Git (optional)
Browser Compatibility | Google Chrome, Mozilla Firefox, Microsoft Edge (latest versions)
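The minimum versions listed above can be checked before training or deployment. The following is a small sketch using only the Python standard library; the helper names (parse_version, at_least, check_environment) are hypothetical, and the package names are assumed to be the usual PyPI distribution names. Versions are compared numerically, since a plain string comparison would wrongly rank "1.9" above "1.10".

```python
import sys
from importlib import metadata

# Minimum versions from the requirements table (PyPI names are assumptions).
MINIMUMS = {
    "flask": "2.0",
    "torch": "1.10",
    "scikit-learn": "1.2",
    "transformers": "4.30",
    "datasets": "2.14",
}


def parse_version(text):
    """Turn '1.10.2+cu117' into (1, 10, 2); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need {minimum}+)")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All listed requirements are satisfied."]:
        print(line)
```

Packages listed without a minimum version (pandas, numpy, matplotlib, seaborn) are left out of the check; they could be added to MINIMUMS with "0" as the minimum to test only for presence.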