Periodic Binary Heads with Two-Stage Training for Efficient Multi-Dataset Text Classification

Project Code: TCMAPY2391

Objective

The objective of this project is to develop an efficient multi-dataset text classification system using deep learning models and binary classification heads. The system applies the Spectral Binary Head (SBH) and Wavelet Binary Head (WBH) with a two-stage training strategy to the AG News dataset for topic classification. It also integrates a frozen BERT model for the Papluca dataset, a fine-tuned BERT model for the DBpedia dataset, and a BiLSTM with an attention mechanism for IMDb sentiment analysis. The project aims to achieve high accuracy, reduced parameter overhead, efficient model deployment, and real-time prediction through a Flask-based web application.
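The idea behind a periodic binary head can be illustrated with a minimal sketch. The exact SBH and WBH formulas are not given here, so the code assumes one plausible form: each real-valued latent weight is mapped to +1 or -1 by the sign of a sine wave, and the backward pass treats the sign step as the identity (a straight-through estimator). The names `binarize`, `ste_weight_grad`, `head_logits`, and the value of `OMEGA` are hypothetical and chosen only for illustration.

```python
import math

OMEGA = 2.0  # assumed frequency of the periodic function (hypothetical value)


def binarize(w, omega=OMEGA):
    """Forward pass: periodic binarization, sign(sin(omega * w)) in {-1, +1}."""
    return 1.0 if math.sin(omega * w) >= 0.0 else -1.0


def ste_weight_grad(grad_wrt_binary, w, omega=OMEGA):
    """Backward pass with a straight-through estimator (assumed form).

    The sign step has zero gradient almost everywhere, so it is treated as
    the identity; only the derivative of sin(omega * w) is kept.
    """
    return grad_wrt_binary * omega * math.cos(omega * w)


def head_logits(latent_weights, features):
    """Binary linear head: each logit is a +/-1 weighted sum of the features."""
    return [
        sum(binarize(w) * x for w, x in zip(row, features))
        for row in latent_weights
    ]


# Toy example: 2 classes x 2 features of real-valued latent weights.
latent = [[0.3, -0.4], [-1.0, 0.2]]
x = [1.0, 2.0]
logits = head_logits(latent, x)
print(logits)  # [-1.0, 1.0]
```

Only the signs are needed at inference time, which is where the reduced parameter overhead would come from; the real-valued latent weights matter only during training.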

Abstract

This project develops a text classification system that integrates multiple classification architectures within a web-based interface. Four benchmark datasets are used: AG News for topic classification with four categories, Papluca for language identification across twenty languages, IMDb for binary sentiment analysis, and DBpedia for fourteen-class topic classification. The implemented models include the Spectral Binary Head (SBH) and the Wavelet Binary Head (WBH), which apply periodic functions to binarize classification weights while maintaining gradient flow via straight-through estimation. Both models follow a two-stage training strategy in which a full-precision warm-up MLP is trained first, followed by the binary head. For comparison, a frozen BERT backbone with multi-layer perceptron heads is applied to Papluca, a fine-tuned BERT model is used for DBpedia, and a bidirectional LSTM with an attention mechanism is built for IMDb. Evaluation metrics comprise accuracy, macro-F1, micro-F1, precision, recall, confusion matrices, and ROC-AUC where applicable. Experimental results show that WBH achieves 88.83% accuracy on AG News, the enhanced BERT head reaches 97.71% on Papluca, the BiLSTM with attention attains 89.80% on IMDb, and the fine-tuned BERT model obtains 99.17% on DBpedia. A Flask-based web application provides user registration, login, text input, model selection, prediction display, and logout functionality. All trained models are saved as PyTorch state dictionaries alongside vocabulary and configuration files for later loading and single-text prediction. The system demonstrates that periodic binary classification heads offer competitive performance with reduced parameter overhead compared to full-precision equivalents.
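The difference between the macro-F1 and micro-F1 metrics listed above can be shown with a small pure-Python sketch. The label arrays are made-up toy data, not results from this project, and the helper names are hypothetical; in practice a library such as scikit-learn would normally be used.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} computed one-vs-rest for each label."""
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    counts = per_class_counts(y_true, y_pred, labels)
    return sum(f1_from_counts(*counts[c]) for c in labels) / len(labels)


def micro_f1(y_true, y_pred, labels):
    """F1 over pooled counts (every sample counts equally)."""
    counts = per_class_counts(y_true, y_pred, labels)
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    return f1_from_counts(tp, fp, fn)


# Toy data for illustration only (three classes, six samples).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
labels = [0, 1, 2]
macro = macro_f1(y_true, y_pred, labels)
micro = micro_f1(y_true, y_pred, labels)
```

For single-label multi-class problems such as these datasets, micro-F1 equals accuracy, so macro-F1 is the more informative of the two when classes are imbalanced.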

Keywords: text classification, periodic binary head, spectral binary head, wavelet binary head, two-stage training, BERT, frozen backbone, fine-tuning, BiLSTM, attention mechanism, AG News, Papluca, IMDb, DBpedia, Flask

NOTE: Please do not submit this to your college without our team's consent. This abstract may vary based on student requirements.

Block Diagram

Specifications

Hardware Requirements

The following hardware specifications are recommended for developing, training, and running the text classification system (including periodic binary heads, BERT models, and BiLSTM with attention). Training on large datasets (e.g., DBpedia with 560,000 samples) benefits from a GPU, but CPU‑only execution is possible for inference and smaller datasets.

| Component | Minimum Requirement | Recommended Requirement |
|---|---|---|
| Processor | Intel Core i3 (10th gen or newer) or AMD equivalent | Intel Core i7 / i9 (12th gen+) or AMD Ryzen 7/9 |
| RAM | 8 GB | 16 GB or higher |
| Hard Disk | 160 GB (SSD preferred) | 512 GB NVMe SSD |
| GPU (Optional) | None (CPU only) | NVIDIA GPU with 8 GB or more VRAM (e.g., GTX 1070, RTX 2070, RTX 3060, Tesla P100) |
| Keyboard | Standard Windows keyboard | Standard USB/wireless keyboard |
| Mouse | Two or three button mouse | Optical mouse |
| Monitor | SVGA (1024×768) | Full HD (1920×1080) or higher |

Software Requirements

The software requirements for the project (text classification using SBH, WBH, BERT, and BiLSTM with a web interface) are listed below.

| Category | Requirement |
|---|---|
| Operating System | Windows 10/11, Linux (Ubuntu 20.04+), or macOS (11+) |
| Front-end Languages | HTML5, CSS3, JavaScript |
| Back-end Language | Python 3.8 or higher |
| Web Framework | Flask 2.0+ |
| Deep Learning Framework | PyTorch 1.10+ (CUDA support optional) |
| Machine Learning Libraries | scikit-learn 1.2+ |
| NLP & Transformers | Hugging Face Transformers 4.30+, Datasets 2.14+ |
| Data Handling | pandas, numpy |
| Visualisation | matplotlib, seaborn |
| Tokenisation | BertTokenizer (monolingual & multilingual) |
| Development Environment | VS Code, Jupyter Notebook, or Kaggle Notebook |
| Database | Not required (user credentials stored in an in-memory dictionary or a simple JSON file; sessions managed by Flask) |
| Version Control | Git (optional) |
| Browser Compatibility | Google Chrome, Mozilla Firefox, Microsoft Edge (latest versions) |
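Because no database is used, user credentials could be kept in a simple JSON file. The sketch below shows one way this might look using only the Python standard library. Salted PBKDF2 hashing is an assumption added here (the requirements do not say how passwords are stored), and the function names, the `users.json` file name, and the `ITERATIONS` value are hypothetical.

```python
import hashlib
import hmac
import json
import os
import secrets
import tempfile

ITERATIONS = 100_000  # assumed PBKDF2 work factor (hypothetical value)


def _hash_password(password, salt):
    """Derive a hex digest from the password and a per-user random salt."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return digest.hex()


def load_users(path):
    """Read the user table from disk; a missing file means no users yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def register(path, username, password):
    """Add a new user; return False if the name is already taken."""
    users = load_users(path)
    if username in users:
        return False
    salt = secrets.token_bytes(16)
    users[username] = {"salt": salt.hex(), "hash": _hash_password(password, salt)}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(users, fh)
    return True


def verify(path, username, password):
    """Check a login attempt against the stored salt and hash."""
    record = load_users(path).get(username)
    if record is None:
        return False
    expected = _hash_password(password, bytes.fromhex(record["salt"]))
    return hmac.compare_digest(expected, record["hash"])


# Demo in a temporary directory so no real file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "users.json")
registered = register(demo_path, "alice", "s3cret")
```

In a Flask application, `register` and `verify` would typically be called from the registration and login routes, with the logged-in user name kept in the Flask session.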
