Robust Sentiment and Semantic Analysis of Small and Medium-Sized News Headline Datasets: A Study on Sports, Science, and Agricultural Domains

Project Code: TCMAPY2414

Objective

The primary objective of this project is to develop a robust system for sentiment and semantic analysis of small and medium-sized news headline datasets in the sports, science, and agricultural domains. The project aims to classify news headlines accurately into predefined categories such as World, Sports, Business, and Sci/Tech, using two machine learning pipelines: DistilBERT + LightGBM and SBERT + k-NN.
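
This page does not say how DistilBERT outputs are turned into LightGBM inputs. The following is a minimal sketch, assuming masked mean pooling over DistilBERT's last-layer token vectors (taking the first-token vector is an equally common choice). Each headline is reduced to one fixed-length vector, and padding positions are ignored. The function name `masked_mean_pool` and the toy 4-dimensional vectors are hypothetical; real DistilBERT hidden states have 768 dimensions.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: one vector per token (assumed here to be DistilBERT's
    last hidden state for a single headline).
    attention_mask: 1 for real tokens, 0 for padding.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need one mask entry per token")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions
            for i, value in enumerate(vector):
                totals[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in totals]


# Toy 3-token "headline" with 4-dimensional hidden states (hypothetical values).
tokens = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding position, excluded by the mask
]
mask = [1, 1, 0]
features = masked_mean_pool(tokens, mask)
print(features)  # [2.0, 1.0, 1.0, 2.0]
```

Under this assumption, each pooled vector becomes one row of the feature matrix passed to a LightGBM classifier, for example `lightgbm.LGBMClassifier().fit(X, y)`. The project's actual pooling strategy and hyperparameters are not specified here.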

Abstract

This study focuses on robust sentiment and semantic analysis of small and medium-sized news headline datasets, specifically within the sports, science, and agricultural domains. Given the growing need for effective content classification, the task of assigning news headlines to predefined categories such as World, Sports, Business, and Sci/Tech has gained considerable attention. In this work, two machine learning approaches are explored: DistilBERT for textual feature extraction combined with a LightGBM classifier, and SBERT sentence embeddings paired with a k-NN classifier for finer-grained semantic matching. Both models are evaluated on classification accuracy: SBERT + k-NN achieves 92.14% and DistilBERT + LightGBM achieves 91.55%. Both approaches perform well, and SBERT + k-NN leads by a narrow margin of 0.59 percentage points, suggesting its potential for nuanced understanding of news headlines across domains. The findings point to the value of pre-trained transformer models such as SBERT for sentiment and semantic analysis, especially on smaller datasets where model generalization is critical. This study offers insights into applying transformer-based models to news classification and contributes toward more accurate and scalable systems for domain-specific news analytics.
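
The SBERT + k-NN approach rests on a simple idea: a headline takes the majority label of the training headlines whose sentence embeddings are most similar to its own. The sketch below shows only that classification step and the accuracy calculation, in plain Python, under several assumptions. It uses cosine similarity as the similarity measure and k = 3, and hand-made 2-dimensional vectors stand in for real SBERT embeddings. The names `knn_predict` and `accuracy` are hypothetical, and the toy numbers have no connection to the 92.14% reported in this study.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query: Vector, train: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Majority label among the k training embeddings most similar to query.

    Counter.most_common lists equal counts in first-seen order, so among
    tied labels the one held by the closer neighbour wins.
    """
    ranked = sorted(train, key=lambda item: cosine_similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy stand-ins for SBERT embeddings (real ones have hundreds of dimensions).
train = [
    ([1.0, 0.1], "Sports"),
    ([0.9, 0.2], "Sports"),
    ([0.8, 0.0], "Sports"),
    ([0.1, 1.0], "Sci/Tech"),
    ([0.0, 0.9], "Sci/Tech"),
    ([0.2, 0.8], "Sci/Tech"),
]
test_vectors = [[0.95, 0.05], [0.05, 0.95], [0.6, 0.5]]
gold = ["Sports", "Sci/Tech", "Sci/Tech"]

predicted = [knn_predict(v, train, k=3) for v in test_vectors]
print(predicted)  # ['Sports', 'Sci/Tech', 'Sports']
print(f"{accuracy(predicted, gold):.2%}")  # 66.67%
```

The third toy headline sits between the two clusters and is misclassified, so the sketch reports 66.67%. In a full pipeline, the vectors would come from an SBERT model (for example `SentenceTransformer(...).encode(headlines)` in the sentence-transformers library). scikit-learn's `KNeighborsClassifier(n_neighbors=k, metric="cosine")` provides a comparable classifier with its own tie-breaking rule.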

Keywords: Sentiment Analysis, Semantic Analysis, News Classification, Small Datasets, Sports Domain, Science Domain, Agricultural Domain, DistilBERT, LightGBM, SBERT.

NOTE: Please do not submit this to your college without consulting our team. This abstract varies based on student requirements.

Block Diagram

Specifications

SOFTWARE REQUIREMENTS

Operating System        :  Windows 7/8/10

Front End               :  HTML, CSS, JS

Programming Language    :  Python

Libraries               :  Flask, Pandas, scikit-learn, TensorFlow, NumPy, Seaborn, Matplotlib

IDE/Workbench           :  VSCode

Technology              :  Python 3.8+

Server Deployment       :  XAMPP Server

Database                :  MySQL

HARDWARE REQUIREMENTS

Processor               -  Intel Core i5

RAM                     -  8 GB (minimum)

Hard Disk               -  128 GB (minimum)

Keyboard                -  Standard Windows keyboard

Mouse                   -  Two- or three-button mouse

Monitor                 -  Any
