Exploring Text Similarity in Human and AI-Generated Scientific Abstracts: A Comprehensive Analysis

Project Code: TCMAPY1876

Objective

The objective of this project is to analyze the text similarity between human-written and AI-generated scientific abstracts. The primary goal is to build a classification model that distinguishes human-written abstracts from AI-generated ones using deep learning. Three architectures are used for this classification: RoBERTa+BiLSTM, DistilBERT, and BERT+CNN. By examining semantic and syntactic similarities, the project seeks insight into the structural and linguistic characteristics that differentiate human and AI-generated content. Ultimately, it is intended to contribute to the development of AI tools that assist in academic writing and help validate the quality of AI-generated texts.
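The two hybrid architectures named above can be illustrated with a minimal PyTorch sketch. This is not the project's actual code: the class names `BiLSTMHead` and `CNNHead`, the layer sizes, and the kernel widths are all assumptions chosen for illustration. Each head takes token embeddings of shape (batch, sequence length, hidden size), such as the last hidden state of a RoBERTa or BERT encoder, and returns two logits for the Human Abstract and AI Abstract classes. The encoder itself is left out so the sketch runs without downloading pre-trained weights.

```python
import torch
import torch.nn as nn


class BiLSTMHead(nn.Module):
    """Hypothetical BiLSTM classifier over encoder token embeddings (e.g. RoBERTa)."""

    def __init__(self, hidden_size=768, lstm_units=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, lstm_units,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_units, num_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, hidden_size)
        _, (h_n, _) = self.lstm(token_embeddings)
        # h_n: (2, batch, lstm_units); join final forward and backward states
        features = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.classifier(features)


class CNNHead(nn.Module):
    """Hypothetical 1-D CNN classifier over encoder token embeddings (e.g. BERT)."""

    def __init__(self, hidden_size=768, num_filters=64,
                 kernel_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden_size, num_filters, k) for k in kernel_sizes]
        )
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_embeddings):
        # Conv1d expects (batch, channels, seq_len)
        x = token_embeddings.transpose(1, 2)
        # Max-pool each feature map over the sequence dimension
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))


if __name__ == "__main__":
    # Random tensor standing in for encoder output: 4 abstracts, 16 tokens each
    fake_encoder_output = torch.randn(4, 16, 768)
    print(BiLSTMHead()(fake_encoder_output).shape)
    print(CNNHead()(fake_encoder_output).shape)
```

In a full pipeline the random tensor would be replaced by the output of a pre-trained encoder, and the logits would be trained with a cross-entropy loss; those steps are omitted here.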

Abstract

The rise of artificial intelligence (AI) and its ability to generate human-like text has sparked significant interest in understanding the similarities between human-written and AI-generated content. This project analyzes the text similarity between human-written and AI-generated scientific abstracts. It compares the quality and structure of the two kinds of abstracts using three deep learning models: RoBERTa+BiLSTM, DistilBERT, and BERT+CNN. The models are trained to classify abstracts into two categories: Human Abstract and AI Abstract. They are then evaluated on their ability to capture semantic and syntactic similarities between the two types of abstracts. The project examines the effectiveness of hybrid models that combine pre-trained transformer models with recurrent and convolutional neural networks for this classification task. Accuracy, precision, recall, and F1-score are used to assess the models, with the goal of providing deeper insight into how AI-generated text compares to human writing in scientific contexts. The results can inform the development of AI-driven tools for academic writing and contribute to the ongoing discussion on the authenticity and reliability of AI-generated content in scholarly research.
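The four evaluation metrics named above can be computed directly from the counts of correct and incorrect predictions. The following is a minimal sketch in plain Python; the function name `binary_metrics`, the choice of AI Abstract as the positive class, and the six toy labels are assumptions for illustration, not results from the project.

```python
def binary_metrics(y_true, y_pred, positive="AI Abstract"):
    """Compute accuracy, precision, recall and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only
    y_true = ["AI Abstract", "AI Abstract", "AI Abstract",
              "Human Abstract", "Human Abstract", "Human Abstract"]
    y_pred = ["AI Abstract", "AI Abstract", "Human Abstract",
              "Human Abstract", "Human Abstract", "AI Abstract"]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Precision and recall are reported relative to one positive class, so swapping the positive class to Human Abstract generally gives different values for an unbalanced set of predictions.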

Keywords: AI-generated text, Text similarity, Scientific abstracts, RoBERTa+BiLSTM, DistilBERT, BERT+CNN, Natural Language Processing, Deep Learning, Text classification, Hybrid models.

NOTE: Please do not submit this to your college without the consent of our team. This abstract may vary based on student requirements.

Block Diagram

Specifications

SOFTWARE REQUIREMENTS

Operating System      : Windows 7/8/10
Server side Script    : HTML, CSS, JS
Programming Language  : Python
Libraries             : Django, Pandas, PyTorch, Keras, scikit-learn, NumPy, Seaborn
IDE/Workbench         : VSCode
Server Deployment     : XAMPP Server
Database              : SQLite

HARDWARE REQUIREMENTS

Processor             : Intel i3 or equivalent
RAM                   : 8 GB (min)
Hard Disk             : 128 GB
Keyboard              : Standard Windows Keyboard
Mouse                 : Two or Three Button Mouse
Monitor               : Any

Demo Video