Paraphrase Identification

Project Code: TCMAPY1736

Objective

This project investigates deep learning models such as CNNs, RNNs, BERT, and RoBERTa for classifying sentence pairs as paraphrases or non-paraphrases. The frontend will be developed using standard web technologies, including HTML, CSS, and JavaScript.
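As a rough illustration of how a sentence pair reaches the transformer models named above, the sketch below packs two sentences into one sequence. BERT-style models conventionally use `[CLS] A [SEP] B [SEP]`, while RoBERTa-style models use `<s> A </s></s> B </s>`. The function name `pack_pair` and the whitespace tokenization are assumptions made for illustration only; a real pipeline would rely on each model's own subword tokenizer.

```python
def pack_pair(sentence_a, sentence_b, style="bert"):
    """Join two sentences into one token sequence for a pair classifier.

    Whitespace splitting stands in for real subword tokenization
    (an assumption that keeps this sketch dependency-free).
    """
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()
    if style == "bert":
        # BERT convention: [CLS] A [SEP] B [SEP]
        return ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    if style == "roberta":
        # RoBERTa convention: <s> A </s></s> B </s>
        return ["<s>"] + tokens_a + ["</s>", "</s>"] + tokens_b + ["</s>"]
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    # Hypothetical sentence pair, used only to show the two layouts.
    print(pack_pair("How old are you", "What is your age"))
    print(pack_pair("How old are you", "What is your age", style="roberta"))
```

The classifier then predicts a single label (paraphrase or non-paraphrase) from the packed sequence, so both sentences are read together rather than encoded separately.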

Abstract

Paraphrase identification is a fundamental task in Natural Language Processing (NLP) that aims to determine whether two sentences express the same meaning using different wording. This classification-based project explores various deep learning architectures to classify sentence pairs into two categories: paraphrase or non-paraphrase. Initially, existing models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and BERT are employed to benchmark performance. Subsequently, more advanced transformer-based models such as RoBERTa are proposed to improve accuracy and semantic understanding. These models leverage pre-trained contextual embeddings and contrastive learning for improved textual similarity detection. The project uses benchmark datasets suitable for sentence-pair classification to train and evaluate the models. While this work is academic in nature, it simulates realistic challenges such as ambiguity, syntax variation, and context preservation. The goal is to understand the strengths and limitations of the various models in paraphrase detection and to compare their performance comprehensively. This exploration can contribute to applications such as question-answer matching, duplicate detection, and content summarization systems.
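To compare models on this two-class task, accuracy together with precision, recall, and F1 on the paraphrase class are common choices. The following is a minimal sketch, assuming labels are encoded as 1 (paraphrase) and 0 (non-paraphrase). The function name `pair_metrics` and the sample labels are hypothetical and are not results from this project.

```python
def pair_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the paraphrase class (label 1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    gold = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(pair_metrics(gold, predicted))
```

Running the same function over each model's predictions on a shared test split gives directly comparable numbers for the CNN, RNN, BERT, and RoBERTa variants.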

Keywords: Paraphrase Identification, NLP, Sentence Similarity, Classification, Deep Learning, RoBERTa, BERT, CNN, RNN

NOTE: Please do not submit this abstract to the college without our team's consent. The abstract may vary based on student requirements.

Block Diagram

Specifications

H/W CONFIGURATION:

Processor    - Intel Core i3
Hard Disk    - 160 GB
Keyboard     - Standard Windows keyboard
Mouse        - Two- or three-button mouse
Monitor      - SVGA
RAM          - 8 GB

S/W CONFIGURATION:

• Operating System        : Windows 7/8/10
• Front-end Technologies  : HTML, CSS, Bootstrap & JS
• Programming Language    : Python
• Libraries               : Flask, Pandas, MySQL Connector, Scikit-Learn
• IDE/Workbench           : VS Code
• Technology              : Python 3.8+
• Server Deployment       : XAMPP Server
