Binary multilingual machine-generated text detection

Project Code: TCMAPY1383

Objective

Given a text dataset, we aim to predict whether each text is AI-generated or human-written.

Abstract

With the rapid advancement of natural language generation technologies, distinguishing machine-generated text from human-written content has become increasingly challenging yet essential. This project aims to develop a robust, multilingual system capable of accurately identifying machine-generated text across several languages, including English, Indonesian, German, and Russian. Using a dataset of 674,083 training samples and 288,894 development samples, each described by its source, sub-source, language, generation model, label, and text, we explore the efficacy of various machine learning and deep learning algorithms.
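As a rough illustration of the dataset layout described above, the following sketch builds a small in-memory table with the six attributes and counts samples per language and label. The column names (`source`, `sub_source`, `lang`, `model`, `label`, `text`), the label encoding (0 = human, 1 = machine), and every row value are assumptions made for this example; they are not taken from the project dataset.

```python
import pandas as pd

# Hypothetical rows mirroring the attributes named in the abstract.
# All values are invented for illustration only.
rows = [
    {"source": "forum", "sub_source": "thread", "lang": "en", "model": "human", "label": 0, "text": "I missed the bus again this morning."},
    {"source": "forum", "sub_source": "thread", "lang": "en", "model": "llm-a", "label": 1, "text": "In conclusion, the factors above explain the outcome."},
    {"source": "news", "sub_source": "local", "lang": "id", "model": "human", "label": 0, "text": "Tadi pagi saya telat lagi ke kantor."},
    {"source": "news", "sub_source": "local", "lang": "id", "model": "llm-a", "label": 1, "text": "Secara keseluruhan, hasilnya cukup signifikan."},
    {"source": "wiki", "sub_source": "article", "lang": "de", "model": "human", "label": 0, "text": "Gestern hat es den ganzen Tag geregnet."},
    {"source": "wiki", "sub_source": "article", "lang": "de", "model": "llm-b", "label": 1, "text": "Zusammenfassend beeinflussen diese Faktoren das Ergebnis."},
    {"source": "blog", "sub_source": "post", "lang": "ru", "model": "human", "label": 0, "text": "Вчера весь день шёл дождь."},
    {"source": "blog", "sub_source": "post", "lang": "ru", "model": "llm-b", "label": 1, "text": "В заключение, данные факторы влияют на результат."},
]
df = pd.DataFrame(rows)

# Count human (0) and machine (1) samples for each language.
counts = df.groupby(["lang", "label"]).size().unstack(fill_value=0)
print(counts)
```

A per-language breakdown like this is a common first check for class imbalance before any model is trained.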

To achieve reliable classification, the system integrates Random Forest, Long Short-Term Memory (LSTM) networks, Bidirectional Encoder Representations from Transformers (BERT), Decision Tree, and Logistic Regression models. Each model is rigorously evaluated on its ability to handle multilingual data, focusing on both accuracy and computational efficiency. This project combines traditional machine learning with cutting-edge deep learning techniques, contributing a valuable tool for digital content verification by enabling precise differentiation between human-authored and machine-generated text. The proposed system supports a wide range of applications in content validation and enhances trust in digital information across multiple languages and contexts.

Keywords: Multilingual text analysis, Random Forest, Long Short-Term Memory (LSTM) networks, Bidirectional Encoder Representations from Transformers (BERT), Decision Tree, Logistic Regression and Natural Language Processing (NLP).
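The classical models named in the abstract (Logistic Regression, Decision Tree, Random Forest) could be compared along the following lines. This is a minimal sketch under stated assumptions, not the project's implementation: the character n-gram TF-IDF features are a choice made for this example because they need no language-specific tokenizer, the toy texts and labels are invented, and LSTM and BERT are left out because they need libraries beyond the scikit-learn stack listed in the specifications.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy multilingual samples, invented for illustration (0 = human, 1 = machine).
texts = [
    "I missed the bus again this morning, typical.",
    "In conclusion, the aforementioned factors collectively demonstrate the outcome.",
    "Tadi pagi saya telat lagi ke kantor.",
    "Secara keseluruhan, faktor-faktor tersebut menunjukkan hasil yang signifikan.",
    "Gestern hat es den ganzen Tag geregnet, echt nervig.",
    "Zusammenfassend lässt sich sagen, dass diese Faktoren das Ergebnis beeinflussen.",
    "Вчера весь день шёл дождь, надоело.",
    "В заключение следует отметить, что данные факторы влияют на результат.",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def make_features():
    # Assumption: character n-grams within word boundaries, shared by all models.
    return TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))


train_accuracy = {}
for name, clf in models.items():
    # Each model gets its own vectorizer so the pipelines stay independent.
    pipeline = make_pipeline(make_features(), clf)
    pipeline.fit(texts, labels)
    # Accuracy on the training texts: a smoke check that the pipeline runs,
    # not an estimate of real performance.
    train_accuracy[name] = pipeline.score(texts, labels)

print(train_accuracy)
```

In an actual run, the models would be fitted on the training split and scored on the separate development split; accuracy on eight training sentences says nothing about generalization.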

NOTE: Please do not submit this to your college without the consent of our team. This abstract may vary based on student requirements.

Block Diagram

Specifications

Hardware Requirements

 

Processor      - Intel i3 processor

Hard Disk      - 160 GB

Keyboard       - Standard Windows keyboard

Mouse          - Two- or three-button mouse

Monitor        - SVGA

RAM            - 8 GB

 

Software Requirements

•      Operating System        :  Windows 7/8/10

•      Programming Language    :  Python

•      Libraries               :  Pandas, NumPy, scikit-learn

•      IDE/Workbench           :  Visual Studio Code

 

Demo Video