SAMTAlign: A Multi-Stage Transformer for Instruction-Based Captioning and Editing

Project Code: TCMAPY2297

Objective

This project presents a Django-based web application featuring a multi-stage transformer pipeline for instruction-based image captioning and editing. It uses BLIP for caption generation, GroundingDINO for object localization, SAM (Segment Anything Model) for high-fidelity object masks, and Stable Diffusion Inpainting for context-aware image modifications. Users can upload images, generate captions for them, and edit them with text prompts. Django handles user registration, session management, and persistent storage, giving users an intuitive tool for automated image understanding and manipulation.
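
The editing flow can be sketched as a chain of pluggable stages. This is a minimal illustration, not the project's implementation: `edit_image`, `EditResult`, and the `detect`/`segment`/`inpaint` callables are hypothetical names, images and masks are assumed to be NumPy arrays, and the source prompt is assumed to name the object to find while the target prompt describes what should replace it. In the real application each callable would wrap GroundingDINO, SAM, or the Stable Diffusion inpainting pipeline; passing them in keeps the control flow testable without loading any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical conventions: an image is an HxWx3 uint8 array, a box is
# (x0, y0, x1, y1) in pixels, and a mask is an HxW boolean array.
Box = Tuple[float, float, float, float]
Detector = Callable[[np.ndarray, str], List[Box]]
Segmenter = Callable[[np.ndarray, List[Box]], List[np.ndarray]]
Inpainter = Callable[[np.ndarray, np.ndarray, str], np.ndarray]


@dataclass
class EditResult:
    boxes: List[Box]    # regions found for the source prompt
    mask: np.ndarray    # union of all object masks
    edited: np.ndarray  # image after inpainting


def edit_image(
    image: np.ndarray,
    source_prompt: str,
    target_prompt: str,
    detect: Detector,
    segment: Segmenter,
    inpaint: Inpainter,
) -> EditResult:
    """Run detection -> segmentation -> inpainting on one image."""
    # Stage 1: locate the object named in the source prompt (GroundingDINO's role).
    boxes = detect(image, source_prompt)
    if not boxes:
        raise ValueError(f"no region found for prompt {source_prompt!r}")
    # Stage 2: turn each box into a pixel mask (SAM's role) and merge the masks.
    masks = segment(image, boxes)
    mask = np.logical_or.reduce(masks)
    # Stage 3: repaint only the masked pixels, guided by the target prompt
    # (Stable Diffusion Inpainting's role).
    edited = inpaint(image, mask, target_prompt)
    return EditResult(boxes=boxes, mask=mask, edited=edited)
```

Because the stages are injected, a Django view could construct the model wrappers once and reuse them across requests, while unit tests substitute lightweight stubs such as a detector that returns a fixed box.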

Abstract

This project presents a Django-based web application implementing a multi-stage transformer pipeline for instruction-based image captioning and semantic editing. The system integrates pretrained vision-language, detection, segmentation, and diffusion models behind an intuitive user interface for automated image understanding and manipulation. When an image is uploaded, the BLIP (Bootstrapping Language-Image Pre-training) model generates a descriptive caption, which is stored in the user's history. For editing, the application employs GroundingDINO to localize objects precisely based on the user's source and target text prompts. The Segment Anything Model (SAM) then generates high-fidelity masks for the identified objects. Finally, a Stable Diffusion Inpainting pipeline performs context-aware modifications, replacing, removing, or altering objects according to the user's instructions. The Django framework manages user registration, session handling, and persistent storage of original and edited images. By combining vision-language, segmentation, and diffusion models within a web framework, the project demonstrates an accessible tool for interactive, automated image content creation.
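
Between localization and inpainting, the detector's boxes must be converted into the format the segmenter expects, and the resulting masks are usually merged and slightly grown so the edit blends at the object boundary. The sketch below illustrates that glue step under stated assumptions, not this project's confirmed code: it assumes (as in the public GroundingDINO demo code) that boxes arrive normalized as (cx, cy, w, h) in [0, 1], that SAM is prompted with absolute (x0, y0, x1, y1) pixel boxes, and that the inpainting pipeline follows the diffusers convention where white mask pixels are repainted. The helper names and the dilation step are illustrative choices; dilation uses SciPy, which is already in the library list.

```python
import numpy as np
from scipy import ndimage


def cxcywh_norm_to_xyxy(boxes, width: int, height: int) -> np.ndarray:
    """Convert normalized (cx, cy, w, h) boxes to pixel (x0, y0, x1, y1) boxes."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    cx, cy, w, h = boxes.T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    # Scale to pixels, then clip so no box extends past the image border.
    xyxy *= np.array([width, height, width, height], dtype=float)
    xyxy[:, [0, 2]] = xyxy[:, [0, 2]].clip(0, width)
    xyxy[:, [1, 3]] = xyxy[:, [1, 3]].clip(0, height)
    return xyxy


def build_inpaint_mask(masks, grow_px: int = 0) -> np.ndarray:
    """Union several HxW boolean masks and optionally dilate the result."""
    merged = np.logical_or.reduce([np.asarray(m, dtype=bool) for m in masks])
    if grow_px > 0:
        # The default cross-shaped structure grows the mask by grow_px pixels
        # in city-block distance, so inpainting also covers the object's edges.
        merged = ndimage.binary_dilation(merged, iterations=grow_px)
    return merged


def mask_to_uint8(mask: np.ndarray) -> np.ndarray:
    """Encode a boolean mask as 0/255 (white = region to repaint)."""
    return mask.astype(np.uint8) * 255
```

For example, a centered box covering half of a 100x200 image, `cxcywh_norm_to_xyxy([[0.5, 0.5, 0.5, 0.5]], 100, 200)`, becomes the pixel box (25, 50, 75, 150).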

Keywords: Django, BLIP, Image Captioning, GroundingDINO, SAM, Stable Diffusion, Inpainting, Transformers, Image Editing, Deep Learning, Computer Vision

NOTE: Please do not submit this abstract to your college without consulting our team. The abstract varies based on student requirements.

Block Diagram

Specifications

1. SOFTWARE REQUIREMENTS

Operating System: Windows 7/8/10
Front-end (client-side): HTML, CSS, Bootstrap & JavaScript
Programming Language: Python
Libraries: Django, NumPy, Seaborn, Matplotlib, Transformers, PyTorch, Diffusers, SciPy
IDE/Workbench: VS Code
Technology: Python 3.8+
Server Deployment: XAMPP Server
Database: MySQL
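
Django and the MySQL server bundled with XAMPP (MariaDB in recent XAMPP releases) are typically connected through the `DATABASES` setting in the project's `settings.py`. The excerpt below is a hedged sketch, not the project's actual configuration: the database name `samtalign` and the `DB_*` environment-variable names are hypothetical, the fallback values mirror XAMPP's out-of-the-box defaults (user `root`, empty password, port 3306), and Django's MySQL backend also needs a driver such as `mysqlclient`, which is not part of the library list above.

```python
# settings.py (excerpt): a sketch assuming XAMPP's MySQL runs locally on its
# default port. The database name and DB_* variable names are hypothetical.
import os

DATABASES = {
    "default": {
        # Django's built-in MySQL backend; requires a driver such as mysqlclient.
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.environ.get("DB_NAME", "samtalign"),
        "USER": os.environ.get("DB_USER", "root"),
        # XAMPP ships with an empty root password; set a real one outside development.
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),
        "PORT": os.environ.get("DB_PORT", "3306"),
    }
}
```

Reading credentials from environment variables keeps passwords out of version control while still working unchanged on a default XAMPP install; after creating the empty database in phpMyAdmin, `python manage.py migrate` creates the tables for users, sessions, and image history.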

 

2. HARDWARE REQUIREMENTS

Processor: Intel Core i5 (or equivalent Intel processor) with a GPU
RAM: 8 GB minimum
Hard Disk: 128 GB or more
Keyboard: Standard Windows keyboard
Mouse: Two- or three-button mouse
Monitor: Any
