Hierarchical ViT and Dynamic Window Shift Unit for Remote Sensing Image Scene Classification

Project Code: TCMAPY1909

Objective

The primary objective of this project is to develop a robust remote sensing image classification system that combines hierarchical Vision Transformers with EfficientNet models. The first goal is to design and implement a hybrid model that integrates the two architectures to improve classification accuracy. A Dynamic Window Shift Unit will be incorporated so that the model can handle images of differing sizes and spatial resolutions. The model will be evaluated on accuracy, precision, and computational efficiency, and its scalability and adaptability will be analyzed on large datasets with many image classes. Together, these steps aim to improve the system's ability to identify the various scene types found in remote sensing data.
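As an illustration of the evaluation step, the following minimal sketch computes accuracy and macro-averaged precision from lists of true and predicted scene labels. The function names, the example class names, and the convention that a never-predicted class counts as precision 0 are assumptions made for this example, not part of the project's code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over all classes seen."""
    true_positive = Counter()
    predicted = Counter()
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_positive[p] += 1
    classes = set(y_true) | set(y_pred)
    # Assumed convention: a class that is never predicted scores 0.
    per_class = [
        true_positive[c] / predicted[c] if predicted[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(classes)


# Illustrative labels only; the real system would use its 45 scene classes.
y_true = ["airport", "beach", "forest", "beach", "forest", "airport"]
y_pred = ["airport", "beach", "beach", "beach", "forest", "forest"]

acc = accuracy(y_true, y_pred)
prec = macro_precision(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_precision={prec:.3f}")
```

Macro averaging weights every class equally, which is one reasonable choice when the number of images per scene class is balanced; a weighted average may suit imbalanced data better.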

Abstract

Remote sensing image classification is a critical task for analyzing geographical and environmental data. This project proposes a hybrid framework that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to classify remote sensing images into 45 distinct classes, a challenging benchmark because of high inter-class similarity and diverse scene complexity. The architecture uses Inception V3 and EfficientNet, two widely used CNN models known for their efficiency and strong feature extraction, to extract rich spatial features from the input images. These features are then refined by a hierarchical Vision Transformer, specifically the Swin Transformer, which captures long-range dependencies and hierarchical representations through shifted windows. A Dynamic Window Shift Unit is incorporated to improve adaptability across varying image resolutions and scene types, making the model more robust and spatially flexible. This integration of CNN-based and Transformer-based modules is intended to provide an accurate, scalable, and robust solution for remote sensing image classification, with applications in environmental monitoring, urban planning, and disaster response.

Keywords: Remote Sensing, Image Classification, Swin Transformer, EfficientNet, Vision Transformers, Dynamic Window Shift, Hierarchical Model, Deep Learning, Scene Classification, Environmental Monitoring.
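The shifted-window idea mentioned in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the project's implementation: it only shows how a feature grid is partitioned into windows before and after a cyclic shift, and it omits the attention computation and masking that a full Swin Transformer applies. The `choose_window` rule is a hypothetical stand-in for the Dynamic Window Shift Unit, whose actual design is not described here.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and left by `shift` cells, wrapping at the edges."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def partition_windows(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    return [
        [row[c:c + window] for row in grid[r:r + window]]
        for r in range(0, h, window)
        for c in range(0, w, window)
    ]


def choose_window(size, candidates=(7, 4, 2)):
    """Hypothetical rule: largest candidate window that divides the grid."""
    for k in candidates:
        if size % k == 0:
            return k
    return 1


# A 6 x 6 grid of cell indices stands in for a feature map.
size = 6
grid = [[r * size + c for c in range(size)] for r in range(size)]

window = choose_window(size)   # 2 for a 6 x 6 grid
shift = window // 2            # half-window shift, as in Swin

regular = partition_windows(grid, window)
shifted = partition_windows(cyclic_shift(grid, shift), window)

print(regular[0])   # top-left window before the shift
print(shifted[0])   # same position after the shift covers different cells
print(shifted[-1])  # last window mixes cells wrapped from opposite edges
```

Alternating regular and shifted partitions lets cells that sit in different windows in one layer share a window in the next, which is how information crosses window boundaries.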

NOTE: Please do not submit this to your college without our team's consent. This abstract varies based on student requirements.

Block Diagram

Specifications

Hardware Requirements

Processor : Intel Core i3

Hard Disk : 160 GB

Keyboard : Standard Windows keyboard

Mouse : Two- or three-button mouse

Monitor : SVGA

RAM : 8 GB

Software Requirements

Operating System : Windows 7/8/10

Front End : HTML, CSS, Bootstrap and JavaScript

Programming Language : Python 3.6+

Libraries : Flask/Django, pandas, mysql.connector, os, smtplib, NumPy

IDE/Workbench : PyCharm

Server Deployment : XAMPP Server

Database : MySQL

Demo Video