A Multiplier-Free Discrete Cosine Transform Architecture Using Approximate Full Adder and Subtractor

Project Code: TVMAFE780

Objective

This study presents a high-performance hardware accelerator for the 2D 8×8 Discrete Cosine Transform (DCT) and its inverse (IDCT). The design optimizes the dataflow of the 8-point 1D DCT/IDCT to exploit the properties of image and video processing workloads. An 8-stage pipelined architecture enables parallel execution and distributes the arithmetic operations evenly across clock cycles.
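A 2D 8×8 transform is commonly built from 8-point 1D transforms applied along the rows and then along the columns, with a transpose in between. The following is a minimal floating-point sketch of that row-column idea, intended only as a behavioural reference for checking results. It is not the pipelined hardware dataflow of the proposed design, and the function names (`dct_1d`, `idct_1d`, `dct_2d`, `idct_2d`) are hypothetical.

```python
import math

N = 8  # block size used throughout the project (8x8)


def _scale(k):
    # Orthonormal DCT-II scaling factor for frequency index k.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x):
    """8-point orthonormal DCT-II of a length-8 sequence."""
    return [
        _scale(k) * sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        )
        for k in range(N)
    ]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    return [
        sum(
            _scale(k) * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for k in range(N)
        )
        for n in range(N)
    ]


def transpose(block):
    """Swap rows and columns of an 8x8 block."""
    return [list(col) for col in zip(*block)]


def _row_column(block, transform_1d):
    # Pass 1: transform every row. Pass 2: transpose, transform again
    # (this processes the columns), then transpose back.
    rows_done = [transform_1d(row) for row in block]
    cols_done = [transform_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)


def dct_2d(block):
    """2D 8x8 DCT via two passes of the 1D DCT."""
    return _row_column(block, dct_1d)


def idct_2d(coeffs):
    """2D 8x8 IDCT via two passes of the 1D IDCT."""
    return _row_column(coeffs, idct_1d)


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]
    print(round(dct_2d(flat)[0][0], 6))  # DC term of a flat block: 8 * 128 = 1024.0
```

For a flat block, all the energy lands in the DC coefficient, which makes a convenient sanity check when comparing a hardware simulation against a software reference.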

Abstract

This study presents a high-performance hardware accelerator for the 2D 8×8 Discrete Cosine Transform (DCT) and its inverse (IDCT). The design optimizes the dataflow of the 8-point 1D DCT/IDCT to exploit the properties of image and video processing workloads. An 8-stage pipelined architecture enables parallel execution and distributes the arithmetic operations evenly across clock cycles, which improves processing speed while reducing per-cycle computational complexity. A notable feature of the accelerator is its multiplication-free approximation of the DCT coefficients: conventional multipliers are replaced with adders and shifters, supported by fixed-point arithmetic and Canonical Signed Digit (CSD) representations. Together, these techniques reduce hardware complexity, power consumption, and latency without compromising transformation accuracy. The architecture has been implemented and validated on a Xilinx Virtex-7 XC7VX330T FPGA. The implementation achieves a maximum operating frequency of 288 MHz and a throughput of 558 Mpixels per second, enough for real-time processing of Full-HD (1920×1080) video at up to 269 frames per second. Each 8×8 block of the 2D DCT/IDCT is processed in only 33 clock cycles, which makes the accelerator well suited to high-speed image and video compression applications.
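To illustrate how a constant multiplication can be replaced by shifts, additions, and subtractions, here is a minimal sketch of CSD recoding followed by a shift-add multiply. The abstract does not state the word length or the coefficient values used in the design, so the 8 fractional bits (`FRAC_BITS`) and the example coefficient cos(π/4) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

FRAC_BITS = 8  # assumed fixed-point precision; not taken from the design


def to_csd(value):
    """Recode a non-negative integer into CSD digits (-1, 0, +1).

    Digits are returned least significant first. No two adjacent
    digits are both nonzero, which is the defining CSD property.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def shift_add_multiply(x, digits):
    """Multiply integer x by the constant encoded in CSD digits.

    Only shifts, additions, and subtractions are used, mirroring
    what an adder/shifter network would do in hardware.
    """
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


def scale_by_coefficient(x, coefficient):
    """Approximate x * coefficient in fixed point without a multiplier.

    The coefficient quantization below uses ordinary arithmetic; in
    hardware that step happens once at design time.
    """
    quantized = round(coefficient * (1 << FRAC_BITS))
    product = shift_add_multiply(x, to_csd(quantized))
    # Round to nearest and drop the fractional bits.
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


if __name__ == "__main__":
    print(to_csd(7))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
    print(scale_by_coefficient(100, math.cos(math.pi / 4)))  # 71
```

The constant 7 shows the benefit: plain binary needs three nonzero bits (4 + 2 + 1), while CSD needs two (8 − 1), so one adder is saved. The approximation error depends on the assumed number of fractional bits.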

Keywords:

DCT, IDCT, parallel transpose, hardware accelerator

NOTE: Please do not submit this abstract to the college without the consent of our team. The abstract may vary based on student requirements.

Block Diagram

Specifications

Tools Used

Xilinx Vivado

MATLAB

Learning Outcomes

  • Basics of Digital Electronics
  • VLSI Design Flow
  • Introduction to Verilog Coding
  • Different modelling styles in Verilog

      o   Data Flow modelling
      o   Structural modelling
      o   Behavioural modelling
      o   Mixed level modelling

  • Introduction to DCT/IDCT Design
  • Understanding of Hardware Accelerators
  • In-Depth Knowledge of the Loeffler Algorithm
  • FPGA Design and Implementation
  • Hardware Resource Management
  • Application to Real-Time Systems
  • Xilinx ISE 14.7 / Xilinx Vivado for Design and Simulation
  • Generation of Netlist
  • Providing Solutions for Real-Time Problems
  • Project Development Skills:

      o   Problem Analysis Skills
      o   Problem Solving Skills
      o   Logical Skills
      o   Designing Skills
      o   Testing Skills
      o   Debugging Skills
      o   Presentation Skills
      o   Thesis Writing Skills
