Objective of Project:

Primary Goals
• Build a scalable system for extracting deep knowledge from large PDF sets.
• Use RAG for dynamic retrieval and CAG for caching to improve speed, accuracy, and efficiency.
• Overcome current limitations like slow response, poor context handling, and high computation costs.

Specific Aims
• PDF Parsing & Chunking: Develop methods to extract and structure PDF content for embedding and semantic search.
• RAG + CAG Integration: Combine retrieval and caching to boost query speed and reduce resource usage.

Expected Outcome
• A high-performance, low-overhead framework for knowledge extraction from unstructured PDFs.

The rapid digital transformation of information has produced a large accumulation of PDF documents that hold valuable knowledge. This project presents a robust PDF Knowledge Extraction System that integrates Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) for intelligent, scalable document querying. Users upload PDF files, which are automatically parsed and segmented into content chunks. The system runs two parallel embedding pipelines: one uses the Google Gemini 1.5 Flash API to generate high-quality embeddings for the RAG model, and the other uses HuggingFace models to generate embeddings for the cache in the CAG framework.
Embeddings from the two pipelines are kept in separate vector stores in
ChromaDB, which supports rapid retrieval and response generation. When a
query arrives, the system first checks the cache for a matching result.
On a cache hit, the stored answer is returned immediately, with only
milliseconds of latency. On a cache miss, the query is processed through
the RAG pipeline and the result is cached for later requests. This hybrid
design targets performance: RAG contributes content awareness, while CAG
contributes efficiency, making the system suitable for knowledge-agent
applications across a wide range of domains.
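
The cache-first query flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the bag-of-words `embed` function stands in for the Gemini and HuggingFace embedding models, plain Python lists stand in for the two ChromaDB vector stores, and the `HybridAnswerer` class, its `threshold` parameter, and the status strings are hypothetical names introduced only for this example. The retrieval step returns the best-matching chunk directly, whereas a real RAG pipeline would pass the retrieved chunks to a language model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class HybridAnswerer:
    """Hypothetical cache-first answerer: check the cache, fall back to retrieval."""

    def __init__(self, chunks, threshold=0.9):
        # Stand-in for the RAG vector store: (embedding, chunk text) pairs.
        self.chunks = [(embed(c), c) for c in chunks]
        # Stand-in for the CAG store: (query embedding, cached answer) pairs.
        self.cache = []
        # Assumed similarity cutoff for treating a query as a cache hit.
        self.threshold = threshold

    def _retrieve(self, query_vec):
        # Return the chunk most similar to the query.
        best = max(self.chunks, key=lambda item: cosine(query_vec, item[0]))
        return best[1]

    def answer(self, query):
        query_vec = embed(query)
        # Cache hit: a sufficiently similar query was answered before.
        for cached_vec, cached_answer in self.cache:
            if cosine(query_vec, cached_vec) >= self.threshold:
                return cached_answer, "cache_hit"
        # Cache miss: run retrieval, then store the result for later requests.
        result = self._retrieve(query_vec)
        self.cache.append((query_vec, result))
        return result, "cache_miss"


bot = HybridAnswerer(
    ["ChromaDB stores the embeddings.", "Gemini generates embeddings for RAG."]
)
first_answer, first_status = bot.answer("Where are embeddings stored?")
second_answer, second_status = bot.answer("Where are embeddings stored?")
```

In this sketch the first call misses the cache and goes through retrieval, and the repeated query is served from the cache. The similarity threshold controls how closely a new query must match an earlier one before the cached answer is reused; the value 0.9 is an arbitrary choice for illustration.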
