The primary objective of this project is to develop an AI-driven data pipeline system capable of automating schema evolution and data processing tasks. The system aims to leverage advanced reasoning techniques to detect schema changes and generate necessary updates without manual intervention. It seeks to enhance the efficiency of data ingestion, validation, and transformation by integrating AI-based tools like large language models. The project also aims to improve data quality by ensuring consistent schema validation and seamless updates throughout the pipeline. Another key objective is to provide scalability, allowing the system to adapt to increasing data complexity and volume. Additionally, the system is designed to notify stakeholders about schema changes and updates, ensuring transparency and real-time monitoring. Ultimately, the project intends to create a more reliable, efficient, and adaptive data pipeline infrastructure.
The system applies AI-based reasoning to automate data ingestion and schema evolution, so that schema changes are processed without manual work. Data enters the pipeline as batches ingested into the Bronze Layer, which is the raw landing zone. Each batch is then validated to confirm that it conforms to the required schema. If the schema does not match, the system isolates the batch. The Reasoning Agent performs semantic analysis and generates DDL scripts to manage the schema evolution. The Gold Layer receives data only after successful validation, since the system performs quality control and schema validation before applying DDL and creating business logic. The system uses LLM-based advanced reasoning to handle schema changes and to process data across the Bronze, Silver, and Gold layers. The Gold Layer holds the final curated data, including the updated views and materialized data, after data validation and quality checks. The system also incorporates human feedback and notifies stakeholders of schema changes, which keeps the data processing pipeline accurate and reliable.
Keywords: Agentic AI, Cloud Data Pipelines, Schema Evolution, Automated Data Management, AI-Driven Reasoning, Data Validation, Semantic Analysis, Data Transformation.
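The Bronze Layer flow described in the abstract (validate each batch, isolate it on a schema mismatch, and propose DDL) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `orders` table, `EXPECTED_SCHEMA`, the `SQL_TYPES` mapping, and the `propose_ddl` function, which is a simple rule-based stand-in for the Reasoning Agent and does not call an LLM. The sketch only detects newly added columns; missing columns and type changes are not handled, and the `ALTER TABLE ... ADD COLUMN` syntax may need adjusting for a specific warehouse.

```python
from dataclasses import dataclass, field

# Hypothetical expected schema for one table: column name -> SQL type.
EXPECTED_SCHEMA = {"order_id": "BIGINT", "amount": "DOUBLE"}

# Assumed mapping from Python value types to SQL column types.
SQL_TYPES = {int: "BIGINT", float: "DOUBLE", str: "STRING", bool: "BOOLEAN"}


@dataclass
class BronzeResult:
    accepted: list = field(default_factory=list)     # batches that matched the schema
    quarantined: list = field(default_factory=list)  # batches isolated on mismatch
    ddl: list = field(default_factory=list)          # proposed DDL statements


def find_new_columns(batch, schema):
    """Return {column: sql_type} for columns in the batch that the schema lacks."""
    new_cols = {}
    for row in batch:
        for name, value in row.items():
            if name not in schema and name not in new_cols:
                new_cols[name] = SQL_TYPES.get(type(value), "STRING")
    return new_cols


def propose_ddl(table, new_cols):
    """Stand-in for the Reasoning Agent: one ALTER TABLE per new column."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};"
        for name, sql_type in new_cols.items()
    ]


def ingest(table, batches, schema):
    """Validate each batch; quarantine mismatches and collect proposed DDL."""
    result = BronzeResult()
    for batch in batches:
        new_cols = find_new_columns(batch, schema)
        if new_cols:
            result.quarantined.append(batch)
            result.ddl.extend(propose_ddl(table, new_cols))
        else:
            result.accepted.append(batch)
    return result


if __name__ == "__main__":
    batches = [
        [{"order_id": 1, "amount": 9.5}],
        [{"order_id": 2, "amount": 3.0, "coupon": "SPRING"}],
    ]
    result = ingest("orders", batches, EXPECTED_SCHEMA)
    print(result.ddl)  # ['ALTER TABLE orders ADD COLUMN coupon STRING;']
```

In this sketch the proposed DDL is only collected, not executed, which leaves room for the human feedback step the abstract mentions before a quarantined batch is reprocessed.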

HARDWARE REQUIREMENTS:
SOFTWARE SYSTEM CONFIGURATION: