A Document Retrieval System Using Retrieval-Augmented Generation
DOI:
https://doi.org/10.64751/
Abstract
In the age of exponential information growth, effective document retrieval has become essential for knowledge-based systems. This project presents a Document Retrieval System using Retrieval-Augmented Generation (RAG), an architecture that combines traditional retrieval mechanisms with the generative power of transformer-based language models. RAG enhances the retrieval process by dynamically integrating external documents into the response-generation pipeline, allowing the system to produce more accurate and contextually relevant answers. The system first retrieves relevant passages from a pre-indexed document corpus using vector-similarity search (FAISS). The retrieved passages are then passed to a generative language model, which synthesises the final output from both the input query and the retrieved content. This hybrid approach grounds responses in the source documents while maintaining the flexibility of generative models, and it reduces hallucinations by conditioning generation on retrieved context. The solution is implemented in Python using LangChain, FAISS, and transformer-based language models, and is particularly effective for question answering, summarisation, and knowledge-base augmentation. Testing and evaluation covered document processing, semantic retrieval, response generation, scalability, and reliability, and confirmed that the system retrieves relevant information accurately and generates context-aware responses. The approach demonstrates improved performance over traditional retrieval-only or generation-only models, offering a scalable and intelligent document-access system suitable for enterprise, academic, and personal use.
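The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, a toy bag-of-words vector and a brute-force cosine-similarity scan stand in for the learned embeddings and FAISS index, and the final call to a generative model is left out, so the sketch stops at the prompt that would be sent to it. The function names (embed, cosine, retrieve, build_prompt), the sample corpus, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (token -> count).

    Assumption: stands in for a learned embedding model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, corpus, k=2):
    """Return the k passages most similar to the query.

    Assumption: a brute-force scan replaces the FAISS index lookup.
    Ties are broken by original passage order.
    """
    q = embed(query)
    scored = [(cosine(q, embed(p)), i) for i, p in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [corpus[i] for _, i in scored[:k]]


def build_prompt(query, passages):
    """Condition generation on retrieved context by placing it in the prompt."""
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical three-passage corpus, for illustration only.
corpus = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "LangChain provides building blocks for applications that call language models.",
    "Creative Commons licenses describe how a work may be shared.",
]
query = "Which library performs similarity search over vectors?"

top = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

In a full system the prompt would be passed to a transformer-based language model, and the scan in retrieve would be replaced by a nearest-neighbour query against a pre-built vector index, which is what keeps retrieval fast as the corpus grows.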
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.






