RAG systems use retrieved documents for context, but they don't reveal how each retrieved chunk influenced the response, so users can't verify faithfulness. Using a RAG system with sources we control, this project instructs the generation LLM to rely only on the retrieved context, letting us evaluate how the LLM uses the context it receives when producing a response.
The embedding model is all-MiniLM-L6-v2 from sentence-transformers. The system was tested with four uploaded documents. If a question falls outside their scope, the model is designed to admit it cannot answer; this behavior is part of the controlled experiment and is enforced by a restricted prompting template.
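A restricted prompting template of this kind can be sketched as follows. This is a minimal illustration, not the project's actual template; the function name and wording are assumptions.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that confines the LLM to the retrieved context.

    Each chunk is numbered so the answer can cite its sources; the
    instruction tells the model to refuse out-of-scope questions.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you cannot answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The same template is reused for every question, which is what makes the refusal behavior a controlled property of the experiment rather than of the model.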
The current documents include:
The system can ingest any .txt or .md file placed in the documents folder.
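The ingestion step can be sketched roughly like this. The function name and return shape are illustrative assumptions, not the repository's actual code.

```python
from pathlib import Path


def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt or .md file in the documents folder.

    Returns a mapping of filename -> file contents; other file
    types are ignored.
    """
    docs: dict[str, str] = {}
    for path in Path(folder).iterdir():
        if path.suffix in {".txt", ".md"}:
            docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```

Dropping a new `.txt` or `.md` file into the folder is all that is needed to add it to the corpus.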
The full implementation is available in my GitHub repository: siliconshells.
This section shows which stored text chunks were retrieved as context for the LLM’s answer generation. The system retrieves the top three chunks and displays the file each chunk originated from.
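Top-k retrieval by cosine similarity can be sketched as below. In the real system the vectors come from all-MiniLM-L6-v2; here a bag-of-words counter stands in for the embedding model so the sketch is self-contained, and all names are illustrative.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for all-MiniLM-L6-v2: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(question: str, chunks: dict[str, list[str]], k: int = 3):
    """Return the k most similar (file, chunk, score) triples,
    so the UI can show which file each retrieved chunk came from."""
    q = embed(question)
    scored = [
        (fname, chunk, cosine(q, embed(chunk)))
        for fname, file_chunks in chunks.items()
        for chunk in file_chunks
    ]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]
```

Keeping the filename alongside each chunk is what allows the display step to attribute every piece of context to its source document.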
Token-Level Saliency measures how much each individual token (word) contributes to the model’s prediction. It compares the question and the answer to determine influence at a fine-grained, per-token level.
Lists sentences with very low attribution scores, which are likely not sourced from the retrieved documents and may have been hallucinated by the model.
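The flagging step itself is a threshold filter over the per-sentence attribution scores. A minimal sketch, assuming scores are already computed and normalized to [0, 1]; the function name and threshold value are illustrative.

```python
def flag_unsupported(sentence_scores: dict[str, float],
                     threshold: float = 0.2) -> list[str]:
    """Return answer sentences whose attribution score falls below
    the threshold, i.e. sentences likely not grounded in the
    retrieved documents (possible hallucinations)."""
    return [s for s, score in sentence_scores.items() if score < threshold]
```

The threshold trades precision for recall: lowering it flags only the most weakly attributed sentences, raising it flags more borderline cases.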