Explainable RAG

Explainable Retrieval Augmented Generation (X-RAG)

Explainable RAG: Understanding How LLMs Use Context

Motivation and Why It Matters

RAG systems ground their answers in retrieved documents, but they do not show how each retrieved chunk influences the response, so users cannot verify that an answer is faithful to its sources. This project runs a RAG system over sources we control and instructs the generation LLM to answer using only the retrieved context, which lets us evaluate how the LLM uses the context it is given.

How It Works

Retrieve top-3 chunks using FAISS (Facebook AI Similarity Search)
Generate dense embeddings using all-MiniLM-L6-v2 from sentence-transformers
Generate responses using OpenAI's gpt-4.1-mini
Attribute sentences to sources via cosine similarity
Compute token-level saliency via leave-one-out embedding shifts (all-MiniLM-L6-v2) to reveal reliance on the source
Flag low-support sentences for hallucination detection
Score answer quality with Ragas (faithfulness, answer relevancy, context precision) using gpt-4o-mini as the LLM judge

Context

The system was tested with four uploaded documents. If a question falls outside their scope, the model is designed to say that it cannot answer. This behavior is part of the controlled experiment and is enforced by a restrictive prompt template.
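
As a rough illustration of what a restrictive prompt template could look like, here is a minimal sketch. The wording of the instructions, the refusal string, the function name `build_prompt`, and the chunk keys `source` and `text` are all assumptions made for this example, not the project's actual template.

```python
# Hypothetical refusal message; the project's real wording is not shown in this document.
REFUSAL = "I cannot answer this from the provided documents."

def build_prompt(question, chunks):
    """Build a prompt that restricts the model to the retrieved context.

    Each chunk is a dict with the (assumed) keys 'source' and 'text'.
    """
    context = "\n\n".join(
        f"[{i}] ({c['source']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        "If the context does not contain the answer, reply exactly: "
        f"\"{REFUSAL}\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "What is GradCAM?",
    [{"source": "gradcam.md", "text": "Grad-CAM highlights image regions."}],
)
print(prompt)
```

Numbering the chunks and naming their source files in the prompt is one possible design choice; it keeps the context easy to trace when comparing the answer against its sources later.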

The current documents include:

An article on GradCAM and the original Grad-CAM paper (images and formulas removed)
A Hacker’s Mind — A personal reflection on Bruce Schneier’s book
A reflection on Emerging Techniques in XAI by Mathew et al.

The system can ingest any .txt or .md file placed in the documents folder. The full implementation is available in my GitHub repository: siliconshells.

Some Questions

What is Artificial Intelligence?
Tell me about A Hacker's Mind.
What is XAI?
What is GradCAM?
What is a CNN?
What is mechanistic interpretability?

Retrieved Documents

This section shows which stored text chunks were retrieved as context for the LLM’s answer generation. The system retrieves the top three chunks and displays the file each chunk originated from.
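
The retrieval step can be sketched as a brute-force nearest-neighbor search. The project uses FAISS over all-MiniLM-L6-v2 embeddings; the sketch below swaps in plain Python and toy 3-dimensional vectors so it runs without dependencies. The file names, the chunk keys, and the function `retrieve_top_k` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query_vec, chunks, k=3):
    """Return the k chunks most similar to the query, best first.

    Each chunk is a dict with the (assumed) keys 'source', 'text', 'vector'.
    """
    scored = [(cosine(query_vec, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"source": c["source"], "text": c["text"], "score": s}
        for s, c in scored[:k]
    ]

# Toy vectors stand in for real sentence embeddings.
chunks = [
    {"source": "gradcam.md", "text": "Grad-CAM highlights image regions.", "vector": [0.9, 0.1, 0.0]},
    {"source": "hackers_mind.md", "text": "Hacks exploit system rules.", "vector": [0.0, 0.2, 0.9]},
    {"source": "xai.md", "text": "XAI explains model decisions.", "vector": [0.6, 0.7, 0.1]},
    {"source": "gradcam_paper.txt", "text": "Gradients weight feature maps.", "vector": [0.8, 0.3, 0.1]},
]

results = retrieve_top_k([1.0, 0.2, 0.0], chunks, k=3)
for hit in results:
    print(f"{hit['source']}: {hit['score']:.3f}")
```

With these toy vectors the unrelated chunk from `hackers_mind.md` is the one left out of the top three. A vector index such as FAISS serves the same purpose as the sorted scan here, only faster on large collections.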

Token-Level Saliency

Token-Level Saliency measures how much each individual token (word) contributes to the answer's meaning. It uses a leave-one-out approach: each token is removed from the answer one at a time, and the shift in the sentence embedding measures how much that token mattered.
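
The leave-one-out idea can be sketched as follows. The project embeds text with all-MiniLM-L6-v2; here a tiny hand-made lookup table (`TOY_WORD_VECTORS`, purely hypothetical) plays that role so the example runs on its own. The saliency of a token is taken as one minus the cosine similarity between the full-answer embedding and the embedding with that token removed, which is one reasonable way to measure the shift and not necessarily the exact formula the project uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical 2-dimensional word vectors standing in for a real embedding model.
TOY_WORD_VECTORS = {
    "grad-cam": (3.0, 0.0),
    "highlights": (0.0, 2.0),
    "regions": (1.0, 1.0),
}

def toy_embed(text):
    """Sum the toy word vectors; unknown words (e.g. 'the') contribute nothing."""
    vec = [0.1, 0.1]  # small offset so the result is never the zero vector
    for word in text.lower().split():
        wx, wy = TOY_WORD_VECTORS.get(word, (0.0, 0.0))
        vec[0] += wx
        vec[1] += wy
    return vec

def token_saliency(answer, embed):
    """Leave-one-out saliency: embedding shift caused by removing each token."""
    tokens = answer.split()
    full = embed(answer)
    scores = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        shift = max(0.0, 1.0 - cosine(full, embed(reduced)))
        scores.append((tok, shift))
    return scores

saliency = token_saliency("Grad-CAM highlights the regions", toy_embed)
for tok, score in saliency:
    print(f"{tok:12s} {score:.4f}")
```

In this toy setup, removing "Grad-CAM" moves the embedding the most, while removing "the" leaves it unchanged, which matches the intuition that content words carry more of the answer's meaning than function words.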

Ragas Evaluation

The first four tabs explain how the answer was constructed. This tab grades how good the answer is, using Ragas's reference-free LLM-judged metrics. After the answer renders, the page asynchronously requests scores from the server (so the answer never waits on them).

Three scores are computed by an LLM judge (gpt-4o-mini):

Faithfulness: the LLM extracts atomic claims from the answer and checks each one against the retrieved chunks. The score is the share of claims that are supported. It is a direct, LLM-judged counterpart to the heuristic flagging shown in the next tab.
Answer Relevancy: the LLM reverse-engineers candidate questions that the answer would address, embeds them, and compares them to the original question via cosine similarity. This penalizes answers that are off-topic or evasive.
Context Precision: for each retrieved chunk, the LLM judges whether it is actually useful for answering the question. The score reflects retrieval quality rather than generation quality. The reference-free variant is used so that any user-typed question can be scored.
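
To make the arithmetic behind the first two scores concrete, here is a small sketch. The claim verdicts and similarity values are made-up inputs standing in for what the LLM judge and the embedding model would produce, and averaging the similarities is an assumption about how the per-question values are combined. The function names are illustrative and are not Ragas APIs.

```python
def faithfulness(claim_verdicts):
    """Share of claims judged as supported (True) by the retrieved chunks."""
    if not claim_verdicts:
        return 0.0  # assumption: an answer with no extractable claims scores 0
    return sum(1 for supported in claim_verdicts if supported) / len(claim_verdicts)

def answer_relevancy(similarities):
    """Mean cosine similarity between generated questions and the original one."""
    if not similarities:
        return 0.0
    return sum(similarities) / len(similarities)

# Made-up judge output: 3 of 4 extracted claims are supported by the context.
verdicts = [True, True, False, True]
faith = faithfulness(verdicts)
print(f"faithfulness = {faith:.2f}")

# Made-up similarities between three generated questions and the user's question.
similarities = [0.9, 0.8, 0.7]
relevancy = answer_relevancy(similarities)
print(f"answer relevancy = {relevancy:.2f}")
```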

Scores are cached in Redis for 24 hours per question, so repeating a question within that window triggers no new API calls. Reference-based metrics such as context_recall are excluded from the live tab because they require curated ground-truth answers.
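
The caching behavior can be sketched as a read-through cache with a 24-hour expiry. The helper below only relies on `get` and `setex`, which a redis-py client provides; an in-memory stand-in (`FakeCache`) is used so the example runs without a Redis server. The key format, the question normalization, and the function names are assumptions for illustration.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours

def cache_key(question):
    """Derive a stable key from the question (normalization is an assumption)."""
    normalized = " ".join(question.lower().split())
    return "ragas:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_scores(question, cache, compute_scores):
    """Return cached scores if present; otherwise compute, store, and return them."""
    key = cache_key(question)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no judge call needed
    scores = compute_scores(question)
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(scores))
    return scores

class FakeCache:
    """In-memory stand-in exposing the get/setex calls used above."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl_seconds, value):
        self.store[key] = value  # expiry is ignored in this stand-in

judge_calls = []

def fake_judge(question):
    """Pretend LLM judge that records each call and returns made-up scores."""
    judge_calls.append(question)
    return {"faithfulness": 0.75, "answer_relevancy": 0.8, "context_precision": 1.0}

cache = FakeCache()
first = get_scores("What is XAI?", cache, fake_judge)
second = get_scores("what is  xai?", cache, fake_judge)
print(len(judge_calls))
```

Because the second question normalizes to the same key as the first, the judge is called only once and both lookups return the same scores.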

Low-Support Sentences (Possible Hallucinations)

This tab lists sentences with very low attribution scores. Such sentences are likely not sourced from the retrieved documents and may have been hallucinated by the model.
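
Sentence attribution and low-support flagging can be sketched together: each answer sentence is compared with every retrieved chunk, the best-matching chunk becomes its source, and a sentence whose best score falls below a threshold is flagged. The project computes similarities on all-MiniLM-L6-v2 embeddings; the sketch substitutes a bag-of-words embedding so it runs standalone. The threshold of 0.5, the sentence splitter, and all names are assumptions.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Bag-of-words counts standing in for a dense sentence embedding."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute_sentences(answer, chunks, embed, threshold=0.5):
    """Attribute each sentence to its best-matching chunk and flag weak matches.

    Assumes at least one chunk; each chunk has the (assumed) keys 'source' and 'text'.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    chunk_vecs = [(c["source"], embed(c["text"])) for c in chunks]
    report = []
    for sentence in sentences:
        vec = embed(sentence)
        source, score = max(
            ((src, cosine(vec, cv)) for src, cv in chunk_vecs),
            key=lambda pair: pair[1],
        )
        report.append({
            "sentence": sentence,
            "source": source,
            "score": score,
            "low_support": score < threshold,  # candidate hallucination
        })
    return report

chunks = [
    {"source": "gradcam.md", "text": "Grad-CAM uses gradients to highlight important image regions."},
    {"source": "xai.md", "text": "XAI methods explain model decisions to users."},
]
answer = "Grad-CAM uses gradients to highlight image regions. It was invented in 1850 by sailors."
report = attribute_sentences(answer, chunks, toy_embed)
for row in report:
    flag = "LOW SUPPORT" if row["low_support"] else "ok"
    print(f"{row['score']:.2f} {row['source']:12s} {flag:12s} {row['sentence']}")
```

The first sentence closely matches the Grad-CAM chunk and passes, while the second shares no vocabulary with either chunk and is flagged. A low score only signals weak support; it is a heuristic and does not prove that the sentence is false.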

Technical Features

FAISS retrieval
Sentence attribution
Token saliency
Hallucination detection
Ragas evaluation (faithfulness, answer relevancy, context precision)
Flask + Gunicorn + Nginx
Redis caching
Docker + docker-compose
CI tests
