RAG systems use retrieved documents for context, but they don't reveal how each retrieved chunk influenced the response, so users can't verify faithfulness. Using a RAG system with sources we control, this project instructs the generation LLM to rely only on the retrieved context, letting us evaluate how the LLM uses the context it receives when producing a response.
The embedding model is all-MiniLM-L6-v2 from sentence-transformers. The system was tested with four uploaded documents. If a question falls outside their scope, the model is designed to admit it cannot answer; this behavior is part of the controlled experiment and is enforced by a restricted prompting template.
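A restricted prompting template of this kind can be sketched as follows. This is a minimal illustration, not the project's actual template; the function name and wording are assumptions.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that confines the LLM to the retrieved context.

    Each chunk is numbered so the answer can cite its sources; the
    instruction tells the model to refuse out-of-scope questions.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you cannot answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The same template is reused for every question, which is what makes the refusal behavior a controlled property of the experiment rather than of the model.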
The current documents include:
The system can ingest any .txt or .md file placed in the documents folder.
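The ingestion step can be sketched roughly like this. The function name and return shape are illustrative assumptions, not the repository's actual code.

```python
from pathlib import Path


def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt or .md file in the documents folder.

    Returns a mapping of filename -> file contents; other file
    types are ignored.
    """
    docs: dict[str, str] = {}
    for path in Path(folder).iterdir():
        if path.suffix in {".txt", ".md"}:
            docs[path.name] = path.read_text(encoding="utf-8")
    return docs
```

Dropping a new `.txt` or `.md` file into the folder is all that is needed to add it to the corpus.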
The full implementation is available in my GitHub repository: siliconshells.
This section shows which stored text chunks were retrieved as context for the LLM’s answer generation. The system retrieves the top three chunks and displays the file each chunk originated from.
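Top-k retrieval by cosine similarity can be sketched as below. In the real system the vectors come from all-MiniLM-L6-v2; here a bag-of-words counter stands in for the embedding model so the sketch is self-contained, and all names are illustrative.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for all-MiniLM-L6-v2: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(question: str, chunks: dict[str, list[str]], k: int = 3):
    """Return the k most similar (file, chunk, score) triples,
    so the UI can show which file each retrieved chunk came from."""
    q = embed(question)
    scored = [
        (fname, chunk, cosine(q, embed(chunk)))
        for fname, file_chunks in chunks.items()
        for chunk in file_chunks
    ]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]
```

Keeping the filename alongside each chunk is what allows the display step to attribute every piece of context to its source document.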
Token-Level Saliency measures how much each individual token (word) contributes to the model’s prediction. It compares the question and the answer to determine influence at a fine-grained, per-token level.
Lists sentences with very low attribution scores, which are likely not sourced from the retrieved documents and may have been hallucinated by the model.
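The flagging step itself is a threshold filter over the per-sentence attribution scores. A minimal sketch, assuming scores are already computed and normalized to [0, 1]; the function name and threshold value are illustrative.

```python
def flag_unsupported(sentence_scores: dict[str, float],
                     threshold: float = 0.2) -> list[str]:
    """Return answer sentences whose attribution score falls below
    the threshold, i.e. sentences likely not grounded in the
    retrieved documents (possible hallucinations)."""
    return [s for s, score in sentence_scores.items() if score < threshold]
```

The threshold trades precision for recall: lowering it flags only the most weakly attributed sentences, raising it flags more borderline cases.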