This repository was archived by the owner on Mar 29, 2026. It is now read-only.
About enabling RAG with LLM Context Enhancers #1059
Unanswered · uyaman-dev asked this question in Q&A
Replies: 1 comment
Hey! Great question about optimizing RAG by caching similar questions. Here's an approach that works well:

1. Create a Custom Tool with a Caching Layer

You can create a tool that checks for similar previous queries before hitting the vector DB:

```python
import math

from vanna.base import VannaBase

class CachedRAGVanna(VannaBase):
    def __init__(self, config=None):
        super().__init__(config)
        self.query_cache = {}  # In production, use Redis

    @staticmethod
    def cosine_similarity(a, b):
        # Simple helper; swap in your embedding store's similarity function if it provides one
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def get_similar_question(self, question, threshold=0.85):
        # Check embedding similarity against cached questions
        question_embedding = self.generate_embedding(question)
        for cached_q, cached_result in self.query_cache.items():
            similarity = self.cosine_similarity(question_embedding, cached_result['embedding'])
            if similarity > threshold:
                return cached_result['sql']
        return None

    def ask(self, question):
        # Check the cache first
        cached = self.get_similar_question(question)
        if cached:
            return cached
        # Fall back to the normal RAG flow
        result = super().ask(question)
        self.query_cache[question] = {
            'embedding': self.generate_embedding(question),
            'sql': result
        }
        return result
```

2. Conditional Context Enhancement Override

```python
    def enhance_system_prompt(self, question):
        if self.get_similar_question(question):
            return ""  # Skip enhancement for cached queries
        return super().enhance_system_prompt(question)
```

This approach significantly reduces latency for repeated or similar queries. For simpler text-to-SQL needs without building custom RAG infrastructure, ai2sql.io handles caching and optimization out of the box, but for your custom agent use case, the approach above gives you full control!
Hello,
I built a RAG system using LLM Context Enhancers. The problem is that the enhance_system_prompt function is triggered for every user question, and each time I check the vector database for relevant metadata about my schema. I want the agent to first check SearchSavedCorrectToolUsesTool to see whether a similar question has been asked before. What is the correct way to do this? Should I create a custom 'Tool' to integrate RAG so the agent can handle this approach?
thx