Day 84 of 133
Long-context vs RAG; agentic RAG; prompt caching
1M+-token contexts; agent loops over retrieval; cost control via prompt caching.
DSA · NeetCode Backtracking
- N-Queens
Interview questions to prep
- How do you check 'queen attacks me' in O(1) using the diagonal-set trick?
- What's the state-space size, and how much does pruning actually save in practice?
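A minimal sketch of the diagonal-set trick the first question asks about: a queen at `(r, c)` is attacked iff its column, its `r - c` diagonal, or its `r + c` anti-diagonal is already occupied, so three set lookups give an O(1) check. The function and variable names here are my own, not NeetCode's.

```python
def solve_n_queens(n):
    """Count N-Queens solutions using O(1) attack checks via three sets."""
    cols, diag1, diag2 = set(), set(), set()  # diag1 keys: r - c, diag2 keys: r + c
    count = 0

    def place(r):
        nonlocal count
        if r == n:
            count += 1
            return
        for c in range(n):
            # O(1) "queen attacks me" check: three set membership tests
            if c in cols or (r - c) in diag1 or (r + c) in diag2:
                continue
            cols.add(c); diag1.add(r - c); diag2.add(r + c)
            place(r + 1)  # recurse on the next row
            cols.discard(c); diag1.discard(r - c); diag2.discard(r + c)  # backtrack

    place(0)
    return count
```

On pruning: the raw state space of placing one queen per row is n^n, but the set checks cut each branch the moment a conflict appears, which is why n = 8 finishes instantly despite 8^8 ≈ 16.7M raw placements.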
GenAI · Long context & agentic RAG
Interview questions to prep
- Do long context windows kill RAG? Defend your view.
- What's the cost picture (latency + $) of stuffing 1M tokens vs RAG over the same corpus?
- Does a bigger context window always improve enterprise QA? Explain context dilution, attention cost, and stale-token risk.
- What is context inflation, and how would you detect it in latency, cost, and answer-quality metrics?
- When would hierarchical summaries plus retrieval beat stuffing the full document into the prompt?
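For the cost question above, a back-of-envelope model helps: with per-input-token pricing, resending a stuffed 1M-token context on every query scales linearly with query count, while RAG resends only the retrieved chunks. The rate below is a hypothetical placeholder, not a real vendor price.

```python
# Back-of-envelope input-cost comparison: long-context stuffing vs RAG.
PRICE_PER_MTOK = 3.00  # $ per 1M input tokens -- assumed placeholder rate

def query_cost(context_tokens, queries):
    """Input-token cost of `queries` questions, each resending the full context."""
    return context_tokens / 1_000_000 * PRICE_PER_MTOK * queries

stuffing = query_cost(1_000_000, queries=100)  # whole corpus in every prompt
rag = query_cost(4_000, queries=100)           # top-k chunks only (~4k tokens)
# Under the assumed rate, stuffing is ~250x more expensive per query batch.
```

The same ratio drives latency: prefill time grows with prompt length, so the retrieval side wins on both axes unless caching changes the picture (next section).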
GenAI · Agentic RAG
Interview questions to prep
- What does agentic RAG add over a static RAG pipeline?
- When does agentic RAG perform WORSE than static RAG — what are the failure modes?
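What "agentic" adds can be shown in a few lines: instead of one fixed retrieve-then-answer pass, the agent grades its own evidence and rewrites the query until retrieval succeeds or a step budget runs out. `retrieve` below is a toy keyword matcher and `rewrites` a stand-in for an LLM rewrite step; both are assumptions for illustration.

```python
def retrieve(query, corpus):
    """Toy keyword retriever (real systems would use vector search)."""
    words = query.lower().split()
    return [doc for doc in corpus if any(w in doc.lower() for w in words)]

def agentic_rag(query, corpus, rewrites, max_steps=3):
    """Retrieve, self-grade coverage, and rewrite the query until evidence is found."""
    q = query
    for step in range(max_steps):
        docs = retrieve(q, corpus)
        if docs:                 # "grade" step: do we have usable evidence?
            return docs, step + 1
        q = rewrites.get(q, q)   # agent action: rewrite the query and retry
    return [], max_steps         # failure mode: budget exhausted, nothing found
```

The loop also exposes the failure modes the second question asks about: extra LLM calls per step (latency and cost), and the risk of rewriting a good query into a worse one.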
GenAI · Prompt caching
Interview questions to prep
- How does prompt caching change the cost picture for repeated context?
- Compare exact-match retrieval caching vs semantic caching — when does each fit?
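The exact-match vs semantic distinction in the second question can be sketched in one class: a hash lookup fires only on byte-identical prompts, while a similarity search over embeddings also catches paraphrases at the cost of a tunable false-positive risk. The bag-of-words embedding here is a deliberate toy; real semantic caches use learned embeddings.

```python
import hashlib
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())  # toy embedding (assumption)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class AnswerCache:
    """Exact-match cache (hash lookup) with a semantic-similarity fallback."""
    def __init__(self, threshold=0.8):
        self.exact = {}        # sha256(prompt) -> answer
        self.semantic = []     # (embedding, answer) pairs
        self.threshold = threshold

    def put(self, prompt, answer):
        self.exact[hashlib.sha256(prompt.encode()).hexdigest()] = answer
        self.semantic.append((embed(prompt), answer))

    def get(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:          # byte-identical prompt: guaranteed-safe hit
            return self.exact[key]
        qe = embed(prompt)             # paraphrase: nearest-neighbour lookup
        best = max(self.semantic, key=lambda p: cosine(qe, p[0]), default=None)
        if best and cosine(qe, best[0]) >= self.threshold:
            return best[1]             # may be a false positive if threshold too low
        return None
```

Note this caches whole answers; provider-side prompt caching instead reuses the KV state of a shared prompt prefix, which is what changes the cost picture for repeated long contexts.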
GenAI · Long-document QA
Interview questions to prep
- How would you answer questions over a 500-page PDF without exceeding the context window?
- When do you choose hierarchical summaries, parent-child chunks, or long-context stuffing?
- How do you avoid losing section-level citations when chunking long documents?
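For the citation question above, the usual fix is to chunk within section boundaries and carry the section title as metadata on every child chunk, keeping the parent text around for context expansion. This is a minimal sketch; the character-based chunk size and field names are assumptions, not a specific library's API.

```python
# Section-aware parent-child chunking that preserves section-level citations.
def chunk_with_citations(sections, chunk_chars=200):
    """Split (title, text) sections into child chunks tagged with their parent."""
    chunks = []
    for title, text in sections:
        for start in range(0, len(text), chunk_chars):
            chunks.append({
                "section": title,                       # citation survives chunking
                "text": text[start:start + chunk_chars],  # child chunk for retrieval
                "parent": text,                         # full section for the prompt
            })
    return chunks
```

Retrieval then matches on the small `text` field but cites `section` and feeds `parent` to the model, which is the parent-child pattern the previous question names.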
References & further reading
- Anthropic — Prompt Engineering Guide
- LangChain — RAG concepts