Day 84 of 133
Long-context vs RAG; agentic RAG; prompt caching
1M+-token contexts; agent loops over retrieval; cost control via prompt caching.
DSA · NeetCode Backtracking
- N-Queens
Interview questions to prep
- How do you check 'queen attacks me' in O(1) using the diagonal-set trick?
- What's the state-space size, and how much does pruning actually save in practice?
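A minimal sketch of the diagonal-set trick the first question asks about: a queen at `(r, c)` is attacked iff its column, its `r - c` diagonal, or its `r + c` anti-diagonal is already occupied, so three set lookups give an O(1) check. The function and variable names here are my own, not NeetCode's.

```python
def solve_n_queens(n):
    """Count N-Queens solutions using O(1) attack checks via three sets."""
    cols, diag1, diag2 = set(), set(), set()  # diag1 keys: r - c, diag2 keys: r + c
    count = 0

    def place(r):
        nonlocal count
        if r == n:
            count += 1
            return
        for c in range(n):
            # O(1) "queen attacks me" check: three set membership tests
            if c in cols or (r - c) in diag1 or (r + c) in diag2:
                continue
            cols.add(c); diag1.add(r - c); diag2.add(r + c)
            place(r + 1)  # recurse on the next row
            cols.discard(c); diag1.discard(r - c); diag2.discard(r + c)  # backtrack

    place(0)
    return count
```

On pruning: the raw state space of placing one queen per row is n^n, but the set checks cut each branch the moment a conflict appears, which is why n = 8 finishes instantly despite 8^8 ≈ 16.7M raw placements.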
GenAI · Long context & agentic RAG
Interview questions to prep
- Do long context windows kill RAG? Defend your view.
- What's the cost picture (latency + $) of stuffing 1M tokens vs RAG over the same corpus?
- Does a bigger context window always improve enterprise QA? Explain context dilution, attention cost, and stale-token risk.
- What is context inflation, and how would you detect it in latency, cost, and answer-quality metrics?
- When would hierarchical summaries plus retrieval beat stuffing the full document into the prompt?
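For the cost question above, a back-of-envelope model helps: with per-input-token pricing, resending a stuffed 1M-token context on every query scales linearly with query count, while RAG resends only the retrieved chunks. The rate below is a hypothetical placeholder, not a real vendor price.

```python
# Back-of-envelope input-cost comparison: long-context stuffing vs RAG.
PRICE_PER_MTOK = 3.00  # $ per 1M input tokens -- assumed placeholder rate

def query_cost(context_tokens, queries):
    """Input-token cost of `queries` questions, each resending the full context."""
    return context_tokens / 1_000_000 * PRICE_PER_MTOK * queries

stuffing = query_cost(1_000_000, queries=100)  # whole corpus in every prompt
rag = query_cost(4_000, queries=100)           # top-k chunks only (~4k tokens)
# Under the assumed rate, stuffing is ~250x more expensive per query batch.
```

The same ratio drives latency: prefill time grows with prompt length, so the retrieval side wins on both axes unless caching changes the picture (next section).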
GenAI · Agentic RAG
Interview questions to prep
- What does agentic RAG add over a static RAG pipeline?
- When does agentic RAG perform WORSE than static RAG — what are the failure modes?
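What "agentic" adds can be shown in a few lines: instead of one fixed retrieve-then-answer pass, the agent grades its own evidence and rewrites the query until retrieval succeeds or a step budget runs out. `retrieve` below is a toy keyword matcher and `rewrites` a stand-in for an LLM rewrite step; both are assumptions for illustration.

```python
def retrieve(query, corpus):
    """Toy keyword retriever (real systems would use vector search)."""
    words = query.lower().split()
    return [doc for doc in corpus if any(w in doc.lower() for w in words)]

def agentic_rag(query, corpus, rewrites, max_steps=3):
    """Retrieve, self-grade coverage, and rewrite the query until evidence is found."""
    q = query
    for step in range(max_steps):
        docs = retrieve(q, corpus)
        if docs:                 # "grade" step: do we have usable evidence?
            return docs, step + 1
        q = rewrites.get(q, q)   # agent action: rewrite the query and retry
    return [], max_steps         # failure mode: budget exhausted, nothing found
```

The loop also exposes the failure modes the second question asks about: extra LLM calls per step (latency and cost), and the risk of rewriting a good query into a worse one.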
GenAI · Prompt caching
Interview questions to prep
- How does prompt caching change the cost picture for repeated context?
- Compare exact-match retrieval caching vs semantic caching — when does each fit?
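The exact-match vs semantic distinction in the second question can be sketched in one class: a hash lookup fires only on byte-identical prompts, while a similarity search over embeddings also catches paraphrases at the cost of a tunable false-positive risk. The bag-of-words embedding here is a deliberate toy; real semantic caches use learned embeddings.

```python
import hashlib
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())  # toy embedding (assumption)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class AnswerCache:
    """Exact-match cache (hash lookup) with a semantic-similarity fallback."""
    def __init__(self, threshold=0.8):
        self.exact = {}        # sha256(prompt) -> answer
        self.semantic = []     # (embedding, answer) pairs
        self.threshold = threshold

    def put(self, prompt, answer):
        self.exact[hashlib.sha256(prompt.encode()).hexdigest()] = answer
        self.semantic.append((embed(prompt), answer))

    def get(self, prompt):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.exact:          # byte-identical prompt: guaranteed-safe hit
            return self.exact[key]
        qe = embed(prompt)             # paraphrase: nearest-neighbour lookup
        best = max(self.semantic, key=lambda p: cosine(qe, p[0]), default=None)
        if best and cosine(qe, best[0]) >= self.threshold:
            return best[1]             # may be a false positive if threshold too low
        return None
```

Note this caches whole answers; provider-side prompt caching instead reuses the KV state of a shared prompt prefix, which is what changes the cost picture for repeated long contexts.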
GenAI · Long-document QA
Interview questions to prep
- How would you answer questions over a 500-page PDF without exceeding the context window?
- When do you choose hierarchical summaries, parent-child chunks, or long-context stuffing?
- How do you avoid losing section-level citations when chunking long documents?
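For the citation question above, the usual fix is to chunk within section boundaries and carry the section title as metadata on every child chunk, keeping the parent text around for context expansion. This is a minimal sketch; the character-based chunk size and field names are assumptions, not a specific library's API.

```python
# Section-aware parent-child chunking that preserves section-level citations.
def chunk_with_citations(sections, chunk_chars=200):
    """Split (title, text) sections into child chunks tagged with their parent."""
    chunks = []
    for title, text in sections:
        for start in range(0, len(text), chunk_chars):
            chunks.append({
                "section": title,                       # citation survives chunking
                "text": text[start:start + chunk_chars],  # child chunk for retrieval
                "parent": text,                         # full section for the prompt
            })
    return chunks
```

Retrieval then matches on the small `text` field but cites `section` and feeds `parent` to the model, which is the parent-child pattern the previous question names.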
References & further reading
- Anthropic — Prompt Engineering Guide
- LangChain — RAG concepts