Day 84 of 133

Long-context vs RAG; agentic RAG; prompt caching

1M+ token contexts; agent loops over retrieval; cost control via prompt caching.

DSA · NeetCode Backtracking

  • N-Queens

    Interview questions to prep

    1. How do you check 'queen attacks me' in O(1) using the diagonal-set trick?
    2. What's the state-space size, and how much does pruning actually save in practice?
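A minimal sketch of the diagonal-set trick from question 1: a queen at (row, col) shares a "/" diagonal with every square of equal row + col, and a "\" diagonal with every square of equal row - col, so three hash sets give an O(1) "am I attacked?" test per candidate square.

```python
def solve_n_queens(n):
    """Count N-Queens solutions with O(1) attack checks via sets."""
    cols, diag1, diag2 = set(), set(), set()
    count = 0

    def place(row):
        nonlocal count
        if row == n:
            count += 1
            return
        for col in range(n):
            # O(1) check: the column and both diagonals must be free.
            if col in cols or (row + col) in diag1 or (row - col) in diag2:
                continue
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            place(row + 1)  # recurse, then undo the choice (backtrack)
            cols.remove(col); diag1.remove(row + col); diag2.remove(row - col)

    place(0)
    return count
```

On the state-space question: without pruning the search is O(n^n) placements (any column per row); the set checks prune every attacked branch immediately, which is why n = 8 finishes instantly.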

GenAI · Long context & agentic RAG

  • Long context vs RAG

    Interview questions to prep

    1. Do long context windows kill RAG? Defend your view.
    2. What's the cost picture (latency + $) of stuffing 1M tokens vs RAG over the same corpus?
    3. Does a bigger context window always improve enterprise QA? Explain context dilution, attention cost, and stale-token risk.
    4. What is context inflation, and how would you detect it in latency, cost, and answer-quality metrics?
    5. When would hierarchical summaries plus retrieval beat stuffing the full document into the prompt?
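For question 2, a back-of-envelope cost model helps. The per-token price below is a made-up illustrative number, not any vendor's real pricing; the point is the ratio between stuffing a 1M-token corpus into every prompt and retrieving a handful of chunks.

```python
PRICE_PER_1K_INPUT = 0.003  # hypothetical $/1K input tokens, for illustration

def cost_per_query(input_tokens, price_per_1k=PRICE_PER_1K_INPUT):
    """Dollar cost of one query as a function of input tokens."""
    return input_tokens / 1000 * price_per_1k

# Stuffing the whole 1M-token corpus into every prompt:
stuff = cost_per_query(1_000_000)
# RAG: retrieve ~8 chunks of 500 tokens plus a 500-token question/prompt:
rag = cost_per_query(8 * 500 + 500)

print(f"stuff: ${stuff:.2f}/query, rag: ${rag:.4f}/query, "
      f"ratio: {stuff / rag:.0f}x")
```

Latency scales with input tokens too (prefill is roughly linear in prompt length), so the same ratio argument applies to time-to-first-token, not just dollars.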
  • Agentic RAG

    Interview questions to prep

    1. What does agentic RAG add over a static RAG pipeline?
    2. When does agentic RAG perform WORSE than static RAG — what are the failure modes?
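A toy control-flow sketch of what "agentic" adds over a static pipeline: the model judges whether retrieved evidence suffices and can reformulate the query before answering. `retrieve` and `llm` are hypothetical stand-ins, not a real API; a static pipeline would be a single retrieve-then-answer pass with no loop.

```python
def agentic_rag(question, retrieve, llm, max_steps=3):
    """Toy agent loop: retrieve, judge sufficiency, optionally re-query.

    `retrieve(query)` returns a list of passages; `llm(prompt)` returns a
    string that is either "ENOUGH" or "REFINE:<new query>" when judging.
    """
    query, evidence = question, []
    for _ in range(max_steps):
        evidence += retrieve(query)
        verdict = llm(f"Is this enough to answer '{question}'? {evidence}")
        if verdict.startswith("ENOUGH"):
            break
        query = verdict.removeprefix("REFINE:")  # model proposes a new query
    return llm(f"Answer '{question}' using: {evidence}")
```

The failure modes in question 2 live in this loop: each extra iteration adds latency and cost, and a bad reformulation can drift away from the original question, so a capped, well-judged loop can still lose to one good static retrieval.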
  • Prompt caching

    Interview questions to prep

    1. How does prompt caching change the cost picture for repeated context?
    2. Compare exact-match retrieval caching vs semantic caching — when does each fit?
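A sketch of the exact-match vs semantic distinction in question 2, under the assumption that `embed` is a stand-in for a real embedding model: exact-match caches hit only on byte-identical prompts (cheap, zero false positives), while a semantic cache hits on paraphrases above a similarity threshold (higher hit rate, but can return a stale or wrong answer for a near-miss query).

```python
import hashlib
import math

class ExactCache:
    """Exact-match: hit only when the prompt is byte-identical."""
    def __init__(self):
        self._store = {}
    def _key(self, prompt):
        return hashlib.sha256(prompt.encode()).hexdigest()
    def get(self, prompt):
        return self._store.get(self._key(prompt))
    def put(self, prompt, answer):
        self._store[self._key(prompt)] = answer

class SemanticCache:
    """Semantic: hit when a stored query is close enough by cosine sim."""
    def __init__(self, embed, threshold=0.9):
        self.embed, self.threshold, self._store = embed, threshold, []
    def get(self, prompt):
        v = self.embed(prompt)
        for u, answer in self._store:
            denom = (math.sqrt(sum(a * a for a in u)) *
                     math.sqrt(sum(b * b for b in v))) or 1.0
            if sum(a * b for a, b in zip(u, v)) / denom >= self.threshold:
                return answer
        return None
    def put(self, prompt, answer):
        self._store.append((self.embed(prompt), answer))
```

Provider-side prompt caching (reusing the KV cache for a repeated prompt prefix) is closer to the exact-match model: it only pays off when the shared context is byte-identical and ordered first in the prompt.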
  • Long-document QA

    Interview questions to prep

    1. How would you answer questions over a 500-page PDF without exceeding the context window?
    2. When do you choose hierarchical summaries, parent-child chunks, or long-context stuffing?
    3. How do you avoid losing section-level citations when chunking long documents?
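One answer to question 3, sketched as parent-child chunking: each small retrieval chunk keeps a pointer to its parent section, so the answer can cite "Section X" even though retrieval matched a 40-word child. The function and field names are illustrative, not from any particular library.

```python
def parent_child_chunks(sections, chunk_size=40):
    """Split (title, text) sections into child chunks that keep a
    parent-section pointer, so section-level citations survive chunking."""
    chunks = []
    for sec_id, (title, text) in enumerate(sections):
        words = text.split()
        for start in range(0, len(words), chunk_size):
            chunks.append({
                "text": " ".join(words[start:start + chunk_size]),
                "parent_section": sec_id,  # fetch full section for the LLM
                "citation": title,         # section-level citation metadata
            })
    return chunks
```

At query time you embed and search the small chunks (precise matching), then pass the parent section to the model (fuller context) and emit the stored citation; hierarchical summaries extend the same idea one level up.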
