Day 113 of 133
Design enterprise RAG (Glean-style) + DSA review
Multi-source ingestion; permission-aware retrieval; offline+online eval.
DSA · NeetCode Trees
- Maximum Depth of Binary Tree
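A minimal recursive sketch of the problem. The `TreeNode` class is defined here as an assumption, mirroring the usual `val`/`left`/`right` shape. A node's depth is 1 plus the deeper of its two subtrees, and an empty tree has depth 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def max_depth(root: Optional[TreeNode]) -> int:
    """Recursive DFS: depth = 1 + max(depth of left subtree, depth of right subtree)."""
    if root is None:
        return 0  # empty subtree contributes no levels
    return 1 + max(max_depth(root.left), max_depth(root.right))


# Root 3 with children 9 and 20; node 20 has children 15 and 7.
tree = TreeNode(3, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
print(max_depth(tree))  # 3
```

Each node is visited once, so time is O(n); the call stack grows to the tree's height.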
Interview questions to prep
- Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
- What does the recursion cost in stack space (O(h): O(log n) for a balanced tree, O(n) for a skewed one), and how would you go iterative to avoid hitting the recursion limit on deep trees?
- What's the relationship between this problem's invariant and the BST property (if any)?
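For the BFS vs DFS comparison, here is a sketch of both iterative versions (the `TreeNode` class is again an assumed definition). BFS counts levels as it drains the queue one level at a time, so its memory is bounded by the tree's widest level. Iterative DFS carries `(node, depth)` pairs on an explicit stack, so its memory still grows with height. Going iterative does not make a skewed tree O(log n); it moves the O(h) frames from the call stack to the heap, which sidesteps Python's recursion limit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def max_depth_bfs(root: Optional[TreeNode]) -> int:
    """Level-order BFS: the answer is the number of levels dequeued."""
    if root is None:
        return 0
    depth = 0
    queue = deque([root])
    while queue:
        depth += 1
        for _ in range(len(queue)):  # drain exactly one level
            node = queue.popleft()
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
    return depth


def max_depth_dfs(root: Optional[TreeNode]) -> int:
    """Iterative DFS with an explicit stack of (node, depth) pairs."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        if node.left is not None:
            stack.append((node.left, depth + 1))
        if node.right is not None:
            stack.append((node.right, depth + 1))
    return best


tree = TreeNode(3, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))

# A 5000-node left-skewed chain would exceed CPython's default recursion limit (1000)
# with the recursive version, but both iterative versions handle it.
chain = TreeNode(0)
cur = chain
for i in range(1, 5000):
    cur.left = TreeNode(i)
    cur = cur.left

print(max_depth_bfs(tree), max_depth_dfs(tree))    # 3 3
print(max_depth_bfs(chain), max_depth_dfs(chain))  # 5000 5000
```

Rule of thumb: BFS uses O(w) memory for maximum width w (about n/2 at the last level of a complete tree), DFS uses O(h); pick whichever dimension is smaller for the expected tree shape.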
ML System Design · Enterprise RAG
Interview questions to prep
- Walk me through designing an enterprise RAG over Confluence + Slack + Drive.
- How do you handle access control / permissions in retrieval?
- How would you handle 50M docs and 10k QPS?
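One way to make retrieval permission-aware is to copy each source's ACL onto its chunks at ingestion time, expand the querying user into every principal that could grant access (the user plus their groups), and filter before ranking, so hidden chunks never reach the LLM or eat into the top-k. The sketch below assumes that design; the `Chunk` fields, the group-membership map, and the 2-D toy embeddings are hypothetical, and brute-force cosine stands in for a real vector index.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    source: str                          # e.g. "confluence", "slack", "drive"
    text: str
    embedding: list[float]
    allowed_principals: frozenset[str]   # user/group IDs copied from the source ACL


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def principals_for(user_id: str, group_memberships: dict[str, set[str]]) -> frozenset[str]:
    """Everything that can grant this user access: the user ID plus all their groups."""
    return frozenset({user_id} | group_memberships.get(user_id, set()))


def retrieve(query_emb: list[float], user_id: str, chunks: list[Chunk],
             group_memberships: dict[str, set[str]], k: int = 3) -> list[Chunk]:
    """Pre-filter by ACL, then rank only the visible chunks by cosine similarity."""
    principals = principals_for(user_id, group_memberships)
    visible = [c for c in chunks if c.allowed_principals & principals]
    ranked = sorted(visible, key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("c1", "confluence", "Q3 roadmap", [1.0, 0.0], frozenset({"grp:eng"})),
    Chunk("c2", "slack", "Exec comp discussion", [0.9, 0.1], frozenset({"grp:exec"})),
    Chunk("c3", "drive", "Onboarding guide", [0.0, 1.0], frozenset({"grp:all"})),
]
groups = {"alice": {"grp:eng", "grp:all"}}
hits = retrieve([1.0, 0.0], "alice", chunks, groups, k=2)
print([c.chunk_id for c in hits])  # ['c1', 'c3']  (c2 is similar but hidden)
```

At scale the same filter would be pushed into the vector index as a metadata filter; post-filtering a top-k list can silently return fewer than k results. For sizing, under assumed numbers (about 10 chunks per doc, 768-dim float32 embeddings), 50M docs means roughly 500M vectors × 3,072 bytes ≈ 1.5 TB of raw vectors, which is why the index must be sharded (and often quantized) rather than scanned like this.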
Interview questions to prep
- How would you build an offline + online eval pipeline for an enterprise RAG?
- What synthetic golden set would you generate for a domain where humans can't easily score answers?
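A common way to build a golden set without human scoring is to sample chunks, have an LLM write a question that each chunk answers, and keep that chunk's ID as the known-relevant target; retrieval can then be scored automatically. The sketch below computes recall@k and MRR over such a set. The question/ID pairs and the dictionary-backed stub retriever are hypothetical; answer-level metrics (for example Ragas's faithfulness) and online signals would sit on top of this offline layer.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / (1-indexed rank of the first relevant hit), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(golden_set: list[dict], retriever: Callable[[str], list[str]], k: int = 5) -> dict:
    """Average recall@k and MRR over every golden question."""
    recalls, rrs = [], []
    for example in golden_set:
        retrieved = retriever(example["question"])
        recalls.append(recall_at_k(retrieved, example["relevant_ids"], k))
        rrs.append(reciprocal_rank(retrieved, example["relevant_ids"]))
    n = max(len(golden_set), 1)  # avoid dividing by zero on an empty set
    return {"recall@k": sum(recalls) / n, "mrr": sum(rrs) / n}


golden = [
    {"question": "How do I request VPN access?", "relevant_ids": {"it-12"}},
    {"question": "What is the PTO policy?", "relevant_ids": {"hr-3", "hr-7"}},
]
# Stub retriever: canned ranked results per question.
fake_results = {
    "How do I request VPN access?": ["it-40", "it-12", "eng-2"],
    "What is the PTO policy?": ["hr-7", "fin-1", "hr-3"],
}
scores = evaluate(golden, fake_results.get, k=2)
print(scores)  # {'recall@k': 0.75, 'mrr': 0.75}
```

Running this on every retriever or chunking change gives a regression gate before anything reaches online A/B tests.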
Interview questions to prep
- How would you ingest SharePoint, Jira, Slack, and Drive while preserving permissions and freshness?
- What metadata schema would you attach to chunks so retrieval can enforce ACLs and route by source?
- How do you backfill 50M documents without breaking freshness for newly edited docs?
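A sketch of both pieces under stated assumptions: a per-chunk metadata record that carries the ACL and a routing key, and a single priority queue in which permission changes and live edits preempt the bulk backfill. The field names and the three priority levels (with ACL changes first, because revoking access is the security-critical case) are hypothetical design choices, not a specific product's schema.

```python
import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChunkMetadata:
    doc_id: str                          # stable ID, e.g. "jira:PROJ-42"
    chunk_id: str                        # doc_id plus chunk ordinal
    source: str                          # routing key: "sharepoint" | "jira" | "slack" | "drive"
    source_url: str                      # deep link used for citations
    title: str
    heading_path: tuple[str, ...]        # section breadcrumb inside the doc
    allowed_principals: frozenset[str]   # users/groups resolved from the source ACL
    acl_version: int                     # bump on permission change; re-sync without re-embedding
    source_modified_at: datetime         # last edit time in the source system
    ingested_at: datetime                # when this chunk was (re)indexed


# Hypothetical priority levels: lower numbers pop first.
ACL_CHANGE, LIVE_EDIT, BACKFILL = 0, 1, 2


class IngestQueue:
    """One work queue where ACL changes and live edits jump ahead of backfill."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority level

    def push(self, priority: int, doc_id: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), doc_id))

    def pop(self) -> tuple[int, str]:
        priority, _, doc_id = heapq.heappop(self._heap)
        return priority, doc_id

    def __len__(self) -> int:
        return len(self._heap)


q = IngestQueue()
for i in range(3):
    q.push(BACKFILL, f"drive:old-{i}")
q.push(LIVE_EDIT, "jira:PROJ-42")
q.push(ACL_CHANGE, "sharepoint:hr-policy")
order = [q.pop()[1] for _ in range(len(q))]
print(order)
# ['sharepoint:hr-policy', 'jira:PROJ-42', 'drive:old-0', 'drive:old-1', 'drive:old-2']
```

The queue alone does not prevent a slow backfill job from overwriting a fresher live edit of the same doc. Making index writes idempotent upserts that skip when the stored `source_modified_at` is newer closes that race.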
Interview questions to prep
- Implement retrieve_relevant_chunks(markdown, query) that preserves H1/H2/H3 hierarchy in returned chunks.
- How would you score headings plus body text so a section title can match even when the paragraph uses different wording?
- What edge cases break naive markdown chunking: tables, code blocks, duplicate headings, or very long sections?
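One possible `retrieve_relevant_chunks` is sketched below. It assumes lexical token overlap instead of embeddings, a heading weight of 2.0 picked arbitrarily, and that only H1 to H3 start a new chunk (H4 and deeper stay in the body). Each chunk keeps its full heading path, so a query word that appears only in a section title still matches, and lines starting with `#` inside fenced code blocks are not mistaken for headings.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,3})\s+(.*?)\s*#*\s*$")  # ATX H1-H3, closing #s stripped
TOKEN_RE = re.compile(r"[a-z0-9]+")
FENCE_MARKERS = ("`" * 3, "~" * 3)  # fenced code block delimiters
STOPWORDS = {"a", "an", "and", "how", "in", "is", "of", "the", "to"}


@dataclass
class MdChunk:
    path: tuple[str, ...]  # heading hierarchy, e.g. ("Runbook", "Rollback", "Database")
    body: str

    def render(self) -> str:
        # Prefix the body with its breadcrumb so the reader (or LLM) sees where it lives.
        return " > ".join(self.path) + "\n" + self.body


def tokenize(text: str) -> set[str]:
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS}


def split_markdown(markdown: str) -> list[MdChunk]:
    chunks: list[MdChunk] = []
    path: list[str] = []
    body: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the pending section only if it has non-blank body text.
        if any(line.strip() for line in body):
            chunks.append(MdChunk(tuple(path), "\n".join(body).strip()))
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # '#' lines inside a fence are code, not headings
            body.append(line)
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at the same or a deeper level
            path.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks


def score(chunk: MdChunk, query_terms: set[str], heading_weight: float = 2.0) -> float:
    # Each query term scores heading_weight if it appears anywhere in the heading path,
    # otherwise 1.0 if it appears in the body, otherwise 0.
    heading_terms = tokenize(" ".join(chunk.path))
    body_terms = tokenize(chunk.body)
    return sum(
        heading_weight if t in heading_terms else 1.0 if t in body_terms else 0.0
        for t in query_terms
    )


def retrieve_relevant_chunks(markdown: str, query: str, k: int = 3) -> list[str]:
    query_terms = tokenize(query)
    scored = [(score(c, query_terms), c) for c in split_markdown(markdown)]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep document order
    return [c.render() for _, c in scored[:k]]


doc = "\n".join([
    "# Runbook",
    "## Deploy",
    "Merge to main and the pipeline ships it.",
    "## Rollback",
    "### Database",
    "Restore the latest snapshot.",
    "~~~bash",
    "# restore step, not a heading",
    "pg_restore latest.dump",
    "~~~",
])
print(retrieve_relevant_chunks(doc, "how to rollback the db", k=1)[0])
# First line printed: Runbook > Rollback > Database
```

On the edge cases: code blocks are handled by fence tracking, and duplicate headings stay distinguishable through their full path. Tables are kept whole because they never contain headings, but very long sections are not split here; a real version would add a secondary size-based split that repeats the heading path on every piece.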
References & further reading
- LangChain: RAG concepts
- Pinecone: Vector Databases Explained
- Ragas: metrics catalog