Day 79 of 133
Embeddings + vector DBs + ANN indexes
MTEB; pgvector vs Pinecone vs Weaviate vs Qdrant; HNSW vs IVF.
DSA · NeetCode 1-D DP
- Partition Equal Subset Sum
Interview questions to prep
- State the DP: define the state, the transition, and the base case explicitly.
- Top-down (memoized recursion) vs bottom-up (tabulation) — which is more natural here, and why?
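- Can you space-optimize? Fibonacci-style recurrences drop from O(n) to O(1); subset-sum problems like this one collapse the O(n · target) table into a single O(target) array. Show the rolling-window trick.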
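A minimal bottom-up sketch for Partition Equal Subset Sum, written to make the state, transition, and base case from the questions above explicit. It assumes the input holds non-negative integers; the function name `can_partition` is chosen here for illustration. It keeps one 1-D array of size target + 1 and reuses it for every number.

```python
def can_partition(nums: list[int]) -> bool:
    """Return True if nums can be split into two subsets with equal sums."""
    total = sum(nums)
    if total % 2:  # an odd total can never split evenly
        return False
    target = total // 2

    # State: reachable[s] is True if some subset of the numbers
    # processed so far sums to exactly s.
    # Base case: the empty subset sums to 0.
    reachable = [False] * (target + 1)
    reachable[0] = True

    for x in nums:
        # Transition: reachable[s] becomes True if reachable[s - x] was True.
        # Walking s downward ensures each number is used at most once,
        # because reachable[s - x] still reflects the previous row.
        for s in range(target, x - 1, -1):
            if reachable[s - x]:
                reachable[s] = True
    return reachable[target]


print(can_partition([1, 5, 11, 5]))  # True: {11} and {1, 5, 5}
print(can_partition([1, 2, 3, 5]))   # False: total is odd
```

Bottom-up tends to read more naturally here because the downward inner loop is what makes the single-array version correct; a memoized top-down version would key on (index, remaining sum) and cannot drop the index dimension as easily.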
GenAI · Embeddings & vector DBs
Interview questions to prep
- How do you pick an embedding model — what does the MTEB leaderboard tell you and not tell you?
- Why does dimension count matter for cost and recall?
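To make the cost side of the dimension question concrete, here is a back-of-the-envelope sketch. It counts only the raw vector storage, assuming float32 values (4 bytes each); index overhead such as graph links or IDs is ignored, and the dimension sizes and corpus size are illustrative values, not figures from any particular model. The helper name `raw_index_bytes` is hypothetical.

```python
def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


# Illustrative dimension counts; storage grows linearly with dim.
for dim in (384, 768, 1536):
    gib = raw_index_bytes(1_000_000, dim) / 2**30
    print(f"{dim:>5} dims -> {gib:.2f} GiB per million float32 vectors")
```

Distance computations scale linearly with dimension as well, so query latency and memory move together. Whether the extra dimensions buy recall is an empirical question that this arithmetic does not answer.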
Interview questions to prep
- Compare Pinecone, Weaviate, Qdrant, and pgvector — when does each fit?
- When is pgvector enough vs when do you need a dedicated vector DB?
Interview questions to prep
- Compare HNSW vs IVF for ANN — accuracy/speed/memory trade-offs.
- When would you switch from in-memory FAISS to a hosted vector DB — what's the breakpoint?
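The IVF side of the HNSW vs IVF comparison can be sketched in pure Python to show where the accuracy/speed trade-off comes from. This is a toy under stated assumptions, not how FAISS or any vector DB implements it: the data is random, the coarse quantizer is a few rounds of plain k-means, and all function names are made up for illustration. The parameter names `nlist` (number of cells) and `nprobe` (cells scanned per query) mirror the knobs FAISS exposes on its IVF indexes. HNSW is not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors, query, k):
    """Brute-force baseline: scan every vector."""
    order = sorted(range(len(vectors)), key=lambda i: (sq_dist(vectors[i], query), i))
    return order[:k]


def assign(vectors, centroids):
    """Put each vector index into the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))
        cells[nearest].append(i)
    return cells


def build_ivf(vectors, nlist, iters=5, seed=0):
    """Learn nlist centroids with a few k-means rounds, then build the cells."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        cells = assign(vectors, centroids)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ends up empty
                cols = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in cols]
    return centroids, assign(vectors, centroids)


def ivf_search(vectors, centroids, cells, query, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in cells[c]]
    candidates.sort(key=lambda i: (sq_dist(vectors[i], query), i))
    return candidates[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

centroids, cells = build_ivf(data, nlist=10)
truth = exact_search(data, query, k=5)
for nprobe in (1, 3, 10):
    found = ivf_search(data, centroids, cells, query, k=5, nprobe=nprobe)
    recall = len(set(found) & set(truth)) / len(truth)
    print(f"nprobe={nprobe:>2}  recall@5={recall:.2f}")
```

Raising `nprobe` scans more cells, so recall rises and latency grows; at `nprobe == nlist` the search degenerates to the exact scan. HNSW trades differently: it usually needs no training step and holds recall well at low latency, but its graph links add memory on top of the raw vectors.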
Interview questions to prep
- Why does mean-pooled vanilla BERT often underperform Sentence-BERT for semantic search?
- When is cosine similarity insufficient for retrieval quality, and what diagnostics would you add?
- How do you evaluate embedding quality beyond a single nearest-neighbor demo?
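One way to go beyond a single nearest-neighbor demo is to score a labeled query set with ranking metrics. The sketch below computes recall@k and mean reciprocal rank over cosine-ranked results. The two-dimensional vectors, document ids, and relevance labels are invented toy data, and the helper names are hypothetical; a real evaluation would use embeddings from the model under test and human or task-derived relevance judgments.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_docs(query_vec, doc_vecs):
    """Doc ids sorted by descending cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant doc, or 0.0 if none is retrieved."""
    for pos, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0


# Toy corpus and labeled queries (made up for illustration).
docs = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}
queries = {
    "q1": ([0.9, 0.1], {"d1"}),
    "q2": ([0.1, 0.9], {"d2", "d3"}),
}

recalls, rrs = [], []
for name, (vec, relevant) in queries.items():
    ranked = rank_docs(vec, docs)
    recalls.append(recall_at_k(ranked, relevant, k=1))
    rrs.append(reciprocal_rank(ranked, relevant))

print("mean recall@1:", sum(recalls) / len(recalls))
print("MRR:", sum(rrs) / len(rrs))
```

Averaging over many queries, and slicing the results by query type or length, is what exposes cases where cosine ranking looks fine on a demo but misses relevant documents in aggregate.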
References & further reading
- Pinecone: Vector Databases Explained
- Hugging Face: LLM course
- LangChain: RAG concepts