Day 79 of 133

Embeddings + vector DBs + ANN indexes

MTEB; pgvector vs Pinecone vs Weaviate vs Qdrant; HNSW vs IVF.

DSA · NeetCode 1-D DP

  • Interview questions to prep

    1. State the DP: define the state, the transition, and the base case explicitly.
    2. Top-down (memoized recursion) vs bottom-up (tabulation) — which is more natural here, and why?
    3. Can you space-optimize from O(n) to O(1)? Show the rolling-window trick.
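
A minimal sketch of all three questions, using Climbing Stairs as an assumed example problem (the plan itself does not name one). State: ways(i) is the number of distinct ways to reach step i taking 1 or 2 steps at a time. Transition: ways(i) = ways(i-1) + ways(i-2). Base case: ways(0) = ways(1) = 1. The function names are illustrative only.

```python
from functools import lru_cache


def climb_top_down(n: int) -> int:
    """Memoized recursion: reads exactly like the recurrence."""
    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        if i <= 1:          # base case: ways(0) = ways(1) = 1
            return 1
        return ways(i - 1) + ways(i - 2)   # transition
    return ways(n)


def climb_bottom_up(n: int) -> int:
    """Tabulation: fill an O(n) table from the base cases upward."""
    if n <= 1:
        return 1
    table = [0] * (n + 1)
    table[0] = table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


def climb_rolling(n: int) -> int:
    """O(1) space: the transition only reads the last two states,
    so two variables replace the whole table."""
    prev, curr = 1, 1       # ways(0), ways(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr
```

The rolling version works whenever the transition looks back a fixed number of states. Top-down tends to feel more natural when the recurrence is the first thing you see. Bottom-up avoids recursion-depth limits and makes the space optimization easy to spot.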

GenAI · Embeddings & vector DBs

  • Interview questions to prep

    1. How do you pick an embedding model — what does the MTEB leaderboard tell you and not tell you?
    2. Why does embedding dimensionality matter for storage and query cost, and for recall?
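
The cost half of the dimension question is mostly arithmetic. The sketch below assumes uncompressed float32 vectors (4 bytes per value) and ignores index overhead and metadata. The dimensions and the corpus size of one million are arbitrary illustration values, and the function names are hypothetical.

```python
def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store the raw vectors alone (no index structures)."""
    return n_vectors * dim * bytes_per_value


def brute_force_multiply_adds(n_vectors: int, dim: int) -> int:
    """Multiply-adds for one exact (exhaustive) dot-product query."""
    return n_vectors * dim


GIB = 1024 ** 3
for dim in (384, 768, 1536):
    size = raw_vector_bytes(1_000_000, dim)
    work = brute_force_multiply_adds(1_000_000, dim)
    print(f"dim={dim}: {size / GIB:.2f} GiB raw, {work:,} multiply-adds per exact query")
```

Both storage and exhaustive-search work grow linearly with dimension. The recall half cannot be read off a formula: whether a smaller dimension loses retrieval quality depends on the model and the data, so it has to be measured.
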
  • Interview questions to prep

    1. Compare Pinecone, Weaviate, Qdrant, and pgvector — when does each fit?
    2. When is pgvector enough vs when do you need a dedicated vector DB?
  • Interview questions to prep

    1. Compare HNSW vs IVF for ANN — accuracy/speed/memory trade-offs.
    2. When would you switch from in-memory FAISS to a hosted vector DB — what's the breakpoint?
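
To make the IVF side of the comparison concrete, here is a toy pure-Python inverted-file index: a few k-means iterations pick centroids, each vector is filed under its nearest centroid, and a query scans only the `nprobe` closest lists. It is a teaching sketch on random data, not how FAISS or any vector DB implements it; the parameter names `nlist` and `nprobe` follow FAISS's naming, and everything else is hypothetical. HNSW is not shown.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def train_centroids(vectors, nlist, iters=5, seed=0):
    """A few Lloyd (k-means) iterations to place nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, centroids):
    """Inverted lists: centroid index -> ids of vectors assigned to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in lists[j]]
    candidates.sort(key=lambda i: sq_dist(vectors[i], query))
    return candidates[:k]


def brute_force(vectors, query, k):
    return sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))[:k]


def recall(approx, exact):
    return len(set(approx) & set(exact)) / len(exact)


# Toy data: 500 random 8-dimensional vectors, 10 inverted lists.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]
centroids = train_centroids(vectors, nlist=10)
lists = build_ivf(vectors, centroids)
exact = brute_force(vectors, query, k=10)
for nprobe in (1, 3, 10):
    approx = ivf_search(vectors, centroids, lists, query, k=10, nprobe=nprobe)
    print(f"nprobe={nprobe}: recall@10 = {recall(approx, exact):.2f}")
```

The knob to talk about is `nprobe`: scanning more lists raises recall and cost together, and scanning all of them degenerates to exact search. HNSW instead walks a layered neighbor graph, which typically costs more memory per vector for the stored links.
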
  • Interview questions to prep

    1. Why does mean-pooled vanilla BERT often underperform Sentence-BERT for semantic search?
    2. When is cosine similarity insufficient for retrieval quality, and what diagnostics would you add?
    3. How do you evaluate embedding quality beyond a single nearest-neighbor demo?
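
One way to go beyond a single nearest-neighbor demo is to score a labeled query set with ranking metrics. The sketch below ranks documents by cosine similarity and reports mean recall@k and mean reciprocal rank (MRR). The two-dimensional vectors and relevance labels are made-up toy data, not output from any real embedding model, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_docs(query_vec, doc_vecs):
    """Document ids sorted from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant document, 0 if none is retrieved."""
    for pos, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / pos
    return 0.0


# Toy labeled set: (query vector, set of relevant doc ids).
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = [
    ([0.9, 0.1], {"d1"}),
    ([0.1, 0.9], {"d2"}),
    ([0.6, 0.8], {"d2"}),  # d3 outranks the labeled answer here
]

rankings = [(rank_docs(q, doc_vecs), rel) for q, rel in queries]
mean_recall_at_1 = sum(recall_at_k(r, rel, 1) for r, rel in rankings) / len(rankings)
mrr = sum(reciprocal_rank(r, rel) for r, rel in rankings) / len(rankings)
print(f"recall@1 = {mean_recall_at_1:.3f}, MRR = {mrr:.3f}")
```

The third query is the kind of case worth inspecting by hand: cosine ranks a near-duplicate direction above the labeled answer, which aggregate metrics expose but a single demo query would hide.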

References & further reading