Day 101 of 133

NLP deep dive: IE, multilingual, code generation

Span-based IE; XLM-R; StarCoder / Code Llama.

DSA · NeetCode Advanced Graphs

  • Cheapest Flights Within K Stops

    Interview questions to prep

    1. Pick between Dijkstra, Bellman-Ford, Floyd-Warshall, MST (Prim/Kruskal), or topo sort — defend the choice (a Bellman-Ford sketch for this problem follows the list).
    2. What does this problem assume about edge weights (non-negative? integer? bounded?) — and what breaks if those don't hold?
    3. Walk me through complexity in V and E, and the data-structure choice (heap vs Fibonacci heap vs array).
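
    For question 1, the standard answer on this problem is Bellman-Ford capped
    at k+1 relaxation rounds: Dijkstra's greedy settling can discard a
    pricier-but-shorter path that still fits the stop budget. A minimal Python
    sketch (the function name and signature are my own, matching the
    LeetCode 787 setup):

      import math

      def cheapest_flights(n, flights, src, dst, k):
          # Bellman-Ford limited to k+1 rounds: round i admits paths of at
          # most i edges, i.e. at most i-1 intermediate stops.
          dist = [math.inf] * n
          dist[src] = 0
          for _ in range(k + 1):
              prev = dist[:]  # snapshot: each round extends paths by one edge
              for u, v, w in flights:
                  if prev[u] + w < dist[v]:
                      dist[v] = prev[u] + w
          return dist[dst] if dist[dst] < math.inf else -1

    Runtime is O(k·E) with O(V) extra space, which also feeds question 3: no
    heap is needed because every edge is scanned in every round.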

Specialization · NLP deep dive

  • Span-based information extraction · Interview questions to prep

    1. Compare span-based vs sequence-tagging IE — when does each win? (A span-classifier sketch follows this list.)
    2. How would you design a relation-extraction system that handles entities not seen in training?
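
    A minimal sketch of the span-based side, assuming token representations
    come from any encoder; the class name, width cap, and label count here
    are illustrative rather than from a specific paper:

      import torch
      import torch.nn as nn

      class SpanClassifier(nn.Module):
          # Scores every span up to max_width tokens against a label set
          # (one label should be a "not an entity" null class).
          def __init__(self, hidden, num_labels, max_width=8):
              super().__init__()
              self.max_width = max_width
              self.scorer = nn.Linear(2 * hidden, num_labels)

          def forward(self, token_reprs):  # (seq_len, hidden) from any encoder
              seq_len = token_reprs.size(0)
              spans, feats = [], []
              for i in range(seq_len):
                  for j in range(i, min(i + self.max_width, seq_len)):
                      spans.append((i, j))
                      # Span representation: concat of start and end vectors.
                      feats.append(torch.cat([token_reprs[i], token_reprs[j]]))
              return spans, self.scorer(torch.stack(feats))

      enc = torch.randn(10, 768)                 # stand-in encoder output
      spans, logits = SpanClassifier(768, 5)(enc)
      labels = logits.argmax(dim=-1)             # one label per candidate span

    Enumerating spans makes nested entities representable, which is the usual
    answer to "when does span-based win"; the cost is O(seq_len · max_width)
    candidates per sentence.
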
  • Multilingual models (mBERT / XLM-R) · Interview questions to prep

    1. How does mBERT/XLM-R enable cross-lingual transfer, and where does it break? (A quick embedding probe follows this list.)
    2. What's the curse of multilinguality, and how do you mitigate it for low-resource languages?
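
    Transfer works because translations land near each other in the shared
    encoder space, so a head trained on one language carries over zero-shot.
    A quick probe of that property, assuming the Hugging Face transformers
    library is installed; mean pooling is just one simple sentence-vector
    choice:

      import torch
      from transformers import AutoModel, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
      model = AutoModel.from_pretrained("xlm-roberta-base")

      def embed(text):
          inputs = tok(text, return_tensors="pt")
          with torch.no_grad():
              hidden = model(**inputs).last_hidden_state
          return hidden.mean(dim=1).squeeze(0)  # mean-pooled sentence vector

      en = embed("The weather is nice today.")
      de = embed("Das Wetter ist heute schön.")
      print(torch.cosine_similarity(en, de, dim=0).item())  # high if aligned
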
  • Code generation (StarCoder / Code Llama) · Interview questions to prep

    1. How is training a code model different from training a natural-language model?
    2. How would you evaluate a code generation model — pass@k, HumanEval, SWE-bench? (A pass@k estimator follows this list.)
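
    For question 2, pass@k is worth writing from memory. The unbiased
    estimator from the Codex paper (Chen et al., 2021) draws n samples per
    problem, c of which pass the unit tests:

      from math import comb

      def pass_at_k(n, c, k):
          # pass@k = 1 - C(n-c, k) / C(n, k): the chance that a random
          # k-subset of the n samples contains at least one correct one.
          if n - c < k:
              return 1.0
          return 1.0 - comb(n - c, k) / comb(n, k)

      print(pass_at_k(200, 13, 10))  # e.g. 200 samples, 13 correct

    Averaging this over problems avoids the bias of simply raising the
    per-sample failure rate to the k-th power.
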
  • Architecture choice (encoder-only vs decoder-only vs encoder-decoder) · Interview questions to prep

    1. When would you choose BERT-style encoder-only models over GPT-style decoder-only models? (A side-by-side sketch follows this list.)
    2. When is encoder-decoder still the cleanest choice for summarization, translation, or extraction?
    3. How would latency and serving cost affect that architecture choice?
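
    A side-by-side sketch of the two serving profiles, assuming the Hugging
    Face pipeline API; the model choices are stand-ins:

      from transformers import pipeline

      # Encoder-only (BERT-style): one bidirectional forward pass over the
      # whole input, a single fixed-cost step per request. Natural fit for
      # classification and extraction.
      clf = pipeline("text-classification",
                     model="distilbert-base-uncased-finetuned-sst-2-english")
      print(clf("The refactor made the code much cleaner."))

      # Decoder-only (GPT-style): autoregressive, one token per step, so
      # latency and serving cost grow with output length.
      gen = pipeline("text-generation", model="gpt2")
      print(gen("def fibonacci(n):", max_new_tokens=30)[0]["generated_text"])

    That per-token loop is the crux of question 3: for fixed-shape outputs
    (labels, spans, scores) an encoder pays one forward pass per request,
    while a decoder pays per generated token.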
