Day 101 of 133
NLP deep dive: IE, multilingual, code generation
Span-based IE; XLM-R; StarCoder / Code Llama.
DSA · NeetCode Advanced Graphs
- Cheapest Flights Within K Stops
Interview questions to prep
- Pick between Dijkstra, Bellman-Ford, Floyd-Warshall, MST (Prim/Kruskal), or topo sort — defend the choice.
- What does this problem assume about edge weights (non-negative? integer? bounded?) — and what breaks if those don't hold?
- Walk me through complexity in V and E, and the data-structure choice (heap vs Fibonacci heap vs array).
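As a worked answer to the algorithm-choice question, here is a minimal Bellman-Ford-style sketch for Cheapest Flights Within K Stops: the stop limit caps the number of relaxation rounds, which is why plain Dijkstra isn't the natural fit. Function names and the test case are illustrative.

```python
# Cheapest Flights Within K Stops: at most k intermediate stops means
# at most k+1 edges, so run k+1 rounds of Bellman-Ford-style relaxation.
# Runs in O(k * E); assumes non-negative prices as in the problem statement.
def find_cheapest_price(n, flights, src, dst, k):
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(k + 1):                # one round per allowed edge
        nxt = dist[:]                     # snapshot so each round adds at most one edge
        for u, v, price in flights:
            if dist[u] != INF and dist[u] + price < nxt[v]:
                nxt[v] = dist[u] + price
        dist = nxt
    return dist[dst] if dist[dst] != INF else -1

# Example: cheapest route from 0 to 3 with at most 1 stop -> prints 700
print(find_cheapest_price(
    4, [[0, 1, 100], [1, 2, 100], [2, 0, 100], [1, 3, 600], [2, 3, 200]], 0, 3, 1))
```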
Specialization · NLP deep dive
Interview questions to prep · Span-based IE
- Compare span-based vs sequence-tagging IE — when does each win?
- How would you design a relation-extraction system that handles entities not seen in training?
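A minimal sketch of the span-based idea: enumerate candidate spans up to a maximum width, build each span's representation from its boundary tokens, and classify it. The encoder output, hidden size, and class count here are placeholder assumptions, not a specific published model.

```python
import torch
import torch.nn as nn

# Span-based IE in miniature: enumerate all spans up to max_width,
# represent each as [start_vec; end_vec], and classify its entity type.
# In practice token_reprs would come from a pretrained encoder.
class SpanClassifier(nn.Module):
    def __init__(self, hidden=64, num_types=4, max_width=4):
        super().__init__()
        self.max_width = max_width
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_types)      # one type is "not an entity"
        )

    def forward(self, token_reprs):           # (seq_len, hidden)
        seq_len = token_reprs.size(0)
        spans, feats = [], []
        for i in range(seq_len):
            for j in range(i, min(i + self.max_width, seq_len)):
                spans.append((i, j))
                feats.append(torch.cat([token_reprs[i], token_reprs[j]]))
        return spans, self.scorer(torch.stack(feats))   # logits per span

model = SpanClassifier()
spans, logits = model(torch.randn(6, 64))      # stand-in encoder output
print(len(spans), logits.shape)                # 18 candidate spans, (18, 4) logits
```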
Interview questions to prep · Multilingual (mBERT / XLM-R)
- How does mBERT/XLM-R enable cross-lingual transfer, and where does it break?
- What's the curse of multilinguality, and how do you mitigate it for low-resource languages?
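A small sketch of what the shared encoder buys you, assuming the Hugging Face transformers library and the public xlm-roberta-base checkpoint: one subword vocabulary and one set of weights map English and Swahili sentences into the same representation space, which is what lets a classifier head fine-tuned on English transfer zero-shot. The example sentences are illustrative; no fine-tuning is shown.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")

sentences = {
    "en": "The service at this hotel was excellent.",
    "sw": "Huduma katika hoteli hii ilikuwa nzuri sana.",
}
with torch.no_grad():
    for lang, text in sentences.items():
        inputs = tokenizer(text, return_tensors="pt")
        cls = model(**inputs).last_hidden_state[:, 0]   # sentence representation
        print(lang, cls.shape)   # same 768-dim space for both languages
```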
Interview questions to prep · Code generation
- How is training a code model different from training a natural-language model?
- How would you evaluate a code generation model — pass@k, HumanEval, SWE-bench?
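For the evaluation question, the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021) is short enough to memorize: sample n completions per problem, count the c that pass the unit tests, and compute pass@k = 1 - C(n-c, k)/C(n, k). The sample counts in the example below are made up.

```python
import numpy as np

# Unbiased pass@k estimator: probability that at least one of k samples
# drawn (without replacement) from the n generated ones passes the tests.
def pass_at_k(n, c, k):
    if n - c < k:                 # every possible draw contains a passing sample
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 12 pass the tests
print(pass_at_k(200, 12, 1))      # 0.06, i.e. c/n for k=1
print(pass_at_k(200, 12, 10))     # noticeably higher than pass@1
```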
Interview questions to prep · Architecture choice
- When would you choose BERT-style encoder-only models over GPT-style decoder-only models?
- When is encoder-decoder still the cleanest choice for summarization, translation, or extraction?
- How would latency and serving cost affect that architecture choice?
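A back-of-envelope sketch of the latency point: an encoder-only classifier answers in a single forward pass, while a decoder-only model pays a (cheaper, KV-cached) step per generated token, so its latency scales with output length. All timings below are invented for illustration, not measurements.

```python
# Rough serving-cost intuition for the architecture questions above.
ENCODER_PASS_MS = 8      # one full forward pass over the input (assumed)
DECODE_STEP_MS = 3       # one KV-cached decode step per output token (assumed)

def encoder_classify_latency():
    return ENCODER_PASS_MS                       # single pass, fixed latency

def decoder_generate_latency(prompt_pass_ms=8, output_tokens=200):
    return prompt_pass_ms + output_tokens * DECODE_STEP_MS

print(encoder_classify_latency())                # 8 ms
print(decoder_generate_latency())                # 608 ms: grows with output length
```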
References & further reading
- Hugging Face NLP course
- Papers with Code — SOTA leaderboards