Day 53 of 133
Pretrained models — BERT, T5, GPT + DSA Graphs
Encoder-only vs encoder-decoder vs decoder-only; MLM vs causal LM.
DSA · NeetCode Graphs
- Surrounded Regions · DSA · Graphs
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
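A minimal sketch of one common approach to Surrounded Regions (mark border-connected 'O' regions first, then flip the rest), assuming the standard in-place LeetCode signature; an iterative DFS is used here, but BFS works identically.

```python
from typing import List

def solve(board: List[List[str]]) -> None:
    """Capture regions surrounded by 'X' in place.

    Any 'O' reachable from the border can never be captured,
    so mark those as safe first, then flip everything else.
    """
    if not board or not board[0]:
        return
    rows, cols = len(board), len(board[0])

    def mark_safe(r: int, c: int) -> None:
        # Iterative DFS so deep regions don't hit the recursion limit.
        stack = [(r, c)]
        while stack:
            i, j = stack.pop()
            if 0 <= i < rows and 0 <= j < cols and board[i][j] == "O":
                board[i][j] = "S"  # temporary "safe" marker
                stack.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])

    # Seed the search from every border cell.
    for r in range(rows):
        mark_safe(r, 0)
        mark_safe(r, cols - 1)
    for c in range(cols):
        mark_safe(0, c)
        mark_safe(rows - 1, c)

    # Unmarked 'O' cells are surrounded -> capture; 'S' goes back to 'O'.
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == "O":
                board[r][c] = "X"
            elif board[r][c] == "S":
                board[r][c] = "O"
```

Time and space are O(V) = O(rows x cols), since each cell is visited a constant number of times; edges are implicit (four neighbors per cell).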
- Rotting Oranges · DSA · Graphs
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
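Rotting Oranges is the canonical multi-source BFS: every initially rotten orange is a source, and each BFS layer corresponds to one minute of spread. A sketch assuming the usual grid encoding (0 = empty, 1 = fresh, 2 = rotten):

```python
from collections import deque
from typing import List

def oranges_rotting(grid: List[List[int]]) -> int:
    """Return minutes until no fresh orange remains, or -1 if impossible."""
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c, 0))   # (row, col, minute) for each source
            elif grid[r][c] == 1:
                fresh += 1

    minutes = 0
    while queue:
        r, c, minute = queue.popleft()
        minutes = max(minutes, minute)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                grid[nr][nc] = 2          # rot it so it is not revisited
                fresh -= 1
                queue.append((nr, nc, minute + 1))

    return minutes if fresh == 0 else -1
```

The layer-by-layer spread is exactly why BFS (not DFS) is the right tool: the answer is a shortest distance in minutes, and disconnected fresh oranges simply never leave the `fresh` count, producing -1.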
NLP · Pretrained models (BERT, T5, GPT)
Interview questions to prep
- Walk through BERT's MLM and NSP objectives.
- Why is BERT bidirectional while GPT is left-to-right?
- How would you fine-tune BERT for token classification (NER)?
- Why is masking used in masked-language-model training, and how is it different from causal masking?
- How would you fine-tune DistilBERT for IMDB sentiment and check whether compression hurt quality?
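A hedged sketch of the DistilBERT-on-IMDB fine-tune asked about above, using the Hugging Face `transformers`/`datasets` APIs. The model name, subset sizes, and hyperparameters are illustrative, not tuned; comparing the evaluation numbers against a full BERT baseline is one way to check whether distillation hurt quality.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

imdb = load_dataset("imdb")

def tokenize(batch):
    # Truncate long reviews to the model's maximum sequence length.
    return tokenizer(batch["text"], truncation=True)

encoded = imdb.map(tokenize, batched=True)

# Small subsets keep the sketch cheap; use the full splits for a real run.
train_ds = encoded["train"].shuffle(seed=42).select(range(2000))
eval_ds = encoded["test"].shuffle(seed=42).select(range(500))

args = TrainingArguments(
    output_dir="distilbert-imdb",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```

For the token-classification (NER) question, the shape of the recipe is the same, but the head changes to `AutoModelForTokenClassification` and labels must be aligned to subword tokens.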
Interview questions to prep
- What is T5's text-to-text framing, and what does it enable?
- Compare BART vs T5 for summarization.
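T5's text-to-text framing means every task, from summarization to classification, is "input string in, output string out", selected only by a task prefix. A minimal sketch with the public `t5-small` checkpoint (prefixes follow the original T5 paper; outputs are illustrative):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "summarize: The quick brown fox jumped over the lazy dog near the river bank.",
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",  # grammatical acceptability
]

for prompt in tasks:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same weights and the same decoding loop handle all three tasks, which is exactly what the text-to-text framing enables: one pretraining objective, one fine-tuning interface, one inference API.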
Interview questions to prep
- Why did decoder-only models win the LLM race?
- What does the causal-LM objective give you that masked-LM doesn't, and vice versa?
- Explain teacher forcing during training and autoregressive decoding during inference.
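A hedged sketch contrasting the two phases the last question asks about, using GPT-2 purely as a small, publicly available causal LM. During training, passing `labels=input_ids` makes the library shift the targets internally, so position t is predicted from the gold tokens before t (teacher forcing); at inference, `generate` decodes autoregressively, conditioning on the model's own previous outputs.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "Pretrained language models are"
inputs = tokenizer(text, return_tensors="pt")

# Training-style forward pass: causal-LM loss under teacher forcing.
out = model(**inputs, labels=inputs["input_ids"])
print("causal-LM loss:", out.loss.item())

# Inference: autoregressive decoding, one token at a time.
with torch.no_grad():
    generated = model.generate(
        inputs["input_ids"],
        max_new_tokens=20,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(generated[0]))
```

The gap between the two regimes (gold prefixes in training vs self-generated prefixes at inference) is the exposure-bias issue worth mentioning when defending teacher forcing in an interview.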
References & further reading
- BERT paper (Devlin et al., 2018)
- Hugging Face NLP course
- The Illustrated Transformer (Jay Alammar)