Day 53 of 133
Pretrained models — BERT, T5, GPT + DSA Graphs
Encoder-only vs encoder-decoder vs decoder-only; MLM vs causal LM.
DSA · NeetCode Graphs
- Surrounded Regions · DSA · Graphs
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
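A minimal sketch of one common approach to Surrounded Regions (mark border-connected 'O' regions first, then flip the rest), assuming the standard in-place LeetCode signature; an iterative DFS is used here, but BFS works identically.

```python
from typing import List

def solve(board: List[List[str]]) -> None:
    """Capture regions surrounded by 'X' in place.

    Any 'O' reachable from the border can never be captured,
    so mark those as safe first, then flip everything else.
    """
    if not board or not board[0]:
        return
    rows, cols = len(board), len(board[0])

    def mark_safe(r: int, c: int) -> None:
        # Iterative DFS so deep regions don't hit the recursion limit.
        stack = [(r, c)]
        while stack:
            i, j = stack.pop()
            if 0 <= i < rows and 0 <= j < cols and board[i][j] == "O":
                board[i][j] = "S"  # temporary "safe" marker
                stack.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])

    # Seed the search from every border cell.
    for r in range(rows):
        mark_safe(r, 0)
        mark_safe(r, cols - 1)
    for c in range(cols):
        mark_safe(0, c)
        mark_safe(rows - 1, c)

    # Unmarked 'O' cells are surrounded -> capture; 'S' goes back to 'O'.
    for r in range(rows):
        for c in range(cols):
            if board[r][c] == "O":
                board[r][c] = "X"
            elif board[r][c] == "S":
                board[r][c] = "O"
```

Time and space are O(V) = O(rows x cols), since each cell is visited a constant number of times; edges are implicit (four neighbors per cell).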
- Rotting Oranges · DSA · Graphs
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
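Rotting Oranges is the canonical multi-source BFS: every initially rotten orange is a source, and each BFS layer corresponds to one minute of spread. A sketch assuming the usual grid encoding (0 = empty, 1 = fresh, 2 = rotten):

```python
from collections import deque
from typing import List

def oranges_rotting(grid: List[List[int]]) -> int:
    """Return minutes until no fresh orange remains, or -1 if impossible."""
    rows, cols = len(grid), len(grid[0])
    queue = deque()
    fresh = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 2:
                queue.append((r, c, 0))   # (row, col, minute) for each source
            elif grid[r][c] == 1:
                fresh += 1

    minutes = 0
    while queue:
        r, c, minute = queue.popleft()
        minutes = max(minutes, minute)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                grid[nr][nc] = 2          # rot it so it is not revisited
                fresh -= 1
                queue.append((nr, nc, minute + 1))

    return minutes if fresh == 0 else -1
```

The layer-by-layer spread is exactly why BFS (not DFS) is the right tool: the answer is a shortest distance in minutes, and disconnected fresh oranges simply never leave the `fresh` count, producing -1.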
NLP · Pretrained models (BERT, T5, GPT)
Interview questions to prep
- Walk through BERT's MLM and NSP objectives.
- Why is BERT bidirectional while GPT is left-to-right?
- How would you fine-tune BERT for token classification (NER)?
- Why is masking used in masked-language-model training, and how is it different from causal masking?
- How would you fine-tune DistilBERT for IMDB sentiment and check whether compression hurt quality?
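A hedged sketch of the DistilBERT-on-IMDB fine-tune asked about above, using the Hugging Face `transformers`/`datasets` APIs. The model name, subset sizes, and hyperparameters are illustrative, not tuned; comparing the evaluation numbers against a full BERT baseline is one way to check whether distillation hurt quality.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

imdb = load_dataset("imdb")

def tokenize(batch):
    # Truncate long reviews to the model's maximum sequence length.
    return tokenizer(batch["text"], truncation=True)

encoded = imdb.map(tokenize, batched=True)

# Small subsets keep the sketch cheap; use the full splits for a real run.
train_ds = encoded["train"].shuffle(seed=42).select(range(2000))
eval_ds = encoded["test"].shuffle(seed=42).select(range(500))

args = TrainingArguments(
    output_dir="distilbert-imdb",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```

For the token-classification (NER) question, the shape of the recipe is the same, but the head changes to `AutoModelForTokenClassification` and labels must be aligned to subword tokens.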
Interview questions to prep
- What is T5's text-to-text framing, and what does it enable?
- Compare BART vs T5 for summarization.
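T5's text-to-text framing means every task, from summarization to classification, is "input string in, output string out", selected only by a task prefix. A minimal sketch with the public `t5-small` checkpoint (prefixes follow the original T5 paper; outputs are illustrative):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "summarize: The quick brown fox jumped over the lazy dog near the river bank.",
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",  # grammatical acceptability
]

for prompt in tasks:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same weights and the same decoding loop handle all three tasks, which is exactly what the text-to-text framing enables: one pretraining objective, one fine-tuning interface, one inference API.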
Interview questions to prep
- Why did decoder-only models win the LLM race?
- What does the causal-LM objective give you that masked-LM doesn't, and vice versa?
- Explain teacher forcing during training and autoregressive decoding during inference.
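A hedged sketch contrasting the two phases the last question asks about, using GPT-2 purely as a small, publicly available causal LM. During training, passing `labels=input_ids` makes the library shift the targets internally, so position t is predicted from the gold tokens before t (teacher forcing); at inference, `generate` decodes autoregressively, conditioning on the model's own previous outputs.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "Pretrained language models are"
inputs = tokenizer(text, return_tensors="pt")

# Training-style forward pass: causal-LM loss under teacher forcing.
out = model(**inputs, labels=inputs["input_ids"])
print("causal-LM loss:", out.loss.item())

# Inference: autoregressive decoding, one token at a time.
with torch.no_grad():
    generated = model.generate(
        inputs["input_ids"],
        max_new_tokens=20,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(generated[0]))
```

The gap between the two regimes (gold prefixes in training vs self-generated prefixes at inference) is the exposure-bias issue worth mentioning when defending teacher forcing in an interview.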
References & further reading
- BERT paper (Devlin et al., 2018)
- Hugging Face NLP course
- The Illustrated Transformer (Jay Alammar)