Day 55 of 133
DL/NLP consolidation + DSA Graphs
60-min DL breadth quiz; rehearse self-attention, BERT, T5, transfer learning.
DSA · NeetCode Graphs
- Course Schedule II
- Redundant Connection
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two (both paradigms are sketched after this list).
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
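The two problems split neatly across the paradigms the first question asks about. A minimal sketch, assuming the standard LeetCode inputs: Kahn's BFS topological sort for Course Schedule II and Union-Find for Redundant Connection (function names and style are my own; complexity notes inline).

```python
from collections import deque

def find_order(num_courses: int, prerequisites: list[list[int]]) -> list[int]:
    """Course Schedule II via Kahn's BFS topological sort: O(V + E) time,
    O(V + E) space (adjacency list + indegree array + queue)."""
    adj = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, pre in prerequisites:
        adj[pre].append(course)
        indegree[course] += 1
    # Seed with every 0-indegree node, which also covers disconnected components.
    queue = deque(i for i in range(num_courses) if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Emitting fewer than V nodes means a cycle: no valid ordering exists.
    return order if len(order) == num_courses else []

def find_redundant_connection(edges: list[list[int]]) -> list[int]:
    """Redundant Connection via Union-Find: ~O(E α(V)). A self-loop or a
    duplicate edge joins already-connected nodes, so both get caught here."""
    parent = list(range(len(edges) + 1))  # nodes are 1..n in this problem

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:  # endpoints already connected: this edge closes a cycle
            return [a, b]
        parent[ra] = rb
    return []
```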
DL · Attention & Transformer
Interview questions to prep
- Walk me through self-attention (Q, K, V) end-to-end.
- Why divide by √d_k before the softmax? (See the variance sketch after this list.)
- What does multi-head attention buy you over a single head?
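For the √d_k question, the usual variance argument, assuming the components of q and k are independent with zero mean and unit variance:

```latex
q^\top k = \sum_{i=1}^{d_k} q_i k_i
\quad\Rightarrow\quad
\operatorname{Var}\bigl(q^\top k\bigr)
  = \sum_{i=1}^{d_k} \operatorname{Var}(q_i)\,\operatorname{Var}(k_i)
  = d_k,
\qquad
\operatorname{Var}\!\Bigl(\tfrac{q^\top k}{\sqrt{d_k}}\Bigr) = 1.
```

Without the rescaling, logits grow like √d_k in magnitude, the softmax saturates, and gradients through it vanish.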
Interview questions to prep
- Walk me through the transformer block: attention → add+norm → FFN → add+norm (a minimal block is sketched after this list).
- Compare absolute vs relative vs RoPE positional encodings.
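A minimal post-norm encoder block matching that data flow, sketched with PyTorch's nn.MultiheadAttention; the dimensions are illustrative defaults, not prescribed anywhere in this plan.

```python
import torch
import torch.nn as nn

class PostNormTransformerBlock(nn.Module):
    """One encoder block, post-norm as in the original transformer:
    attention -> add+norm -> FFN -> add+norm."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)      # residual add, then LayerNorm (post-norm)
        x = self.norm2(x + self.ffn(x))   # same pattern around the FFN
        return x

block = PostNormTransformerBlock()
out = block(torch.randn(2, 16, 512))      # (2, 16, 512) in -> (2, 16, 512) out
```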
Interview questions to prep
- What problem does FlashAttention solve, and how?
- Compare MHA, MQA, and GQA: KV-cache trade-offs (see the cache arithmetic after this list).
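FlashAttention is an exact-attention kernel that tiles the computation so the full seq×seq score matrix is never materialized in HBM, trading recomputation for memory traffic. The MHA/MQA/GQA side of the question is easiest to make concrete with KV-cache arithmetic; the sketch below uses an assumed, purely illustrative config (all numbers are hypothetical, not from any specific model).

```python
def kv_cache_bytes(*, n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """KV-cache size: a K and a V tensor per layer, one pair per KV head.
    MQA (1 KV head) and GQA (few KV heads) shrink exactly this term."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 32-layer model, 32 query heads, head_dim 128, fp16, 8k context.
cfg = dict(n_layers=32, head_dim=128, seq_len=8192, batch=1)
for name, kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    gib = kv_cache_bytes(n_kv_heads=kv, **cfg) / 2**30
    print(f"{name}: {kv:2d} KV heads -> {gib:.3f} GiB per 8k-token sequence")
# MHA: 4.000 GiB, GQA: 1.000 GiB, MQA: 0.125 GiB under these assumptions.
```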
Interview questions to prep
- Implement scaled dot-product self-attention and track the shapes of Q, K, V, scores, and output (a sketch follows this list).
- How do masks change self-attention for causal language modeling?
- What changes when you split attention into multiple heads and then concatenate them?
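A compact NumPy sketch covering all three items: scaled dot-product attention with shapes in comments, an optional causal mask, and a head split/concat. The learned projections W_q, W_k, W_v, W_o are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v, causal=False):
    """q, k, v: (..., seq, d_k) -> output: (..., seq, d_k)."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)        # (..., seq, seq)
    if causal:
        seq = scores.shape[-1]
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)           # block attention to the future
    return softmax(scores) @ v                            # (..., seq, d_k)

def multi_head(x, n_heads):
    """Split d_model into heads, attend per head, concatenate."""
    batch, seq, d_model = x.shape
    d_head = d_model // n_heads
    h = x.reshape(batch, seq, n_heads, d_head).transpose(0, 2, 1, 3)  # (b, h, s, d_head)
    out = self_attention(h, h, h, causal=True)
    return out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)     # concat heads

x = np.random.randn(2, 5, 64)        # (batch=2, seq=5, d_model=64)
y = multi_head(x, n_heads=8)
assert y.shape == (2, 5, 64)
```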
NLP · Pretrained models (BERT, T5, GPT)
Interview questions to prep
- Walk through BERT's MLM and NSP objectives.
- Why is BERT bidirectional while GPT is left-to-right?
- How would you fine-tune BERT for token classification (NER)? (A sketch follows this list.)
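For the NER question, a minimal token-classification sketch with Hugging Face transformers. The 9-label CoNLL-style scheme, the example sentence, and the label ids are all assumptions for illustration; labeling every subword with its word's label is one common convention (another is to label only the first subword).

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=9)

# Hypothetical word-level example: 0 = O, 3/4 = B/I-ORG, 5/6 = B/I-LOC.
words = ["Hugging", "Face", "is", "based", "in", "New", "York"]
word_labels = [3, 4, 0, 0, 0, 5, 6]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level labels to subword tokens; special tokens get -100,
# which the cross-entropy loss ignores.
labels = [-100 if wid is None else word_labels[wid]
          for wid in enc.word_ids(batch_index=0)]

out = model(**enc, labels=torch.tensor([labels]))
out.loss.backward()  # an optimizer step over a real dataset would follow
```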
Interview questions to prep
- What is T5's text-to-text framing, and what does it enable? (A usage sketch follows this list.)
- Compare BART vs T5 for summarization.
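The text-to-text framing is easiest to see in code: one checkpoint handles summarization, translation, and so on, with the task named in the input prefix. A usage sketch with t5-small (the input text is illustrative):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is "text in, text out"; swapping the prefix swaps the task.
text = "summarize: The transformer architecture replaced recurrence with attention ..."
inputs = tokenizer(text, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```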
Interview questions to prep
- Why did decoder-only models win the LLM race?
- What does the causal-LM objective give you that masked-LM doesn't, and vice versa?
- Explain teacher forcing during training and autoregressive decoding during inference (both are sketched after this list).
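The training/inference asymmetry fits in a few lines. `model` below is a hypothetical stand-in for any causal LM that maps token ids (batch, seq) to next-token logits (batch, seq, vocab); it is not a specific library model.

```python
import torch
import torch.nn.functional as F

def teacher_forced_loss(model, tokens):
    """Training: one parallel forward pass. Position t is predicted from the
    ground-truth tokens < t; the causal mask inside the model enforces this."""
    logits = model(tokens[:, :-1])                 # (batch, seq-1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),       # predictions, flattened
        tokens[:, 1:].reshape(-1),                 # next-token targets
    )

@torch.no_grad()
def greedy_decode(model, prompt, max_new_tokens=20):
    """Inference: sequential. Each step conditions on the model's *own*
    previous outputs, so early mistakes can compound (exposure bias)."""
    tokens = prompt
    for _ in range(max_new_tokens):
        next_id = model(tokens)[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=-1)
    return tokens
```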
References & further reading
- The Illustrated Transformer (Jay Alammar)
- CS224n: NLP with Deep Learning (Stanford)