Day 55 of 133

DL/NLP consolidation + DSA Graphs

60-minute DL breadth quiz; rehearse self-attention, BERT, T5, and transfer learning.

DSA · NeetCode Graphs

  • Course Schedule II · DSA · Graphs

    Interview questions to prep

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two (see the sketch after this list).
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
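
A minimal sketch for Course Schedule II using Kahn's algorithm (BFS topological sort); the function name and the [course, prerequisite] edge convention are illustrative. Union-Find is a poor fit here because prerequisite edges are directed; a DFS with three-color cycle detection is the main alternative.

    from collections import deque

    def find_order(num_courses, prerequisites):
        # Build adjacency list and in-degrees; edges run prereq -> course.
        adj = [[] for _ in range(num_courses)]
        indegree = [0] * num_courses
        for course, prereq in prerequisites:
            adj[prereq].append(course)
            indegree[course] += 1

        # Seed the queue with every course that has no prerequisites,
        # which also handles disconnected components for free.
        queue = deque(i for i in range(num_courses) if indegree[i] == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)

        # Any cycle (including a self-loop) leaves courses unscheduled.
        return order if len(order) == num_courses else []

Building the graph is O(E) and each vertex and edge is processed once, so time and space are O(V + E). Duplicate edges inflate in-degrees and adjacency entries by matching amounts, so the result stays correct.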

DL · Attention & Transformer

  • Self-attention (Q, K, V) end-to-end · Deep Learning · Jay Alammar

    Interview questions to prep

    1. Walk me through self-attention (Q, K, V) end-to-end.
    2. Why divide the scores by √d_k before the softmax?
    3. What does multi-head attention buy you over a single head?
  • Transformer block & positional encodings

    Interview questions to prep

    1. Walk me through the transformer block: attention → add+norm → FFN → add+norm (a minimal sketch follows this list).
    2. Compare absolute vs relative vs RoPE positional encodings.
  • Sparse, linear, FlashAttention, MQA, GQA · Deep Learning · FlashAttention

    Interview questions to prep

    1. What problem does FlashAttention solve, and how?
    2. Compare MHA, MQA, and GQA — KV-cache trade-offs (see the head-sharing sketch after this list).
  • Implementing self-attention

    Interview questions to prep

    1. Implement scaled dot-product self-attention and track the shapes of Q, K, V, scores, and output (see the sketch after this list).
    2. How do masks change self-attention for causal language modeling?
    3. What changes when you split attention into multiple heads and then concatenate them?
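
Since the last item asks for an implementation, here is a minimal single-head sketch in PyTorch (function and weight names are illustrative), tracking shapes in comments; the division by √d_k and the causal mask from the questions above appear explicitly.

    import math
    import torch

    def self_attention(x, w_q, w_k, w_v, causal=False):
        # x: (seq, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
        Q, K, V = x @ w_q, x @ w_k, x @ w_v          # each (seq, d_k)
        d_k = Q.size(-1)
        # Scale by 1/sqrt(d_k) so the logits' variance stays near 1 and
        # the softmax does not saturate as d_k grows.
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (seq, seq)
        if causal:
            # For causal LM, mask out future positions before the softmax.
            future = torch.triu(torch.ones_like(scores, dtype=torch.bool), 1)
            scores = scores.masked_fill(future, float("-inf"))
        weights = scores.softmax(dim=-1)             # rows sum to 1
        return weights @ V                           # (seq, d_k)

For multiple heads (the last question), project to (seq, n_heads, d_k) instead, run the same computation per head, concatenate the head outputs back to (seq, n_heads * d_k), and apply one final linear layer; each head can then specialize on a different relation.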
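
A compact post-norm block in the order the transformer-block question names, attention → add+norm → FFN → add+norm, built on PyTorch's nn.MultiheadAttention; the class name is mine and the hyperparameter defaults are the original paper's.

    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, attn_mask=None):
            # Self-attention with a residual connection, then LayerNorm.
            a, _ = self.attn(x, x, x, attn_mask=attn_mask)
            x = self.norm1(x + self.drop(a))
            # Position-wise FFN with a residual connection, then LayerNorm.
            return self.norm2(x + self.drop(self.ffn(x)))

Note that most modern LLMs move each LayerNorm in front of its sublayer (pre-norm), which trains more stably at depth; positional information (absolute, relative, or RoPE) enters either at the embeddings or inside the attention scores.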
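
For the MHA/MQA/GQA question, a shapes-only sketch (head counts are illustrative): grouped-query attention keeps fewer K/V heads than query heads, so the KV cache shrinks by the group factor; n_kv_heads = 1 recovers MQA and n_kv_heads = n_q_heads recovers standard MHA.

    import torch

    batch, seq, d_head = 2, 16, 64
    n_q_heads, n_kv_heads = 8, 2            # MHA: 8/8, GQA: 8/2, MQA: 8/1

    q = torch.randn(batch, n_q_heads, seq, d_head)
    k = torch.randn(batch, n_kv_heads, seq, d_head)   # this is what gets
    v = torch.randn(batch, n_kv_heads, seq, d_head)   # cached at decode time

    # Each group of query heads shares one K/V head: a 4x smaller KV cache
    # here (8x for MQA), traded against some representational capacity.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)   # (batch, 8, seq, d_head)
    v = v.repeat_interleave(group, dim=1)
    out = (q @ k.transpose(-2, -1) / d_head ** 0.5).softmax(-1) @ v

FlashAttention is orthogonal to this: it computes exact attention but tiles the work through on-chip SRAM so the full (seq, seq) score matrix is never materialized in GPU HBM.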

NLP · Pretrained models (BERT, T5, GPT)

  • BERT: MLM + NSP, encoder-only · Deep Learning · Devlin et al.

    Interview questions to prep

    1. Walk through BERT's MLM and NSP objectives.
    2. Why is BERT bidirectional while GPT is left-to-right?
    3. How would you fine-tune BERT for token classification (NER)? (A minimal sketch follows this list.)
  • T5 / BART: encoder-decoder · Deep Learning · Raffel et al.

    Interview questions to prep

    1. What is T5's text-to-text framing, and what does it enable?
    2. Compare BART vs T5 for summarization.
  • GPT: decoder-only, causal LM · Deep Learning · OpenAI

    Interview questions to prep

    1. Why did decoder-only models win the LLM race?
    2. What does the causal-LM objective give you that masked-LM doesn't, and vice versa?
    3. Explain teacher forcing during training and autoregressive decoding during inference (see the sketch after this list).
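
For the BERT fine-tuning question, a minimal Hugging Face sketch (the checkpoint and toy label set are placeholders): replace the MLM head with a per-token classification head and train on token-level labels, using -100 so the loss skips subword continuations.

    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]   # toy NER tag set
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(labels)
    )

    # One logit vector per WordPiece; at train time, label only the first
    # piece of each word and set continuation pieces to -100.
    enc = tokenizer("Sundar Pichai leads Google", return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                 # (1, seq_len, num_labels)
    predicted_tags = [labels[int(i)] for i in logits.argmax(dim=-1)[0]]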
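
And for the teacher-forcing question, a sketch assuming a hypothetical model that maps token ids straight to next-token logits: training scores every position in parallel against the gold tokens shifted by one, while inference feeds the model's own outputs back one token at a time.

    import torch
    import torch.nn.functional as F

    def causal_lm_loss(model, token_ids):
        # Teacher forcing: inputs are gold tokens; targets are the same
        # sequence shifted left by one, scored in parallel at every position.
        inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
        logits = model(inputs)                       # (batch, seq-1, vocab)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

    @torch.no_grad()
    def greedy_decode(model, prompt_ids, max_new_tokens):
        # Autoregressive decoding: each step conditions on everything
        # generated so far, including the model's own earlier mistakes.
        ids = prompt_ids
        for _ in range(max_new_tokens):
            next_id = model(ids)[:, -1].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
        return ids

The gap between the two regimes is exposure bias: at inference the model conditions on its own predictions, a distribution it never saw during teacher-forced training.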

References & further reading

  • Jay Alammar, "The Illustrated Transformer", https://jalammar.github.io/illustrated-transformer/
  • Vaswani et al., "Attention Is All You Need", NeurIPS 2017, arXiv:1706.03762
  • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019, arXiv:1810.04805
  • Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (T5), JMLR 2020, arXiv:1910.10683
  • Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension", ACL 2020, arXiv:1910.13461
  • Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness", NeurIPS 2022, arXiv:2205.14135
  • Brown et al., "Language Models are Few-Shot Learners" (GPT-3), NeurIPS 2020, arXiv:2005.14165