Day 55 of 133

DL/NLP consolidation + DSA Graphs

60-minute DL breadth quiz; rehearse self-attention, BERT, T5, and transfer learning.

DSA · NeetCode Graphs

  • Course Schedule II · DSA · Graphs

    Interview questions to prep

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two (see the sketch after this list).
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
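
A minimal sketch for Course Schedule II using Kahn's algorithm (BFS topological sort); the function name and the [course, prerequisite] edge convention are illustrative. Union-Find is a poor fit here because prerequisite edges are directed; a DFS with three-color cycle detection is the main alternative.

    from collections import deque

    def find_order(num_courses, prerequisites):
        # Build adjacency list and in-degrees; edges run prereq -> course.
        adj = [[] for _ in range(num_courses)]
        indegree = [0] * num_courses
        for course, prereq in prerequisites:
            adj[prereq].append(course)
            indegree[course] += 1

        # Seed the queue with every course that has no prerequisites,
        # which also handles disconnected components for free.
        queue = deque(i for i in range(num_courses) if indegree[i] == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)

        # Any cycle (including a self-loop) leaves courses unscheduled.
        return order if len(order) == num_courses else []

Building the graph is O(E) and each vertex and edge is processed once, so time and space are O(V + E). Duplicate edges inflate in-degrees and adjacency entries by matching amounts, so the result stays correct.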

DL · Attention & Transformer

  • Self-attention (Q, K, V) end-to-end · Deep Learning · Jay Alammar

    Interview questions to prep

    1. Walk me through self-attention (Q, K, V) end-to-end.
    2. Why divide the scores by √d_k before the softmax?
    3. What does multi-head attention buy you over a single head?
  • Transformer block & positional encodings

    Interview questions to prep

    1. Walk me through the transformer block: attention → add+norm → FFN → add+norm (a minimal sketch follows this list).
    2. Compare absolute vs relative vs RoPE positional encodings.
  • Sparse, linear, FlashAttention, MQA, GQA · Deep Learning · FlashAttention

    Interview questions to prep

    1. What problem does FlashAttention solve, and how?
    2. Compare MHA, MQA, and GQA — KV-cache trade-offs (see the head-sharing sketch after this list).
  • Implementing self-attention

    Interview questions to prep

    1. Implement scaled dot-product self-attention and track the shapes of Q, K, V, scores, and output (see the sketch after this list).
    2. How do masks change self-attention for causal language modeling?
    3. What changes when you split attention into multiple heads and then concatenate them?
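
Since the last item asks for an implementation, here is a minimal single-head sketch in PyTorch (function and weight names are illustrative), tracking shapes in comments; the division by √d_k and the causal mask from the questions above appear explicitly.

    import math
    import torch

    def self_attention(x, w_q, w_k, w_v, causal=False):
        # x: (seq, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
        Q, K, V = x @ w_q, x @ w_k, x @ w_v          # each (seq, d_k)
        d_k = Q.size(-1)
        # Scale by 1/sqrt(d_k) so the logits' variance stays near 1 and
        # the softmax does not saturate as d_k grows.
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (seq, seq)
        if causal:
            # For causal LM, mask out future positions before the softmax.
            future = torch.triu(torch.ones_like(scores, dtype=torch.bool), 1)
            scores = scores.masked_fill(future, float("-inf"))
        weights = scores.softmax(dim=-1)             # rows sum to 1
        return weights @ V                           # (seq, d_k)

For multiple heads (the last question), project to (seq, n_heads, d_k) instead, run the same computation per head, concatenate the head outputs back to (seq, n_heads * d_k), and apply one final linear layer; each head can then specialize on a different relation.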
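
A compact post-norm block in the order the transformer-block question names, attention → add+norm → FFN → add+norm, built on PyTorch's nn.MultiheadAttention; the class name is mine and the hyperparameter defaults are the original paper's.

    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                     nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, x, attn_mask=None):
            # Self-attention with a residual connection, then LayerNorm.
            a, _ = self.attn(x, x, x, attn_mask=attn_mask)
            x = self.norm1(x + self.drop(a))
            # Position-wise FFN with a residual connection, then LayerNorm.
            return self.norm2(x + self.drop(self.ffn(x)))

Note that most modern LLMs move each LayerNorm in front of its sublayer (pre-norm), which trains more stably at depth; positional information (absolute, relative, or RoPE) enters either at the embeddings or inside the attention scores.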
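
For the MHA/MQA/GQA question, a shapes-only sketch (head counts are illustrative): grouped-query attention keeps fewer K/V heads than query heads, so the KV cache shrinks by the group factor; n_kv_heads = 1 recovers MQA and n_kv_heads = n_q_heads recovers standard MHA.

    import torch

    batch, seq, d_head = 2, 16, 64
    n_q_heads, n_kv_heads = 8, 2            # MHA: 8/8, GQA: 8/2, MQA: 8/1

    q = torch.randn(batch, n_q_heads, seq, d_head)
    k = torch.randn(batch, n_kv_heads, seq, d_head)   # this is what gets
    v = torch.randn(batch, n_kv_heads, seq, d_head)   # cached at decode time

    # Each group of query heads shares one K/V head: a 4x smaller KV cache
    # here (8x for MQA), traded against some representational capacity.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)   # (batch, 8, seq, d_head)
    v = v.repeat_interleave(group, dim=1)
    out = (q @ k.transpose(-2, -1) / d_head ** 0.5).softmax(-1) @ v

FlashAttention is orthogonal to this: it computes exact attention but tiles the work through on-chip SRAM so the full (seq, seq) score matrix is never materialized in GPU HBM.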

NLP · Pretrained models (BERT, T5, GPT)

  • BERT: MLM + NSP, encoder-only · Deep Learning · Devlin et al.

    Interview questions to prep

    1. Walk through BERT's MLM and NSP objectives.
    2. Why is BERT bidirectional while GPT is left-to-right?
    3. How would you fine-tune BERT for token classification (NER)? (A minimal sketch follows this list.)
  • T5 / BART: encoder-decoder · Deep Learning · Raffel et al.

    Interview questions to prep

    1. What is T5's text-to-text framing, and what does it enable?
    2. Compare BART vs T5 for summarization.
  • GPT: decoder-only, causal LM · Deep Learning · OpenAI

    Interview questions to prep

    1. Why did decoder-only models win the LLM race?
    2. What does the causal-LM objective give you that masked-LM doesn't, and vice versa?
    3. Explain teacher forcing during training and autoregressive decoding during inference (see the sketch after this list).
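
For the BERT fine-tuning question, a minimal Hugging Face sketch (the checkpoint and toy label set are placeholders): replace the MLM head with a per-token classification head and train on token-level labels, using -100 so the loss skips subword continuations.

    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]   # toy NER tag set
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=len(labels)
    )

    # One logit vector per WordPiece; at train time, label only the first
    # piece of each word and set continuation pieces to -100.
    enc = tokenizer("Sundar Pichai leads Google", return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                 # (1, seq_len, num_labels)
    predicted_tags = [labels[int(i)] for i in logits.argmax(dim=-1)[0]]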
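
And for the teacher-forcing question, a sketch assuming a hypothetical model that maps token ids straight to next-token logits: training scores every position in parallel against the gold tokens shifted by one, while inference feeds the model's own outputs back one token at a time.

    import torch
    import torch.nn.functional as F

    def causal_lm_loss(model, token_ids):
        # Teacher forcing: inputs are gold tokens; targets are the same
        # sequence shifted left by one, scored in parallel at every position.
        inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
        logits = model(inputs)                       # (batch, seq-1, vocab)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))

    @torch.no_grad()
    def greedy_decode(model, prompt_ids, max_new_tokens):
        # Autoregressive decoding: each step conditions on everything
        # generated so far, including the model's own earlier mistakes.
        ids = prompt_ids
        for _ in range(max_new_tokens):
            next_id = model(ids)[:, -1].argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
        return ids

The gap between the two regimes is exposure bias: at inference the model conditions on its own predictions, a distribution it never saw during teacher-forced training.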

References & further reading

  • Jay Alammar, "The Illustrated Transformer", https://jalammar.github.io/illustrated-transformer/
  • Vaswani et al., "Attention Is All You Need", NeurIPS 2017, arXiv:1706.03762
  • Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019, arXiv:1810.04805
  • Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (T5), JMLR 2020, arXiv:1910.10683
  • Lewis et al., "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension", ACL 2020, arXiv:1910.13461
  • Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness", NeurIPS 2022, arXiv:2205.14135
  • Brown et al., "Language Models are Few-Shot Learners" (GPT-3), NeurIPS 2020, arXiv:2005.14165