Day 56 of 133

Deep learning wrap + DSA Graphs finish

Re-record CV + NLP breadth answers; identify shaky topics for re-study.

DSA · NeetCode Graphs

  • Interview questions to prep (BFS component-count sketch after this list)

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
  • Graph Valid Tree (DSA · Graphs)

    Interview questions to prep (Union-Find sketch after this list)

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
  • Word Ladder (DSA · Graphs)

    Interview questions to prep (BFS + wildcard-bucket sketch after this list)

    1. Why BFS over DFS for shortest path?
    2. What's the trick with the wildcard pattern (e.g., 'h*t') to build neighbors efficiently?
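
For the first (untitled) graph card, a minimal sketch assuming plain Python and the standard library only; `count_components` is a hypothetical helper, shown because it answers the disconnected-components / self-loops / duplicate-edges question directly.

```python
from collections import deque

def count_components(n, edges):
    """Count connected components in an undirected graph with nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:                 # a self-loop never adds a reachable node
            adj[u].add(v)
            adj[v].add(u)          # sets silently deduplicate repeated edges

    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1            # every unvisited start is a new component
        queue = deque([start])
        seen[start] = True
        while queue:               # plain BFS; O(V + E) over the whole graph
            node = queue.popleft()
            for nxt in adj[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    queue.append(nxt)
    return components

print(count_components(5, [(0, 1), (1, 0), (2, 2), (3, 4)]))  # -> 3
```

Set-based adjacency makes duplicate edges a no-op and skipping u == v edges makes self-loops irrelevant to connectivity; the traversal cost is the usual O(V + E).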
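
For the Graph Valid Tree card, a hedged Union-Find sketch (`valid_tree` and the example inputs are illustrative, not from the source): a graph is a valid tree exactly when it has n - 1 edges and no edge closes a cycle.

```python
def valid_tree(n, edges):
    """Graph Valid Tree via Union-Find: n - 1 edges plus no cycle implies connected."""
    if len(edges) != n - 1:        # too few edges -> disconnected, too many -> cycle
        return False

    parent = list(range(n))

    def find(x):                   # path compression keeps finds near O(1) amortized
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:               # endpoints already connected -> this edge closes a cycle
            return False
        parent[ru] = rv            # union the two components

    return True

print(valid_tree(5, [(0, 1), (0, 2), (0, 3), (1, 4)]))  # -> True
print(valid_tree(5, [(0, 1), (1, 2), (2, 3), (1, 3)]))  # -> False: last edge closes a cycle, node 4 stays isolated
```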
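
For the Word Ladder card, a sketch of BFS plus the wildcard-bucket trick, assuming the usual LeetCode-style signature (`ladder_length` is an assumed name): grouping words under patterns like 'h*t' turns neighbor lookup into a dictionary access instead of an all-pairs comparison.

```python
from collections import defaultdict, deque

def ladder_length(begin, end, word_list):
    """Shortest transformation sequence length via BFS; 0 if unreachable."""
    words = set(word_list)
    if end not in words:
        return 0

    buckets = defaultdict(list)    # 'h*t' -> ['hot', 'hit', ...]
    for word in words | {begin}:
        for i in range(len(word)):
            buckets[word[:i] + '*' + word[i + 1:]].append(word)

    queue = deque([(begin, 1)])
    visited = {begin}
    while queue:                   # BFS explores by increasing ladder length,
        word, steps = queue.popleft()   # so the first time we reach `end` is optimal
        if word == end:
            return steps
        for i in range(len(word)):
            for nxt in buckets[word[:i] + '*' + word[i + 1:]]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, steps + 1))
    return 0

print(ladder_length("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]))  # -> 5
```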

DL · Neural network foundations

  • Interview questions to prep (2-layer MLP forward-pass sketch after this list)

    1. Walk me through forward pass through a 2-layer MLP for binary classification.
    2. Why can't a single perceptron solve XOR — and how does adding a hidden layer fix it?
  • Interview questions to prep (activation-function sketch after this list)

    1. Compare ReLU, Leaky ReLU, GELU, and SwiGLU — when does each shine?
    2. Why did ReLU largely replace sigmoid/tanh in deep networks?
    3. What is the dying ReLU problem and how do you mitigate it?
  • Interview questions to prep (Xavier vs He init sketch after this list)

    1. Why does poor initialization cause vanishing or exploding gradients?
    2. Compare Xavier vs He initialization — which goes with which activation and why?
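
For the MLP card, a NumPy sketch with hand-picked illustrative weights (the weight values below are hypothetical, chosen only to show that one hidden ReLU layer separates XOR where a single perceptron cannot).

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a 2-layer MLP for binary classification. x: (batch, d_in)."""
    h = np.maximum(0.0, x @ W1 + b1)        # (batch, d_hidden); ReLU supplies the non-linearity
    logits = h @ W2 + b2                    # (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> P(y = 1 | x)

# Hypothetical hand-set weights that solve XOR, just to make the point concrete:
# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), output logit = h1 - 2*h2 - 0.5.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([-0.5])

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(mlp_forward(x, W1, b1, W2, b2).round(2))  # high for (0,1) and (1,0), low for (0,0) and (1,1)
```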
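
For the activation-function card, rough NumPy versions of ReLU, Leaky ReLU, GELU (tanh approximation), and SiLU, plus SwiGLU written as the gated unit it actually is; the function names, shapes, and demo values are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # the small negative slope keeps gradients flowing and mitigates "dying ReLU"
    return np.where(x > 0, x, alpha * x)

def gelu(x):
    # tanh approximation of GELU, common in transformer implementations
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def silu(x):
    # SiLU / swish, the gate used inside SwiGLU
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU is a gated FFN unit rather than a pointwise activation: SiLU(xW) * (xV)
    return silu(x @ W) * (x @ V)

x = np.linspace(-3.0, 3.0, 7)
print(relu(x).round(2), leaky_relu(x).round(2), gelu(x).round(2), silu(x).round(2), sep="\n")

rng = np.random.default_rng(0)
W, V = rng.normal(size=(7, 4)), rng.normal(size=(7, 4))
print(swiglu(x[None, :], W, V).shape)  # (1, 4)
```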
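
For the initialization card, a small NumPy experiment (layer sizes and depth are arbitrary) that pushes activations through a stack of ReLU layers to show why He's extra factor of 2, versus Xavier's 2 / (fan_in + fan_out), matters for ReLU networks.

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng):
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), matched to roughly linear activations (tanh, sigmoid)
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng):
    # He/Kaiming: variance 2 / fan_in; the extra factor of 2 compensates for ReLU zeroing half the inputs
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 512))
for name, init in [("xavier", xavier_init), ("he", he_init)]:
    h = x
    for _ in range(10):                              # 10 ReLU layers, no other tricks
        h = np.maximum(0.0, h @ init(512, 512, rng))
    print(name, float(h.std()))                      # He stays near the input scale; Xavier shrinks layer by layer
```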

DL · ViT, CLIP, multimodal

  • Vision Transformer (ViT) (Deep Learning · Google)

    Interview questions to prep (patch-embedding sketch after this list)

    1. How does ViT tokenize an image, and what's the role of the [CLS] token?
    2. When does a ViT beat a CNN, and when does its hunger for training data hurt it?
  • Interview questions to prep (CLIP contrastive-loss sketch after this list)

    1. How does CLIP enable zero-shot image classification?
    2. Walk me through CLIP's contrastive training objective.
  • Interview questions to prep (early vs late fusion sketch after this list)

    1. How do multimodal LLMs like LLaVA fuse vision encoders with language models?
    2. Compare early fusion vs late fusion in vision-language models — what does each cost in compute and quality?
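
For the ViT card, a NumPy sketch assuming a 224x224 input with 16x16 patches: how an image becomes a token sequence and where the [CLS] token goes. The embedding matrix and dimensions are illustrative, and positional embeddings are only noted in a comment.

```python
import numpy as np

def patchify(images, patch=16):
    """Split an image batch into non-overlapping, flattened patches.
    images: (batch, H, W, C) with H and W divisible by `patch`."""
    b, h, w, c = images.shape
    x = images.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)                    # (b, h/p, w/p, p, p, c)
    return x.reshape(b, (h // patch) * (w // patch), patch * patch * c)

rng = np.random.default_rng(0)
imgs = rng.normal(size=(2, 224, 224, 3))                 # batch of 2 "images"
W_embed = rng.normal(size=(16 * 16 * 3, 768)) * 0.02     # hypothetical patch-embedding matrix
tokens = patchify(imgs) @ W_embed                        # (2, 196, 768): 14 x 14 patch tokens
cls = np.zeros((2, 1, 768))                              # learnable [CLS] token, prepended;
tokens = np.concatenate([cls, tokens], axis=1)           # its final state feeds the classification head
print(tokens.shape)                                      # (2, 197, 768); a real ViT adds positional embeddings here
```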
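
For the CLIP card, a NumPy sketch of the symmetric contrastive objective over an image-text similarity matrix (`clip_contrastive_loss` is an assumed name; the temperature is the commonly cited 0.07). Zero-shot classification then reduces to picking the class-name prompt whose text embedding is most similar to the image embedding.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Matched image/text pairs sit on the diagonal of the similarity matrix;
    train with symmetric cross-entropy over rows and columns.
    img_emb, txt_emb: (batch, d), assumed to come from separate encoders."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) scaled cosine similarities

    def cross_entropy(l):                       # targets are the diagonal indices 0..batch-1
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(l)), np.arange(len(l))].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
print(round(float(clip_contrastive_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64)))), 3))
```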
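
For the fusion card, a NumPy sketch with made-up dimensions: early fusion in the LLaVA style (project vision tokens into the LM embedding space and concatenate with text embeddings) next to a late-fusion similarity score in the CLIP style, to make the compute/quality trade-off concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Early fusion (LLaVA-style, simplified): project vision tokens into the LM's
# embedding space and prepend them to the text embeddings, so every LM layer
# attends over both modalities (better grounding, more tokens to process).
vision_tokens = rng.normal(size=(1, 196, 1024))      # from a frozen vision encoder
projector = rng.normal(size=(1024, 4096)) * 0.02     # hypothetical linear connector
text_embeds = rng.normal(size=(1, 32, 4096))         # embedded text prompt
lm_input = np.concatenate([vision_tokens @ projector, text_embeds], axis=1)
print("early fusion LM input:", lm_input.shape)      # (1, 228, 4096)

# Late fusion (CLIP-style, simplified): each modality is encoded separately and
# only pooled vectors interact, e.g. via one similarity score (cheap, but the
# language side never attends to fine-grained image detail).
img_vec, txt_vec = rng.normal(size=(512,)), rng.normal(size=(512,))
score = img_vec @ txt_vec / (np.linalg.norm(img_vec) * np.linalg.norm(txt_vec))
print("late fusion similarity:", round(float(score), 3))
```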

DL · Attention & Transformer

  • Self-attention (Q, K, V) end-to-end (Deep Learning · Jay Alammar)

    Interview questions to prep (scaled dot-product attention sketch after this list)

    1. Walk me through self-attention (Q, K, V) end-to-end.
    2. Why divide by √d_k inside softmax?
    3. What does multi-head attention buy you over a single head?
  • Interview questions to prep (transformer-block sketch after this list)

    1. Walk me through the transformer block: attention → add+norm → FFN → add+norm.
    2. Compare absolute vs relative vs RoPE positional encodings.
  • Sparse, linear, FlashAttention, MQA, GQA (Deep Learning · FlashAttention)

    Interview questions to prep (GQA KV-sharing sketch after this list)

    1. What problem does FlashAttention solve, and how?
    2. Compare MHA, MQA, and GQA — KV-cache trade-offs.
  • Interview questions to prep (causal multi-head attention sketch after this list)

    1. Implement scaled dot-product self-attention and track the shapes of Q, K, V, scores, and output.
    2. How do masks change self-attention for causal language modeling?
    3. What changes when you split attention into multiple heads and then concatenate them?
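
For the self-attention card, a single-head NumPy sketch (`self_attention` and all shapes are illustrative): Q, K, V projections, the 1/sqrt(d_k) scaling, a row-wise softmax, and the weighted sum over V.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_k)) V.
    Dividing by sqrt(d_k) keeps the dot products at unit-ish variance so the
    softmax does not saturate into near one-hot weights with tiny gradients."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv                          # each (seq, d_k) / (seq, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ V                                        # (seq, d_v)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 32))                                  # 5 tokens, model dim 32
Wq, Wk, Wv = (rng.normal(size=(32, 16)) * 0.1 for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)                    # (5, 16)
```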
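
For the transformer-block card, a NumPy sketch of the post-norm layout named in the question (attention, add & norm, FFN, add & norm); the attention function is passed in so the block itself stays a few lines, and the weights are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def transformer_block(x, attn, W1, W2):
    """Post-norm block: attention -> add & norm -> position-wise FFN -> add & norm.
    `attn` is any self-attention function mapping (seq, d) -> (seq, d)."""
    x = layer_norm(x + attn(x))                       # residual around attention
    ffn = np.maximum(0.0, x @ W1) @ W2                # 2-layer FFN with ReLU
    return layer_norm(x + ffn)                        # residual around the FFN

rng = np.random.default_rng(0)
d, seq = 32, 5
W1, W2 = rng.normal(size=(d, 4 * d)) * 0.02, rng.normal(size=(4 * d, d)) * 0.02
x = rng.normal(size=(seq, d))
print(transformer_block(x, lambda h: h, W1, W2).shape)   # identity "attention" just to check shapes: (5, 32)
```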
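
For the MQA/GQA card, a NumPy sketch of the one mechanical difference from MHA: K/V have fewer heads than the queries and get repeated across query groups at attention time, which is exactly what shrinks the KV cache (`group_kv_heads` and the sizes are assumptions).

```python
import numpy as np

def group_kv_heads(k, v, n_query_heads):
    """Repeat each K/V head across its query group so shapes line up with the queries.
    k, v: (n_kv_heads, seq, d_head). MQA is the n_kv_heads == 1 case."""
    n_kv_heads = k.shape[0]
    repeat = n_query_heads // n_kv_heads          # queries served per cached KV head
    return np.repeat(k, repeat, axis=0), np.repeat(v, repeat, axis=0)

rng = np.random.default_rng(0)
k = rng.normal(size=(2, 128, 64))                 # GQA: only 2 KV heads are cached
v = rng.normal(size=(2, 128, 64))
k8, v8 = group_kv_heads(k, v, n_query_heads=8)    # now aligned with 8 query heads
print(k8.shape)                                   # (8, 128, 64) at attention time
# MHA would cache 8 KV heads, this GQA setup caches 2, MQA would cache 1: a 4x / 8x smaller KV cache.
```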
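
For the implementation card, a NumPy sketch of multi-head causal self-attention with the shape of every intermediate spelled out in comments; the causal mask is what adapts self-attention to autoregressive language modeling. All names and dimensions are illustrative.

```python
import numpy as np

def causal_multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model); d_head = d_model // n_heads."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):                                             # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)     # each (heads, seq, d_head)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)       # (heads, seq, seq)

    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)      # True strictly above the diagonal
    scores = np.where(mask, -1e9, scores)                     # causal: no attending to future positions

    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                 # softmax over keys
    out = weights @ V                                         # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)        # concatenate the heads
    return out @ Wo                                           # final projection, (seq, d_model)

rng = np.random.default_rng(0)
d = 64
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]
print(causal_multi_head_attention(rng.normal(size=(10, d)), *W, n_heads=8).shape)  # (10, 64)
```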

References & further reading