Day 56 of 133
Deep learning wrap + DSA Graphs finish
Re-record CV + NLP breadth answers; identify shaky topics for re-study.
DSA · NeetCode Graphs
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
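The third question can be rehearsed with a small sketch: a BFS component count where the adjacency *set* absorbs duplicate edges and self-loops are skipped outright (function name and signature are my own):

```python
from collections import deque, defaultdict

def count_components(n, edges):
    """Count connected components with BFS over an adjacency-set graph."""
    adj = defaultdict(set)                 # set membership dedups duplicate edges
    for u, v in edges:
        if u == v:                         # self-loop: irrelevant to connectivity
            continue
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = 0
    for start in range(n):                 # O(V + E): each vertex/edge touched once
        if start in seen:
            continue
        components += 1
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return components
```

`count_components(5, [(0, 1), (1, 2), (1, 2), (3, 3)])` returns 3: the duplicate edge and the self-loop change nothing.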
Graph Valid Tree
DSA · Graphs
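Graph Valid Tree itself is the classic Union-Find defence: a graph is a tree iff it has exactly n - 1 edges and no cycle. A minimal sketch (helper names are my own):

```python
def valid_tree(n, edges):
    """Graph Valid Tree via Union-Find: n - 1 edges and no cycle."""
    if len(edges) != n - 1:                # quick reject: wrong edge count
        return False
    parent = list(range(n))

    def find(x):                           # path compression keeps finds near O(1)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:                       # endpoints already connected: cycle
            return False
        parent[ru] = rv                    # union the two components
    return True
```

With the edge count checked up front, "no cycle" also implies "connected", so one pass settles both.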
Word Ladder
DSA · Graphs
Interview questions to prep
- Why BFS over DFS for shortest path?
- What's the trick with the wildcard pattern (e.g., 'h*t') to build neighbours efficiently?
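The wildcard trick in one sketch: bucket every word under each of its patterns ('hot' → '*ot', 'h*t', 'ho*'), then BFS by pattern lookup instead of pairwise word comparison (function name is my own):

```python
from collections import deque, defaultdict

def ladder_length(begin, end, word_list):
    """Shortest transformation length via BFS over wildcard buckets."""
    words = set(word_list)
    if end not in words:
        return 0
    buckets = defaultdict(list)            # 'h*t' -> ['hot', 'hit', ...]
    for w in words | {begin}:
        for i in range(len(w)):
            buckets[w[:i] + '*' + w[i + 1:]].append(w)

    queue = deque([(begin, 1)])
    seen = {begin}
    while queue:
        word, dist = queue.popleft()
        if word == end:
            return dist
        for i in range(len(word)):         # neighbours in O(L) bucket lookups
            for nxt in buckets[word[:i] + '*' + word[i + 1:]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return 0
```

BFS (not DFS) because the first time a word is dequeued is the shortest way to reach it.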
DL · Neural network foundations
Interview questions to prep
- Walk me through the forward pass of a 2-layer MLP for binary classification.
- Why can't a single perceptron solve XOR — and how does adding a hidden layer fix it?
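Both questions fit in one sketch, assuming a ReLU hidden layer and sigmoid output; the hand-picked weights below are just one of many settings that solve XOR, showing what the hidden layer buys you:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer MLP forward pass for binary classification (sketch)."""
    h = np.maximum(0, x @ W1 + b1)          # hidden layer: ReLU(x W1 + b1)
    logits = h @ W2 + b2                    # output layer, shape (n, 1)
    return 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> P(y=1 | x)

# Hand-picked weights: h1 fires on "at least one input", h2 on "both inputs",
# and the output layer computes roughly "h1 and not h2", i.e. XOR.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([[1.0], [-3.0]])
b2 = np.array([-0.2])
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
print((forward(X, W1, b1, W2, b2) > 0.5).astype(int).ravel())  # -> [0 1 1 0]
```

A single perceptron computes one linear threshold, and no line separates {(0,1),(1,0)} from {(0,0),(1,1)}; the hidden layer builds two lines and the output combines them.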
Interview questions to prep
- Compare ReLU, Leaky ReLU, GELU, and SwiGLU — when does each shine?
- Why did ReLU largely replace sigmoid/tanh in deep networks?
- What is the dying ReLU problem and how do you mitigate it?
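A quick rehearsal of the shapes, using NumPy and the tanh approximation of GELU; `swiglu` here is a toy stand-in for the gated FFN unit from the GLU-variants paper (weight matrices are whatever you pass in):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):                  # small negative slope avoids dead units
    return np.where(x > 0, x, a * x)

def gelu(x):                                # tanh approximation, GPT-style
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def swiglu(x, W, V):                        # gated unit: SiLU(x W) * (x V)
    zw = x @ W
    return zw / (1 + np.exp(-zw)) * (x @ V)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
# For negative inputs ReLU outputs exactly 0 (zero gradient -> "dying ReLU"),
# while Leaky ReLU and GELU keep a small non-zero response.
print(relu(x), leaky_relu(x), gelu(x), sep="\n")
```

The dying-ReLU mitigation story follows directly: Leaky ReLU/GELU keep gradient flowing for negative inputs; lower learning rates and sane initialization help too.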
Interview questions to prep
- Why does poor initialization cause vanishing or exploding gradients?
- Compare Xavier vs He initialization — which goes with which activation and why?
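The pairing is worth verifying numerically: Xavier (Var(W) = 2 / (fan_in + fan_out)) keeps pre-activation variance near 1 for tanh-like activations, while He (Var(W) = 2 / fan_in) compensates for ReLU zeroing half the units. A sketch with toy dimensions of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out, n = 512, 512, 4096

x = rng.standard_normal((n, fan_in))        # unit-variance inputs

# Xavier/Glorot, paired with tanh/sigmoid
W_xavier = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / (fan_in + fan_out))
# He, paired with ReLU
W_he = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

pre_xavier = x @ W_xavier
h_he = np.maximum(0, x @ W_he)
print(np.var(pre_xavier))    # ~1.0: tanh inputs stay in its sensitive range
print(np.mean(h_he ** 2))    # ~1.0: the second moment He's derivation preserves
```

If the per-layer scale drifts from 1, depth compounds it exponentially, which is exactly the vanishing/exploding story from the first question.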
DL · ViT, CLIP, multimodal
Interview questions to prep
- How does ViT tokenize an image, and what's the role of the [CLS] token?
- When does a ViT beat a CNN, and when does data-hungriness hurt it?
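Patch tokenisation is just a reshape; a sketch with ViT-Base/16 shapes (the [CLS] row here is a zero placeholder for the learned embedding, and the patch projection to d_model is omitted):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an image (H, W, C) into flattened non-overlapping patches."""
    H, W, C = img.shape
    gh, gw = H // patch, W // patch
    p = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    p = p.transpose(0, 2, 1, 3, 4)            # group by (row, col) of patch grid
    return p.reshape(gh * gw, patch * patch * C)

img = np.zeros((224, 224, 3))
tokens = patchify(img)
print(tokens.shape)                           # (196, 768): 14x14 patches, 16*16*3 each
cls = np.zeros((1, 768))                      # toy stand-in for the learned [CLS] token
seq = np.concatenate([cls, tokens])           # sequence length 197, as in ViT-Base/16
print(seq.shape)
```

The [CLS] token attends to every patch at every layer, and its final embedding is what the classification head reads.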
Interview questions to prep
- How does CLIP enable zero-shot image classification?
- Walk me through CLIP's contrastive training objective.
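CLIP's objective can be rehearsed in a few lines: L2-normalise both towers, take all-pairs similarities, and apply cross-entropy in both directions with matched pairs on the diagonal. A sketch (the temperature value is illustrative; CLIP learns it):

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (n, n) cosine similarities
    diag = np.arange(logits.shape[0])
    # Matched pairs sit on the diagonal; cross-entropy pulls them together
    # and pushes mismatched (off-diagonal) pairs apart, in both directions.
    loss_i2t = -log_softmax(logits)[diag, diag].mean()
    loss_t2i = -log_softmax(logits.T)[diag, diag].mean()
    return (loss_i2t + loss_t2i) / 2
```

Zero-shot classification falls out of the same geometry: embed one caption per class ("a photo of a dog"), and predict the class whose text embedding is most similar to the image embedding.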
Interview questions to prep
- How do multimodal LLMs like LLaVA fuse vision encoders with language models?
- Compare early fusion vs late fusion in vision-language models — what does each cost in compute and quality?
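A toy sketch of LLaVA-style fusion, with made-up dimensions: vision-tower features are pushed through a learned projection into the LM's embedding space and prepended to the text tokens (the random matrices below stand in for trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d_vis, d_lm = 1024, 4096                         # toy vision-tower and LM widths

img_feats = rng.standard_normal((576, d_vis))    # one feature per image patch
W_proj = rng.standard_normal((d_vis, d_lm)) * 0.01  # learned projector (random here)

img_tokens = img_feats @ W_proj                  # map vision features into LM space
txt_tokens = rng.standard_normal((32, d_lm))     # embedded text prompt (toy)

# Fusion by concatenation: the LM simply attends over image and text tokens
# together, so every image token costs context length and attention compute.
seq = np.concatenate([img_tokens, txt_tokens])
print(seq.shape)                                 # (608, 4096)
```

That cost is the crux of the fusion trade-off: feeding vision tokens through the whole LM is expensive but lets language attend to fine visual detail, whereas fusing late with a single pooled image vector is cheap but lossy.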
DL · Attention & Transformer
Interview questions to prep
- Walk me through scaled dot-product self-attention over Q, K, V end-to-end.
- Why divide the attention scores by √d_k before the softmax?
- What does multi-head attention buy you over a single head?
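The √d_k question has a one-line numerical defence: the dot product of two d_k-dimensional unit-variance vectors has variance d_k, so unscaled scores saturate the softmax and kill gradients. A quick check:

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 64
q = rng.standard_normal((10_000, d_k))
k = rng.standard_normal((10_000, d_k))

scores = (q * k).sum(axis=1)                  # raw dot products
print(np.var(scores))                         # ~d_k = 64: softmax saturates
print(np.var(scores / np.sqrt(d_k)))          # ~1: softmax stays in its soft regime
```

Dividing by √d_k restores unit variance regardless of head size, which is why the factor is d_k-dependent rather than a fixed constant.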
Interview questions to prep
- Walk me through the transformer block: attention → add+norm → FFN → add+norm.
- Compare absolute vs relative vs RoPE positional encodings.
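RoPE is worth being able to write out: rotate each pair of dimensions by a position-dependent angle, and dot products between rotated queries and keys then depend only on the relative offset. A minimal sketch of one common variant (pairing the first and second halves of the vector):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary positional embedding for x of shape (seq, d), d even (sketch)."""
    seq, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)     # one frequency per dim pair
    angles = np.outer(np.arange(seq), freqs)      # (seq, half): angle = pos * freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]             # each (x1_j, x2_j) is a 2D pair
    return np.concatenate([x1 * cos - x2 * sin,   # 2D rotation of every pair
                           x1 * sin + x2 * cos], axis=1)
```

The selling point in one assertion: apply `rope` to identical vectors at positions 0..3, and the (0,1) and (1,2) dot products come out equal, because only the offset matters.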
Interview questions to prep
- What problem does FlashAttention solve, and how?
- Compare MHA, MQA, and GQA — KV-cache trade-offs.
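The KV-cache trade-off is pure arithmetic: cache size scales with the number of KV heads, which MQA collapses to 1 and GQA to a small group count. A sketch with toy model numbers of my choosing:

```python
def kv_cache_bytes(layers, seq_len, n_kv_heads, head_dim, bytes_per=2):
    """Per-sequence KV cache: 2 (K and V) * layers * seq * kv_heads * head_dim."""
    return 2 * layers * seq_len * n_kv_heads * head_dim * bytes_per

# Toy 32-layer model with 32 query heads of dim 128, 8k context, fp16 cache.
for name, kv_heads in [("MHA", 32), ("GQA-8", 8), ("MQA", 1)]:
    gib = kv_cache_bytes(32, 8192, kv_heads, 128) / 2 ** 30
    print(f"{name}: {gib:.2f} GiB")
```

With these numbers MHA needs 4 GiB per sequence while MQA needs 1/32 of that; the quality question is how much head diversity you give up to get there, with GQA as the middle ground.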
Interview questions to prep
- Implement scaled dot-product self-attention and track the shapes of Q, K, V, scores, and output.
- How do masks change self-attention for causal language modeling?
- What changes when you split attention into multiple heads and then concatenate them?
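All three implementation questions in one sketch: shapes are tracked in the comments, the causal mask hides future positions, and heads are split before attention and concatenated after (function name and signature are my own):

```python
import numpy as np

def mha(x, Wq, Wk, Wv, Wo, n_heads, causal=True):
    """Multi-head scaled dot-product self-attention (sketch)."""
    T, d = x.shape
    hd = d // n_heads                                   # per-head dimension

    def split(z):                                       # (T, d) -> (heads, T, hd)
        return z.reshape(T, n_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)     # (heads, T, T)
    if causal:                                          # position i sees only j <= i
        scores = np.where(np.tril(np.ones((T, T), bool)), scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)               # row-softmax weights
    out = (w @ v).transpose(1, 0, 2).reshape(T, d)      # concat heads -> (T, d)
    return out @ Wo                                     # output projection
```

The causal mask is easy to sanity-check: perturbing the last token's input must leave every earlier output row unchanged.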
References & further reading
- DeepLearning.AI Deep Learning Specialization
- CS224n: NLP with Deep Learning (Stanford)
- CS231n: CNNs for Visual Recognition (Stanford)