Day 51 of 133

RNN, LSTM, GRU — and why we left them

Vanishing gradients, LSTM gates, why transformers won.

DSA · NeetCode Graphs

  • Number of Islands (DSA · Graphs; DFS sketch below)

    Interview questions to prep

    1. Compare DFS, BFS, and Union-Find — pick one and defend it.
    2. What if the grid is huge and streamed (rows arrive one at a time)?
  • Clone Graph (DSA · Graphs; BFS sketch below)

    Interview questions to prep

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
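
  • Sketch: Number of Islands via DFS

    A minimal sketch of the in-memory version, assuming the usual
    LeetCode-style grid of "1"/"0" strings; the name sink is illustrative,
    not from any reference solution. It marks visited land by mutating the
    grid in place, so pass a copy if the input must survive.

        def num_islands(grid):
            """Count 4-directionally connected components of '1' cells."""
            if not grid:
                return 0
            rows, cols = len(grid), len(grid[0])

            def sink(r, c):
                # Stop at the border or on water; otherwise flip the cell
                # to water and recurse into its four neighbors.
                if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != "1":
                    return
                grid[r][c] = "0"
                sink(r + 1, c); sink(r - 1, c)
                sink(r, c + 1); sink(r, c - 1)

            count = 0
            for r in range(rows):
                for c in range(cols):
                    if grid[r][c] == "1":
                        count += 1   # first cell of a previously unseen island
                        sink(r, c)   # erase the rest of it
            return count

    Recursion depth is O(rows * cols) in the worst case, so an explicit stack
    is safer on huge grids. For the streamed-rows follow-up, keep a union-find
    over only the previous row's component ids: a new row can merge islands
    solely where it touches them from directly above.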
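
  • Sketch: Clone Graph via BFS

    A minimal BFS sketch, assuming the LeetCode-style Node class with val and
    neighbors (restated here so the snippet is self-contained); clones and
    clone_graph are names chosen for this sketch.

        from collections import deque

        class Node:
            def __init__(self, val=0, neighbors=None):
                self.val = val
                self.neighbors = neighbors if neighbors is not None else []

        def clone_graph(node):
            """Deep-copy the undirected graph reachable from node."""
            if node is None:
                return None
            clones = {node: Node(node.val)}   # original -> copy; also the visited set
            queue = deque([node])
            while queue:
                cur = queue.popleft()
                for nb in cur.neighbors:
                    if nb not in clones:      # first visit: create the copy, enqueue
                        clones[nb] = Node(nb.val)
                        queue.append(nb)
                    clones[cur].neighbors.append(clones[nb])  # wire the copied edge
            return clones[node]

    Time is O(V + E): each node is dequeued once and each edge wired once;
    extra space is O(V) for the map and queue. The clones map makes cycles
    and self-loops terminate naturally, duplicate edges are copied as-is, and
    components unreachable from node are never cloned, which is worth saying
    out loud in an interview.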

DL · RNN, LSTM, GRU

  • RNN forward + truncated BPTT (Deep Learning · DLS C5; sketch below)

    Interview questions to prep

    1. Why do vanilla RNNs struggle with long-term dependencies?
    2. Walk through truncated backprop-through-time — why is it the practical default?
    3. What causes exploding gradients in BPTT, and where do clipping or normalization help?
    4. How would you set up next-word prediction with an RNN, and what is the target at each time step?
  • LSTM gates + cell state (Deep Learning; sketch below)

    Interview questions to prep

    1. Walk through an LSTM cell: forget, input, output gates and cell state.
    2. How does the cell state help with vanishing gradients?
    3. Why is a BiLSTM useful for sequence labeling but not valid for causal next-token generation?
  • GRU vs LSTM (Deep Learning · Chung et al.; sketch below)

    Interview questions to prep

    1. When would you pick GRU over LSTM?
    2. What does GRU's update gate do that LSTM splits across forget + input gates?
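
  • Sketch: truncated BPTT for next-word prediction

    A minimal PyTorch sketch under assumed toy sizes; every name and
    hyperparameter here (vocab, chunk, the SGD learning rate, the clip norm)
    is illustrative, not from DLS C5. The one load-bearing line is the
    detach() at the end: it cuts the autograd graph so each backward pass
    spans only one chunk.

        import torch
        import torch.nn as nn

        vocab, embed_dim, hidden_dim, chunk = 100, 32, 64, 16
        emb = nn.Embedding(vocab, embed_dim)
        rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        head = nn.Linear(hidden_dim, vocab)
        params = [*emb.parameters(), *rnn.parameters(), *head.parameters()]
        opt = torch.optim.SGD(params, lr=0.1)

        tokens = torch.randint(0, vocab, (1, 513))   # one long dummy sequence
        hidden = None
        for start in range(0, tokens.size(1) - chunk - 1, chunk):
            x = tokens[:, start:start + chunk]           # inputs
            y = tokens[:, start + 1:start + chunk + 1]   # target = next word per step
            out, hidden = rnn(emb(x), hidden)
            logits = head(out).reshape(-1, vocab)
            loss = nn.functional.cross_entropy(logits, y.reshape(-1))
            opt.zero_grad()
            loss.backward()
            # Clipping caps the gradient norm, taming the exploding side;
            # it does nothing for the vanishing side.
            torch.nn.utils.clip_grad_norm_(params, 1.0)
            opt.step()
            hidden = hidden.detach()   # the truncation: BPTT stops at chunk edges

    Truncation is the practical default because full BPTT over a long
    sequence costs memory linear in its length, and the gradient signal from
    distant steps has mostly vanished anyway.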
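
  • Sketch: one LSTM step in NumPy

    A minimal sketch of a single cell step; the [f, i, o, g] stacking order
    and all shapes are conventions picked for this sketch, with biases folded
    into b.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def lstm_step(x, h, c, W, U, b):
            """x: (D,), h and c: (H,), W: (4H, D), U: (4H, H), b: (4H,)."""
            H = h.shape[0]
            z = W @ x + U @ h + b
            f = sigmoid(z[0*H:1*H])   # forget gate: how much old cell state survives
            i = sigmoid(z[1*H:2*H])   # input gate: how much candidate gets written
            o = sigmoid(z[2*H:3*H])   # output gate: how much cell state is exposed
            g = np.tanh(z[3*H:4*H])   # candidate cell contents
            c_new = f * c + i * g     # additive update on the cell "highway"
            h_new = o * np.tanh(c_new)
            return h_new, c_new

    The additive c_new = f * c + i * g line is the vanishing-gradient story:
    backprop along the cell state multiplies by diag(f) per step instead of
    by a repeated tanh-squashed weight matrix, so when f sits near 1 the
    gradient survives long spans.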
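
  • Sketch: one GRU step, gate roles spelled out

    Same conventions as the LSTM sketch, with per-gate weight matrices
    written out and biases omitted for brevity; all names are illustrative.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
            z = sigmoid(Wz @ x + Uz @ h)              # update gate
            r = sigmoid(Wr @ x + Ur @ h)              # reset gate
            h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate with reset history
            # One gate, two jobs: (1 - z) plays LSTM's forget role and z plays
            # its input role, so "keep" and "write" are tied to sum to 1.
            return (1.0 - z) * h + z * h_tilde

    That coupling, plus the absence of a separate cell state and output gate,
    is why GRU has three gate blocks to LSTM's four. Chung et al. found the
    two comparable on their benchmarks, which makes GRU a reasonable default
    when parameters or compute are tight.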

References & further reading