Day 51 of 133
RNN, LSTM, GRU — and why we left them
Vanishing gradients, LSTM gates, why transformers won.
DSA · NeetCode Graphs
- Number of Islands
Interview questions to prep
- Compare DFS, BFS, and Union-Find; pick one and defend it (a DFS flood-fill sketch follows this list).
- What if the grid is huge and streamed (rows arrive one at a time)?
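A minimal DFS flood-fill sketch for the first question (my own variable names, not an official NeetCode solution): every unvisited land cell starts a new island, and the DFS sinks the whole component in place so it is counted exactly once. Runtime is O(rows · cols); recursion depth can also reach O(rows · cols) on a snake-shaped island, which is one honest argument for BFS or Union-Find on huge inputs.

```python
from typing import List

def num_islands(grid: List[List[str]]) -> int:
    """Count 4-directionally connected components of '1' cells."""
    if not grid or not grid[0]:
        return 0
    rows, cols = len(grid), len(grid[0])

    def sink(r: int, c: int) -> None:
        # Stop at the border or at water; otherwise sink this land cell.
        if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != "1":
            return
        grid[r][c] = "0"  # mark visited in place
        sink(r + 1, c); sink(r - 1, c); sink(r, c + 1); sink(r, c - 1)

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "1":
                count += 1  # every DFS start is a brand-new island
                sink(r, c)
    return count

print(num_islands([list("110"), list("010"), list("001")]))  # -> 2
```

For the streamed-rows variant, whole-grid DFS is off the table; a Union-Find keyed only on the previous and current row merges components as each row arrives and keeps memory at O(cols).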
- Clone Graph
Interview questions to prep
- Is this BFS, DFS, or Union-Find? Defend the choice over the other two (a BFS sketch follows this list).
- Walk through complexity in terms of V and E. Where do those costs come from?
- How would you handle disconnected components, self-loops, or duplicate edges?
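A BFS sketch, assuming LeetCode's adjacency-list style Node class. The original-to-copy dictionary doubles as the visited set, which is exactly what makes cycles and self-loops terminate; duplicate edges are simply mirrored as duplicates.

```python
from collections import deque
from typing import Optional

class Node:
    def __init__(self, val: int = 0, neighbors: Optional[list] = None):
        self.val = val
        self.neighbors = neighbors if neighbors is not None else []

def clone_graph(node: Optional[Node]) -> Optional[Node]:
    """BFS clone: map every original node to its copy, then wire edges."""
    if node is None:
        return None
    copies = {node: Node(node.val)}  # original -> clone; also the visited set
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in cur.neighbors:
            if nb not in copies:  # first visit: create the clone, enqueue
                copies[nb] = Node(nb.val)
                queue.append(nb)
            copies[cur].neighbors.append(copies[nb])  # mirror the edge
    return copies[node]
```

Each node is enqueued once and each edge is wired once, so the cost is O(V + E) time and O(V) extra space; a disconnected component is unreachable from the start node and simply never gets cloned.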
DL · RNN, LSTM, GRU
Interview questions to prep · RNN
- Why do vanilla RNNs struggle with long-term dependencies?
- Walk through truncated backprop-through-time — why is it the practical default?
- What causes exploding gradients in BPTT, and where does gradient clipping or normalization help?
- How would you set up next-word prediction with an RNN, and what is the target at each time step? (See the training-loop sketch below.)
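A minimal PyTorch training-loop sketch for next-word prediction with truncated BPTT; all sizes, hyperparameters, and the random stand-in corpus are placeholders. The two load-bearing lines are the shifted-by-one targets (the target at step t is token t+1) and the h.detach() that cuts the graph between chunks, which is what makes truncation the practical default.

```python
import torch
import torch.nn as nn

# Toy next-word LM: at step t the input is token t, the target is token t+1.
vocab, emb, hid, bptt = 100, 32, 64, 16  # arbitrary placeholder sizes

embed = nn.Embedding(vocab, emb)
rnn = nn.RNN(emb, hid, batch_first=True)
head = nn.Linear(hid, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab, (1, 1 + 4 * bptt))  # stand-in corpus
h = None
for start in range(0, tokens.size(1) - 1 - bptt, bptt):
    x = tokens[:, start : start + bptt]          # inputs
    y = tokens[:, start + 1 : start + bptt + 1]  # shifted-by-one targets
    if h is not None:
        h = h.detach()  # truncation: no gradient flows past this chunk
    out, h = rnn(embed(x), h)
    loss = loss_fn(head(out).reshape(-1, vocab), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(params, max_norm=1.0)  # tame exploding gradients
    opt.step()
```

Note what clipping does and does not do: it caps the gradient norm, so it handles the exploding case, but it is no help for vanishing gradients; that is where the LSTM cell state below comes in.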
Interview questions to prep · LSTM
- Walk through an LSTM cell: forget, input, output gates and cell state (a NumPy sketch follows this list).
- How does the cell state help with vanishing gradients?
- Why is a BiLSTM useful for sequence labeling but not valid for causal next-token generation?
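A NumPy sketch of one LSTM step, using the common f/i/o/g gate ordering (conventions vary by library). The comment on the cell-state line is the answer to the vanishing-gradient question: the update is additive, so the Jacobian of c with respect to c_prev is just diag(f), which stays near 1 while the forget gate is open instead of shrinking multiplicatively like a vanilla RNN's recurrence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4h, d), U: (4h, h), b: (4h,), gate order f, i, o, g."""
    f, i, o, g = np.split(W @ x + U @ h_prev + b, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate cell update
    # Additive path: the Jacobian of c w.r.t. c_prev is diag(f), so gradients
    # survive across many steps as long as the forget gate stays near 1.
    c = f * c_prev + i * g
    h = o * np.tanh(c)  # hidden state is a gated readout of the cell
    return h, c

d, hdim = 3, 5
rng = np.random.default_rng(0)
W, U = rng.normal(size=(4 * hdim, d)), rng.normal(size=(4 * hdim, hdim))
h, c = np.zeros(hdim), np.zeros(hdim)
h, c = lstm_step(rng.normal(size=d), h, c, W, U, np.zeros(4 * hdim))
```

A BiLSTM runs a second copy of this cell right-to-left and concatenates the two hidden states, which is why it helps sequence labeling (every position sees both contexts) yet is invalid for causal generation: the backward pass reads the future.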
Interview questions to prep · GRU
- When would you pick GRU over LSTM?
- What does GRU's update gate do that LSTM splits across forget + input gates? (See the sketch below.)
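The same exercise for one GRU step, following the original Cho et al. convention h = (1 - z) * h_prev + z * n (some libraries, PyTorch included, swap the roles of z). The convex combination is the whole point: a single gate does what the LSTM's forget and input gates do separately, at the cost of coupling them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU step. W: (3h, d), U: (3h, h), b: (3h,), gate order z, r, n."""
    (Wz, Wr, Wn), (Uz, Ur, Un), (bz, br, bn) = (
        np.split(W, 3), np.split(U, 3), np.split(b, 3))
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)        # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)        # reset gate
    n = np.tanh(Wn @ x + Un @ (r * h_prev) + bn)  # candidate state
    # One gate, two jobs: (1 - z) forgets the old state by exactly as much
    # as z lets the candidate in; an LSTM decouples these as f and i.
    return (1.0 - z) * h_prev + z * n
```

Three gate blocks instead of four also means roughly 3/4 of the LSTM's parameters per unit and slightly cheaper steps, which is the usual reason to pick a GRU when data or compute is tight.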
References & further reading
- Deep Learning Specialization (DeepLearning.AI)
- CS224n: NLP with Deep Learning (Stanford)
- Dive into Deep Learning (d2l.ai)