Day 51 of 133

RNN, LSTM, GRU — and why we left them

Vanishing gradients, LSTM gates, why transformers won.

DSA · NeetCode Graphs

  • Number of Islands (DSA · Graphs; DFS sketch below)

    Interview questions to prep

    1. Compare DFS, BFS, and Union-Find — pick one and defend it.
    2. What if the grid is huge and streamed (rows arrive one at a time)?
  • Clone Graph (DSA · Graphs; BFS sketch below)

    Interview questions to prep

    1. Is this BFS, DFS, or Union-Find? Defend the choice over the other two.
    2. Walk through complexity in terms of V and E. Where do those costs come from?
    3. How would you handle disconnected components, self-loops, or duplicate edges?
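
  • Sketch: Number of Islands via DFS

    A minimal sketch of the in-memory version, assuming the usual
    LeetCode-style grid of "1"/"0" strings; the name sink is illustrative,
    not from any reference solution. It marks visited land by mutating the
    grid in place, so pass a copy if the input must survive.

        def num_islands(grid):
            """Count 4-directionally connected components of '1' cells."""
            if not grid:
                return 0
            rows, cols = len(grid), len(grid[0])

            def sink(r, c):
                # Stop at the border or on water; otherwise flip the cell
                # to water and recurse into its four neighbors.
                if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != "1":
                    return
                grid[r][c] = "0"
                sink(r + 1, c); sink(r - 1, c)
                sink(r, c + 1); sink(r, c - 1)

            count = 0
            for r in range(rows):
                for c in range(cols):
                    if grid[r][c] == "1":
                        count += 1   # first cell of a previously unseen island
                        sink(r, c)   # erase the rest of it
            return count

    Recursion depth is O(rows * cols) in the worst case, so an explicit stack
    is safer on huge grids. For the streamed-rows follow-up, keep a union-find
    over only the previous row's component ids: a new row can merge islands
    solely where it touches them from directly above.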
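
  • Sketch: Clone Graph via BFS

    A minimal BFS sketch, assuming the LeetCode-style Node class with val and
    neighbors (restated here so the snippet is self-contained); clones and
    clone_graph are names chosen for this sketch.

        from collections import deque

        class Node:
            def __init__(self, val=0, neighbors=None):
                self.val = val
                self.neighbors = neighbors if neighbors is not None else []

        def clone_graph(node):
            """Deep-copy the undirected graph reachable from node."""
            if node is None:
                return None
            clones = {node: Node(node.val)}   # original -> copy; also the visited set
            queue = deque([node])
            while queue:
                cur = queue.popleft()
                for nb in cur.neighbors:
                    if nb not in clones:      # first visit: create the copy, enqueue
                        clones[nb] = Node(nb.val)
                        queue.append(nb)
                    clones[cur].neighbors.append(clones[nb])  # wire the copied edge
            return clones[node]

    Time is O(V + E): each node is dequeued once and each edge wired once;
    extra space is O(V) for the map and queue. The clones map makes cycles
    and self-loops terminate naturally, duplicate edges are copied as-is, and
    components unreachable from node are never cloned, which is worth saying
    out loud in an interview.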

DL · RNN, LSTM, GRU

  • RNN forward + truncated BPTT (Deep Learning · DLS C5; sketch below)

    Interview questions to prep

    1. Why do vanilla RNNs struggle with long-term dependencies?
    2. Walk through truncated backprop-through-time — why is it the practical default?
    3. What causes exploding gradients in BPTT, and where do clipping or normalization help?
    4. How would you set up next-word prediction with an RNN, and what is the target at each time step?
  • LSTM gates + cell state (Deep Learning; sketch below)

    Interview questions to prep

    1. Walk through an LSTM cell: forget, input, output gates and cell state.
    2. How does the cell state help with vanishing gradients?
    3. Why is a BiLSTM useful for sequence labeling but not valid for causal next-token generation?
  • GRU vs LSTM (Deep Learning · Chung et al.; sketch below)

    Interview questions to prep

    1. When would you pick GRU over LSTM?
    2. What does GRU's update gate do that LSTM splits across forget + input gates?
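
  • Sketch: truncated BPTT for next-word prediction

    A minimal PyTorch sketch under assumed toy sizes; every name and
    hyperparameter here (vocab, chunk, the SGD learning rate, the clip norm)
    is illustrative, not from DLS C5. The one load-bearing line is the
    detach() at the end: it cuts the autograd graph so each backward pass
    spans only one chunk.

        import torch
        import torch.nn as nn

        vocab, embed_dim, hidden_dim, chunk = 100, 32, 64, 16
        emb = nn.Embedding(vocab, embed_dim)
        rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        head = nn.Linear(hidden_dim, vocab)
        params = [*emb.parameters(), *rnn.parameters(), *head.parameters()]
        opt = torch.optim.SGD(params, lr=0.1)

        tokens = torch.randint(0, vocab, (1, 513))   # one long dummy sequence
        hidden = None
        for start in range(0, tokens.size(1) - chunk - 1, chunk):
            x = tokens[:, start:start + chunk]           # inputs
            y = tokens[:, start + 1:start + chunk + 1]   # target = next word per step
            out, hidden = rnn(emb(x), hidden)
            logits = head(out).reshape(-1, vocab)
            loss = nn.functional.cross_entropy(logits, y.reshape(-1))
            opt.zero_grad()
            loss.backward()
            # Clipping caps the gradient norm, taming the exploding side;
            # it does nothing for the vanishing side.
            torch.nn.utils.clip_grad_norm_(params, 1.0)
            opt.step()
            hidden = hidden.detach()   # the truncation: BPTT stops at chunk edges

    Truncation is the practical default because full BPTT over a long
    sequence costs memory linear in its length, and the gradient signal from
    distant steps has mostly vanished anyway.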
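
  • Sketch: one LSTM step in NumPy

    A minimal sketch of a single cell step; the [f, i, o, g] stacking order
    and all shapes are conventions picked for this sketch, with biases folded
    into b.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def lstm_step(x, h, c, W, U, b):
            """x: (D,), h and c: (H,), W: (4H, D), U: (4H, H), b: (4H,)."""
            H = h.shape[0]
            z = W @ x + U @ h + b
            f = sigmoid(z[0*H:1*H])   # forget gate: how much old cell state survives
            i = sigmoid(z[1*H:2*H])   # input gate: how much candidate gets written
            o = sigmoid(z[2*H:3*H])   # output gate: how much cell state is exposed
            g = np.tanh(z[3*H:4*H])   # candidate cell contents
            c_new = f * c + i * g     # additive update on the cell "highway"
            h_new = o * np.tanh(c_new)
            return h_new, c_new

    The additive c_new = f * c + i * g line is the vanishing-gradient story:
    backprop along the cell state multiplies by diag(f) per step instead of
    by a repeated tanh-squashed weight matrix, so when f sits near 1 the
    gradient survives long spans.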
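
  • Sketch: one GRU step, gate roles spelled out

    Same conventions as the LSTM sketch, with per-gate weight matrices
    written out and biases omitted for brevity; all names are illustrative.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
            z = sigmoid(Wz @ x + Uz @ h)              # update gate
            r = sigmoid(Wr @ x + Ur @ h)              # reset gate
            h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate with reset history
            # One gate, two jobs: (1 - z) plays LSTM's forget role and z plays
            # its input role, so "keep" and "write" are tied to sum to 1.
            return (1.0 - z) * h + z * h_tilde

    That coupling, plus the absence of a separate cell state and output gate,
    is why GRU has three gate blocks to LSTM's four. Chung et al. found the
    two comparable on their benchmarks, which makes GRU a reasonable default
    when parameters or compute are tight.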

References & further reading