Reason about architectures

Deep Learning

Move from backprop and optimization into CNNs, sequence modeling, and transformer intuition.

Featured topics

3 topic cards built for interview prep

Each topic includes a summary, practical learning goals, representative interview prompts, and a suggested roadmap day.

Practice prompts

Daily-plan topics tied directly to this pillar

These are pulled from the same 133-day roadmap content used by Browse Questions.

Day 36 · DL · Neural network foundations

Perceptron, MLP, forward pass

  • Walk me through the forward pass of a 2-layer MLP for binary classification.
  • Why can't a single perceptron solve XOR — and how does adding a hidden layer fix it?
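A minimal PyTorch sketch for the first prompt above. The dimensions (16 inputs, 32 hidden units) and batch size are illustrative, not part of the roadmap:

```python
import torch
import torch.nn as nn

class TwoLayerMLP(nn.Module):
    """2-layer MLP for binary classification (illustrative dims)."""
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)  # affine: x @ W1.T + b1
        self.fc2 = nn.Linear(hidden, 1)       # hidden -> single logit

    def forward(self, x):
        h = torch.relu(self.fc1(x))           # nonlinearity between layers
        return self.fc2(h)                    # raw logit

x = torch.randn(4, 16)                        # batch of 4 examples
logits = TwoLayerMLP()(x)
probs = torch.sigmoid(logits)                 # P(y = 1 | x); train with
print(probs.shape)                            # BCEWithLogitsLoss on logits
```

The hidden nonlinearity is also the answer to the XOR prompt: drop it and the two linear layers collapse into one linear map, and XOR is not linearly separable.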
Day 36 · DL · Neural network foundations

Activations: sigmoid, tanh, ReLU, GELU, SwiGLU

  • Compare ReLU, Leaky ReLU, GELU, and SwiGLU — when does each shine?
  • Why did ReLU largely replace sigmoid/tanh in deep networks?
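A quick comparison you can run. ReLU, Leaky ReLU, and GELU are pointwise and ship with PyTorch; SwiGLU is a gated feed-forward unit rather than a pointwise activation, so the sketch below defines it (dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)

print(F.relu(x))              # zeroes negatives: cheap, but units can "die"
print(F.leaky_relu(x, 0.01))  # small negative slope keeps gradient alive
print(F.gelu(x))              # smooth x * Phi(x); common in transformers

# Minimal SwiGLU sketch: SwiGLU(x) = SiLU(x @ W) * (x @ V)
class SwiGLU(nn.Module):
    def __init__(self, d_in=8, d_hidden=16):
        super().__init__()
        self.w = nn.Linear(d_in, d_hidden, bias=False)  # gate branch
        self.v = nn.Linear(d_in, d_hidden, bias=False)  # value branch

    def forward(self, x):
        return F.silu(self.w(x)) * self.v(x)

print(SwiGLU()(torch.randn(2, 8)).shape)  # torch.Size([2, 16])
```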
Day 36 · DL · Neural network foundations

Weight initialization (Xavier, He)

  • Why does poor initialization cause vanishing or exploding gradients?
  • Compare Xavier vs He initialization — which goes with which activation and why?
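Both schemes are one-liners in PyTorch; the layer size below is illustrative. The point to articulate in an interview is which variance each scheme preserves:

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 512)

# He (Kaiming): variance 2/fan_in, matched to ReLU, which zeroes
# roughly half the pre-activations.
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
print(layer.weight.std())  # ~ sqrt(2/512) ≈ 0.0625

# Xavier (Glorot): variance 2/(fan_in + fan_out), matched to
# activations that are roughly linear around zero, like tanh.
nn.init.xavier_normal_(layer.weight)
print(layer.weight.std())  # ~ sqrt(2/1024) ≈ 0.0442
```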
Day 37 · DL · Backpropagation & autograd

Backprop on a computation graph

  • Derive backprop for a 2-layer MLP with cross-entropy loss.
  • Explain why ML frameworks use reverse-mode rather than forward-mode automatic differentiation.
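A minimal sketch of the first prompt: derive the gradients by hand, node by node on the computation graph, then check them against autograd (dims and seed are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 4)                      # batch of 5, 4 features
y = torch.randint(0, 3, (5,))              # 3 classes
W1 = torch.randn(4, 8, requires_grad=True)
W2 = torch.randn(8, 3, requires_grad=True)

# Forward pass.
z1 = x @ W1                                # (5, 8)
h = torch.relu(z1)                         # (5, 8)
z2 = h @ W2                                # (5, 3) logits
loss = F.cross_entropy(z2, y)              # softmax + NLL, mean over batch
loss.backward()

# Manual backward using dL/dz2 = (softmax(z2) - onehot(y)) / N.
with torch.no_grad():
    p = torch.softmax(z2, dim=1)
    p[range(5), y] -= 1.0
    dz2 = p / 5
    dW2 = h.T @ dz2                        # grad through second linear
    dh = dz2 @ W2.T                        # backprop into h
    dz1 = dh * (z1 > 0)                    # ReLU gate
    dW1 = x.T @ dz1

print(torch.allclose(dW2, W2.grad, atol=1e-5))  # True
print(torch.allclose(dW1, W1.grad, atol=1e-5))  # True
```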
Day 37 · DL · Backpropagation & autograd

Vanishing & exploding gradients

  • Why does a deep sigmoid network suffer vanishing gradients?
  • How do residual connections, ReLU, and batch/layer normalization help?
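A small experiment makes the first prompt concrete: stack the same linear layers with sigmoid versus ReLU and compare how much gradient reaches layer 0 (depth and width below are arbitrary):

```python
import torch
import torch.nn as nn

def grad_norm_at_input(act, depth=30, width=64):
    """How much gradient survives the trip back to layer 0."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), act()]
    net = nn.Sequential(*layers)
    net(torch.randn(8, width)).sum().backward()
    return net[0].weight.grad.norm().item()

torch.manual_seed(0)
# Sigmoid saturates and |sigma'| <= 0.25, so 30 layers can shrink the
# signal by up to 0.25**30. ReLU passes gradient 1 on its active half.
print(grad_norm_at_input(nn.Sigmoid))  # tiny — vanished
print(grad_norm_at_input(nn.ReLU))     # orders of magnitude larger
```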
Day 37 · DL · Backpropagation & autograd

PyTorch autograd in 30 minutes

  • When would you use torch.no_grad() and detach()?
  • What does requires_grad=True actually do under the hood?
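The three mechanisms from these prompts in a few lines (values illustrative):

```python
import torch

w = torch.randn(3, requires_grad=True)   # tracked: ops on w build a graph
x = torch.randn(3)

loss = (w * x).sum()
loss.backward()                          # walks the graph, fills w.grad
print(w.grad)

# torch.no_grad(): skip graph construction entirely — e.g. evaluation,
# or the optimizer's in-place parameter update.
with torch.no_grad():
    w -= 0.1 * w.grad                    # would raise outside no_grad

# detach(): take a tensor out of the graph; gradients stop flowing
# through it (targets, logging, truncated BPTT).
y = (w * x).sum()
frozen = y.detach()
print(frozen.requires_grad)              # False
```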
Day 38 · DL · Optimizers in practice

SGD, momentum, Nesterov

  • Why does momentum help SGD make progress along narrow ravines instead of oscillating across them?
  • How is Nesterov momentum different from plain momentum, and when does the difference matter?
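A toy ravine (steep in one direction, shallow in the other) shows both flavors side by side; the loss and hyperparameters below are made up for illustration:

```python
import torch

def run(nesterov):
    p = torch.tensor([1.0, 1.0], requires_grad=True)
    opt = torch.optim.SGD([p], lr=0.05, momentum=0.9, nesterov=nesterov)
    for _ in range(100):
        opt.zero_grad()
        loss = 10 * p[0] ** 2 + 0.1 * p[1] ** 2  # curvatures 20 vs 0.2
        loss.backward()
        opt.step()               # velocity accumulates along the shallow axis
    return p.detach()

print(run(nesterov=False))       # heavy-ball momentum
print(run(nesterov=True))        # gradient taken at the look-ahead point
```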
Day 38 · DL · Optimizers in practice

Adam, AdamW, RMSprop

  • Why is AdamW preferred over Adam when using weight decay?
  • When would you ever pick SGD over Adam in deep learning?
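The Adam-vs-AdamW distinction is easiest to see with the data gradient zeroed out so only the decay path acts; all numbers below are illustrative:

```python
import torch

torch.manual_seed(0)
w_adam = torch.nn.Parameter(torch.ones(3))
w_adamw = torch.nn.Parameter(torch.ones(3))

# Adam folds weight_decay into the gradient (classic L2), so the decay
# term gets rescaled by 1/sqrt(v) like any other gradient. AdamW applies
# w <- w - lr * wd * w separately, untouched by the Adam statistics.
opt_adam = torch.optim.Adam([w_adam], lr=0.05, weight_decay=0.1)
opt_adamw = torch.optim.AdamW([w_adamw], lr=0.05, weight_decay=0.1)

for _ in range(10):
    for w, opt in ((w_adam, opt_adam), (w_adamw, opt_adamw)):
        opt.zero_grad()
        (0.0 * w.sum()).backward()   # zero data gradient: isolate decay
        opt.step()

print(w_adam)   # shrunk by roughly lr per step — sign-like, magnitude-blind
print(w_adamw)  # clean multiplicative shrinkage: ~ (1 - lr*wd)**10 of start
```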