Day 38 of 133

DL optimizers in practice + DSA Trees

Adam vs AdamW vs SGD+momentum. Warmup, cosine, OneCycle.

DSA · NeetCode Trees

  • Lowest common ancestor in a BST (LC 235)

    Interview questions to prep (code sketch after this list)

    1. How does the BST property let you avoid traversing the whole tree?
    2. Generalize to a non-BST binary tree (LC 236) — how does the algorithm change?
  • Binary tree level-order traversal (LC 102)

    Interview questions to prep (code sketch after this list)

    1. Walk through BFS with a queue. How do you cleanly separate one level from the next?
    2. Can you do this recursively with DFS while still grouping by level?
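
A minimal Python sketch for the LCA questions above, assuming the standard LeetCode-style TreeNode (val, left, right); the function names are illustrative, not from the notes. The first version leans on the BST ordering (LC 235), the second is the general binary-tree answer (LC 236).

    class TreeNode:
        def __init__(self, val=0, left=None, right=None):
            self.val, self.left, self.right = val, left, right

    def lca_bst(root, p, q):
        # BST property: comparing p and q against the current node tells us which
        # single subtree can contain both, so we walk one root-to-LCA path in O(h)
        # instead of traversing the whole tree.
        node = root
        while node:
            if p.val < node.val and q.val < node.val:
                node = node.left
            elif p.val > node.val and q.val > node.val:
                node = node.right
            else:
                return node  # paths to p and q diverge here: this is the LCA
        return None

    def lca_binary_tree(root, p, q):
        # Without the ordering guarantee (LC 236) both subtrees must be searched.
        if root is None or root is p or root is q:
            return root
        left = lca_binary_tree(root.left, p, q)
        right = lca_binary_tree(root.right, p, q)
        if left and right:
            return root      # p and q sit in different subtrees, so root is the split
        return left or right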
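
And a sketch for the level-order questions, again with illustrative names: the BFS version separates levels by draining exactly len(queue) nodes per round, while the recursive DFS version groups values by depth index.

    from collections import deque

    def level_order(root):
        # BFS with a queue; snapshotting len(queue) at the top of each round keeps
        # one level's nodes cleanly separated from the next.
        if root is None:
            return []
        levels, queue = [], deque([root])
        while queue:
            level = []
            for _ in range(len(queue)):
                node = queue.popleft()
                level.append(node.val)
                if node.left:
                    queue.append(node.left)
                if node.right:
                    queue.append(node.right)
            levels.append(level)
        return levels

    def level_order_dfs(root, depth=0, levels=None):
        # Recursive DFS that still groups by level: index the output list by depth.
        if levels is None:
            levels = []
        if root is None:
            return levels
        if depth == len(levels):
            levels.append([])
        levels[depth].append(root.val)
        level_order_dfs(root.left, depth + 1, levels)
        level_order_dfs(root.right, depth + 1, levels)
        return levels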

DL · Optimizers in practice

  • SGD, momentum, Nesterov · Deep Learning · Ruder (update-rule sketch after this list)

    Interview questions to prep

    1. Why does momentum help SGD escape narrow ravines?
    2. How is Nesterov momentum different from plain momentum, and when does the difference matter?
  • Adam, AdamW, RMSprop · Deep Learning · fast.ai (PyTorch sketch after this list)

    Interview questions to prep

    1. Why is AdamW preferred over Adam when using weight decay?
    2. When would you ever pick SGD over Adam in deep learning?
  • Learning rate schedules: warmup, cosine, OneCycle

    Interview questions to prep (schedule sketch after this list)

    1. Why is learning rate warmup important for transformer training?
    2. Compare cosine vs step decay — when does each work better?
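
A minimal sketch of the two update rules behind the momentum questions, in the classical formulation; grad stands in for whatever returns dL/dw at a point, and the lr/mu values are placeholders.

    def momentum_step(w, v, grad, lr=0.1, mu=0.9):
        # The velocity v accumulates past gradients: oscillating components across a
        # narrow ravine cancel out while progress along its floor compounds.
        v = mu * v - lr * grad(w)
        return w + v, v

    def nesterov_step(w, v, grad, lr=0.1, mu=0.9):
        # Nesterov evaluates the gradient at the look-ahead point w + mu*v, so the
        # step is corrected before it is taken; the difference grows with mu.
        v = mu * v - lr * grad(w + mu * v)
        return w + v, v

In PyTorch the same switch is torch.optim.SGD(params, lr=..., momentum=0.9, nesterov=True).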
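
For the Adam vs AdamW question, the key difference is where weight decay enters; a short PyTorch sketch (the linear model and hyperparameters are placeholders):

    import torch

    model = torch.nn.Linear(128, 10)  # placeholder model

    # Adam folds weight_decay into the gradient (L2 regularization), so the decay is
    # rescaled by the adaptive second-moment denominator; AdamW decouples it and
    # shrinks the weights directly, which is why it pairs better with weight decay.
    adam = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-2)
    adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)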
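
And a schedule sketch for the warmup/cosine bullet: linear warmup into cosine decay, a common transformer recipe. The step counts and learning rates are made-up placeholders.

    import math

    def lr_at(step, base_lr=3e-4, warmup_steps=1_000, total_steps=100_000, min_lr=0.0):
        # Linear warmup: ramp from ~0 to base_lr over the first warmup_steps updates,
        # so early adaptive-optimizer statistics are not dominated by a few noisy
        # gradients.
        if step < warmup_steps:
            return base_lr * (step + 1) / warmup_steps
        # Cosine decay from base_lr down to min_lr over the remaining steps.
        progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

OneCycle (torch.optim.lr_scheduler.OneCycleLR) instead ramps the learning rate up and back down within a single run, cycling momentum inversely by default.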
