Day 38 of 133
DL optimizers in practice + DSA Trees
Adam vs AdamW vs SGD+momentum. Warmup, cosine, OneCycle.
DSA · NeetCode Trees
Interview questions to prep
- How does the BST property let you avoid traversing the whole tree?
- Generalize to a non-BST binary tree (LC 236) — how does the algorithm change?
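A minimal sketch for the two questions above, assuming they refer to the lowest-common-ancestor problem (LC 236 is its general binary-tree version). The `Node` class and both function names are hypothetical, and values are assumed unique and present in the tree. In a BST, the ordering tells you which single subtree to descend into, so the walk costs O(height). Without the ordering, you have to search both subtrees and combine the results on the way back up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node used only for this sketch.
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def lca_bst(root, p, q):
    """LCA in a BST: follow one path down, never touching the other subtree."""
    lo, hi = min(p, q), max(p, q)
    node = root
    while node:
        if hi < node.val:        # both targets are smaller: go left
            node = node.left
        elif lo > node.val:      # both targets are larger: go right
            node = node.right
        else:                    # targets split here (or one equals node)
            return node
    return None


def lca_binary_tree(root, p, q):
    """LCA without the BST ordering: post-order search of both subtrees."""
    if root is None or root.val == p or root.val == q:
        return root
    left = lca_binary_tree(root.left, p, q)
    right = lca_binary_tree(root.right, p, q)
    if left and right:           # one target found on each side
        return root
    return left or right
```

The BST version is iterative and uses O(1) extra space. The general version may visit every node and uses O(height) stack space.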
Binary Tree Level Order Traversal
DSA · Trees
Interview questions to prep
- Walk through BFS with queue. How do you cleanly separate one level from the next?
- Can you do this DFS-recursively while still grouping by level?
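One way to answer both questions, as a sketch with a hypothetical `Node` class and function names. The BFS version separates levels by reading the queue length once per level and popping exactly that many nodes. The DFS version passes the depth down the recursion and uses it as an index into the result.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node used only for this sketch.
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_order_bfs(root):
    if root is None:
        return []
    levels = []
    queue = deque([root])
    while queue:
        level = []
        # len(queue) is evaluated once, so children appended inside the
        # loop are left for the next level.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels


def level_order_dfs(root):
    levels = []

    def visit(node, depth):
        if node is None:
            return
        if depth == len(levels):   # first node seen at this depth
            levels.append([])
        levels[depth].append(node.val)
        visit(node.left, depth + 1)    # left before right keeps the
        visit(node.right, depth + 1)   # left-to-right order per level

    visit(root, 0)
    return levels
```

Both run in O(n) time. BFS holds at most one level in the queue, while DFS holds at most one root-to-leaf path on the call stack.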
DL · Optimizers in practice
Interview questions to prep
- Why does momentum help SGD escape narrow ravines?
- How is Nesterov momentum different from plain momentum, and when does the difference matter?
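To make the momentum questions concrete, here is a small sketch on a toy "ravine": the quadratic f(x, y) = 0.5 * (x² + 100·y²), which is steep in y and shallow in x. The function, the hyperparameters, and the `run` helper are illustrative assumptions, not a benchmark. Plain SGD has to keep the step size small for the steep direction, so it crawls along the shallow one. Momentum accumulates velocity along the shallow direction. Nesterov differs only in where the gradient is evaluated: at the look-ahead point rather than the current one.

```python
def grad(p):
    # Gradient of f(x, y) = 0.5 * (x**2 + 100 * y**2).
    x, y = p
    return (x, 100.0 * y)


def loss(p):
    x, y = p
    return 0.5 * (x * x + 100.0 * y * y)


def run(method, steps=200, lr=0.01, beta=0.9, start=(10.0, 1.0)):
    """Return the final loss after `steps` updates with the given method."""
    p = list(start)
    v = [0.0, 0.0]  # velocity buffer (unused history for plain SGD)
    for _ in range(steps):
        if method == "sgd":
            v = list(grad(p))
        elif method == "momentum":
            g = grad(p)  # gradient at the current point
            v = [beta * vi + gi for vi, gi in zip(v, g)]
        elif method == "nesterov":
            # Gradient at the look-ahead point the velocity is carrying us to.
            look = [pi - lr * beta * vi for pi, vi in zip(p, v)]
            g = grad(look)
            v = [beta * vi + gi for vi, gi in zip(v, g)]
        else:
            raise ValueError(f"unknown method: {method}")
        p = [pi - lr * vi for pi, vi in zip(p, v)]
    return loss(p)
```

Calling `run("sgd")`, `run("momentum")`, and `run("nesterov")` with the same settings shows both momentum variants reaching a much lower loss than plain SGD on this problem. The look-ahead correction tends to matter most when the velocity is large relative to the gradient, for example when overshooting near a minimum.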
Interview questions to prep
- Why is AdamW preferred over Adam when using weight decay?
- When would you ever pick SGD over Adam in deep learning?
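A scalar sketch of the Adam vs AdamW difference; the function `adam_step` and its `decoupled` flag are hypothetical names for illustration, not a library API. With L2 regularization folded into the gradient, the decay term passes through Adam's adaptive normalization, so its effective size no longer tracks `wd`. AdamW applies the decay directly to the weight, outside the normalized update.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              wd=0.0, decoupled=False):
    """One Adam update on a scalar weight; t is the 1-based step count."""
    if wd and not decoupled:
        g = g + wd * w                 # Adam + L2: decay enters the gradient
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    new_w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if wd and decoupled:
        new_w -= lr * wd * w           # AdamW: decay applied to the weight itself
    return new_w, m, v


# Zero loss gradient, so only the weight decay acts on w = 1.0.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, wd=0.1, decoupled=False)
w_adamw, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, wd=0.1, decoupled=True)
```

In this first step, the L2 variant moves the weight by roughly `lr` regardless of how small `wd` is, because the normalization divides the decay term by its own magnitude. The decoupled variant shrinks the weight by `lr * wd * w`, which is what the weight-decay hyperparameter is meant to control.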
Interview questions to prep
- Why is learning rate warmup important for transformer training?
- Compare cosine vs step decay — when does each work better?
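For reference while answering these, a sketch of the two schedule shapes as plain functions of the step index. The function names, the linear warmup form, and all default values are assumptions for illustration; frameworks ship their own schedulers with different parameterizations.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly; the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # clamp if training runs past total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


def step_decay_lr(step, base_lr, drop_every, gamma=0.1):
    """Multiply the learning rate by gamma every drop_every steps."""
    return base_lr * gamma ** (step // drop_every)
```

Cosine decay changes the rate smoothly and needs the total step count up front. Step decay holds the rate constant between drops, which requires choosing the drop points but does not depend on the run length.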