Day 38 of 133

DL optimizers in practice + DSA Trees

Adam vs AdamW vs SGD+momentum. Warmup, cosine, OneCycle.

DSA · NeetCode Trees

  • Lowest common ancestor in a BST (LC 235)

    Interview questions to prep (code sketch after this list)

    1. How does the BST property let you avoid traversing the whole tree?
    2. Generalize to a non-BST binary tree (LC 236) — how does the algorithm change?
  • Binary tree level-order traversal (LC 102)

    Interview questions to prep (code sketch after this list)

    1. Walk through BFS with a queue. How do you cleanly separate one level from the next?
    2. Can you do this recursively with DFS while still grouping by level?
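
A minimal Python sketch for the LCA questions above, assuming the standard LeetCode-style TreeNode (val, left, right); the function names are illustrative, not from the notes. The first version leans on the BST ordering (LC 235), the second is the general binary-tree answer (LC 236).

    class TreeNode:
        def __init__(self, val=0, left=None, right=None):
            self.val, self.left, self.right = val, left, right

    def lca_bst(root, p, q):
        # BST property: comparing p and q against the current node tells us which
        # single subtree can contain both, so we walk one root-to-LCA path in O(h)
        # instead of traversing the whole tree.
        node = root
        while node:
            if p.val < node.val and q.val < node.val:
                node = node.left
            elif p.val > node.val and q.val > node.val:
                node = node.right
            else:
                return node  # paths to p and q diverge here: this is the LCA
        return None

    def lca_binary_tree(root, p, q):
        # Without the ordering guarantee (LC 236) both subtrees must be searched.
        if root is None or root is p or root is q:
            return root
        left = lca_binary_tree(root.left, p, q)
        right = lca_binary_tree(root.right, p, q)
        if left and right:
            return root      # p and q sit in different subtrees, so root is the split
        return left or right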
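
And a sketch for the level-order questions, again with illustrative names: the BFS version separates levels by draining exactly len(queue) nodes per round, while the recursive DFS version groups values by depth index.

    from collections import deque

    def level_order(root):
        # BFS with a queue; snapshotting len(queue) at the top of each round keeps
        # one level's nodes cleanly separated from the next.
        if root is None:
            return []
        levels, queue = [], deque([root])
        while queue:
            level = []
            for _ in range(len(queue)):
                node = queue.popleft()
                level.append(node.val)
                if node.left:
                    queue.append(node.left)
                if node.right:
                    queue.append(node.right)
            levels.append(level)
        return levels

    def level_order_dfs(root, depth=0, levels=None):
        # Recursive DFS that still groups by level: index the output list by depth.
        if levels is None:
            levels = []
        if root is None:
            return levels
        if depth == len(levels):
            levels.append([])
        levels[depth].append(root.val)
        level_order_dfs(root.left, depth + 1, levels)
        level_order_dfs(root.right, depth + 1, levels)
        return levels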

DL · Optimizers in practice

  • SGD, momentum, Nesterov · Deep Learning · Ruder (update-rule sketch after this list)

    Interview questions to prep

    1. Why does momentum help SGD escape narrow ravines?
    2. How is Nesterov momentum different from plain momentum, and when does the difference matter?
  • Adam, AdamW, RMSprop · Deep Learning · fast.ai (PyTorch sketch after this list)

    Interview questions to prep

    1. Why is AdamW preferred over Adam when using weight decay?
    2. When would you ever pick SGD over Adam in deep learning?
  • Learning rate schedules: warmup, cosine, OneCycle

    Interview questions to prep (schedule sketch after this list)

    1. Why is learning rate warmup important for transformer training?
    2. Compare cosine vs step decay — when does each work better?
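
A minimal sketch of the two update rules behind the momentum questions, in the classical formulation; grad stands in for whatever returns dL/dw at a point, and the lr/mu values are placeholders.

    def momentum_step(w, v, grad, lr=0.1, mu=0.9):
        # The velocity v accumulates past gradients: oscillating components across a
        # narrow ravine cancel out while progress along its floor compounds.
        v = mu * v - lr * grad(w)
        return w + v, v

    def nesterov_step(w, v, grad, lr=0.1, mu=0.9):
        # Nesterov evaluates the gradient at the look-ahead point w + mu*v, so the
        # step is corrected before it is taken; the difference grows with mu.
        v = mu * v - lr * grad(w + mu * v)
        return w + v, v

In PyTorch the same switch is torch.optim.SGD(params, lr=..., momentum=0.9, nesterov=True).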
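
For the Adam vs AdamW question, the key difference is where weight decay enters; a short PyTorch sketch (the linear model and hyperparameters are placeholders):

    import torch

    model = torch.nn.Linear(128, 10)  # placeholder model

    # Adam folds weight_decay into the gradient (L2 regularization), so the decay is
    # rescaled by the adaptive second-moment denominator; AdamW decouples it and
    # shrinks the weights directly, which is why it pairs better with weight decay.
    adam = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-2)
    adamw = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)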
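
And a schedule sketch for the warmup/cosine bullet: linear warmup into cosine decay, a common transformer recipe. The step counts and learning rates are made-up placeholders.

    import math

    def lr_at(step, base_lr=3e-4, warmup_steps=1_000, total_steps=100_000, min_lr=0.0):
        # Linear warmup: ramp from ~0 to base_lr over the first warmup_steps updates,
        # so early adaptive-optimizer statistics are not dominated by a few noisy
        # gradients.
        if step < warmup_steps:
            return base_lr * (step + 1) / warmup_steps
        # Cosine decay from base_lr down to min_lr over the remaining steps.
        progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

OneCycle (torch.optim.lr_scheduler.OneCycleLR) instead ramps the learning rate up and back down within a single run, cycling momentum inversely by default.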
