Day 39 of 133
DL regularization (dropout, BN/LN/RMSNorm, augmentation) + DSA Trees
Why transformers prefer LayerNorm. Mixup. When dropout hurts.
DSA · NeetCode Trees
- Binary Tree Right Side View (DSA · Trees) — BFS sketch after the questions below
Interview questions to prep
- Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
- What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)?
- What's the relationship between this problem's invariant and the BST property (if any)?
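A minimal BFS sketch for the right-side view, assuming the usual LeetCode-style `TreeNode`; the class name and the `right_side_view` helper are illustrative. The last node dequeued on each level is the one visible from the right. A right-first DFS that records the first node reached at each depth is the recursive alternative the questions above point at.

```python
from collections import deque
from typing import List, Optional

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def right_side_view(root: Optional[TreeNode]) -> List[int]:
    """BFS level order: the last node on each level is the one seen from the right."""
    if not root:
        return []
    view, queue = [], deque([root])
    while queue:
        level_size = len(queue)
        for i in range(level_size):
            node = queue.popleft()
            if i == level_size - 1:        # rightmost node of this level
                view.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
    return view
```

Time is O(n); space is O(w) for the queue, where w is the tree's maximum width.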
- Count Good Nodes in Binary Tree (DSA · Trees) — DFS sketch after the questions below
Interview questions to prep
- Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
- What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)?
- What's the relationship between this problem's invariant and the BST property (if any)?
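A DFS sketch for counting good nodes, again with an illustrative `TreeNode` and helper name. The key step is threading the maximum value seen so far on the root-to-node path down the recursion; stack space is O(h), which is O(log n) only if the tree is balanced.

```python
from math import inf

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def count_good_nodes(root: TreeNode) -> int:
    """A node is 'good' if no ancestor on its path has a larger value."""
    def dfs(node, path_max):
        if not node:
            return 0
        good = 1 if node.val >= path_max else 0
        new_max = max(path_max, node.val)
        return good + dfs(node.left, new_max) + dfs(node.right, new_max)

    return dfs(root, -inf)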
DL · Regularization in deep nets
- Dropout (inverted-dropout sketch after the questions below)
Interview questions to prep
- Why is dropout disabled at inference, and what is inverted dropout?
- Where does dropout work well, and where does it hurt (e.g., conv vs fully-connected)?
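A NumPy sketch of inverted dropout to back the first question (function name and signature are illustrative): scaling survivors by 1/(1-p) at train time keeps the expected activation unchanged, so inference needs no rescaling and dropout becomes the identity.

```python
import numpy as np

def inverted_dropout(x: np.ndarray, p: float, training: bool, rng=np.random):
    """Zero units with probability p at train time, scale survivors by 1/(1-p).

    The scaling preserves E[output] = x, which is exactly why the layer can be a
    no-op at inference time (plain dropout would instead rescale at test time).
    """
    if not training or p == 0.0:
        return x                                  # disabled at inference
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```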
- BatchNorm / LayerNorm / RMSNorm (RMSNorm sketch after the questions below)
Interview questions to prep
- Why does BN behave differently between train and eval mode?
- Why do transformers prefer LayerNorm over BatchNorm?
- What problem does RMSNorm solve in modern LLMs?
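A PyTorch sketch contrasting the two: `nn.LayerNorm` normalizes each token over its feature dimension, so it uses no batch statistics, has no train/eval mismatch, and doesn't care about batch size or padding, which is the usual argument for it over BatchNorm in transformers. The hand-rolled `RMSNorm` below (an illustrative re-implementation, not a specific library's) drops the mean subtraction and bias and only rescales by the root-mean-square, making it cheaper; that is the variant many recent LLMs use.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescale by the root-mean-square of the features; no centering, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

x = torch.randn(2, 16, 512)                     # (batch, seq, features)
layer_norm = nn.LayerNorm(512)                  # per-token: mean + variance over features
rms_norm = RMSNorm(512)                         # per-token: RMS only, no mean subtraction
print(layer_norm(x).shape, rms_norm(x).shape)   # neither touches batch statistics
```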
- Mixup & augmentation (mixup sketch after the questions below)
Interview questions to prep
- How does mixup regularize, and why does it improve calibration?
- Compare mixup vs CutMix vs RandAugment — when does each shine?
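A small NumPy sketch of mixup on one batch (function name and the `alpha` default are illustrative): each example is blended with a random partner using lam ~ Beta(alpha, alpha), and the one-hot labels are blended the same way, so the model trains on soft targets — which is where the regularization and calibration benefits come from.

```python
import numpy as np

def mixup_batch(x: np.ndarray, y_onehot: np.ndarray, alpha: float = 0.2, rng=None):
    """Mixup: train on convex combinations of input pairs and their labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                 # mixing coefficient
    perm = rng.permutation(len(x))               # random partner for every example
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```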