Day 39 of 133

DL regularization (dropout, BN/LN/RMSNorm, augmentation) + DSA Trees

Why transformers prefer LayerNorm. Mixup. When dropout hurts.

DSA · NeetCode Trees

  • Interview questions to prep

    1. Compare BFS vs DFS for this problem — which fits, and what's the iterative version? (Both traversals are sketched after this list.)
    2. What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)?
    3. What's the relationship between this problem's invariant and the BST property (if any)?
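
A minimal sketch of the two traversals, assuming a plain TreeNode class (the class and the example tree below are illustrative, not from any specific NeetCode problem):

    from collections import deque

    class TreeNode:
        def __init__(self, val, left=None, right=None):
            self.val = val
            self.left = left
            self.right = right

    def bfs(root):
        """Level-order with an explicit queue: O(n) time, O(max width) space."""
        order, queue = [], deque([root] if root else [])
        while queue:
            node = queue.popleft()
            order.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        return order

    def dfs_iterative(root):
        """Preorder with an explicit stack: O(n) time, O(height) space,
        i.e., O(log n) on a balanced tree, with no recursion-depth risk."""
        order, stack = [], [root] if root else []
        while stack:
            node = stack.pop()
            order.append(node.val)
            if node.right:          # push right first so left is visited first
                stack.append(node.right)
            if node.left:
                stack.append(node.left)
        return order

    root = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)), TreeNode(6))
    print(bfs(root))            # [4, 2, 6, 1, 3]
    print(dfs_iterative(root))  # [4, 2, 1, 3, 6]

BFS naturally answers per-level questions; recursive DFS costs O(height) stack frames, which the explicit stack makes visible and controllable.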

DL · Regularization in deep nets

  • Interview questions to prep · Dropout

    1. Why is dropout disabled at inference, and what is inverted dropout? (An inverted-dropout sketch follows this list.)
    2. Where does dropout work well, and where does it hurt (e.g., conv vs fully-connected)?
  • Interview questions to prep · Normalization

    1. Why does BN behave differently between train and eval mode?
    2. Why do transformers prefer LayerNorm over BatchNorm? (The norm sketches below contrast the two.)
    3. What problem does RMSNorm solve in modern LLMs?
  • Interview questions to prep · Augmentation

    1. How does mixup regularize, and why does it improve calibration? (A mixup sketch closes the section.)
    2. Compare mixup vs CutMix vs RandAugment — when does each shine?
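
For the dropout questions, a minimal NumPy sketch of inverted dropout (the function and parameter names are illustrative, not a framework API):

    import numpy as np

    def inverted_dropout(x, p_drop, training, rng=None):
        """Train: zero each unit with prob p_drop and scale survivors by
        1/(1 - p_drop), so expected activations match eval. Eval: identity,
        which is why inference needs no extra rescaling pass."""
        if not training or p_drop == 0.0:
            return x
        rng = rng if rng is not None else np.random.default_rng()
        mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
        return x * mask / (1.0 - p_drop)

    x = np.ones((4, 8))
    print(inverted_dropout(x, 0.5, training=True))   # zeros and 2.0s, mean ~1
    print(inverted_dropout(x, 0.5, training=False))  # unchanged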
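
For the normalization questions, minimal NumPy sketches of the three norms on a (batch, features) array (gain/bias parameters and BN's running statistics are omitted; eps is an illustrative value):

    import numpy as np

    def batch_norm(x, eps=1e-5):
        """Normalize each feature over the batch axis. The statistics depend
        on the batch, so real implementations track running averages for
        eval -- the source of the train/eval behavior gap."""
        return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

    def layer_norm(x, eps=1e-5):
        """Normalize each sample over its own features: batch-independent,
        so it handles variable-length sequences and tiny batches, which is
        why transformers prefer it."""
        mu = x.mean(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

    def rms_norm(x, eps=1e-5):
        """LayerNorm without the mean-centering: rescale by the root mean
        square only. Fewer ops per token, the simplification adopted by
        many modern LLMs."""
        return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

    x = np.random.default_rng(0).normal(size=(4, 6))
    print(layer_norm(x).mean(axis=-1))                # ~0 per row
    print(np.sqrt((rms_norm(x) ** 2).mean(axis=-1)))  # ~1 per row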
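
For the augmentation questions, a minimal mixup sketch on a toy classification batch (alpha = 0.2 and all names are illustrative). By contrast, CutMix pastes a rectangular patch between images instead of blending pixels, and RandAugment samples transform policies rather than mixing examples:

    import numpy as np

    def mixup(x, y_onehot, alpha=0.2, rng=None):
        """Blend random example pairs and their labels with weight lam drawn
        from Beta(alpha, alpha). Training on convex combinations smooths
        decision boundaries and yields soft targets, which is linked to
        better calibration."""
        rng = rng if rng is not None else np.random.default_rng()
        lam = rng.beta(alpha, alpha)
        perm = rng.permutation(len(x))
        x_mix = lam * x + (1 - lam) * x[perm]
        y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
        return x_mix, y_mix

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 32))           # toy features
    y = np.eye(3)[rng.integers(0, 3, 8)]   # one-hot labels, 3 classes
    x_mix, y_mix = mixup(x, y, rng=rng)
    print(y_mix[0])  # a soft label: convex combination of two one-hot rows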
