Day 39 of 133

DL regularization (dropout, BN/LN/RMSNorm, augmentation) + DSA Trees

Why transformers prefer LayerNorm. Mixup. When dropout hurts.

DSA · NeetCode Trees

  • Interview questions to prep

    1. Compare BFS vs DFS for this problem — which fits, and what's the iterative version? (Both traversals are sketched after this list.)
    2. What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)?
    3. What's the relationship between this problem's invariant and the BST property (if any)?
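
A minimal sketch of the two traversals, assuming a plain TreeNode class (the class and the example tree below are illustrative, not from any specific NeetCode problem):

    from collections import deque

    class TreeNode:
        def __init__(self, val, left=None, right=None):
            self.val = val
            self.left = left
            self.right = right

    def bfs(root):
        """Level-order with an explicit queue: O(n) time, O(max width) space."""
        order, queue = [], deque([root] if root else [])
        while queue:
            node = queue.popleft()
            order.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        return order

    def dfs_iterative(root):
        """Preorder with an explicit stack: O(n) time, O(height) space,
        i.e., O(log n) on a balanced tree, with no recursion-depth risk."""
        order, stack = [], [root] if root else []
        while stack:
            node = stack.pop()
            order.append(node.val)
            if node.right:          # push right first so left is visited first
                stack.append(node.right)
            if node.left:
                stack.append(node.left)
        return order

    root = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)), TreeNode(6))
    print(bfs(root))            # [4, 2, 6, 1, 3]
    print(dfs_iterative(root))  # [4, 2, 1, 3, 6]

BFS naturally answers per-level questions; recursive DFS costs O(height) stack frames, which the explicit stack makes visible and controllable.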

DL · Regularization in deep nets

  • Interview questions to prep · Dropout

    1. Why is dropout disabled at inference, and what is inverted dropout? (An inverted-dropout sketch follows this list.)
    2. Where does dropout work well, and where does it hurt (e.g., conv vs fully-connected)?
  • Interview questions to prep · Normalization

    1. Why does BN behave differently between train and eval mode?
    2. Why do transformers prefer LayerNorm over BatchNorm? (The norm sketches below contrast the two.)
    3. What problem does RMSNorm solve in modern LLMs?
  • Interview questions to prep · Augmentation

    1. How does mixup regularize, and why does it improve calibration? (A mixup sketch closes the section.)
    2. Compare mixup vs CutMix vs RandAugment — when does each shine?
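
For the dropout questions, a minimal NumPy sketch of inverted dropout (the function and parameter names are illustrative, not a framework API):

    import numpy as np

    def inverted_dropout(x, p_drop, training, rng=None):
        """Train: zero each unit with prob p_drop and scale survivors by
        1/(1 - p_drop), so expected activations match eval. Eval: identity,
        which is why inference needs no extra rescaling pass."""
        if not training or p_drop == 0.0:
            return x
        rng = rng if rng is not None else np.random.default_rng()
        mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
        return x * mask / (1.0 - p_drop)

    x = np.ones((4, 8))
    print(inverted_dropout(x, 0.5, training=True))   # zeros and 2.0s, mean ~1
    print(inverted_dropout(x, 0.5, training=False))  # unchanged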
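
For the normalization questions, minimal NumPy sketches of the three norms on a (batch, features) array (gain/bias parameters and BN's running statistics are omitted; eps is an illustrative value):

    import numpy as np

    def batch_norm(x, eps=1e-5):
        """Normalize each feature over the batch axis. The statistics depend
        on the batch, so real implementations track running averages for
        eval -- the source of the train/eval behavior gap."""
        return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

    def layer_norm(x, eps=1e-5):
        """Normalize each sample over its own features: batch-independent,
        so it handles variable-length sequences and tiny batches, which is
        why transformers prefer it."""
        mu = x.mean(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

    def rms_norm(x, eps=1e-5):
        """LayerNorm without the mean-centering: rescale by the root mean
        square only. Fewer ops per token, the simplification adopted by
        many modern LLMs."""
        return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

    x = np.random.default_rng(0).normal(size=(4, 6))
    print(layer_norm(x).mean(axis=-1))                # ~0 per row
    print(np.sqrt((rms_norm(x) ** 2).mean(axis=-1)))  # ~1 per row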
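
For the augmentation questions, a minimal mixup sketch on a toy classification batch (alpha = 0.2 and all names are illustrative). By contrast, CutMix pastes a rectangular patch between images instead of blending pixels, and RandAugment samples transform policies rather than mixing examples:

    import numpy as np

    def mixup(x, y_onehot, alpha=0.2, rng=None):
        """Blend random example pairs and their labels with weight lam drawn
        from Beta(alpha, alpha). Training on convex combinations smooths
        decision boundaries and yields soft targets, which is linked to
        better calibration."""
        rng = rng if rng is not None else np.random.default_rng()
        lam = rng.beta(alpha, alpha)
        perm = rng.permutation(len(x))
        x_mix = lam * x + (1 - lam) * x[perm]
        y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
        return x_mix, y_mix

    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 32))           # toy features
    y = np.eye(3)[rng.integers(0, 3, 8)]   # one-hot labels, 3 classes
    x_mix, y_mix = mixup(x, y, rng=rng)
    print(y_mix[0])  # a soft label: convex combination of two one-hot rows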
