Day 41 of 133
DL training tricks (clipping, accum, mixed precision, checkpointing) + DSA Trees
How real teams fit big models on small GPUs.
DSA · NeetCode Trees
Interview questions to prep
- Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
- What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)? (See the sketch after this list.)
- What's the relationship between this problem's invariant and the BST property (if any)?
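A minimal iterative sketch of both traversals on a plain binary tree (the TreeNode class here is an assumed LeetCode-style node, not part of the original notes): BFS uses a FIFO queue and visits level by level, while DFS replaces the recursion's call stack with an explicit stack.

```python
from collections import deque

class TreeNode:
    """Assumed LeetCode-style node."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def bfs_level_order(root):
    """Iterative BFS: visit nodes level by level with a FIFO queue (O(n) time, O(width) space)."""
    if not root:
        return []
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.val)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return order

def dfs_preorder(root):
    """Iterative DFS: an explicit stack replaces the call stack (O(n) time, O(height) space)."""
    if not root:
        return []
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node.val)
        if node.right:          # push right first so left is visited first
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return order
```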
Binary Tree Maximum Path Sum
DSA · Trees
Interview questions to prep
- What does the recursion return vs what it updates globally? Why those two different things? (See the sketch after this list.)
- What's the time and space complexity, and where does the space go?
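A minimal sketch of the usual Maximum Path Sum pattern, reusing the TreeNode shape above: the helper returns the best downward gain to its parent, while paths that bend through the current node update a nonlocal answer. O(n) time, O(height) recursion stack.

```python
def max_path_sum(root):
    """Maximum path sum where the path may start and end at any nodes."""
    best = float("-inf")

    def gain(node):
        nonlocal best
        if not node:
            return 0
        left = max(gain(node.left), 0)    # drop negative subtree contributions
        right = max(gain(node.right), 0)
        best = max(best, node.val + left + right)   # path bending through this node updates the answer
        return node.val + max(left, right)          # only a straight downward path is returned upward

    gain(root)
    return best
```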
DL · Training tricks that matter
Interview questions to prep
- When would you reach for gradient clipping?
- Why does gradient accumulation let you simulate a larger batch size? (See the sketch after this list.)
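A rough sketch of both tricks in one loop, using a toy linear model and synthetic data as stand-ins: gradients from accum_steps micro-batches are accumulated (with the loss scaled down so they average), then the total gradient norm is clipped with torch.nn.utils.clip_grad_norm_ right before the optimizer step.

```python
import torch
from torch import nn

# Toy setup (assumed shapes); the loop pattern is what matters, not the model.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]

accum_steps = 4        # effective batch = 8 * 4 = 32
max_grad_norm = 1.0    # common clipping threshold (assumed value)

optimizer.zero_grad(set_to_none=True)
for step, (x, y) in enumerate(loader):
    loss = nn.functional.cross_entropy(model(x), y)
    (loss / accum_steps).backward()   # scale so accumulated grads average over micro-batches
    if (step + 1) % accum_steps == 0:
        # Clip the accumulated gradient norm just before the optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```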
Interview questions to prep
- Compare fp16 vs bf16 — why does bf16 matter for training stability?
- What is loss scaling and when do you still need it under bf16/fp8? (See the sketch after this list.)
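A minimal sketch of PyTorch automatic mixed precision with loss scaling; the toy model, data, and CUDA check are assumptions. autocast runs eligible ops in reduced precision, and GradScaler multiplies the loss so fp16 gradients don't underflow; bf16's wider exponent range usually makes the scaler unnecessary.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"   # AMP below only activates on CUDA
model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling, mainly needed for fp16

x = torch.randn(64, 10, device=device)
y = torch.randint(0, 2, (64,), device=device)

amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(x), y)   # forward runs in reduced precision
    scaler.scale(loss).backward()   # scale up the loss so fp16 gradients don't underflow
    scaler.step(optimizer)          # unscales gradients, skips the step if inf/nan is found
    scaler.update()                 # adjusts the scale factor for the next iteration
```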
Interview questions to prep
- How does activation checkpointing trade compute for memory? (See the sketch after this list.)
- When does activation checkpointing become NOT worth it — what's the typical compute overhead?
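A sketch of activation checkpointing with torch.utils.checkpoint (the block/depth layout is a made-up example): each wrapped block drops its intermediate activations in the forward pass and recomputes them during backward, paying roughly one extra forward pass of compute for the saved memory.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    """Stand-in block; the layout is a made-up example."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

class CheckpointedModel(nn.Module):
    def __init__(self, dim=256, depth=8):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, 10)

    def forward(self, x):
        for block in self.blocks:
            # Inside checkpoint, intermediate activations are not stored;
            # they are recomputed during the backward pass.
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)

model = CheckpointedModel()
inputs = torch.randn(16, 256, requires_grad=True)
loss = model(inputs).sum()
loss.backward()   # each block's forward runs a second time here to rebuild activations
```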
Interview questions to prep
- Implement a minimal PyTorch training loop for MNIST-style handwritten digits and name each required step. (See the sketch after this list.)
- Why do PyTorch image tensors usually use channel-first shape NCHW instead of NHWC?
- What is the practical difference between a PyTorch tensor and a NumPy array during training?
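A hedged sketch of the minimal loop on synthetic MNIST-shaped NCHW tensors (batch, channels, height, width) instead of the real torchvision download; the required steps are zero_grad, forward, loss, backward, and optimizer step. Unlike a NumPy array, each tensor here carries autograd history, which is what makes loss.backward() possible.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: NCHW tensors of shape (batch, channels=1, 28, 28).
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # Conv2d expects channel-first (NCHW) input
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    for x, y in loader:
        optimizer.zero_grad(set_to_none=True)   # 1. clear stale gradients
        logits = model(x)                        # 2. forward pass
        loss = loss_fn(logits, y)                # 3. compute the loss
        loss.backward()                          # 4. backprop through the autograd graph
        optimizer.step()                         # 5. update parameters
```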