Day 41 of 133

DL training tricks (gradient clipping, gradient accumulation, mixed precision, activation checkpointing) + DSA Trees

How real teams fit big models on small GPUs.

DSA · NeetCode Trees

  • Interview questions to prep — traversal & the BST property (see the traversal sketch after this list)

    1. Compare BFS vs DFS for this problem — which fits, and what's the iterative version?
    2. What's the recursion's space cost on the stack, and how would you go iterative if you needed O(log n)?
    3. What's the relationship between this problem's invariant and the BST property (if any)?
  • Interview questions to prep — recursion structure & complexity (see the diameter sketch after this list)

    1. What does the recursion return vs what it updates globally? Why those two different things?
    2. What's the time and space complexity, and where does the space go?
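
A minimal sketch of the BFS-vs-DFS question: BFS with a queue and an iterative preorder DFS with an explicit stack. The `TreeNode` class is the usual LeetCode-style node; the function names and structure are illustrative, not tied to one specific problem.

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def bfs(root):
    """Level-order traversal: O(n) time, O(width) extra space for the queue."""
    if not root:
        return []
    out, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        out.append(node.val)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return out

def dfs_iterative(root):
    """Preorder traversal with an explicit stack instead of the call stack:
    O(n) time, O(h) extra space, where h is the tree height
    (about O(log n) for a balanced tree, O(n) worst case for a skewed one)."""
    out, stack = [], [root] if root else []
    while stack:
        node = stack.pop()
        out.append(node.val)
        # Push right first so the left child is processed first (preorder).
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return out
```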
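The return-vs-global-update pattern from the second question group, sketched on diameter-of-binary-tree (my choice of example, not necessarily the course's problem): the helper returns the subtree height because the parent needs it, while `best` accumulates the answer as a side effect. It reuses the `TreeNode` class from the sketch above.

```python
def diameter(root):
    """Time O(n): each node is visited once. Space O(h) for the recursion stack."""
    best = 0

    def height(node):
        nonlocal best
        if not node:
            return 0
        left = height(node.left)
        right = height(node.right)
        # The longest path THROUGH this node uses both subtree heights...
        best = max(best, left + right)
        # ...but the parent only ever needs this subtree's height back.
        return 1 + max(left, right)

    height(root)
    return best
```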

DL · Training tricks that matter

  • Interview questions to prep — gradient clipping & accumulation (sketch after this list)

    1. When would you reach for gradient clipping?
    2. Why does gradient accumulation let you simulate a larger batch size?
  • Interview questions to prep — mixed precision (sketch after this list)

    1. Compare fp16 vs bf16 — why does bf16 matter for training stability?
    2. What is loss scaling and when do you still need it under bf16/fp8?
  • Interview questions to prep — activation checkpointing (sketch after this list)

    1. How does activation checkpointing trade compute for memory?
    2. When does activation checkpointing become NOT worth it — what's the typical compute overhead?
  • Interview questions to prep — PyTorch basics (sketch after this list)

    1. Implement a minimal PyTorch training loop for MNIST-style handwritten digits and name each required step.
    2. Why do PyTorch image tensors usually use channel-first shape NCHW instead of NHWC?
    3. What is the practical difference between a PyTorch tensor and a NumPy array during training?
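
A hedged sketch of where clipping and accumulation sit in a plain PyTorch loop. The tiny linear model and random loader are placeholders; the point is the order of operations: scale each micro-batch loss by the number of accumulation steps, clip once per effective batch, then step.

```python
import torch

# Placeholder model and data; only the structure of the loop matters here.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accum_steps = 4  # effective batch = 8 * 4 = 32 at the memory cost of batch 8

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    # Divide so the accumulated gradient equals the big-batch average.
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        # Clip the global gradient norm once, after all micro-batches
        # have contributed and just before the optimizer step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```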
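A sketch of mixed precision with `torch.autocast`, assuming a CUDA GPU is available. fp16 has a narrow exponent range, so small gradients underflow to zero and a `GradScaler` multiplies the loss before backward; bf16 keeps fp32's exponent range, so the scaler is usually unnecessary.

```python
import torch

model = torch.nn.Linear(10, 2).cuda()   # assumes a CUDA device
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()    # loss scaling: needed for fp16, not bf16

x = torch.randn(8, 10, device="cuda")
y = torch.randint(0, 2, (8,), device="cuda")

# fp16: scale the loss up so small gradients survive, unscale before the step.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = loss_fn(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()

# bf16: same exponent range as fp32, so no scaler in the common case.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```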
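A sketch of activation checkpointing with `torch.utils.checkpoint`: activations inside each wrapped block are dropped during the forward pass and recomputed during backward, so memory no longer grows with every layer's activations, at the cost of roughly one extra forward pass through each checkpointed block. The `Block` module is a made-up stand-in for a transformer layer.

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """Stand-in for an expensive layer (e.g. a transformer block)."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.net(x)

blocks = torch.nn.ModuleList([Block() for _ in range(4)])
x = torch.randn(8, 256, requires_grad=True)

h = x
for blk in blocks:
    # Intermediate activations inside `blk` are not stored; they are
    # recomputed during backward (compute traded for memory).
    h = checkpoint(blk, h, use_reentrant=False)

h.sum().backward()
```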
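A minimal training loop for question 1, with each required step named in a comment. Random tensors shaped like MNIST (NCHW: batch, channel, height, width) stand in for the real dataset so the sketch runs without downloading anything.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fake MNIST-shaped data (NCHW: 512 images, 1 channel, 28x28), runs offline.
images = torch.randn(512, 1, 28, 28)
labels = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(2):
    for x, y in loader:
        optimizer.zero_grad()      # 1. clear gradients from the last step
        logits = model(x)          # 2. forward pass
        loss = loss_fn(logits, y)  # 3. compute the loss
        loss.backward()            # 4. backward pass: fill .grad on parameters
        optimizer.step()           # 5. update the weights
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```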

References & further reading