Day 94 of 133

Hardware: GPUs, TPUs, NVLink, IB + DSA review

A100 vs H100 vs B200; HBM bandwidth; interconnect bottlenecks.

DSA · NeetCode Backtracking

  • Combination Sum II

    Interview questions to prep

    1. Walk through your pruning strategy — what subtrees do you skip and why is it safe?
    2. Where does memoization apply? Could this be a DP problem in disguise?
    3. What's the worst-case time complexity, and what's the depth of the recursion stack?
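    To have concrete answers ready for questions 1 and 3, a minimal sketch of the standard sort-then-backtrack solution (sorting enables both the overshoot prune and same-depth duplicate skipping; worst-case time is O(2^n) with recursion depth at most n):

    ```python
    def combination_sum2(candidates, target):
        """Return all unique combinations summing to target; each number used once."""
        candidates.sort()  # sorting enables both pruning and duplicate-skipping
        results, path = [], []

        def backtrack(start, remaining):
            if remaining == 0:
                results.append(path[:])
                return
            for i in range(start, len(candidates)):
                # Prune: candidates are sorted, so this and every later value overshoots.
                if candidates[i] > remaining:
                    break
                # Skip duplicates at the same recursion depth to avoid repeated combos.
                if i > start and candidates[i] == candidates[i - 1]:
                    continue
                path.append(candidates[i])
                backtrack(i + 1, remaining - candidates[i])
                path.pop()

        backtrack(0, target)
        return results

    print(combination_sum2([10, 1, 2, 7, 6, 1, 5], 8))
    # → [[1, 1, 6], [1, 2, 5], [1, 7], [2, 6]]
    ```

    The `break` (not `continue`) on overshoot is the pruning answer: once a sorted candidate exceeds the remainder, the whole rest of the sibling subtree is safely skipped.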

Infra · GPUs, TPUs, accelerators

  • GPU generations & HBM — interview questions to prep

    1. Compare A100, H100, and B200 — what changed each generation?
    2. Why does HBM matter more than FLOPs for many ML workloads?
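  A quick way to answer the HBM-vs-FLOPs question is a roofline check: compare a kernel's arithmetic intensity against the hardware's ridge point. A minimal sketch, using approximate A100-80GB spec numbers as assumptions (~312 TFLOPS BF16 peak, ~2 TB/s HBM2e):

  ```python
  # Roofline-style check: is a kernel compute-bound or memory-bound?
  # Spec numbers are approximate A100-80GB figures, assumed here for illustration.
  PEAK_FLOPS = 312e12   # BF16 tensor-core peak, FLOP/s
  PEAK_BW = 2.0e12      # HBM2e bandwidth, bytes/s

  def bound_by(flops, bytes_moved):
      """Return which resource limits the kernel under a simple roofline model."""
      intensity = flops / bytes_moved    # FLOP per byte of HBM traffic
      ridge = PEAK_FLOPS / PEAK_BW       # ~156 FLOP/byte for these specs
      return "compute" if intensity >= ridge else "memory"

  # Elementwise fp16 add: 1 FLOP per element, 6 bytes moved (2 reads + 1 write).
  n = 1 << 20
  print(bound_by(n, 6 * n))                    # memory-bound: ~0.17 FLOP/byte
  # Large fp16 matmul (N=4096): 2*N^3 FLOPs over ~3*N^2*2 bytes.
  print(bound_by(2 * 4096**3, 6 * 4096**2))    # compute-bound: ~1365 FLOP/byte
  ```

  This is the interview answer in miniature: most non-matmul ops sit far below the ridge point, so HBM bandwidth, not peak FLOPs, sets their speed.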
  • GPUs vs TPUs — interview questions to prep

    1. Compare GPUs vs TPUs for training — when does each win?
    2. What's the cost of porting a PyTorch model to TPU via XLA, and where does it break?
  • Interconnect (NVLink, IB) — interview questions to prep

    1. Why does interconnect (NVLink, IB) often bottleneck distributed training before compute does?
    2. How would you diagnose whether your distributed training is bottlenecked by compute, memory, or network?
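    A back-of-envelope model is a good first diagnosis before reaching for a profiler: compare per-step compute time against ring all-reduce time over the gradient bytes. A minimal sketch with illustrative, assumed numbers (7B fp16 gradients, 8 GPUs, ~600 GB/s NVLink vs ~25 GB/s per-node InfiniBand):

    ```python
    # Back-of-envelope: does all-reduce or compute dominate a data-parallel step?
    # All workload numbers below are illustrative assumptions, not measurements.
    def allreduce_time(grad_bytes, n_gpus, link_bw):
        """Ring all-reduce moves 2*(n-1)/n of the gradient bytes per GPU."""
        return 2 * (n_gpus - 1) / n_gpus * grad_bytes / link_bw

    def step_time(compute_s, grad_bytes, n_gpus, link_bw, overlap=0.0):
        """Step time when `overlap` fraction of comm hides behind backward."""
        comm = allreduce_time(grad_bytes, n_gpus, link_bw) * (1 - overlap)
        return compute_s + comm

    grads = 14e9  # 7B params * 2 bytes (fp16 gradients)
    print(step_time(0.5, grads, 8, 600e9))  # NVLink: comm ~0.04 s, compute-bound
    print(step_time(0.5, grads, 8, 25e9))   # IB:     comm ~0.98 s, network-bound
    ```

    Same model, same compute, and the bottleneck flips with the link: that is the interview point about interconnect saturating before compute does. In practice you confirm the model with profiler timelines (e.g. gaps where GPUs idle waiting on NCCL).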