Day 14 of 133
Math/stats consolidation + DSA Sliding Window finish
Recap weeks 1-2 with rehearsed answers. Wrap up the Sliding Window pattern.
DSA · NeetCode Sliding Window
- Minimum Window Substring
Interview questions to prep
- Walk through your shrink condition — when do you safely move the left pointer?
- How do you handle duplicate characters in t (e.g., 'aabb')?
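A minimal sketch of the standard counter-based window (assuming the usual LeetCode signature; `need` and `missing` are my names). Duplicates in `t` are handled naturally because `need` stores counts, and the left pointer moves only while the leftmost character is surplus:

```python
from collections import Counter

def min_window(s: str, t: str) -> str:
    # Counts of characters still required; duplicates in t (e.g. 'aabb')
    # work because need['a'] starts at 2, not 1.
    need = Counter(t)
    missing = len(t)              # total characters still missing
    left = 0
    best = (float("inf"), 0, 0)   # (window length, start, end)

    for right, ch in enumerate(s, 1):  # right = exclusive end index
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1             # may go negative for surplus characters

        if missing == 0:          # window covers all of t: try to shrink
            # Safe to move left exactly while the leftmost char is surplus
            # (its count in need is negative); stop at a required char.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if right - left < best[0]:
                best = (right - left, left, right)

    return "" if best[0] == float("inf") else s[best[1]:best[2]]
```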
- Sliding Window Maximum
Interview questions to prep
- Is this a fixed-size or variable-size window? Why does that fit this problem?
- What's the invariant inside the window, and how do you maintain it on shrink/expand?
- Why is the overall pass O(n) even though the inner loop looks like it could be O(n²)?
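A sketch of the monotonic-deque approach (fixed-size window; names are mine). The invariant: the deque holds indices whose values are decreasing, so the front is always the current maximum. Each index is pushed and popped at most once, which is why the whole pass is O(n) despite the inner loop:

```python
from collections import deque

def max_sliding_window(nums: list[int], k: int) -> list[int]:
    dq = deque()   # indices of a decreasing run of values; dq[0] is the max
    out = []
    for i, x in enumerate(nums):
        # Expand: pop smaller values; they can never be a future maximum.
        while dq and nums[dq[-1]] <= x:
            dq.pop()
        dq.append(i)
        # Shrink: drop the front if it has slid out of the k-wide window.
        if dq[0] <= i - k:
            dq.popleft()
        if i >= k - 1:
            out.append(nums[dq[0]])
    return out
```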
ML · Bias-variance trade-off
Interview questions to prep
- Decompose expected squared error into bias², variance, and irreducible noise.
- Why does adding more training data reduce variance but not bias?
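A Monte Carlo sketch of the decomposition, with an arbitrary model class (degree-3 `np.polyfit`) and an arbitrary test point, both my assumptions. Refitting the same model on fresh training sets estimates bias² and variance separately:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)       # assumed true function
sigma = 0.3                               # irreducible noise std
x_test, n, trials, degree = 0.25, 30, 2000, 3

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)    # fresh training set each trial
    coef = np.polyfit(x, y, degree)       # refit the same model class
    preds[t] = np.polyval(coef, x_test)

bias_sq = (preds.mean() - f(x_test)) ** 2
variance = preds.var()
# Expected squared error at x_test ≈ bias² + variance + σ²
print(f"bias²={bias_sq:.4f}  variance={variance:.4f}  noise={sigma**2:.4f}")
```

Rerunning with a larger `n` shrinks `variance` while `bias_sq` stays roughly put, which answers the second question empirically.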
Interview questions to prep
- Explain the double-descent phenomenon. How does it overturn classical bias-variance intuition?
- Why do over-parameterized models often generalize well in deep learning?
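A rough random-features sketch of double descent (sizes, seed, and noise level are arbitrary choices of mine). `np.linalg.lstsq` returns the minimum-norm interpolator once the feature count exceeds `n`; test error typically peaks near the interpolation threshold `width ≈ n` and falls again beyond it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, n_test = 40, 20, 2000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
X_te = rng.normal(size=(n_test, d))
y = X @ w_true + rng.normal(0, 0.5, size=n)   # noisy training targets
y_te = X_te @ w_true                          # noiseless test targets

for width in [5, 20, 35, 40, 45, 80, 400]:
    W = rng.normal(size=(d, width)) / np.sqrt(d)
    Phi = np.maximum(X @ W, 0)                # random ReLU features
    Phi_te = np.maximum(X_te @ W, 0)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # min-norm fit
    mse = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"width={width:4d}  test MSE={mse:9.3f}")
```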
Interview questions to prep
- How do you read a learning curve to decide between more data, regularization, or a bigger model?
- What does a large gap between training and validation curves usually mean — and what shrinks it?
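A quick sketch using scikit-learn's `learning_curve` on synthetic data (the estimator, dataset, and train sizes are arbitrary assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sizes, train_sc, val_sc = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_sc.mean(axis=1), val_sc.mean(axis=1)):
    print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")
# Large persistent gap -> high variance: more data or regularization.
# Both curves plateauing low and close -> high bias: bigger model/features.
```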
Math · Optimization for ML
Interview questions to prep
- What does convexity guarantee for optimization?
- Are deep neural network losses convex? Why does SGD still work?
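For the convexity question, a one-step derivation worth rehearsing (first-order condition, assuming differentiable f):

```latex
% Convexity (first-order condition) for differentiable f:
\[
  f(y) \;\ge\; f(x) + \nabla f(x)^{\top}(y - x) \quad \forall\, x, y
\]
% Hence at any stationary point x* with \nabla f(x^{*}) = 0:
\[
  f(y) \;\ge\; f(x^{*}) \quad \forall\, y,
\]
% i.e. every stationary point of a convex function is a global minimum.
```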
Interview questions to prep
- Compare batch GD, SGD, and mini-batch SGD — trade-offs in compute, noise, and convergence.
- Why does SGD with momentum often converge faster than vanilla SGD?
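A minimal mini-batch SGD-with-momentum sketch on a synthetic least-squares problem (all names and hyperparameters are my assumptions). Swapping `batch` between 1, 32, and `n` gives SGD, mini-batch, and full-batch GD from the same code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(0, 0.1, n)

def grad(w, idx):
    # Mean-squared-error gradient on the rows in idx: full-batch GD uses
    # all rows, SGD one row, mini-batch a small slice.
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / len(idx)

def sgd_momentum(lr=0.05, beta=0.9, batch=32, steps=500):
    w, v = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, batch, replace=False)
        v = beta * v + grad(w, idx)   # EMA of gradients damps the noise
        w = w - lr * v                # larger steps along persistent directions
    return w

print("distance to w_true:", np.linalg.norm(sgd_momentum() - w_true))
```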
Interview questions to prep
- Implement gradient descent for a simple squared-error objective and explain the update rule line by line.
- How would you debug a training run where gradient descent diverges after a few steps?
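A reference implementation to rehearse for the squared-error question (the setup and names are mine), with the usual divergence checklist in the comments:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=200):
    """Vanilla GD on the squared-error objective L(w) = ||Xw - y||^2 / n."""
    n, d = X.shape
    w = np.zeros(d)                  # start from the origin
    for _ in range(steps):
        residual = X @ w - y         # current prediction errors
        g = 2 * X.T @ residual / n   # gradient dL/dw
        w -= lr * g                  # step downhill; lr scales the move
    return w

# If the loss blows up after a few steps, the usual suspects are:
# (1) lr too large for the curvature (try lr/10),
# (2) unscaled features inflating the gradient (standardize X),
# (3) a sign error in the update.
# Verifying the loss decreases every step on a tiny problem catches all three.
```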
Interview questions to prep
- Compare Adam, AdamW, and SGD with momentum — which would you reach for first and why?
- Why is the AdamW correction important when using weight decay with adaptive optimizers?
- What's the role of learning-rate warmup and cosine schedules?
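A hand-rolled sketch of the AdamW update and a warmup-plus-cosine schedule (hyperparameters are common defaults, not prescriptions). The key line is where the decay term sits relative to the adaptive rescaling:

```python
import math
import numpy as np

def adamw_step(w, g, m, v, t, lr, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    # Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # AdamW: decay is applied directly to the weights, not folded into g,
    # so it is NOT rescaled by the adaptive 1/sqrt(v_hat) factor. That
    # decoupling is the correction over "Adam + L2 in the gradient".
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

def warmup_cosine(step, total, warmup, base_lr):
    # Linear warmup avoids huge early steps while moment estimates are
    # still unreliable; cosine then anneals the rate smoothly toward zero.
    if step < warmup:
        return base_lr * step / warmup
    frac = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * frac))
```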