Day 14 of 133

Math/stats consolidation + DSA Sliding Window finish

Recap weeks 1-2 with rehearsed answers; wrap up the Sliding Window pattern.

DSA · NeetCode Sliding Window

  • Minimum Window Substring (DSA · Sliding Window; sketch after this list)

    Interview questions to prep

    1. Walk through your shrink condition — when do you safely move the left pointer?
    2. How do you handle duplicate characters in t (e.g., 'aabb')?
  • Sliding Window Maximum (DSA · Sliding Window; deque sketch after this list)

    Interview questions to prep

    1. Is this a fixed-size or variable-size window? Why does that fit this problem?
    2. What's the invariant inside the window, and how do you maintain it on shrink/expand?
    3. Why is the overall pass O(n) even though the inner loop looks like it could be O(n²)?
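  • Sketch: Minimum Window Substring (variable-size window)

    A minimal Python sketch of one common counter-based approach, for rehearsing the shrink condition and duplicate handling asked about above; the function name and the example call are illustrative, not part of the plan.

      from collections import Counter

      def min_window(s: str, t: str) -> str:
          """Smallest substring of s containing every character of t, with multiplicity."""
          if not s or not t:
              return ""
          need = Counter(t)          # counts > 1 are what handle duplicates in t (e.g. 'aabb')
          missing = len(t)           # characters of t still missing from the current window
          best_len, best_l, best_r = float("inf"), 0, 0
          left = 0
          for right, ch in enumerate(s):
              if need[ch] > 0:       # ch was still needed, so one fewer character missing
                  missing -= 1
              need[ch] -= 1          # negative counts mark surplus copies inside the window
              if missing:            # window does not yet cover t; keep expanding
                  continue
              # Shrink condition: the left pointer is safe to move while s[left] is surplus.
              while need[s[left]] < 0:
                  need[s[left]] += 1
                  left += 1
              if right - left + 1 < best_len:
                  best_len, best_l, best_r = right - left + 1, left, right + 1
              # Give up s[left] so the window becomes invalid again and keeps sliding.
              need[s[left]] += 1
              missing += 1
              left += 1
          return "" if best_len == float("inf") else s[best_l:best_r]

      # min_window("ADOBECODEBANC", "ABC")  -> "BANC"
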
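  • Sketch: Sliding Window Maximum (fixed-size window, monotonic deque)

    A minimal Python sketch of the usual monotonic-deque solution, covering the window invariant and the amortized O(n) argument from the questions above; the example call is illustrative.

      from collections import deque

      def max_sliding_window(nums, k):
          """Maximum of every length-k window in one left-to-right pass."""
          dq = deque()   # indices of candidates; their values decrease from front to back
          out = []
          for i, x in enumerate(nums):
              # Invariant on expand: pop smaller-or-equal values from the back; they can
              # never be a window maximum while x is still inside the window.
              while dq and nums[dq[-1]] <= x:
                  dq.pop()
              dq.append(i)
              # Invariant on shrink: drop the front index once it leaves the fixed window.
              if dq[0] <= i - k:
                  dq.popleft()
              if i >= k - 1:
                  out.append(nums[dq[0]])   # the front always holds the current maximum
          return out

      # Each index is appended once and popped at most once, so the whole pass is O(n)
      # even though the inner while loop can run several times on a single step.
      # max_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3)  -> [3, 3, 5, 5, 6, 7]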

ML · Bias-variance trade-off

  • Interview questions to prep (decomposition check after this list)

    1. Decompose expected squared error into bias², variance, and irreducible noise.
    2. Why does adding more training data reduce variance but not bias?
  • Interview questions to prep

    1. Explain the double-descent phenomenon. How does it overturn classical bias-variance intuition?
    2. Why do over-parameterized models often generalize well in deep learning?
  • Interview questions to prep (learning-curve sketch after this list)

    1. How do you read a learning curve to decide between more data, regularization, or a bigger model?
    2. What does a large gap between training and validation curves usually mean — and what shrinks it?
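  • Sketch: bias-variance decomposition, checked numerically

    A small Monte Carlo check of E[(y - f̂(x))²] = bias² + variance + σ², assuming a hypothetical sine target, Gaussian noise, and degree-3 polynomial fits; every setting here is an illustrative choice, not part of the plan.

      import numpy as np

      rng = np.random.default_rng(0)

      def true_fn(x):
          return np.sin(2 * np.pi * x)

      NOISE_STD, N_TRAIN, N_SETS, DEGREE = 0.3, 20, 500, 3   # illustrative settings
      x_test = np.linspace(0, 1, 50)

      # Fit the same model class on many independently drawn training sets.
      preds = np.empty((N_SETS, x_test.size))
      for s in range(N_SETS):
          x_tr = rng.uniform(0, 1, N_TRAIN)
          y_tr = true_fn(x_tr) + rng.normal(0, NOISE_STD, N_TRAIN)
          preds[s] = np.polyval(np.polyfit(x_tr, y_tr, DEGREE), x_test)

      bias_sq  = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)   # bias^2
      variance = np.mean(preds.var(axis=0))                             # variance
      noise    = NOISE_STD ** 2                                         # irreducible sigma^2

      # Expected squared error against fresh noisy targets should match the three terms.
      y_fresh = true_fn(x_test) + rng.normal(0, NOISE_STD, (N_SETS, x_test.size))
      print(bias_sq + variance + noise, np.mean((preds - y_fresh) ** 2))

    More training data shrinks the variance term (the fits vary less across training sets) but leaves the bias term, a property of the model class, unchanged.
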
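  • Sketch: reading a learning curve

    One way to generate the train/validation curves discussed above, assuming scikit-learn is available; the synthetic dataset and the logistic-regression estimator are stand-ins.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import learning_curve

      X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

      # Training and cross-validated accuracy as a function of training-set size.
      sizes, train_scores, val_scores = learning_curve(
          LogisticRegression(max_iter=1000), X, y,
          train_sizes=np.linspace(0.1, 1.0, 6), cv=5, scoring="accuracy",
      )
      for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
          print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")

    Rule of thumb: a large, persistent train/validation gap points to variance (more data or stronger regularization usually shrinks it), while two curves that plateau low and close together point to bias (a bigger model or better features helps).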

Math · Optimization for ML

  • Interview questions to prep

    1. What does convexity guarantee for optimization?
    2. Are deep neural network losses convex? Why does SGD still work?
  • Gradient descent, SGD, mini-batch SGD (Statistics · Sebastian Ruder; mini-batch sketch after this list)

    Interview questions to prep

    1. Compare batch GD, SGD, and mini-batch SGD — trade-offs in compute, noise, and convergence.
    2. Why does SGD with momentum converge faster than vanilla SGD?
  • Interview questions to prep (gradient-descent sketch after this list)

    1. Implement gradient descent for a simple squared-error objective and explain the update rule line by line.
    2. How would you debug a training run where gradient descent diverges after a few steps?
  • Interview questions to prep (AdamW sketch after this list)

    1. Compare Adam, AdamW, and SGD with momentum — which would you reach for first and why?
    2. Why is the AdamW correction important when using weight decay with adaptive optimizers?
    3. What's the role of learning-rate warmup and cosine schedules?
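  • Sketch: gradient descent on a squared-error objective

    A minimal NumPy implementation to rehearse the update rule line by line; the synthetic data and hyperparameters are illustrative.

      import numpy as np

      rng = np.random.default_rng(0)
      X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # design matrix with a bias column
      true_w = np.array([1.0, 2.0, -3.0])
      y = X @ true_w + rng.normal(0, 0.1, 100)              # noisy linear targets

      w = np.zeros(3)   # initial parameters
      lr = 0.1          # learning rate (step size)
      for step in range(500):
          resid = X @ w - y                  # prediction error on every example
          loss = 0.5 * np.mean(resid ** 2)   # squared-error objective J(w)
          grad = X.T @ resid / len(y)        # gradient of J with respect to w
          w -= lr * grad                     # step against the gradient, scaled by lr
          if step % 100 == 0:
              print(step, round(loss, 6))    # loss should decrease steadily here
      print(w)                               # ends up close to true_w

    If a run like this diverges after a few steps, the usual first suspects are a learning rate too large for the curvature, unscaled features, or a sign/averaging mistake in the gradient.
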
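  • Sketch: mini-batch SGD with momentum

    The same kind of objective trained with noisy mini-batch gradients and a momentum buffer, to contrast with the full-batch loop above; the batch size and hyperparameters are illustrative.

      import numpy as np

      rng = np.random.default_rng(1)
      X = np.c_[np.ones(1000), rng.normal(size=(1000, 2))]
      true_w = np.array([1.0, 2.0, -3.0])
      y = X @ true_w + rng.normal(0, 0.1, 1000)

      w, v = np.zeros(3), np.zeros(3)    # parameters and momentum (velocity) buffer
      lr, beta, batch = 0.05, 0.9, 32
      for epoch in range(30):
          order = rng.permutation(len(y))            # reshuffle each epoch
          for start in range(0, len(y), batch):
              idx = order[start:start + batch]
              resid = X[idx] @ w - y[idx]
              grad = X[idx].T @ resid / len(idx)     # noisy but cheap gradient estimate
              v = beta * v + grad                    # running average of recent directions
              w -= lr * v                            # momentum smooths the mini-batch noise
      print(w)
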
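  • Sketch: one AdamW step (decoupled weight decay)

    A NumPy sketch of the AdamW update described by Loshchilov & Hutter, to make the decoupling point concrete; the default hyperparameters are typical values, not requirements.

      import numpy as np

      def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, weight_decay=0.01):
          """One parameter update: adaptive step from the gradient, weight decay applied separately."""
          m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
          v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
          m_hat = m / (1 - beta1 ** t)                 # bias corrections for early steps t = 1, 2, ...
          v_hat = v / (1 - beta2 ** t)
          w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # the Adam step itself
          w = w - lr * weight_decay * w                # decoupled weight decay: the "W" in AdamW
          return w, m, v

      # Plain Adam with L2 regularization instead folds weight_decay * w into grad, so the decay
      # gets divided by sqrt(v_hat) and barely touches weights with large gradients; decoupling
      # restores a uniform decay, which is why AdamW behaves better with weight decay.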

References & further reading