Day 14 of 133

Math/stats consolidation + DSA Sliding Window finish

Recap weeks 1-2 with rehearsed answers; wrap up the Sliding Window pattern.

DSA · NeetCode Sliding Window

  • Minimum Window Substring (DSA · Sliding Window; sketch after this list)

    Interview questions to prep

    1. Walk through your shrink condition — when do you safely move the left pointer?
    2. How do you handle duplicate characters in t (e.g., 'aabb')?
  • Sliding Window Maximum (DSA · Sliding Window; deque sketch after this list)

    Interview questions to prep

    1. Is this a fixed-size or variable-size window? Why does that fit this problem?
    2. What's the invariant inside the window, and how do you maintain it on shrink/expand?
    3. Why is the overall pass O(n) even though the inner loop looks like it could be O(n²)?
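  • Sketch: Minimum Window Substring (variable-size window)

    A minimal Python sketch of one common counter-based approach, for rehearsing the shrink condition and duplicate handling asked about above; the function name and the example call are illustrative, not part of the plan.

      from collections import Counter

      def min_window(s: str, t: str) -> str:
          """Smallest substring of s containing every character of t, with multiplicity."""
          if not s or not t:
              return ""
          need = Counter(t)          # counts > 1 are what handle duplicates in t (e.g. 'aabb')
          missing = len(t)           # characters of t still missing from the current window
          best_len, best_l, best_r = float("inf"), 0, 0
          left = 0
          for right, ch in enumerate(s):
              if need[ch] > 0:       # ch was still needed, so one fewer character missing
                  missing -= 1
              need[ch] -= 1          # negative counts mark surplus copies inside the window
              if missing:            # window does not yet cover t; keep expanding
                  continue
              # Shrink condition: the left pointer is safe to move while s[left] is surplus.
              while need[s[left]] < 0:
                  need[s[left]] += 1
                  left += 1
              if right - left + 1 < best_len:
                  best_len, best_l, best_r = right - left + 1, left, right + 1
              # Give up s[left] so the window becomes invalid again and keeps sliding.
              need[s[left]] += 1
              missing += 1
              left += 1
          return "" if best_len == float("inf") else s[best_l:best_r]

      # min_window("ADOBECODEBANC", "ABC")  -> "BANC"
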
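  • Sketch: Sliding Window Maximum (fixed-size window, monotonic deque)

    A minimal Python sketch of the usual monotonic-deque solution, covering the window invariant and the amortized O(n) argument from the questions above; the example call is illustrative.

      from collections import deque

      def max_sliding_window(nums, k):
          """Maximum of every length-k window in one left-to-right pass."""
          dq = deque()   # indices of candidates; their values decrease from front to back
          out = []
          for i, x in enumerate(nums):
              # Invariant on expand: pop smaller-or-equal values from the back; they can
              # never be a window maximum while x is still inside the window.
              while dq and nums[dq[-1]] <= x:
                  dq.pop()
              dq.append(i)
              # Invariant on shrink: drop the front index once it leaves the fixed window.
              if dq[0] <= i - k:
                  dq.popleft()
              if i >= k - 1:
                  out.append(nums[dq[0]])   # the front always holds the current maximum
          return out

      # Each index is appended once and popped at most once, so the whole pass is O(n)
      # even though the inner while loop can run several times on a single step.
      # max_sliding_window([1, 3, -1, -3, 5, 3, 6, 7], 3)  -> [3, 3, 5, 5, 6, 7]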

ML · Bias-variance trade-off

  • Interview questions to prep (decomposition check after this list)

    1. Decompose expected squared error into bias², variance, and irreducible noise.
    2. Why does adding more training data reduce variance but not bias?
  • Interview questions to prep

    1. Explain the double-descent phenomenon. How does it overturn classical bias-variance intuition?
    2. Why do over-parameterized models often generalize well in deep learning?
  • Interview questions to prep (learning-curve sketch after this list)

    1. How do you read a learning curve to decide between more data, regularization, or a bigger model?
    2. What does a large gap between training and validation curves usually mean — and what shrinks it?
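  • Sketch: bias-variance decomposition, checked numerically

    A small Monte Carlo check of E[(y - f̂(x))²] = bias² + variance + σ², assuming a hypothetical sine target, Gaussian noise, and degree-3 polynomial fits; every setting here is an illustrative choice, not part of the plan.

      import numpy as np

      rng = np.random.default_rng(0)

      def true_fn(x):
          return np.sin(2 * np.pi * x)

      NOISE_STD, N_TRAIN, N_SETS, DEGREE = 0.3, 20, 500, 3   # illustrative settings
      x_test = np.linspace(0, 1, 50)

      # Fit the same model class on many independently drawn training sets.
      preds = np.empty((N_SETS, x_test.size))
      for s in range(N_SETS):
          x_tr = rng.uniform(0, 1, N_TRAIN)
          y_tr = true_fn(x_tr) + rng.normal(0, NOISE_STD, N_TRAIN)
          preds[s] = np.polyval(np.polyfit(x_tr, y_tr, DEGREE), x_test)

      bias_sq  = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)   # bias^2
      variance = np.mean(preds.var(axis=0))                             # variance
      noise    = NOISE_STD ** 2                                         # irreducible sigma^2

      # Expected squared error against fresh noisy targets should match the three terms.
      y_fresh = true_fn(x_test) + rng.normal(0, NOISE_STD, (N_SETS, x_test.size))
      print(bias_sq + variance + noise, np.mean((preds - y_fresh) ** 2))

    More training data shrinks the variance term (the fits vary less across training sets) but leaves the bias term, a property of the model class, unchanged.
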
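  • Sketch: reading a learning curve

    One way to generate the train/validation curves discussed above, assuming scikit-learn is available; the synthetic dataset and the logistic-regression estimator are stand-ins.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import learning_curve

      X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

      # Training and cross-validated accuracy as a function of training-set size.
      sizes, train_scores, val_scores = learning_curve(
          LogisticRegression(max_iter=1000), X, y,
          train_sizes=np.linspace(0.1, 1.0, 6), cv=5, scoring="accuracy",
      )
      for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
          print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={tr - va:.3f}")

    Rule of thumb: a large, persistent train/validation gap points to variance (more data or stronger regularization usually shrinks it), while two curves that plateau low and close together point to bias (a bigger model or better features helps).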

Math · Optimization for ML

  • Interview questions to prep

    1. What does convexity guarantee for optimization?
    2. Are deep neural network losses convex? Why does SGD still work?
  • Gradient descent, SGD, mini-batch SGD (Statistics · Sebastian Ruder; mini-batch sketch after this list)

    Interview questions to prep

    1. Compare batch GD, SGD, and mini-batch SGD — trade-offs in compute, noise, and convergence.
    2. Why does SGD with momentum converge faster than vanilla SGD?
  • Interview questions to prep (gradient-descent sketch after this list)

    1. Implement gradient descent for a simple squared-error objective and explain the update rule line by line.
    2. How would you debug a training run where gradient descent diverges after a few steps?
  • Interview questions to prep (AdamW sketch after this list)

    1. Compare Adam, AdamW, and SGD with momentum — which would you reach for first and why?
    2. Why is the AdamW correction important when using weight decay with adaptive optimizers?
    3. What's the role of learning-rate warmup and cosine schedules?
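  • Sketch: gradient descent on a squared-error objective

    A minimal NumPy implementation to rehearse the update rule line by line; the synthetic data and hyperparameters are illustrative.

      import numpy as np

      rng = np.random.default_rng(0)
      X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # design matrix with a bias column
      true_w = np.array([1.0, 2.0, -3.0])
      y = X @ true_w + rng.normal(0, 0.1, 100)              # noisy linear targets

      w = np.zeros(3)   # initial parameters
      lr = 0.1          # learning rate (step size)
      for step in range(500):
          resid = X @ w - y                  # prediction error on every example
          loss = 0.5 * np.mean(resid ** 2)   # squared-error objective J(w)
          grad = X.T @ resid / len(y)        # gradient of J with respect to w
          w -= lr * grad                     # step against the gradient, scaled by lr
          if step % 100 == 0:
              print(step, round(loss, 6))    # loss should decrease steadily here
      print(w)                               # ends up close to true_w

    If a run like this diverges after a few steps, the usual first suspects are a learning rate too large for the curvature, unscaled features, or a sign/averaging mistake in the gradient.
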
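  • Sketch: mini-batch SGD with momentum

    The same kind of objective trained with noisy mini-batch gradients and a momentum buffer, to contrast with the full-batch loop above; the batch size and hyperparameters are illustrative.

      import numpy as np

      rng = np.random.default_rng(1)
      X = np.c_[np.ones(1000), rng.normal(size=(1000, 2))]
      true_w = np.array([1.0, 2.0, -3.0])
      y = X @ true_w + rng.normal(0, 0.1, 1000)

      w, v = np.zeros(3), np.zeros(3)    # parameters and momentum (velocity) buffer
      lr, beta, batch = 0.05, 0.9, 32
      for epoch in range(30):
          order = rng.permutation(len(y))            # reshuffle each epoch
          for start in range(0, len(y), batch):
              idx = order[start:start + batch]
              resid = X[idx] @ w - y[idx]
              grad = X[idx].T @ resid / len(idx)     # noisy but cheap gradient estimate
              v = beta * v + grad                    # running average of recent directions
              w -= lr * v                            # momentum smooths the mini-batch noise
      print(w)
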
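  • Sketch: one AdamW step (decoupled weight decay)

    A NumPy sketch of the AdamW update described by Loshchilov & Hutter, to make the decoupling point concrete; the default hyperparameters are typical values, not requirements.

      import numpy as np

      def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, weight_decay=0.01):
          """One parameter update: adaptive step from the gradient, weight decay applied separately."""
          m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
          v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
          m_hat = m / (1 - beta1 ** t)                 # bias corrections for early steps t = 1, 2, ...
          v_hat = v / (1 - beta2 ** t)
          w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # the Adam step itself
          w = w - lr * weight_decay * w                # decoupled weight decay: the "W" in AdamW
          return w, m, v

      # Plain Adam with L2 regularization instead folds weight_decay * w into grad, so the decay
      # gets divided by sqrt(v_hat) and barely touches weights with large gradients; decoupling
      # restores a uniform decay, which is why AdamW behaves better with weight decay.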

References & further reading