Day 35 of 133
Trad-ML consolidation + week-5 wrap
Wrap classical ML; record yourself answering the 5 hardest questions.
DSA · NeetCode Heap / Priority Queue
- Design Twitter
Interview questions to prep
- Why is a heap the right structure? Could a balanced BST or sorted list work — why is heap better?
- Explain the heap-of-k pattern: keep size k, push new, pop if over k. What's the resulting complexity?
- What does the comparator look like, and how would you tweak it to flip min/max behaviour?
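A minimal sketch of the heap-of-k pattern from the questions above, using Python's `heapq` (a min-heap only): keep the heap at size k, push each new item, pop when it exceeds k, for O(n log k) time and O(k) space. The max-heap variant is simulated by negating keys, which is also the answer to the comparator-flip question.

```python
import heapq

def top_k_largest(stream, k):
    """Min-heap of size k: the root is the k-th largest seen so far.
    Push each item; pop the smallest whenever the heap exceeds k."""
    heap = []
    for x in stream:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted(heap, reverse=True)

def top_k_smallest(stream, k):
    """heapq has no max-heap, so negate keys to flip the comparison
    (for rich objects, push (-priority, item) tuples instead)."""
    heap = []
    for x in stream:
        heapq.heappush(heap, -x)   # negation simulates a max-heap
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted(-v for v in heap)
```

A sorted list gives O(n) insertion; a balanced BST matches the O(log k) bound but with worse constants and no flat-array locality, which is the usual interview answer for "why a heap".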
ML · Regularization (L1, L2, ElasticNet)
Interview questions to prep
- Why does L1 produce sparse solutions while L2 doesn't? Show the geometric picture.
- When do you use ElasticNet over L1 or L2 alone?
- Why is early stopping equivalent to L2 regularization in some cases?
- How would you choose the early-stopping patience and what happens when it's too small or too large?
- Show that L2 regularization corresponds to a Gaussian prior on weights and L1 to a Laplace prior.
- When does the Bayesian framing actually change a modeling decision in practice?
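A minimal numpy sketch of the geometric picture behind the sparsity question: for the one-dimensional problem min_w ½(w − a)² + penalty, ridge (penalty ½λw²) has the closed form a/(1+λ), which shrinks but never reaches zero, while lasso (penalty λ|w|) soft-thresholds, producing exact zeros. Function names are mine, not from any library.

```python
import numpy as np

def ridge_shrink(a, lam):
    """Closed-form minimizer of (w - a)^2/2 + lam*w^2/2:
    uniform shrinkage, every coefficient stays nonzero."""
    return a / (1.0 + lam)

def soft_threshold(a, lam):
    """Closed-form minimizer of (w - a)^2/2 + lam*|w|:
    coefficients with |a| <= lam are set exactly to zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

w_ls = np.array([3.0, 0.4, -0.2, -2.5])  # unregularized estimates
lam = 0.5
# ridge_shrink keeps all four nonzero; soft_threshold zeroes the two small ones
```

This is the same contrast as the L1-ball-corners-vs-L2-sphere picture: the kink of |w| at zero makes zero an attainable optimum.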
ML · Feature engineering
Interview questions to prep
- When do tree models NOT need feature scaling, and when does scale still matter (e.g. gradient-boosting libraries whose regularization penalties are scale-sensitive)?
- When would you apply a log transform vs Box-Cox?
- Compare one-hot, target, frequency, and hashing encoders — trade-offs in cardinality and leakage.
- Why is target encoding leak-prone and how does k-fold target encoding fix it?
- When is mean/median imputation harmful?
- Why do tree models often handle missing values natively while linear models cannot?
- How do you detect and handle outliers using box plots, robust scaling, winsorization, or model choice?
- Why does multicollinearity destabilize linear models, and why are tree models less sensitive?
- How would you distinguish true data drift from a one-off outlier spike in production?
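A sketch of the k-fold target-encoding fix asked about above, assuming nothing beyond numpy (function and parameter names are illustrative): each row is encoded with the target mean of its category computed on the *other* folds only, so no row ever sees its own label, which is what plugs the leak in naive target encoding.

```python
import numpy as np

def kfold_target_encode(categories, y, n_splits=5, seed=0):
    """Leak-resistant target encoding: a row's encoding is its category's
    target mean computed on the other folds; categories unseen in those
    folds fall back to the global mean."""
    categories = np.asarray(categories)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_splits, size=len(y))  # random fold per row
    encoded = np.full(len(y), y.mean())             # global-mean fallback
    for f in range(n_splits):
        train = folds != f                          # all rows outside fold f
        for cat in np.unique(categories):
            mask = train & (categories == cat)
            if mask.any():
                encoded[(folds == f) & (categories == cat)] = y[mask].mean()
    return encoded
```

At inference time you would instead encode with per-category means from the full training set, since test rows contribute no labels.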
ML · Semi-supervised and proxy labels
Interview questions to prep
- When would you use pseudo-labeling, weak supervision, or proxy labels instead of waiting for fully supervised labels?
- How do proxy labels introduce bias, and how would you validate that they predict the true product target?
- What safeguards prevent a semi-supervised training loop from reinforcing its own mistakes?
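A sketch of confidence-thresholded pseudo-labeling with the safeguards the last question asks about (all names here are illustrative; `model_fit`/`model_proba` stand in for whatever training and scoring calls you use): only adopt predictions above a confidence threshold, cap the number of self-training rounds, and never overwrite the human-labeled set.

```python
import numpy as np

def self_train(model_fit, model_proba, X_lab, y_lab, X_unlab,
               threshold=0.95, max_rounds=3):
    """Pseudo-labeling loop. Safeguards against reinforcing its own
    mistakes: (1) adopt only predictions with confidence >= threshold,
    (2) bound the number of rounds, (3) keep the labeled set intact."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(max_rounds):
        model = model_fit(X, y)
        if len(pool) == 0:
            break
        proba = model_proba(model, pool)      # shape (n_pool, n_classes)
        conf = proba.max(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break                             # nothing confident left: stop
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, proba[keep].argmax(axis=1)])
        pool = pool[~keep]
    return model_fit(X, y)                    # final refit on labeled + adopted
```

In practice you would add a held-out *human-labeled* validation set and abort if its metric degrades between rounds, which is the strongest guard against a feedback loop.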