Day 34 of 133
Trad-ML pitfall drills (leakage, imbalance, calibration)
The three failure modes that show up in nearly every ML interview.
DSA · NeetCode Binary Search
- Time Based Key Value Store
Interview questions to prep
- State your loop invariant precisely — what must be true on every iteration?
- Why does the loop terminate, and how do you avoid infinite loops on the search-space update?
- Walk through edge cases: empty array, target smaller than min, target larger than max, duplicates.
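The questions above can be answered concretely against a minimal reference implementation. A sketch of classic binary search, with the invariant, termination argument, and edge cases from the list annotated inline:

```python
def search(nums, target):
    """Binary search over a sorted list.
    Invariant: if target is in nums, it lies in nums[lo..hi] (inclusive)
    at the top of every iteration."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:                # terminates: hi - lo strictly shrinks each pass
        mid = (lo + hi) // 2       # Python ints can't overflow; in C write lo + (hi - lo) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1           # mid itself is excluded, so the space shrinks
        else:
            hi = mid - 1
    return -1                      # search space empty: target absent

# Edge cases from the questions above:
assert search([], 5) == -1                  # empty array
assert search([2, 4, 6], 1) == -1           # target below min
assert search([2, 4, 6], 9) == -1           # target above max
assert search([1, 3, 3, 5], 3) in (1, 2)    # duplicates: any matching index is valid
```

Note the `lo = mid + 1` / `hi = mid - 1` updates: moving `lo = mid` without the `+ 1` is the standard way these loops go infinite when `lo` and `hi` are adjacent.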
ML · Cross-validation & evaluation
Interview questions to prep
- When does k-fold leak data, and what does TimeSeriesSplit do differently?
- Why is stratified k-fold important for imbalanced classification?
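A quick way to internalize the k-fold vs `TimeSeriesSplit` difference is to print the index sets each one produces on time-ordered rows. A minimal sketch using scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # rows assumed ordered in time

# Shuffled KFold mixes past and future: a training fold can contain rows
# that come *after* the test fold, letting the model peek ahead.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
mixed = any(tr.max() > te.min() for tr, te in kf.split(X))

# TimeSeriesSplit only ever trains on indices strictly earlier than the
# test fold, so temporal order is respected.
tss = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tss.split(X):
    assert train_idx.max() < test_idx.min()  # train strictly precedes test
    print(train_idx, test_idx)
```

The same `split(X, y)` interface is how `StratifiedKFold` fits in for imbalanced classification: it preserves the class ratio in every fold, so a rare class doesn't vanish from some test folds entirely.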
Interview questions to prep
- Why do you need both a validation and a test set for hyperparameter tuning?
- What is nested cross-validation and when is it worth the cost?
- How would your split strategy change for time-series forecasting vs random tabular rows?
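Nested cross-validation is easier to reason about when you see that it's just a `GridSearchCV` (inner loop) passed to `cross_val_score` (outer loop). A minimal sketch on synthetic data, with illustrative hyperparameter values:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Inner loop tunes C on each outer training fold; the outer loop then
# scores the *tuned* estimator on data the tuning never saw, giving an
# unbiased generalization estimate. Cost: inner_folds x outer_folds fits.
inner = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # illustrative grid
    cv=3,
)
scores = cross_val_score(inner, X, y, cv=5)  # outer loop
print(scores.mean())
```

Reporting the inner loop's best CV score instead of the outer scores is the classic mistake: that number was used for selection, so it's optimistically biased.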
Interview questions to prep
- Walk through three common ways data leakage sneaks into an ML pipeline.
- How would you build a pipeline that prevents leakage when scaling features?
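For the scaling question, the leak-free answer is to put the scaler inside a `Pipeline` so it is refit on each training fold only. A minimal sketch of the wrong and right versions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# Leaky: fitting the scaler on ALL rows bakes test-fold means and
# variances into the training data before the split ever happens.
#   X_scaled = StandardScaler().fit_transform(X)
#   cross_val_score(LogisticRegression(), X_scaled, y, cv=5)

# Leak-free: cross_val_score refits the whole pipeline per fold, so the
# scaler only ever sees that fold's training rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
```

The same pattern covers the other classic leaks: imputers, target encoders, and feature selectors all belong inside the pipeline for exactly this reason.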
ML · Imbalanced classification
Interview questions to prep
- Compare random oversampling, undersampling, SMOTE, and class weighting — when does each help?
- Why can SMOTE leak when applied before cross-validation?
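The SMOTE-before-CV leak can be demonstrated without imbalanced-learn at all, using naive duplication as a stand-in (SMOTE interpolates between minority neighbors rather than copying, but the fold-contamination mechanism is the same): oversample first, then split, and copies of test rows end up in the training fold.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X_min = rng.normal(size=(5, 2))        # 5 minority-class samples
X_over = np.vstack([X_min, X_min])     # naive oversampling: exact copies

# Oversampling *before* the split means a duplicate of a test row can sit
# in the training fold -- the model has effectively seen its test data.
leaked = 0
for tr, te in KFold(n_splits=2, shuffle=True, random_state=0).split(X_over):
    train_rows = {tuple(X_over[i]) for i in tr}
    leaked += sum(tuple(X_over[i]) in train_rows for i in te)
print(leaked)  # > 0: some test rows have identical twins in train
```

The fix is to oversample inside each training fold only (imbalanced-learn's `Pipeline` exists for this), so the test fold stays untouched real data.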
Interview questions to prep
- Why does focal loss help with extreme imbalance in object detection?
- Compare focal loss vs class-weighted cross-entropy — when does focal actually win?
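The comparison becomes concrete once both losses are written out. A minimal NumPy sketch of binary focal loss (Lin et al., with the paper's default α=0.25, γ=2) against class-weighted cross-entropy:

```python
import numpy as np

def weighted_ce(p, y, alpha=0.25):
    """Class-weighted binary cross-entropy: alpha weights the positive class."""
    w = np.where(y == 1, alpha, 1 - alpha)
    return -w * (y * np.log(p) + (1 - y) * np.log(1 - p))

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss: scales weighted CE by (1 - p_t)^gamma, so confident,
    already-correct predictions contribute almost nothing to the loss."""
    p_t = np.where(y == 1, p, 1 - p)       # probability of the true class
    return (1 - p_t) ** gamma * weighted_ce(p, y, alpha)

# Easy negative (p=0.01 for class 0): modulating factor (1-0.99)^2 = 1e-4.
# Hard positive (p=0.1 for class 1): factor (1-0.1)^2 = 0.81.
easy = focal_loss(np.array([0.01]), np.array([0])) / weighted_ce(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.1]), np.array([1])) / weighted_ce(np.array([0.1]), np.array([1]))
assert easy < hard  # easy examples are down-weighted far more aggressively
```

That's the interview answer in one line: class weights rescale per *class*, focal loss rescales per *example* by difficulty, which is why it wins when easy negatives vastly outnumber everything else (dense object detection).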
Interview questions to prep
- Why is accuracy a terrible metric for imbalanced classification?
- When do you reach for F1, F-beta, MCC, or PR-AUC?
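The accuracy failure is worth demonstrating once so it sticks. A sketch with a degenerate always-negative classifier on a 1%-positive dataset:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# 1% positive class; the "classifier" predicts negative for everything.
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros(1000, dtype=int)

print(accuracy_score(y_true, y_pred))    # 0.99 -- looks excellent
print(f1_score(y_true, y_pred))          # 0.0  -- zero recall exposed
print(matthews_corrcoef(y_true, y_pred)) # 0.0  -- no correlation at all
```

F1 (or F-beta when recall matters more than precision), MCC, and PR-AUC all account for the minority class directly, which is why a constant classifier scores zero on them while scoring 99% accuracy.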
References & further reading
- scikit-learn user guide
- Eugene Yan — applied ML writing