Day 33 of 133
Trad-ML breadth review (regression, trees, ensembles, clustering)
60-min self-quiz on weeks 3-4. Note the questions you fumble.
DSA · NeetCode Stack
- Daily Temperatures
Interview questions to prep
- What's the monotonic-stack invariant, and how does each pop give you the answer?
- Why is the total work O(n) when the inner loop looks O(n²)?
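To rehearse both questions, a minimal Python sketch of the standard decreasing-stack solution (function name and sample input are illustrative, not from the source):

```python
def daily_temperatures(temps: list[int]) -> list[int]:
    """For each day, days until a strictly warmer temperature (0 if none)."""
    answer = [0] * len(temps)
    stack = []  # indices awaiting a warmer day; temps[stack] is strictly decreasing
    for i, t in enumerate(temps):
        # Invariant: the stack holds a decreasing run of temperatures.
        # Every index popped here has just met its first warmer day, i.
        while stack and temps[stack[-1]] < t:
            j = stack.pop()
            answer[j] = i - j
        stack.append(i)
    return answer

# Each index is pushed once and popped at most once, so total work is O(n)
# even though the while loop looks nested.
print(daily_temperatures([73, 74, 75, 71, 69, 72, 76, 73]))
# -> [1, 1, 4, 2, 1, 1, 0, 0]
```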
ML · Linear regression
Interview questions to prep
- Derive the OLS solution θ = (XᵀX)⁻¹Xᵀy. When is XᵀX not invertible?
- Show that OLS = MLE under Gaussian noise.
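To rehearse the derivation numerically, a short NumPy sketch of the closed-form solve on synthetic data (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])  # bias column
theta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ theta_true + rng.normal(scale=0.1, size=100)  # Gaussian noise -> OLS = MLE

# Normal equations: theta = (XᵀX)⁻¹ Xᵀ y. Solve rather than invert, for stability.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# XᵀX is singular when columns of X are linearly dependent (e.g. a full set of
# one-hot columns plus an intercept, or n < d). lstsq falls back to the
# pseudoinverse and returns the minimum-norm solution in that case.
theta_pinv, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_hat, theta_pinv))  # True for full-rank X
```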
Interview questions to prep
- Walk through the four classical assumptions of linear regression and how to diagnose violations.
- What's heteroscedasticity and how do you fix it?
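One practical diagnosis, sketched with statsmodels' Breusch-Pagan test on deliberately heteroscedastic synthetic data (variable names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)  # noise variance grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Breusch-Pagan regresses squared residuals on the regressors;
# a small p-value rejects constant error variance.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")  # near 0 here

# Common fixes: transform y (e.g. log), weighted least squares, or
# heteroscedasticity-robust standard errors:
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")
```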
Interview questions to prep
- Compare MSE, MAE, and Huber loss — what do you use when outliers matter?
- Why is Huber loss differentiable and robust at the same time?
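A small NumPy comparison of the three losses on a residual vector with one outlier (a sketch; delta = 1.0 is an arbitrary choice):

```python
import numpy as np

def mse(r):
    return np.mean(r ** 2)

def mae(r):
    return np.mean(np.abs(r))

def huber(r, delta=1.0):
    # Quadratic near zero (smooth, like MSE), linear in the tails (robust, like MAE).
    small = np.abs(r) <= delta
    return np.mean(np.where(small,
                            0.5 * r ** 2,
                            delta * (np.abs(r) - 0.5 * delta)))

r = np.array([0.1, -0.2, 0.05, 8.0])  # one outlier residual
print(mse(r), mae(r), huber(r))
# MSE is dominated by the outlier (~64/4); Huber caps its influence.
# The two Huber branches match in value and slope at |r| = delta,
# which is why it is differentiable everywhere yet still robust.
```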
Interview questions to prep
- Implement a vectorized linear regression forward pass for X @ w + b and state the expected tensor shapes.
- Implement one gradient-descent training step for linear regression and explain loss vs cost vs prediction error.
- Does sklearn's LinearRegression use gradient descent or ordinary least squares? Why does that matter in an interview?
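A sketch covering the two coding questions (shapes and names are illustrative). On the third question: sklearn's LinearRegression solves the least-squares problem directly with an SVD-based solver, not gradient descent; SGDRegressor is the iterative counterpart.

```python
import numpy as np

def forward(X, w, b):
    """X: (n, d), w: (d,), b: scalar -> predictions of shape (n,)."""
    return X @ w + b

def gd_step(X, y, w, b, lr=0.01):
    """One gradient-descent step on the MSE cost (mean loss over the batch)."""
    n = X.shape[0]
    err = forward(X, w, b) - y        # prediction error, shape (n,)
    grad_w = (2.0 / n) * X.T @ err    # dC/dw, shape (d,)
    grad_b = (2.0 / n) * err.sum()    # dC/db, scalar
    return w - lr * grad_w, b - lr * grad_b

rng = np.random.default_rng(2)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
w, b = np.zeros(3), 0.0
for _ in range(500):
    w, b = gd_step(X, y, w, b, lr=0.1)
print(w, b)  # converges toward [1, -2, 0.5] and 0.3
```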
ML · Gradient boosting (XGBoost, LightGBM, CatBoost)
Interview questions to prep
- Walk me through how GBM fits each new tree on the negative gradient of the loss.
- Why is GBM more sensitive to learning rate than Random Forest?
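A toy sketch of the boosting loop for squared error, where the negative gradient is simply the residual (hyperparameters are arbitrary). Because every tree is fit sequentially to the current gradient and scaled by the learning rate, shrinkage trades off directly against the number of trees; Random Forest trees are independent, so no such coupling exists.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

# For L = 0.5 * (y - F)^2 the negative gradient w.r.t. F is y - F,
# so each new tree fits the residuals of the current ensemble.
lr = 0.1                      # shrinkage on each tree's contribution
F = np.full(300, y.mean())    # initial constant prediction
trees = []
for _ in range(100):
    residuals = y - F         # negative gradient at the current predictions
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += lr * tree.predict(X)
    trees.append(tree)

print(np.mean((y - F) ** 2))  # training MSE shrinks as trees accumulate
```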
Interview questions to prep
- What does XGBoost do differently from vanilla GBM that made it dominate Kaggle?
- How does XGBoost handle missing values automatically?
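A minimal sketch assuming the xgboost package, showing that training proceeds with NaNs left in place (data and hyperparameters are illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=500)
X[rng.random(X.shape) < 0.2] = np.nan  # 20% missing at random

# No imputation needed: at each split XGBoost learns a default branch for
# missing values by trying both directions and keeping the one that
# improves the training loss more.
model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X, y)
print(model.predict(X[:5]))
```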
Interview questions to prep
- When would you pick LightGBM over XGBoost?
- What problem does CatBoost's ordered boosting solve?
References & further reading
- StatQuest — Statistics & ML playlists (YouTube)
- scikit-learn user guide (scikit-learn.org)