Day 11 of 133

Calculus & gradients + DSA Bit Manipulation kickoff

Chain rule, partial derivatives, Jacobians, Hessians — what backprop will use.

DSA · NeetCode Bit Manipulation

  • Single Number — DSA · Bit Manipulation

    Interview questions to prep

    1. Why does XOR-all give the unique element when others appear twice?
    2. How does this generalize when others appear three times (LC 137)?
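Both questions above can be answered in a few lines. For LC 136, XOR is associative and commutative with `x ^ x == 0`, so every pair cancels and only the unique element survives. For the LC 137 variant, XOR alone is not enough; a common generalization (sketched below, not the only approach) counts set bits per position modulo 3:

```python
def single_number(nums):
    # LC 136: XOR-all. Pairs cancel because x ^ x == 0 and 0 ^ x == x.
    acc = 0
    for n in nums:
        acc ^= n
    return acc

def single_number_ii(nums):
    # LC 137 sketch: every other element appears three times, so each bit
    # position's count is a multiple of 3 except where the unique element
    # contributes a bit. Sum counts per position mod 3.
    result = 0
    for bit in range(32):
        count = sum((n >> bit) & 1 for n in nums)
        if count % 3:
            result |= 1 << bit
    # Python ints are unbounded, so reinterpret bit 31 as a sign bit
    # to recover negative answers.
    if result >= 1 << 31:
        result -= 1 << 32
    return result
```

The mod-3 trick is the honest answer to "how does this generalize": XOR is really per-bit addition mod 2, and appearing-three-times just changes the modulus to 3.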
  • Number of 1 Bits — DSA · Bit Manipulation

    Interview questions to prep

    1. Walk me through the bit trick used here, bit by bit on a small input.
    2. Why XOR / AND / shift specifically — what property of that operation does the problem exploit?
    3. What's the complexity in terms of bits (often O(32) → O(1)), and where could that break for big-int?
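For the walkthrough question, the standard trick here is Brian Kernighan's `n & (n - 1)`, which clears the lowest set bit each iteration. On `n = 0b1011`: `1011 → 1010 → 1000 → 0000`, three iterations for three set bits. A minimal sketch:

```python
def hamming_weight(n: int) -> int:
    # n & (n - 1) clears the lowest set bit: subtracting 1 flips that bit
    # to 0 and all bits below it to 1, and the AND zeroes the whole tail.
    # The loop runs once per SET bit, not once per bit position.
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count
```

Complexity: O(k) for k set bits, bounded by O(32) = O(1) for fixed-width ints. In Python, ints are arbitrary precision, so for a big-int the bound becomes O(number of set bits) with each mask operation itself costing O(word count) — that's where the "O(1)" claim breaks.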
  • Counting Bits — DSA · Bit Manipulation

    Interview questions to prep

    1. Walk me through the bit trick used here, bit by bit on a small input.
    2. Why XOR / AND / shift specifically — what property of that operation does the problem exploit?
    3. What's the complexity in terms of bits (often O(32) → O(1)), and where could that break for big-int?
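For Counting Bits, the relevant trick is the DP recurrence `bits[i] = bits[i >> 1] + (i & 1)`: dropping the last bit of `i` gives a smaller, already-solved subproblem, and `i & 1` adds that last bit back. A minimal sketch:

```python
def count_bits(n: int) -> list[int]:
    # bits[i >> 1] is the popcount of i with its last bit removed;
    # (i & 1) is the last bit itself. O(n) total vs. O(n log n) for
    # counting each number's bits from scratch.
    bits = [0] * (n + 1)
    for i in range(1, n + 1):
        bits[i] = bits[i >> 1] + (i & 1)
    return bits
```

This is a good contrast with Number of 1 Bits in an interview: same quantity, but amortizing across all inputs 0..n changes the right tool from a per-number bit trick to a recurrence.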

Math · Calculus & gradients

  • Interview questions to prep

    1. Derive the chain rule for f(g(x)) and apply it to a 2-layer neural network.
    2. What's the difference between a partial derivative and a directional derivative?
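A concrete way to internalize question 2: a partial derivative perturbs one coordinate with the others held fixed, while a directional derivative perturbs along an arbitrary unit vector u and equals grad f · u. A quick numerical sketch (the helper names `partial` and `directional` are my own, not from any library):

```python
import math

def f(x, y):
    # Example function: f(x, y) = x^2 * y, so df/dx = 2xy, df/dy = x^2.
    return x * x * y

def partial(fn, point, idx, h=1e-6):
    # Central difference along a single coordinate axis.
    p_plus = list(point); p_plus[idx] += h
    p_minus = list(point); p_minus[idx] -= h
    return (fn(*p_plus) - fn(*p_minus)) / (2 * h)

def directional(fn, point, u, h=1e-6):
    # Central difference along the (normalized) direction u;
    # should match grad(f) . u.
    norm = math.hypot(*u)
    u = [c / norm for c in u]
    p_plus = [p + h * c for p, c in zip(point, u)]
    p_minus = [p - h * c for p, c in zip(point, u)]
    return (fn(*p_plus) - fn(*p_minus)) / (2 * h)
```

At (1, 2): the partials are 4 and 1, and the derivative along (1, 1)/sqrt(2) is (4 + 1)/sqrt(2), which is exactly the dot-product identity the interview question is probing.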
  • Gradients, Jacobians, Hessians — Khan Academy

    Interview questions to prep

    1. What does the gradient vector represent geometrically?
    2. When would you need the full Jacobian or Hessian in ML?
    3. Why do second-order methods (Newton) rarely scale to deep nets?
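Question 3 becomes obvious once you write the Hessian down: it's a d×d matrix of second derivatives, so for a deep net with millions of parameters even storing it is infeasible, let alone inverting it as Newton's method requires. A finite-difference sketch for tiny d (my own helper, illustration only):

```python
def hessian(fn, x, h=1e-3):
    # Finite-difference Hessian: d*d entries, each from 4 function
    # evaluations. The O(d^2) storage / O(d^3) inversion cost is exactly
    # why full Newton steps don't scale to deep nets.
    d = len(x)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (fn(xpp) - fn(xpm) - fn(xmp) + fn(xmm)) / (4 * h * h)
    return H
```

For a quadratic like x0² + 3·x0·x1 + 2·x1², this recovers the constant Hessian [[2, 3], [3, 4]] — a handy sanity check, since the gradient of a quadratic is linear and its Hessian is the (symmetrized) coefficient matrix.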
  • Interview questions to prep

    1. Explain backpropagation as the chain rule applied to a computation graph.
    2. Why do vanishing gradients happen, and how do ReLU / residual connections help?
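Both backprop questions can be answered on a one-neuron computation graph: forward is x → a = w1·x → h = relu(a) → y = w2·h, and backward is the chain rule applied node by node. A minimal hand-rolled sketch (no autograd framework):

```python
def forward_backward(x, w1, w2):
    # Forward pass through the graph x -> a -> h -> y.
    a = w1 * x
    h = max(0.0, a)          # ReLU
    y = w2 * h
    # Backward pass: start from dy/dy = 1 and apply the chain rule
    # through each node in reverse.
    dh = w2                            # dy/dh
    da = dh * (1.0 if a > 0 else 0.0)  # ReLU gate: gradient is 1 or 0
    dw1 = da * x                       # dy/dw1 = dy/da * da/dw1
    dw2 = h                            # dy/dw2 = h
    return y, dw1, dw2
```

The ReLU line also answers the vanishing-gradient question: its local derivative is exactly 1 on the active side, so deep products of these factors don't shrink the way chains of sigmoid derivatives (at most 0.25) do; residual connections help for the same reason, adding an identity path whose local derivative is 1.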

References & further reading