Day 104 of 133

RLHF & RLAIF deep dive

Reward modeling pipeline; Constitutional AI; reward hacking detection.

DSA · NeetCode Tries

Design Add And Search Words Data Structure · DSA · Tries
Interview questions to prep
1. Why a trie over a hash map — what queries does the trie make cheaper?
2. What's the time/space trade-off vs storing all suffixes?
3. How would you support deletion or wildcard matching efficiently?
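
As a reference point for the questions above, here is a minimal sketch of a trie that supports adding words and searching with a `.` wildcard that matches any single character. The class and method names (`TrieNode`, `WordDictionary`, `add_word`, `search`) are illustrative choices, not taken from any particular solution, and deletion is left out.

```python
class TrieNode:
    """One node per character; children keyed by the next character."""

    def __init__(self):
        self.children = {}
        self.is_word = False  # True if a stored word ends at this node


class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def add_word(self, word: str) -> None:
        # Walk down the trie, creating nodes for characters not seen yet.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, pattern: str) -> bool:
        # Depth-first search; '.' branches into every child at that position.
        def dfs(node: TrieNode, i: int) -> bool:
            if i == len(pattern):
                return node.is_word
            ch = pattern[i]
            if ch == ".":
                return any(dfs(child, i + 1) for child in node.children.values())
            child = node.children.get(ch)
            return child is not None and dfs(child, i + 1)

        return dfs(self.root, 0)
```

An exact-match search costs time proportional to the pattern length. Each `.` can fan out to every child of the current node, so a pattern made up mostly of wildcards may visit a large part of the trie. A plain hash set of words answers exact lookups just as cheaply but cannot prune wildcard or prefix queries this way.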

RLHF pipeline: SFT → RM → PPO · Generative AI · HF
Interview questions to prep
1. Walk through the three stages of RLHF end-to-end.
2. What goes wrong if your reward model is poorly calibrated?
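
For the reward-model (RM) stage, a commonly used objective is the pairwise Bradley-Terry loss, `-log(sigmoid(r_chosen - r_rejected))`, computed on a human-preferred response and a rejected one. The sketch below assumes scalar rewards are already available as plain floats; the function names are hypothetical, and a real pipeline would compute this on batched tensors in a deep learning framework.

```python
import math


def pairwise_rm_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss for one preference pair: -log(sigmoid(margin))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) equals softplus(-m); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (r_chosen, r_rejected) pairs."""
    return sum(pairwise_rm_loss(c, r) for c, r in pairs) / len(pairs)
```

The loss depends only on the difference between the two rewards, so it constrains the ranking of responses and not the absolute scale of the scores. That is one reason calibration is a separate concern from ranking accuracy.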

RLAIF, Constitutional AI, RLEF · Generative AI · Anthropic
Interview questions to prep
1. What is Constitutional AI, and how does it reduce reliance on human labels?
2. Where does RLAIF struggle — what kinds of judgments still need humans?

Reward hacking & alignment evals · Generative AI · Rafailov et al.
Interview questions to prep
1. Walk through how reward hacking shows up in practice and how you'd detect it.
2. How would you design an eval suite that catches sycophancy and excessive hedging in an RLHF'd model?
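
One signal often used when looking for reward hacking is divergence between the proxy reward being optimized and a held-out "gold" score, such as human ratings or a separate evaluator, across training checkpoints. The sketch below is a hypothetical helper under that assumption: it flags checkpoints where the proxy rose while the gold score fell. The function name, the `min_drop` threshold, and the numbers in the usage example are all illustrative, not measured results.

```python
def flag_divergence(proxy: list[float], gold: list[float],
                    min_drop: float = 0.0) -> list[int]:
    """Return checkpoint indices where the proxy reward increased from the
    previous checkpoint while the gold score dropped by more than min_drop."""
    if len(proxy) != len(gold):
        raise ValueError("proxy and gold must have one value per checkpoint")
    flagged = []
    for i in range(1, len(proxy)):
        proxy_rose = proxy[i] > proxy[i - 1]
        gold_fell = (gold[i - 1] - gold[i]) > min_drop
        if proxy_rose and gold_fell:
            flagged.append(i)
    return flagged


# Made-up scores for four checkpoints, for illustration only.
proxy_scores = [0.1, 0.4, 0.7, 0.9]
gold_scores = [0.2, 0.5, 0.45, 0.3]
suspect_checkpoints = flag_divergence(proxy_scores, gold_scores)
```

A flagged checkpoint is a prompt for manual inspection of samples, not proof of hacking on its own, since gold scores are noisy and can dip for unrelated reasons.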

References & further reading