Day 89 of 133
Guardrails & safety: input/output filters, jailbreaks, red-teaming
Defense-in-depth; classify the major jailbreak categories.
DSA · NeetCode Greedy
- Valid Parenthesis StringDSA · Greedy
Interview questions to prep
- Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
- When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
- Walk through edge cases that often break naive greedy: ties, negatives, single element.
GenAI · Guardrails & safety
Interview questions to prep
- Compare input vs output filtering — what does each catch and miss?
- What latency does an output-moderation step add, and how do you keep it under your SLO?
Interview questions to prep
- Walk through three categories of jailbreak attacks, and how you'd defend against each.
- Why is many-shot jailbreaking so effective on long-context models, and what mitigates it?
Interview questions to prep
- How would you build a red-teaming process for a customer-facing LLM?
- How would you scale red-teaming with automated attackers without missing novel attack vectors?
References & further reading