Day 98 of 133

Infra wrap + LLMOps consolidation

Cross-link distributed training + serving + cost questions.

DSA · NeetCode Greedy

  • Interview questions to prep

    1. Prove the greedy choice — why is the locally-optimal pick safe globally? (Exchange argument or staying-ahead.)
    2. When does greedy fail on a similar-looking problem, and what would you reach for instead (DP, BFS)?
    3. Walk through edge cases that often break naive greedy: ties, negatives, single element.
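
For the "when does greedy fail" question, one concrete case worth having ready is coin change with a non-canonical coin set. The sketch below is only an illustration: the function names `greedy_coins` and `dp_coins` and the coin set `[1, 3, 4]` are made up for this note, not taken from any NeetCode solution. It compares a largest-coin-first greedy against a bottom-up DP.

```python
def greedy_coins(coins, amount):
    """Largest-coin-first greedy. Returns the coin count, or None if it gets stuck."""
    count = 0
    for c in sorted(coins, reverse=True):
        take, amount = divmod(amount, c)  # use as many of this coin as fit
        count += take
    return count if amount == 0 else None


def dp_coins(coins, amount):
    """Bottom-up DP: best[a] is the fewest coins that sum to a."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] != INF else None


if __name__ == "__main__":
    # Greedy picks 4 + 1 + 1; the optimum is 3 + 3.
    print(greedy_coins([1, 3, 4], 6), dp_coins([1, 3, 4], 6))
```

With coins like 1, 5, 10, 25 the two agree, which is why the problem looks greedy-friendly; an exchange argument fails for `[1, 3, 4]` because swapping a 4 for a 3 can lead to a better completion.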

LLMOps · Caching, routing, cost

  • Interview questions to prep

    1. Compare exact-match prompt caching vs semantic caching — when does each fit?
    2. How would you measure semantic-cache safety — what's the false-hit failure mode?
  • Interview questions to prep

    1. How would you route requests across GPT-5, Claude 4.5, and a small open-source model?
    2. Walk through how a verifier model gates the cheap-model output before falling back to the expensive one.
  • Interview questions to prep

    1. What does vLLM's PagedAttention do for throughput?
    2. Compare vLLM vs TensorRT-LLM vs SGLang.
  • Interview questions to prep

    1. How would you diagnose high time-to-first-token (TTFT) latency vs low tokens-per-second decode throughput?
    2. How do rate limits, concurrency limits, queues, and retries interact in an LLM API gateway?
    3. What metrics tell you whether the bottleneck is prompt length, model compute, KV cache pressure, or downstream tools?
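
For the latency questions above, it can help to show how the two numbers are separated for a single streamed response. This is a minimal sketch, assuming you can log a send timestamp and a per-token arrival timestamp on the client; `latency_breakdown` and its field names are hypothetical, not part of any serving framework.

```python
def latency_breakdown(request_sent, token_times):
    """Split one streamed response into time-to-first-token and decode rate.

    request_sent: timestamp (seconds) when the request was sent.
    token_times:  timestamps (seconds) at which each output token arrived.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_sent            # queueing + prefill
    decode_time = token_times[-1] - token_times[0]  # time spent after the first token
    n_decoded = len(token_times) - 1                # tokens produced during decode_time
    tps = n_decoded / decode_time if decode_time > 0 else None
    return {"ttft_s": ttft, "decode_tokens_per_s": tps}


if __name__ == "__main__":
    print(latency_breakdown(0.0, [2.0, 2.5, 3.0, 3.5, 4.0]))
```

As a rough reading guide rather than a rule: high TTFT with a healthy decode rate usually points at queueing or long-prompt prefill, while a low decode rate with normal TTFT more often points at batch contention or KV cache pressure during generation.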

MLOps · Cost & scaling

  • Interview questions to prep

    1. How would you model the unit cost of a prediction in production?
    2. What levers reduce inference cost (batching, quantization, caching, distillation)?
  • Interview questions to prep

    1. Compare CPU-based HPA vs queue-based KEDA scaling for ML inference.
    2. Why does GPU-pinned inference often defeat HPA, and how do you actually scale GPU pods?
  • Interview questions to prep

    1. How would you train safely on spot instances (checkpointing, retries)?
    2. When does spot training become net more expensive than on-demand, and what is the breakeven?
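
One way to frame the spot breakeven question is a first-order cost model. Everything here is an assumption made for illustration: interruptions arrive at a constant rate, each one loses on average half a checkpoint interval plus a fixed restart time, and the cost of writing checkpoints and of finishing later is ignored. The function names and the example prices are hypothetical.

```python
def spot_overhead(interruptions_per_hour, checkpoint_interval_h, restart_h):
    """Extra wall-clock per useful hour, as a fraction (first-order estimate).

    Assumes each interruption loses half a checkpoint interval on average,
    plus a fixed restart time.
    """
    lost_per_interruption = checkpoint_interval_h / 2 + restart_h
    return interruptions_per_hour * lost_per_interruption


def job_costs(useful_hours, on_demand_price, spot_price, overhead):
    """Return (on_demand_cost, spot_cost) for a job needing `useful_hours` of compute."""
    on_demand = useful_hours * on_demand_price
    spot = useful_hours * (1 + overhead) * spot_price
    return on_demand, spot


def breakeven_overhead(on_demand_price, spot_price):
    """Overhead fraction at which spot and on-demand cost the same."""
    return on_demand_price / spot_price - 1


if __name__ == "__main__":
    ov = spot_overhead(0.1, checkpoint_interval_h=1.0, restart_h=0.5)
    print(ov, job_costs(100, 3.0, 1.0, ov), breakeven_overhead(3.0, 1.0))
```

Under this model spot stays cheaper as long as the overhead fraction is below `on_demand_price / spot_price - 1`; frequent interruptions, sparse checkpoints, or slow restarts push the overhead toward that line. In an interview it is worth adding that deadline risk and engineer time are real costs the formula leaves out.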

References & further reading