Day 71 of 133

LLM foundations: pretraining, scaling laws, KV cache + DSA 2-D DP

Chinchilla scaling, decoder-only at inference, emergent abilities debate.

DSA · NeetCode 2-D DP

  • Interview questions to prep

    1. State the 2-D DP: indices, recurrence, base case. What's the order of fill?
    2. Can you reduce 2-D to 1-D by reusing rows or columns? Walk through the dependency direction (see the sketch after this list).
    3. Top-down with memoization vs bottom-up — which is easier to reason about, and which is faster in practice?
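
    A minimal sketch of questions 1 and 2, using Longest Common Subsequence as the example problem (my choice of example, not from the notes): the 2-D table with its recurrence, base case, and fill order, then the 1-D row reduction.

      def lcs_2d(a: str, b: str) -> int:
          # dp[i][j] = LCS length of a[:i] and b[:j]; base case: dp[0][*] = dp[*][0] = 0
          m, n = len(a), len(b)
          dp = [[0] * (n + 1) for _ in range(m + 1)]
          for i in range(1, m + 1):          # fill order: row by row, left to right
              for j in range(1, n + 1):
                  if a[i - 1] == b[j - 1]:
                      dp[i][j] = dp[i - 1][j - 1] + 1
                  else:
                      dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
          return dp[m][n]

      def lcs_1d(a: str, b: str) -> int:
          # Each cell depends only on the previous row and the cell to its left,
          # so one rolling row plus a scalar for the stale diagonal value suffices.
          n = len(b)
          row = [0] * (n + 1)
          for ch in a:
              prev_diag = 0                  # holds dp[i-1][j-1] before it is overwritten
              for j in range(1, n + 1):
                  tmp = row[j]               # this is still dp[i-1][j]
                  if ch == b[j - 1]:
                      row[j] = prev_diag + 1
                  else:
                      row[j] = max(row[j], row[j - 1])
                  prev_diag = tmp
          return row[n]

      assert lcs_2d("abcde", "ace") == lcs_1d("abcde", "ace") == 3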

GenAI · LLM foundations

  • Interview questions to prep

    1. Walk me through Chinchilla scaling laws — what's the data:parameters ratio? (See the back-of-envelope sketch after this list.)
    2. Why has 'compute-optimal' training overtaken 'parameter-optimal' as the design target?
    3. How would you turn noisy API or Wikipedia data into a pretraining corpus without contaminating evaluation?
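
    Back-of-envelope arithmetic for question 1, assuming the commonly quoted Chinchilla rules of thumb (roughly 20 training tokens per parameter, and training compute of about 6 * N * D FLOPs for a dense transformer); the exact coefficients vary across papers.

      def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
          # Compute-optimal token budget under the ~20 tokens/parameter heuristic.
          return tokens_per_param * n_params

      def train_flops(n_params: float, n_tokens: float) -> float:
          # Standard dense-transformer approximation: C ~ 6 * N * D.
          return 6.0 * n_params * n_tokens

      n = 70e9                                  # e.g. a 70B-parameter model
      d = chinchilla_optimal_tokens(n)          # ~1.4e12 tokens
      print(f"tokens ~ {d:.2e}, train FLOPs ~ {train_flops(n, d):.2e}")
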
  • Interview questions to prep

    1. How do you turn raw text into input-target pairs for GPT next-token prediction?
    2. What are block size, context window, and stride in a language-model dataset? (See the sketch after this list.)
    3. How do train/validation splits prevent contamination in LLM pretraining or fine-tuning?
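
    A minimal sketch of questions 1 and 2 (the function name is mine): slide a window of block_size tokens over a token stream with a given stride, and take the same window shifted by one token as the next-token targets.

      from typing import List, Tuple

      def make_pairs(token_ids: List[int], block_size: int, stride: int) -> List[Tuple[List[int], List[int]]]:
          # Each input chunk is block_size tokens; the target is the window shifted by one token.
          pairs = []
          for start in range(0, len(token_ids) - block_size, stride):
              x = token_ids[start : start + block_size]
              y = token_ids[start + 1 : start + block_size + 1]
              pairs.append((x, y))
          return pairs

      # Toy stream 0..9 with block_size=4, stride=4 gives non-overlapping windows:
      # [([0, 1, 2, 3], [1, 2, 3, 4]), ([4, 5, 6, 7], [5, 6, 7, 8])]
      print(make_pairs(list(range(10)), block_size=4, stride=4))
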
  • Decoder-only transformer for LLMs · Generative AI · Jay Alammar

    Interview questions to prep

    1. Walk me through one forward pass of a decoder-only LLM at inference time.
    2. What is the KV cache and why is it so important? (See the toy decode loop after this list.)
    3. Why do modern LLMs need billions of parameters — what capacity, memorization, and generalization trade-offs are involved?
    4. What changes architecturally or operationally when you move from a small language model to a frontier-scale LLM?
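
    A toy single-head attention decode loop with a KV cache, as a sketch for questions 1 and 2 (numpy only; a real decoder-only LLM adds multi-head attention, MLP blocks, layer norm, positional encoding, and many stacked layers). The point is that each step projects only the newest token and appends its key/value to the cache instead of recomputing the whole prefix.

      import numpy as np

      d = 8
      rng = np.random.default_rng(0)
      Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

      def attend(q, K, V):
          # Causal attention for one new query over all cached keys/values.
          scores = K @ q / np.sqrt(d)
          weights = np.exp(scores - scores.max())
          weights /= weights.sum()
          return weights @ V

      K_cache, V_cache = [], []                                # grows by one row per decoded token
      for step, x in enumerate(rng.standard_normal((5, d))):   # 5 toy token embeddings
          q, k, v = Wq @ x, Wk @ x, Wv @ x                     # only the newest token is projected
          K_cache.append(k)
          V_cache.append(v)
          out = attend(q, np.stack(K_cache), np.stack(V_cache))
          print(f"step {step}: attended over {len(K_cache)} cached positions, output dim {out.shape[0]}")
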
  • Interview questions to prep

    1. Define 'emergent abilities' in LLMs, and explain why some researchers say they're a measurement artifact.
    2. What does the 'mirage' paper claim, and how does the choice of metric drive apparent emergence? (See the numeric example after this list.)
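
    A tiny numeric illustration of the metric argument in question 2 (the accuracy values are made up): per-token accuracy that improves smoothly with scale makes exact match on a k-token answer, which behaves roughly like accuracy**k, look like an abrupt jump under the harsher metric.

      per_token_acc = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95]   # hypothetical values, increasing with scale
      k = 10                                                  # answer length in tokens
      for acc in per_token_acc:
          # Exact match needs all k tokens right, so it stays near zero
          # until per-token accuracy is already high.
          print(f"per-token {acc:.2f} -> exact-match ~ {acc ** k:.3f}")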

References & further reading