Build reliable GenAI products

Generative AI and LLM Engineering

Prepare for LLM fundamentals, RAG, evaluation, agents, and the practical systems questions now common in AI engineering interview loops.

Featured topics

4 topic cards built for interview prep

Each topic includes a summary, practical learning goals, representative interview prompts, and a suggested roadmap day.

Practice prompts

Daily-plan topics tied directly to this pillar

These are pulled from the same 133-day roadmap content used by Browse Questions.

Day 71 · GenAI · LLM foundations

Pretraining: data, scaling laws (Chinchilla)

  • Walk me through Chinchilla scaling laws — what's the tokens-to-parameters ratio?
  • Why has 'compute-optimal' training overtaken 'parameter-optimal' as the design target?
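A useful anchor when answering the ratio question is the rough Chinchilla rule of thumb of ~20 training tokens per parameter (the exact figure depends on the fitted loss curves in Hoffmann et al. 2022). A minimal sketch, with the ratio as an explicit assumption:

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb compute-optimal token budget: ~20 tokens per parameter.

    The 20x multiplier is an approximation of the Chinchilla result,
    not an exact constant; the paper fits it empirically.
    """
    return tokens_per_param * n_params

# A 70B-parameter model is compute-optimal at roughly 1.4T training tokens.
print(f"{chinchilla_optimal_tokens(70e9):.2e}")  # -> 1.40e+12
```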
Day 71 · GenAI · LLM foundations

Build a GPT dataset for next-token prediction

  • How do you turn raw text into input-target pairs for GPT next-token prediction?
  • What are block size, context window, and stride in a language-model dataset?
Day 71 · GenAI · LLM foundations

Decoder-only transformer for LLMs

  • Walk me through one forward pass of a decoder-only LLM at inference time.
  • What is the KV cache and why is it so important?
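The KV cache question comes down to this: at each decode step only the new token's key and value are computed and appended, so attention at step t costs O(t·d) instead of recomputing every past key/value. A toy single-head sketch (pure Python, no projection matrices, vectors hand-picked for illustration):

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

# Decode loop: each step appends exactly one K and one V to the cache,
# then attends over everything cached so far.
k_cache, v_cache = [], []
for q, k, v in [([1.0, 0.0], [1.0, 0.0], [2.0, 0.0]),
                ([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])]:
    k_cache.append(k)
    v_cache.append(v)
    out = attend(q, k_cache, v_cache)
print([round(x, 3) for x in out])
```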
Day 71 · GenAI · LLM foundations

Emergent abilities & in-context learning

  • Define 'emergent abilities' in LLMs — and why some researchers say they're a measurement artifact.
  • What does the 'mirage' paper claim, and how does the choice of metric drive apparent emergence?
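A quick way to motivate the 'mirage' argument in an answer: if per-token accuracy p improves smoothly with scale, an all-or-nothing metric like exact match over an n-token answer behaves roughly like p**n, which looks like a sudden jump. A toy illustration of that argument (the independence assumption is a simplification):

```python
# Smooth per-token accuracy vs. the apparent "emergence" under exact match.
# Assumes token errors are independent, so exact match ~ p ** seq_len.
seq_len = 10
for p in [0.5, 0.7, 0.9, 0.95, 0.99]:
    exact = p ** seq_len
    print(f"per-token {p:.2f} -> exact-match {exact:.4f}")
```

Per-token accuracy climbing 0.5 → 0.99 is gradual, but exact match goes 0.001 → 0.90 over the same range, which is the metric-choice effect the paper highlights.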
Day 72 · GenAI · Tokenization deep-dive

BPE algorithm step-by-step

  • Walk through the BPE training algorithm.
  • Why does BPE result in different tokenizations for similar words across languages?
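The BPE training walkthrough can be demonstrated with a few lines: count adjacent symbol pairs across the corpus, merge the most frequent pair everywhere, repeat. A toy sketch (real tokenizers such as GPT-2's operate on bytes and add end-of-word handling; this only shows the merge loop):

```python
from collections import Counter

def bpe_train(corpus_words, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word is a tuple of symbols, weighted by its corpus frequency.
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # merge the pair
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(bpe_train(["low", "low", "lower", "lowest", "loss"], num_merges=2))
# -> [('l', 'o'), ('lo', 'w')]
```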
Day 72 · GenAI · Tokenization deep-dive

Vocabulary size trade-offs

  • Why is vocabulary size a critical design choice — what does increasing it cost?
  • How does vocab size affect throughput and memory of the embedding + LM-head layers?
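The cost side of the vocab-size trade-off is easy to quantify: the input embedding and the LM head are each a vocab_size × d_model matrix (one matrix if the weights are tied). A back-of-envelope sketch with illustrative numbers:

```python
def embedding_params(vocab_size, d_model, tied=False):
    """Parameter count of the input embedding plus the LM head.

    Without weight tying these are two separate vocab_size x d_model
    matrices; with tying they share one.
    """
    return vocab_size * d_model * (1 if tied else 2)

# Illustrative only: a 50k vocab at d_model=4096, untied.
print(f"{embedding_params(50_000, 4096) / 1e9:.2f}B params")  # -> 0.41B params
```

Doubling the vocab doubles this count and grows the softmax over the LM head, which is the throughput/memory cost the prompt asks about.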
Day 72 · GenAI · Tokenization deep-dive

Tokenization quirks: numbers, code, multilingual

  • Why do LLMs struggle with arithmetic, and how does tokenization contribute?
  • Why are non-Latin-script languages disproportionately expensive to serve, and how do you fix it?
Day 73 · GenAI · Decoding strategies

Greedy, beam, top-k, top-p, temperature

  • Compare greedy, beam, top-k, and nucleus (top-p) decoding.
  • Why is beam search usually a bad choice for open-ended generation?
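The decoding comparison is easiest to make concrete in code: temperature reshapes the distribution, top-k keeps the k most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches p. A minimal stdlib-only sketch (function name and logits are illustrative; greedy is the top_k=1 special case, and beam search, not shown, instead tracks several partial sequences):

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token id with temperature, then optional top-k / nucleus truncation."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # shift for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    z = sum(probs)
    probs = [p / z for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, cum = [], 0.0
        for i in ranked:  # smallest prefix whose mass reaches top_p
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        ranked = kept
    # Renormalize over the surviving tokens and sample.
    z = sum(probs[i] for i in ranked)
    r = random.random() * z
    for i in ranked:
        r -= probs[i]
        if r <= 0:
            return i
    return ranked[-1]

random.seed(0)
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next(logits, temperature=0.8, top_p=0.9))
```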