Deep Learning

Transformers From First Principles

Understand the transformer stack deeply enough to explain scaling, context handling, and attention trade-offs.

Recommended on day 30 · 120 minutes · Advanced

Learning objectives

  • Explain self-attention, positional encoding, and multi-head attention (see the sketch after this list)
  • Contrast encoder-decoder setups with decoder-only LLMs
  • Reason about context length, memory, and inference cost
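
A minimal NumPy sketch of the first two objectives, assuming toy shapes (8 tokens, model width 32, 4 heads) and random weights rather than trained ones; the `causal` flag is the masking that distinguishes decoder-only LLMs from a full encoder:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    i = np.arange(d_model)[None, :]                          # (1, d_model)
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even dims: sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd dims: cos
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)                  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, n_heads, causal=False):
    """Scaled dot-product attention split across n_heads heads."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then reshape to (n_heads, seq_len, d_head).
    q = (x @ wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)      # (heads, seq, seq)
    if causal:  # decoder-only LLMs mask out future positions
        mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
        scores = np.where(mask, -1e9, scores)
    out = softmax(scores) @ v                                # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)   # concatenate heads
    return out @ wo

# Toy usage: 8 tokens, model width 32, 4 heads, random weights.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 8, 32, 4
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
w = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
y = multi_head_self_attention(x, *w, n_heads=n_heads, causal=True)
print(y.shape)  # (8, 32)
```

With `causal=True` each position attends only to earlier positions, which is what lets a decoder-only model train on next-token prediction; an encoder drops the mask so every token sees the whole sequence.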

Interview prompts

  • Why do transformers parallelize better than RNNs?
  • What breaks when context windows grow without retrieval support? (See the cost sketch below.)
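
The context-window question largely comes down to arithmetic: attention scores scale quadratically with sequence length, while the KV cache scales linearly. A back-of-envelope sketch; every shape here is an illustrative assumption (roughly 7B-class), not any specific model's published configuration:

```python
# Illustrative, made-up model shapes: NOT any real model's config.
n_layers, n_heads, d_model = 32, 32, 4096
d_head = d_model // n_heads
bytes_per_value = 2  # fp16

def kv_cache_bytes(context_len):
    """KV cache grows linearly in context: one K and one V per layer."""
    return 2 * n_layers * context_len * n_heads * d_head * bytes_per_value

def attention_score_flops(context_len):
    """QK^T and scores@V each cost ~2 * L^2 * d_model FLOPs per layer:
    the quadratic term that dominates long-context prefill."""
    return 2 * n_layers * 2 * context_len**2 * d_model

for L in (4_096, 32_768, 131_072):
    print(f"L={L:>7}: KV cache ≈ {kv_cache_bytes(L) / 2**30:6.1f} GiB, "
          f"attention FLOPs ≈ {attention_score_flops(L):.2e}")
```

Under these assumed shapes, a 32x growth in context multiplies the KV cache by 32 but the attention FLOPs by roughly 1000, which is why retrieval is usually cheaper than simply widening the window.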