Day 99 of 133

Speech: ASR (Whisper) + TTS + streaming

Encoder-decoder vs CTC; Tacotron / FastSpeech; streaming latency.

DSA · NeetCode Trees

Binary Tree Level Order Traversal · DSA · Trees
Interview questions to prep
1. Walk through BFS with a queue. How do you cleanly separate one level from the next?
2. Can you do this with recursive DFS while still grouping nodes by level?
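
One way to answer both questions, as a minimal sketch: the BFS version takes the queue length at the start of each pass, so exactly that many nodes belong to the current level. The DFS version carries a depth argument and appends to the list for that depth. The TreeNode class and function names are illustrative, not taken from any particular solution.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    val: int
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def level_order_bfs(root: Optional[TreeNode]) -> List[List[int]]:
    """Iterative BFS: one pass of the outer loop handles one level."""
    if root is None:
        return []
    levels = []
    queue = deque([root])
    while queue:
        level = []
        # len(queue) is evaluated once here, so children appended below
        # are left for the next pass.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels


def level_order_dfs(root: Optional[TreeNode]) -> List[List[int]]:
    """Recursive pre-order DFS: the depth indexes into the result list."""
    levels: List[List[int]] = []

    def visit(node: Optional[TreeNode], depth: int) -> None:
        if node is None:
            return
        if depth == len(levels):
            levels.append([])  # first node seen at this depth
        levels[depth].append(node.val)
        visit(node.left, depth + 1)   # left before right keeps left-to-right order
        visit(node.right, depth + 1)

    visit(root, 0)
    return levels
```

Both run in O(n) time. BFS holds at most one level's worth of nodes in the queue, while DFS uses stack space proportional to the tree height.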

ASR: from HMMs to Whisper · Deep Learning · OpenAI Whisper
Interview questions to prep
1. Walk through Whisper's encoder-decoder design and weakly-supervised training data.
2. Compare CTC vs attention-based seq2seq for ASR.
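
For the CTC side of the comparison, it helps to show how a frame-level output becomes a label sequence: take the best symbol per frame, collapse consecutive repeats, then drop blanks. The sketch below is a toy greedy decoder over integer ids. The blank id and the vocabulary are assumptions made up for the example. An attention-based decoder, by contrast, emits output tokens autoregressively and needs no such collapsing rule.

```python
from itertools import groupby

BLANK = 0  # assumption: id 0 is reserved for the CTC blank symbol
VOCAB = {1: "c", 2: "a", 3: "t", 4: "l"}  # hypothetical toy vocabulary


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive repeats first, then remove blanks."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank]


def to_text(token_ids):
    return "".join(VOCAB[t] for t in token_ids)


# Per-frame argmax ids, one per acoustic frame (made-up values).
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
print(to_text(ctc_greedy_decode(frames)))  # cat

# A blank between two identical labels preserves a genuine double letter.
print(to_text(ctc_greedy_decode([1, 2, 4, 4, 0, 4])))  # call
print(to_text(ctc_greedy_decode([1, 2, 4, 4, 4])))     # cal
```

The order of the two steps matters: removing blanks before collapsing would merge the double "l" into one.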

TTS: Tacotron, FastSpeech, neural vocoders · Deep Learning · Google
Interview questions to prep
1. Compare two-stage (text → mel → wav) vs end-to-end TTS.
2. Why does prosody / expressive control remain hard in TTS, and how do modern systems handle it?
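
One concrete handle on prosody control is the length regulator used in FastSpeech-style models: each phoneme representation is repeated for its predicted number of mel frames, and scaling the durations changes speaking rate. The sketch below shows only that expansion step. The function name, the `scale` parameter name, and the placeholder inputs are assumptions for illustration, and real systems operate on tensors with a learned duration predictor.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def length_regulate(
    phoneme_states: Sequence[T],
    durations: Sequence[int],
    scale: float = 1.0,
) -> List[T]:
    """Repeat each phoneme state for its (scaled) duration in mel frames.

    scale > 1.0 lengthens every phoneme (slower speech),
    scale < 1.0 shortens them (faster speech).
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme state")
    frames: List[T] = []
    for state, duration in zip(phoneme_states, durations):
        n_frames = max(0, round(duration * scale))
        frames.extend([state] * n_frames)
    return frames


# Placeholder strings stand in for hidden-state vectors.
states = ["HH", "AY"]
print(length_regulate(states, [2, 3]))                  # ['HH', 'HH', 'AY', 'AY', 'AY']
print(len(length_regulate(states, [2, 3], scale=2.0)))  # 10
```

Because the expansion is explicit, the output length is known up front and all mel frames can be generated in parallel, which is the main contrast with Tacotron-style attention alignment.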

Streaming ASR: latency vs accuracy · Deep Learning
Interview questions to prep
1. What's the trade-off between latency and accuracy in streaming ASR?
2. How would you design a chunked / look-ahead streaming ASR to keep latency under 300 ms?
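
A quick way to reason about the 300 ms target is a latency budget. Under a simplified model, the first audio frame in a chunk waits for the rest of the chunk to arrive, then for the look-ahead (right-context) audio, then for the model to run. The sketch below encodes that model. All numbers and field names are hypothetical, and it ignores endpointing, decoding and other pipeline delays.

```python
from dataclasses import dataclass


@dataclass
class StreamingBudget:
    chunk_ms: float          # audio buffered before each forward pass
    lookahead_ms: float      # future context the encoder waits for
    compute_ms: float        # assumed model time per chunk
    network_ms: float = 0.0  # assumed transport overhead, if any

    def worst_case_ms(self) -> float:
        """Delay for the earliest frame in a chunk (waits a full chunk)."""
        return self.chunk_ms + self.lookahead_ms + self.compute_ms + self.network_ms

    def mean_ms(self) -> float:
        """Average over frames in a chunk (waits half a chunk on average)."""
        return self.chunk_ms / 2 + self.lookahead_ms + self.compute_ms + self.network_ms

    def fits(self, target_ms: float) -> bool:
        return self.worst_case_ms() <= target_ms


budget = StreamingBudget(chunk_ms=160, lookahead_ms=80, compute_ms=40)
print(budget.worst_case_ms())  # 280
print(budget.mean_ms())        # 200.0
print(budget.fits(300))        # True
```

The trade-off shows up directly in the terms: a larger chunk or more look-ahead gives the encoder more context, which tends to help accuracy, but each added millisecond counts against the budget.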
