Day 49 of 133
CV consolidation + DSA Backtracking
Run a 60-minute CV breadth quiz; rehearse mAP, IoU, U-Net, and DETR.
DSA · NeetCode Backtracking
- Palindrome Partitioning
Interview questions to prep
- Walk through your pruning strategy — what subtrees do you skip and why is it safe?
- Where does memoization apply? Could this be a DP problem in disguise?
- What's the worst-case time complexity, and what's the depth of the recursion stack?
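A minimal Python sketch of the standard backtracking approach to Palindrome Partitioning (illustrative code, not from the plan): precompute an `is_pal` table so the recursion only ever extends prefixes that are palindromes; that table is both the pruning and the memoization answer, since it caches palindrome checks instead of recomputing them.

```python
def partition(s: str) -> list[list[str]]:
    n = len(s)
    # is_pal[i][j] is True iff s[i..j] is a palindrome; O(n^2) to fill,
    # then every palindrome check inside the recursion is O(1).
    is_pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            if s[i] == s[j] and (j - i < 2 or is_pal[i + 1][j - 1]):
                is_pal[i][j] = True

    result, path = [], []

    def backtrack(start: int) -> None:
        if start == n:                 # consumed the whole string: record a copy
            result.append(path[:])
            return
        for end in range(start, n):
            if is_pal[start][end]:     # prune: never recurse on a non-palindromic prefix
                path.append(s[start:end + 1])
                backtrack(end + 1)
                path.pop()             # undo the choice and try a longer prefix

    backtrack(0)
    return result

# partition("aab") -> [['a', 'a', 'b'], ['aa', 'b']]
```

Worst case (e.g. a string of identical characters) there are O(2^n) valid partitions, each copied in O(n), so the bound is O(n · 2^n); the recursion stack never goes deeper than n frames.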
DL · CNN architectures
Interview questions to prep
- What problem did ResNet's residual connections actually solve?
- Why did 1×1 convs become so important (Inception, bottleneck blocks)?
Interview questions to prep
- Explain why training error went UP with depth before ResNet.
- Walk me through a residual block.
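As a concrete reference for the residual-block and 1×1-bottleneck questions above, here is a minimal PyTorch sketch of a ResNet-style bottleneck block (a sketch of the standard design with assumed sizes, not code from any source in this plan). The two 1×1 convs shrink and then restore the channel count so the 3×3 conv runs on a cheaper representation, and the skip connection means the block only has to learn a residual on top of the identity.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """ResNet-style bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, plus a skip connection."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),        # 1x1: shrink channels
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),  # cheap 3x3 on fewer channels
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),        # 1x1: restore channels
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x + F(x): the identity path gives gradients a direct route through deep stacks.
        return self.relu(x + self.body(x))

x = torch.randn(2, 256, 56, 56)
print(Bottleneck(256)(x).shape)   # torch.Size([2, 256, 56, 56])
```

The identity path is also the answer to the degradation question: before ResNet, deeper plain stacks showed higher training error because optimization struggled, whereas with y = x + F(x) a block can fall back to the identity and gradients flow straight through the addition.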
Interview questions to prep
- How do depthwise separable convolutions reduce compute?
- What does EfficientNet's compound scaling do that one-axis scaling doesn't?
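A short sketch of the depthwise-separable idea, with assumed channel sizes (128 in, 256 out) chosen only to make the parameter counts concrete; compound scaling is a policy (scale depth, width, and resolution together under one budget) rather than a layer, so it is not shown here.

```python
import torch.nn as nn

cin, cout, k = 128, 256, 3

standard = nn.Conv2d(cin, cout, k, padding=1, bias=False)

depthwise_separable = nn.Sequential(
    # depthwise: one k x k filter per input channel (groups=cin), no channel mixing
    nn.Conv2d(cin, cin, k, padding=1, groups=cin, bias=False),
    # pointwise: 1x1 conv mixes channels
    nn.Conv2d(cin, cout, 1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))             # 128*256*3*3 = 294,912
print(count(depthwise_separable))  # 128*3*3 + 128*256 = 33,920 (~8.7x fewer)
```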
DL · ViT, CLIP, multimodal
Interview questions to prep
- How does ViT tokenize an image, and what's the role of the [CLS] token?
- When does a ViT beat a CNN, and when does its hunger for data hurt it?
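A minimal sketch of ViT-style tokenization with typical but assumed sizes (224-pixel image, 16-pixel patches, 768-dim embeddings): cut the image into patches, linearly project each one, prepend a learned [CLS] token, and add position embeddings; the [CLS] output is what the classification head reads.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img=224, patch=16, dim=768):
        super().__init__()
        self.n = (img // patch) ** 2                      # 196 patches for 224/16
        # conv with stride = kernel = patch size is the usual "cut into patches + project"
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # learned [CLS] token
        self.pos = nn.Parameter(torch.zeros(1, self.n + 1, dim))

    def forward(self, x):                                  # x: (B, 3, 224, 224)
        tokens = self.proj(x).flatten(2).transpose(1, 2)   # (B, 196, dim)
        cls = self.cls.expand(x.shape[0], -1, -1)          # one [CLS] per image
        return torch.cat([cls, tokens], dim=1) + self.pos  # (B, 197, dim)

print(PatchEmbed()(torch.randn(2, 3, 224, 224)).shape)     # torch.Size([2, 197, 768])
```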
Interview questions to prep
- How does CLIP enable zero-shot image classification?
- Walk me through CLIP's contrastive training objective.
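A compact sketch of a CLIP-style symmetric contrastive objective (the shapes and temperature are assumptions, and this paraphrases the published pseudocode rather than CLIP's actual code): normalize both embeddings, take all pairwise similarities in the batch, and apply cross-entropy in both directions so each image's highest logit is its own caption and vice versa.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    # img_emb, txt_emb: (B, D) embeddings of B matched image/text pairs
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(len(img))              # i-th image matches i-th text
    # symmetric: pick the right text for each image, and the right image for each text
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
```

Zero-shot classification falls out of the same geometry: embed prompts like "a photo of a {label}" for every class and predict the class whose text embedding is closest to the image embedding.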
Interview questions to prep
- Compare contrastive learning, masked prediction, and autoencoding as self-supervised objectives.
- How would you evaluate whether a self-supervised embedding transfers to a downstream product task?
- What data leakage or shortcut-learning failure modes appear in self-supervised pretraining?
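For the transfer-evaluation question, the usual first check is a linear probe: freeze the pretrained encoder, fit a linear classifier on its embeddings for the downstream labels, and compare against a supervised or randomly initialized baseline. A sketch with stand-in random embeddings (in practice they would come from the frozen self-supervised encoder):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def linear_probe(train_emb, train_y, test_emb, test_y):
    # frozen embeddings in, accuracy out: a quick read on how linearly separable
    # the downstream labels are in the pretrained representation
    clf = LogisticRegression(max_iter=1000).fit(train_emb, train_y)
    return accuracy_score(test_y, clf.predict(test_emb))

rng = np.random.default_rng(0)   # stand-in data only
acc = linear_probe(rng.normal(size=(500, 128)), rng.integers(0, 10, 500),
                   rng.normal(size=(100, 128)), rng.integers(0, 10, 100))
```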
Interview questions to prep
- How do multimodal LLMs like LLaVA fuse vision encoders with language models?
- Compare early fusion vs late fusion in vision-language models — what does each cost in compute and quality?
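A schematic sketch of the two fusion styles with toy dimensions and illustrative variable names (LLaVA's actual recipe is early fusion through a trained projector, but this is not its code). Early fusion maps vision tokens into the LLM's embedding space so every attention layer sees image and text jointly, which costs more compute over longer sequences but generally helps quality; late fusion pools each modality separately and merges only at the end, which is cheap but gives the language model no token-level access to the image.

```python
import torch
import torch.nn as nn

d_vis, d_txt = 1024, 4096                 # vision-encoder and LLM hidden sizes (toy values)
vis_tokens = torch.randn(1, 196, d_vis)   # patch features from a frozen vision encoder
txt_tokens = torch.randn(1, 32, d_txt)    # prompt token embeddings from the LLM

# Early fusion (LLaVA-style): project image patches into the LLM embedding space
# and prepend them, so self-attention mixes image and text at every layer.
projector = nn.Linear(d_vis, d_txt)
early_input = torch.cat([projector(vis_tokens), txt_tokens], dim=1)   # (1, 228, 4096)

# Late fusion: run each tower to a pooled vector and combine only at the head.
img_vec = vis_tokens.mean(dim=1)                     # (1, 1024)
txt_vec = txt_tokens.mean(dim=1)                     # (1, 4096)
late_vec = torch.cat([img_vec, txt_vec], dim=-1)     # fed to a small fusion MLP / scorer
```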
References & further reading
- CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
- Papers with Code: state-of-the-art leaderboards
- Lilian Weng: Self-Supervised Representation Learning