• For CV:
    • use Adam when you don’t have a choice, or don’t know anything that’s more specific to the problem
    • Adam used to be popular until batch norm, which smooths out training and reduces a lot of instability. And then SGD become popular and tends to outperform Adam in well-tuned settings.