[[Course - Theories of Deep Learning MT25]]U

- Diagonal scaling
- Preconditioning
- Adam (Adaptive moment estimation)