IMB

Images, Optimization and Probability Seminar

(Maths-IA) Training Neural Networks at Any Scale with Scion

Antonio Silveti-Falls

(CentraleSupélec)

Conference room

June 11, 2026, 11:15

Abstract: I will discuss optimization methods that leverage the linear minimization oracle (LMO) over a norm ball, and their application to training very large neural networks. In recent work, my coauthors and I proposed a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem. The resulting update rule unifies several existing optimization methods under a single framework, including spectral methods such as Muon, for which we prove rigorous convergence guarantees. Furthermore, we propose an explicit choice of norm for deep architectures which, as a side benefit, guarantees that hyperparameters such as the learning rate transfer across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training with our algorithm, Scion, without any reliance on Adam.
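To make the idea of an LMO-based update concrete, here is a minimal NumPy sketch. It assumes an unconstrained update of the form x ← x + γ·lmo(d), where d is a momentum average of stochastic gradients. The function names (`lmo_spectral`, `lmo_sign`, `lmo_step`) and the toy quadratic objective are hypothetical and chosen for illustration only; they are not taken from the Scion code base. Over the spectral-norm ball of radius ρ, the LMO argmin over ‖S‖ ≤ ρ of ⟨G, S⟩ equals −ρ·UVᵀ for a reduced SVD G = UΣVᵀ, which is the orthogonalized direction that Muon uses. Over the ℓ∞ ball, the LMO is −ρ·sign(G).

```python
import numpy as np


def lmo_spectral(g: np.ndarray, radius: float) -> np.ndarray:
    """Return argmin over ||s||_op <= radius of <g, s>, i.e. -radius * U @ V^T."""
    u, _, vt = np.linalg.svd(g, full_matrices=False)  # reduced SVD g = U diag(sigma) V^T
    return -radius * (u @ vt)


def lmo_sign(g: np.ndarray, radius: float) -> np.ndarray:
    """Return argmin over ||s||_inf <= radius of <g, s>, i.e. -radius * sign(g)."""
    return -radius * np.sign(g)


def lmo_step(x, d, grad, lr, alpha, lmo, radius=1.0):
    """One hypothetical LMO-based step: momentum averaging, then x <- x + lr * lmo(d)."""
    d = (1.0 - alpha) * d + alpha * grad  # running average of (stochastic) gradients
    x = x + lr * lmo(d, radius)           # step has norm at most lr * radius in the chosen norm
    return x, d


if __name__ == "__main__":
    # Toy problem (assumption): f(X) = 0.5 * ||X - T||_F^2, so grad f(X) = X - T.
    rng = np.random.default_rng(0)
    target = rng.standard_normal((4, 3))
    x = np.zeros_like(target)
    d = np.zeros_like(target)
    for _ in range(200):
        # Add small Gaussian noise to mimic a stochastic gradient.
        noisy_grad = (x - target) + 0.01 * rng.standard_normal(target.shape)
        x, d = lmo_step(x, d, noisy_grad, lr=0.05, alpha=0.5, lmo=lmo_spectral)
    print("final loss:", 0.5 * np.sum((x - target) ** 2))
```

For clarity, this sketch computes the spectral LMO with an exact SVD. Muon approximates the same orthogonalization with a Newton-Schulz iteration, which is cheaper on accelerators. The per-layer choice of norms that the talk proposes for deep architectures is not reproduced here.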