Google DeepMind Introduces Decoupled DiLoCo: An Asynchronous Training Architecture Achieving 88% Goodput Under High Hardware Failure Rates
Training frontier AI models is, at its core, a coordination problem. Thousands of chips must communicate with each…