Self-maintaining [networked] systems: The rise of datacenter robotics!
- Freddie Hong
- Iason Sarantopoulos
- Elliott Hogg
- David Richardson
- Yizhong Zhang
- Hugh Williams
- David Sweeney
- Andromachi Chatzieleftheriou
- Ant Rowstron
Published by ACM
The vision of self-maintaining systems is for cloud hardware to be serviced and repaired automatically using robotics. We define a self-maintaining system as one in which software controls robots that automatically perform hardware maintenance and repair operations. This shortens failure service windows and lowers the risk of repairs causing further cascading failures and outages. Self-maintaining systems are not purely reactive to failures: they also perform proactive maintenance before failures occur, which reduces future hardware failures. Operating an entire datacenter as a self-maintaining system is many years away, and we present four stages of automation, analogous to the levels used for autonomous vehicles, required to reach the full vision for datacenters.
To experiment with and learn about self-maintaining systems, we have focused on datacenter networking. We have created basic robots that support common network maintenance tasks, such as reseating and cleaning optical transceivers and replacing optical fiber cables. The advantages of self-maintaining networks are lower costs and increased availability and reliability. The key is a cross-layering co-design approach: the core cloud services are co-designed with the robotic systems performing the repairs and maintenance. The services control the robots, which is analogous to how Software Defined Networking has evolved for broader network management.