AIBench Training: Balanced Industry-Standard AI Training Benchmarking
- Fei Tang,
- Wanling Gao,
- Jianfeng Zhan,
- Chuanxin Lan,
- Xu Wen,
- Lei Wang,
- Chunjie Luo,
- Jiahui Dai,
- Zheng Cao,
- Xingwang Xiong,
- Zihan Jiang,
- Tianshu Hao,
- Fanda Fan,
- Fan Zhang,
- Yunyou Huang,
- Jianan Chen,
- Mengjia Du,
- Rui Ren,
- Chen Zheng,
- Daoyi Zheng,
- Haoning Tang,
- Kunlin Zhan,
- Biao Wang,
- Defei Kong,
- Minghe Yu,
- Chongkang Tan,
- Huan Li,
- Xinhui Tian,
- Yatao Li,
- Gang Lu,
- Junchao Shao,
- Zhenyu Wang,
- Xiaoyu Wang,
- Hainan Ye
Early-stage evaluations of a new AI architecture or system need affordable AI benchmarks, while relying on a few AI component benchmarks alone in later stages may lead to misleading conclusions. This paper proposes a balanced benchmarking methodology. Based on an exhaustive survey of Internet service AI domains, we identify and implement seventeen representative AI tasks with state-of-the-art models to guarantee the diversity and representativeness of the benchmarks, while keeping a benchmark subset to a minimum for affordability. Together with seventeen industry partners, we contribute by far the most comprehensive AI training benchmark suite. The evaluations show that: (1) AIBench Training outperforms MLPerf Training in the diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory access patterns, and hotspot functions; (2) with respect to the full AIBench benchmarks, the subset reduces benchmarking cost by 54% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with an optimized TensorFlow framework outperforms GPUs, while losing the latter's general support for a variety of AI models. The AIBench Training specifications, source code, testbed, and performance numbers are publicly available from the web site this http URL.
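One way a representative-yet-affordable subset like the one described above could be selected is by clustering the full benchmarks on their measured workload characteristics and keeping one workload per cluster. The sketch below is a minimal illustration of that idea in Python, not the paper's actual selection procedure: the workload names, feature metrics, numbers, and subset size are all hypothetical placeholders.

```python
# Hypothetical sketch: choose a benchmark subset by clustering workloads
# on normalized characteristics and keeping the workload nearest to each
# cluster centroid. All names and numbers below are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

workloads = [
    "image_classification", "object_detection", "translation",
    "speech_recognition", "recommendation", "text_summarization",
    "face_embedding",
]
# Rows: workloads; columns: made-up characteristic metrics, e.g.
# (compute intensity, memory-bandwidth pressure, epochs to converge).
features = np.array([
    [0.92, 0.35, 90],
    [0.88, 0.41, 120],
    [0.55, 0.70, 30],
    [0.60, 0.65, 45],
    [0.20, 0.90, 10],
    [0.50, 0.72, 25],
    [0.75, 0.50, 60],
])

# Normalize so no single metric dominates the distance computation.
X = StandardScaler().fit_transform(features)

k = 3  # hypothetical target subset size
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# From each cluster, keep the workload closest to the centroid as its
# representative, so the subset spans the observed behavior space.
subset = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    subset.append(workloads[members[np.argmin(dists)]])

print("Representative subset:", subset)
```

Under this kind of scheme, the subset's fidelity can be checked by comparing its aggregate characteristics (and benchmarking cost) against the full suite, which is the spirit of the 54% cost-reduction result reported above.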