“Before I became an intern at Microsoft Research Asia (MSR Asia), my knowledge of the institute was a paper on ResNet (Residual Network),” said Changho Hwang. “In the paper, researchers at MSR Asia introduced the idea of ‘residual learning’ and made ResNet a milestone in the development of computer vision technology.”
Hwang’s first impression of MSR Asia was that it was the place for cutting-edge technological research conducted by top innovative talents.
Join MSR Asia and “Do the right thing in the right way”
During the second year of his PhD studies, Hwang became an intern at MSR Asia thanks to the recommendation of his supervisor at the Korea Advanced Institute of Science and Technology (KAIST). After two internships at the lab, one during the winter of 2018 and one during the summer of 2019, Hwang developed a new understanding of MSR Asia. He decided that upon graduation, his career goal would be to join MSR Asia and further pursue forward-looking technological research. Hwang said, “At the time, some of my classmates and colleagues had introduced me to other laboratories and companies, but my internship experiences made MSR Asia a clear choice for me. I preferred the working environment and research atmosphere here. This place enabled me to focus on the areas I was really interested in.”
According to Hwang, what’s most attractive about MSR Asia is that it always does the right thing in the right way. MSR Asia never blindly follows technology trends. Rather, it sets unique strategies and research directions and always looks at the bigger picture while focusing on cutting-edge technologies.
The people Hwang worked with during his internships and the diverse research directions found at MSR Asia were also important reasons behind his decision to join the lab. MSR Asia boasts a group of extremely professional yet convivial researchers. Hwang’s mentor during his internships was highly approachable and offered him a great deal of freedom and solid academic support for his research. His colleagues were also warm and helpful both in and out of the office, and made Hwang feel at home despite being abroad. Furthermore, among the cutting-edge research endeavors undertaken by MSR Asia, Hwang discovered not only research areas and projects that matched his expertise in electrical engineering but also a multitude of interdisciplinary research directions that offered researchers opportunities to expand the breadth and depth of their academic pursuits. After completing his doctoral program in 2022, Hwang therefore quickly decided to join MSR Asia and became a member of the Networking Infrastructure Group. He currently holds a position as a researcher at MSR Asia – Vancouver.
Enhancing AI system performance: Hone achievements through progressive research
During his internship, Hwang was assigned to a team tasked with optimizing the performance of GPUs that support the operation of artificial intelligence (AI) models. At the time, Hwang’s mission was clear: to find a new way to improve the throughput and utilization rate of AI systems through software-hardware co-design. However, scientific research is often a long journey, and many studies do not yield immediate results. As an advocate of long-term research, Hwang did not see himself as a mere passerby in the research team. Instead, he continued to work with the team for two years after returning to school. The team’s subsequent research results won the Best Paper Award at the MLArchSys 2022 conference.
Paper Title: Towards GPU driven Code Execution for Distributed Deep Learning
Paper link: https://chhwang.github.io/pubs/mlarchsys22_hwang.pdf
With the development of large models, GPUs had become increasingly crucial for training and deploying AI models, and the performance and utilization efficiency of GPUs directly affected AI development. Upon joining MSR Asia as a researcher, Hwang continued to focus on this area, except now, he was a project leader rather than a mere participant.
Hwang believed that the most advanced deep learning applications today required a large number of parallel GPUs to provide sufficient computing power. However, communication efficiency between GPUs and CPUs was a limiting factor in the performance of AI models. This was because CPUs played the role of chief commander in the prevailing CPU-driven communication mode of AI systems: each CPU was responsible for assigning tasks to multiple GPUs, but message transmission between them incurred considerable delay, leading to inefficient task execution and wasted GPU resources.
In his research, Hwang’s goal was to enable a GPU to command itself, thereby improving communication efficiency. To this end, he and his colleagues in the group designed a GPU-driven code execution system, along with a DMA engine that could be directly driven by the GPU. This allowed GPUs to directly handle communication that used to require CPU commands, thus reducing communication latency in AI systems and improving the utilization rate of GPU computing resources. The new method also freed up the CPU resources that earlier communication modes had occupied, allowing CPUs to focus on their own work and GPUs to perform autonomous scheduling as well as to do what they did best: provide higher computational performance for AI models. This research has demonstrated that an AI system based on distributed GPUs is capable of having the GPUs manage task scheduling on their own. The paper on this research has been accepted by the NSDI 2023 conference.
Paper Title: ARK: GPU driven Code Execution for Distributed Deep Learning
Paper link: https://www.usenix.org/system/files/nsdi23-hwang.pdf
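The difference between CPU-driven and GPU-driven communication described above can be illustrated with a toy cost model. The sketch below is not taken from the paper: the names `StepCosts`, `cpu_driven_time`, `gpu_driven_time`, and `gpu_utilization` are hypothetical, the microsecond values are made up for illustration, and the model assumes that computation and communication never overlap. It only shows why removing a CPU round trip from every transfer shortens each step and raises the share of time the GPU spends computing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    """Hypothetical per-step costs in microseconds (illustrative values only)."""
    compute_us: float        # time the GPU spends on useful computation
    transfer_us: float       # time the data transfer itself takes
    cpu_roundtrip_us: float  # assumed delay while the CPU issues the transfer command


def cpu_driven_time(costs: StepCosts, steps: int) -> float:
    """Total time when every transfer has to wait for a CPU command."""
    return steps * (costs.compute_us + costs.cpu_roundtrip_us + costs.transfer_us)


def gpu_driven_time(costs: StepCosts, steps: int) -> float:
    """Total time when the GPU triggers the transfer itself (no CPU round trip)."""
    return steps * (costs.compute_us + costs.transfer_us)


def gpu_utilization(costs: StepCosts, steps: int, total_us: float) -> float:
    """Fraction of the total time spent on useful GPU computation."""
    return steps * costs.compute_us / total_us


if __name__ == "__main__":
    # Made-up numbers chosen only to make the comparison easy to read.
    costs = StepCosts(compute_us=90.0, transfer_us=10.0, cpu_roundtrip_us=20.0)
    steps = 10
    for name, model in (("CPU-driven", cpu_driven_time), ("GPU-driven", gpu_driven_time)):
        total = model(costs, steps)
        utilization = gpu_utilization(costs, steps, total)
        print(f"{name}: {total:.0f} us total, GPU utilization {utilization:.2f}")
```

In a real system the gap depends on how much communication can be overlapped with computation and on the actual signalling latency, so the model is a way to reason about the trade-off, not a prediction of measured speedups.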
“System performance optimization is an eternal topic,” said Hwang. “In the past decade or so, we have witnessed the rapid development of AI, with one of the main driving forces being the continuously strengthening support of computing power. Adequate computing power enables steady improvement in system performance, resulting in larger and more powerful AI models. Currently, there are two leading approaches to improving system performance: one is to enhance hardware such as GPUs, and the other is to propose new AI algorithms. Both approaches are challenging, and hardware design and manufacture can be very costly.”
With this understanding, Hwang and his colleagues proposed a hardware-algorithm collaborative design method that could serve as another effective solution for enhancing the performance of AI systems. And so, after successfully proving that GPUs could autonomously schedule and achieve performance improvement, Hwang went on to explore GPU scheduling algorithms to avoid scheduling conflicts and further improve communication efficiency among GPUs. “We hope that in the future, GPUs will be able to achieve autonomous scheduling without requiring additional DMA engines, thereby bringing the performance of AI systems to a new level,” Hwang said.
“At MSR Asia, I can freely choose my research direction”
The long-standing culture of openness, inclusion, and diversity at MSR Asia has been a great attraction to Hwang, who has now worked in this lab for more than 12 months with a deepening appreciation for it. According to him, “MSR Asia is more like a laboratory—it’s a true research institute. Everyone here is equal, and there is transparency in our work. People understand each other’s ideas and are able to stay on the same page. At MSR Asia, we enjoy the greater liberty of being able to choose our own research direction.”
In addition to fostering a free academic atmosphere within itself, MSR Asia also works closely with the global academic community, including its counterparts in South Korea, on academic exchanges and talent training. For example, MSR Asia, together with Tsinghua University, Peking University, National University of Singapore, Seoul National University, and many other Asian universities, established OpenNetLab, an open networking community and platform to promote the application and development of AI in networking research. Hwang’s supervisor at KAIST is also involved in this collaboration. Another example is the Microsoft Research Collaboration Program with MSIT (the Ministry of Science and ICT) of Korea, a long-running program for talent training and academic research advancement aimed at South Korean colleges and universities. For more than a decade, the partnership between Microsoft Research and the Korea MSIT has served as a bridge for academic exchange with the South Korean academic community. Through programs such as these, scholars have carried out in-depth scientific research collaborations and enriched the talent pool for global computing research. After his internship at MSR Asia, Hwang also participated in the MSIT program, and the resulting paper won the Best Paper Award at the APSys 2021 conference.
Paper Title: Accelerating GNN Training with Locality Aware Partial Execution
Paper link: https://dl.acm.org/doi/10.1145/3476886.3477515
As part of both MSR Asia and the broader computer science academic ecosystem, these diverse programs for exchange and collaboration not only yield cutting-edge research achievements but also serve as a “matchmaker” between MSR Asia and many scholars and students. In South Korea alone, more than 200 interdisciplinary talents have interned at MSR Asia to date, and many outstanding ones like Hwang have become researchers at MSR.
Thoughts on adhering to long-term research
Conducting scientific research is often a long and arduous journey, and maintaining a commitment to long-term research is no simple endeavor. Besides being persistent, Hwang also has a set of methods and insights for upholding this commitment.
Hwang believes that maintaining a high level of enthusiasm for research is essential to building a career in this field. He himself, for one, enjoys the entire process of discovering and solving problems in scientific research. “The goal of some jobs,” he said, “is to find the best way to avoid problems, but the mission of scientific research is to identify, confront, and solve problems. I sincerely enjoy the entire research process from discovering problems to solving them.”
In the course of long-term research, researchers will inevitably encounter obstacles or produce unsatisfactory results. For instance, some of Hwang’s papers were repeatedly rejected by the conferences he submitted them to. Hwang believes that you should not be discouraged or resentful when you encounter these frustrations, but should rather reflect on yourself, review existing work, identify the problems, and then invest in new research. Hwang explained, “It is a process of self-persuasion, where you show yourself the value of research.”
When facing difficulties, Hwang believes that you should not confine yourself to the problem at hand, but should look away and relax for a while. Hwang, for example, would play the piano or chat with peers to break free from the invisible shackles. “A shift in perspective might take you right to the solution.”