Introduction
Large-scale language models have revolutionized natural language processing tasks, and researchers are exploring their potential for enhancing human-robot interaction and communication. In this post, we will present our co-speech gesturing chat system, which integrates GPT-3/ChatGPT with a gesture engine to provide users with a more flexible and natural chat experience. We will explain how the system works and discuss the synergistic effects of integrating robotic systems and language models.
Co-Speech Gesturing Chat System: How it works
Our co-speech gesturing chat system operates within a browser. When a user inputs a message, GPT-3/ChatGPT generates the robot’s textual response based on a prompt carefully crafted to create a chat-like experience. A gesture engine then analyzes the response text and selects an appropriate gesture from a library keyed to the conceptual meaning of the speech. A speech generator converts the text into speech, while a gesture generator plays the corresponding co-speech gestures, giving the user audio-visual feedback through a CG robot. The system leverages several Azure services: Azure Speech Service for speech-to-text conversion, Azure OpenAI Service for GPT-3-based response generation, and Azure Language Understanding for concept estimation. The source code of the system is available on GitHub.
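The flow above can be sketched in a few lines of code. Note that every function and variable name below is an illustrative placeholder, not the project’s actual API: a real deployment would call Azure Speech, Azure OpenAI, and a language-understanding service where these stubs stand.

```python
# Hypothetical sketch of the pipeline: user text -> LLM response ->
# concept estimation -> gesture selection. Names are placeholders.

# A toy gesture library keyed by conversational concept.
GESTURE_LIBRARY = {
    "greeting": "wave",
    "agreement": "nod",
    "unknown": "idle",
}

def generate_response(user_text: str) -> str:
    """Stand-in for the GPT-3/ChatGPT call that produces the robot's reply."""
    return f"Hello! You said: {user_text}"

def estimate_concept(response_text: str) -> str:
    """Stand-in for concept estimation (a language-understanding service)."""
    lowered = response_text.lower()
    if "hello" in lowered or "hi" in lowered:
        return "greeting"
    if "yes" in lowered or "agree" in lowered:
        return "agreement"
    return "unknown"

def chat_turn(user_text: str) -> dict:
    """One chat turn: textual response, estimated concept, selected gesture."""
    response = generate_response(user_text)
    concept = estimate_concept(response)
    gesture = GESTURE_LIBRARY.get(concept, GESTURE_LIBRARY["unknown"])
    # In the real system, text-to-speech and the CG robot's gesture playback
    # would execute here, synchronized as audio-visual feedback.
    return {"response": response, "concept": concept, "gesture": gesture}

turn = chat_turn("Hi there!")
print(turn["gesture"])
```

The key design point this sketch illustrates is the decoupling: the gesture engine operates on the generated text alone, so any response generator (GPT-3, ChatGPT, or another model) can be swapped in without changing the gesture-selection logic.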
MSRAbot DIYKit
In this post, we have utilized our in-house developed robot named MSRAbot, originally designed as a platform for human-robot interaction research. As an additional resource for readers interested in the robot, we have developed and open-sourced a DIYKit for MSRAbot. This DIYKit includes 3D models of the parts and step-by-step assembly instructions, enabling users to build the robot’s hardware using commercially available items. The software needed to operate the robot is also available on the same page.
The Benefits of Integrating Robotic Systems and Language Models
The fusion of existing robot gesture systems with large-scale language models benefits both components. Traditionally, studies on robot gesture systems have used predetermined phrases for evaluation; integration with language models enables evaluation under more natural conversational conditions, which promotes the development of better gesture generation algorithms. In turn, large-scale language models can expand their range of expression by adding speech and gestures to their already strong language responses. By integrating these two technologies, we can develop more flexible and natural chat systems that enhance human-robot interaction and communication.
Challenges and Limitations
While our co-speech gesturing chat system appears straightforward and promising, it also faces limitations and challenges. For example, the use of language models carries well-known risks, such as generating biased or inappropriate responses. Additionally, the gesture engine and concept estimation must be reliable and accurate to ensure the overall effectiveness and usability of the system. Further research and development are needed to make the system more robust, reliable, and user-friendly.
Conclusion
Our co-speech gesturing chat system represents an exciting advance in the integration of robotic systems and language models. By using a gesture engine to analyze speech text and integrating GPT-3 for response generation, we have created a chat system that offers users a more flexible and natural chat experience. As we continue to refine and develop this technology, we believe that the fusion of robotic systems and language models will lead to more sophisticated and beneficial systems for users, such as virtual assistants and tutors.
About our research group
Visit our homepage: Applied Robotics Research
Learn more about this project
- [Project page] Gesture Generation for Service Robots
- [Paper] Labeling the Phrases of a Conversational Agent with a Unique Personalized Vocabulary
- [Paper] Integration of Gesture Generation System Using Gesture Library with DIY Robot Design Kit
- [Paper] Design of conversational humanoid robot based on hardware independent gesture generation
- [GitHub] Sample code to test co-speech gestures using Toyota HSR robot