DSSM. It stands for Deep Structured Semantic Model. The idea is very simple. We take web search as a test case. You have a query and you want to identify relevant documents, but unfortunately, the documents are written by their authors, while the queries are issued by users who use very, very different vocabulary and language. There's a mismatch. So the deep learning idea is to map both the query and the document into a common vector space that we call the semantic space. In that space, all these concepts are represented as vectors, and the distance between vectors measures semantic similarity. The idea is very straightforward. Fortunately, we have a lot of Bing click data: a user issues a query and clicks a document.
Host: Right.
Jianfeng Gao: This is weakly supervised training data. We have tons of it, and we used it to train the deep learning model called DSSM. It's fantastic. Encouraged by this result, we decided to form a deep learning team. The key concept of deep learning is representation learning. Let's take natural language as an example, okay? A natural language sentence consists of words and phrases. These are symbolic tokens. The good thing about these symbolic tokens is that people can understand them easily. But they are discrete, meaning that if you are given two words and you ask how similar they are, the symbols alone can't tell you. Deep learning tries to map all these words into semantic representations so that you can measure semantic similarity. This mapping is done through a non-linear function, and the deep learning model, in some sense, is an implementation of this non-linear function.
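To make the DSSM idea concrete, here is a minimal, untrained sketch using only the Python standard library. It follows the published DSSM recipe in outline: letter-trigram word hashing, a non-linear (tanh) projection into a semantic space, cosine similarity, and a softmax over candidate documents whose negative log-likelihood on clicked documents serves as the training loss. The single 16-unit layer, the random weights, the toy texts and gamma=10 are illustrative assumptions, not the production model.

```python
import math
import random


def letter_trigrams(word):
    """DSSM-style word hashing: pad with '#' and slice into letter trigrams."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def bag_of_trigrams(text, vocab):
    """Count vector over a fixed trigram vocabulary (unknown trigrams are dropped)."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        for tri in letter_trigrams(word):
            if tri in vocab:
                vec[vocab[tri]] += 1.0
    return vec


def project(x, weights):
    """One non-linear layer, tanh(W x); the real DSSM stacks several such layers."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def click_posterior(query_vec, doc_vecs, gamma=10.0):
    """Softmax over gamma-scaled cosine scores, i.e. P(doc | query)."""
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = "cheap flights"
docs = ["low cost airline tickets", "flight deals and cheap fares", "cooking pasta"]

# Build a trigram vocabulary from the toy texts (hypothetical data).
vocab = {}
for text in [query, *docs]:
    for word in text.lower().split():
        for tri in letter_trigrams(word):
            vocab.setdefault(tri, len(vocab))

rng = random.Random(0)
# Untrained random weights: training on click logs would adjust these.
weights = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(16)]

q = project(bag_of_trigrams(query, vocab), weights)
ds = [project(bag_of_trigrams(d, vocab), weights) for d in docs]
probs = click_posterior(q, ds)
# Training loss if the user clicked docs[1]: -log P(docs[1] | query)
print([round(p, 3) for p in probs], -math.log(probs[1]))
```

Because the weights are random, the printed probabilities are not yet meaningful; minimizing the click loss over many (query, clicked document, unclicked documents) examples is what would pull queries and their clicked documents together in the semantic space.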
Host: Okay.
Jianfeng Gao: And it's a very effective implementation, in the sense that you can add more and more layers to make it very deep, and you can use different model architectures to capture different aspects of the input and even identify features at different levels of abstraction. But this model needs large amounts of data to train. Half a century ago, we didn't have the compute power to do this. Now we do. And we also have large amounts of training data for it.
Host: Yeah.
Jianfeng Gao: That's why I think deep learning took off.
Host: Okay. Well, let's talk a little bit about these representations and some of the latest research that's going on today. In terms of the kinds of representations you're dealing with, we've been talking about symbolic representations, both in language and mathematics, and you're moving into a space where you're dealing more with neural representations. That architecture is going to set the stage for the work we're going to talk about in a minute, but first I'd like you to define both symbolic representations and neural representations, and tell us why these neural representations are an interesting, and possibly fruitful, line of research.
Jianfeng Gao: Let's talk about two different spaces. One is the symbolic space; the other is the neural space. They have different characteristics. The symbolic space, taking natural language as an example, is what we are familiar with: concepts are represented using words, phrases and sentences. These are discrete. The problem with this space is that natural language is highly ambiguous. The same concept can be expressed using very different words and phrases, and the same word or sentence can mean two or three different things depending on the context, but in the symbolic space that's hard to tell.
Host: Yeah.
Jianfeng Gao: In the neural space it's different. All the concepts are represented as vectors, and the distance between vectors measures their relationship at the semantic level. We already talked about representation learning, which is the major task of deep learning.
Host: Yeah.
Jianfeng Gao: Deep learning, in some sense, maps all the knowledge from the symbolic space to the neural space, because in the neural space all the concepts are represented using continuous vectors. It's a continuous space, and it has a lot of very nice mathematical properties. It's very easy to train. That's why, if you have a large amount of data and you want to train a highly non-linear function, it's much easier to do so in the neural space than in the symbolic space. But the disadvantage of the neural space is that it's not human-comprehensible. Say I tell you these two concepts are similar because the vectors representing them are close to each other. How close are they? I don't know. It's hard to explain!
Host: It's uninterpretable.
Jianfeng Gao: It's not interpretable at all. That's why people think of neural net models as black boxes.
Host: Okay.
Jianfeng Gao: It can give you very precise predictions, but it's hard to explain how the model came up with a prediction. That works for some tasks, like image recognition; deep learning models do a great job on tasks like that. But take a different task, like a math task. Say I give you a problem statement like: the population of a city is five thousand, and it increases by ten percent every year. What's the population after ten years? Deep learning would try to map this text directly to a number, without knowing how the number is derived. In this particular case, we need neural symbolic computing. Ideally, you need to identify how many steps it takes to generate the result, and for each step, which function to apply. So this is a much tougher task.
Host: Right.
Jianfeng Gao: I don't think current deep learning models can solve it.
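The contrast can be made concrete with the population problem. A System One style model maps the problem text straight to a number; a neural symbolic system would instead produce an explicit sequence of operations and then execute it. The Python sketch below shows only the execution half, with a hand-written program; the operation names and the (op, arg) program format are hypothetical, and in a real neural symbolic system a learned model would be expected to generate the program from the text.

```python
# Hypothetical symbolic operations a generated program may call.
OPS = {
    "multiply": lambda value, arg: value * arg,
    "add": lambda value, arg: value + arg,
}


def run_program(start, program):
    """Apply each (op, arg) step in order and record every intermediate value."""
    trace = [start]
    value = start
    for op, arg in program:
        value = OPS[op](value, arg)
        trace.append(value)
    return value, trace


# "Increases by ten percent every year" becomes one multiply-by-1.1 step per year.
program = [("multiply", 1.1)] * 10
final, trace = run_program(5000, program)
for year, population in enumerate(trace):
    print(f"year {year:2d}: {population:,.1f}")
```

The trace exposes every intermediate step (5,500 after the first year, and so on), and the final value agrees with the closed form 5000 × 1.1^10, roughly 12,969.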
Host: All right, so, but that is something you're working on?
Jianfeng Gao: Yes.
Host: You're trying to figure out how you can move from symbolic representations to neural representations and also have them be interpretable?
Jianfeng Gao: Yes, exactly.
Host: Big task.
Jianfeng Gao: Yeah, yeah. There's a book called Thinking, Fast and Slow. It describes two different systems that drive the way we think, called System One and System Two. System One is very intuitive, fast and emotional. You ask me something and I don't need to think; I give you the answer immediately because I've already answered similar questions many, many times.
Host: Right.
Jianfeng Gao: System Two is slower, more logical, more deliberative. You need some reasoning, as with the question I just asked, the math problem about the population of the city. You need to think harder. I think most state-of-the-art deep learning models are like System One. They are trained on large amounts of training data, where each training example is an input-output pair. The model learns the mapping between input and output by fitting a non-linear function to the data. That's it, without knowing how exactly the result is generated. But now we are working, in some sense, on System Two. That's neural symbolic: you not only need to generate an answer, but also need to figure out the intermediate steps you follow to generate it.
(music plays)
Host: Your group has several areas of research interest, and I want you to be our tour guide today and take us on a couple of excursions to explore these areas. Let's start with an area called neural language modeling. Talk about some promising projects and lines of inquiry, particularly as they relate to neural symbolic reasoning and computing.
Jianfeng Gao: Neural language modeling is not a new topic. It's been around for many years. Only recently, Google proposed a neural language model called BERT. It achieves state-of-the-art results on many NLP tasks because it uses a new neural network architecture called the Transformer. The idea of this model is representation learning: whatever text it takes, it represents using vectors. We are working on the same problem, but we are taking a different approach. We also want to learn representations, and we try to make them as universal as possible, in the sense that the same representation can be used by many different applications. Historically, there are two approaches to achieving this goal. One is to use large data. The idea is that if I can collect all the data in the world, then I believe the representation learned from this data is universal, because I've seen all of it. The other approach is that, since the goal of this representation is to serve different applications, why not train the model using application-specific objective functions across many, many different applications? This is called multi-task learning, and Microsoft Research is taking the multi-task learning approach. We have two models: MT-DNN and the Unified Language Model.
Host: So that's MT-DNN, so multi-task...?
Jianfeng Gao: It stands for Multi-Task Deep Neural Network. In those two models, multi-task learning is applied at different stages: MT-DNN applies it in the fine-tuning stage, and the Unified Language Model applies it in the pre-training stage. So that's the neural language model part.
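The multi-task idea can be shown at toy scale. The sketch below follows the batch scheduling described in the MT-DNN paper (each task's data is split into mini-batches, all batches are merged and shuffled, and each step updates the shared layers plus the current task's output layer), but the "model" is a hypothetical stand-in: a single shared slope `w` with one bias per task, trained by plain gradient descent on synthetic data.

```python
import random


def make_batches(examples, batch_size):
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def train_multitask(tasks, epochs=200, lr=0.1, batch_size=5, seed=0):
    """Shared slope w plus one task-specific bias per task: pred = w * x + b_task.

    Each epoch packs every task's data into mini-batches, merges and shuffles
    them, then updates the shared parameter and that task's head per batch.
    """
    rng = random.Random(seed)
    w = 0.0  # shared "encoder" parameter
    heads = {name: 0.0 for name in tasks}  # task-specific output layers
    for _ in range(epochs):
        batches = [(name, b) for name, data in tasks.items()
                   for b in make_batches(data, batch_size)]
        rng.shuffle(batches)
        for name, batch in batches:
            # Gradients of 0.5 * mean squared error on this task's batch.
            errors = [(w * x + heads[name]) - y for x, y in batch]
            grad_w = sum(e * x for e, (x, _y) in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            w -= lr * grad_w
            heads[name] -= lr * grad_b
    return w, heads


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
tasks = {
    "task_a": [(x, 2 * x + 1) for x in xs],  # shares the slope, bias +1
    "task_b": [(x, 2 * x - 3) for x in xs],  # shares the slope, bias -3
}
w, heads = train_multitask(tasks)
print(round(w, 3), {k: round(v, 3) for k, v in heads.items()})
```

Because both tasks push on the same shared parameter, it converges to what the tasks have in common (slope 2), while each head absorbs what is task-specific (biases 1 and -3). In MT-DNN the shared part is a Transformer encoder rather than one number.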
Host: Okay.
Jianfeng Gao: But mainly, I would say this is still like System One.
Host: Still back to the thinking fast?
Jianfeng Gao: Yeah, thinking fast. Fast thinking...
Host: Gotcha. That's a good anchor. Well, let's talk about an important line of work you're tackling that falls under the umbrella of vision and language. You call it VL.
Jianfeng Gao: Uh-huh. Vision-language.
Host: Vision-language. Give us a snapshot of the current VL landscape in terms of progress in the field, and then tell us what you're doing to advance the state of the art.
Jianfeng Gao: It's called vision-language, and the idea is the same: we still learn representations. Now we are learning a hidden semantic space where all objects are represented as vectors, no matter what the original medium of the object is. It could be text. It could be an image. It could be a video. Remember we talked about representation learning for natural language?
Host: Right.
Jianfeng Gao: Now we extend the concept, from the single modality of natural language to multiple modalities, to handle natural language, vision and video. The idea is: give me a video, an image or a text, and I will represent it using vectors.
Host: Okay.
Jianfeng Gao: If we do it correctly, this leads to many, many interesting applications. For example, you can do image search. You just type a query, like "I want an image of sleeping," and it returns all these images. See, that's cross-modal, because the query is in natural language and the returned result is an image. You can also do image captioning, for example.
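Once text and images live in one vector space, cross-modal search reduces to nearest-neighbor ranking. In this Python sketch the embeddings are hypothetical hand-picked vectors standing in for the outputs of a trained text encoder and image encoder; only the ranking step is shown, and cosine similarity is an assumed choice of similarity measure.

```python
import math

# Hypothetical embeddings standing in for a trained image encoder whose
# outputs share one semantic space with the text encoder (values are made up).
IMAGE_EMBEDDINGS = {
    "cat_sleeping_on_sofa.jpg": [0.9, 0.1, 0.0],
    "dog_running_on_beach.jpg": [0.1, 0.9, 0.2],
    "baby_asleep_in_crib.jpg": [0.8, 0.0, 0.3],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_images(query_embedding, image_embeddings, k=2):
    """Rank images by cosine similarity to a text query embedding; return the top k."""
    ranked = sorted(image_embeddings,
                    key=lambda name: cosine(query_embedding, image_embeddings[name]),
                    reverse=True)
    return ranked[:k]


query_embedding = [1.0, 0.0, 0.1]  # hypothetical embedding of the text "sleeping"
print(search_images(query_embedding, IMAGE_EMBEDDINGS))
```

Captioning runs in the other direction: the image embedding conditions a text decoder, which is beyond this sketch.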
Host: Okay.
Jianfeng Gao: The input can be an image.
Host: Right.
Jianfeng Gao: And the system will generate a description of the image automatically. This is very useful for, let's say, blind people.
Host: Yeah.
Jianfeng Gao: Yeah.
Host: Well, help me think, though, about other applications.
Jianfeng Gao: Other applications, as I said...
Host: Yeah.
Jianfeng Gao: ...for blind people, we have a big project called Seeing AI.
Host: Right.
Jianfeng Gao: The idea is, let's say you are blind, you're walking on the street and you're wearing glasses. The glasses take pictures of the surroundings for you and immediately tell you, oh, there's a car, there's a boy...
Host: So captioning audio?
Jianfeng Gao: Audio. It tells you what is happening around you. Another project we are working on is called Vision-and-Language Navigation. The idea is that we build a 3D environment. It's a simulation, but it's a 3D environment. We put a robot there, an agent, and you can ask the agent to accomplish a task by giving it natural language instructions: okay, go upstairs, turn left, open the door, grab a cup of coffee for me. Something like that. This is going to be very, very useful for scenarios like mixed reality and HoloLens.
Host: I was just going to say, you must be working with a lot of the researchers in VR and AR.
Jianfeng Gao: Yes. These are potential applications, but we are at an early stage of developing this core technology in the simulated environment.
Host: Right. So you're upstream in the VL category, and as it trickles down into various other applications, people can adapt the technology to what they're working on.
Jianfeng Gao: Exactly.
Host: Let's talk about the third area, and I think this is one of the most fascinating right now, and that's Conversational AI. I've had a couple of people on the podcast already who've talked a little bit about this: Riham Mansour and Patrice Simard, who's head of the Machine Teaching Group.
Jianfeng Gao: Yeah.
Host: But I'd like you to tell us about your work on neural approaches to Conversational AI and how they're being instantiated in the form of question answering agents, task-oriented dialog systems, or what we might call bespoke AI, and bots... chatbots.
Jianfeng Gao: Yeah, these are obviously all different types of dialog. Social chatbots are extremely interesting. Do you know Microsoft Xiaoice?
Host: I know of it.
Jianfeng Gao: Yeah, it's a very popular social chatbot. It has attracted more than six hundred million users.
Host: And is this in China or worldwide?
Jianfeng Gao: It's deployed in five different countries. It has a Chinese version, a Japanese version, an English version. It comes in five different languages.
Host: Wow.
Jianfeng Gao: Yeah, it's very interesting.
Host: Do you have it?
Jianfeng Gao: I have it on my WeChat.
Host: All right, so tell me about it.
Jianfeng Gao: Yeah, it's an AI agent, but the design goal of this social chatbot is different from that of, let's say, a task-oriented bot. A task-oriented bot is mainly there to help you accomplish a particular task. For example, you can use it to book a movie ticket, reserve a table at a restaurant...
Host: Get directions...
Jianfeng Gao: Yeah, get directions. The social chatbot, by contrast, is designed as an AI companion, which can eventually establish emotional connections with the user.
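The task-oriented side of this contrast has a well-known classical structure: the bot tracks a frame of slots for the task and keeps asking until every required slot is filled. The Python sketch below is a deliberately simple rule-based illustration of that structure (dialog state tracking plus a dialog policy), not Microsoft's system. The slot names and prompts are hypothetical, and the user's turns arrive already parsed, whereas a neural approach would learn language understanding, state tracking and the policy from data.

```python
# Hypothetical slots and prompts for a movie-ticket booking task.
REQUIRED_SLOTS = ["movie", "showtime", "num_tickets"]
PROMPTS = {
    "movie": "Which movie would you like to see?",
    "showtime": "What time works for you?",
    "num_tickets": "How many tickets?",
}


def update_state(state, parsed_user_turn):
    """Dialog state tracking: merge newly recognized slot values into the frame."""
    new_state = dict(state)
    new_state.update({k: v for k, v in parsed_user_turn.items() if k in REQUIRED_SLOTS})
    return new_state


def next_action(state):
    """Dialog policy: ask for the first missing slot, otherwise confirm the booking."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return ("request", slot, PROMPTS[slot])
    return ("confirm", None,
            f"Booking {state['num_tickets']} ticket(s) for {state['movie']} at {state['showtime']}.")


state = {}
for turn in [{"movie": "Inception"}, {"showtime": "7pm", "num_tickets": 2}]:
    state = update_state(state, turn)
    print(next_action(state))
```

A social chatbot has no such fixed frame to fill, which is one reason its design goal, long-term engagement, is so different.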
Host: Wow.
Jianfeng Gao: So you can treat it as a friend, as your friend.
Host: So an AI friend instead of an imaginary friend.
Jianfeng Gao: Yes, it's an AI friend. It can chat with you about all sorts of topics. It can also help you accomplish tasks if they're simple enough.
Host: Right now I want to dive a little deeper into the topic of neural symbolic AI. This is an approach to AI that borrows from mathematical theory on how the human brain encodes and processes symbols. We've talked about it a little bit, but what are you hoping to accomplish with neural symbolic AI that we aren't accomplishing now?
Jianfeng Gao: As I said, the key difference between this approach and a regular deep learning model is the capability of reasoning. A deep learning model is like a black box you cannot open: you feed in an input and get an output. This model can, on the fly, identify the necessary components and assemble them. That's the key difference. The old deep learning model is just one model, a black box. Now it's not a black box. It's actually exactly like the way people think.
Host: Mm-hmm.
Jianfeng Gao: When you face a problem, first of all you divide and conquer, right? You divide a complex problem into smaller ones. Then, for each smaller one, you search your memory and identify the solution. And you assemble all these solutions together to solve the problem. This problem may never have been seen before. It could be a new problem.
Host: Right.
Jianfeng Gao: That's the power of the neural symbolic approach.
Host: So it sounds like, and I think this goes back to the mission statement of your group, you are working with deep learning toward artificial general intelligence?
Jianfeng Gao: Yeah. This is a very significant step toward that, and it's about knowledge reusability, right? By learning the capability of decomposing a complex problem into simpler ones, you know how to solve a new complex problem and reuse existing technologies. This is the way we solve that problem.
Host: Okay.
Jianfeng Gao: I think the neural symbolic approach tries to mimic the way people solve problems.
Host: Right.
Jianfeng Gao: People... as I said, it's like System One and System Two... For these sophisticated problems, people use System Two.
Host: Right.
Jianfeng Gao: You need to analyze the problem, identify the key steps, and then, for each step, identify the solution.
Host: All right, our audience is very technical, so could you go into a bit of a deeper dive on how you're doing this, computationally and mathematically, to construct these neural symbolic architectures?
Jianfeng Gao: Yeah, there are many different ways. The learning challenge is that we have a lot of data, but we don't have labels for the intermediate steps. So the model needs to learn these intermediate steps automatically. In some sense, they are hidden variables. There are many different ways of learning this.
Host: Right.
Jianfeng Gao: So there are different approaches. One approach is reinforcement learning. You try assembling components in different ways to generate an answer, and if that doesn't give you the answer, you trace back and try different combinations. So, yeah, that's one way of learning this. As long as the model can explore all sorts of combinations in very efficient ways, we can solve this problem. Think about how people solve sophisticated problems: when we're young, we learn to solve simple problems. Then we learn the skill. Then we combine these basic skills to solve more sophisticated ones. We try to mimic this human learning pattern using neural symbolic models.
Host: Mm-hmm.
Jianfeng Gao: In that case, you don't need to label a lot of data. You label some. Eventually, the model learns two things. First, it learns to solve all these basic tasks, and more importantly, it learns how to assemble these basic skills to solve more sophisticated tasks.
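The setting described here, where only the input and the final answer are supervised, the intermediate steps are hidden, and the system tries combinations and traces back when they fail, can be illustrated with a toy search. The Python sketch below substitutes plain backtracking search (iterative-deepening depth-first search) for the reinforcement learning mentioned above, so it shows the search space and the role of backtracking, not how a neural model would learn a policy to explore that space efficiently. The three skills and the start/target values are hypothetical.

```python
# Basic "skills" that are assumed to be already learned (hypothetical primitives).
SKILLS = {
    "add1": lambda n: n + 1,
    "double": lambda n: n * 2,
    "square": lambda n: n * n,
}


def find_program(start, target, max_depth):
    """Iterative-deepening DFS with backtracking over compositions of skills.

    Only (start, target) is given; the intermediate steps are latent and are
    discovered by trying combinations and tracing back on failure.
    """
    def dfs(value, depth, path):
        if value == target:
            return path
        if depth == 0:
            return None
        for name, skill in SKILLS.items():
            found = dfs(skill(value), depth - 1, path + [name])
            if found is not None:
                return found
        return None  # dead end: backtrack so the caller tries its next skill

    # Try short programs first, so the first hit is a shortest composition.
    for depth in range(max_depth + 1):
        program = dfs(start, depth, [])
        if program is not None:
            return program
    return None


print(find_program(3, 64, max_depth=4))
```

Here the search recovers the composition add1, double, square (3 → 4 → 8 → 64) without ever being told those intermediate values.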
Host: The idea of pre-training models is getting a lot of attention right now and has been framed as "AI in the big leagues" or "a new AI paradigm," so talk about the work going on across the industry in pre-trained models and what MSR is bringing to the game.
Jianfeng Gao: The goal of these pre-trained models is to learn a universal representation of natural language. There are two strategies for learning this universal representation. One is to train the model on large amounts of data. If you have all the data in the world, you can be pretty sure the trained model is universal.
Host: Right.
Jianfeng Gao: The other is multi-task learning. The Unified Language Model uses multi-task learning in the pre-training stage.
Host: Okay.
Jianfeng Gao: We group language modeling into three different tasks. First, given the left and right context, predict the word in the middle. Second, given an input sentence, produce an output sentence. Third, given a sequence, predict the next word based on the history. These are three very different tasks that cover a lot of natural language processing scenarios. We use multi-task learning for the Unified Language Model: given the training data, we use three different objective functions to jointly learn...
Host: Okay.
Jianfeng Gao: ...the model parameters. The main advantage of the Unified Language Model is that it can be applied to both natural language understanding tasks and natural language generation tasks.
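In the published UniLM work, a single shared Transformer handles these objectives by changing its self-attention mask, which controls which tokens each position may look at. The Python sketch below builds the three masks described above as plain 0/1 matrices (1 means "may attend"); the function names are illustrative, and the Transformer itself is omitted.

```python
def bidirectional_mask(n):
    """Cloze-style objective: every token may attend to every token."""
    return [[1] * n for _ in range(n)]


def left_to_right_mask(n):
    """Next-word prediction: token i may attend only to tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def seq2seq_mask(src_len, tgt_len):
    """Source tokens see the whole source; each target token sees the whole
    source plus the target tokens up to and including itself."""
    n = src_len + tgt_len
    mask = []
    for i in range(n):
        if i < src_len:
            mask.append([1 if j < src_len else 0 for j in range(n)])
        else:
            mask.append([1 if j <= i else 0 for j in range(n)])
    return mask


for row in seq2seq_mask(src_len=2, tgt_len=3):
    print(row)
```

Since the three objectives share all parameters and differ only in the mask, the same network can be used with a bidirectional mask for understanding tasks and with the left-to-right or sequence-to-sequence mask for generation tasks.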
(music plays)
Host: AI is arguably the most powerful technology to emerge in the last century, and it's becoming ubiquitous in this century. Given the nature of the work you do, and its potential to cause big disruptions both in technology and in culture or society, is there anything that keeps you up at night? And if so, how are you working to anticipate and mitigate the negative consequences that might result from any of the work you're putting out?
Jianfeng Gao: Yeah, there are a lot of open questions, especially at Microsoft, where we are building AI products for millions of users, right? Our users are all very different. Take Microsoft Xiaoice, the chatbot system, as an example. To have a very engaging conversation, Xiaoice will sometimes tell you a joke. You may find the joke very interesting and funny, but other people may find it offensive. That's about culture. It's very difficult to find the trade-off. You want the conversation to be interesting enough to engage people, but you also don't want to offend anyone. There is also a lot of guidance about who is in control. For example, if you want to switch topics, do you allow your agent to switch topics, or does the agent always follow the topic...
Host: Of the user...
Jianfeng Gao: ...of the user? Generally, people agree that for all human-machine systems, the human needs to be in control all the time. But in reality there are a lot of exceptions, for example, what happens if the agent notices the user is going to hurt herself?
Host: Right.
Jianfeng Gao: For example, in one situation, we found that a user had been talking to Xiaoice for seven hours. It was already 2 a.m. Xiaoice forced the user to take a break. We have a lot of rules embedded into the system to make sure that we build the system for good, and that people do not misuse the AI technology for something that is not good.
Host: So you're actually building those kinds of things in, like, "Go to bed. It's past your bedtime..."?
Jianfeng Gao: Mm-hmm. Or something like that, yeah. Just a reminder.
Host: Right. So let's drill in a little on this topic, because one of the things we think of when we think of dystopian manifestations of a technology is that it could convince us it's human... Where does the psychological...
Jianfeng Gao: I... I think the entire research community is working together to set up rules and to set the right expectations for our users. For example, one rule that I believe is true is that you should never confuse users about whether they're talking to a bot or a real human. You should never confuse users.
Host: Forget about Xiaoice for now and just talk about the other stuff you're working on. Are there any big issues in your mind that don't have to do with users spending too long with a chatbot or whatever, but kinds of unintended consequences that might occur from any of the other work?
Jianfeng Gao: Well, for example, with respect to deep learning models, right?
Host: Right.
Jianfeng Gao: Deep learning models are very powerful at predicting things. People use deep learning models for recommendations all the time, but these models have a very serious limitation: they can learn correlation, but not causation. For example, say I want to hire a software developer, and I've got a lot of candidates. I ask the system to give me a recommendation. The deep learning model recommends someone and says, oh, this guy's good. Then I ask the system why, and it says: because the candidate is male. People would say, your system is wrong; it's biased. But actually, the system is not wrong. The way we use the system is wrong, because the system learned a strong correlation between gender and the job title, but there's no causality. The system has no notion of causality at all. A famous example is that there's a strong correlation between a rooster's crow and the sunrise, but the crow does not cause the sunrise at all! These are the problems of these deep learning models. People need to be aware of the models' limitations so that they do not misuse them.
Host: So, one step further, are there ways you can move toward causality?
Jianfeng Gao: Yes, there is a lot of ongoing work. There's a recent book called The Book of Why.
Host: The Book of Why.
Jianfeng Gao: Yeah, The Book of Why by Judea Pearl. He has been developing a lot of new models. One of the popular models is the Bayesian network. Of course, the Bayesian network can be used in many applications, but he believes it is at least a promising tool for implementing causal models.
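The rooster example maps onto the smallest possible causal Bayesian network: a single edge from dawn to the rooster's crow. The Python sketch below uses made-up probabilities to contrast the two kinds of question Pearl distinguishes: observing a crow (conditioning, which is all a purely correlational model captures) versus making the rooster crow (an intervention, written do(crow)). The variable names and numbers are assumptions for illustration.

```python
# A two-node causal Bayesian network: dawn -> rooster_crows.
# The probabilities are illustrative assumptions, not measured data.
P_DAWN = 0.2                            # P(dawn = 1) for a randomly chosen hour
P_CROW_GIVEN_DAWN = {1: 0.95, 0: 0.05}  # P(crow = 1 | dawn)


def p_dawn_given_crow(crow=1):
    """Observational query P(dawn=1 | crow) via Bayes' rule."""
    def p_crow(dawn):
        p = P_CROW_GIVEN_DAWN[dawn]
        return p if crow == 1 else 1 - p
    joint_dawn = P_DAWN * p_crow(1)
    joint_no_dawn = (1 - P_DAWN) * p_crow(0)
    return joint_dawn / (joint_dawn + joint_no_dawn)


def p_dawn_given_do_crow(crow=1):
    """Interventional query P(dawn=1 | do(crow)). Forcing the crow cuts the edge
    into 'crow'; dawn has no parents, so its probability is unchanged and the
    forced value of crow does not matter."""
    return P_DAWN


print(round(p_dawn_given_crow(1), 3))  # 0.826: seeing a crow makes dawn likely
print(p_dawn_given_do_crow(1))         # 0.2: making the rooster crow does not
```

A model trained only on observational data can learn the first number; answering the second requires knowing the direction of the arrow, which is the extra structure a causal model supplies.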
Host: I'm getting a reading list from this podcast! It's awesome. Well, we've talked about your professional path, Jianfeng. Tell us a little bit about your personal history. Where'd you grow up? Where did you get interested in computer science, and how did you end up in AI research?
Jianfeng Gao: I was born in Shanghai. I grew up in Shanghai, and I studied design in college. So I was not a computer science student at all. I learned to program only because I wanted to date a girl at that time. So I needed money!
Host: You learned to code so you could date a girl... I love it!
Jianfeng Gao: Then, when I was graduating in 1999, Microsoft Research founded a lab in China. I sent them my resume, got a chance to interview, and they accepted my application. That's it. After that, I started to work on AI. Before that, I knew little about AI.
Host: Okay, back up a little. What was your degree in? Design?
Jianfeng Gao: I got an undergraduate degree in design, a bachelor's degree in design. Then I got an electronic... I got a Double E.
Host: So electronic engineering?
Jianfeng Gao: Yeah, then computer science a little bit later, because I got interested in computer science after... Finally I got a computer science degree.
Host: A PhD?
Jianfeng Gao: A PhD, yeah.
Host: Did you do that in Shanghai or Beijing?
Jianfeng Gao: Shanghai.
Host: So in 1999, you came to Microsoft Research.
Jianfeng Gao: Yeah, in China.
Host: Okay, and then you came over here, or...
Jianfeng Gao: Then in 2005 I moved to Redmond and joined a product group. My mission at that time was to build the first natural user interface for Microsoft Windows Vista. And we couldn't make it! After one year, I joined Microsoft Research here...
Host: All right!
Jianfeng Gao: ...as there was a lot more fundamental work to do before we could build a real system for users.
Host: "Let's go upstream a little..." Okay.
Jianfeng Gao: Then I worked for eight years at Microsoft Research in the EPIC group.
Host: And now you're Partner Research Manager for the Deep Learning Group...
Jianfeng Gao: Yeah. Yeah, yeah, yeah...
Host: What's one interesting thing that people don't know about you? Maybe it's a personal trait or a hobby or side quest that may have influenced your career as a researcher?
Jianfeng Gao: I remember, when I interviewed for Microsoft Research, I failed almost all the questions, and finally I said, okay, it's hopeless. I went home, and the next day I got a phone call saying, you're hired. In retrospect, I think that although I did not give the right answers, I asked the right questions during the interview. I think it is very important for researchers to learn how to ask the right questions!
Host: That's funny. How do you get a wrong answer in an interview?
Jianfeng Gao: Because I was asked all these questions about speech and natural language, and I had no idea at all. I remember, at that time, he asked me to figure out an algorithm called Viterbi. I had never heard of it. So I asked a lot of questions, and he answered some of them. Then later he said, I cannot answer any more questions, because if I answer this one, you will get the answer. That shows I asked the right questions!
Host: Let\u2019s close with some thoughts on the potential ahead.\u00a0<\/b>And h<\/b>ere\u2019s your chance to talk to would be researchers out there who will take the AI baton and run with it for the next couple of decades. What advice or direction would you give to your future colleagues<\/b>,<\/b>\u00a0or even your future successors?<\/b><\/p>\n
Jianfeng\u00a0Gao: I think, first of all, you need to be passionate about research. It\u2019s critical to identify the problem you really want to devote your life to working on. That\u2019s number one. Number two: once you have identified the problem you want to work on, stay focused. Number three: keep your eyes open. That\u2019s my advice.<\/p>\n
Host: Is that how you did yours?<\/b><\/p>\n
Jianfeng\u00a0Gao: I think so!<\/p>\n
Host: Jianfeng\u00a0Gao, thank you for joining us today!<\/b><\/p>\n
Jianfeng\u00a0Gao: Thanks for having me.<\/p>\n
(music plays)<\/i><\/b><\/p>\n
To learn more about Dr.\u00a0Jianfeng\u00a0Gao and how researchers are going deeper on deep learning, visit Microsoft.com\/research<\/a><\/i><\/b><\/p>\n","protected":false},"excerpt":{"rendered":"Dr. Jianfeng Gao is a veteran computer scientist, an IEEE Fellow and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas like NLP, computer vision, multi-modal intelligence and conversational AI. Today, Dr. Gao gives us an overview of the deep learning landscape and talks about his latest work on Multi-task Deep Neural Networks, Unified Language Modeling and vision-language pre-training. He also unpacks the science behind task-oriented dialog systems as well as social chatbots like Microsoft Xiaoice, and gives us some great book recommendations along the way!<\/p>\n","protected":false},"author":39507,"featured_media":632754,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"https:\/\/player.blubrry.com\/id\/54624086\/","msr-podcast-episode":"104","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[240054],"tags":[],"research-area":[13561,13556,13562,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-632745","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-msr-podcast","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"https:\/\/player.blubrry.com\/id\/54624086\/","podcast_episode":"104","msr_research_lab":[],"msr_impact_the
me":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[952050,144931],"related-projects":[171429],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"","byline":"","formattedDate":"January 29, 2020","formattedExcerpt":"Dr. Jianfeng Gao is a veteran computer scientist, an IEEE Fellow and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas like NLP, computer vision, multi-modal…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/632745"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=632745"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/632745\/revisions"}],"predecessor-version":[{"id":886947,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/632745\/revisions\/886947"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/632754"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=632745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=632745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tag
s?post=632745"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=632745"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=632745"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=632745"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=632745"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=632745"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=632745"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=632745"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=632745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}