{"id":602724,"date":"2019-08-21T07:56:58","date_gmt":"2019-08-21T14:56:58","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=602724"},"modified":"2022-11-07T12:31:13","modified_gmt":"2022-11-07T20:31:13","slug":"machine-reading-comprehension-with-dr-t-j-hazen","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/machine-reading-comprehension-with-dr-t-j-hazen\/","title":{"rendered":"Machine reading comprehension with Dr. T.J. Hazen"},"content":{"rendered":"
The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! And not if Dr. T.J. Hazen<\/a>, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal<\/a>, has a say. He\u2019s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC<\/a>.<\/p>\n On today\u2019s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik\u2019s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world.<\/p>\n T.J. Hazen: Most of the questions are fact-based questions like, who did something, or when did something happen? And most of the answers are fairly easy to find. So, you know, doing as well as a human on a task is fantastic, but it only gets you part of the way there. What happened is, after this was announced that Microsoft had this great achievement in machine reading comprehension, lots of customers started coming to Microsoft saying, how can we have that for our company? And this is where we\u2019re focused right now. How can we make this technology work for real problems that our enterprise customers are bringing in?<\/p>\n Host: You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting edge of technology research and the scientists behind it. I\u2019m your host, Gretchen Huizinga.<\/strong><\/p>\n Host: The ability to read and understand unstructured text, and then answer questions about it, is a common skill among literate humans. But for machines? Not so much. At least not yet! 
And not if Dr. T.J. Hazen, Senior Principal Research Manager in the Engineering and Applied Research group at MSR Montreal, has a say. He\u2019s spent much of his career working on machine speech and language understanding, and particularly, of late, machine reading comprehension, or MRC.<\/strong><\/p>\n On today\u2019s podcast, Dr. Hazen talks about why reading comprehension is so hard for machines, gives us an inside look at the technical approaches applied researchers and their engineering colleagues are using to tackle the problem, and shares the story of how an a-ha moment with a Rubik\u2019s Cube inspired a career in computer science and a quest to teach computers to answer complex, text-based questions in the real world. That and much more on this episode of the Microsoft Research Podcast.<\/strong><\/p>\n (music plays)<\/strong><\/p>\n Host: T.J. Hazen, welcome to the podcast!<\/strong><\/p>\n T.J. Hazen: Thanks for having me.<\/p>\n Host: Researchers like to situate their research, and I like to situate my researchers so let\u2019s get you situated. You are a Senior Principal Research Manager in the Engineering and Applied Research group at Microsoft Research in Montreal. Tell us what you do there. What are the big questions you\u2019re asking, what are the big problems you\u2019re trying to solve, what gets you up in the morning?<\/strong><\/p>\n T.J. Hazen: Well, I\u2019ve spent my whole career working in speech and language understanding, and I think the primary goal of everything I do is to try to be able to answer questions. So, people have questions and we\u2019d like the computer to be able to provide answers. So that\u2019s sort of the high-level goal, how do we go about answering questions? Now, answers can come from many places.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: A lot of the systems that you\u2019re probably aware of like Siri for example, or Cortana or Bing or Google, any of them\u2026<\/p>\n Host: Right.<\/strong><\/p>\n T.J. 
Hazen: \u2026the answers typically come from structured places, databases that contain information, and for years these models have been built in a very domain-specific way. If you want to know the weather, somebody built a system to tell you about the weather.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: And somebody else might build a system to tell you about the age of your favorite celebrity and somebody else might have written a system to tell you about the sports scores, and each of them can be built to handle that very specific case. But that limits the range of questions you can ask because you have to curate all this data, you have to put it into structured form. And right now, what we\u2019re worried about is, how can you answer questions more generally, about anything? And the internet is a wealth of information. The internet has got tons and tons of documents on every topic, you know, in addition to the obvious ones like Wikipedia. If you go into any enterprise domain, you\u2019ve got manuals about how their operation works. You\u2019ve got policy documents. You\u2019ve got financial reports. And it\u2019s not typical that all this information is going to be curated by somebody. It\u2019s just sitting there in text. So how can we answer any question about anything that\u2019s sitting in text? We don\u2019t have a million or five million or ten million librarians doing this for us\u2026<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: \u2026uhm, but the information is there, and we need a way to get at it.<\/p>\n Host: Is that what you are working on?<\/strong><\/p>\n T.J. Hazen: Yes, that\u2019s exactly what we\u2019re working on. I think one of the difficulties with today\u2019s systems is, they seem really smart\u2026<\/p>\n Host: Right?<\/strong><\/p>\n T.J. Hazen: Sometimes. Sometimes they give you fantastically accurate answers. 
But then you can just ask a slightly different question and it can fall on its face.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: That\u2019s the real gap between what the models currently do, which is, you know, really good pattern matching some of the time, versus something that can actually understand what your question is and know when the answer that it\u2019s giving you is correct.<\/p>\n Host: Let\u2019s talk a bit about your group, which, out of Montreal, is Engineering and Applied Research. And that\u2019s an interesting umbrella at Microsoft Research. You\u2019re technically doing fundamental research, but your focus is a little different from some of your pure research peers. How would you differentiate what you do from others in your field?<\/strong><\/p>\n T.J. Hazen: Well, I think there\u2019s two aspects to this. The first is that the lab up in Montreal was created as an offshoot of an acquisition. Microsoft bought Maluuba, which was a startup that was doing really incredible deep learning research, but at the same time they were a startup and they needed to make money. So, they also had this very talented engineering team in place to be able to take the research that they were doing in deep learning and apply it to problems where it could go into products for customers.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: When you think about that need that they had to actually build something, you could see why they had a strong engineering team.<\/p>\n Host: Yeah.<\/strong><\/p>\n T.J. Hazen: Now, when I joined, I wasn\u2019t with them when they were a startup, I actually joined them from Azure where I was working with outside customers in the Azure Data Science Solution team, and I observed lots of problems that our customers have. 
And when I saw this new team that we had acquired and we had turned into a research lab in Montreal, I said I really want to be involved because they have exactly the type of technology that can solve customer problems and they have this engineering team in place that can actually deliver on turning a concept into something real.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: So, I joined, and I had this agreement with my manager that we would focus on real problems. They were now part of the research environment at Microsoft, but I said that doesn\u2019t restrict us on thinking about blue sky, far-afield research. We can go and talk to product teams and say what are the real problems that are hindering your products, you know, what are the difficulties you have in actually making something real? And we could focus our research to try to solve those difficult problems. And if we\u2019re successful, then we have an immediate product that could be beneficial.<\/p>\n Host: Well in any case, you\u2019re swimming someplace in a \u201cwe could do this immediately\u201d but you have permission to take longer, or is there a mandate, as you live in this engineering and applied research group?<\/strong><\/p>\n T.J. Hazen: I think there\u2019s a mandate to solve hard problems. I think that\u2019s the mandate of research. If it wasn\u2019t a hard problem, then somebody\u2026<\/p>\n Host: \u2026would already have a product.<\/strong><\/p>\n T.J. Hazen: \u2026in the product team would already have a solution, right? So, we do want to tackle hard problems. But we also want to tackle real problems. That\u2019s, at least, the focus of our team. And there\u2019s plenty of people doing blue sky research and that\u2019s an absolute need as well. You know, we can\u2019t just be thinking one or two years ahead. Research should also be thinking five, ten, fifteen years ahead.<\/p>\n Host: So, there\u2019s a whole spectrum there.<\/strong><\/p>\n T.J. 
Hazen: So, there\u2019s a spectrum. But there is a real need, I think, to fill that gap between taking an idea that works well in a lab and turning it into something that works well in practice for a real problem. And that\u2019s the key. And many of the problems that have been solved by Microsoft have not just been blue sky ideas, but they\u2019ve come from this problem space where a real product says, ahh, we\u2019re struggling with this. So, it could be anything. It can be, like, how does Bing efficiently rank documents over billions of documents? You don\u2019t just solve that problem by thinking about it, you have to get dirty with the data, you have to understand what the real issues are. So, many of these research problems that we\u2019re focusing on are like that. We\u2019re focusing on, how do you answer questions out of documents when the questions could be arbitrary, and on any topic? And you\u2019ve probably experienced this, if you are going into a search site for your company, that company typically doesn\u2019t have the advantage of having a big Bing infrastructure behind it that\u2019s collecting all this data and doing sophisticated machine learning. Sometimes it\u2019s really hard to find an answer to your question. And, you know, the tricks that people use can be creative and inventive but oftentimes, trying to figure out what the right keywords are to get you to an answer is not the right thing.<\/p>\n Host: You work closely with engineers on the path from research to product. So how does your daily proximity to the people that reify your ideas as a researcher impact the way you view, and do, your work as a researcher?<\/strong><\/p>\n T.J. Hazen: Well, I think when you\u2019re working in this applied research and engineering space, as opposed to a pure research space, it really forces you to think about the practical implications of what you\u2019re building. How easy is it going to be for somebody else to use this? Is it efficient? 
Is it going to run at scale? All of these problems are problems that engineers care a lot about. And sometimes researchers just say, let me solve the problem first and everything else is just engineering. If you say that to an engineer, they\u2019ll be very frustrated because you don\u2019t want to bring something to an engineer that works ten times slower than it needs to be, or uses ten times more memory. So, when you\u2019re in close proximity to engineers, you\u2019re thinking about these problems as you are developing your methods.<\/p>\n Host: Interesting, because those two things, I mean, you could come up with a great idea that would do it and you pay a performance penalty in spades, right?<\/strong><\/p>\n T.J. Hazen: Yeah, yeah. So, sometimes it\u2019s necessary. Sometimes you don\u2019t know how to do it and you just say let me find a solution that works and then you spend ten years actually trying to figure out how to make it work in a real product.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: And I\u2019d rather not spend that time. I\u2019d rather think about, you know, how can I solve something and have it be effective as soon as possible?<\/p>\n (music plays)<\/p>\n Host: Let\u2019s talk about human language technologies<\/a>. They\u2019ve been referred to by some of your colleagues as \u201cthe crown jewel of AI<\/a>.\u201d Speech and language comprehension is still a really hard problem. Give us a lay of the land, both in the field in general and at Microsoft Research specifically. What\u2019s hope and what\u2019s hype, and what are the common misconceptions that run alongside the remarkable strides you actually are making?<\/strong><\/p>\n T.J. Hazen: I think that word we mentioned already: understand. That\u2019s really the key of it. Or comprehend is another way to say it. What we\u2019ve developed doesn\u2019t really understand, at least when we\u2019re talking about general purpose AI. 
So, the deep learning mechanisms that people are working on right now can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don\u2019t understand what they\u2019re learning.<\/p>\n Host: Right.<\/strong><\/p>\n T.J. Hazen: So, they can discover complex patterns that can associate things. So in the vision domain, you know, if you\u2019re trying to identify objects, and then you go in and see what the deep learning algorithm has learned, it might have learned features that are like, uh, you know, if you\u2019re trying to identify a dog, it learns features that would say, oh, this is part of a leg, or this is part of an ear, or this is part of the nose, or this is the tail. It doesn\u2019t know what these things are, but it knows they all go together. And the combination of them will make a dog. And it doesn\u2019t know what a dog is either. But the idea that you could just feed data in and you give it some labels, and it figures everything else out about how to associate that label with that, that\u2019s really impressive learning, okay? But it\u2019s not understanding. It\u2019s just really sophisticated pattern-matching. And the same is true in language. We\u2019ve gotten to the point where we can answer general-purpose questions and it can go and find the answer out of a piece of text, and it can do it really well in some cases, and like, some of the examples we\u2019ll give it, we\u2019ll give it \u201cwho\u201d questions and it learns that answers to \u201cwho\u201d questions should contain proper names or names of organizations. And answers to \u201cwhen\u201d questions should express concepts of time. It doesn\u2019t know anything about what time is, but it\u2019s figured out the patterns about, how can I relate a question like \u201cwhen\u201d to an answer that contains a time expression? And that\u2019s all done automatically. 
There\u2019s no features that somebody sits down and says, oh, this is a month and a month means this, and this is a year, and a year means this. And a month is a part of a year. Expert AI systems of the past would do this. They would create ontologies and they would describe things about how things are related to each other and they would write rules. And within limited domains, they would work really, really well if you stayed within a nice, tightly constrained part of that domain. But as soon as you went out and asked something else, it would fall on its face. And so, we can\u2019t really generalize that way efficiently. If we want computers to be able to learn arbitrarily, we can\u2019t have a human behind the scene creating an ontology for everything. That\u2019s the difference between understanding and crafting relationships and hierarchies versus learning from scratch. We\u2019ve gotten to the point now where the algorithms can learn all these sophisticated things, but they really don\u2019t understand the relationships the way that humans understand it.<\/p>\nEpisode 86, August 21, 2019<\/h3>\n
\nTranscript<\/h3>\n