{"id":624426,"date":"2019-12-04T08:59:15","date_gmt":"2019-12-04T16:59:15","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=624426"},"modified":"2019-12-04T09:06:13","modified_gmt":"2019-12-04T17:06:13","slug":"metalearned-neural-memory-teaching-neural-networks-how-to-remember","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/metalearned-neural-memory-teaching-neural-networks-how-to-remember\/","title":{"rendered":"Metalearned Neural Memory: Teaching neural networks how to remember"},"content":{"rendered":"<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-625128\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788.gif\" alt=\"Animation depicting Metalearned Neural Memory network.\" width=\"1400\" height=\"788\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>Memory is an important part of human intelligence and the human experience. It grounds us in the current moment, helping us understand where we are and, consequently, what we should do next. Consider the simple example of reading a book. The ultimate goal is to understand the story, and memory is the reason we\u2019re able to do so. Memory allows us to efficiently store the information we encounter and later recall the details we\u2019ve previously read, whether that be moments earlier or weeks, to piece together the full narrative. Memory is equally important in deep learning, especially when the goal is to create models with advanced capabilities. In the fields of natural language understanding and processing, for example, memory is crucial for modeling long-term dependencies and building representations of partially observable states.<\/p>\n<p>In a <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/metalearned-neural-memory\/\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> published at the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/neurips.cc\/\">33rd Conference on Neural Information Processing Systems (NeurIPS)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, we demonstrate how to use a deep neural network itself as a memory mechanism. We propose a new model, Metalearned Neural Memory (MNM), in which we store data in the parameters of a deep network and use the function defined by that network to recall the data.<\/p>\n<p><span style=\"color: #000000;\">Deep networks\u2014powerful and flexible function approximators capable of generalizing from training data or memorizing it\u2014have seen limited use as memory modules, as writing information into network parameters is slow. Deep networks require abundant data and many steps of gradient descent to learn. Fortunately, recent progress in few-shot learning and metalearning has shown how we might overcome this challenge. Methods from these fields can discover update procedures that optimize neural parameters from many fewer examples than standard stochastic gradient descent. It\u2019s through metalearning techniques that MNM learns to remember: It learns how to read from and write to memory, as opposed to using hard-coded read\/write operations like most existing computational memory mechanisms.<\/span><\/p>\n<div id=\"attachment_624432\" style=\"width: 526px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-624432\" class=\" wp-image-624432\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-scaled.jpg\" alt=\"Figure describing Metalearned Neural Memory (MNM)\" width=\"516\" height=\"324\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-scaled.jpg 2560w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-300x188.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-1024x642.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-768x482.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-1536x963.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/mnm_schema-2048x1284.jpg 2048w\" sizes=\"(max-width: 516px) 100vw, 516px\" \/><p id=\"caption-attachment-624432\" class=\"wp-caption-text\"><span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Figure 1: Metalearned Neural Memory (MNM) employs a feedforward network for memory and a recurrent neural network controller for writing to and reading from the memory. In the above illustration of the model, green and blue arrows indicate data flows for writing and reading operations, respectively.<\/p><\/div>\n<h3><strong>Learning how to remember: <\/strong><strong>Reading, writing, and <\/strong><strong>metalearning<\/strong><\/h3>\n<p>With MNM, we combine a fully connected feedforward network for memory with a recurrent neural network (RNN) controller. The controller writes to and reads from the neural memory.<\/p>\n<p>During reading, the feedforward network acts as a function that maps keys to values. We pass a <em>read-out<\/em> key from the controller as input to the neural memory, then take the memory\u2019s corresponding output vector as the read-out value. These values are what the neural memory \u201cretrieves\u201d; they\u2019re passed back into the controller for use in downstream tasks. In the process of writing, we update the memory function\u2019s parameters\u2014minimizing the error between an output vector resulting from a specific <em>write-in<\/em> key and a target value\u2014such that the memory function will retrieve the desired values accurately. Both the target value and the write-in key come from the controller. The controller thereby decides what it wants to store in memory and how it wants to trigger it.<\/p>\n<p>The parameter updates for writing to memory happen continually, at both training <em>and<\/em> test time. The controller parameters\u2014which are distinct from the memory parameters\u2014are trained end-to-end using a task objective reflective of the particular model and a meta objective for learning good update strategies. Importantly, gradients of the meta and task objectives include the memory write computations; through them, the controller <em>learns<\/em> to change the memory parameters to accommodate new data.<\/p>\n<h3><strong>A novel gradient-free learned local update rule<\/strong><\/h3>\n<p>We trained MNM using two alternative memory update procedures: a gradient-based method and a novel gradient-free learned local update rule. We refer to MNM trained with the former as <em>MNM-g<\/em> and MNM trained with the latter as <em>MNM-p<\/em>.<\/p>\n<p>Utilizing standard gradient descent to update parameters for memory writing is an obvious choice, but the approach, as we discovered, has its weaknesses. It may require <em>multiple<\/em> sequential gradient steps to store information reliably, and it requires computing multiple orders of gradients\u2014those for updating the neural memory and the gradients of <em>those<\/em> gradients to optimize the meta objective. These higher-order gradients are computationally expensive, and their values tend to vanish to zero, yielding no training signal. Our learned local update rule can be optimized for fast writing while avoiding these challenges.<\/p>\n<div id=\"attachment_624435\" style=\"width: 434px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/local_update.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-624435\" class=\" wp-image-624435\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/local_update.jpg\" alt=\"Diagram showing how Metalearned Neural Memory uses two alternative update procedures\" width=\"424\" height=\"672\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/local_update.jpg 484w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/local_update-189x300.jpg 189w\" sizes=\"(max-width: 424px) 100vw, 424px\" \/><p id=\"caption-attachment-624435\" class=\"wp-caption-text\"><span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Figure 2: Metalearned Neural Memory uses two alternative update procedures\u2014a gradient-based method and a novel gradient-free learned local update rule\u2014to update parameters for memory writing in neural memory. Pictured above is the overall data flow in neural memory with the learned local update procedure. The main neural memory is in blue and the backward feedback prediction functions (BFPF) modules, used for predictions, are in orange.<\/p><\/div>\n<p>In executing the learned local update rule, we decouple the computation of each layer of the neural memory into separate forward propagation, shown in blue in Figure 2, and backward feedback prediction functions (BFPF), shown in orange. For a neural memory layer <em>l<\/em>, the BFPF makes a prediction for an expected activation z<sup>&#8216;<em>l<\/em><\/sup> based on the target value. Using the perceptron learning rule, we update the weights of the layer as follows, where <span style=\"display: inline !important; float: none; background-color: #ffffff; color: #333333; cursor: text; font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; -webkit-text-stroke-width: 0px; white-space: normal; word-spacing: 0px;\">z<\/span><sup><em>l<\/em><\/sup> and <span style=\"display: inline !important; float: none; background-color: #ffffff; color: #333333; cursor: text; font-family: Georgia,'Times New Roman','Bitstream Charter',Times,serif; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-decoration: none; text-indent: 0px; text-transform: none; -webkit-text-stroke-width: 0px; white-space: normal; word-spacing: 0px;\">z<\/span><sup><em>l-1<\/em><\/sup> are activations of the current and previous layers:<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-624624\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation1.png\" alt=\"Metalearned Neural Memory equation\" width=\"315\" height=\"75\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation1.png 378w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation1-300x71.png 300w\" sizes=\"(max-width: 315px) 100vw, 315px\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>For clarity, we\u2019ve omitted time index and the update rate here (see our <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/metalearned-neural-memory\/\">paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for the full equation).<\/p>\n<p>The perceptron update rule uses the predicted activation as a true target and approximates the gradient via the outer product:<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624627\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/MNM-equation2.png\" alt=\"Metalearned Neural Memory equation\" width=\"232\" height=\"99\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>Since this update is fully differentiable, we can easily adjust the parameters of the BFPF functions, as well.<\/p>\n<p>With the proposed learned local update rule, MNM writes to its weights simultaneously and locally, and its full computation graph for the forward pass need not be tracked for writing. This makes the proposed update method very efficient and easier to apply to more complex neural architectures for memory like RNNs.<\/p>\n<h3><strong>Interpreting learned memory function<\/strong><\/h3>\n<p>When put to the test against the bAbI question-answering benchmark, an industry standard for gauging long-term memory and reasoning that <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nature.com\/articles\/nature20101\">has generally proven difficult for neural models with soft lookup tables to solve<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, MNM-g and MNM-p both outperform several state-of-the-art models.<\/p>\n<p>The tasks in the benchmark\u201420 in total\u2014require MNM to read a story with multiple characters word-by-word before answering a question about it. Since the story is read before the question is known, the model has to track all of the characters in memory. By comparing the different combination of memory input keys and output values occurring during the tasks, we can visualize what is being read from and written to the memory; we observe very similar memory reading\/writing styles in MNM-g and MNM-p.<\/p>\n<p>Take MNM-g on the following bAbI story and question, for example:<\/p>\n<blockquote><p><em>Mary travelled to the garden. Mary moved to the bedroom. Sandra went to the bathroom. <strong>Daniel travelled to the bathroom.<\/strong> Mary travelled to the office. <strong>Daniel moved to the hallway.<\/strong> John moved to the kitchen. <strong>Daniel went to the garden.<\/strong><\/em><\/p>\n<p><em>Where is Daniel?<\/em><\/p><\/blockquote>\n<p>In Figure 3, each cell represents the similarity between the write-in keys or target values already generated for the words in the previously read sentences (x-axis) and the write-in keys or target values being generated as the model reads each word in the last sentence (y-axis), respectively. The higher the similarity, the brighter the color. A comparison of the current and past write-in keys can show where the memory is writing to; a comparison of the current and past target values can reveal what\u2019s being written. Together, they can tell us what key-value association is being made in the memory with the write operation.<\/p>\n<p>In looking at the write-in keys, there is a clear word-by-word alignment, for example, between the words in the sentences \u201cDaniel went to the garden\u201d and \u201cDaniel moved to the hallway,\u201d indicated by the brightly colored diagonal. The write-in keys are similar for the concepts related to the same character. When examining the target values, we spot a similar alignment except for the word pair \u201cgarden\u201d and \u201challway.\u201d For this pair, the target values are different while their keys are the same. This means that the model is creating a new association of \u201cDaniel\u201d-\u201cgarden\u201d by replacing the old one of \u201cDaniel\u201d-\u201challway\u201d to maintain a coherent memory structure as it continues to read the story.<\/p>\n<div id=\"attachment_624438\" style=\"width: 569px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-624438\" class=\" wp-image-624438\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing.jpg\" alt=\"A visualization of the memory writing.\" width=\"559\" height=\"270\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing.jpg 1774w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing-300x145.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing-1024x495.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing-768x371.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/memory_writing-1536x743.jpg 1536w\" sizes=\"(max-width: 559px) 100vw, 559px\" \/><p id=\"caption-attachment-624438\" class=\"wp-caption-text\"><span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Figure 3: Analysis of the input keys and output values occurring during bAbI question-answering tasks provides insight into the inner workings of the Metalearned Neural Memory model. Above is a visualization of the memory writing.<\/p><\/div>\n<p>We also investigated the behavior of a reinforcement learning agent with neural memory in a Grid World setup. Below we have shown the agent (orange), augmented with MNM-g, exploring the Grid World for the goal (red) on the left and its top five memory recalls on the right. Such analysis can show the agent\u2019s memory recall patterns while it\u2019s exploring the Grid World environment.<\/p>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/1b.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624600\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/1b.gif\" alt=\"GIF 1 showing Metalearned Neural Memory\" width=\"677\" height=\"158\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/2b.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624603\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/2b.gif\" alt=\"GIF 2 showing Metalearned Neural Memory\" width=\"677\" height=\"157\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/3b.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624606\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/3b.gif\" alt=\"GIF 3 showing Metalearned Neural Memory\" width=\"677\" height=\"158\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/4b.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624612\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/4b.gif\" alt=\"GIF 4 showing Metalearned Neural Memory\" width=\"677\" height=\"158\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/5b.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-624618\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/11\/5b.gif\" alt=\"G5F 1 showing Metalearned Neural Memory\" width=\"677\" height=\"158\" \/><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong>Many a memory<\/strong><\/p>\n<p>The idea of using a neural network as a memory store is not entirely novel. It goes back at least as far as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.semanticscholar.org\/paper\/Neural-networks-and-physical-systems-with-emergent-Hopfield\/cbd1ade5b869b13d1853aa0753b82fb35c26bcba\">John J. Hopfield&#8217;s 1988 work in associative memory<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. To our knowledge, though, we\u2019re the first to adopt metalearning techniques to store information rapidly in deep networks. A similar idea, more closely related to Hopfield\u2019s, has emerged in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/1910.02720\">recent related work<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p>Deep learning research is just scratching the surface when it comes to memory. We believe that just as the human brain possesses different types of memory, such as working and procedural, there is a variety of memory types to uncover in pushing the state of the art in deep learning. We consider MNM a part of that push, and while we used just a feedforward network, we envision applying neural memory to more complex architectures like RNNs and graph neural networks, building in different inductive biases to help model what we see in the data.<\/p>\n<p>To experiment with the MNM code and tasks, check out our <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/bitbucket.org\/tsendeemts\/mnm\/src\/master\/\">PyTorch implementation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Memory is an important part of human intelligence and the human experience. It grounds us in the current moment, helping us understand where we are and, consequently, what we should do next. Consider the simple example of reading a book. The ultimate goal is to understand the story, and memory is the reason we\u2019re able [&hellip;]<\/p>\n","protected":false},"author":38679,"featured_media":625131,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556],"msr-region":[256048],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-624426","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-region-global","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[437514],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[629145],"related-projects":[],"related-events":[609480],"related-researchers":[{"type":"user_nicename","value":"Alessandro Sordoni","user_id":37230,"display_name":"Alessandro Sordoni","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/alsordon\/\" aria-label=\"Visit the profile page for Alessandro Sordoni\">Alessandro Sordoni<\/a>","is_active":false,"last_first":"Sordoni, Alessandro","people_section":0,"alias":"alsordon"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/12\/MSR_NeuralMemory_V5_1400x788.png 1400w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>","byline":"Tsendsuren Munkhdalai, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/alsordon\/\" title=\"Go to researcher profile for Alessandro Sordoni\" aria-label=\"Go to researcher profile for Alessandro Sordoni\" data-bi-type=\"byline author\" data-bi-cN=\"Alessandro Sordoni\">Alessandro Sordoni<\/a>, Tong Wang, and Adam Trischler","formattedDate":"December 4, 2019","formattedExcerpt":"Memory is an important part of human intelligence and the human experience. It grounds us in the current moment, helping us understand where we are and, consequently, what we should do next. Consider the simple example of reading a book. The ultimate goal is to&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/624426"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38679"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=624426"}],"version-history":[{"count":17,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/624426\/revisions"}],"predecessor-version":[{"id":625251,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/624426\/revisions\/625251"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/625131"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=624426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=624426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=624426"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=624426"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=624426"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=624426"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=624426"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=624426"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=624426"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=624426"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=624426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}