{"id":583189,"date":"2019-05-07T10:00:50","date_gmt":"2019-05-07T17:00:50","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=583189"},"modified":"2019-05-07T10:34:25","modified_gmt":"2019-05-07T17:34:25","slug":"incentivizing-information-explorers-when-theyd-really-rather-exploit","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/incentivizing-information-explorers-when-theyd-really-rather-exploit\/","title":{"rendered":"Incentivizing information explorers (when they\u2019d really rather exploit)"},"content":{"rendered":"


Everyone is familiar by now with recommendation systems such as Netflix\u2019s for movies, Amazon\u2019s for products, Yelp\u2019s for restaurants and TripAdvisor\u2019s for travel. Indeed, quality recommendations are a crucial part of the value provided by these businesses. A recommendation system encourages users to share feedback on their experiences and aggregates that feedback in order to provide users with higher quality recommendations \u2013 and, more generally, higher quality experiences \u2013 moving forward.<\/p>\n

From the point of view of online recommendation systems in which users both consume and contribute information, users play a dual role: they are information explorers and information exploiters. Each of us has been one or the other at various moments in our decision-based sorties online \u2013 shopping for a specific item, deciding on a service provider, and so on. The tradeoff between exploration and exploitation is a well-known subject in machine learning and economics. Information exploiters seek out the best choice given the information available to date. Information explorers, on the other hand, are willing to try a lesser known \u2013 or even unknown \u2013 option for the sake of gathering more information. But an explorer does this at the real risk of a negative experience.<\/p>\n
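The explore/exploit tradeoff described above is often illustrated with a multi-armed bandit. Below is a minimal epsilon-greedy sketch \u2013 a standard textbook heuristic, not the paper\u2019s algorithm: with probability eps the learner explores a random option, and otherwise exploits the best-looking one.

```python
import random

def epsilon_greedy(true_means, rounds=10000, eps=0.1, seed=0):
    """Balance exploration and exploitation over a set of options ("arms")."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)   # pulls per arm
    totals = [0.0] * len(true_means) # summed rewards per arm
    for _ in range(rounds):
        if rng.random() < eps:       # explore: try a random option
            arm = rng.randrange(len(true_means))
        else:                        # exploit: pick the best-looking option so far
            arm = max(range(len(true_means)),
                      key=lambda a: totals[a] / counts[a] if counts[a] else float("inf"))
        reward = rng.gauss(true_means[arm], 1.0)  # noisy outcome
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = epsilon_greedy([0.2, 0.5, 0.8])
print(counts)  # pulls skew toward the best arm over time
```

Without the exploration step, the learner can lock onto a mediocre option after a few lucky draws \u2013 the same failure mode a recommendation system faces when all of its users choose to exploit.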

In an ideal world, a recommendation system would not only control the information flow to the users in a beneficial way in the short-term, it would also strive to incentivize exploration<\/em> in the longer term via what we call information asymmetry<\/em>: the fact that the system knows more than any one agent. It would do this despite users\u2019 incentives often tilting this delicate balance in favor of exploitation.<\/p>\n

This tension, between exploration, exploitation and users\u2019 incentives, provides an interesting opportunity for an entity \u2013 say, a social planner \u2013 who might manipulate what information is seen by users for the sake of a common good, balancing exploration of insufficiently known alternatives with the exploitation of the information amassed. And indeed, designing algorithms to trade off these two objectives is a well-researched area in machine learning and operations research.<\/p>\n

Again, there are disadvantages to choosing to be an explorer. You bear all the downsides of your strategy (say, buying a product you wish you hadn\u2019t, or staying in a hotel you can\u2019t wait to check out of), while the upsides \u2013 your cautionary feedback \u2013 benefit numerous future users. Which is to reiterate: users\u2019 incentives are naturally skewed in favor of being an exploiter.<\/p>\n

Enter the problem of incentivizing information exploration (a.k.a. Bayesian exploration), where the recommendation system (here called the \u201cprincipal\u201d) controls the information flow to the users (the \u201cagents\u201d) and strives to incentivize exploration via information asymmetry.<\/p>\n

\u201cThink of Bayesian exploration as a protection from selection bias \u2013 a phenomenon when the population that participates in the experiment differs from the target population. People who\u2019ll rate a new Steven Seagal movie mainly come from a fairly small (and weird) population of fans, such as myself.\u201d \u2013 Alex Slivkins, Senior Researcher, Microsoft Research New York City<\/strong><\/p><\/blockquote>\n

In \u201cBayesian Exploration with Heterogeneous Agents<\/a>,\u201d to be presented at The Web Conference 2019<\/a> in San Francisco May 13-17, Nicole Immorlica<\/a> and Aleksandrs Slivkins<\/a> of Microsoft Research New York City, along with Jieming Mao of the University of Pennsylvania and Zhiwei Steven Wu of the University of Minnesota, propose a simple model that takes Bayesian exploration as its embarkation point. The researchers allow heterogeneous users, relaxing a major assumption from prior work \u2013 that all users share the same preferences \u2013 so the goal becomes learning the best personalized recommendations. One particular challenge is that some user types cannot be incentivized to take some of the actions, no matter what the principal does or how much time the principal has. The researchers considered several versions of the model, depending on whether and when the user types are reported to the principal, and they designed a near-optimal \u201crecommendation policy\u201d for each version. They also investigated how the model choice and the diversity of user types impact the set of actions that can possibly be \u201cexplored\u201d by each type.<\/p>\n

Earlier work in Bayesian exploration \u2013 much of which came from Microsoft coauthors \u2013 relies on the inherent information asymmetry between the recommendation system (the principal) and a stream of users \u2013 the self-interested agents \u2013 arriving one at a time. Each agent needs to take an action from a given set of alternatives. The principal issues a recommendation and observes the outcome, but it cannot tell the agent what to do. The problem is to design a recommendation policy for the principal that learns over time to make good recommendations and<\/em> that also ensures that the agents are incentivized to follow this recommendation. A single round of this model is a version of a well-known Bayesian persuasion game from theoretical economics.<\/p>\n

In contrast, the researchers studied Bayesian exploration in which each agent\u2019s heterogeneous preferences are captured by a type that is known up front \u2013 for example, travelers who require pets-allowed accommodations versus those who do not. Each time an agent takes an action, the outcome depends on the action itself (for example, the selection of hotel), the \u201cstate\u201d of the world (for example, the qualities of the hotels), and the type of the agent. The state is persistent; it does not change over time. However, the state is not known initially; a Bayesian prior on the state is common knowledge. In each round, the agent type is drawn independently from a fixed and known distribution. The principal strives to learn the best possible recommendation for each agent type.<\/p>\n
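To make the moving parts concrete, here is a toy simulation of that protocol\u2019s data flow \u2013 persistent hidden state, agent types drawn i.i.d. each round, and a principal that observes outcomes and learns per-type recommendations. All names and numbers are illustrative, and the sketch deliberately omits the paper\u2019s central machinery, the incentive-compatibility constraints on what the principal may recommend.

```python
import random

rng = random.Random(1)

# Persistent hidden "state": expected reward of each action for each agent type.
# E.g., how much a pets-allowed traveler vs. a no-pets traveler enjoys
# hotel 0 vs. hotel 1. Unknown to the principal at the start.
STATE = {"pets": [0.3, 0.9], "no_pets": [0.8, 0.4]}
TYPES = list(STATE)
ACTIONS = [0, 1]

# Principal's per-(type, action) running estimates of outcomes.
sums = {(t, a): 0.0 for t in TYPES for a in ACTIONS}
counts = {(t, a): 0 for t in TYPES for a in ACTIONS}

def recommend(agent_type):
    """Mostly recommend the best observed action for this type,
    occasionally an under-explored one (incentive constraints omitted)."""
    untried = [a for a in ACTIONS if counts[(agent_type, a)] == 0]
    if untried:
        return untried[0]
    if rng.random() < 0.1:  # keep gathering information
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: sums[(agent_type, a)] / counts[(agent_type, a)])

for _ in range(2000):
    t = rng.choice(TYPES)                 # agent type drawn i.i.d. each round
    a = recommend(t)                      # principal's recommendation
    reward = rng.gauss(STATE[t][a], 0.5)  # outcome depends on action, state, and type
    sums[(t, a)] += reward                # principal observes the outcome and updates
    counts[(t, a)] += 1

best = {t: max(ACTIONS, key=lambda a: sums[(t, a)] / counts[(t, a)]) for t in TYPES}
print(best)  # the personalized recommendation learned for each type
```

The point of the simplification is to show why heterogeneity matters: the best action differs by type, so a principal that pooled all feedback into one estimate would recommend the wrong hotel to roughly half its users.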

\u201cUsers are both consumers and producers of information. How do we coordinate a crowd of diverse users to efficiently gather information for the population? Information asymmetry is the key.\u201d \u2013 Steven Wu, Assistant Professor, University of Minnesota<\/p>\n

The researchers considered three models, depending on whether and when the agent type would be revealed to the principal. They designed a near-optimal recommendation policy for each modeling choice. The three models envisioned were:<\/p>\n