{"id":648054,"date":"2020-04-08T08:00:54","date_gmt":"2020-04-08T15:00:54","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=648054"},"modified":"2020-06-18T07:30:43","modified_gmt":"2020-06-18T14:30:43","slug":"project-orleans-and-the-distributed-database-future-with-dr-philip-bernstein","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/project-orleans-and-the-distributed-database-future-with-dr-philip-bernstein\/","title":{"rendered":"Project Orleans and the distributed database future with Dr. Philip Bernstein"},"content":{"rendered":"<h3><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-648063\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1024x576.png\" alt=\"Photo of Dr. Philip Bernstein for the Microsoft Research Podcast\" width=\"1024\" height=\"576\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-640x360.png 640w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1280x720.png 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/h3>\n<h3>Episode 114 | April 8, 2020<\/h3>\n<p>Forty years ago, database research was an \u201cexotic\u201d field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn\u2019t deter <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/philbe\/\">Dr. Philip Bernstein<\/a>, now a Distinguished Scientist in MSR\u2019s <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/data-management-exploration-and-mining-dmx\/\">Data Management, Exploration and Mining group<\/a>, and a pioneer in the field.<\/p>\n<p>Today, Dr. Bernstein talks about his pioneering work in databases over the years and tells us all about <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/orleans-virtual-actors\/\">Project Orleans<\/a>, a distributed systems programming framework that makes life easier for programmers who aren\u2019t distributed systems experts. 
He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic industrial spectrum.<\/p>\n<h3>Related:<\/h3>\n<ul type=\"disc\">\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/podcast\">Microsoft Research Podcast<\/a>: View more podcasts on Microsoft.com<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/itunes.apple.com\/us\/podcast\/microsoft-research-a-podcast\/id1318021537?mt=2\">iTunes<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen to new podcasts each week on iTunes<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/subscribebyemail.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Email<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen by email<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/subscribeonandroid.com\/www.blubrry.com\/feeds\/microsoftresearch.xml\">Android<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Subscribe and listen on Android<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/open.spotify.com\/show\/4ndjUXyL0hH1FXHgwIiTWU\">Spotify<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Listen on Spotify<\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.blubrry.com\/feeds\/microsoftresearch.xml\">RSS feed<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/li>\n<li><a 
class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/note.microsoft.com\/ww-registration-microsoft-research-newsletter-s.html?wt.mc_id=S-webpage_podcast\">Microsoft Research Newsletter<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>: Sign up to receive the latest news from Microsoft Research<\/li>\n<\/ul>\n<hr \/>\n<h3>Transcript<\/h3>\n<p>Phil Bernstein:\u00a0It\u2019s very unusual to have to build a data management service where it could be a blob store, a JSON store, a relational database, it could be anything, and it could be any one of many products for each of those data structures and yet you only want to have to build this feature once and then have it run successfully no matter what underlying storage system you\u2019re plugging in.<\/p>\n<p><b>Host: You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I\u2019m your host, Gretchen Huizinga.<\/b><\/p>\n<p><b>Host: Forty years ago, database research was an \u201cexotic\u201d field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn\u2019t deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR\u2019s Data Management, Exploration and Mining group, and a pioneer in the field.<\/b><\/p>\n<p><b>Today, Dr. Bernstein\u00a0<\/b><b>talks about his pioneering work in databases<\/b><b>\u00a0over the years and tells us all about Project Orleans, a distributed systems programming framework that makes life easier for programmers\u00a0<\/b><b>who aren\u2019t distributed systems experts<\/b><b>. He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic<\/b><b>&#8211;<\/b><b>industrial spectrum. 
That and much more on this episode of the Microsoft Research Podcast.<\/b><\/p>\n<p><b><i>(music plays)<\/i><\/b><\/p>\n<p><b>Host: Phil Bernstein, welcome to the podcast!<\/b><\/p>\n<p>Phil Bernstein: Thank you.<\/p>\n<p><b>Host: You\u2019re a Distinguished Scientist, and a bit of an OG at Microsoft Research and you\u2019ve been at the forefront of innovation in database technology for several decades, but currently you\u2019re working at MSR under the umbrella of the Data Management, Exploration and Mining Group, or DMX. Before we dive deeper on you and your specific work, give us an overview of DMX and how database research is situated in the broader framework of Microsoft Research today.<\/b><\/p>\n<p>Phil Bernstein: Well, Microsoft has a huge database business and the database business, in general, from the very beginning in the 70s, was largely driven by research and so research has always been a very important ingredient in improving database products, is this need to innovate all the time. And that comes both from the engine side, building the core technology to manipulate large amounts of data, complex data, but also the tools to make it possible to design the database, to be able to manage it, and the DMX Group covers both. It covers base engine\u00a0technology for manipulating data, for building cloud services, and then also for tools to integrate data, to find ways to reduce the cost of ownership by reducing the level of effort on the part of database administrators. So it\u2019s the full range, and that\u2019s where the data management, exploration, being able to look around and understand your data, and mining to really get into the tools to analyze the data afterwards.<\/p>\n<p><b>Host:\u00a0<\/b><b>Well<\/b><b>, let\u2019s situate you, now. How would you describe your \u201cresearch identity\u201d in terms of what gets you excited about the work you do and what gets you up in the morning?<\/b><\/p>\n<p>Phil Bernstein: Well, I look for high impact. 
I\u2019m trying to figure out what to work on that\u2019s going to make a difference, and also where my incremental value is going to be high because there aren\u2019t enough people working on it or paying attention to the problem. There are two technical areas where I\u2019ve focused, mostly, over the decades. One is transaction processing, which is how to build systems like online retail or banking systems, money transfer systems, those sorts of things, and that stuff is very low level, you\u2019re very deep into the database engine. And then, on the flip side, much higher level, the integration of data. So the data\u2019s in the database. You\u2019ve got to manage it. How do you tie it together? And I\u2019ve worked on both, and I\u2019ve kind of flip-flopped back and forth between them depending on the problem of the day and where the short- and the medium-term opportunities tend to be.<\/p>\n<p><b>Host: Well, I want to take it back for a minute because you just mentioned a couple of topics that I think are important. You\u2019ve done some seminal work in transaction processing and distributed databases, so let\u2019s go back several years. Give us a snapshot of the computing landscape when you started and then tell us what changes you\u2019ve seen over the years, what things look like for researchers in the cloud era, and why understanding the past is helpful when innovating for the future?<\/b><\/p>\n<p>Phil Bernstein: Well, it\u2019s been a long road. I mean, I started my research in the mid-1970s, so it\u2019s over forty years ago.\u00a0At that point, the database business was small. I mean, it was barely a business. It was absolutely disreputable as a research topic because data management sounded like business data processing, which was believed to be not intellectually interesting. You\u2019re writing COBOL programs and that was the end of it. 
And hardly any of the fundamental issues had really been explored at all, and the ones that had been explored,\u00a0certainly not in any depth. So the opportunities were everywhere. And in fact, early in my career I \u2013 I had to stop working on some problems not because they weren\u2019t interesting, but because there were just too many problems to work on. I had to \u2013 had to focus more in order to get something done. Also, those were the, you know, the mainframe computers, I mean, there was no distributed anything.<\/p>\n<p><b>Host: Right.<\/b><\/p>\n<p>Phil Bernstein: We knew it was coming, but it hadn\u2019t come yet. Database management was all in a glass house and it was, you know, glass enclosed, air-conditioned room used for business data processing, period\u2026<\/p>\n<p><b>Host: Hmm.<\/b><\/p>\n<p>Phil Bernstein: \u2026no personal computers. You know, you talked to people about working on computing and that was considered very exotic.<\/p>\n<p><b>Host: Right.<\/b><\/p>\n<p>Phil Bernstein: You know, now everybody\u2019s got one and \u2013 at least \u2013 and has a pretty good feel for what they all do.<\/p>\n<p><b>Host: In their pocket.<\/b><\/p>\n<p>Phil Bernstein: Yeah, it\u2019s really different.<\/p>\n<p><b>Host: Okay, so when did you start seeing changes and how did that impact what you were doing as a researcher?<\/b><\/p>\n<p>Phil Bernstein: There\u2019ve always been changes, so it\u2019s hard to say that there was any given point where the changes were really big. I started out looking at database design for my PhD research, but as soon as I left and embarked on my own career, I got involved in distributed databases, which seemed like one of the next big things, and I worked on it for many years. I stopped for a while\u00a0and then, with cloud computing, it all came back, and I\u2019m working on it again so\u2026<\/p>\n<p><b>Host: Right.<\/b><\/p>\n<p>Phil Bernstein: \u2026it\u2019s a sort of a pendulum. 
These topics come and go depending on what the workloads are that are needed, what the computing environment is that has to support the data management.<\/p>\n<p><b>Host: Well, the cloud presented a massive jump in scale for distributed systems, writ large, so you say you kind of came back into it. Was it because hey, this is a big, new nut to crack and I want to be in on it?<\/b><\/p>\n<p>Phil Bernstein: Certainly, I wanted to be in on it, but I was also asked to be in on it!<\/p>\n<p><b>Host: Help!<\/b><\/p>\n<p>Phil Bernstein: At the time, it was\u2026 Microsoft was starting to think about changing its database strategy, its base products, to work on commodity hardware, to scale out on large numbers of inexpensive machines running in the data center. And they kind of looked around and, you know, who is it that we have on staff who knows something about this? And I was one of the people that they tapped and so we developed a new strategy and it took many, many years for that to unfold. This was back in 2006 so it was more than fifteen years ago, but where we are now with the products, we actually landed roughly where we were trying to get back from the beginning, so\u2026<\/p>\n<p><b>Host: Well, OK, so rounding out this four-part question that I kind of just laid out and am still walking through with you because I\u2019m really interested in this.\u00a0Bill Buxton was on the show and he talked about the long nose of innovation,\u00a0and things come and go,\u00a0and the idea that you can\u2019t innovate for the future unless you really understand the past, so why,\u00a0from a database perspective, is understanding the past helpful when you\u2019re trying to innovate for the future?<\/b><\/p>\n<p>Phil Bernstein: The set of mechanisms that we use to solve database problems, they don\u2019t change very fast. 
Back in the early days, we were learning about certain base technologies for the first time, but now, there\u2019s this repertoire of ingredients that you put into solving a database problem. I\u2019m very sympathetic to graduate students who are trying to learn this stuff because, you know, I learned it slowly over a period of many years as it was unfolding, but people getting into the field, they learn it in a very compressed amount of time and they don\u2019t necessarily have a deep understanding of why things are the way they are and so when they encounter a problem, they\u2019re trying to solve it just based on an understanding of the problem and then trip over some approach that they think, oh, I\u2019ll bet that would be helpful, but then they don\u2019t realize this is actually a variation on something that has been applied in several other contexts before.<\/p>\n<p><b><i>(music plays)<\/i><\/b><\/p>\n<p><b>Host: Well, let\u2019s get specific and talk about some of your current work and there\u2019s a project you\u2019ve been working on called Orleans, which you\u2019ve called, somewhat generally, a \u201cdistributed systems programming framework\u201d or a \u201cprogramming model and run time for building cloud-native services.\u201d Both are pretty high-level, so tell us what is Orleans and what\u2019s the motivation behind it? Or the pain point that prompted it?<\/b><\/p>\n<p>Phil Bernstein: So maybe we should start with, what\u2019s the programming\u00a0framework?<\/p>\n<p><b>Host: There you go.<\/b><\/p>\n<p>Phil Bernstein: So it\u2019s a form of middle-ware. That\u2019s to say that it\u2019s generic software. It\u2019s not application-specific but it\u2019s not a low-level platform either. Generally, a framework takes a bunch of services that are available from operating systems, networking, distributed systems, and\u00a0packages them up to be easier to use by integrating them in some nice way, and so Orleans is a programming framework. 
That\u2019s what it does is this integration of lower level services. The problem it\u2019s addressing is that of building distributed applications that run in data centers, in the cloud, on large numbers of machines. And the reason why this is a problem is that mainstream programmers\u00a0who\u00a0have learned how to build applications are generally not distributed systems experts and there are many ways to go wrong when you try to carve up an application and get it to run on a lot of servers. It needs to be elastic. That is to say, without changing the application, you need to be able to add servers if the workload increases, or reduce the number of servers if you don\u2019t have so many customers using it. It needs to be reliable because these machines are relatively inexpensive and they actually fail at a significant rate and so you don\u2019t want the whole thing to come tumbling down every time you lose a server. So if you\u2019re going to spread the workload on multiple machines, maybe you don\u2019t do it so well and one of the machines becomes a bottleneck and sort of the whole thing grinds to a halt because this one machine is being overtaxed. So these are the kinds of problems that an application developer faces and Orleans is basically trying to factor them out so that you don\u2019t have to worry about them at all. The framework does all that. You just focus on building the application.<\/p>\n<p><b>Host: All right. So that\u2019s kind of like the \u201cproblem statement\u201d of why it exists. 
Tell us how you would define it.<\/b><\/p>\n<p>Phil Bernstein: Maybe, from a practical standpoint, it would be good to just mention the kinds of applications you would use it for.<\/p>\n<p><b>Host: Right.<\/b><\/p>\n<p>Phil Bernstein: And these are what are sometimes characterized as \u201cstateful interactive services.\u201d What do I mean by that?<\/p>\n<p><b>Host: Yeah.<\/b><\/p>\n<p>Phil Bernstein: Well, maybe easiest to see by example: internet of things, games, telemetry to monitor some other system, typically a computer system, social networking, mobile computing. In all of these cases, the application is managing information about something going on in the world. That\u2019s the main application function. And so the second characteristic is that these applications are all object-oriented in the technical sense. Like object-oriented programming language.<\/p>\n<p><b>Host: Yeah.<\/b><\/p>\n<p>Phil Bernstein: In internet of things, the objects are, well, they\u2019re things, you know? They\u2019re sensors, they\u2019re devices of various kinds. In games, they might be things like players, games, scoreboards and the like. Obviously, for mobile computing, they\u2019re mobile devices. So in all\u00a0these cases, your application that\u2019s running in the cloud has objects that are surrogates or models of the physical thing, or logical thing in the case of games, that are out there in the world and so what you\u2019re doing is, the application is spreading its workload across servers by spreading the objects around. Now, if you want those objects to be spread around on multiple servers, they better not share memory because they may not be co-located on the same server\u2026<\/p>\n<p><b>Host: Right.<\/b><\/p>\n<p>Phil Bernstein: \u2026which means that the only way they\u2019re going to be able to interact is to send messages to each other. And another decision that was made in Orleans, is to have the objects be single-threaded. 
That is, there\u2019s no internal parallelism in these objects. And the reason for that is that programming with internal parallelism is much more challenging for application developers because now you\u2019ve got parallel activities that are going at the state of this object and they can trip over each other and so they need to synchronize, and engineers, historically, have a hard time getting that right. Conceptually, it doesn\u2019t sound that bad, but when you actually have to write programs that run at high speed and they access this shared state of the object, it actually is quite hard to get it right in all cases. So Orleans said, no, we\u2019re just not going to allow that. So objects are single-threaded and they don\u2019t share memory and they communicate by exchanging messages. Now, what\u2019s new in Orleans is something called the virtual actor model. And the characteristics that I just described, of single threading, no shared memory, message-based communication, in the technical literature, that\u2019s often called an actor. It\u2019s just another word for object that has these characteristics. And in the virtual actor model, the application developer does not control when the object is instantiated, when it\u2019s activated, where it\u2019s placed on machines; all of that is handled by the framework. When the application invokes an object, Orleans will first look around to see if the object is running and, if it is, it will perform the function that was requested. If it\u2019s not running, then Orleans will pick a server on which to activate the object, will spin up the object on that server, and then will do the invocation that was requested by the application and will remember where the object\u2019s located so that future calls can go to that copy that\u2019s already running. If the object isn\u2019t used for a while, Orleans will notice that also and it will deactivate the object and free up its resources. 
So it\u2019s sort of like a paging system in operating systems where you bring in pages of memory as needed and then evict them when they\u2019re no longer needed. It\u2019s sort of the same thing here, but it\u2019s being done with objects. And this was a new concept when Orleans was developed.<\/p>\n<p><b>Host: And when exactly was Orleans developed?<\/b><\/p>\n<p>Phil Bernstein: The project started, I think it was like 2008, 2009, in there.<\/p>\n<p><b>Host: Let\u2019s drill in a little more technically \u2013 you\u2019ve alluded to several of the things that I think are important about the project itself \u2013 and then unpack some of the big challenges\u00a0<\/b><b>you\u2019ve addressed<\/b><b>, like scalability and reliability, in the cloud scale world.<\/b><\/p>\n<p>Phil Bernstein: Reliability and scalability are natural consequences of the virtual actor model, so let\u2019s look at scalability. Remember that if you invoke an object and it\u2019s not running, Orleans will place it on a server. So it\u2019s up to Orleans to balance the load across all these servers. Ideally, when you activate an object that was not running, you want to put it on a lightly-loaded server so that you don\u2019t overload any other servers. So Orleans is in charge of keeping the load balanced across the servers and that enables scalability. Let\u2019s look at the reliability part. Suppose a server fails. Well, obviously all the objects that were running on that server are immediately gone, but the next time any of those objects are invoked, Orleans will recognize the fact that\u00a0they\u2019re not\u00a0running anymore and so it will just resurrect the object. It will just activate it on one of the servers that is healthy, that is running, and will continue making forward progress. 
So the application developer doesn\u2019t have to be too concerned about balancing the load across servers and doesn\u2019t have to be worried about fault tolerance, which is something that previous actor-oriented systems all exposed to the application developer. Orleans lets you forget about that. But there is one consequence of this, which is that when an object is activated, what state is it in? You know, what does it know about itself? And that is an application programming problem because, at the moment that that server fails and the objects go away, their state in main memory is lost.<\/p>\n<p><b>Host: Hmm.<\/b><\/p>\n<p>Phil Bernstein: And so when the object is reactivated on another server, it\u2019s going to be entirely up to the application program for that object to reinitialize its state. And state is another word for data, and reading data to initialize an object is just another way of saying it needs to do data management, and that\u2019s how I got into this game was that I said, gee, I think you folks could use some help because this is a pretty big burden on the application developer to figure out how to do all of this state management.<\/p>\n<p><b>Host: Well, talk about how this open source project has evolved and grown over the last few years.\u00a0<\/b><b>How have you added to the work and why have you moved in those directions?<\/b><\/p>\n<p>Phil Bernstein: Well, we\u2019ve gained a lot by being open source. Orleans was one of the first projects that went open source. As I said a little while ago,\u00a0I got into this because I could see that application developers had to do a lot of state management and that the standard abstractions that are part of the database repertoire are relevant to building these sorts of applications so maybe I can just start adding them, you know, add indexing, add transactions, add geo-distribution, replication,\u00a0and just make it easier for the application developer. 
I wasn\u2019t even sure if this was research because it was just applying what I knew about data management to yet another product, if you will. But it turned out that it was research, which I didn\u2019t really see going in,\u00a0and it\u2019s research for two reasons. One is that it uses storage that\u2019s actually cloud storage. It\u2019s not storage that\u2019s running on the server with the application. That\u2019s very unusual. When you build a data management system, you expect to be able to control storage. I mean, that\u2019s such an important ingredient in doing data management. But here, the storage is \u2013 it\u2019s a service. And the\u00a0second is, because it\u2019s plug-in, it can be anything. Again,\u00a0it\u2019s\u00a0very unusual to have to build a data management service where it could be a blob store, a JSON store, a relational database, it could be anything, and it could be any one of many products for each of those data structures and yet you only want to have to build this feature once and then have it run successfully no matter what underlying storage system you\u2019re plugging in.\u00a0And that is a pretty unique challenge. It\u2019s not something I had ever seen done before, so it has required us to re-think these abstractions from the beginning\u2026 And it\u2019s interesting.<\/p>\n<p><b>Host: So what have you done, additionally, or how have you \u201cnew and improved it,\u201d as it were?<\/b><\/p>\n<p>Phil Bernstein: Well, take transactions as an example. When you build a transaction system you have to keep track of which transactions have succeeded \u2013 which is called committing the transaction \u2013 and which ones have not. And that\u2019s generally done in a log, and that log is in storage. And the rate at which you can run the transactions is heavily dependent on the rate at which you can record that information in the log. 
So it\u2019s a good idea to have one log and be able to simply append these descriptions of transactions that start and commit in this log. But there\u2019s a problem here, which is that cloud storage doesn\u2019t offer a log. And so every database system I know of has a log and here we\u2019re going to implement transactions and there is no log.\u00a0Um\u2026\u00a0what are we supposed to do? And you know, so we said, well, we\u2019ve got plenty of storage, so I guess we\u2019re going to have to do our own log on top of cloud storage, which is what we did, and that worked, but it created some complexity in the system that our customers didn\u2019t like very much and we had to go back and do it again a different way because they didn\u2019t really want this custom log we had built\u2026<\/p>\n<p><b>Host: Interesting.<\/b><\/p>\n<p>Phil Bernstein: \u2026and so what we did was we re-did it so that we managed the state of the transaction as part of the state of the object. So we piggyback our own log information on the storage that\u2019s used by the object.<\/p>\n<p><b>Host: Interesting.<\/b><\/p>\n<p>Phil Bernstein: And that was something we hadn\u2019t seen done before, so\u2026<\/p>\n<p><b>Host: And how was that received?<\/b><\/p>\n<p>Phil Bernstein: They liked it a lot. That one stuck and that\u2019s what\u2019s shipping.<\/p>\n<p><b>Host: Every research project has at least one \u201cnot yet.\u201d I\u2019m putting that in air quotes. Probably a lot more than one\u2026 meaning things that we don\u2019t support yet, we can\u2019t do yet, that aren\u2019t on the map yet. What are the open problems that you still face in this arena and\u00a0<\/b><b>how do you think you\u2019re getting closer to \u2013 or at least thinking about getting closer to \u2013 solving them?<\/b><\/p>\n<p>Phil Bernstein: Well, one of the big things that we don\u2019t do that we want to do is what\u2019s now called serverless operation. 
What that means is that when you develop the application and deploy it, you\u2019re unaware of the fact that there are many servers out there. That\u2019s not reality today with Orleans because Orleans is simply a programming framework and when you develop your application, you have to explicitly reserve servers in Azure and then deploy your application on those servers. So you\u2019re very much aware of the fact that there are servers, and they\u2019re your servers, you\u2019ve reserved them to run your application. Now, what we\u2019d like is to have this be a serverless service where you don\u2019t know about any servers. You still write your application in the way you always have, and you just drop it in the in-hopper and press a button and our infrastructure on Azure then just grabs that code, and we take care of all this server stuff of provisioning the server and uploading the code to those servers and deal with the failures and add servers and reduce the number of servers and all of that stuff in a way that\u2019s completely transparent to the person, or the group, that\u2019s running this application operationally. So serverless operation is a big one. And the other is kind of related, which is just automating system management, capacity planning. Let\u2019s say you\u2019re a game developer and you\u2019ve got\u00a0thousands,\u00a0tens of thousands, perhaps, of gamers, you know, playing. Just monitoring it, figuring out what\u2019s going wrong, looking at the behavior of the users. Right now, that\u2019s all part of the application and yet it\u2019s something that every application developer faces. So why should everybody have to do this in a custom way, on their own? Can\u2019t we do something to automate it? And then third is, I mean, there\u2019s still data management abstractions which are not built into Orleans that I would be interested in adding someday. Um you know, we\u2019ve added some. 
We\u2019ve got transactions, indexing, and geo-distribution in there, but there certainly are others that we could add over time, depending on the need of the applications and competing priorities.<\/p>\n<p><b><i>(music plays)<\/i><\/b><\/p>\n<p><b>Host: Phil, you\u2019ve been in every situation along the research spectrum from academia to industrial research to product, and back, and in that sense, you\u2019re kind of a walking, talking example of human tech transfer yourself. Talk about your experiences in each of these areas and what the value is, having had experience in each of them, as you\u2019ve landed here at Microsoft Research?<\/b><\/p>\n<p>Phil Bernstein: Sure. Well, you know, I have done all this other work. I was a professor, I did a start-up, I was working for a hardware company for some years in product development, and now I\u2019m in industrial research in a software company, but I\u2019ve been at Microsoft for twenty-five years so that says something about which one I prefer. But let me talk about some of the common features in all this work. New ideas are coins of the realm. I mean, your job is to come up with new ideas to solve. In other cases to solve a problem that maybe others have identified. The highest impact on this kind of work is generally done in teams, so you\u2019re always working with\u00a0other people. Customer pain points are generally good motivation for research. Partnerships are often worth nurturing. There are many activities that you\u2019re required to do as a\u00a0researcher. It\u2019s\u00a0not just doing research but it\u2019s also participating in the research community. It\u2019s writing research papers. It\u2019s reviewing research papers written by others. And everybody feels under time pressure to do well at all of these things and so learning how to align your activities so they\u2019re all pointing at the same goal is important. So all that is true for everybody. 
But beyond that, academic research, product development and industrial research are different in many ways. Academic research tends to be entrepreneurial. That is, the professor is generally running their own research group. That means you\u2019re writing grant proposals. You\u2019re expected to teach. There are committees. Everybody\u2019s got to do their share of committee work. So it\u2019s a very complex job, but when it works out well, it\u2019s super exciting. It\u2019s really like running your own company, although you\u2019re doing it in the context of a research group. Product development is quite different and so is industrial research because you end up doing a larger fraction of the work yourself.\u00a0In product development, you\u2019re writing specifications, you\u2019re writing a lot of code.\u00a0Speed is a virtue. You\u2019ve got to be willing to live with the fact that there\u2019s often insufficient time to do the complete solution you want because the product\u2019s got to go out the door at a certain time and if you\u2019re not ready with your piece, well, the train\u2019s going to leave the station whether you\u2019re on board or not and so, in order to ship products, you have to learn how to steer a path to the right technical compromises of what goes into the product and what gets saved for the next version. And when you\u2019re in academia you don\u2019t have to do that, you know. You just basically include everything you want to include and, you know, it doesn\u2019t have to be product-quality so it\u2019s okay.<\/p>\n<p><b>Host: Right!<\/b><\/p>\n<p>Phil Bernstein: On the other hand, shipping to a large audience is really a kick. I mean, getting feedback from grateful customers, it\u2019s a unique emotional experience that is really wonderful when it all works, and makes working long hours super worthwhile because you\u2019ve really done something that has a tangible effect in the world. Now, where does industrial research fit in this? 
You know, it\u2019s somewhere in between, right? It\u2019s research, but you\u2019re doing it in an industrial setting. Well, the main\u00a0thing\u00a0is that we have more time. We are not under the same time constraint so we can actually work out the details. We have more control over selecting our problems and so we can identify problems maybe the product group isn\u2019t even ready to think about yet, and, as I said, de-risk them, you know. Get it to the point where product group can pick it up and feel like they can put it on a schedule and they know how long it\u2019s going to take and have lots of confidence that it\u2019s actually going to work in the end.<\/p>\n<p><b>Host: We talked about what gets you up in the morning, Phil, but now\u2019s the time on the podcast where I ask what keeps you up at night? So what kinds of things keep you up at night and what are you and your colleagues doing about it?<\/b><\/p>\n<p>Phil Bernstein: What I think about most is, am I working on the right problem? Really, problem selection is everything in research. If I solve it, is it going to have high impact? Is it likely to be something that a product group is going to pick up? You know, what is the barrier to actually making it real? Maybe I understand the nature of the problem very well, but I don\u2019t have any really brilliant research idea on how to solve it. And sometimes I worry, you know, whether I\u2019m just too far ahead of my time, which is a unique thing about industrial research, you know. We don\u2019t tend to work on problems that the product group can solve. They\u2019re every bit as smart as we are and they have a lot more people, and so anything that they\u2019re going to do in the next couple of years, it\u2019s really not a good idea for us to work on. We have very little added value. And we don\u2019t really want to be working on stuff that\u2019s ten years out. That\u2019s a good thing in a university, but you\u2019ve got to pay the bills. 
So we tend to work in this two-to-five-year range and there are times that I just get it wrong. I just, I think this is going to be an important thing four or five years from now, and two years into it, it still seems like it\u2019s going to be four or five years and it\u2019s just, the goalposts keep moving out and I think maybe this was not the ideal place to be. So that\u2019s probably the biggest\u00a0thing\u00a0that I worry about outside of doing the work itself.<\/p>\n<p><b>Host: Right. You have a long and varied path in high tech. We\u2019ve alluded to it a bit in our conversation. But I\u2019d love to hear your story. Tell us about your roots, your journey and your ultimate path. You know, give us the Reader\u2019s Digest version&#8230;. Is Reader\u2019s Digest even a thing anymore? Give us the Twitter\u2026<\/b><\/p>\n<p>Phil Bernstein: I\u2019m old enough to know!<\/p>\n<p><b>Host: Give us the tweet version!<\/b><\/p>\n<p>Phil Bernstein: Sure, and it really is a journey. When I look back on it, there are so many forks in the road where, if I had taken the other path, it would have turned out very differently and I had no idea how. So I got a PhD in computer science and I had a choice between a research lab and a university. I went to a university. I became a professor at Harvard. That all sounds very impressive except that, at the time, Harvard\u2019s computer science department was not very good and so it\u2019s um\u2026<\/p>\n<p><b>Host: You made it good, Phil.<\/b><\/p>\n<p>Phil Bernstein: It was \u2013 it was impressive to people in the real world, but in the computer science world it was like, why would you go there? And I had done a lot of consulting on the side, partly to enrich my understanding of real problems, and partly because universities don\u2019t pay very well. 
And that led to a gig with a start-up doing computer development and they ultimately offered me to be in charge of their whole software operation and so I left academia and I became a vice president at a start-up for two years, and after about a year and a half I decided I really hated it, umm\u2026 and that was not the right place for me, and I actually went back to a university for a couple of years, completed some research that I had been doing before that sojourn at a start-up,\u00a0and then\u00a0they shut down the university,\u00a0um\u2026\u00a0which\u00a0um\u2026\u00a0was a bit of a shock, but it was a start-up university. It was called Wang Institute of Graduate Studies. Its goal was to create a professional degree program in software engineering, which is still a very good idea, much like, you know, a law school versus a philosophy department, or a medical school versus a biology department. Anyway, I had to go do something else, so I went to work for a hardware company, Digital Equipment Corporation, and worked on their transaction processing products for a while and then their middle-ware for data integration for a while, and then they started unraveling. I seem to have a history of this that um\u2026 and I don\u2019t think I was a cause, but\u2026 because of this work I had done on meta-data management and integration, I got a call from Microsoft to be architect for a development that they were doing in this area, Microsoft Repository. And so I took it and that\u2019s what brought me to Microsoft. 
I worked on that product for four years, at which point it became clear to me that there were just other things that the company thought were more important and so I moved back into research and that\u2019s where I\u2019ve been ever since.<\/p>\n<p><b>Host: Are you an east coast kid?<\/b><\/p>\n<p>Phil Bernstein: I grew up on the east coast, yeah, in New York, and then went to school in Canada at the University of Toronto.<\/p>\n<p><b>Host: Oh, you did?<\/b><\/p>\n<p>Phil Bernstein: And then after that I moved to Boston and I had this long string of jobs in Boston. Harvard, the start-up, back to another university, then Digital Equipment Corporation, so all living in the same place.<\/p>\n<p><b>Host: So you did the big jump to the west coast by Microsoft?<\/b><\/p>\n<p>Phil Bernstein: Yeah.<\/p>\n<p><b>Host: Tell us something we don\u2019t know about you. I\u2019ve been asking this in the context of whether it\u2019s a personal trait or a defining life moment that may have influenced a career in research, but if I\u2019m honest, I actually just want to know what goes on in the lives of researchers outside the lab. So however you want to answer it, Phil.<\/b><\/p>\n<p>Phil Bernstein: Something about me that you wouldn\u2019t ordinarily know: I am fascinated by finance, investing. Now you might think, that oh boy, you know, he really likes to make a lot of money and all, and I\u2019d love to make a lot of money. I\u2019m actually really bad at it. I mean it\u2019s like I don\u2019t manage my own savings. I delegate that to a professional. But what I like about it, it\u2019s endlessly complex, it\u2019s always changing, and there\u2019s one success metric. There\u2019s no way to fake it. Either you\u2019re making more money or you\u2019re not, so I\u2019m just totally hooked. I mean, I read a lot about it, you know, it\u2019s a hobby. I get no, really, personal benefit. My wife makes a joke that I sound very good. 
You know, people ask me about investments and I sound extremely\u00a0knowledgeable and all, and then she looks at me and she says, but how come you can\u2019t make any money? Well, you know, you can\u2019t have everything.<\/p>\n<p><b>Host: Well, Phil, it\u2019s time to wrap up. Before we go, I want to give you a chance to offer some parting advice to our listeners. And many of them are just getting started on their path to high tech research and you\u2019re a veteran in the field so you\u2019re in a unique position to impart some wisdom. Knowing what you know, and having done what you\u2019ve done over the course of your career, what thoughts would you share with our audience? I\u2019ll give you the last word.<\/b><\/p>\n<p>Phil Bernstein: Thanks for the opportunity. I actually have strong opinions on this one. I think the most important thing is to know what you\u2019re optimizing and I think there are only four possibilities: money, power, fame or personal happiness.\u00a0Now, everybody wants all four,\u00a0but if you don\u2019t prioritize one of them over the others, you might not get any of them at the level that you really want. There will be many forks in the road along the way, and if every time you face that fork in the road, you choose based on a different optimization criterion, you\u2019re lowering the chances that you\u2019re going to get the one that you want most. But beyond that there are many other little snippets of advice.\u00a0I\u2019ll try to do them quickly!\u00a0Early in your career, choose your research for the long term. It\u2019s so easy to pick something because it\u2019s a hot topic, but if you want to succeed in a big way, you want to be an expert at something that\u2019s going to be super-important fifteen years from now. 
When you\u2019ve gotten past that apprentice\/journeyman stage, you\u2019re now considered to be an expert and now this thing is super-important and you\u2019ve had fifteen years to really become one of the best people working in that area. So choose a topic where your incremental value is higher, which means probably it\u2019s going to be an unpopular topic, which means you have to be brave. Exploit what you\u2019re good at, but also work around what you\u2019re not good at and look for opportunities to grow. Also, you want to exploit synergies with your environment. Based on what\u2019s around you, you can get research leverage from the fact that your company is really good at a certain something and therefore you have a competitive edge in working in that area. But despite all of this, you still want to be flexible. Opportunities will show up randomly and that may turn out to be the most important thing in terms of your long-term success is that you grabbed the right opportunity at the right time, which might have been leaving behind something you had actually invested quite a bit of time in. And then finally, a piece of advice I got very early in my career as a researcher, which is, if you want to be good, write a lot of research papers. If you want to be great, never publish a weak one. Because you want people, when they see your name on a paper, they want to say, oh, his or her papers are always super interesting. I\u2019ve got to read this one. If only every third paper you write is like that, much less likely to get their attention. I\u2019ll stop there.<\/p>\n<p><b>Host: I could stay here for a long time because you\u2019ve given me some advice that I could use. These are great. Phil Bernstein, thank you for coming on!<\/b><\/p>\n<p>Phil Bernstein: My pleasure. Thank you, Gretchen.<\/p>\n<p><b><i>(music plays)<\/i><\/b><\/p>\n<p><b><i>To learn more about Dr. 
Philip Bernstein and the latest research in database management, exploration and mining, visit Microsoft.com\/research<\/i><\/b><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Forty years ago, database research was an \u201cexotic\u201d field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn\u2019t deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR\u2019s Data Management, Exploration and Mining group, and a pioneer in the field. Today, Dr. Bernstein talks about his pioneering work in databases over the years and tells us all about Project Orleans, a distributed systems programming framework that makes life easier for programmers who aren\u2019t distributed systems experts. He also talks about the future of database systems in a cloud scale world, and reveals where he finds his research sweet spot along the academic industrial spectrum.<\/p>\n","protected":false},"author":37583,"featured_media":648063,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"https:\/\/player.blubrry.com\/id\/58290609\/","msr-podcast-episode":"114","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[240054],"tags":[],"research-area":[13563,13560,13547],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-648054","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-msr-podcast","msr-research-area-data-platform-analytics","msr-research-area-programming-languages-software-engineering","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"https:\/\/player.b
lubrry.com\/id\/58290609\/","podcast_episode":"114","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[957177],"related-projects":[170573],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-960x540.png\" class=\"img-object-cover\" alt=\"Photo of Dr. Philip Bernstein for the Microsoft Research Podcast\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-960x540.png 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-300x169.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1024x576.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-768x432.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1066x600.png 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-655x368.png 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-343x193.png 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-640x360.png 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788-1280x720.png 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/04\/Research_Podcast_PhilBernstein_Site_1400x788.png 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"April 8, 2020","formattedExcerpt":"Forty years ago, database research was an \u201cexotic\u201d field and, because of its business data processing reputation, was not considered intellectually interesting in academic circles. But that didn\u2019t deter Dr. Philip Bernstein, now a Distinguished Scientist in MSR\u2019s Data Management, Exploration and Mining group, and&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/648054","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=648054"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/648054\/revisions"}],"predecessor-version":[{"id":668121,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/648054\/revisions\/668121"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/648063"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=648054"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=648054"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-js
on\/wp\/v2\/tags?post=648054"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=648054"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=648054"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=648054"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=648054"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=648054"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=648054"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=648054"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=648054"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}