{"id":579994,"date":"2019-04-24T07:59:09","date_gmt":"2019-04-24T14:59:09","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=579994"},"modified":"2019-06-18T08:07:58","modified_gmt":"2019-06-18T15:07:58","slug":"froid-and-the-relational-database-query-quandry-with-dr-karthik-ramachandra","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/froid-and-the-relational-database-query-quandry-with-dr-karthik-ramachandra\/","title":{"rendered":"Froid and the relational database query quandary with Dr. Karthik Ramachandra"},"content":{"rendered":"
<\/a><\/p>\n In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good software engineering practices like modularity, readability and re-usability. Sadly, these benefits have traditionally come with a huge performance penalty, rendering them impractical in most situations. That bothered Dr. Karthik Ramachandra<\/a>, a Senior Applied Scientist at Microsoft Research India<\/a>, so he\u2019s spent a great deal of his career working on improving an imperative complement to SQL in database systems.<\/p>\n Today, Dr. Ramachandra gives us an overview of the historic trade-offs between declarative and imperative programming paradigms, tells us some fantastic stories, including The Tale of Two Engineers and The UDF Story, Parts 1 and 2, and introduces us to Froid<\/a> \u2013 that\u2019s F-R-O-I-D, not the Austrian psychoanalyst \u2013 which is an extensible, language-agnostic framework for optimizing imperative functions in databases, offering the benefits of UDFs without sacrificing performance.<\/p>\n Karthik Ramachandra: To start the story right, if you look at a database like Microsoft SQL Server<\/a>, which is what the focus of our work has been so far, SQL Server introduced scalar user-defined functions way back in 2000 as a means for users to be able to express their custom behavior, you know? Some of this custom logic is easier expressed using imperative code, so there was a demand for it, and they introduced this feature. 
It was good and happy, but in a few years, people realized that scalar UDFs are good when it comes to modularity and code re-use and some other metrics, but with respect to performance, it turns out that they\u2019re evil.<\/p>\n Host: You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I\u2019m your host, Gretchen Huizinga.<\/strong><\/p>\n Host: In the world of relational databases, structured query language, or SQL, has long been King of the Queries, primarily because of its ubiquity and unparalleled performance. But many users prefer a mix of imperative programming, along with declarative SQL, because its user-defined functions (or UDFs) allow for good software engineering practices like modularity, readability and re-usability. Sadly, these benefits have traditionally come with a huge performance penalty, rendering them impractical in most situations. That bothered Dr. Karthik Ramachandra, a Senior Applied Scientist at Microsoft Research India, so he\u2019s spent a great deal of his career working on improving an imperative complement to SQL in database systems.<\/strong><\/p>\n Today, Dr. Ramachandra gives us an overview of the historic trade-offs between declarative and imperative programming paradigms, tells us some fantastic stories, including The Tale of Two Engineers and The UDF Story, Parts 1 and 2, and introduces us to Froid \u2013 that\u2019s F-R-O-I-D, not the Austrian psychoanalyst \u2013 which is an extensible, language-agnostic framework for optimizing imperative functions in databases, offering the benefits of UDFs without sacrificing performance. 
That and much more on this episode of the Microsoft Research Podcast.<\/p>\n Host: Karthik Ramachandra, welcome to the podcast.<\/strong><\/p>\n Karthik Ramachandra: Thank you, I\u2019m happy to be here!<\/p>\n Host: You\u2019re a senior applied scientist at the Microsoft Research Lab in India in Bangalore, and I\u2019m lucky to have you in the booth today. Good to see you face to face. Met you at Techfest.<\/strong><\/p>\n Karthik Ramachandra: Yup.<\/p>\n Host: Now we\u2019re in the booth.<\/strong><\/p>\n Karthik Ramachandra: Yup.<\/p>\n Host: Tell us, what does a senior applied scientist do for a living? What gets you up in the morning?<\/strong><\/p>\n Karthik Ramachandra: Well, as an applied scientist, the good thing about my job is that I get to ask the really hard questions which are not commonly asked. And the other thing is I get to work with really smart people to solve those problems. And the third thing is that the problems that I solve have the potential of impacting a huge customer base, worldwide, which is very inspiring and exciting to me, and that gets me excited every day.<\/p>\n Host: Yeah, so on the spectrum there, there\u2019s a sliding scale of pure research to applied research, or industrial research. Where do you fall on that spectrum? Because you\u2019re working in a pure research institution, really, but it has some play in it, right?<\/strong><\/p>\n Karthik Ramachandra: Yeah, so I like to place myself in the middle of the spectrum. My interest is in taking ideas and technologies that are coming out of pure research outcomes, and then coming up with a way to make them practical or make them real in real systems and reach to customers and users. 
So, I like being in the middle there, to be a bridge between research and practice, so that\u2019s where I think I would place myself.<\/p>\n Host: I love that, because, as we\u2019re going to find out shortly in this interview, you\u2019re also bridging some other kinds…<\/strong><\/p>\n Karthik Ramachandra: Yes.<\/p>\n Host: …of technologies together. So maybe that\u2019s your calling in life is to \u201cbe the bridge.\u201d<\/strong><\/p>\n Karthik Ramachandra: Yeah, yeah, I think so, maybe.<\/p>\n Host: Well, let\u2019s set up our podcast with a \u201cVirtual Earth 3D<\/a>\u201d view of relational databases, which is the heart and soul of your work.<\/strong><\/p>\n Karthik Ramachandra: Yes.<\/p>\n Host: Tell us about the two main programming paradigms in the relational database world and their relative strengths and weaknesses, so we\u2019re at least in a mindset to understand how your current work is reconciling them.<\/strong><\/p>\n Karthik Ramachandra: Yup. So, if you look at relational databases today, the primary way to interact with the database is through this language called SQL, or structured query language, which falls under this declarative paradigm of programming, which basically says the user needs to tell the system what they need in this declarative high-level language, and the system figures out an efficient way to do what the user has asked. So that\u2019s sort of one main paradigm, or the primary way we interact with databases today. That comes with the advantage that, you know, the users can stay at a higher level of abstraction, not having to go to the detailed implementation of how things are done. 
And it also allows the system to optimize and come up with efficient algorithms to solve the query or the question that the user is trying to ask.<\/p>\n Host: Yeah.<\/strong><\/p>\n Karthik Ramachandra: That is one paradigm, and on the other side, we have this imperative program style which is a slightly lower level of abstraction in the sense you are basically telling the system how to go about doing what you want it to do. And, as a result, you\u2019re sort of binding the system to implement it in the way you are telling it to do. The advantage in imperative programming languages is that you have more scope for modularizing and reusing code and so on, but there\u2019s a limited scope for the system to figure out efficient ways to do data processing.<\/p>\n Host: So declarative, you\u2019re telling the computer what to do but not how to do it\u2026<\/strong><\/p>\n Karthik Ramachandra: Yes, yes.<\/p>\n Host: \u2026or telling the database what you want, but not how.<\/strong><\/p>\n Karthik Ramachandra: Yes, yes. Yes.<\/p>\n Host: And the imperative, you have to know a bit more, or you have to be more specific on how you want it to carry it out?<\/strong><\/p>\n Karthik Ramachandra: It\u2019s more like a preference, sort of. In many cases, you can express your requirement in either paradigm. But in my opinion, it\u2019s just a matter of choice.<\/p>\n Host: Well, there you go. Because on the larger scale, we\u2019re going to be talking today about why you\u2019re leaning towards the imperative paradigm, and so let\u2019s go there right now. I usually try to set my questions up in a progressive, somewhat linear manner. But your life and career and your work all kind of go together and they\u2019re intertwined, so I\u2019m going to go a little more freeform today and have you tell us some stories. Because I know you\u2019re good at storytelling. 
I want to start with one that will put a finer point on this declarative\/imperative approach, the tension between the two.<\/strong><\/p>\n Karthik Ramachandra: Yes, yes.<\/p>\n Host: You call it a Tale of Two Engineers. You actually have a name for this story, which I love, being a lit major. Uh, tell us the story.<\/strong><\/p>\n Karthik Ramachandra: Yeah, yeah. So, this is a tale that I always use when I want to drive home the point about this problem. So, let\u2019s say there\u2019s this e-commerce company like an online retail firm, right? Typically, these companies have a large database of customers who are placing orders and also the order information, like what orders are placed and so on. So, it turns out that in one such company, there is like a manager who owns this data and is responsible for doing some data analytics on this data. And this company now has this new requirement that they want to introduce something like a rewards program where they want to help the loyal customers by having some offers and so on. So, they want to basically categorize their customers into, let\u2019s say, three categories like platinum, gold and regular, and they have some simple logic to do this. If you bought stuff worth some amount of money or more, then you are platinum, otherwise you are regular and so on. Based on how much business you do with them you fall into one of these buckets. So, she calls two of her engineers on her team and tells them, look, we have this new requirement, please go and implement this and give me a report which shows all the customers, and which bucket they fall into.<\/p>\n Host: Right.<\/strong><\/p>\n Karthik Ramachandra: So, it turns out these two engineers get into a fight about how to do this. I mean, this happens. This is not very uncommon.<\/p>\n Host: Software wars.<\/strong><\/p>\n Karthik Ramachandra: Yes. So, it turns out that these two engineers decide that they will do it in their own way and they both go off on their own. 
One of them happens to be an SQL expert, right? So, he\u2019s done a lot of SQL in his past years of work. So, he comes up with one complex SQL query which can answer this problem right away, right? So, it\u2019s one query, but it does the job, but probably only he can understand it because it\u2019s quite complex.<\/p>\n Host: Right.<\/strong><\/p>\n Karthik Ramachandra: The other engineer is a programmer. He\u2019s not an SQL expert, so he writes a simple query and writes an imperative, user-defined function which, again, does the same thing in a different way, right? So, he writes it using variables and if\/then\/else and conditional branching and so on and different constructs that are common. So now both of them come back to the manager with their respective solutions. Both of them think that their solution is better, so they come to the manager and show their solution. Well, it turns out that the manager runs both of those and decides to promote one of them and fire the other one. So, this is a more dramatic part of the story. We may not fire them, but the point is that one of them is promoted and the other one is not. Can you guess the reason?<\/p>\n Host: I cannot. Not even. I would imagine…<\/strong><\/p>\n Karthik Ramachandra: Can you guess who was fired and who was promoted?<\/p>\n Host: Well, I\u2019ve seen the slides, so I know the end of the story, but I know our listeners don\u2019t!<\/strong><\/p>\n Karthik Ramachandra: So, it turns out that the SQL expert was promoted, and the imperative programmer was fired in this case. And the reason being that the SQL query ran in a couple of minutes over the database of millions of customers and billions of orders, whereas the imperative function took a few hours to run on their database. 
And that was not acceptable to the business, so they had to choose the more efficient solution, despite the fact that the function had other benefits to it.<\/p>\n Host: Okay.<\/strong><\/p>\n Karthik Ramachandra: So, this sort of demonstrates that both solutions are correct. They give the right answer, right? They\u2019re not doing something wrong. But just because you wrote the program in a different way, you are penalized, right? So that\u2019s the tension there that I\u2019m trying to reconcile.<\/p>\n Host: Okay, and I love this because I already know the other end of the story, which is the cool technology you\u2019re working on right now. But let\u2019s diverge again, because there\u2019s some other setups I want to do. Normally, I wait to have you tell about yourself until the end of the podcast, but I want you to do that now, since I also happen to know that your early academic and professional experiences led directly to the research you\u2019re doing. So, tell us how the first job you got after your graduation led to a nagging dissatisfaction that got you here now.<\/strong><\/p>\n Karthik Ramachandra: Yeah. Yeah, that\u2019s actually an interesting story. So, as a part of my job earlier as a software developer and a tech lead, we had this requirement from a client where we had to build a dashboard based on data that was present in a relational database backend. 
So, we built a nice little dashboard tool which could do these reports as the customer wanted, and we used all the good programming practices by writing modular code and following all the design patterns that are recommended in best practices for programming and so on.<\/p>\n Host: Hmm.<\/strong><\/p>\n Karthik Ramachandra: But it turned out that the tool that we built, although it was doing its job, it failed the performance requirement miserably because the scale of the data was huge, and the way we had written our tool, it could not scale to, you know, large data sets and multiple concurrent users using it at the same time. So, it was basically not able to match the performance requirement. As a result, when we did the analysis and figured out what was the reason, it turned out that we had to manually remove some of these good programming practices and undo some of these good things that we had done in terms of software engineering practices, and we had to rewrite a lot of our programs as huge SQL queries which did the job more efficiently, but in terms of maintainability and other factors, we lost out on something. So, in some sense, we had to trade off readability and modularity for the sake of performance. So, we did this, and the customer was happy after that, but what left me nagging with this whole experience was, why should we do this trade off, right? Is there a way to get both performance and this flexibility and this modularity together? I don\u2019t want to give up on either of them, right? So that\u2019s sort of what got me thinking. And when I left my job and went to grad school at IIT Bombay, I saw that my advisor was actually with another PhD student at the time, was doing something very similar. In fact, recently, they had started looking at the same problem. So, at that moment, I realized that I had come to the right place, and I just joined them and took that project forward. 
So that\u2019s how this all started.<\/p>\n Host: So, did you actually quit your job to go back to school because you were bothered by this, or…<\/strong><\/p>\n Karthik Ramachandra: Well, uh…<\/p>\n Host: …was it more complex than that?<\/strong><\/p>\n Karthik Ramachandra: Yeah, I mean, it was not that I quit my job and went to grad school purely for this particular problem.<\/p>\n Host: Yeah.<\/strong><\/p>\n Karthik Ramachandra: I had the desire to go back to school and go deeper into some area of computer science anyway, and this fell in place really well because this was also a problem that was at the back of my mind, and I saw that my advisor, Sudarshan, at IIT Bombay, he was also looking at the same problem with another PhD student, so it was just like something that clicked immediately.<\/p>\n Host: I\u2019m just going to say, too, this is a kind of \u201clife wisdom\u201d thing: get a job after undergrad before you go back to graduate school to find out what the real world is doing and what problems there are, and maybe it\u2019ll inspire you for…<\/strong><\/p>\n Karthik Ramachandra: Exactly, yeah. Especially for these applied research areas, right, where the problems need to be motivated by real needs.<\/p>\n Host: Yeah.<\/strong><\/p>\n Karthik Ramachandra: It was really valuable, in hindsight. I knew that I would go to grad school when I took up my first job, but in hindsight, it\u2019s clear to me very much that my experience before getting into grad school has helped a lot in the research that I\u2019ve been doing and all the work that I\u2019ve been doing.<\/p>\n (music plays)<\/p>\n Host: All right, well let\u2019s keep this story train going. You have another one, and it has two parts. I love the fact that you title your stories. 
This one is called The UDF Story, and we haven\u2019t addressed what UDF is, but it\u2019s integral to the imperative paradigm, so tell us the UDF Story, Part 1 and Part 2, or how UDFs went from evil to magic in less than 20 years!<\/strong><\/p>\n Karthik Ramachandra: Yup. That\u2019s a nice subtitle for my story. Well, so UDF stands for user-defined function, and it\u2019s essentially a way where, in a database system, you can have imperative programs like this written as user-defined functions which can be called from an SQL query, like from a declarative query you can call into this imperative piece of code which will get executed as part of the query. So that\u2019s what UDF stands for. So, to start the story right, if you look at a database like Microsoft SQL Server, which is what the focus of our work has been so far, SQL Server introduced scalar user-defined functions way back in 2000 as a means for users to be able to express their custom behavior, you know? Some of this custom logic is easier expressed using imperative code, so there was a demand for it, and they introduced this feature. It was good and happy, but in a few years, people realized that scalar UDFs are evil when it comes to performance. So they are good when it comes to modularity and code re-use and some other metrics, but with respect to performance, it turns out that they\u2019re evil, and evil is not my choice of the word, but these are articles which people have written, experts.<\/p>\n Host: And tweets and…<\/strong><\/p>\n Karthik Ramachandra: And tweets and, yeah, blog posts and all that on the internet you will find a lot of those articles. And this is not specific to SQL Server. This thing is common to all relational databases.<\/p>\n Host: Sure.<\/strong><\/p>\n Karthik Ramachandra: But our focus was SQL Server. 
So, it turned out that, yeah, it went to such an extent that we, ourselves, as Microsoft, we had to advise our customers to avoid using user-defined functions whenever performance mattered to them.<\/p>\n Host: So, so let\u2019s clarify. Evil means slow\u2026<\/strong><\/p>\n Karthik Ramachandra: Yeah, yeah, evil with respect to performance, yes.<\/p>\n Host: I mean, so \u2013 so that\u2019s just an indictment on our culture. We have zero patience. I get it, though. I mean, time is money for corporations, so it does matter. But really, we\u2019re talking slow.<\/strong><\/p>\n Karthik Ramachandra: Yes, uhh\u2026<\/p>\n Host: All right, so I interrupted the story, and you were…<\/strong><\/p>\n Karthik Ramachandra: Yeah.<\/p>\n Host: …at the point where you said, Microsoft said, \u201cDon\u2019t use UDFs.\u201d<\/strong><\/p>\n Karthik Ramachandra: Yeah, so we had blog posts in MSDN and our own blog engines where we advised customers to avoid using UDFs. So, there was a lot of this negativity which was there, and that continued even until, I mean, there are articles even until 2016 and so on where people have kept complaining about it. So, this was around 2010, I think, and that was when I had joined with my PhD at IIT Bombay. And around 2012, this other PhD student, with my advisor who had then moved to IIT Hyderabad, another IIT in India, and we started working on this idea of a way to optimize such user-defined functions. So, this was a collaboration with a bunch of people, and we came up with a publication in 2014 which was sort of one of the first papers that said, you know, you can actually optimize these user-defined functions that run in a database. And I graduated in the same year, 2014, and joined Microsoft in Madison, where we have this lab called Gray Systems Lab.<\/p>\nEpisode 73, April 24, 2019<\/h3>\n
Related:<\/h3>\n
\n
\nFinal Transcript<\/h3>\n