PROSE Articles

What is the new role of research in engineering?
http://approjects.co.za/?big=en-us/research/articles/what-is-the-new-role-of-research-in-engineering/
Thu, 06 Aug 2020 20:29:42 +0000

The post What is the new role of research in engineering? appeared first on Microsoft Research.

There I was, in the fall of 2015—just one week back at Microsoft after having left the company for a few years. I stared at my computer and struggled to understand how my new team, soon to be named PROSE, actually worked. Naturally, I wanted to fit in and influence the team’s direction in a positive way. As a first step, I had sent out an email with what I thought were simple questions with obvious answers.

Are we former researchers now building products? Or are we a research team embedded in a product organization? And what do our managers think we are?

The response was not at all what I expected.

I thought we were obviously a group of (mostly) former researchers now working as engineers in a big Microsoft product org. Asking those questions was really just a preamble to my plan to make a case for some changes. I thought I had been hired to help figure out how to ease the transition to shipping products.

My new manager made it crystal clear that the answer was something altogether different. And that meant I had to derail my thinking from some well-worn ruts and start forging a new path for myself and the team I had joined.

It promised to be quite the journey. As fate would have it, I was fortunate to have had an earlier (painful) experience in my Microsoft career that prepared me well …

Case Study #1 – The Entity Framework Fiasco

“Are you guys completely insane? What were you thinking? Only an idiot would have built it that way!”

I don’t remember the exact words, but in my mind’s eye they were delivered by a large and very angry man with the utmost contempt. We sat in a conference room in one of the old buildings on the Microsoft Redmond campus, and the thirty or so other people packed into the room all nodded in agreement, their eyes fixed on me. It felt like an attack, yet these were our best friends. (To his credit, he posted a public apology the next day.)

The meeting, which took place about twelve years ago, sticks with me as the first time I’d ever seen a group of customers get so passionate in voicing their displeasure. I’d worked on some big products at Microsoft since joining in the mid-’90s, and while conversations within a product team sometimes devolved into less productive, higher-volume tirades, it never happened with customers. Until now.

Plus, these customers were special. They were MVPs, or “Most Valuable Professionals,” a unique designation for technology experts who evangelize for Microsoft products, and some of our best and smartest engineering advocates. And they were pissed.

The topic of the day was an update on a promising new technology: The Entity Framework (EF), a set of technologies in ADO.NET that let developers work with data in the form of domain-specific objects like customer addresses, without having to worry so much about the details of how the data is stored. We knew it was the right thing to build because it promised to dramatically improve data access for .NET developers, but we were blind to all that was wrong with it.

I had the dubious honor of receiving the brunt of this outrage because I had become the primary Microsoft face of EF. I straddled two worlds—one of them external, where I introduced EF to customers, and the other internal, where I led the charge of representing the customer point of view to the engineering team. And an ugly point of view it was. Version 1 of EF was so widely criticized that nearly a thousand developers signed a now-infamous “Vote of No Confidence” in the product.

Looking back now, it’s hard to reconcile the current status of EF with that first, painful version. Today there are two variants of EF, each with over 85 million NuGet downloads. The team has proudly embraced the unicorn as their mascot based on the widely acclaimed “Magic Unicorn Edition” of the software. EF is in a good place, but that’s the result of a lot of hard work to remake something which had a rocky start.

What went wrong with EF?

We can point to a lot of causes for that difficult time, but I’ll tell you about one little-known fact which is at least partly at fault: EF was originally built around technology from Microsoft Research (MSR).

Some folks at MSR had a great idea for addressing one of the hardest parts of Object/Relational Mapping. As plans came together to build the new product, the product team decided to take MSR’s code and one of their researchers into the team. On the surface there were some technical difficulties, but we thought we understood them. And we all felt that the potential advantage of an innovative new approach sounded great. In fact, our goal was not just to build another Object-Relational Mapping (ORM) library but to begin a new wave of related efforts around something we called the Entity Data Model.

The problems we encountered probably aren’t too hard to predict, especially with the benefit of hindsight and Microsoft’s broader evolution in the last 10 years. Foremost is the fact that we didn’t listen to our customers up front or frequently enough along the way.

Don’t get me wrong, product teams have been guilty of this particular mistake many times over the years, and we were actually trying to connect to customers the best way we knew how. But in our case the problem was exacerbated by the nature of our partnership with MSR. We were overly focused on two things: 1) getting all that innovative research goodness into the product, and 2) addressing the code quality and engineering breadth issues that came along with a codebase that started not as something shipped in a product but rather to facilitate research.

Eventually we got over the initial transition from MSR and started connecting with customers much more effectively, enabling us to create technology that was much better at meeting their needs. The process involved many changes to the initial codebase over the course of several years. Beyond that, to truly build the right product, the team eventually rewrote everything from scratch in a second codebase, which likely doesn’t contain any of the original MSR code (or even the core concepts behind it).

EF is now in a good place, but I would argue that happened in spite of, not because of, the MSR collaboration. Does this mean I believe we should stop investing in MSR or research at Microsoft? No way! In fact, when I returned to Microsoft 5 years ago, I was tremendously excited by the prospect of bringing my perspective and experience building products to the task of helping researchers ship software. While it turns out that I was mistaken about the nature of that task, the results speak for themselves:

Case Study #2 – Parsing Text Files by Example in Power Query

“Not certain how you managed to get me excited about TEXT files but you did it!”

That was just one of the many gratifying comments heard from MVPs at a recent (virtual) meeting with a Microsoft product team. “This feature is freaking awesome!!!” said another. Also: gifs of fireworks behind the Eiffel Tower and Kramer from Seinfeld gesturing to indicate his mind was blown.

These reactions are just the most recent fruit of a collaboration between research and engineering, an experience that was the polar opposite of my EF MVP debacle.

How did we get there? What changed? What did we learn?

The most important thing to understand is that we didn’t turn those researchers I’ve been working with over the last few years into engineers. Instead, we created a team composed of engineers and researchers working together, one that partners with product teams to bring research innovations into the hands of customers. It took persistent effort from each of these groups to make that possible.

The person behind this particular technology is my research colleague, Vu Le, who has been working on it for more than 5 years. Each time a new Microsoft product team chooses to collaborate with us, the technology has needed to morph into a new form and respond to new requirements. When Vu first brought the idea to the Power Query team, it took some convincing to show the value, but they soon came on board. And in doing so, they brought their own new requirements, which forced research to respond. The product team had a solid relationship with their customers and could help refine the ideas into something that really met their needs. Not only that, but the code being produced was the result of a joint effort with the engineers on our team. The engineers and researchers worked together to ensure it was built the right way from the start. It didn’t need to be rewritten, rethought or restarted in order to be incorporated into Power Query’s product.

To be clear, the researcher did not come up with an idea and then hand off to an engineer to implement it—the engineers on the team played a supporting role that enabled the researcher to efficiently write production-quality code. Engineers also played an important role in understanding products and users to help make sure the ideas we chose to invest in were well positioned to have impact. And, when there were elements that didn’t involve research work, the engineers picked up those tasks to allow each team member to focus on their unique area of expertise.

Rarely does a significant accomplishment come easily, and the experience of the PROSE team is no exception. At this point we have been evolving our processes for almost five years, and I expect us to keep evolving them for the next five years or however long we stay together. Our partnership with the Power Query team is several years old, and while there have been ups and downs, the impact is real.

We have not yet arrived at our destination, but our progress along this path brings us much closer to our goal than we were with EF, and I’m so glad to be part of a blended research and engineering group rather than an engineering team that happens to include some former researchers.

What does it mean to blend research & engineering?

So, is this the model for getting all that research goodness into the hands of our customers?

No.

There’s no doubt that things will continue to evolve and adapt. There are certainly multiple ways for researchers and engineers to successfully collaborate. But we have learned some important lessons that will help researchers and engineers be more effective together:

  1. Accept the value and differences of both roles. Researchers and engineers bring unique strengths to a project, and the best collaborations celebrate and leverage the efforts and backgrounds of both. Early in the EF experience, we tried to turn a researcher into an engineer, but it was a bad idea. He was unhappy and eventually left. Even if he had stayed, he would have been fated to become only another engineer. We never would have realized the kind of innovation the Power Query team enjoyed with the unique contributions that can only come from years of sustained research.
  2. Bring the two roles closer to one another. When there is a big fence between engineering and research, a host of problems appear. The loss of efficiency that comes when an engineering team has to rewrite research prototype code into production form is just the beginning. Even more insidious is the fact that, as in the case of the EF, the engineering team’s views may be dominated by the viewpoint of researchers who have not been immersed in the scenarios and needs of the customers. As bad as this is for the product team, the researchers also miss out since they don’t learn about entire new areas of research that customers could help them identify.
  3. Don’t suck. “Duh!” you say, but I’m serious here. When faced with a new situation (like being the only engineer dropped into a team of researchers), it’s easy to get caught up in the differences and forget to find the similarities. When we blend researchers and engineers to create innovative software, we’re still creating software. Almost all the lessons our industry has been learning over the years still apply. We as engineers have some bright new people with different backgrounds joining the effort, but we also have a lot of hard-won experience of our own to bring to the table. If there’s one critical thing I’ve taken away from the agile movement, for instance, it is continuous improvement. Take in the new parameters, add them to our existing learning, and adapt.

– Danny Simmons, August 2020

I would love to hear about your experiences at the boundary between research and engineering. Contact me at dsimmons@microsoft.com.

Getting data on the table: PROSE-powered Data Extraction
http://approjects.co.za/?big=en-us/research/articles/getting-data-on-the-table-prose-powered-data-extraction/
Mon, 01 Jun 2020 16:00:06 +0000

The post Getting data on the table: PROSE-powered Data Extraction appeared first on Microsoft Research.

When the COVID-19 pandemic was in its early stages, several agencies published infection and mortality data for different geographical regions in the public domain.  This data appeared in web pages, CSV files, JSON files, and more.  There was plenty of useful data out there, but before one could use this data to generate models and visualizations, one had to ingest the data into a tabular data frame and clean it.  The task of extracting tables from the varied data sources is often the price one has to pay before reaping the benefit of insights gained from downstream data analysis.

Can we ease the pain of ingesting data? The PROSE team has built an SDK that provides an intelligent “read file” library call. This is envisioned as a one-stop shop for all data ingestion needs. The underlying technology, based on program synthesis, has been developed over about six years. The early investment in its research and development continues to pay dividends even today. The “data extraction from text” technology within PROSE has surfaced in a variety of products already: PowerShell’s ConvertFrom-String, Import Flat File Wizard in SSMS, and importing data from files in Power Query.

Any product that works on data imported from a file can potentially use PROSE’s data extraction technology. However, every product brings its own requirements on what information it can provide and consume and what user interaction model it can support. Consequently, the PROSE “read file” library supports a very permissive interface: it is flexible in what it accepts as input, and it provides detailed output. The minimal input is the file contents. In this case, the table extraction happens completely predictively. However, users can provide more. For example, users can provide information about the file, a schema for the data, examples of the rows/columns in the expected output, and a choice of delimiter. On the output side, the PROSE “read file” library provides not only the output table, but also the parameters that were used to successfully parse the file into a table. It also provides code. A product can choose to use just some part of the output: only the code, only the output table, or only the learnt parameters. The PROSE-enabled data extraction experiences range from text-based command-line interfaces in PowerShell’s ConvertFrom-String, to UI forms in the Import Flat File Wizard in SSMS, to a rich UI that shows and explains the generated code in Power Query.
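To make the shape of such a permissive interface concrete, here is a minimal Python sketch. It is not the PROSE SDK and does not use program synthesis: the names `read_file` and `ReadResult`, the candidate delimiter list, and the scoring and header heuristics are all assumptions made for illustration. It only shows the idea of an input that can be just the file contents (fully predictive) or can include hints, and an output that bundles the table, the learnt parameters, and readable code.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical candidate set; a real system would consider far more formats.
CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


@dataclass
class ReadResult:
    table: List[List[str]]  # extracted data rows (header excluded)
    params: dict            # parameters that produced the table
    code: str               # readable code that reproduces the parse


def _parse(text: str, delimiter: str) -> List[List[str]]:
    """Split text into non-empty rows using the given delimiter."""
    return [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]


def _score(text: str, delimiter: str) -> int:
    """Score a delimiter: column count if every row agrees, else 0."""
    widths = {len(r) for r in _parse(text, delimiter)}
    if len(widths) != 1:
        return 0
    width = widths.pop()
    return width if width >= 2 else 0


def _is_number(cell: str) -> bool:
    try:
        float(cell)
        return True
    except ValueError:
        return False


def read_file(text: str, delimiter: Optional[str] = None,
              has_header: Optional[bool] = None) -> ReadResult:
    """Illustrative 'read file': hints are optional, missing ones are guessed."""
    if delimiter is None:
        # Predictive mode: pick the delimiter giving the most consistent table.
        delimiter = max(CANDIDATE_DELIMITERS, key=lambda d: _score(text, d))
    rows = _parse(text, delimiter)
    if has_header is None:
        # Guess a header when the first row has no numbers but later rows do.
        has_header = bool(rows) and not any(_is_number(c) for c in rows[0]) \
            and any(_is_number(c) for r in rows[1:] for c in r)
    columns = rows[0] if has_header and rows else []
    table = rows[1:] if has_header else rows
    code = (
        "import csv\n"
        "with open(path, newline='') as f:\n"
        f"    rows = list(csv.reader(f, delimiter={delimiter!r}))\n"
    )
    if has_header:
        code += "header, rows = rows[0], rows[1:]\n"
    params = {"delimiter": delimiter, "has_header": has_header,
              "columns": columns}
    return ReadResult(table=table, params=params, code=code)
```

A caller that only wants the learnt parameters can read `result.params`, while one that wants to show the user what happened can display `result.code`; passing `delimiter=` or `has_header=` overrides the corresponding guess.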

Figure: Power Query’s Import Text by Examples feature in action.

Some of our recent and upcoming efforts include a Python backend that generates readable Python code for extracting data from text files, PROSE-enabled predictive data import in VS Model Builder (available in VS 16.6 preview), and interactive data import in Azure Notebooks. They are all powered by the same underlying core technology.

Indeed, the recent preview release of PROSE technology inside the Power Query Text Connector has already helped users like Reid Haves (MVP) easily ingest and transform complex data, a feature which he describes as “incredible.” PROSE technology continues to play an essential role in supporting our users and making them more productive, whenever, wherever, and however they work.
