{"id":893295,"date":"2022-10-31T15:42:00","date_gmt":"2022-10-31T22:42:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=893295"},"modified":"2022-10-31T15:42:01","modified_gmt":"2022-10-31T22:42:01","slug":"power-automate-with-copilot-the-back-story","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/power-automate-with-copilot-the-back-story\/","title":{"rendered":"Power Automate with copilot; the back story"},"content":{"rendered":"\n
Authors: Will Dubyak, Chhaya Methani

With Satya's copilot announcements at Ignite (see the Microsoft Ignite Opening) in the rear-view mirror, it's a good time to talk more about the kind of work and creative thinking that made them possible. If you aren't already familiar with the new ways to innovate with AI, such as the AI-based copilot that builds your flow in seconds, check out the Microsoft Power Automate blog post. The idea that a plain-language prompt can be used to generate a sophisticated automated workflow is powerful, and a glimpse into what the future holds with innovative large language models. But the path to this point was anything but easy and automatic.

As anyone with a background in AI/ML knows, the long pole in the execution tent for a good idea is training data. Training a model to generate a flow from a prompt assumes that we have lots of flows with associated prompts to show the model.

We didn't. So we needed to be creative.

Our solution took shape in two main dimensions. First, we devised a way to generate synthetic data for model training. We had many production flow skeletons that had been scrubbed of Personally Identifiable Information (PII), and we found ways to generate descriptions (or labels) for them to simulate the prompts a user might have written. Second, we used a method to generate natural language (NL) utterance-flow pairs that we knew to be empirically relevant, based on historical patterns in our existing Microsoft Power Automate flow data.

A Power Automate flow is made up of a trigger that "activates" the flow and steps that perform actions when that trigger fires. For example: when a new email arrives (the trigger), the flow can forward the message or save its attachments (the actions).

We trained the first version of the model on data created by manually writing prompts for existing flows. We use OpenAI Codex, the engine behind the GitHub Copilot tool, which generates executable code from a natural language prompt. Because large language models adapt well to new domains, we started achieving excellent results almost immediately.

The model works by pairing a workflow with a natural language description to use as training data. The model, which we refer to internally as NL2Flow, learns the correspondence between the language and the flow and is later able to generate a new flow in response to a natural language prompt. (Interestingly, we have learned that it works in far more than English: there was intense interest among Japanese users immediately after Ignite, and even though it's not specifically trained on Japanese, it works surprisingly often!) There are many working production flows available, but very few of them have a description we can use in model training and testing.

Generating synthetic data

We augmented the data we had by generating synthetic (meaning "we created them ourselves") natural language query-flow pairs.

Note that this is the reverse of the NL2Flow task. As a practical matter, it included fine-tuning a Codex model to generate new descriptions of existing production flows, as well as inducing variation in the flow language by paraphrasing. The objective is not just a greater volume of training flows and descriptions, but also a broader selection of triggers and actions with which to generate flows.
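To make the target format concrete, here is a minimal sketch of what one synthetic utterance-flow training pair might look like. The record layout and field names are our illustration, not the production schema.

# A minimal sketch of a synthetic training pair for NL2Flow.
# The field names and record format are illustrative assumptions,
# not the production schema.

import json

training_pair = {
    # The natural-language prompt a user might type.
    "utterance": "When I get a new email, forward it to my manager",
    # The flow code the model should learn to generate.
    "flow": (
        "triggeroutputs = await shared_office365.OnNewEmailV3();\n"
        "outputs_forward_email = shared_office365.ForwardEmail_V2("
        "'message_id': triggeroutputs?['body']?['MessageId'])"
    ),
}

# Fine-tuning data of this kind is typically serialized one JSON
# record per line (JSONL).
print(json.dumps(training_pair))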
The team took two approaches.

Flow2NL

The first approach was to use Natural Language Generation (NLG) to produce NL descriptions from anonymized production flows.

The process works as follows: we input flow code to a fine-tuned Codex model and generate multiple natural language descriptions of the flow's activity. For economy of effort, these descriptions were submitted to human judges, who selected the ones they thought most accurate. On the first pass, for 92% of the data samples (flows) processed this way, two or more judges agreed on at least one NL utterance that the model output.

As an example, consider this flow:

Flow Code:

triggeroutputs = await shared_office365.OnNewEmailV3(); // Trigger function
outputs_forward_email = shared_office365.ForwardEmail_V2('message_id': triggeroutputs?['body']?['MessageId']) // Forward-email action

The Flow2NL model generates several paraphrased utterances (for example, variations of "forward every new email I receive"), all of which result in the generation of the above flow.

Training the model with samples generated this way increases its robustness to language variations. Training data produced by the Flow2NL pipeline is then used to train the NL2Flow model.
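The human-review step above can be thought of as a simple agreement filter over model-generated candidates. Here is a minimal sketch, assuming each judge votes on each candidate utterance; the data structures, names, and the two-vote threshold mirror the description above but are otherwise our own illustration.

# Minimal sketch of the judge-agreement filter that keeps only
# model-generated descriptions endorsed by at least two judges.
# Data structures and names are illustrative assumptions.

def filter_by_agreement(candidates, votes, min_judges=2):
    """candidates: list of NL utterances generated for one flow.
    votes: dict mapping utterance -> set of judge ids who marked it accurate.
    Returns the utterances at least `min_judges` judges agreed on."""
    return [u for u in candidates if len(votes.get(u, set())) >= min_judges]

candidates = [
    "Forward every new email I receive",
    "Send each incoming email onward",
    "Delete all my emails",  # inaccurate description a judge would reject
]
votes = {
    "Forward every new email I receive": {"judge1", "judge2", "judge3"},
    "Send each incoming email onward": {"judge2"},
    "Delete all my emails": set(),
}

# Keeps only the first utterance; this flow would count toward the 92%
# figure because at least one utterance had two or more judges agreeing.
print(filter_by_agreement(candidates, votes))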
Context Grammar

As Table 1 in the results section below shows, the extra data from Flow2NL helped in our efforts to produce good flow descriptions, but not as much as we needed. To achieve more diversity in flow descriptions we used a process called "Context Grammar" to vary flow descriptions, iterating over all possible functions (with their corresponding prompts) needed to "construct" a flow. We created a tool called DataGen that generates these combinations given a config file containing the following:

1. A flow code pattern with grammar placeholders (such as @SaveTo@) marking where a function from a known set can appear.
2. NL prompts describing the code pattern, with parameter slots (such as @0@).
3. The set of function values each grammar placeholder can expand to, along with the corresponding NL text for each parameter slot.

For example, consider the following config file describing the grammar structure for saving attachments from an email. Note that we only show iterations over one pattern (@SaveTo@) to keep it simple; the tool can expand multiple patterns recursively.

Code Pattern:
triggeroutputs = await shared_office365.OnNewEmailV3(); // Trigger function

// For loop on email attachments
for (items_foreach in triggeroutputs?['body']?['attachments'])
{
    // Grammar pattern for the set of possible functions allowed under this group
    @SaveTo@
}

Corresponding NL prompts describing the above code (note that there are many ways to describe a given piece of code):

Save email attachments to @0@

Store every email attachment I receive to @0@

Pull attachments from outlook to @0@

In the above NL-flow pair, the parameters enclosed in @ signs are sampled from the lists described in steps 2 and 3 of the config. The same config describes the function values that @SaveTo@ can take, and the corresponding NL text is used to replace all occurrences of @0@.

Sampling from known patterns allows us to generate data inexpensively while still preserving the triggers and actions most relevant to our users. We also added extra samples for under-represented connectors.

Context Grammar enriched the training set for the NL2Flow model; see the results section for a detailed description of the impact of including both Flow2NL and Context Grammar.
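To make the expansion concrete, here is a minimal sketch of how a DataGen-style tool might expand such a config into NL-flow training pairs. The config layout, the @SaveTo@ expansions, and the OneDrive/SharePoint values are our own illustration of the idea, not the actual tool.

# Minimal sketch of a Context Grammar expansion in the spirit of DataGen.
# The config format, function names, and values are illustrative assumptions.
from itertools import product

config = {
    "code_pattern": (
        "triggeroutputs = await shared_office365.OnNewEmailV3();\n"
        "for (items_foreach in triggeroutputs?['body']?['attachments'])\n"
        "{\n@SaveTo@\n}"
    ),
    "nl_prompts": [
        "Save email attachments to @0@",
        "Store every email attachment I receive to @0@",
        "Pull attachments from outlook to @0@",
    ],
    # Each expansion of @SaveTo@ pairs a code snippet with the NL text
    # that replaces @0@ in the prompt. Both entries are hypothetical.
    "SaveTo": [
        ("shared_onedrive.CreateFile(...);", "OneDrive"),
        ("shared_sharepoint.CreateFile(...);", "SharePoint"),
    ],
}

def expand(config):
    """Yield (utterance, flow_code) pairs for every combination of
    NL prompt and @SaveTo@ expansion."""
    for prompt, (code_snippet, nl_value) in product(
        config["nl_prompts"], config["SaveTo"]
    ):
        utterance = prompt.replace("@0@", nl_value)
        flow_code = config["code_pattern"].replace("@SaveTo@", code_snippet)
        yield utterance, flow_code

# 3 prompts x 2 expansions = 6 synthetic training pairs.
for utterance, flow_code in expand(config):
    print(utterance)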
Model Training & Results

Using the two approaches, we generated about 800 new training samples with Flow2NL and about 3,000 new samples with Context Grammar. We ensured that the distribution of generated flows across topics was about the same as in the production samples.

We created a test set for tracking improvements across models trained on different iterations of the data, and we computed a custom similarity metric between the predicted and the ground-truth code. We do a fuzzy match to compute similarity: the number of correctly predicted API calls (triggers as well as actions) divided by the total number of predicted functions. For example, if the ground truth for a certain flow has 5 function calls and the model predicts 6 functions, 4 of which are correct, the similarity measure is 4/6 = 0.66. (A minimal sketch of this computation appears at the end of this post.)

Table 1: Relative improvement in similarity measure over the baseline model.

Source                                 | Relative improvement in similarity measure
Baseline + Flow2NL                     | 3.2%
Baseline + Context Grammar             | 9.5%
Baseline + Flow2NL + Context Grammar   | 15.9%

As Table 1 shows, we get the best improvement over the baseline when both Flow2NL and Context Grammar data are added to the training set. This shows how powerful adding synthetic samples can be for improving a model strategically, as and where it is needed.
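For completeness, here is a minimal sketch of the fuzzy-match similarity described above, treating a flow as the multiset of API calls it makes. The function name and the example call names are our own.

# Minimal sketch of the fuzzy-match flow similarity described above:
# correctly predicted API calls (triggers and actions) divided by the
# total number of predicted calls. Names are illustrative assumptions.
from collections import Counter

def flow_similarity(predicted_calls, ground_truth_calls):
    """Fraction of predicted API calls that appear in the ground truth."""
    truth = Counter(ground_truth_calls)
    correct = 0
    for call in predicted_calls:
        if truth[call] > 0:
            correct += 1
            truth[call] -= 1  # each ground-truth call matches at most once
    return correct / len(predicted_calls) if predicted_calls else 0.0

# The example from the text: ground truth has 5 calls, the model predicts
# 6 calls, and 4 of the predictions are correct -> similarity = 4/6, about 0.66.
ground_truth = ["OnNewEmailV3", "GetAttachment", "CreateFile", "SendEmail", "Delay"]
predicted = ["OnNewEmailV3", "GetAttachment", "CreateFile", "SendEmail",
             "PostMessage", "StartApproval"]
print(flow_similarity(predicted, ground_truth))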