{"id":1004529,"date":"2024-02-07T14:00:00","date_gmt":"2024-02-07T22:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1004529"},"modified":"2024-02-07T12:29:53","modified_gmt":"2024-02-07T20:29:53","slug":"ai-controller-interface-generative-ai-with-a-lightweight-llm-integrated-vm","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/ai-controller-interface-generative-ai-with-a-lightweight-llm-integrated-vm\/","title":{"rendered":"AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM"},"content":{"rendered":"\n
\"This<\/figure>\n\n\n\n

The emergence of large language models (LLMs) has revolutionized the way people create text and interact with computing. However, these models are limited in ensuring the accuracy of the content they generate and in enforcing strict compliance with specific formats, such as JSON or other computer-readable syntaxes. Additionally, LLMs that process information from multiple sources face notable challenges in preserving confidentiality and security. In sectors like healthcare, finance, and science, where information confidentiality and reliability are critical, the success of LLMs depends on meeting strict privacy and accuracy standards. Current strategies for addressing these issues, such as constrained decoding and agent-based approaches, pose practical challenges, including significant performance costs or the need for direct model integration, which is difficult to achieve.

## The AI Controller Interface and program

To make these approaches more feasible, we created the AI Controller Interface (AICI). The AICI goes beyond the standard "text-in/text-out" API for cloud-based tools with a "prompt-as-program" interface. It is designed to allow user-level code to integrate seamlessly with LLM output generation in the cloud. It also provides support for existing security frameworks, application-specific functionality, fast experimentation, and various strategies for improving accuracy, privacy, and adherence to specific formats. By providing fine-grained access to the generative AI infrastructure, AICI allows customized control over LLM processing, whether the model runs locally or in the cloud.

A lightweight virtual machine (VM), the AI Controller, sits atop this interface. The AICI hides the specific implementation of the LLM processing engine and provides the mechanisms developers and researchers need to work with the LLM quickly and efficiently, making it easier to develop and experiment. With features that allow adjustments to the generation process, efficient memory use, handling of multiple requests at once, and coordination of simultaneous tasks, users can finely tune the output, controlling it step by step.

An individual user, tenant, or platform can develop the AI Controller program using a customizable interface designed for specific applications or prompt-completion tasks. The AICI is designed for the AI Controller to run on the CPU in parallel with model processing on the GPU, enabling advanced control over LLM behavior without impacting its performance. Additionally, multiple AI Controllers can run simultaneously. Figure 1 illustrates the AI Controller architecture.

\"This
Figure 1. Applications send instructions to an AI Controller, which provides a high-level API. The AICI allows the Controller to execute efficiently in the cloud in parallel with model inference.<\/figcaption><\/figure>\n\n\n\n

AI Controllers are implemented as WebAssembly VMs, most easily written as Rust programs. However, they can also be written in any language that can be compiled into, or interpreted as, WebAssembly. We have already developed several sample AI Controllers, available as open source. These sample controllers provide built-in tools for controlled text creation, allowing on-the-fly changes to initial instructions and the resulting text, and enabling efficient management of tasks that involve multiple stages or batch processing.
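To give a feel for what a controller's core logic can look like, here is a minimal Rust sketch. The `TokenController` trait, the `TokenSet` alias, and the `NumericController` type are placeholders invented for this illustration, not the actual ABI exposed by the open-source crates; a real controller would implement the interfaces defined in the AICI repository.

```rust
// Illustrative sketch only: the trait and types below are placeholders,
// not the actual AICI ABI from the open-source repository.

/// Token IDs the model is allowed to sample at the next decoding step.
type TokenSet = Vec<u32>;

/// Hypothetical per-step interface an AI Controller might implement.
trait TokenController {
    /// Record a token the engine has just sampled.
    fn append_token(&mut self, token: u32);
    /// Return the tokens permitted at the next step, or `None` to leave
    /// sampling unconstrained.
    fn allowed_tokens(&self) -> Option<TokenSet>;
}

/// A toy controller that only allows digit and decimal-point tokens,
/// loosely mirroring the numeric-answer constraint used later in this post.
struct NumericController {
    generated: Vec<u32>,
    numeric_tokens: TokenSet,
}

impl TokenController for NumericController {
    fn append_token(&mut self, token: u32) {
        self.generated.push(token);
    }

    fn allowed_tokens(&self) -> Option<TokenSet> {
        Some(self.numeric_tokens.clone())
    }
}

fn main() {
    let mut ctrl = NumericController {
        generated: Vec::new(),
        // Token IDs are vocabulary-specific; these values are made up.
        numeric_tokens: (15..=24).chain(std::iter::once(13)).collect(),
    };
    ctrl.append_token(16);
    println!("generated so far: {:?}", ctrl.generated);
    println!("allowed next tokens: {:?}", ctrl.allowed_tokens());
}
```

Because controllers compile to WebAssembly, a sketch like this could run sandboxed inside the serving engine, computing its token constraints on the CPU while the model runs on the GPU.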

## High-level execution flow

Let's take an example to illustrate how the AI Controller impacts the output of LLMs. Suppose a user requests the completion of a task, such as solving a mathematical equation, with the expectation of receiving a numeric answer. The following program ensures that the LLM's response is numeric. The process unfolds as follows:

1. **Setup.** The user or platform owner first sets up the AICI-enabled LLM engine and then deploys the provided AI Controller, `DeclCtrl`, to the cloud via a REST API.

2. **Request.** The user initiates LLM inference with a REST request specifying the AI Controller (`DeclCtrl`) and a JSON-formatted declarative program, such as the following example.

{\"steps\": [
    {\"Fixed\":{\"text\":\"Please tell me what is 122.3*140.4?\"}},
    {\"Gen\": {\"rx\":\" ^(([1-9][0-9]*)|(([0-9]*)\\.([0-9]*)))$\"}}
]}<\/code><\/pre>\n\n\n\n

Once the server receives this request, it creates an instance of the requested `DeclCtrl` AI Controller and passes the declarative program into it. The AI Controller parses its input, initializes its internal state, and LLM inference begins.
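As a sketch of how a client might issue such a request from Rust, the snippet below builds the same declarative program and POSTs it to the server. The endpoint path, port, and the `controller`/`controller_arg` field names are assumptions made for illustration (as is the use of the `reqwest` and `serde_json` crates); consult the open-source repository for the actual REST interface.

```rust
// Sketch of a client-side request; the URL, endpoint, and field names are
// assumptions for illustration, not the documented AICI REST API.
// Assumed dependencies: reqwest (with "blocking" and "json" features), serde_json.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The same declarative program shown above, built as a JSON value.
    let program = json!({
        "steps": [
            {"Fixed": {"text": "Please tell me what is 122.3*140.4?"}},
            {"Gen": {"rx": " ^(([1-9][0-9]*)|(([0-9]*)\\.([0-9]*)))$"}}
        ]
    });

    // Hypothetical request shape: name the controller and pass its program.
    let body = json!({
        "controller": "DeclCtrl",
        "controller_arg": program
    });

    let response = reqwest::blocking::Client::new()
        .post("http://localhost:4242/v1/run") // placeholder server address
        .json(&body)
        .send()?;

    println!("status: {}", response.status());
    println!("body: {}", response.text()?);
    Ok(())
}
```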

3. **Token generation.** The server generates tokens sequentially, with the AICI making calls to the `DeclCtrl` AI Controller before, during, and after each token generation.
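The sketch below illustrates that calling pattern from the engine's side. The hook names (`pre_step`, `constrain_step`, `post_step`), the `CountingController`, and the simplified sampling are invented for illustration and are not the actual AICI callback protocol; in the real system the controller computes its constraints on the CPU while the GPU evaluates the model for the same step.

```rust
// Illustrative engine-side loop; hook names and sampling are simplified
// stand-ins, not the actual AICI callback protocol.

/// How the engine might see a controller during token generation.
trait ControllerHooks {
    /// Called before a token is generated (e.g., to edit the prompt or fork).
    fn pre_step(&mut self);
    /// Called while the model runs, to restrict which tokens may be sampled.
    fn constrain_step(&mut self, allowed: &mut [bool]);
    /// Called after a token is sampled; returning `false` stops generation.
    fn post_step(&mut self, token: u32) -> bool;
}

/// A trivial controller that stops after a fixed number of tokens.
struct CountingController {
    remaining: usize,
}

impl ControllerHooks for CountingController {
    fn pre_step(&mut self) {}

    fn constrain_step(&mut self, allowed: &mut [bool]) {
        // Example constraint: forbid token 0 (often a special token).
        if let Some(slot) = allowed.get_mut(0) {
            *slot = false;
        }
    }

    fn post_step(&mut self, _token: u32) -> bool {
        self.remaining = self.remaining.saturating_sub(1);
        self.remaining > 0
    }
}

fn decode_loop(ctrl: &mut dyn ControllerHooks, vocab_size: usize, max_tokens: usize) {
    for _ in 0..max_tokens {
        ctrl.pre_step();

        // The controller edits this mask on the CPU while, in the real system,
        // the GPU is busy evaluating the model for the same step.
        let mut allowed = vec![true; vocab_size];
        ctrl.constrain_step(&mut allowed);

        // Stand-in for sampling: pick the first allowed token.
        let token = allowed.iter().position(|&ok| ok).unwrap_or(0) as u32;
        println!("sampled token {token}");

        if !ctrl.post_step(token) {
            break;
        }
    }
}

fn main() {
    let mut ctrl = CountingController { remaining: 3 };
    decode_loop(&mut ctrl, 8, 16);
}
```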