{"id":1374,"date":"2023-06-23T14:06:15","date_gmt":"2023-06-23T14:06:15","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/startups\/blog\/?p=1374"},"modified":"2024-11-04T08:25:46","modified_gmt":"2024-11-04T16:25:46","slug":"simplifying-the-journey-of-ai-startups-through-experimentation-and-scaling","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/startups\/blog\/simplifying-the-journey-of-ai-startups-through-experimentation-and-scaling\/","title":{"rendered":"Simplifying the journey of AI startups through experimentation and scaling"},"content":{"rendered":"\n

*This is part three of our three-part AI-Core Insights series. Click here for part one, “Foundation models: To open-source or not to open-source?”, and here for part two, “Discovering holistic infrastructure strategies for compute-intensive startups.”*

On the road to LLM-driven use cases, startups are leading the way. The road can be bumpy, with hiccups in GPU allocation, availability of allocated capacity, API rate limits, and more. Then there are the many priorities of an LLM pipeline that must be sequenced across the different stages of your product build.

In this final part of our AI-Core Insights series, we’ll summarize a few decisions you need to consider at various stages to make your journey easier.

## Experimenting with models

At the experimentation stage, you’re first testing and comparing several models, both open- and closed-source. For OpenAI APIs, Microsoft for Startups provides $2,500 in OpenAI credits, which can give you rapid access to the APIs for experimentation.
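For instance, a first experiment against a deployed model might look like the minimal sketch below, which uses the Azure OpenAI Python SDK. The endpoint, API version, deployment name, and environment variable name are placeholders, not values from this post; substitute your own.

```python
import os
from openai import AzureOpenAI

# Assumes an Azure OpenAI resource and a chat model deployment already exist.
# The endpoint, API version, and deployment name below are placeholders.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # your deployment name, not the base model name
    messages=[{"role": "user", "content": "Summarize our product pitch in one sentence."}],
)
print(response.choices[0].message.content)
```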

A model catalog can be a great way to experiment with several models through simple pipelines and find the best-performing model for your use case. The refreshed Azure ML model catalog lists top models from Hugging Face, as well as a few curated by Azure.
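As one illustration of this kind of side-by-side comparison, the sketch below runs the same prompt through a few open models using the Hugging Face `transformers` library directly (rather than through the Azure ML catalog itself). The model names and prompt are arbitrary examples, not recommendations from this post.

```python
from transformers import pipeline

# Illustrative only: compare a few open models on the same prompt.
# The model names here are small example models, not recommendations.
candidates = ["gpt2", "distilgpt2"]
prompt = "A startup's biggest infrastructure risk is"

for name in candidates:
    generator = pipeline("text-generation", model=name)
    output = generator(prompt, max_new_tokens=40, do_sample=False)
    print(f"--- {name} ---")
    print(output[0]["generated_text"])
```

In practice you would swap in the catalog models you shortlisted and score their outputs against your own evaluation set rather than eyeballing a single prompt.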

The compute target for this stage can be either a CPU or a GPU; there is no major need for a highly performant system built for scale. Suitable GPUs include V100s, A100s, and RTX GPUs. For inference, the most widely used SKUs are A10s and V100s, while A100s are also used in some cases. It is important to line up alternatives to ensure access at scale, since access depends on variables like regional availability and quota.
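A minimal sketch of provisioning such a compute target with the Azure ML Python SDK (`azure-ai-ml`) follows. The subscription, resource group, workspace, and cluster name are placeholders, and the VM sizes named in the comments are examples whose availability varies by region and quota.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Placeholders: substitute your own subscription, resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Standard_NC6s_v3 is a V100 SKU; an A10 or A100 size (for example
# Standard_NV36ads_A10_v5 or Standard_NC24ads_A100_v4) can be swapped in
# if your region and quota allow. min_instances=0 lets the cluster
# scale to zero between experiments to control cost.
gpu_cluster = AmlCompute(
    name="experiment-gpu",
    size="Standard_NC6s_v3",
    min_instances=0,
    max_instances=2,
)
ml_client.compute.begin_create_or_update(gpu_cluster).result()
```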

## Considerations after choosing a model

After completing experimentation, you’ve settled on a use case and the right model configuration to go with it. The model configuration, however, is usually a set of models instead of just one. Here are a few considerations to keep in mind: