{"id":983583,"date":"2023-11-17T11:07:15","date_gmt":"2023-11-17T19:07:15","guid":{"rendered":""},"modified":"2024-01-17T12:20:28","modified_gmt":"2024-01-17T20:20:28","slug":"skeleton-of-thought-parallel-decoding-speeds-up-and-improves-llm-output","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/skeleton-of-thought-parallel-decoding-speeds-up-and-improves-llm-output\/","title":{"rendered":"Skeleton-of-Thought: Parallel decoding speeds up and improves LLM output"},"content":{"rendered":"\n
\"A<\/figure>\n\n\n\n

This research was accepted by the 2024 International Conference on Learning Representations (ICLR 2024).

Large language models (LLMs) such as LLaMA and OpenAI's GPT-4 are revolutionizing technology. However, one common complaint about LLMs is their speed, or lack thereof: in many cases, it takes a long time to get an answer from them. This limits LLMs' applications and their usefulness in latency-critical settings, such as chatbots, copilots, and industrial controllers.
