{"id":1078164,"date":"2024-08-20T19:16:00","date_gmt":"2024-08-21T02:16:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=1078164"},"modified":"2024-09-25T04:25:22","modified_gmt":"2024-09-25T11:25:22","slug":"low-bit-quantization","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/low-bit-quantization\/","title":{"rendered":"Microsoft Research Asia's innovations bridge the gap between low-bit quantization of large models and on-device deployment"},"content":{"rendered":"\n

Editor's note: In artificial intelligence, more model parameters often mean better performance. But as models grow, so do their demands on the compute and memory of end devices. Low-bit quantization sharply reduces storage and compute costs and improves inference efficiency, so it has become one of the key techniques for running large models efficiently on resource-constrained devices. However, if the hardware does not support the data formats that low-bit quantization produces, those advantages cannot be realized.<\/p>\n\n\n\n

To address this problem, Microsoft Research Asia has introduced a new data compiler, Ladder, and an algorithm, T-MAC, which allow hardware that currently supports only symmetric-precision computation to run mixed-precision matrix multiplication directly. Test results show that Ladder achieves speedups of up to 14.6x when supporting custom data types that GPUs do not natively support. On a Surface AI PC equipped with the latest Qualcomm Snapdragon X Elite chipset, T-MAC makes large-model throughput on the CPU two times faster than on the dedicated NPU accelerator. In addition, the researchers designed the LUT Tensor Core hardware architecture, a streamlined design that lets hardware directly support a variety of low-bit mixed-precision computations and offers a new direction for AI hardware design.<\/p>\n\n\n\n


\n\n\n\n

Large models are increasingly deployed on edge devices such as smartphones, laptops, and robots to provide advanced intelligence and real-time responsive services. But large models with hundreds of millions of parameters place extremely high demands on the memory and compute of end devices, which limits their broad adoption. Because low-bit quantization significantly compresses model size and reduces the demand for compute resources, it has become an effective means of deploying large models on-device and achieving efficient inference.<\/p>\n\n\n\n

As low-bit quantization has developed, data types have become increasingly diverse, including low-bit formats such as int4, int2, and int1. As a result, large models increasingly rely at inference time on mixed-precision matrix multiplication (mpGEMM), which multiplies low-bit weights by high-bit activations. However, the compute units in existing hardware such as CPUs and GPUs usually support only symmetric computation and are not compatible with this kind of mixed-precision matrix multiplication.<\/p>\n\n\n\n

How does mixed-precision matrix multiplication differ from conventional matrix multiplication?<\/p>\n\n\n\n

In conventional matrix multiplication, the two operands have the same precision, for example FP16*FP16 or int8*int8. Low-bit quantization of large models breaks this symmetry: one operand is high-bit and the other is low-bit. Examples include the int8*int1 or int8*int2 multiplication used in the 1-bit BitNet (opens in new tab)<\/span><\/a> model, and the mixed floating-point and integer multiplication FP16*int4.<\/p>\n\n\n\n
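<p>To make the asymmetry concrete, here is a minimal NumPy sketch, not the method used by Ladder or T-MAC. It assumes symmetric per-row int4 quantization and emulates FP16*int4 by dequantizing the weights to FP16 before an ordinary matmul. The function names <code>quantize_int4<\/code> and <code>mpgemm_fp16_int4<\/code> are hypothetical, and because NumPy has no int4 dtype, the 4-bit codes are stored in int8 containers.<\/p>

```python
import numpy as np

def quantize_int4(w):
    """Per-row symmetric quantization of weights to int4 codes in [-7, 7]."""
    # One scale per output row; guard against all-zero rows.
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    scale = np.where(max_abs == 0, 1.0, max_abs / 7.0)
    # NumPy has no int4 dtype, so the 4-bit codes are held in int8 containers.
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def mpgemm_fp16_int4(x, q, scale):
    """FP16 activations times int4 weights, emulated by dequantizing first."""
    # Upcast the low-bit operand so a symmetric FP16*FP16 matmul can run.
    w_fp16 = q.astype(np.float16) * scale
    return x @ w_fp16.T

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)  # weights, (out, in)
x = rng.standard_normal((4, 16)).astype(np.float16)  # activations, (batch, in)

y_sym = x @ w.astype(np.float16).T        # conventional symmetric FP16*FP16
q, scale = quantize_int4(w)
y_mixed = mpgemm_fp16_int4(x, q, scale)   # asymmetric FP16*int4 (emulated)
print(y_sym.shape, y_mixed.shape, np.abs(y_sym - y_mixed).max())
```

<p>In this sketch the multiply itself still runs in FP16, so only the weight storage shrinks. The two outputs differ by the quantization error of the weights.<\/p>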

To make full use of low-bit quantization, let hardware support mixed-precision matrix multiplication directly, and ensure that large models run fast and efficiently on edge devices, researchers at Microsoft Research Asia have innovated on existing CPU and GPU compute operators and on hardware architecture:<\/p>\n\n\n\n