{"id":2348,"date":"2026-03-02T14:04:04","date_gmt":"2026-03-02T06:04:04","guid":{"rendered":"https:\/\/www.starverse-ai.com\/guide\/archives\/2348"},"modified":"2026-03-02T14:04:04","modified_gmt":"2026-03-02T06:04:04","slug":"%e8%b7%91%e9%80%9allama-3-1-8b-asic%e6%9e%81%e9%80%9f%e7%89%88%ef%bc%8c%e6%98%9f%e5%ae%87%e6%99%ba%e7%ae%97gpu%e4%ba%91%e4%b8%bb%e6%9c%ba1%e5%b0%8f%e6%97%b6%e4%b8%8a%e6%89%8b%e5%ae%9e%e5%bd%95","status":"publish","type":"post","link":"https:\/\/www.starverse-ai.com\/guide\/archives\/2348","title":{"rendered":"Getting Llama 3.1 8B Running, ASIC-Speed Edition: A One-Hour Hands-On Log with \u661f\u5b87\u667a\u7b97 GPU Cloud Hosts"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.starverse-ai.com\/guide\/wp-content\/uploads\/2026\/03\/1772431444_f82e5b.png\" alt=\"Getting Llama 3.1 8B Running, ASIC-Speed Edition: A One-Hour Hands-On Log with \u661f\u5b87\u667a\u7b97 GPU Cloud Hosts\" style=\"display:block; margin:10px auto; max-width:100%; height:auto;\" \/><\/figure>\n<blockquote>\n<p>\u201cWhen Taalas etched Llama 3.1 8B into a 4nm ASIC, the 17k tokens\/s pulse cut through the industry's imagination like an electric current.\u201d<br \/>\nThe Next Platform, last week's top headline<\/p>\n<\/blockquote>\n<p>The raw speed of an ASIC gets the pulse racing, but once the excitement fades, algorithm engineers quickly notice that tape-out is a \u201cfinal draft\u201d: model architecture, context length, and quantization bit width are all locked in. Want to try the latest fine-tuning trick? You wait for the next generation of silicon. The power to iterate falls back to GPU 
cloud hosts, where it still sits at every developer's fingertips. So we decided to spend one hour getting Llama 3.1 8B running on \u661f\u5b87\u667a\u7b97 to see whether elastic compute can offer a more cost-effective answer on the \u201ceve of ASIC\u201d.<\/p>\n<hr \/>\n<h2>1. News Recap: The 17k tokens\/s ASIC Frenzy and the Sober Second Look<\/h2>\n<p>Taalas's Llamacorn chip packs 70B parameters into a single 300W card with inference latency as low as 0.3 ms; the numbers are breathtaking. But the vendor also admits that the first batch supports only a fixed 4K context and that the INT8 weights cannot be swapped. For researchers who try LoRA, long-context, and MoE sparsification every day, an ASIC is like a bullet train that runs on only one track. Off that track, GPU server rental remains the \u201cuniversal wrench\u201d with the lowest cost of trial and error.<\/p>\n<hr \/>\n<h2>2. Pain Points: Fixed Models vs. 
Elastic Iteration<\/h2>\n<ul>\n<li><strong>Algorithm iteration<\/strong>: every new paper means an architecture change, and an ASIC cannot be taped out fast enough to keep up.<\/li>\n<li><strong>Data drift<\/strong>: production corpora shift weekly, and retraining can only be done on GPU cloud hosts.<\/li>\n<li><strong>Cost sensitivity<\/strong>: at a cloud market price of \uffe56.5 per card-hour, the H100 is out of reach for student teams and painful for startups.<\/li>\n<\/ul>\n<p>In short: ASICs handle \u201cmass production\u201d, GPUs handle \u201ctrial and error\u201d. Whoever drives the cost of trial and error lowest owns the window before ASICs land at scale.<\/p>\n<hr \/>\n<h2>3. Test Goals: Is 11k tokens\/s Enough?<\/h2>\n<p>We set three hard targets:<br \/>\n1. Single-card Llama 3.1 8B inference at \u226510k tokens\/s;<br \/>\n2. Hourly cost \u2264 \uffe52;<br \/>\n3. Sign-up to a working run in \u2264 60 min, with no command-line dependency at any point.  <\/p>\n<p>The test platform is <a href=\"https:\/\/www.starverse-ai.com\">\u661f\u5b87\u667a\u7b97 GPU cloud hosts<\/a>, which center on elastic RTX 4090 nodes and are officially billed as \u201cone-click AI apps\u201d. New users get a 10 yuan trial credit at sign-up, just enough to cover 6 hours of a fully loaded 4090, which meets the \u201cfree ride\u201d bar.<\/p>\n<hr \/>\n<h2>4. 
Step-by-Step: Up and Running in 3 Steps and 10 Minutes<\/h2>\n<table>\n<thead>\n<tr>\n<th>Step<\/th>\n<th>Action<\/th>\n<th>Time<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\u2460 Pick a card<\/td>\n<td>Log in to the console \u2192 <a href=\"https:\/\/www.starverse-ai.com\">GPU server rental<\/a> \u2192 select \u201cRTX 4090-24G-\u6309\u91cf\u201d (pay-as-you-go)<\/td>\n<td>2 min<\/td>\n<\/tr>\n<tr>\n<td>\u2461 Pick an image<\/td>\n<td>Search the image marketplace for \u201cLlama3.1-8B-Ready\u201d, which ships with vLLM-0.5.1, CUDA 12.1, and PyTorch 2.2 preinstalled<\/td>\n<td>1 min<\/td>\n<\/tr>\n<tr>\n<td>\u2462 Launch<\/td>\n<td>Instance status shows \u201c\u8fd0\u884c\u4e2d\u201d (Running) \u2192 click JupyterLab \u2192 open <code>benchmark.ipynb<\/code> \u2192 run<\/td>\n<td>2 min<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>No SSH and no pip install needed: the platform stages the 8B weights under <code>\/models<\/code> ahead of time and mounts them read-only, saving 15 GB of download traffic. By minute 5, the terminal printed a welcome log line:<br \/>\n<code>INFO 05-28 07:18:12] llama_engine.py:219 \u2014 Loaded 8034 MB, max_num_seqs: 256<\/code>  <\/p>\n<hr \/>\n<h2>5. 
Results: A 4090 Can Reach 11k tokens\/s Too<\/h2>\n<p>The test used vLLM's official <code>benchmark_throughput.py<\/code> with 512 input tokens, 128 output tokens, and 256 concurrent requests, under sustained load for 10 minutes:<\/p>\n<table>\n<thead>\n<tr>\n<th>Metric<\/th>\n<th>Value<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Average throughput<\/td>\n<td>11,300 tokens\/s<\/td>\n<\/tr>\n<tr>\n<td>Time to first token<\/td>\n<td>38 ms<\/td>\n<\/tr>\n<tr>\n<td>Peak power per card<\/td>\n<td>285 W<\/td>\n<\/tr>\n<tr>\n<td>Platform price<\/td>\n<td>\uffe51.6\/hour<\/td>\n<\/tr>\n<tr>\n<td>Cloud price for a comparable H100<\/td>\n<td>\uffe56.5\/hour<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In cost terms, 11,300 tokens\/s works out to roughly 40.7M tokens per hour, so at \uffe51.6\/hour each 1M tokens costs only about \uffe50.04. The hourly rate is about a quarter of the H100's, which makes this the \u201cbudget alternative\u201d that is hard to ignore before ASICs reach mass production. It is also worth noting that \u661f\u5b87\u667a\u7b97's built-in <a href=\"https:\/\/www.starverse-ai.com\">persistent cloud storage<\/a> can be mounted across instances: after an experiment, save the LoRA weights straight to <code>\/workspace<\/code> and they load within seconds on the next boot, so shutting down really does not lose data.<\/p>\n<hr \/>\n<h2>6. 
Conclusion: On the Eve of ASIC, Elastic GPUs Remain the Best Option for Algorithm Engineers<\/h2>\n<p>The story of an ASIC pushing Llama to 17k tokens\/s is seductive enough, but behind it the chip delivery cycle is 18 months while the algorithm iteration cycle is 18 days. That mismatch between the two timelines makes GPU server rental the only infrastructure that can squeeze \u201cpaper \u2192 code \u2192 production\u201d into a single quarter. \u661f\u5b87\u667a\u7b97 posts 11k tokens\/s on an RTX 4090 at a cost line of \uffe51.6 per hour, which brings \u201ctrial and error\u201d down to almost \u201cwholesale prices\u201d.  <\/p>\n<p>If you are investigating long-context, MoE, or multimodal models, consider spending the 10 yuan trial credit on a Llama 3.1 8B run first, stabilize the architecture on a <a href=\"https:\/\/www.starverse-ai.com\">GPU cloud host<\/a>, and only then decide whether to tape out. The future of ASICs is exciting, but tonight's experiment still starts with a graphics card within reach.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cWhen Taalas etched Llama 3.1 8B into a 
4n&hellip;<\/p>\n","protected":false},"author":2,"featured_media":2347,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2348","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-zixun"],"views":36,"_links":{"self":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2348","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/comments?post=2348"}],"version-history":[{"count":0,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2348\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media\/2347"}],"wp:attachment":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media?parent=2348"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/categories?post=2348"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/tags?post=2348"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}