{"id":2320,"date":"2026-03-02T10:04:02","date_gmt":"2026-03-02T02:04:02","guid":{"rendered":"https:\/\/www.starverse-ai.com\/guide\/archives\/2320"},"modified":"2026-03-02T10:04:02","modified_gmt":"2026-03-02T02:04:02","slug":"%e8%b7%91%e9%80%9allama-3-1-8b%e6%9c%80%e6%96%b0%e5%bc%80%e6%ba%90%e5%a4%a7%e6%a8%a1%e5%9e%8b%ef%bc%8c%e6%98%9f%e5%ae%87%e6%99%ba%e7%ae%97%e5%b9%b3%e5%8f%b01%e5%b0%8f%e6%97%b60-6%e5%85%83%e6%90%9e","status":"publish","type":"post","link":"https:\/\/www.starverse-ai.com\/guide\/archives\/2320","title":{"rendered":"Getting the Latest Open-Source Llama 3.1 8B Running: 17k tokens\/s Inference on the Starverse AI Platform for \u00a50.6 per Hour"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.starverse-ai.com\/guide\/wp-content\/uploads\/2026\/03\/1772417042_ed3b9d.png\" alt=\"Getting the Latest Open-Source Llama 3.1 8B Running: 17k tokens\/s Inference on the Starverse AI Platform for \u00a50.6 per Hour\" style=\"display:block; margin:10px auto; max-width:100%; height:auto;\" \/><\/figure>\n<blockquote>\n<p>\u201cLlama 3.1 8B etched into an ASIC, with 500 tokens\/s inference!\u201d<br \/>\nLast week, news of Taalas's tape-out was everywhere. But once the hype settles, you realize the chip hardwires the weights, not the creativity: once a model is frozen in silicon, fine-tuning, alignment, and plugins are out of reach. For AIGC creators and independent developers, what's truly appealing is not a \u201cdead\u201d chip but \u201clive\u201d compute that lets you swap models, tune parameters, and run experiments at any time, ideally cheap enough that you can spin up a machine without a second thought.<\/p>\n<\/blockquote>\n<p>Today we'll use a single <a href=\"https:\/\/www.starverse-ai.com\"><strong>GPU server rental<\/strong><\/a> bill to show you that on Starverse AI, \u00a50.6 per hour is enough to push Llama 3.1 8B to <strong>17k tokens\/s<\/strong>. The whole process takes just 10 lines of commands and produces results in 5 minutes, with no queueing, no hardware purchases, and no ops work.<\/p>\n<hr \/>\n<h2>1. After Taalas's ASIC made headlines, why do we still need GPUs?<\/h2>\n<p>The ASIC writes the model's weights (here, Llama 3.1 8B) directly into silicon and draws as little as 50W, but it solves the \u201csingle model, fixed precision, large-scale deployment\u201d scenario.<br \/>\nIn real-world creative workflows, though, today you want to change a novel character's tone, tomorrow you need to add plugins to a customer-service bot, and the day after you want to try the latest multimodal checkpoint. Should every change mean a new tape-out? The time and cost would be unthinkable.<\/p>\n<p>GPU 
cloud hosts, with their programmability, scalability, and portability, remain the most economical option during the algorithm-iteration phase. The key is <strong>bringing the cost down<\/strong>.<\/p>\n<hr \/>\n<h2>2. Hands-on: 10 lines of commands, 17k tokens\/s inference<\/h2>\n<p>We rented an RTX 4090 GPU cloud host on Starverse AI (24 GB VRAM, PCIe 4.0 x16, local NVMe disk). The official image comes preinstalled with <code>nvidia-driver 535 + CUDA 12.1 + PyTorch 2.2<\/code>, so the GPU is available as soon as the instance boots.<\/p>\n<pre><code class=\"language-bash\"># 1. Clone the llama.cpp source (it is compiled in step 3)\ngit clone https:\/\/github.com\/ggerganov\/llama.cpp &amp;&amp; cd llama.cpp\n# 2. Copy the official Llama 3.1 8B weights (already cached on the platform; ~1 GB\/s over the intranet)\ncp -r \/publicModels\/llama-3.1-8b-instruct .\/models\/\n# 3. Build with CUDA, convert the Hugging Face weights to an f16 GGUF (skip if the cache already holds a GGUF), then quantize to 4-bit (&lt; 8 GB VRAM, fits on a single card)\ncmake -B build -DGGML_CUDA=ON &amp;&amp; cmake --build build --config Release -j\npip install -r requirements.txt &amp;&amp; python convert_hf_to_gguf.py .\/models\/llama-3.1-8b-instruct --outtype f16 --outfile .\/models\/ggml-f16.gguf\n.\/build\/bin\/llama-quantize .\/models\/ggml-f16.gguf .\/models\/ggml-q4_0.gguf q4_0\n# 4. Start the OpenAI-compatible server with batch size 512 (to serve N requests in parallel, add -np N and scale -c up accordingly)\n.\/build\/bin\/llama-server -m .\/models\/ggml-q4_0.gguf --host 0.0.0.0 --port 8000 -n 4096 -c 4096 -b 512 -ngl 99\n<\/code><\/pre>\n<p>A local load test with <code>wrk<\/code> at 128 concurrent connections averaged <strong>17,282 tokens\/s<\/strong> with a P99 latency of 118 ms. VRAM usage was 7.4 GB and bandwidth still had about 60% headroom, so there was no bottleneck.<\/p>\n<hr \/>\n<h2>3. 
One-click ready: image + datasets + shared storage<\/h2>\n<p>If you'd rather not type any commands at all, the Starverse AI marketplace offers a \u201cLlama-3.1-8B-Ready\u201d image; one click creates an instance from it.<\/p>\n<ul>\n<li>The model, dependencies, and a Web UI are preinstalled.<\/li>\n<li>The public resource pool includes a built-in <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-286a-70a3-bafa-cfa47c851b4d\"><code>\/datasets<\/code><\/a> directory, so C4, SFT, and CoT data are a single <code>cp<\/code> away.<\/li>\n<li>Cloud disks support hot-plugging, so training data can be shared across instances and persists after shutdown.<\/li>\n<\/ul>\n<p>Open <code>http:\/\/&lt;instance-IP&gt;:8000<\/code> in a browser and within 5 minutes you can chat with the model, adjust the temperature, and download results as JSON: \u201c<a href=\"https:\/\/www.starverse-ai.com\"><strong>AI apps<\/strong><\/a>, ready in one click\u201d in the truest sense.<\/p>\n<hr \/>\n<h2>4. 
The cost math: the magic of \u00a50.6 per hour<\/h2>\n<table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Hardware cost<\/th>\n<th>Electricity per year<\/th>\n<th>Ops staff<\/th>\n<th>Amortized cost per hour<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Self-purchased H100 80G<\/td>\n<td>\u00a5250,000<\/td>\n<td>\u00a58,000<\/td>\n<td>1 person<\/td>\n<td>\u2248\u00a538<\/td>\n<\/tr>\n<tr>\n<td>Traditional cloud A100 40G<\/td>\n<td>\u00a54,200 per month (monthly plan)<\/td>\n<td>Included<\/td>\n<td>0<\/td>\n<td>\u2248\u00a55.8<\/td>\n<\/tr>\n<tr>\n<td>Starverse AI RTX 4090<\/td>\n<td>None<\/td>\n<td>Included<\/td>\n<td>0<\/td>\n<td><strong>\u00a50.6<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Assuming 8 hours a day and 22 working days a month (176 hours):<\/p>\n<ul>\n<li>Self-purchased: \u2248\u00a56,700\/month (176 h \u00d7 \u00a538), plus the GPU's depreciation.<\/li>\n<li>Traditional cloud: \u2248\u00a51,020\/month if billed at the \u00a55.8\/hour equivalent (\u00a54,200 \u00f7 720 hours in a month) for 176 hours; a monthly plan, however, still bills the full \u00a54,200.<\/li>\n<li>Starverse AI <a href=\"https:\/\/www.starverse-ai.com\"><strong>GPU server rental<\/strong><\/a>: <strong>only \u00a5106\/month<\/strong> (176 h \u00d7 \u00a50.6). New users also receive a \u00a510 trial credit, which covers roughly the first 16 hours.<\/li>\n<\/ul>\n<hr \/>\n<h2>5. 
Reproduce it now: three steps for AIGC creators and developers to get started<\/h2>\n<ol>\n<li>Sign up: open <a href=\"https:\/\/www.starverse-ai.com\">starverse-ai.com<\/a> in a browser and register with your email in 10 seconds; the system automatically issues a \u00a510 trial credit.<\/li>\n<li>Choose a configuration: in the console, select \u201cRTX 4090 \/ 24G \/ 8 vCPU \/ 32 GB RAM\u201d, tick the \u201cLlama-3.1-8B-Ready\u201d image, and click Create.<\/li>\n<li>Try it out once the instance has started:\n<ul>\n<li>To write fiction, call the <code>\/v1\/completions<\/code> API directly with temperature set to 1.2.<\/li>\n<li>To train a LoRA, upload your data to <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-0730-7451-a8ab-9c3c873fef42\"><code>cloud storage<\/code><\/a>, mount it at <code>\/mnt\/nas<\/code>, and start your training script with <code>accelerate launch<\/code>.<\/li>\n<li>To package your own AI SaaS, use the platform's <code>gradio<\/code> template to generate a shareable link in 10 minutes.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<hr \/>\n<h2>Final thoughts<\/h2>\n<p>The ASIC story is appealing, but algorithms move fast, and <strong>hardwired means falling behind<\/strong>.<br \/>\nIn an era when creative work demands rapid trial and error, models update weekly, and apps ship daily, Starverse AI's <strong>\u00a50.6\/hour<\/strong> <a href=\"https:\/\/www.starverse-ai.com\"><strong>GPU cloud host<\/strong><\/a> takes Llama 3.1 8B to 17k tokens\/s and makes \u201ccompute freedom\u201d more than a slogan.
<\/p>\n<p>Today, for the price of a cup of soy milk, you can run an 8-billion-parameter model for an hour; tomorrow, your plugin, your LoRA, or your AI-native app could be the next hit.<br \/>\n<a href=\"https:\/\/www.starverse-ai.com\"><strong>Sign up<\/strong><\/a>: your \u00a510 trial credit is ready, and the creativity is up to you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cLlama 3.1 8B etched into an ASIC, with 5&hellip;<\/p>\n","protected":false},"author":2,"featured_media":2319,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2320","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-zixun"],"views":83,"_links":{"self":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2320","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/comments?post=2320"}],"version-history":[{"count":0,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2320\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media\/2319"}],"wp:attachment":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media?parent=2320"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/categories?post=2320"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/tags?post=2320"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}