{"id":2537,"date":"2026-03-04T14:20:41","date_gmt":"2026-03-04T06:20:41","guid":{"rendered":"https:\/\/www.starverse-ai.com\/guide\/archives\/2537"},"modified":"2026-03-04T14:20:41","modified_gmt":"2026-03-04T06:20:41","slug":"%e5%a4%a7%e6%a8%a1%e5%9e%8b%e6%8e%a8%e7%90%86%e5%8a%a0%e9%80%9f300%ef%bc%8c%e6%98%9f%e5%ae%87%e6%99%ba%e7%ae%97gpu%e7%a7%9f%e8%b5%81vllm%e6%a1%86%e6%9e%b6%e7%94%9f%e4%ba%a7%e9%83%a8%e7%bd%b2%e5%ae%9e","status":"publish","type":"post","link":"https:\/\/www.starverse-ai.com\/guide\/archives\/2537","title":{"rendered":"3x Faster LLM Inference: Production Deployment with Starverse GPU Rental + vLLM"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.starverse-ai.com\/guide\/wp-content\/uploads\/2026\/03\/1772605241_e10cb1.png\" alt=\"3x Faster LLM Inference: Production Deployment with Starverse GPU Rental + vLLM\" style=\"display:block; margin:10px auto; max-width:100%; height:auto;\" \/><\/figure>\n<blockquote>\n<p>Background: In May 2024, a leading SaaS customer-service platform ran a major sales promotion. During the event, QPS on its production LLM API spiked from 200 to 800, yet GPU utilization stayed below 35% throughout. In under two hours the queue-timeout rate climbed to 18%, costing \u00a5400,000 in lost orders. The ops team rushed to scale out, but GPUs were hard to rent and prices were higher, so there was little they could do. The same week, a startup handled the same concurrency with just 8 A800 GPUs, kept time to first token consistently under 100 ms, and actually cut its costs by 
60%. The secret fits in one sentence: <strong>rent your GPU servers from a provider that understands LLM inference, and switch your serving framework to vLLM.<\/strong><\/p>\n<\/blockquote>\n<hr \/>\n<h2>1. When \u201cLarge Models\u201d Meet \u201cSmall Bandwidth\u201d: Why GPU Utilization Stays Low<\/h2>\n<p>Many teams assume that renting the most expensive GPU will make inference trouble-free. In reality:<\/p>\n<ul>\n<li>Hugging Face Transformers uses static batching by default, padding every sequence to the longest one in the batch and wasting up to roughly 30% of compute;<\/li>\n<li>naive serving stacks process one request at a time, leaving the GPU's streaming multiprocessors (SMs) idle;<\/li>\n<li>when production QPS spikes, the Kubernetes HPA adds pods based on resource metrics such as CPU utilization rather than on inference throughput, so teams blindly pile on more GPUs.<\/li>\n<\/ul>\n<p>The result: <strong>the more GPUs you add, the lower the utilization and the bigger the bill<\/strong>.<\/p>\n<hr \/>\n<h2>2. Starverse A800 + vLLM: Continuous Batching Lifts Utilization from 35% to 95%<\/h2>\n<p>The <strong>\u201cA800-80G Flagship Node\u201d GPU cloud host<\/strong> recently launched on the Starverse (\u661f\u5b87\u667a\u7b97) platform ships with the vLLM engine preinstalled on CUDA 12.1 and combines four optimizations:<\/p>\n<ol>\n<li><strong>PagedAttention<\/strong>: the KV cache is allocated in fixed-size blocks, keeping memory fragmentation near zero (the vLLM authors report under 4% waste);<\/li>\n<li><strong>Continuous batching<\/strong>: new requests join the running batch at each decoding step instead of waiting for the previous batch to finish;<\/li>\n<li><strong>Tensor parallelism + 
pipeline parallelism<\/strong>: scales linearly across the 8 GPUs in a single node, letting a 13B model handle 2,048-token inputs;<\/li>\n<li><strong>StarLink intelligent network<\/strong>: an in-house RDMA network with 2 \u03bcs inter-GPU latency, 85% lower than a conventional VPC.<\/li>\n<\/ol>\n<p>In our tests, we deployed Llama2-13B-Chat on 4 A800 servers (32 GPUs) with 512-token inputs and 128-token outputs. QPS rose from 260 to 821, <strong>time to first token stayed under 100 ms with a P99 latency of 340 ms<\/strong>, GPU utilization held steady above 95%, and <strong>overall throughput more than tripled (821 \/ 260 \u2248 3.2x)<\/strong>.<\/p>\n<hr \/>\n<h2>3. The Cost Math: Same Concurrency, 60% Fewer GPUs, About \u00a550,000 Saved per Month<\/h2>\n<p>Take 800 QPS as an example: a conventional stack needs 20 A800 GPUs, while the Starverse vLLM image needs only 8. At the platform's <a href=\"https:\/\/www.starverse-ai.com\">GPU server rental<\/a> monthly rate of \u00a54,200 per GPU, renting 12 fewer GPUs <strong>saves \u00a550,400 per month<\/strong> (12 \u00d7 \u00a54,200). Factor in operations, electricity, and colocation, and <strong>total cost drops by more than 60%<\/strong>. For AI application teams that routinely maintain 3 to 5 environments, <strong>a year of savings could buy a Tesla Model S<\/strong>.<\/p>\n<hr \/>\n<h2>4. Hot-Swap in 5 Steps: Preinstalled Image, No Code Changes<\/h2>\n<p>The official Starverse image <code>starverse\/vllm:0.4.2-py310-cu121<\/code> 
comes with:<\/p>\n<ul>\n<li>vLLM 0.4.2 (stable release);<\/li>\n<li>a FastAPI server template compatible with the OpenAI API;<\/li>\n<li>one-click copying of 13B\/70B weights from the platform's built-in <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-286a-70a3-bafa-cfa47c851b4d\">models and datasets<\/a> to <code>\/models<\/code>.<\/li>\n<\/ul>\n<p><strong>Migration steps<\/strong> (for an existing Kubernetes environment):<\/p>\n<pre><code class=\"language-bash\"># 1. Pull the image\ndocker pull starverse\/vllm:0.4.2-py310-cu121\n\n# 2. Start the OpenAI-compatible server inside the container\n#    (module entry point used by vLLM 0.4.x; newer releases also provide `vllm serve`)\npython -m vllm.entrypoints.openai.api_server \\\n  --model \/models\/Llama-2-13b-chat \\\n  --tensor-parallel-size 2 \\\n  --max-num-seqs 256 \\\n  --max-model-len 4096 \\\n  --enable-prefix-caching \\\n  --swap-space 4 \\\n  --port 8000\n\n# 3. Point existing clients at the new endpoint (vLLM serves plain HTTP unless TLS is configured)\nexport OPENAI_API_BASE=http:\/\/&lt;pod-ip&gt;:8000\/v1   # read by openai-python &lt; 1.0\nexport OPENAI_BASE_URL=http:\/\/&lt;pod-ip&gt;:8000\/v1   # read by openai-python &gt;= 1.0\n\n# 4. Continuous batching is built into vLLM's scheduler and needs no flag;\n#    --enable-prefix-caching reuses the KV cache for shared prompt prefixes, and\n#    --swap-space 4 reserves 4 GiB of CPU swap per GPU (both are set in step 2)\n\n# 5. 
Shift 10% of traffic to the new service as a canary, then cut over fully once P99 latency is confirmed lower\n<\/code><\/pre>\n<p>No business code needs to change, and <strong>the hot upgrade takes about 15 minutes on average<\/strong>.<\/p>\n<hr \/>\n<h2>5. Why Starverse?<\/h2>\n<ol>\n<li><strong>Price-performance<\/strong>: RTX 4090, A800, and H100 GPUs are scheduled from a shared pool, and <a href=\"https:\/\/www.starverse-ai.com\">GPU cloud hosts<\/a> can be billed flexibly by the hour, day, or month;<\/li>\n<li><strong>Fast data path<\/strong>: cloud disks, cloud storage, and the <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-286a-70a3-bafa-cfa47c851b4d\">public resource library<\/a> are unified, so data mounts across instances with zero copying;<\/li>\n<li><strong>Developer ecosystem<\/strong>: JupyterLab, VS Code, and TensorBoard launch in one click, and the <a href=\"https:\/\/www.starverse-ai.com\">AI application<\/a> marketplace includes 120+ popular images;<\/li>\n<li><strong>New-user offer<\/strong>: sign up and get a \u00a510 trial credit to get 13B model inference running at no cost.<\/li>\n<\/ol>\n<hr \/>\n<h2>6. Next Step: Reinvest the 60% Savings in Innovation<\/h2>\n<p>As LLM competition enters the \u201cmillisecond\u201d era, every \u00a510,000 cut from inference costs means one more algorithm experiment and one more round of product iteration. What Starverse offers is not just <a 
href=\"https:\/\/www.starverse-ai.com\">GPU server rental<\/a> but a complete <strong>AI acceleration solution spanning training to inference and data to deployment<\/strong>. Open <a href=\"https:\/\/www.starverse-ai.com\">starverse-ai.com<\/a> today, claim your \u00a510 trial credit, get vLLM running in 5 steps, and lift GPU utilization from 35% to 95%. Then <strong>spend the roughly \u00a550,000 you save each month on your next killer AI application<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Background: In May 2024, a leading SaaS customer-service platform ran a major&hellip;<\/p>\n","protected":false},"author":2,"featured_media":2536,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2537","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-zixun"],"views":38,"_links":{"self":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2537","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/comments?post=2537"}],"version-history":[{"count":0,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2537\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media\/2536"}],"wp:attachment":[{"href":"https:\/\/www.
starverse-ai.com\/guide\/wp-json\/wp\/v2\/media?parent=2537"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/categories?post=2537"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/tags?post=2537"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}