{"id":3145,"date":"2026-03-11T10:18:27","date_gmt":"2026-03-11T02:18:27","guid":{"rendered":"https:\/\/www.starverse-ai.com\/guide\/archives\/3145"},"modified":"2026-03-11T10:18:27","modified_gmt":"2026-03-11T02:18:27","slug":"%e9%95%bf%e7%ba%bf%e6%8e%a8%e7%90%86%e5%88%ab%e5%86%8d%e6%8e%89%e7%ba%bf%ef%bc%81%e6%98%9f%e5%ae%87%e6%99%ba%e7%ae%97kv-cache%e5%8d%b8%e8%bd%bd%e6%96%b9%e6%a1%88%e8%ae%a9gpu%e6%8c%81%e7%bb%ad%e9%ab%98","status":"publish","type":"post","link":"https:\/\/www.starverse-ai.com\/guide\/archives\/3145","title":{"rendered":"No More Dropouts in Long-Running Inference: \u661f\u5b87\u667a\u7b97's KV-Cache Offload Keeps the GPU Fully Fed"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.starverse-ai.com\/guide\/wp-content\/uploads\/2026\/03\/1773195506_fa35c7.png\" alt=\"No More Dropouts in Long-Running Inference: \u661f\u5b87\u667a\u7b97's KV-Cache Offload Keeps the GPU Fully Fed\" style=\"display:block; margin:10px auto; max-width:100%; height:auto;\" \/><\/figure>\n<blockquote>\n<p>\u201cOnce context length passes 1 million tokens, even an A100 80 GB is used up in an instant.\u201d<br \/>\nOpenAI technical blog, \u201cScaling Laws for Long Context\u201d<\/p>\n<\/blockquote>\n<p>Over the past six months, the \u201ccontext arms race\u201d that took large models from 32k to 128k tokens has shown the industry how hard the \u201cVRAM wall\u201d is: on a single A100 80 GB running million-token inference, the KV-Cache can peak at 78 GB, leaving only 2 GB for computation, and the GPU
is reduced to a \u201cmemory mover\u201d. Worse, once the batch size grows even slightly, VRAM overflows and triggers an out-of-memory (OOM) error: the service restarts, users are disconnected, and the ad budget evaporates with them. Long-running inference has become the \u201cnightmare scenario\u201d for GPU cloud hosts.<\/p>\n<h2>1. The Long-Context VRAM Wall: A Million Tokens Overwhelm an A100 80 GB<\/h2>\n<p>In a conventional framework, the KV-Cache shares one GPU with the model weights, so memory use grows linearly with context length. Taking a 70B model at FP16 precision as an example, every 1k tokens needs about 0.8 GB of cache, so 100k tokens need about 80 GB, exactly the physical limit of the A100. Add the concurrent requests of continuous batching, and memory fragmentation quickly turns the \u201ctheoretical 80 GB\u201d into \u201cavailable 0 GB\u201d. The traditional remedy is to shrink the batch and lower concurrency, which leaves GPU compute idle, and <strong>inference throughput falls off a cliff<\/strong>.<\/p>\n<h2>2.
How the GPUDirect RDMA Storage Extension Works<\/h2>\n<p>To let the GPU \u201cforget\u201d its VRAM limit, the KV-Cache has to be swapped in and out as freely as paged memory, with latency low enough to be negligible. Building on <strong>GPUDirect RDMA<\/strong>, \u661f\u5b87\u667a\u7b97 pushes the NVMe over Fabrics network protocol stack down to the GPU memory controller, which delivers three breakthroughs:<\/p>\n<ul>\n<li><strong>Zero copy<\/strong>: the GPU accesses remote NVMe directly, with no staging through CPU memory, at a one-way latency of 200 \u03bcs;<\/li>\n<li><strong>PCIe Bypass<\/strong>: data travels over a dedicated 100 Gbps InfiniBand channel and bypasses the OS kernel;<\/li>\n<li><strong>Cache-aware scheduling<\/strong>: the driver layer maintains a \u201chot token\u201d bitmap and triggers offload only when the hit rate falls below 95%, so there is <strong>zero GPU wait<\/strong>.<\/li>\n<\/ul>\n<h2>3.
The \u661f\u5b87\u667a\u7b97 AIStor Tier: 200 \u03bcs Latency, Zero GPU Wait<\/h2>\n<p>\u661f\u5b87\u667a\u7b97 packages the techniques above as the <strong>AIStor<\/strong> storage tier, a standard component of its GPU server rental platform. Users do not need to change any model code: adding a single launch argument, <code>--kv-offload=aiStor<\/code>, offloads the KV-Cache seamlessly to a distributed NVMe pool. AIStor shares one namespace with the platform's existing <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88aa-2fc4-790b-97e1-fdff4da0e8a6\">cloud disk<\/a> and <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-0730-7451-a8ab-9c3c873fef42\">cloud storage<\/a> services, tiers hot and cold data automatically, and <strong>costs only 1\/5 as much as expanding VRAM<\/strong>.<\/p>\n<h2>4.
Real-World Test: 128k Context Window, 5\u00d7 Inference Throughput, 40% Lower Cost<\/h2>\n<p>We deployed Llama-2-70B-Chat on a \u661f\u5b87\u667a\u7b97 GPU cloud host with a 128k-token input and a 4k-token output, and compared two setups, \u201cVRAM only\u201d and \u201cAIStor offload\u201d:<\/p>\n<table>\n<thead>\n<tr>\n<th>Metric<\/th>\n<th>VRAM only<\/th>\n<th>AIStor offload<\/th>\n<th>Change<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Max batch size<\/td>\n<td>4<\/td>\n<td>24<\/td>\n<td>6\u00d7<\/td>\n<\/tr>\n<tr>\n<td>Throughput (tokens\/s)<\/td>\n<td>327<\/td>\n<td>1,658<\/td>\n<td>5\u00d7<\/td>\n<\/tr>\n<tr>\n<td>Cost per token<\/td>\n<td>100%<\/td>\n<td>60%<\/td>\n<td>40% lower<\/td>\n<\/tr>\n<tr>\n<td>P99 latency<\/td>\n<td>1.2 s<\/td>\n<td>1.25 s<\/td>\n<td>&lt;5% higher<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With almost no latency penalty, <strong>AIStor raises GPU utilization from 35% to 92%<\/strong>, truly delivering \u201clong-running inference without dropouts\u201d.<\/p>\n<h2>5.
Open-Source Scripts for the Integrated GPU Cloud Host + Object Storage Setup<\/h2>\n<p>So that developers can reproduce these results in 5 minutes, \u661f\u5b87\u667a\u7b97 has open-sourced the complete scripts on GitHub, including:<\/p>\n<ul>\n<li>a Docker Compose setup that brings up vLLM and the AIStor plugin with one command;<\/li>\n<li>automatic download of the 70B model from the <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-286a-70a3-bafa-cfa47c851b4d\">public model library<\/a>;<\/li>\n<li>a persistent checkpoint mechanism built on <a href=\"https:\/\/www.starverse-ai.com\/node\/019b88ac-0730-7451-a8ab-9c3c873fef42\">cloud storage<\/a>, so data is not lost after the instance is released.<\/li>\n<\/ul>\n<p>Repository:<br \/>\n<a href=\"https:\/\/github.com\/Starverse-AI\/KV-Cache-Offload\">https:\/\/github.com\/Starverse-AI\/KV-Cache-Offload<\/a><\/p>\n<h2>Conclusion: Sign Up for a 10-Yuan Trial Credit and Try \u201cVRAM Freedom\u201d Now<\/h2>\n<p>With the VRAM wall down, the potential of long-context inference is finally unlocked. Drawing on cost-effective GPU server rental, the GPUDirect RDMA-based AIStor extension, and a one-click AI application ecosystem, \u661f\u5b87\u667a\u7b97 brings \u201cmillion-token real-time inference\u201d, which used to require a budget in the tens of millions, within everyone's reach. Sign up now to receive a <strong>10-yuan trial credit<\/strong> that can be applied directly to RTX 4090 \/ A100 and other instance types. <strong>Head over now<\/strong> to the <a href=\"https:\/\/www.starverse-ai.com\">GPU cloud host<\/a>
and start your long-running inference journey!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cOnce context length passes 1 million tokens, even an A100 8&hellip;<\/p>\n","protected":false},"author":2,"featured_media":3143,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3145","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-zixun"],"views":114,"_links":{"self":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/3145","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/comments?post=3145"}],"version-history":[{"count":0,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/3145\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media\/3143"}],"wp:attachment":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media?parent=3145"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/categories?post=3145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/tags?post=3145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}