{"id":2563,"date":"2026-03-04T16:19:18","date_gmt":"2026-03-04T08:19:18","guid":{"rendered":"https:\/\/www.starverse-ai.com\/guide\/archives\/2563"},"modified":"2026-03-04T16:19:18","modified_gmt":"2026-03-04T08:19:18","slug":"deepspeed-zero-3-%e4%b8%87%e4%ba%bf%e6%a8%a1%e5%9e%8b%e8%ae%ad%e7%bb%83%e8%b8%a9%e5%9d%91%e8%ae%b0%ef%bc%9a%e4%b8%ba%e4%bb%80%e4%b9%88%e9%80%89%e5%af%b9-gpu%e4%ba%91%e4%b8%bb%e6%9c%ba-%e7%bd%91","status":"publish","type":"post","link":"https:\/\/www.starverse-ai.com\/guide\/archives\/2563","title":{"rendered":"DeepSpeed + ZeRO-3 Trillion-Parameter Training Pitfalls: Why Does Choosing the Right GPU Cloud Network Matter More?"},"content":{"rendered":"<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/www.starverse-ai.com\/guide\/wp-content\/uploads\/2026\/03\/1772612358_bb184d.png\" alt=\"DeepSpeed + ZeRO-3 Trillion-Parameter Training Pitfalls: Why Does Choosing the Right GPU Cloud Network Matter More?\" style=\"display:block; margin:10px auto; max-width:100%; height:auto;\" \/><\/figure>\n<blockquote>\n<p>\u201cA 175-billion-parameter model, 3 days of training on a single 8\u00d7A100 node, and one All-Reduce sync takes 47 seconds? That is not an algorithm problem; the network is holding us back.\u201d<br \/>\nFrom the internal post-mortem notes of a leading large-model team<\/p>\n<\/blockquote>\n<p>Over the past year, trillion-parameter models have moved from slide decks into repos. DeepSpeed + ZeRO-3 slices GPU memory thinner than a pizza, but that exposes the \u201ccommunication wall\u201d all the more: gradient shards shuttle back and forth over 10 Gbps Ethernet, GPUs sit idle 30% of the time, and money burned on compute becomes money burned on the network. Only after hitting these pitfalls did we learn that choosing the right GPU cloud network pays off more than buying two extra cards.<\/p>\n<hr \/>\n<h2>1. The I\/O Bottleneck in Parallel Large-Model Training: Throughput at 10 Gbps \u2192 50 Gbps<\/h2>\n<p>On the 10 Gbps classic network common in public clouds, we trained a 175B model with DeepSpeed 3D parallelism and measured the following:<\/p>\n<table>\n<thead>\n<tr>\n<th>Metric<\/th>\n<th>10 Gbps classic network<\/th>\n<th>50 Gbps private network<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>All-Reduce time per step<\/td>\n<td>47 s<\/td>\n<td>9.8 s<\/td>\n<\/tr>\n<tr>\n<td>GPU utilization<\/td>\n<td>62 %<\/td>\n<td>93 %<\/td>\n<\/tr>\n<tr>\n<td>Effective throughput (tokens\/s\/GPU)<\/td>\n<td>1.2 k<\/td>\n<td>4.7 k<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With 5\u00d7 the bandwidth, per-step All-Reduce time drops about 4.8\u00d7 (47 s to 9.8 s) and per-GPU throughput rises about 3.9\u00d7 (1.2 k to 4.7 k tokens\/s). Same GPU memory, same card count: simply ticking a higher network tier when renting the GPU servers yields a roughly 4\u00d7 compute dividend. I\/O, in other words, is the first lever of productivity for large models.<\/p>\n<hr \/>\n<h2>2. 
\u661f\u5b87\u667a\u7b97 200 Gbps RDMA Cluster: All-Reduce Latency &lt; 2 \u03bcs<\/h2>\n<p>Every new-generation \u661f\u5b87\u667a\u7b97 GPU cloud host is attached to a <strong>RoCEv2 RDMA interconnect<\/strong>. Each node, with 8\u00d7RTX 4090 or A100, has a 200 Gbps non-blocking uplink, and All-Reduce kernel latency is pushed below 2 microseconds; compared with conventional 50 Gbps TCP, collective communication time drops a further 65%.<br \/>\nThis means:<br \/>\n&#8211; for the same 175B model, the node count can be cut by 30%;<br \/>\n&#8211; at thousand-card scale, each step syncs in &lt; 150 ms, which allows a larger gradient accumulation window and steadier convergence;<br \/>\n&#8211; multi-tenant isolation plus ECN-based congestion control keep other tenants from \u201cstealing\u201d bandwidth from your training traffic.<\/p>\n<p>In short, <strong>treating the network like GPU memory<\/strong> is the right stance for the large-model era.<\/p>\n<hr \/>\n<h2>3. 
Platform Images Ship with DeepSpeed Auto-Tuning, Ready in One Click<\/h2>\n<p>Setting up the environment locally tends to hit the same pitfalls: a mismatched NCCL version, GPUDirect RDMA (GDR) left disabled, a scrambled PCIe topology, and so on. \u661f\u5b87\u667a\u7b97 turns the official DeepSpeed image into a \u201cboot and train\u201d setup:<br \/>\n&#8211; the system disk comes with CUDA 12.1, PyTorch 2.1, and DeepSpeed 0.12 preinstalled, and NCCL is already patched for RDMA;<br \/>\n&#8211; the startup script reads <code>\/etc\/rdma\/network_topology.json<\/code> automatically and fills in <code>reduce_bucket_size<\/code> and <code>stage3_prefetch_bucket_size<\/code> in <code>ds_config<\/code> for you;<br \/>\n&#8211; for multi-node training, the platform generates the <code>hostfile<\/code> from the instance names and passwordless SSH is preconfigured, so it really is two steps: rent the machines, run the code.<\/p>\n<p>For developers who just want to get an AI application running quickly, the image marketplace also offers one-click packages such as Stable Diffusion, ChatGLM-6B, and CodeLlama, served through Gradio on port 7860 and ready to serve external traffic within ten minutes.<\/p>\n<hr \/>\n<h2>4. 
Cost Curve: Same 175B Model, Ordinary Cloud \u2248 CNY 598,000 \u2192 \u661f\u5b87 \u2248 CNY 215,000<\/h2>\n<p>Let us work through the bill for training on 300B tokens with DeepSpeed ZeRO-3 on 1,024 RTX 4090 cards:<\/p>\n<table>\n<thead>\n<tr>\n<th>Item<\/th>\n<th>Ordinary cloud, 10 Gbps<\/th>\n<th>\u661f\u5b87\u667a\u7b97, 200 Gbps RDMA<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GPU unit price<\/td>\n<td>CNY 2.0 per card-hour<\/td>\n<td>CNY 1.2 per card-hour<\/td>\n<\/tr>\n<tr>\n<td>Network fee<\/td>\n<td>0<\/td>\n<td>Included in the card-hour price<\/td>\n<\/tr>\n<tr>\n<td>Total card-hours<\/td>\n<td>1,024 \u00d7 292 h<\/td>\n<td>1,024 \u00d7 175 h<\/td>\n<\/tr>\n<tr>\n<td>Total cost (card-hours \u00d7 unit price)<\/td>\n<td>\u2248 CNY 598,000<\/td>\n<td>\u2248 CNY 215,000<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With the faster network, training wall-clock time falls by about 40% (292 h to 175 h); combined with the lower unit price, that saves about CNY 383,000. Stack the platform's \u201cMidnight Elastic\u201d resources on top, billed at 15% of list price, and <strong>the cost can be halved again<\/strong>. University and startup teams receive a CNY 10 trial credit on sign-up, enough to run an overnight ablation experiment on 6\u00d74090 for free.<\/p>\n<hr \/>\n<h2>5. 
Network Topology Diagrams + Open-Source Benchmark Scripts<\/h2>\n<p>To make the results reproducible, we have open-sourced the test scripts and topology diagrams in full:<br \/>\n&#8211; GitHub repository: <a href=\"https:\/\/github.com\/starverse-ai\/benchmark\">starverse-ai\/benchmark<\/a>;<br \/>\n&#8211; it includes <code>ds_config_zero3.json<\/code>, <code>all_reduce_perf.py<\/code>, and an NCCL environment-variable template;<br \/>\n&#8211; an Ansible-based playbook runs everything across multiple nodes in one step, so you can reproduce the same curves on your own rented GPU server instances in 5 minutes.<\/p>\n<p>If you are planning your next round of large-model training, run the benchmark scripts first, then decide how many cards to use and whose GPU cloud hosts to rent. The data does not lie: the network is the largest hidden cost.<\/p>\n<hr \/>\n<h2>Conclusion: Choose the Right Network First, Then Talk About Large-Model Ambitions<\/h2>\n<p>DeepSpeed flattens the memory wall but stacks the communication wall higher. Rather than blindly adding cards, first put your data on a 200 Gbps \u201chighway\u201d. With its RDMA network, ready-to-use AI images, and prices as low as CNY 1.2 per card-hour, \u661f\u5b87\u667a\u7b97 turns trillion-parameter training from a \u201cluxury project\u201d into \u201croutine engineering\u201d.<br \/>\nSign up at <a href=\"https:\/\/www.starverse-ai.com\">\u661f\u5b87\u667a\u7b97<\/a> now to claim a CNY 10 trial credit and validate your next AI application at zero cost. Do not let a 10 Gbps classic network hold back your 4090s; in the large-model era, the network comes first.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cA 175-billion-parameter model, 3 days of training on a single 8\u00d7A100 node,&hellip;<\/p>\n","protected":false},"author":2,"featured_media":2562,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2563","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-zixun"],"views":46,"_links":{"self":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/comments?post=2563"}],"version-history":[{"count":0,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/posts\/2563\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media\/2562"}],"wp:attachment":[{"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/media?parent=2563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/categories?post=2563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.starverse-ai.com\/guide\/wp-json\/wp\/v2\/tags?post=2563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}